$Id$

Tor network discovery protocol

0. Scope

  This document proposes a way of doing more distributed network discovery
  while maintaining some amount of admission control. We don't recommend
  you implement this as-is; it needs more discussion.

  Terminology:

    - Client: The Tor component that chooses paths.
    - Server: A relay node that passes traffic along.

1. Goals.

  We want more decentralized discovery for network topology and status.
  In particular:

  1a. We want to let clients learn about new servers from anywhere
      and build circuits through them if they wish. This means that
      Tor nodes need to be able to Extend to nodes they don't already
      know about.

  1b. We want to let servers limit the addresses and ports they're
      willing to extend to. This is necessary e.g. for middleman nodes
      who have jerks trying to extend from them to badmafia.com:80 all
      day long and it's drawing attention.

  1b'. While we're at it, we also want to handle servers that *can't*
      extend to some addresses/ports, e.g. because they're behind NAT or
      otherwise firewalled. (See section 5 below.)

  1c. We want to provide a robust (available) and not-too-centralized
      mechanism for tracking network status (which nodes are up and working)
      and admission (which nodes are "recommended" for certain uses).

2. Assumptions.

  2a. People get the code from us, and they trust us (or our gpg keys, or
      something down the trust chain that's equivalent).

  2b. Even if the software allows humans to change the client configuration,
      most of them will use the default that's provided. So we should
      provide one that is the right balance of robust and safe. That is,
      we need to hard-code enough "first introduction" locations that new
      clients will always have an available way to get connected.

  2c. Assume that the current "ask them to email us and see if it seems
      suspiciously related to previous emails" approach will not catch
      the strong Sybil attackers. Therefore, assume the Sybil attackers
      we do want to defend against can produce only a limited number of
      not-obviously-on-the-same-subnet nodes.

  2d. Roger has only a limited amount of time for approving nodes; he
      shouldn't be the time bottleneck anyway; and he is doing a poor job
      at keeping out some adversaries.

  2e. Some people would be willing to offer servers but will be put off
      by the need to send us mail and identify themselves.

  2e'. Some evil people will avoid doing evil things based on the perception
      (however true or false) that there are humans monitoring the network
      and discouraging evil behavior.

  2e''. Some people will trust the network, and the code, more if they
      have the perception that there are trustworthy humans guiding the
      deployed network.

  2f. We can trust servers to accurately report their characteristics
      (uptime, capacity, exit policies, etc.), as long as we have some
      mechanism for notifying clients when we notice that they're lying.

  2g. There exists a "main" core Internet in which most locations can access
      most locations. We'll focus on it (first).

3. Some notes on how to achieve this.

  Piece one: (required)

  We ship with N (e.g. 20) directory server locations and fingerprints.

  Directory servers serve signed network-status pages, listing their
  opinions of network status and which routers are good (see 4a below).

  Dirservers collect and provide server descriptors as well. These don't
  need to be signed by the dirservers, since they're self-certifying
  and timestamped.

  (In theory the dirservers don't need to be the ones serving the
  descriptors, but in practice the dirservers would need to point people
  at the place that does, so for simplicity let's assume that they do.)

  Clients then get network-status pages from a threshold of dirservers,
  fetch enough of the corresponding server descriptors to make them happy,
  and proceed as now.
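
  As a rough illustration of piece one, here is a minimal sketch of the
  client-side fetch logic, assuming a hard-coded dirserver list. The
  fetch_network_status() helper, the THRESHOLD value, and the majority
  rule at the end are hypothetical stand-ins, not part of any existing
  Tor interface:

    import random

    # Hypothetical hard-coded list of (address, fingerprint) pairs that
    # ships with the client; in the design above N would be around 20.
    DIRSERVERS = [("dir1.example.net:9030", "FP1"),
                  ("dir2.example.net:9030", "FP2"),
                  ("dir3.example.net:9030", "FP3")]

    THRESHOLD = 2   # how many signed network-status pages we insist on

    def fetch_network_status(addr, fingerprint):
        """Placeholder: download a network-status page from one dirserver
        and check its signature against the shipped fingerprint.  Returns
        a dict mapping router identity -> good-or-not, or None on failure."""
        raise NotImplementedError

    def get_statuses():
        statuses = []
        for addr, fp in random.sample(DIRSERVERS, len(DIRSERVERS)):
            try:
                status = fetch_network_status(addr, fp)
            except Exception:
                continue                       # treat any failure as "unreachable"
            if status is not None:
                statuses.append(status)
            if len(statuses) >= THRESHOLD:
                break
        if len(statuses) < THRESHOLD:
            raise RuntimeError("couldn't reach a threshold of dirservers")
        return statuses

    def recommended_routers(statuses):
        # One possible way to combine opinions: a router counts as good
        # if a majority of the fetched network-status pages say so.
        counts = {}
        for status in statuses:
            for router_id, good in status.items():
                if good:
                    counts[router_id] = counts.get(router_id, 0) + 1
        majority = len(statuses) // 2 + 1
        return [r for r, c in counts.items() if c >= majority]

  The client would then fetch descriptors for (enough of) the routers
  returned by recommended_routers() and build circuits as it does now.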

  Piece two: (optional)

  We ship with S (e.g. 3) seed keys (trust anchors), and ship with
  signed timestamped certs for each dirserver. Dirservers also serve a
  list of certs, maybe including a "publish all certs since time foo"
  functionality. If at least two seeds agree about something, then it
  is so.

  Now dirservers can be added, and revoked, without requiring users to
  upgrade to a new version. If we only ship with dirserver locations
  and not fingerprints, it also means that dirservers can rotate their
  signing keys transparently.

  But keeping track of the seed keys becomes a critical security issue.
  And rotating them in a backward-compatible way adds complexity. Also,
  dirserver locations must be at least somewhat static, since each lost
  dirserver degrades reachability for old clients. So as the dirserver
  list rolls over, we have no choice but to put out new versions.
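
  The "at least two seeds agree" rule could look something like the
  sketch below. The cert statement format and the verify() helper are
  hypothetical; the only point is that a claim about a dirserver is
  accepted once two or more distinct seed keys have validly signed it:

    # Sketch of the "at least two seeds agree" rule from piece two.
    SEED_KEYS = ["seed-key-1", "seed-key-2", "seed-key-3"]   # S = 3
    QUORUM = 2

    def verify(seed_key, statement, signature):
        """Placeholder: return True iff `signature` is a valid signature
        on `statement` by the seed identified by `seed_key`."""
        raise NotImplementedError

    def statement_is_accepted(statement, signatures):
        """`statement` is a claim such as "dirserver X at address Y has
        signing key Z, valid until time T".  `signatures` is a list of
        (seed_key, signature) pairs attached to it.  Accept the claim
        iff at least QUORUM distinct seeds have validly signed it."""
        agreeing = set()
        for seed_key, sig in signatures:
            if seed_key in SEED_KEYS and verify(seed_key, statement, sig):
                agreeing.add(seed_key)
        return len(agreeing) >= QUORUM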

  Piece three: (optional)

  Notice that this doesn't preclude other approaches to discovering
  different concurrent Tor networks. For example, a Tor network inside
  China could ship Tor with a different torrc and poof, they're using
  a different set of dirservers. Some smarter clients could be made to
  learn about both networks, and be told which nodes bridge the networks.

  ...

4. Unresolved issues.

  4a. How do the dirservers decide whether to recommend a server? We
      could have them do it based on contact from the human, but by
      assumptions 2c and 2d above, that's going to be less effective, and
      more of a hassle, as we scale up. Thus I propose that they simply
      do some basic automatic measuring themselves, starting with the
      current "are they connected to me" measurement, and that's all
      that is done.

      We could blacklist as we notice evil servers, but then we're in
      the same boat all the irc networks are in. We could whitelist as we
      notice new servers, and stop whitelisting (maybe rolling back a bit)
      once an attack is in progress. If we assume humans aren't particularly
      good at this anyway, we could just do automated delayed whitelisting,
      and have a "you're under attack" switch the human can enable for a
      while to start acting more conservatively.
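
      One way to read "automated delayed whitelisting" plus a manual
      "you're under attack" switch is sketched below; the delay values
      and the variable names are made up for illustration:

        import time

        NORMAL_DELAY = 24 * 60 * 60         # wait a day before recommending
        CONSERVATIVE_DELAY = 7 * 24 * 3600  # wait a week while under attack

        under_attack = False  # the human-operated "you're under attack" switch
        first_seen = {}       # router identity -> when we first saw it connected

        def note_router_seen(router_id, now=None):
            now = now if now is not None else time.time()
            first_seen.setdefault(router_id, now)

        def should_whitelist(router_id, now=None):
            """Recommend a router only once it has been around (and
            connected to us) for long enough; be more patient while the
            under-attack switch is set."""
            now = now if now is not None else time.time()
            if router_id not in first_seen:
                return False
            delay = CONSERVATIVE_DELAY if under_attack else NORMAL_DELAY
            return now - first_seen[router_id] >= delay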

      Once upon a time we collected contact info for servers, which was
      mainly used to remind people that their servers are down and could
      they please restart. Now that we have a critical mass of servers,
      I've stopped doing that reminding. So contact info is less important.

  4b. What do we do about recommended-versions? Do we need a threshold of
      dirservers to claim that your version is obsolete before you believe
      them? Or do we make it have less effect -- e.g. print a warning but
      never actually quit? Coordinating all the humans to upgrade their
      recommended-version strings at once seems bad. Maybe if we have
      seeds, the seeds can sign a recommended-version and upload it to
      the dirservers.
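
      If we do go with a threshold rule, the client-side check could be
      as simple as the sketch below (the warn-but-never-quit behavior is
      just one of the options discussed above; the recommended_versions
      field name is hypothetical):

        def version_is_obsolete(my_version, statuses, threshold):
            """Believe our version is obsolete only if at least `threshold`
            of the fetched network-status opinions omit it from their
            recommended-versions list."""
            complaints = sum(1 for s in statuses
                             if my_version not in s["recommended_versions"])
            return complaints >= threshold

        def maybe_warn(my_version, statuses, threshold):
            if version_is_obsolete(my_version, statuses, threshold):
                # Print a warning but never actually quit.
                print("This version of Tor (%s) is no longer recommended;"
                      " please upgrade." % my_version)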

  4c. What does it mean to bind a nickname to a key? What if each dirserver
      does it differently, so one nickname corresponds to several keys?
      Maybe the solution is that nickname<=>key bindings should be
      individually configured by clients in their torrc (if they want to
      refer to nicknames in their torrc), and we stop thinking of nicknames
      as globally unique.
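
      If bindings are per-client configuration rather than a global
      property, the client-side lookup reduces to something like this
      sketch (the binding table stands in for whatever the user puts in
      their torrc; the names and fingerprints are placeholders):

        my_nickname_bindings = {
            # nickname: identity key fingerprint (placeholders, not real keys)
            "trustedfriend1": "0000000000000000000000000000000000000001",
            "trustedfriend2": "0000000000000000000000000000000000000002",
        }

        def resolve_nickname(nickname, descriptors):
            """Return the descriptor whose identity fingerprint matches the
            locally configured binding for `nickname`, or None if we have
            no binding.  `descriptors` maps fingerprint -> descriptor."""
            fingerprint = my_nickname_bindings.get(nickname)
            if fingerprint is None:
                return None   # no local binding: the nickname means nothing to us
            return descriptors.get(fingerprint)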

  4d. What new features need to be added to server descriptors so they
      remain compact yet support new functionality? Section 5 is a start
      of a discussion of one answer to this.

5. Regarding "Blossom: an unstructured overlay network for end-to-end
   connectivity."

  In this section we address possible solutions to the problem of how to allow
  Tor routers in different transport domains to communicate.

  [Can we have a one-sentence definition of transport domain here? If there
  are 5 servers on the Internet as we know it and suddenly one link between
  a pair of them catches fire, how many transport domains are involved now?
  What if one link is down permanently but the rest work? Is "in the same
  transport domain as" a symmetric property?]

  First, we presume that for every interface between transport domains A and B,
  one Tor router T_A exists in transport domain A, one Tor router T_B exists in
  transport domain B, and (without loss of generality) T_A can open a persistent
  connection to T_B. Any Tor traffic between the two routers will occur over
  this connection, which effectively renders the routers equal partners in
  bridging between the two transport domains. We refer to the established link
  between two transport domains as a "bridge" (we use this term because there is
  no serious possibility of confusion with the notion of a layer 2 bridge).

  Next, suppose that the universe consists of transport domains connected by
  persistent connections in this manner. An individual router can open multiple
  connections to routers within the same foreign transport domain, and it can
  establish separate connections to routers within multiple foreign transport
  domains.

  As in regular Tor, each Blossom router pushes its descriptor to directory
  servers. These directory servers can be within the same transport domain, but
  they need not be. The trick is that if a directory server is in another
  transport domain, then that directory server must know through which Tor
  routers to send messages destined for the Tor router in question.

  [We are assuming that routers in the non-primary transport domain (the
  primary one being the one with dirservers) know how to get to the primary
  transport domain, either through Tor or other voodoo, to publish to the
  hard-coded dirservers.]

  Descriptors for Blossom routers held by the directory server must contain a
  special field for specifying a path through the overlay (i.e. an ordered list
  of router names/IDs) to a router in a foreign transport domain. (This field
  may be a set of paths rather than a single path.) A new router publishing to a
  directory server in a foreign transport domain should include a list of
  routers. This list should be either:

  a. ...a list of routers to which the router has persistent connections, or, if
     the new router does not have any persistent connections,

  b. ...a (not necessarily exhaustive) list of fellow routers that are in the
     same transport domain.

  The directory server will be able to use this information to derive a path to
  the new router, as follows. If the new router used approach (a), then the
  directory server will define the same path(s) as in the descriptors for the
  router(s) specified in the list, with the corresponding specified router
  appended to each path. If the new router used approach (b), then the directory
  server will define the same path(s) as in the descriptors for the routers
  specified in the list. The directory server will then insert the newly defined
  path(s) into the descriptor from the new router.

  [Dirservers can't modify server descriptors; they're self-certifying. -RD]
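
  That derivation rule might look like the following sketch. Everything here
  (the data layout, the helper names) is invented for illustration, and, per
  the bracketed objection above, a real dirserver couldn't rewrite a
  self-certifying descriptor, so any derived paths would have to be kept as
  separate metadata served alongside it:

    # known_paths maps router id -> list of overlay paths (each an ordered
    # list of router ids) that this directory server already has on file.

    def derive_paths(listed_routers, approach, known_paths):
        """Derive the overlay path(s) for a newly published router.

        approach "a": listed_routers are routers the new router has
        persistent connections to; reuse their paths with the listed
        router appended to each.
        approach "b": listed_routers are merely in the same transport
        domain as the new router; reuse their paths unchanged."""
        derived = []
        for r in listed_routers:
            for path in known_paths.get(r, []):
                if approach == "a":
                    derived.append(path + [r])
                else:
                    derived.append(list(path))
        return derived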

  If all directory servers are within the same transport domain, then the problem
  is solved: routers can exist within multiple transport domains, and as long as
  the network of transport domains is fully connected by bridges, any router will
  be able to access any other router in a foreign transport domain simply by
  extending along the path specified by the directory server. However, we want
  the system to be truly decentralized, which means not electing any particular
  transport domain to be the master domain in which entries are published.

  Generally speaking, directory servers share information with each other about
  routers. In order for a directory server to share information with a directory
  server in a foreign transport domain to which it cannot speak directly, it must
  use Tor, which means referring to the other directory server by using a router
  in the foreign transport domain. However, in order to use Tor, it must be able
  to reach that router, which means that a descriptor for that router must exist
  in its table, along with a means of reaching it. Therefore, in order for a
  mutual exchange of information between routers in transport domain A and those
  in transport domain B to be possible when routers in transport domain A cannot
  establish direct connections with routers in transport domain B, some router
  in transport domain B must have pushed its descriptor to a directory server in
  transport domain A, so that the directory server in transport domain A can use
  that router to reach the directory server in transport domain B.

  When confronted with the choice of multiple different paths to reach the same
  router, the Blossom nodes may use a route selection protocol similar in design
  to that used by BGP (it may be a simple distance-vector route selection
  procedure that only takes into account path length, or it may be more complex
  to avoid loops, cache results, etc.) in order to choose the best one.
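
  In the simplest (distance-vector-flavored) case, "choose the best one" just
  means preferring the shortest loop-free candidate path, as in this sketch;
  the real selection procedure is left open above:

    def best_path(candidate_paths):
        """Among the candidate overlay paths to a router, pick the shortest
        one that contains no repeated hops, or None if there isn't one."""
        loop_free = [p for p in candidate_paths if len(p) == len(set(p))]
        if not loop_free:
            return None
        return min(loop_free, key=len)

    # Example: best_path([["x", "y", "z"], ["x", "z"]]) -> ["x", "z"]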

  [How does this work with exit policies (how do we enumerate all resources
  in our transport domain?), and with translating resources that we want to
  get to into servers that can reach them?]