Filename: 141-jit-sd-downloads.txt
Title: Download server descriptors on demand
Version: $Revision$
Last-Modified: $Date$
Author: Peter Palfrader
Created: 15-Jun-2008
Status: Draft

1. Overview

Downloading all server descriptors is the most expensive part
of bootstrapping a Tor client. These server descriptors currently
amount to about 1.5 megabytes of data, and this size will grow
linearly with network size.

Fetching all these server descriptors takes a long time for people
behind slow network connections. It also places a considerable load
on our network of directory mirrors.

This document describes proposed changes to the Tor network and
directory protocol so that clients will no longer need to download
all server descriptors.

These changes consist of moving load balancing information into
network status documents, implementing a means to download server
descriptors on demand in an anonymity-preserving way, and dealing
with exit node selection.

2. What is in a server descriptor

When a Tor client starts, the first thing it will try to get is a
current network status document: a consensus signed by a majority
of directory authorities. This document is currently about 100
kilobytes in size, though it will grow linearly with network size.
This document lists all servers currently running on the network.
The Tor client will then try to get a server descriptor for each
of the running servers. All server descriptors currently amount
to about 1.5 megabytes of downloads.

A Tor client learns several things about a server from its descriptor.
Some of these it has already learned from the network status document
published by the authorities, but the server descriptor states them
again in a single statement signed by the server itself, not just by
the directory authorities.

Tor clients use the information from server descriptors for
different purposes, which are considered in the following sections.

#three ways: One, to determine if a server will be able to handle
#this client's request; two, to actually communicate or use the server;
#three, for load balancing decisions.
#
#These three points are considered in the following subsections.

2.1 Load balancing

The Tor load balancing mechanism is quite complex in its details, but
it has a simple goal: the more traffic a server can handle, the more
traffic it should get. That means the more traffic a server can
handle, the more likely a client is to use it.

For this purpose each server descriptor has bandwidth information
which tries to convey a server's capacity to clients.

Currently we weigh servers differently for different purposes. There
is one weight for when we use a server as a guard node (our entry to
the Tor network), another weight we assign servers for exit duties,
and a third for when we need intermediate (middle) nodes.

2.2 Exit information

When a Tor client wants to exit to some resource on the internet, it
will build a circuit to an exit node that allows access to that
resource's IP address and TCP port.

When building that circuit, the client can make sure that the circuit
ends at a server that will be able to fulfill the request, because the
client has already learned all the servers' exit policies from their
descriptors.

2.3 Capability information

Server descriptors contain information about the specific versions of
the Tor protocol they understand [proposal 105].

Furthermore, the server descriptor also contains the exact version of
the Tor software that the server is running, and some decisions are
made based on the server version number (for instance, a Tor client
will only make conditional consensus requests [proposal 139] when
talking to Tor servers version 0.2.1.1-alpha or later).

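As an illustration of a version-based decision like the one above, here is a minimal Python sketch. The helper names are hypothetical, and the sketch compares only the numeric part of the version string, ignoring the status tag (such as "-alpha"); Tor's real version ordering also takes the tag into account.

```python
import re

# Hypothetical helper: matches the numeric prefix of a Tor version string
# such as "0.2.1.1-alpha". The status tag is ignored (a simplification).
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:\.(\d+))?")


def parse_tor_version(version):
    """Return (major, minor, micro, patchlevel); a missing patchlevel is 0."""
    m = _VERSION_RE.match(version)
    if m is None:
        raise ValueError(f"not a Tor version: {version!r}")
    major, minor, micro, patch = m.groups()
    return (int(major), int(minor), int(micro), int(patch or 0))


# Threshold from 2.3: conditional consensus requests (proposal 139) are
# only made to servers running 0.2.1.1-alpha or later.
COND_CONSENSUS_MIN = parse_tor_version("0.2.1.1-alpha")


def supports_conditional_consensus(server_version):
    return parse_tor_version(server_version) >= COND_CONSENSUS_MIN


print(supports_conditional_consensus("0.2.1.2-alpha"))  # True
print(supports_conditional_consensus("0.2.0.30"))       # False
```

Comparing tuples gives lexicographic ordering field by field, which matches how the numeric components of these version numbers are ordered.
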
2.4 Contact/key information

A server descriptor lists a server's IP address and the TCP ports on
which it accepts onion and directory connections. Furthermore, it
contains the onion key (a short-lived RSA key to which clients encrypt
CREATE cells).

2.5 Identity information

A Tor client learns the digest of a server's identity key from the
network status document. Once it has a server descriptor, this
descriptor contains the full RSA identity key of the server. Clients
verify 1) that the digest of the identity key matches the expected
digest from the consensus, and 2) that the signature on the
descriptor made with that key is valid.

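The two identity checks can be sketched in Python as follows. This assumes that the digest listed in the consensus is a SHA-1 hash of the DER-encoded public identity key; the RSA signature check is delegated to a hypothetical `signature_ok` callback rather than tied to any particular crypto library.

```python
import hashlib
import hmac


def identity_digest(identity_key_der):
    # Assumption: the consensus lists SHA-1 over the DER-encoded identity key.
    return hashlib.sha1(identity_key_der).digest()


def check_descriptor_identity(identity_key_der, expected_digest, signature_ok):
    """Apply the two checks from 2.5 to a freshly fetched descriptor.

    identity_key_der: identity key bytes taken from the descriptor.
    expected_digest:  20-byte identity digest decoded from the consensus.
    signature_ok:     hypothetical callable returning True if the
                      descriptor's signature verifies under that key.
    """
    # 1) The key must hash to the digest the consensus promised.
    if not hmac.compare_digest(identity_digest(identity_key_der), expected_digest):
        return False
    # 2) The descriptor must carry a valid signature made with that key.
    return bool(signature_ok(identity_key_der))
```

Checking the digest first means a descriptor carrying the wrong key is rejected before any signature work is done.
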
3. No longer require clients to have copies of all SDs

3.1 Load balancing info in consensus documents

One of the reasons why clients download all server descriptors is to
do proper load balancing as described in 2.1. For clients to no
longer require all server descriptors, this information will have to
move into the network status document.

Consensus documents will have a new line per router, similar
to the "r", "s", and "v" lines that already exist. This line
will convey weight information to clients:

"w Bandwidth=193671"

The bandwidth number is the lesser of the observed bandwidth and the
bandwidth rate limit from the server descriptor that the "r" line
references by digest (the 3rd and 1st fields, respectively, of the
"bandwidth" line in the descriptor).

Authorities will cap the bandwidth number at some arbitrary value,
currently 10MB/sec. If a router claims a larger bandwidth, an
authority's vote will still only show Bandwidth=10000000.

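A minimal Python sketch of how an authority might derive its vote value from a descriptor's "bandwidth" line. The function names are hypothetical; the cap of 10000000 and the field positions (1st field rate limit, 3rd field observed) are taken from the text above.

```python
# Cap from the text: 10MB/sec, written as Bandwidth=10000000 in votes.
BANDWIDTH_CAP = 10_000_000


def vote_bandwidth(bandwidth_line):
    """Value an authority would vote for, given a descriptor "bandwidth" line.

    Example input: "bandwidth 5242880 10485760 193671", where the 1st
    number is the rate limit and the 3rd is the observed bandwidth.
    """
    fields = bandwidth_line.split()
    if len(fields) != 4 or fields[0] != "bandwidth":
        raise ValueError(f"unexpected bandwidth line: {bandwidth_line!r}")
    rate, observed = int(fields[1]), int(fields[3])
    # Lesser of rate limit and observed bandwidth, then apply the cap.
    return min(rate, observed, BANDWIDTH_CAP)


def w_line(bandwidth_line):
    return f"w Bandwidth={vote_bandwidth(bandwidth_line)}"


print(w_line("bandwidth 5242880 10485760 193671"))       # w Bandwidth=193671
print(w_line("bandwidth 104857600 104857600 52428800"))  # w Bandwidth=10000000
```
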
The consensus value for bandwidth is the median of all bandwidth
numbers given in votes. In case of an even number of votes, we use
the lower median. (Using this procedure allows us to change the
cap value more easily.)

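The lower-median rule fits in a few lines. This is a sketch with a hypothetical function name, not code from any authority implementation.

```python
def consensus_bandwidth(votes):
    """Median of the authorities' Bandwidth values; lower median if even."""
    if not votes:
        raise ValueError("no votes")
    ordered = sorted(votes)
    # Index (n - 1) // 2 is the middle element for odd n and the lower
    # of the two middle elements for even n.
    return ordered[(len(ordered) - 1) // 2]


print(consensus_bandwidth([300, 100, 200]))       # 200
print(consensus_bandwidth([400, 100, 300, 200]))  # 200 (lower median)
```

For a router whose claimed bandwidth exceeds every authority's cap, the consensus value only moves to a new cap once a strict majority of voting authorities use it, so authorities can switch caps one at a time without individual votes skewing the result.
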
Clients should believe the bandwidth as presented in the consensus,
not capping it again.

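A sketch of how a client might use the consensus values directly: parse each "w" line and pick routers with probability proportional to the bandwidth. This simplification ignores the separate guard, exit, and middle weightings described in 2.1, and the function names are hypothetical.

```python
import random


def parse_w_line(line):
    """Return the Bandwidth value from a line such as "w Bandwidth=193671"."""
    parts = line.split()
    if not parts or parts[0] != "w":
        raise ValueError(f"not a w line: {line!r}")
    for item in parts[1:]:
        key, _, value = item.partition("=")
        if key == "Bandwidth":
            return int(value)
    raise ValueError(f"no Bandwidth in {line!r}")


def choose_router(bandwidths, rng=random):
    """Pick a router name with probability proportional to its bandwidth.

    bandwidths maps router names to consensus values, used as-is: the
    client does not apply the cap a second time.
    """
    names = list(bandwidths)
    weights = [bandwidths[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


consensus = {
    "relayA": parse_w_line("w Bandwidth=193671"),
    "relayB": parse_w_line("w Bandwidth=10000000"),
}
print(choose_router(consensus, random.Random(1)))
```
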
3.2 Fetching descriptors on demand

As described in 2.4, a descriptor lists the IP address, OR- and
Dir-Port, and the onion key for a server.

A client already knows the IP address and the ports from the consensus
document, but without the onion key it will not be able to send
CREATE/EXTEND cells for that server. Since the client needs the onion
key, it needs the descriptor.

If a client only downloaded a few descriptors in an observable manner,
that would leak which nodes it was going to use.

This proposal suggests the following:

1) When connecting to a guard node for which the client does not
   yet have a cached descriptor, it requests the descriptor it
   expects by hash. (The consensus document that the client holds
   has a hash for the descriptor of this server. We want exactly
   that descriptor, not a different one.)

   It does that by sending a RELAY_REQUEST_SD cell.

   A client MAY cache the descriptor of the guard node so that it does
   not need to request it every single time it contacts the guard.

2) When a client wants to extend a circuit that currently ends in
   server B to a new next server C, the client will send a
   RELAY_REQUEST_SD cell to server B. This cell contains in its
   payload the hash of a server descriptor the client would like
   to obtain (C's server descriptor). The server sends back the
   descriptor, and the client can now form a valid EXTEND/CREATE cell
   encrypted to C's onion key.

   Clients MUST NOT cache such descriptors. If they did, they might
   leak that they had already extended to that server at least once
   before.

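The caching rules for the two cases can be summarized in a small Python class. It is a hypothetical sketch: `fetch` stands in for sending a RELAY_REQUEST_SD cell and reading the reply, and descriptors are keyed by the descriptor digest taken from the consensus.

```python
class DescriptorCache:
    """Hypothetical client-side descriptor store following the rules in 3.2.

    Entries are keyed by the descriptor digest listed in the consensus, so a
    cached entry is exactly the descriptor the consensus refers to.
    """

    def __init__(self, fetch):
        # fetch(sd_digest) -> descriptor bytes; a stand-in for sending a
        # RELAY_REQUEST_SD cell and reading the (padded) reply.
        self._fetch = fetch
        self._guards = {}

    def for_guard(self, sd_digest):
        # Case 1: a guard's descriptor MAY be cached between connections.
        if sd_digest not in self._guards:
            self._guards[sd_digest] = self._fetch(sd_digest)
        return self._guards[sd_digest]

    def for_extend(self, sd_digest):
        # Case 2: always requested through the circuit and never stored, since
        # a cached copy would reveal an earlier extend to that server.
        return self._fetch(sd_digest)
```

Keeping the extend path stateless makes the "MUST NOT cache" rule hold by construction rather than by a flag that could be set wrongly.
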
Replies to RELAY_REQUEST_SD requests need to be padded to some
constant upper limit in order to conceal a client's destination
from anybody who might be counting cells/bytes.

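A sketch of padding a reply to a fixed number of cells. It assumes a relay cell payload of 498 bytes (a 509-byte cell body minus the 11-byte relay header) and pads with zero bytes; how the real descriptor length would be conveyed is left open, as in the proposal. The function name is hypothetical.

```python
# Assumption: 498 bytes of payload per relay cell (509-byte cell body
# minus the 11-byte relay header).
RELAY_PAYLOAD_SIZE = 498


def pad_sd_reply(descriptor, padding_factor):
    """Split a descriptor into exactly padding_factor relay payloads.

    Every reply then has the same number of cells and bytes no matter which
    descriptor was requested. Trailing zero bytes are padding; a real
    protocol would also need to convey the descriptor's length.
    """
    limit = padding_factor * RELAY_PAYLOAD_SIZE
    if len(descriptor) > limit:
        raise ValueError(f"descriptor of {len(descriptor)} bytes exceeds {limit}")
    padded = descriptor + b"\x00" * (limit - len(descriptor))
    return [padded[i:i + RELAY_PAYLOAD_SIZE]
            for i in range(0, limit, RELAY_PAYLOAD_SIZE)]


cells = pad_sd_reply(b"router example 192.0.2.1 9001 0 0\n", 4)
print(len(cells), sum(len(c) for c in cells))  # 4 1992
```
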
RELAY_REQUEST_SD cells contain the following information:
- hash of the server descriptor requested
- hash of the identity digest of the server for which we want the SD
- IP address and OR-port of the server for which we want the SD
- padding factor: the number of cells we want the answer
  padded to.

[XXX this just occurred to me and it might be smart. or it might
be stupid. clients would learn the padding factor they want
to use from the consensus document. This allows us to grow
the replies later on should SDs become larger.]

[XXX: figure out a decent padding size]

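The proposal lists the fields of a RELAY_REQUEST_SD cell but not their encoding. The following Python sketch picks one hypothetical layout (two 20-byte SHA-1 digests, a 4-byte IPv4 address, a 16-bit OR port, and a one-byte padding factor, in network byte order) purely to make the field list concrete.

```python
import ipaddress
import struct

# Hypothetical wire layout, not specified by the proposal: descriptor
# digest, identity field, IPv4 address, OR port, padding factor.
REQUEST_SD_FORMAT = "!20s20s4sHB"  # 47 bytes, big-endian, no alignment


def encode_request_sd(sd_digest, identity_digest, address, or_port, padding_factor):
    if len(sd_digest) != 20 or len(identity_digest) != 20:
        raise ValueError("digests must be 20 bytes (SHA-1)")
    packed_ip = ipaddress.IPv4Address(address).packed
    return struct.pack(REQUEST_SD_FORMAT, sd_digest, identity_digest,
                       packed_ip, or_port, padding_factor)


def decode_request_sd(payload):
    sd_digest, identity_digest, packed_ip, or_port, padding_factor = \
        struct.unpack(REQUEST_SD_FORMAT, payload)
    return (sd_digest, identity_digest,
            str(ipaddress.IPv4Address(packed_ip)), or_port, padding_factor)
```

Carrying the address and OR port in the request is what would let the relay contact the target server without consulting its own descriptor cache.
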
3.3 Protocol versions

Server descriptors contain optional information about supported
link-level and circuit-level protocols in the form of
"opt protocols Link 1 2 Circuit 1". These are not currently needed
and will probably eventually move into the "v" (version) line in
the consensus. This proposal does not deal with them.

Similarly, a server descriptor contains the version number of
a Tor node. This information is already present in the consensus
and is thus available to all clients immediately.

3.4 Exit selection

Currently, finding an appropriate exit node for a user's request is
easy for a client because it has complete knowledge of the exit
policies of all servers on the network.

The consensus document will once again be extended to contain the
information required by clients. This information will be a summary
of each node's exit policy. The exit policy summary will only contain
the list of ports to which a node exits for most destination IP
addresses.

A summary should claim a router exits to a specific TCP port if,
ignoring private IP addresses (link and site local per RFC 3330), the
exit policy indicates that the router would exit to this port to any
IP address with the exception of at most 2^25 single addresses (that's
either two /8 netblocks, or one /8 and sixteen /12s, or any other
combination adding up to 2^25).

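The 2^25 threshold can be checked with simple prefix arithmetic: a /n netblock covers 2^(32-n) addresses, so a /8 is 2^24 and a /12 is 2^20. The sketch below, with hypothetical names, assumes the rejected netblocks do not overlap and already excludes the private ranges.

```python
# Threshold from the text: at most 2^25 excluded single addresses.
MAX_EXCLUDED = 2 ** 25


def prefix_size(prefix_len):
    """Number of IPv4 addresses in a /prefix_len netblock."""
    return 2 ** (32 - prefix_len)


def within_threshold(rejected_prefix_lens):
    """True if rejecting these non-overlapping, non-private netblocks still
    lets the summary claim the port as accepted."""
    return sum(prefix_size(n) for n in rejected_prefix_lens) <= MAX_EXCLUDED


print(within_threshold([8, 8]))           # True: 2 * 2^24 = 2^25
print(within_threshold([8] + [12] * 16))  # True: 2^24 + 16 * 2^20 = 2^25
print(within_threshold([8] + [12] * 17))  # False: one /12 too many
```
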
An exit policy summary will be included in votes and consensus as a
new line attached to each exit node. A lack of policy should indicate
a non-exit policy. The line will have the format

"p" <space> "accept"|"reject" <space> <portlist>

where portlist is a comma-separated list of single port numbers or
port ranges (e.g. "22,80-88,1024-6000,6667"). Whether the summary
shows the list of accepted ports or the list of rejected ports depends
on which list is shorter (has a shorter string representation). In
case of ties we choose the list of accepted ports.

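A minimal Python sketch of building the "p" line from a set of accepted ports. It assumes the port space is 1-65535 and that a node accepting every port is summarized as "accept 1-65535"; both the function names and that edge case are assumptions, not taken from the proposal.

```python
ALL_PORTS = range(1, 65536)  # assumption: port 0 is not summarized


def ranges_to_string(ports):
    """Render a set of ports as e.g. "22,80-88,1024-6000,6667"."""
    ordered = sorted(ports)
    parts = []
    i = 0
    while i < len(ordered):
        start = end = ordered[i]
        # Extend the run while the next port is consecutive.
        while i + 1 < len(ordered) and ordered[i + 1] == end + 1:
            i += 1
            end = ordered[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(parts)


def p_line(accepted_ports):
    """Build the "p" line for a node, or None for a non-exit (no line)."""
    accepted = set(accepted_ports)
    if not accepted:
        return None
    accept_str = ranges_to_string(accepted)
    reject_str = ranges_to_string(set(ALL_PORTS) - accepted)
    # Use the shorter string; ties (and an empty reject list) use accept.
    if reject_str and len(reject_str) < len(accept_str):
        return f"p reject {reject_str}"
    return f"p accept {accept_str}"


web = {22} | set(range(80, 89)) | set(range(1024, 6001)) | {6667}
print(p_line(web))                           # p accept 22,80-88,1024-6000,6667
print(p_line(set(ALL_PORTS) - {25, 119}))    # p reject 25,119
```
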
As with the IP address, ports, and timestamp, the consensus should
list the exit policy of the descriptor whose digest is referenced in
the consensus document (see dir-spec section 3.4).

3.4.1 Client behaviour

When choosing an exit node for a specific request, a Tor client will
choose from the list of nodes that exit to the requested port as given
by the consensus document. If a client has additional knowledge (like
cached full descriptors) indicating that the exit node so chosen will
reject the request, then it MAY use that knowledge (or not include such
nodes in the selection to begin with). However, clients MUST NOT use
nodes that do not list the port as accepted in the summary, even if
they know from other sources, like a cached descriptor, that the node
would exit to that address.

An exception to this is exit enclave behaviour: a client MAY use the
node at a specific IP address to exit to any port on the same address
even if that node is not listed as exiting to the port in the summary.

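On the client side, the rules above amount to filtering candidates by their summary, with the exit-enclave exception. This hypothetical Python sketch treats a missing "p" line as a non-exit and ignores cached descriptors entirely, which is allowed: clients MAY use such knowledge to exclude nodes but MUST NOT use it to include them.

```python
def parse_p_line(line):
    """Parse "p accept 22,80-88" into (is_accept, [(lo, hi), ...])."""
    keyword, policy, portlist = line.split()
    if keyword != "p" or policy not in ("accept", "reject"):
        raise ValueError(f"bad p line: {line!r}")
    ranges = []
    for item in portlist.split(","):
        lo, _, hi = item.partition("-")
        ranges.append((int(lo), int(hi or lo)))
    return policy == "accept", ranges


def summary_allows(p, port):
    if p is None:
        return False  # no "p" line: non-exit
    is_accept, ranges = parse_p_line(p)
    listed = any(lo <= port <= hi for lo, hi in ranges)
    return listed if is_accept else not listed


def exit_candidates(routers, port, target_ip=None):
    """routers: list of (name, ip, p_line or None).

    A node qualifies if its summary allows the port, or (exit enclave)
    if it sits at the very address the client wants to reach.
    """
    return [name for name, ip, p in routers
            if summary_allows(p, port) or ip == target_ip]


routers = [("A", "192.0.2.1", "p accept 80,443"),
           ("B", "192.0.2.2", "p reject 25,119"),
           ("C", "192.0.2.3", None)]
print(exit_candidates(routers, 80))  # ['A', 'B']
```
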
4. Migration

4.1 Consensus document changes

The consensus will need to include
- bandwidth information (see 3.1)
- exit policy summaries (see 3.4)

A new consensus method (number TBD) will be chosen for this.

5. Future possibilities

This proposal still requires that all servers have the descriptors of
every other node in the network in order to answer RELAY_REQUEST_SD
cells. These cells are sent when a circuit is extended from ending at
node B to a new node C. In that case B would have to answer a
RELAY_REQUEST_SD cell that asks for C's server descriptor (by SD digest).
In order to answer that request, B obviously needs a copy of C's server
descriptor. The RELAY_REQUEST_SD cell already has all the information
that B needs to contact C, so B could ask C for the descriptor before
passing it back to the client.