  1. Filename: 141-jit-sd-downloads.txt
  2. Title: Download server descriptors on demand
  3. Version: $Revision$
  4. Last-Modified: $Date$
  5. Author: Peter Palfrader
  6. Created: 15-Jun-2008
  7. Status: Draft
  8. 1. Overview
  9. Downloading all server descriptors is the most expensive part
  10. of bootstrapping a Tor client. These server descriptors currently
  11. amount to about 1.5 Megabytes of data, and this size will grow
  12. linearly with network size.
  13. Fetching all these server descriptors takes a long while for people
  14. behind slow network connections. It is also a considerable load on
  15. our network of directory mirrors.
  16. This document describes proposed changes to the Tor network and
  17. directory protocol so that clients will no longer need to download
  18. all server descriptors.
  19. These changes consist of moving load balancing information into
  20. network status documents, implementing a means to download server
  21. descriptors on demand in an anonymity-preserving way, and dealing
  22. with exit node selection.
  23. 2. What is in a server descriptor
  24. When a Tor client starts, the first thing it will try to get is a
  25. current network status document, a consensus signed by a majority
  26. of directory authorities. This document is currently about 100
  27. Kilobytes in size, though it will grow linearly with network size.
  28. This document lists all servers currently running on the network.
  29. The Tor client will then try to get a server descriptor for each
  30. of the running servers. All server descriptors currently amount
  31. to about 1.5 Megabytes of downloads.
  32. A Tor client learns several things about a server from its descriptor.
  33. Some of these it already learned from the network status document
  34. published by the authorities, but the server descriptor contains it
  35. again in a single statement signed by the server itself, not just by
  36. the directory authorities.
  37. Tor clients use the information from server descriptors for
  38. different purposes, which are considered in the following sections.
  39. #three ways: One, to determine if a server will be able to handle
  40. #this client's request; two, to actually communicate or use the server;
  41. #three, for load balancing decisions.
  42. #
  43. #These three points are considered in the following subsections.
  44. 2.1 Load balancing
  45. The Tor load balancing mechanism is quite complex in its details, but
  46. it has a simple goal: the more traffic a server can handle, the more
  47. traffic it should get. That is, the more traffic a server can
  48. handle, the more likely a client is to use it.
  49. For this purpose each server descriptor has bandwidth information
  50. which tries to convey a server's capacity to clients.
  51. Currently we weight servers differently for different purposes. There
  52. is one weight for when we use a server as a guard node (our entry to the
  53. Tor network), one weight we assign servers for exit duties,
  54. and a third for when we need intermediate (middle) nodes.
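The per-position weighting described above can be sketched as a weighted random choice. This is only an illustration: the dictionary layout, the position names and the function name pick_server are assumptions made for the example, not part of Tor's actual path selection code.

```python
import random


def pick_server(servers, position, rng=random):
    """Pick one server with probability proportional to its weight.

    `servers` is a list of dicts with a hypothetical "weights" mapping
    from position name ("guard", "middle", "exit") to a non-negative
    number. Servers with zero weight for the position are never chosen.
    """
    candidates = [s for s in servers if s["weights"].get(position, 0.0) > 0.0]
    if not candidates:
        raise ValueError(f"no server has a positive weight for {position!r}")
    weights = [s["weights"][position] for s in candidates]
    # random.choices draws with replacement, proportionally to `weights`.
    return rng.choices(candidates, weights=weights, k=1)[0]
```

A server that can handle three times the traffic of another would, under this sketch, simply carry three times the weight for the position in question.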
  55. 2.2 Exit information
  56. When a Tor client wants to exit to some resource on the internet it will
  57. build a circuit to an exit node that allows access to that resource's
  58. IP address and TCP Port.
  59. When building that circuit the client can make sure that the circuit
  60. ends at a server that will be able to fulfill the request because the
  61. client already learned of all the servers' exit policies from their
  62. descriptors.
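As a rough illustration of how a client with every exit policy at hand can pick a suitable exit, the sketch below evaluates accept/reject rules in order and returns the first match. It handles IPv4 patterns only. The helper names are hypothetical, and the result when no rule matches is left as a parameter because it is an assumption of this sketch.

```python
import ipaddress


def parse_rule(line):
    """Parse e.g. 'accept 10.0.0.0/8:80-443' or 'reject *:*'."""
    action, pattern = line.split()
    addr, _, ports = pattern.rpartition(":")
    network = None if addr == "*" else ipaddress.ip_network(addr, strict=False)
    if ports == "*":
        low, high = 1, 65535
    elif "-" in ports:
        low, high = (int(p) for p in ports.split("-"))
    else:
        low = high = int(ports)
    return action == "accept", network, low, high


def policy_allows(rules, ip, port, default=False):
    """Return the verdict of the first rule matching ip:port."""
    target = ipaddress.ip_address(ip)
    for accept, network, low, high in rules:
        if (network is None or target in network) and low <= port <= high:
            return accept
    return default  # assumption: caller decides what "no match" means


def usable_exits(policies, ip, port):
    """Names of all servers whose policy permits exiting to ip:port."""
    return [name for name, rules in policies.items()
            if policy_allows(rules, ip, port)]
```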
  63. 2.3 Capability information
  64. Server descriptors contain information about the specific versions of
  65. the Tor protocol they understand [proposal 105].
  66. Furthermore the server descriptor also contains the exact version of
  67. the Tor software that the server is running and some decisions are
  68. made based on the server version number (for instance a Tor client
  69. will only make conditional consensus requests [proposal from 13 Apr
  70. 2008 that never got a number] when talking to Tor servers version
  71. 0.2.1.1-alpha or later).
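A version-based decision like the one above might look roughly as follows. The sketch compares only the numeric components and deliberately ignores status tags such as "-alpha", which is a simplification. The function names are made up for the example.

```python
import re


def version_tuple(version):
    """Turn '0.2.1.1-alpha' into (0, 2, 1, 1); a missing 4th part is 0."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(g) if g is not None else 0 for g in match.groups())


def supports_conditional_consensus(server_version):
    """True for servers running 0.2.1.1-alpha or later (numeric parts only)."""
    return version_tuple(server_version) >= (0, 2, 1, 1)
```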
  72. 2.4 Contact/key information
  73. A server descriptor lists a server's IP address and TCP ports on which
  74. it accepts onion and directory connections. Furthermore it contains
  75. the onion key, a short lived RSA key to which clients encrypt CREATE
  76. cells.
  77. 2.5 Identity information
  78. A Tor client learns the digest of a server's key from the network
  79. status document. The server descriptor, once obtained,
  80. contains the full RSA identity key of the server. Clients verify
  81. that 1) the digest of the identity key matches the expected digest
  82. it got from the consensus, and 2) that the signature on the descriptor
  83. from that key is valid.
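The two checks can be sketched as below. The sketch assumes the consensus digest is a SHA-1 hash over the encoded identity key, and it delegates the RSA signature check to a caller-supplied function (signature_ok, a hypothetical parameter) rather than pulling in a crypto library.

```python
import hashlib
import hmac


def identity_matches(identity_key_bytes, expected_digest):
    """Check 1: the key's digest equals the digest from the consensus."""
    actual = hashlib.sha1(identity_key_bytes).digest()
    return hmac.compare_digest(actual, expected_digest)


def accept_descriptor(identity_key_bytes, expected_digest, signature_ok):
    """Run check 1, then check 2 via the hypothetical `signature_ok` callable.

    `signature_ok(identity_key_bytes)` must return True only if the
    descriptor's signature verifies under that key.
    """
    if not identity_matches(identity_key_bytes, expected_digest):
        return False
    return bool(signature_ok(identity_key_bytes))
```

Doing the digest check first means a descriptor carrying the wrong key is rejected before any signature work is done.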
  84. 3. Doing away with the need for all SDs
  85. 3.1 Load balancing info in consensus documents
  86. One of the reasons why clients download all server descriptors is for
  87. proper load balancing as described in 2.1. In order for
  88. clients to not require all server descriptors this information will
  89. have to move into the network status document.
  90. [XXX Two open questions here:
  91. a) how do we arrive at a consensus weight?
  92. b) how to represent weights in the consensus?
  93. Maybe "s Guard=0.13 Exit=0.02 Middle=0.00 Stable.."
  94. ]
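If weights were carried in the "s" line as floated in question b), a client-side parser could look like this. The line format is only the tentative one quoted above and is not a settled syntax; the function name is likewise an assumption.

```python
def parse_status_line(line):
    """Split a tentative 's' line into plain flags and weighted flags.

    Tokens of the form Name=value are treated as weights, everything
    else as an ordinary flag.
    """
    keyword, *tokens = line.split()
    if keyword != "s":
        raise ValueError("not an 's' line")
    flags, weights = set(), {}
    for token in tokens:
        name, sep, value = token.partition("=")
        if sep:
            weights[name] = float(value)
        else:
            flags.add(name)
    return flags, weights
```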
  95. 3.2 Fetching descriptors on demand
  96. As described in 2.4 a descriptor lists IP address, OR- and Dir-Port,
  97. and the onion key for a server.
  98. A client already knows the IP address and the ports from the consensus
  99. documents, but without the onion key it will not be able to send
  100. CREATE/EXTEND cells for that server. Since the client needs the onion
  101. key it needs the descriptor.
  102. If a client only downloaded a few descriptors in an observable manner
  103. then that would leak which nodes it was going to use.
  104. This proposal suggests the following:
  105. 1) when connecting to a guard node for which the client does not
  106. yet have a cached descriptor it requests the descriptor it
  107. expects by hash. (The consensus document that the client holds
  108. has a hash for the descriptor of this server. We want exactly
  109. that descriptor, not a different one.)
  110. [XXX: How? We could either come up with a new cell type,
  111. RELAY_REQUEST_SD that takes only a hash (of the SD), or use
  112. RELAY_BEGIN_DIR. The former is probably smarter since we will
  113. want to use it later on as well, and there we will require
  114. padding.]
  115. A client MAY cache the descriptor of the guard node so that it does
  116. not need to request it every single time it contacts the guard.
  117. 2) when a client wants to extend a circuit that currently ends in
  118. server B to a new next server C, the client will send a
  119. RELAY_REQUEST_SD cell to server B. This cell contains in its
  120. payload the hash of a server descriptor the client would like
  121. to obtain (C's server descriptor). The server sends back the
  122. descriptor and the client can now form a valid EXTEND/CREATE cell
  123. encrypted to C's onion key.
  124. Clients MUST NOT cache such descriptors. If they did they might
  125. leak that they already extended to that server at least once
  126. before.
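The two caching rules above (MAY cache for guards, MUST NOT cache for extensions) can be sketched as a small client-side store. Everything here is hypothetical: the class, the fetch callable standing in for a RELAY_REQUEST_SD round trip, and the assumption that the hash in the consensus is a SHA-1 over the descriptor bytes.

```python
import hashlib


def sd_digest(descriptor):
    """Assumed descriptor hash: SHA-1 over the raw descriptor bytes."""
    return hashlib.sha1(descriptor).digest()


class DescriptorStore:
    def __init__(self, fetch):
        self._fetch = fetch        # hypothetical: requests an SD by hash
        self._guard_cache = {}     # only guard descriptors live here

    def _fetch_checked(self, wanted):
        descriptor = self._fetch(wanted)
        # We want exactly the descriptor named in the consensus.
        if sd_digest(descriptor) != wanted:
            raise ValueError("descriptor does not match the requested hash")
        return descriptor

    def for_guard(self, wanted):
        """Guard descriptors MAY be cached across connections."""
        if wanted not in self._guard_cache:
            self._guard_cache[wanted] = self._fetch_checked(wanted)
        return self._guard_cache[wanted]

    def for_extend(self, wanted):
        """Descriptors for circuit extension are fetched every time."""
        return self._fetch_checked(wanted)
```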
  127. Replies to RELAY_REQUEST_SD requests need to be padded to some
  128. constant upper limit in order to conceal a client's destination
  129. from anybody who might be counting cells/bytes.
  130. [XXX: detailed spec of RELAY_REQUEST_SD cell and its reply]
  131. [XXX: figure out a decent padding size]
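Pending the detailed spec, constant-size replies could be framed along these lines. The 4096-byte size, the two-byte length prefix and the zero padding are all placeholders chosen for the example, since the proposal leaves both the cell format and the padding size open.

```python
import struct

PADDED_REPLY_SIZE = 4096  # hypothetical; the real limit is still undecided


def pad_reply(descriptor, size=PADDED_REPLY_SIZE):
    """Prefix the descriptor with its length and pad to exactly `size` bytes."""
    if len(descriptor) + 2 > size:
        raise ValueError("descriptor does not fit in the padded reply")
    body = struct.pack("!H", len(descriptor)) + descriptor
    return body.ljust(size, b"\x00")


def unpad_reply(reply):
    """Recover the descriptor from a padded reply."""
    (length,) = struct.unpack_from("!H", reply)
    return reply[2:2 + length]
```

With this framing every reply occupies the same number of bytes on the wire, so an observer counting cells or bytes cannot tell a small descriptor from a large one.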
  132. 3.3 Protocol versions
  133. [XXX: find out where we need "opt protocols Link 1 2 Circuit 1"
  134. information described in 2.3 above. If we need it, it might have
  135. to go into the consensus document.]
  136. [XXX: Similarly find out where we need the version number of a
  137. remote tor server. This information is in the consensus, but
  138. maybe we use it in some place where having it signed by the
  139. server in question is really important?]
  140. 3.4 Exit selection
  141. Currently finding an appropriate exit node for a user's request is
  142. easy for a client because it has complete knowledge of all the exit
  143. policies of all servers on the network.
  144. [XXX: I have no finished ideas here yet.
  145. - if clients only rely on the current exit flag they will
  146. a) never use servers for exit purposes that don't have it,
  147. b) will have a hard time finding a suitable exit node for
  148. their weird port that only a few servers allow.
  149. - the authorities could create a new summary document that
  150. lists all the exit policies and their nodes (by fingerprint).
  151. I need to find out how large that document would be.
  152. - can we make the "Exit" flag more useful? can we come
  153. up with some "standard policies" and have operators pick
  154. one of the standards?
  155. ]
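To get a feel for how large a summary document of exit policies might be, one could group servers by identical policy and measure a rendering of the result. The input mapping, the rendering format and the function names below are invented for this estimate and do not describe any existing document type.

```python
from collections import defaultdict


def summarize_policies(policies):
    """Group fingerprints by identical exit policy string.

    `policies` maps fingerprint -> exit policy text (hypothetical input).
    """
    groups = defaultdict(list)
    for fingerprint, policy in policies.items():
        groups[policy].append(fingerprint)
    return {policy: sorted(fps) for policy, fps in groups.items()}


def render_summary(summary):
    """Render one policy per block, followed by the nodes that use it."""
    lines = []
    for policy in sorted(summary):
        lines.append(f"policy {policy}")
        lines.extend(f"  {fp}" for fp in summary[policy])
    return "\n".join(lines) + "\n"
```

The length of render_summary's output over real data would give a first estimate of the document size; the more servers share a policy, the smaller it gets.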
  156. 4. Future possibilities
  157. This proposal still requires that all servers have the descriptors of
  158. every other node in the network in order to answer RELAY_REQUEST_SD
  159. cells. These cells are sent when a circuit is extended from ending at
  160. node B to a new node C. In that case B would have to answer a
  161. RELAY_REQUEST_SD cell that asks for C's server descriptor (by SD digest).
  162. In order to answer that request B obviously needs a copy of C's server
  163. descriptor. In the future we might amend RELAY_REQUEST_SD cells to
  164. also contain the expected IP address and OR-port of the server C (the
  165. client learns them from the network status document), so that B no
  166. longer needs to know all the descriptors of the entire network but
  167. instead can simply go and ask C for its descriptor before passing it
  168. back to the client.