| Filename: 141-jit-sd-downloads.txt
Title: Download server descriptors on demand
Version: $Revision$
Last-Modified: $Date$
Author: Peter Palfrader
Created: 15-Jun-2008
Status: Draft

1. Overview

  Downloading all server descriptors is the most expensive part
  of bootstrapping a Tor client.  These server descriptors currently
  amount to about 1.5 Megabytes of data, and this size will grow
  linearly with network size.

  Fetching all these server descriptors takes a long while for people
  behind slow network connections.  It is also a considerable load on
  our network of directory mirrors.

  This document describes proposed changes to the Tor network and
  directory protocol so that clients will no longer need to download
  all server descriptors.

  These changes consist of moving load balancing information into
  network status documents, implementing a means to download server
  descriptors on demand in an anonymity-preserving way, and dealing
  with exit node selection.

2. What is in a server descriptor

  When a Tor client starts, the first thing it will try to get is a
  current network status document: a consensus signed by a majority
  of directory authorities.  This document is currently about 100
  Kilobytes in size, though it will grow linearly with network size.
  This document lists all servers currently running on the network.
  The Tor client will then try to get a server descriptor for each
  of the running servers.  All server descriptors currently amount
  to about 1.5 Megabytes of downloads.

  A Tor client learns several things about a server from its descriptor.
  Some of these it already learned from the network status document
  published by the authorities, but the server descriptor contains it
  again in a single statement signed by the server itself, not just by
  the directory authorities.

  Tor clients use the information from server descriptors for
  different purposes, which are considered in the following sections.

  #three ways:  One, to determine if a server will be able to handle
  #this client's request; two, to actually communicate or use the server;
  #three, for load balancing decisions.
  #
  #These three points are considered in the following subsections.

2.1 Load balancing

  The Tor load balancing mechanism is quite complex in its details, but
  it has a simple goal: The more traffic a server can handle the more
  traffic it should get.  That means the more traffic a server can
  handle the more likely a client will use it.

  For this purpose each server descriptor has bandwidth information
  which tries to convey a server's capacity to clients.

  Currently we weigh servers differently for different purposes.
  There is a weight for when we use a server as a guard node (our entry
  to the Tor network), there is one weight we assign servers for exit
  duties, and a third for when we need intermediate (middle) nodes.

2.2 Exit information

  When a Tor client wants to exit to some resource on the internet it
  will build a circuit to an exit node that allows access to that
  resource's IP address and TCP port.

  When building that circuit the client can make sure that the circuit
  ends at a server that will be able to fulfill the request because the
  client already learned of all the servers' exit policies from their
  descriptors.

2.3 Capability information

  Server descriptors contain information about the specific versions of
  the Tor protocol they understand [proposal 105].

  Furthermore the server descriptor also contains the exact version of
  the Tor software that the server is running and some decisions are
  made based on the server version number (for instance a Tor client
  will only make conditional consensus requests [proposal 139] when
  talking to Tor servers version 0.2.1.1-alpha or later).

2.4 Contact/key information

  A server descriptor lists a server's IP address and TCP ports on which
  it accepts onion and directory connections.  Furthermore it contains
  the onion key (a short lived RSA key to which clients encrypt CREATE
  cells).

2.5 Identity information

  A Tor client learns the digest of a server's key from the network
  status document.  Once it has a server descriptor this descriptor
  contains the full RSA identity key of the server.  Clients verify
  that 1) the digest of the identity key matches the expected digest
  it got from the consensus, and 2) that the signature on the descriptor
  from that key is valid.

3. No longer require clients to have copies of all SDs

3.1 Load balancing info in consensus documents

  One of the reasons why clients download all server descriptors is for
  doing proper load balancing as described in 2.1.
  In order for clients to not require all server descriptors this
  information will have to move into the network status document.

  Consensus documents will have a new line per router similar
  to the "r", "s", and "v" lines that already exist.  This line
  will convey weight information to clients.

   "w Bandwidth=193"

  The bandwidth number is the lesser of observed bandwidth and bandwidth
  rate limit from the server descriptor that the "r" line references by
  digest (1st and 3rd field of the bandwidth line in the descriptor).
  It is given in kilobytes per second so the byte value in the
  descriptor has to be divided by 1024 (and is then truncated, i.e.
  rounded down).

  Authorities will cap the bandwidth number at some arbitrary value,
  currently 10MB/sec.  If a router claims a larger bandwidth an
  authority's vote will still only show Bandwidth=10240.

  The consensus value for bandwidth is the median of all bandwidth
  numbers given in votes.  In case of an even number of votes we use
  the lower median.  (Using this procedure allows us to change the
  cap value more easily.)

  Clients should believe the bandwidth as presented in the consensus,
  not capping it again.

3.2 Fetching descriptors on demand

  As described in 2.4 a descriptor lists IP address, OR- and Dir-Port,
  and the onion key for a server.

  A client already knows the IP address and the ports from the consensus
  documents, but without the onion key it will not be able to send
  CREATE/EXTEND cells for that server.  Since the client needs the onion
  key it needs the descriptor.

  If a client only downloaded a few descriptors in an observable manner
  then that would leak which nodes it was going to use.

  This proposal suggests the following:

  1) when connecting to a guard node for which the client does not
     yet have a cached descriptor it requests the descriptor it
     expects by hash.  (The consensus document that the client holds
     has a hash for the descriptor of this server.
     We want exactly that descriptor, not a different one.)

     It does that by sending a RELAY_REQUEST_SD cell.

     A client MAY cache the descriptor of the guard node so that it does
     not need to request it every single time it contacts the guard.

  2) when a client wants to extend a circuit that currently ends in
     server B to a new next server C, the client will send a
     RELAY_REQUEST_SD cell to server B.  This cell contains in its
     payload the hash of a server descriptor the client would like
     to obtain (C's server descriptor).  The server sends back the
     descriptor and the client can now form a valid EXTEND/CREATE cell
     encrypted to C's onion key.

     Clients MUST NOT cache such descriptors.  If they did they might
     leak that they already extended to that server at least once
     before.

  Replies to RELAY_REQUEST_SD requests need to be padded to some
  constant upper limit in order to conceal a client's destination
  from anybody who might be counting cells/bytes.

  RELAY_REQUEST_SD cells contain the following information:
    - hash of the server descriptor requested
    - hash of the identity digest of the server for which we want the SD
    - IP address and OR-port of the server for which we want the SD
    - padding factor - the number of cells we want the answer
      padded to.
      [XXX this just occurred to me and it might be smart.  or it might
       be stupid.  clients would learn the padding factor they want
       to use from the consensus document.  This allows us to grow
       the replies later on should SDs become larger.]
  [XXX: figure out a decent padding size]

3.3 Protocol versions

  Server descriptors contain optional information of supported
  link-level and circuit-level protocols in the form of
  "opt protocols Link 1 2 Circuit 1".  These are not currently needed
  and will probably eventually move into the "v" (version) line in
  the consensus.  This proposal does not deal with them.

  Similarly a server descriptor contains the version number of
  a Tor node.  This information is already present in the consensus
  and is thus available to all clients immediately.

3.4 Exit selection

  Currently finding an appropriate exit node for a user's request is
  easy for a client because it has complete knowledge of all the exit
  policies of all servers on the network.

  The consensus document will once again be extended to contain the
  information required by clients.  This information will be a summary
  of each node's exit policy.  The exit policy summary will only contain
  the list of ports to which a node exits to most destination IP
  addresses.

  A summary should claim a router exits to a specific TCP port if,
  ignoring private IP addresses, the exit policy indicates that the
  router would exit to this port to most IP addresses (that is, to all
  but at most the equivalent of two /8 netblocks: either two /8
  netblocks, or one /8 and a couple of /12s, or any other combination).

  The exact algorithm used is this:  Going through all exit policy items
   - ignore any accept that is not for all IP addresses ("*"),
   - ignore rejects for these netblocks (exactly, no subnetting):
     0.0.0.0/8, 169.254.0.0/16, 127.0.0.0/8, 192.168.0.0/16, 10.0.0.0/8,
     and 172.16.0.0/12,
   - for each reject count the number of IP addresses rejected against
     the affected ports,
   - once we hit an accept for all IP addresses ("*") add the ports in
     that policy item to the list of accepted ports, if they don't have
     more than 2^25 IP addresses (that's two /8 networks) counted
     against them (i.e. if the router exits to a port to everywhere but
     at most two /8 networks).

  An exit policy summary will be included in votes and consensus as a
  new line attached to each exit node.  The line will have the format
   "p" <space> "accept"|"reject" <portlist>
  where portlist is a comma separated list of single port numbers or
  portranges (e.g.  "22,80-88,1024-6000,6667").
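
  The counting rule above can be sketched in a few lines of Python.
  This is an illustration only, not the authorities' implementation.
  The tuple representation of policy items and the names
  summarize_accepted_ports, parse_ports and format_portlist are
  hypothetical, and treating a reject for "*" as 2^32 rejected
  addresses is an assumption made for this sketch.

```python
import ipaddress

# Netblocks whose rejects are ignored (exact matches only, no subnetting).
PRIVATE_NETS = {
    "0.0.0.0/8", "169.254.0.0/16", "127.0.0.0/8",
    "192.168.0.0/16", "10.0.0.0/8", "172.16.0.0/12",
}
MAX_REJECTED = 2 ** 25  # two /8 networks


def parse_ports(spec):
    """Turn "*", "80" or "80-88" into an inclusive (low, high) pair."""
    if spec == "*":
        return 1, 65535
    low, _, high = spec.partition("-")
    return int(low), int(high or low)


def summarize_accepted_ports(policy):
    """Return the sorted ports that the summary would list as accepted.

    `policy` is a list of (action, address, ports) tuples in policy order,
    e.g. ("reject", "18.0.0.0/8", "25") or ("accept", "*", "80-88").
    This tuple layout is a hypothetical stand-in for real policy items.
    """
    rejected = [0] * 65536  # rejected[p]: addresses counted against port p
    accepted = set()
    for action, address, ports in policy:
        low, high = parse_ports(ports)
        if action == "accept":
            if address != "*":
                continue  # ignore accepts that are not for all addresses
            for port in range(low, high + 1):
                if rejected[port] <= MAX_REJECTED:
                    accepted.add(port)
        else:
            if address in PRIVATE_NETS:
                continue  # ignore rejects for the listed netblocks
            if address == "*":
                count = 2 ** 32  # assumption: "*" covers all IPv4 addresses
            else:
                network = ipaddress.ip_network(address, strict=False)
                count = network.num_addresses
            for port in range(low, high + 1):
                rejected[port] += count
    return sorted(accepted)


def format_portlist(ports):
    """Render sorted ports as a "22,80-88" style list, merging neighbours."""
    items = []
    start = prev = None
    for port in ports:
        if prev is not None and port == prev + 1:
            prev = port
            continue
        if start is not None:
            items.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = port
    if start is not None:
        items.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(items)
```

  With this sketch, a policy that rejects a single /8 for port 25 and
  then accepts ports 22, 25 and 80-88 for all addresses still lists port
  25 as accepted, because only 2^24 addresses are counted against it.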

  Whether the summary shows the list of accepted ports or the list of
  rejected ports depends on which list is shorter (has a shorter string
  representation).  In case of ties we choose the list of accepted
  ports.  As an exception to this rule an allow-all policy is
  represented as "accept 1-65535" instead of "reject " and a reject-all
  policy is similarly given as "reject 1-65535".

  Summary items are compressed, that is instead of "80-88,89-100" there
  only is a single item of "80-100", similarly instead of "20,21" a
  summary will say "20-21".

  Port lists are sorted in ascending order.

  The maximum allowed length of a policy summary (including the "accept "
  or "reject ") is 1000 characters.  If a summary exceeds that length we
  use an accept-style summary and list as much of the port list as is
  possible within these 1000 bytes.

3.4.1 Consensus selection

  When building a consensus, authorities have to agree on a digest of
  the server descriptor to list in the router line for each router.
  This is documented in dir-spec section 3.4.

  All authorities that listed that agreed upon descriptor digest in
  their vote should also list the same exit policy summary - or list
  none at all if the authority has not been upgraded to list that
  information in their vote.

  If we have votes with matching server descriptor digest of which at
  least one of them has an exit policy summary then we distinguish
  between two cases:
   a) all authorities agree (or abstained) on the policy summary, and we
      use the exit policy summary that they all listed in their vote,
   b) something went wrong (or some authority is playing foul) and we
      have different policy summaries.  In that case we pick the one
      that is most commonly listed in votes with the matching
      descriptor.  We break ties in favour of the lexicographically
      larger vote.
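
  Cases a) and b) can be sketched together, since a unanimous summary
  is simply the most common one.  This is a minimal illustration, not
  the actual consensus code.  The function name pick_policy_summary and
  the (digest, summary) pair layout are hypothetical, and comparing the
  summary strings themselves to break ties is an assumption about what
  "lexicographically larger" refers to.

```python
from collections import Counter


def pick_policy_summary(votes, agreed_digest):
    """Pick the exit policy summary for one router.

    `votes` is a list of (descriptor_digest, summary) pairs, one per
    authority; `summary` is None when that authority listed no summary.
    Returns None when no vote with the agreed digest carries a summary.
    """
    summaries = [summary for digest, summary in votes
                 if digest == agreed_digest and summary is not None]
    if not summaries:
        return None
    counts = Counter(summaries)
    # Highest count wins; ties go to the lexicographically larger string
    # (assumption: the comparison is on the summary text).
    return max(counts, key=lambda summary: (counts[summary], summary))
```

  Authorities that abstained (no summary) or voted for a different
  descriptor digest are filtered out before counting, so they cannot
  influence the choice in either case.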

  If none of the votes with a matching server descriptor digest has
  an exit policy summary we use the most commonly listed one in all
  votes, breaking ties like in case b above.

3.4.2 Client behaviour

  When choosing an exit node for a specific request a Tor client will
  choose from the list of nodes that exit to the requested port as given
  by the consensus document.  If a client has additional knowledge (like
  cached full descriptors) that indicates the so chosen exit node will
  reject the request then it MAY use that knowledge (or not include such
  nodes in the selection to begin with).  However, clients MUST NOT use
  nodes that do not list the port as accepted in the summary (but for
  which they know that the node would exit to that address from other
  sources, like a cached descriptor).

  An exception to this is exit enclave behaviour: A client MAY use the
  node at a specific IP address to exit to any port on the same address
  even if that node is not listed as exiting to the port in the summary.

4. Migration

4.1 Consensus document changes.

  The consensus will need to include
    - bandwidth information (see 3.1)
    - exit policy summaries (3.4)

  A new consensus method (number TBD) will be chosen for this.

5. Future possibilities

  This proposal still requires that all servers have the descriptors of
  every other node in the network in order to answer RELAY_REQUEST_SD
  cells.  These cells are sent when a circuit is extended from ending at
  node B to a new node C.  In that case B would have to answer a
  RELAY_REQUEST_SD cell that asks for C's server descriptor (by SD digest).

  In order to answer that request B obviously needs a copy of C's server
  descriptor.  The RELAY_REQUEST_SD cell already has all the info that
  B needs to contact C so it can ask about the descriptor before passing it
  back to the client.
 |