123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325 |
- Filename: 141-jit-sd-downloads.txt
- Title: Download server descriptors on demand
- Version: $Revision$
- Last-Modified: $Date$
- Author: Peter Palfrader
- Created: 15-Jun-2008
- Status: Draft
- 1. Overview
- Downloading all server descriptors is the most expensive part
- of bootstrapping a Tor client. These server descriptors currently
- amount to about 1.5 Megabytes of data, and this size will grow
- linearly with network size.
- Fetching all these server descriptors takes a long while for people
- behind slow network connections. It is also a considerable load on
- our network of directory mirrors.
- This document describes proposed changes to the Tor network and
- directory protocol so that clients will no longer need to download
- all server descriptors.
- These changes consist of moving load balancing information into
- network status documents, implementing a means to download server
- descriptors on demand in an anonymity-preserving way, and dealing
- with exit node selection.
- 2. What is in a server descriptor
- When a Tor client starts the first thing it will try to get is a
- current network status document: a consensus signed by a majority
- of directory authorities. This document is currently about 100
- Kilobytes in size, tho it will grow linearly with network size.
- This document lists all servers currently running on the network.
- The Tor client will then try to get a server descriptor for each
- of the running servers. All server descriptors currently amount
- to about 1.5 Megabytes of downloads.
- A Tor client learns several things about a server from its descriptor.
- Some of these it already learned from the network status document
- published by the authorities, but the server descriptor contains it
- again in a single statement signed by the server itself, not just by
- the directory authorities.
- Tor clients use the information from server descriptors for
- different purposes, which are considered in the following sections.
- #three ways: One, to determine if a server will be able to handle
- #this client's request; two, to actually communicate or use the server;
- #three, for load balancing decisions.
- #
- #These three points are considered in the following subsections.
- 2.1 Load balancing
- The Tor load balancing mechanism is quite complex in its details, but
- it has a simple goal: The more traffic a server can handle the more
- traffic it should get. That means the more traffic a server can
- handle the more likely a client will use it.
- For this purpose each server descriptor has bandwidth information
- which tries to convey a server's capacity to clients.
- Currently we weigh servers differently for different purposes. There
- is a weigh for when we use a server as a guard node (our entry to the
- Tor network), there is one weigh we assign servers for exit duties,
- and a third for when we need intermediate (middle) nodes.
- 2.2 Exit information
- When a Tor wants to exit to some resource on the internet it will
- build a circuit to an exit node that allows access to that resource's
- IP address and TCP Port.
- When building that circuit the client can make sure that the circuit
- ends at a server that will be able to fulfill the request because the
- client already learned of all the servers' exit policies from their
- descriptors.
- 2.3 Capability information
- Server descriptors contain information about the specific version or
- the Tor protocol they understand [proposal 105].
- Furthermore the server descriptor also contains the exact version of
- the Tor software that the server is running and some decisions are
- made based on the server version number (for instance a Tor client
- will only make conditional consensus requests [proposal 139] when
- talking to Tor servers version 0.2.1.1-alpha or later).
- 2.4 Contact/key information
- A server descriptor lists a server's IP address and TCP ports on which
- it accepts onion and directory connections. Furthermore it contains
- the onion key (a short lived RSA key to which clients encrypt CREATE
- cells).
- 2.5 Identity information
- A Tor client learns the digest of a server's key from the network
- status document. Once it has a server descriptor this descriptor
- contains the full RSA identity key of the server. Clients verify
- that 1) the digest of the identity key matches the expected digest
- it got from the consensus, and 2) that the signature on the descriptor
- from that key is valid.
- 3. No longer require clients to have copies of all SDs
- 3.1 Load balancing info in consensus documents
- One of the reasons why clients download all server descriptors is for
- doing load proper load balancing as described in 2.1. In order for
- clients to not require all server descriptors this information will
- have to move into the network status document.
- Consensus documents will have a new line per router similar
- to the "r", "s", and "v" lines that already exist. This line
- will convey weight information to clients.
- "w Bandwidth=193"
- The bandwidth number is the lesser of observed bandwidth and bandwidth
- rate limit from the server descriptor that the "r" line referenced by
- digest (1st and 3rd field of the bandwidth line in the descriptor).
- It is given in kilobytes per second so the byte value in the
- descriptor has to be divided by 1024 (and is then truncated, i.e.
- rounded down).
- Authorities will cap the bandwidth number at some arbitrary value,
- currently 10MB/sec. If a router claims a larger bandwidth an
- authority's vote will still only show Bandwidth=10240.
- The consensus value for bandwidth is the median of all bandwidth
- numbers given in votes. In case of an even number of votes we use
- the lower median. (Using this procedure allows us to change the
- cap value more easily.)
- Clients should believe the bandwidth as presented in the consensus,
- not capping it again.
- 3.2 Fetching descriptors on demand
- As described in 2.4 a descriptor lists IP address, OR- and Dir-Port,
- and the onion key for a server.
- A client already knows the IP address and the ports from the consensus
- documents, but without the onion key it will not be able to send
- CREATE/EXTEND cells for that server. Since the client needs the onion
- key it needs the descriptor.
- If a client only downloaded a few descriptors in an observable manner
- then that would leak which nodes it was going to use.
- This proposal suggests the following:
- 1) when connecting to a guard node for which the client does not
- yet have a cached descriptor it requests the descriptor it
- expects by hash. (The consensus document that the client holds
- has a hash for the descriptor of this server. We want exactly
- that descriptor, not a different one.)
- It does that by sending a RELAY_REQUEST_SD cell.
- A client MAY cache the descriptor of the guard node so that it does
- not need to request it every single time it contacts the guard.
- 2) when a client wants to extend a circuit that currently ends in
- server B to a new next server C, the client will send a
- RELAY_REQUEST_SD cell to server B. This cell contains in its
- payload the hash of a server descriptor the client would like
- to obtain (C's server descriptor). The server sends back the
- descriptor and the client can now form a valid EXTEND/CREATE cell
- encrypted to C's onion key.
- Clients MUST NOT cache such descriptors. If they did they might
- leak that they already extended to that server at least once
- before.
- Replies to RELAY_REQUEST_SD requests need to be padded to some
- constant upper limit in order to conceal a client's destination
- from anybody who might be counting cells/bytes.
- RELAY_REQUEST_SD cells contain the following information:
- - hash of the server descriptor requested
- - hash of the identity digest of the server for which we want the SD
- - IP address and OR-port or the server for which we want the SD
- - padding factor - the number of cells we want the answer
- padded to.
- [XXX this just occured to me and it might be smart. or it might
- be stupid. clients would learn the padding factor they want
- to use from the consensus document. This allows us to grow
- the replies later on should SDs become larger.]
- [XXX: figure out a decent padding size]
- 3.3 Protocol versions
- Server descriptors contain optional information of supported
- link-level and circuit-level protocols in the form of
- "opt protocols Link 1 2 Circuit 1". These are not currently needed
- and will probably eventually move into the "v" (version) line in
- the consensus. This proposal does not deal with them.
- Similarly a server descriptor contains the version number of
- a Tor node. This information is already present in the consensus
- and is thus available to all clients immediately.
- 3.4 Exit selection
- Currently finding an appropriate exit node for a user's request is
- easy for a client because it has complete knowledge of all the exit
- policies of all servers on the network.
- The consensus document will once again be extended to contain the
- information required by clients. This information will be a summary
- of each node's exit policy. The exit policy summary will only contain
- the list of ports to which a node exits to most destination IP
- addresses.
- A summary should claim a router exits to a specific TCP port if,
- ignoring private IP addresses, the exit policy indicates that the
- router would exit to this port to most IP address. either two /8
- netblocks, or one /8 and a couple of /12s or any other combination).
- The exact algorith used is this: Going through all exit policy items
- - ignore any accept that is not for all IP addresses ("*"),
- - ignore rejects for these netblocks (exactly, no subnetting):
- 0.0.0.0/8, 169.254.0.0/16, 127.0.0.0/8, 192.168.0.0/16, 10.0.0.0/8,
- and 172.16.0.0/12m
- - for each reject count the number of IP addresses rejected against
- the affected ports,
- - once we hit an accept for all IP addresses ("*") add the ports in
- that policy item to the list of accepted ports, if they don't have
- more than 2^25 IP addresses (that's two /8 networks) counted
- against them (i.e. if the router exits to a port to everywhere but
- at most two /8 networks).
- An exit policy summary will be included in votes and consensus as a
- new line attached to each exit node. The line will have the format
- "p" <space> "accept"|"reject" <portlist>
- where portlist is a comma seperated list of single port numbers or
- portranges (e.g. "22,80-88,1024-6000,6667").
- Whether the summary shows the list of accepted ports or the list of
- rejected ports depends on which list is shorter (has a shorter string
- representation). In case of ties we choose the list of accepted
- ports. As an exception to this rule an allow-all policy is
- represented as "accept 1-65535" instead of "reject " and a reject-all
- policy is similarly given as "reject 1-65535".
- Summary items are compressed, that is instead of "80-88,89-100" there
- only is a single item of "80-100", similarly instead of "20,21" a
- summary will say "20-21".
- Port lists are sorted in ascending order.
- The maximum allowed length of a policy summary (including the "accept "
- or "reject ") is 1000 characters. If a summary exceeds that length we
- use an accept-style summary and list as much of the port list as is
- possible within these 1000 bytes.
- 3.4.1 Consensus selection
- When building a consensus, authorities have to agree on a digest of
- the server descriptor to list in the router line for each router.
- This is documented in dir-spec section 3.4.
- All authorities that listed that agreed upon descriptor digest in
- their vote should also list the same exit policy summary - or list
- none at all if the authority has not been upgraded to list that
- information in their vote.
- If we have votes with matching server descriptor digest of which at
- least one of them has an exit policy then we differ between two cases:
- a) all authorities agree (or abstained) on the policy summary, and we
- use the exit policy summary that they all listed in their vote,
- b) something went wrong (or some authority is playing foul) and we
- have different policy summaries. In that case we pick the one
- that is most commonly listed in votes with the matching
- descriptor. We break ties in favour of the lexigraphically larger
- vote.
- If none one of the votes with a matching server descriptor digest has
- an exit policy summary we use the most commonly listed one in all
- votes, breaking ties like in case b above.
- 3.4.2 Client behaviour
- When choosing an exit node for a specific request a Tor client will
- choose from the list of nodes that exit to the requested port as given
- by the consensus document. If a client has additional knowledge (like
- cached full descriptors) that indicates the so chosen exit node will
- reject the request then it MAY use that knowledge (or not include such
- nodes in the selection to begin with). However, clients MUST NOT use
- nodes that do not list the port as accepted in the summary (but for
- which they know that the node would exit to that address from other
- sources, like a cached descriptor).
- An exception to this is exit enclave behaviour: A client MAY use the
- node at a specific IP address to exit to any port on the same address
- even if that node is not listed as exiting to the port in the summary.
- 4. Migration
- 4.1 Consensus document changes.
- The consensus will need to include
- - bandwidth information (see 3.1)
- - exit policy summaries (3.4)
- A new consensus method (number TBD) will be chosen for this.
- 5. Future possibilities
- This proposal still requires that all servers have the descriptors of
- every other node in the network in order to answer RELAY_REQUEST_SD
- cells. These cells are sent when a circuit is extended from ending at
- node B to a new node C. In that case B would have to answer a
- RELAY_REQUEST_SD cell that asks for C's server descriptor (by SD digest).
- In order to answer that request B obviously needs a copy of C's server
- descriptor. The RELAY_REQUEST_SD cell already has all the info that
- B needs to contact C so it can ask about the descriptor before passing it
- back to the client.
|