- Filename: 141-jit-sd-downloads.txt
- Title: Download server descriptors on demand
- Version: $Revision$
- Last-Modified: $Date$
- Author: Peter Palfrader
- Created: 15-Jun-2008
- Status: Draft
- 1. Overview
- Downloading all server descriptors is the most expensive part
- of bootstrapping a Tor client. These server descriptors currently
- amount to about 1.5 Megabytes of data, and this size will grow
- linearly with network size.
- Fetching all these server descriptors takes a long while for people
- behind slow network connections. It is also a considerable load on
- our network of directory mirrors.
- This document describes proposed changes to the Tor network and
- directory protocol so that clients will no longer need to download
- all server descriptors.
- These changes consist of moving load balancing information into
- network status documents, implementing a means to download server
- descriptors on demand in an anonymity-preserving way, and dealing
- with exit node selection.
- 2. What is in a server descriptor
- When a Tor client starts, the first thing it will try to get is a
- current network status document: a consensus signed by a majority
- of directory authorities. This document is currently about 100
- Kilobytes in size, though it will grow linearly with network size.
- This document lists all servers currently running on the network.
- The Tor client will then try to get a server descriptor for each
- of the running servers. All server descriptors currently amount
- to about 1.5 Megabytes of downloads.
- A Tor client learns several things about a server from its descriptor.
- Some of these it already learned from the network status document
- published by the authorities, but the server descriptor repeats them
- in a single statement signed by the server itself, not just by
- the directory authorities.
- Tor clients use the information from server descriptors for
- different purposes, which are considered in the following sections.
- #three ways: One, to determine if a server will be able to handle
- #this client's request; two, to actually communicate or use the server;
- #three, for load balancing decisions.
- #
- #These three points are considered in the following subsections.
- 2.1 Load balancing
- The Tor load balancing mechanism is quite complex in its details, but
- it has a simple goal: the more traffic a server can handle, the more
- traffic it should get. That means the more traffic a server can
- handle, the more likely a client is to use it.
- For this purpose each server descriptor has bandwidth information
- which tries to convey a server's capacity to clients.
- Currently we weigh servers differently for different purposes. There
- is a weight for when we use a server as a guard node (our entry to the
- Tor network), there is one weight we assign servers for exit duties,
- and a third for when we need intermediate (middle) nodes.
- 2.2 Exit information
- When a Tor client wants to exit to some resource on the internet it will
- build a circuit to an exit node that allows access to that resource's
- IP address and TCP port.
- When building that circuit the client can make sure that the circuit
- ends at a server that will be able to fulfill the request because the
- client already learned of all the servers' exit policies from their
- descriptors.
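
The check a client performs against a known exit policy can be sketched as
a first-match walk over the policy lines. This is a minimal illustration
only, assuming the usual "accept"/"reject" ADDR:PORT line shape and handling
just "*", a single address or CIDR network, and a single port or port range.
The function name exit_policy_allows and the default_accept parameter are
hypothetical and not part of Tor.

```python
import ipaddress

def exit_policy_allows(policy_lines, ip, port, default_accept=True):
    """Return True if the first rule matching ip:port is an accept rule.

    policy_lines: strings such as "accept 1.2.3.0/24:80" or "reject *:*".
    default_accept: assumption of this sketch for the case where no rule
    matches; policies normally end in a catch-all rule, so it rarely matters.
    """
    addr = ipaddress.ip_address(ip)
    for line in policy_lines:
        action, pattern = line.split()
        addr_part, port_part = pattern.rsplit(":", 1)
        # Address match: "*" matches everything, otherwise a host or CIDR net.
        if addr_part != "*":
            if addr not in ipaddress.ip_network(addr_part, strict=False):
                continue
        # Port match: "*", a single port, or an inclusive "low-high" range.
        if port_part != "*":
            low, _, high = port_part.partition("-")
            if not int(low) <= port <= int(high or low):
                continue
        return action == "accept"
    return default_accept
```

With the policy ["reject 10.0.0.0/8:*", "accept *:80", "reject *:*"], a
request for port 80 on a public address is allowed, while any request into
10.0.0.0/8 is refused by the first rule.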
- 2.3 Capability information
- Server descriptors contain information about the specific versions of
- the Tor protocol they understand [proposal 105].
- Furthermore the server descriptor also contains the exact version of
- the Tor software that the server is running and some decisions are
- made based on the server version number (for instance a Tor client
- will only make conditional consensus requests [proposal 139] when
- talking to Tor servers version 0.2.1.1-alpha or later).
- 2.4 Contact/key information
- A server descriptor lists a server's IP address and the TCP ports on which
- it accepts onion and directory connections. Furthermore it contains
- the onion key (a short-lived RSA key to which clients encrypt CREATE
- cells).
- 2.5 Identity information
- A Tor client learns the digest of a server's key from the network
- status document. The server descriptor, once the client has it,
- contains the full RSA identity key of the server. Clients verify
- that 1) the digest of the identity key matches the expected digest
- it got from the consensus, and 2) that the signature on the descriptor
- from that key is valid.
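
The two checks can be sketched as follows. This is an illustration under
stated assumptions, not Tor's implementation: the digest is taken to be
SHA-1 over the encoded key bytes and listed in the consensus as base64 with
the trailing "=" removed, and the RSA signature check is delegated to a
caller-supplied callable, verify_signature, which is hypothetical.

```python
import base64
import hashlib
import hmac

def identity_digest_matches(identity_key_bytes, consensus_digest_b64):
    """Check 1: digest of the identity key equals the digest from the consensus."""
    computed = hashlib.sha1(identity_key_bytes).digest()
    # Assumption: the consensus carries base64 without trailing "=" padding.
    padded = consensus_digest_b64 + "=" * (-len(consensus_digest_b64) % 4)
    expected = base64.b64decode(padded)
    return hmac.compare_digest(computed, expected)

def descriptor_is_acceptable(identity_key_bytes, consensus_digest_b64,
                             signed_body, signature, verify_signature):
    """Both checks must pass before the descriptor is used.

    verify_signature(key_bytes, signed_body, signature) is a hypothetical
    callable that performs the actual RSA verification (check 2).
    """
    if not identity_digest_matches(identity_key_bytes, consensus_digest_b64):
        return False
    return bool(verify_signature(identity_key_bytes, signed_body, signature))
```

Doing the digest comparison first means a descriptor carrying the wrong
identity key is rejected before any signature work is attempted.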
- 3. No longer require clients to have copies of all SDs
- 3.1 Load balancing info in consensus documents
- One of the reasons why clients download all server descriptors is to
- do proper load balancing as described in 2.1. In order for
- clients to not require all server descriptors this information will
- have to move into the network status document.
- Consensus documents will have a new line per router similar
- to the "r", "s", and "v" lines that already exist. This line
- will convey weight information to clients.
- "w Exit=41 Guard=94 Middle=543 ..."
- It starts with the letter w and then contains any number of Key=Value
- pairs. Values will be non-negative integers. Clients will pick
- routers with a probability proportional to the number for the intended
- purpose.
- Clients MUST accept sums of all weights for a given purpose over all
- routers in a consensus up to UINT64_MAX.
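
A minimal sketch of how a client might parse a "w" line and pick a router
with probability proportional to its weight. The function names, the
treatment of a missing key as weight 0, and the rng parameter are
assumptions made for illustration; the proposal does not specify them.

```python
import random

def parse_w_line(line):
    """Parse 'w Exit=41 Guard=94 Middle=543' into {'Exit': 41, ...}."""
    fields = line.split()
    if not fields or fields[0] != "w":
        raise ValueError("not a w line")
    weights = {}
    for pair in fields[1:]:
        key, sep, value = pair.partition("=")
        # Values must be non-negative integers written in ASCII digits.
        if not sep or not (value.isascii() and value.isdigit()):
            raise ValueError(f"bad Key=Value pair: {pair!r}")
        weights[key] = int(value)
    return weights

def pick_router(routers, purpose, rng=random):
    """Pick a router name with probability weight / sum(weights).

    routers: dict mapping router name -> parsed weights.
    Assumption: a router without an entry for `purpose` has weight 0.
    """
    total = sum(w.get(purpose, 0) for w in routers.values())
    if total == 0:
        raise ValueError("no router has weight for " + purpose)
    # Python integers are unbounded, so totals up to UINT64_MAX are safe;
    # a C implementation would need a 64-bit accumulator.
    target = rng.randrange(total)
    for name, w in routers.items():
        weight = w.get(purpose, 0)
        if target < weight:
            return name
        target -= weight
    raise AssertionError("unreachable: target is always below total")
```

The walk subtracts each router's weight from a uniformly drawn target, so a
router with weight 0 for the requested purpose can never be chosen.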
- [XXX how do we arrive at a consensus weight?
- option a) Perhaps the vote could contain the node's bandwidth, and
- this could be used to calculate the weights? It's
- necessary that the consensus remain a deterministic
- function of the votes.
- option b) Every voter assigns weights for each of the purposes
- (Exit, Guard, ..) so that their total sum is some constant
- X. When building a consensus we take the median for each
- purpose for each router.
- Option a has the disadvantage that if we want to tweak the weighting
- we have to make a new consensus-method]
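
Option b can be sketched for a single router as taking the per-purpose
median over all votes. Two details are assumptions of this sketch and are
not decided by the text above: voters that do not list a purpose are
ignored for that purpose, and with an even number of votes the lower of the
two middle values is used so that the result stays an integer and
deterministic.

```python
import statistics

def consensus_weights(votes):
    """Combine per-voter weights for ONE router into consensus weights.

    votes: list of dicts, one per voter, mapping purpose -> integer weight.
    Returns a dict mapping purpose -> median weight over the voters.
    """
    purposes = sorted(set().union(*votes))
    result = {}
    for purpose in purposes:
        values = [vote[purpose] for vote in votes if purpose in vote]
        # median_low never averages, so the output is always one of the votes.
        result[purpose] = statistics.median_low(values)
    return result
```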
- 3.2 Fetching descriptors on demand
- As described in 2.4 a descriptor lists IP address, OR- and Dir-Port,
- and the onion key for a server.
- A client already knows the IP address and the ports from the consensus
- documents, but without the onion key it will not be able to send
- CREATE/EXTEND cells for that server. Since the client needs the onion
- key it needs the descriptor.
- If a client only downloaded a few descriptors in an observable manner
- then that would leak which nodes it was going to use.
- This proposal suggests the following:
- 1) when connecting to a guard node for which the client does not
- yet have a cached descriptor it requests the descriptor it
- expects by hash. (The consensus document that the client holds
- has a hash for the descriptor of this server. We want exactly
- that descriptor, not a different one.)
- It does that by sending a RELAY_REQUEST_SD cell.
- A client MAY cache the descriptor of the guard node so that it does
- not need to request it every single time it contacts the guard.
- 2) when a client wants to extend a circuit that currently ends in
- server B to a new next server C, the client will send a
- RELAY_REQUEST_SD cell to server B. This cell contains in its
- payload the hash of a server descriptor the client would like
- to obtain (C's server descriptor). The server sends back the
- descriptor and the client can now form a valid EXTEND/CREATE cell
- encrypted to C's onion key.
- Clients MUST NOT cache such descriptors. If they did they might
- leak that they already extended to that server at least once
- before.
- Replies to RELAY_REQUEST_SD requests need to be padded to some
- constant upper limit in order to conceal a client's destination
- from anybody who might be counting cells/bytes.
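
One way to pad a reply to a constant number of cells is sketched here. The
framing is an assumption for illustration only: a 4-byte length prefix lets
the client strip the padding, zero bytes fill the remainder, and 498 bytes
of data per relay cell is taken as the default chunk size. None of this is
fixed by the proposal.

```python
def pad_reply(descriptor, padding_factor, cell_data_len=498):
    """Split a descriptor into exactly padding_factor chunks of equal size.

    Assumption: every reply is framed as a 4-byte big-endian length, the
    descriptor bytes, then zero padding up to the full capacity.
    """
    capacity = padding_factor * cell_data_len
    framed = len(descriptor).to_bytes(4, "big") + descriptor
    if len(framed) > capacity:
        raise ValueError("descriptor does not fit in the padded reply")
    framed += b"\x00" * (capacity - len(framed))
    return [framed[i:i + cell_data_len]
            for i in range(0, capacity, cell_data_len)]

def unpad_reply(chunks):
    """Reassemble the chunks and drop the padding using the length prefix."""
    data = b"".join(chunks)
    length = int.from_bytes(data[:4], "big")
    return data[4:4 + length]
```

Because every reply occupies the same number of cells regardless of the
descriptor's size, an observer counting cells learns nothing about which
descriptor was requested.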
- RELAY_REQUEST_SD cells contain the following information:
- - hash of the server descriptor requested
- - hash of the identity digest of the server for which we want the SD
- - IP address and OR-port of the server for which we want the SD
- - padding factor - the number of cells we want the answer
- padded to.
- [XXX this just occurred to me and it might be smart, or it might
- be stupid: clients would learn the padding factor they want
- to use from the consensus document. This allows us to grow
- the replies later on should SDs become larger.]
- [XXX: figure out a decent padding size]
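
The four fields of a RELAY_REQUEST_SD payload could be laid out as in the
following sketch. The byte layout is entirely hypothetical, since the
proposal does not define one: two 20-byte digests, a 4-byte IPv4 address, a
2-byte OR port and a 1-byte padding factor, all in network byte order.

```python
import ipaddress
import struct

# Hypothetical layout: SD digest, identity digest, IPv4 address, OR port,
# padding factor. "!" selects network byte order with no alignment padding.
REQUEST_SD = struct.Struct("!20s20s4sHB")

def pack_request_sd(sd_digest, identity_digest, ip, or_port, padding_factor):
    """Build the payload bytes for a RELAY_REQUEST_SD cell (sketch only)."""
    if len(sd_digest) != 20 or len(identity_digest) != 20:
        raise ValueError("digests must be exactly 20 bytes")
    return REQUEST_SD.pack(sd_digest, identity_digest,
                           ipaddress.IPv4Address(ip).packed,
                           or_port, padding_factor)

def unpack_request_sd(payload):
    """Parse a payload produced by pack_request_sd back into its fields."""
    sd_digest, identity_digest, raw_ip, or_port, padding_factor = \
        REQUEST_SD.unpack(payload)
    return (sd_digest, identity_digest,
            str(ipaddress.IPv4Address(raw_ip)), or_port, padding_factor)
```

Under this layout the request occupies 47 bytes, which leaves ample room
inside a single relay cell.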
- 3.3 Protocol versions
- [XXX: find out where we need "opt protocols Link 1 2 Circuit 1"
- information described in 2.3 above. If we need it, it might have
- to go into the consensus document.]
- [XXX: Similarly find out where we need the version number of a
- remote tor server. This information is in the consensus, but
- maybe we use it in some place where having it signed by the
- server in question is really important?]
- 3.4 Exit selection
- Currently finding an appropriate exit node for a user's request is
- easy for a client because it has complete knowledge of all the exit
- policies of all servers on the network.
- [XXX: I have no finished ideas here yet.
- - if clients only rely on the current exit flag they will
- a) never use servers for exit purposes that don't have it,
- b) will have a hard time finding a suitable exit node for
- their weird port that only a few servers allow.
- - the authorities could create a new summary document that
- lists all the exit policies and their nodes (by fingerprint).
- I need to find out how large that document would be.
- - can we make the "Exit" flag more useful? can we come
- up with some "standard policies" and have operators pick
- one of the standards?
- ]
- 4. Future possibilities
- This proposal still requires that all servers have the descriptors of
- every other node in the network in order to answer RELAY_REQUEST_SD
- cells. These cells are sent when a circuit is extended from ending at
- node B to a new node C. In that case B would have to answer a
- RELAY_REQUEST_SD cell that asks for C's server descriptor (by SD digest).
- In order to answer that request B obviously needs a copy of C's server
- descriptor. The RELAY_REQUEST_SD cell already has all the info that
- B needs to contact C so it can ask about the descriptor before passing it
- back to the client.