@@ -0,0 +1,219 @@
+Filename: 141-jit-sd-downloads.txt
+Title: Download server descriptors on demand
+Version: $Revision$
+Last-Modified: $Date$
+Author: Peter Palfrader
+Created: 15-Jun-2008
+Status: Draft
+
+1. Overview
+
+ Downloading all server descriptors is the most expensive part
+ of bootstrapping a Tor client. These server descriptors currently
+ amount to about 1.5 Megabytes of data, and this size will grow
+ linearly with network size.
+
+ Fetching all these server descriptors takes a long time for people
+ behind slow network connections. It also places a considerable load
+ on our network of directory mirrors.
+
+ This document describes proposed changes to the Tor network and
+ directory protocol so that clients will no longer need to download
+ all server descriptors.
+
+ These changes consist of moving load balancing information into
+ network status documents, implementing a means to download server
+ descriptors on demand in an anonymity-preserving way, and dealing
+ with exit node selection.
+
+2. What is in a server descriptor
+
+ When a Tor client starts, the first thing it will try to get is a
+ current network status document, a consensus signed by a majority
+ of directory authorities. This document is currently about 100
+ Kilobytes in size, though it will grow linearly with network size.
+ It lists all servers currently running on the network.
+ The Tor client will then try to get a server descriptor for each
+ of the running servers. All server descriptors currently amount
+ to about 1.5 Megabytes of downloads.
+
+ A Tor client learns several things about a server from its descriptor.
+ Some of these it has already learned from the network status document
+ published by the authorities, but the server descriptor repeats them
+ in a single statement signed by the server itself, not just by
+ the directory authorities.
+
+ Tor clients use the information from server descriptors for
+ different purposes, which are considered in the following sections.
+
+ #three ways: One, to determine if a server will be able to handle
+ #this client's request; two, to actually communicate or use the server;
+ #three, for load balancing decisions.
+ #
+ #These three points are considered in the following subsections.
+
+2.1 Load balancing
+
+ The Tor load balancing mechanism is quite complex in its details, but
+ it has a simple goal: the more traffic a server can handle, the more
+ traffic it should get. That means the more traffic a server can
+ handle, the more likely a client is to use it.
+
+ For this purpose each server descriptor has bandwidth information
+ which tries to convey a server's capacity to clients.
+
+ Currently we weigh servers differently for different purposes. There
+ is a weight for when we use a server as a guard node (our entry to the
+ Tor network), there is one weight we assign servers for exit duties,
+ and a third for when we need intermediate (middle) nodes.
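+
+ As a rough illustration of the goal described above, the following
+ Python sketch picks a server with probability proportional to a
+ per-position weight. The Server class, its field names and the
+ numbers are hypothetical; the real selection logic is considerably
+ more involved.
+

```python
import random
from dataclasses import dataclass


@dataclass
class Server:
    nickname: str
    # Hypothetical per-position weights, e.g. derived from the
    # bandwidth information in the server descriptor.
    guard_weight: float
    middle_weight: float
    exit_weight: float


def pick_server(servers, position, rng=random):
    """Pick one server with probability proportional to its weight
    for the given position ("guard", "middle" or "exit")."""
    weights = [getattr(s, position + "_weight") for s in servers]
    if not any(w > 0 for w in weights):
        raise ValueError("no server has a positive weight for " + position)
    return rng.choices(servers, weights=weights, k=1)[0]


servers = [
    Server("fast", guard_weight=8.0, middle_weight=8.0, exit_weight=0.0),
    Server("slow", guard_weight=1.0, middle_weight=1.0, exit_weight=1.0),
]
```

+ With these example weights, "fast" would be chosen as a guard about
+ eight times as often as "slow", and never as an exit.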
+
+2.2 Exit information
+
+ When a Tor client wants to exit to some resource on the internet it will
+ build a circuit to an exit node that allows access to that resource's
+ IP address and TCP port.
+
+ When building that circuit the client can make sure that the circuit
+ ends at a server that will be able to fulfill the request because the
+ client has already learned all the servers' exit policies from their
+ descriptors.
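+
+ The check a client performs against an exit policy can be sketched
+ as follows. This is a simplified, IPv4-only illustration that assumes
+ rules of the form "accept ADDR[/MASK]:PORT[-PORT]" evaluated in order
+ with the first match winning; Rule, parse_rule and allows are
+ hypothetical names, and what happens when no rule matches is left as
+ a parameter.
+

```python
import ipaddress
from typing import List, NamedTuple


class Rule(NamedTuple):
    accept: bool
    network: ipaddress.IPv4Network
    port_low: int
    port_high: int


def parse_rule(line: str) -> Rule:
    """Parse a line such as "accept 10.0.0.0/8:80-443" or "reject *:*"."""
    action, target = line.split()
    if action not in ("accept", "reject"):
        raise ValueError("unknown action: " + action)
    addr, ports = target.rsplit(":", 1)
    network = ipaddress.ip_network("0.0.0.0/0" if addr == "*" else addr)
    if ports == "*":
        low, high = 1, 65535
    elif "-" in ports:
        low, high = (int(p) for p in ports.split("-"))
    else:
        low = high = int(ports)
    return Rule(action == "accept", network, low, high)


def allows(policy: List[Rule], ip: str, port: int, default: bool = False) -> bool:
    """Return the verdict of the first rule matching ip:port."""
    address = ipaddress.ip_address(ip)
    for rule in policy:
        if address in rule.network and rule.port_low <= port <= rule.port_high:
            return rule.accept
    return default


policy = [
    parse_rule("reject 10.0.0.0/8:*"),
    parse_rule("accept *:80"),
    parse_rule("accept *:443"),
    parse_rule("reject *:*"),
]
```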
+
+2.3 Capability information
+
+ Server descriptors contain information about the specific versions of
+ the Tor protocol they understand [proposal 105].
+
+ Furthermore the server descriptor also contains the exact version of
+ the Tor software that the server is running, and some decisions are
+ made based on the server version number (for instance a Tor client
+ will only make conditional consensus requests [proposal from 13 Apr
+ 2008 that never got a number] when talking to Tor servers version
+ 0.2.1.1-alpha or later).
+
+2.4 Contact/key information
+
+ A server descriptor lists a server's IP address and the TCP ports on
+ which it accepts onion and directory connections. Furthermore it contains
+ the onion key, a short-lived RSA key to which clients encrypt CREATE
+ cells.
+
+2.5 Identity information
+
+ A Tor client learns the digest of a server's identity key from the
+ network status document. Once it has a server descriptor, this descriptor
+ contains the full RSA identity key of the server. Clients verify
+ that 1) the digest of the identity key matches the expected digest
+ they got from the consensus, and 2) the signature on the descriptor
+ from that key is valid.
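+
+ The two checks can be sketched as below. The sketch assumes the
+ digest is a SHA-1 hash over the encoded identity key, and it leaves
+ the RSA signature check to a caller-supplied function; the names
+ descriptor_identity_ok and signature_is_valid are hypothetical.
+

```python
import hashlib
from typing import Callable


def descriptor_identity_ok(
    identity_key: bytes,
    expected_digest: bytes,
    signature_is_valid: Callable[[bytes], bool],
) -> bool:
    """Apply both checks to a freshly received descriptor."""
    # Check 1: the key in the descriptor must hash to the digest
    # the client learned from the consensus.
    if hashlib.sha1(identity_key).digest() != expected_digest:
        return False
    # Check 2: the descriptor's signature must verify under that key.
    # The actual RSA verification is delegated to the caller here.
    return signature_is_valid(identity_key)


example_key = b"placeholder bytes standing in for an encoded RSA key"
example_digest = hashlib.sha1(example_key).digest()
```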
+
+
+3. Doing away with the need for all SDs
+
+3.1 Load balancing info in consensus documents
+
+ One of the reasons why clients download all server descriptors is to
+ do proper load balancing as described in 2.1. In order for
+ clients to not require all server descriptors this information will
+ have to move into the network status document.
+
+ [XXX Two open questions here:
+   a) how do we arrive at a consensus weight?
+   b) how to represent weights in the consensus?
+      Maybe "s Guard=0.13 Exit=0.02 Middle=0.00 Stable.."
+ ]
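+
+ If the "s" line were extended along the lines floated in b), a
+ client could split it into plain flags and weighted flags roughly as
+ follows. The line format is only the tentative one suggested above,
+ and parse_status_line is a hypothetical helper.
+

```python
from typing import Dict, Set, Tuple


def parse_status_line(line: str) -> Tuple[Set[str], Dict[str, float]]:
    """Split an "s" line into plain flags and NAME=VALUE weights."""
    keyword, *tokens = line.split()
    if keyword != "s":
        raise ValueError("not an 's' line: " + line)
    flags: Set[str] = set()
    weights: Dict[str, float] = {}
    for token in tokens:
        name, separator, value = token.partition("=")
        if separator:
            weights[name] = float(value)  # e.g. "Guard=0.13"
        else:
            flags.add(name)               # e.g. "Stable"
    return flags, weights


flags, weights = parse_status_line("s Guard=0.13 Exit=0.02 Middle=0.00 Stable")
```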
+
+3.2 Fetching descriptors on demand
+
+ As described in 2.4 a descriptor lists IP address, OR- and Dir-Port,
+ and the onion key for a server.
+
+ A client already knows the IP address and the ports from the consensus
+ documents, but without the onion key it will not be able to send
+ CREATE/EXTEND cells for that server. Since the client needs the onion
+ key it needs the descriptor.
+
+ If a client only downloaded a few descriptors in an observable manner
+ then that would leak which nodes it was going to use.
+
+ This proposal suggests the following:
+
+ 1) when connecting to a guard node for which the client does not
+    yet have a cached descriptor, it requests the descriptor it
+    expects by hash. (The consensus document that the client holds
+    has a hash for the descriptor of this server. We want exactly
+    that descriptor, not a different one.)
+
+    [XXX: How? We could either come up with a new cell type,
+    RELAY_REQUEST_SD, that takes only a hash (of the SD), or use
+    RELAY_BEGIN_DIR. The former is probably smarter since we will
+    want to use it later on as well, and there we will require
+    padding.]
+
+    A client MAY cache the descriptor of the guard node so that it does
+    not need to request it every single time it contacts the guard.
+
+ 2) when a client wants to extend a circuit that currently ends in
+    server B to a new next server C, the client will send a
+    RELAY_REQUEST_SD cell to server B. This cell contains in its
+    payload the hash of the server descriptor the client would like
+    to obtain (C's server descriptor). Server B sends back the
+    descriptor and the client can now form a valid EXTEND/CREATE cell
+    encrypted to C's onion key.
+
+    Clients MUST NOT cache such descriptors. If they did they might
+    leak that they already extended to that server at least once
+    before.
+
+    Replies to RELAY_REQUEST_SD requests need to be padded to some
+    constant upper limit in order to conceal a client's destination
+    from anybody who might be counting cells/bytes.
+
+    [XXX: detailed spec of RELAY_REQUEST_SD cell and its reply]
+    [XXX: figure out a decent padding size]
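+
+ Since the padding size is still open, the following is only a sketch
+ of what padding a reply to a constant length could look like. It
+ assumes 498 usable bytes per relay cell and an arbitrary limit of
+ 8 cells; the length-prefix framing and the names pad_reply and
+ unpad_reply are hypothetical and not part of any specified format.
+

```python
from typing import List

CELL_PAYLOAD = 498  # assumed usable bytes per relay cell
REPLY_CELLS = 8     # hypothetical constant reply length, in cells


def pad_reply(descriptor: bytes, cells: int = REPLY_CELLS,
              payload: int = CELL_PAYLOAD) -> List[bytes]:
    """Frame a descriptor as a 2-byte length plus data, zero-pad it to
    exactly cells * payload bytes, and split it into cell payloads."""
    total = cells * payload
    body = len(descriptor).to_bytes(2, "big") + descriptor
    if len(body) > total:
        raise ValueError("descriptor does not fit into the padded reply")
    body = body.ljust(total, b"\x00")
    return [body[i:i + payload] for i in range(0, total, payload)]


def unpad_reply(chunks: List[bytes]) -> bytes:
    """Recover the descriptor from a padded reply."""
    body = b"".join(chunks)
    length = int.from_bytes(body[:2], "big")
    return body[2:2 + length]
```

+ Every reply then occupies the same number of cells regardless of the
+ descriptor's size, so an observer counting cells learns nothing about
+ which descriptor was requested.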
+
+3.3 Protocol versions
+
+ [XXX: find out where we need the "opt protocols Link 1 2 Circuit 1"
+ information described in 2.3 above. If we need it, it might have
+ to go into the consensus document.]
+
+ [XXX: Similarly find out where we need the version number of a
+ remote tor server. This information is in the consensus, but
+ maybe we use it in some place where having it signed by the
+ server in question is really important?]
+
+3.4 Exit selection
+
+ Currently finding an appropriate exit node for a user's request is
+ easy for a client because it has complete knowledge of all the exit
+ policies of all servers on the network.
+
+ [XXX: I have no finished ideas here yet.
+   - if clients only rely on the current exit flag they will
+     a) never use servers for exit purposes that don't have it,
+     b) have a hard time finding a suitable exit node for
+        their weird port that only a few servers allow.
+   - the authorities could create a new summary document that
+     lists all the exit policies and their nodes (by fingerprint).
+     I need to find out how large that document would be.
+   - can we make the "Exit" flag more useful? can we come
+     up with some "standard policies" and have operators pick
+     one of the standards?
+ ]
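+
+ To get a feeling for the summary document idea, one could group
+ servers by identical exit policy and estimate the resulting size, as
+ in the sketch below. The input mapping, the one-policy-per-line
+ layout assumed by summary_size, and the function names are all
+ hypothetical.
+

```python
from collections import defaultdict
from typing import Dict, List


def summarize_policies(policy_by_fingerprint: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each distinct exit policy to the sorted fingerprints using it."""
    groups = defaultdict(list)
    for fingerprint, policy in policy_by_fingerprint.items():
        groups[policy].append(fingerprint)
    return {policy: sorted(fps) for policy, fps in groups.items()}


def summary_size(summary: Dict[str, List[str]]) -> int:
    """Estimate the size in bytes if every policy is written once,
    followed by its space-separated fingerprints and a newline."""
    return sum(
        len(policy) + sum(len(fp) + 1 for fp in fps) + 1
        for policy, fps in summary.items()
    )


example = {
    "AAAA": "accept *:80",
    "BBBB": "reject *:*",
    "CCCC": "accept *:80",
}
summary = summarize_policies(example)
```

+ The fewer distinct policies operators use, the smaller such a
+ document becomes, which is one argument for the "standard policies"
+ idea.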
+
+4. Future possibilities
+
+ This proposal still requires that all servers have the descriptors of
+ every other node in the network in order to answer RELAY_REQUEST_SD
+ cells. These cells are sent when a circuit is extended from ending at
+ node B to a new node C. In that case B would have to answer a
+ RELAY_REQUEST_SD cell that asks for C's server descriptor (by SD digest).
+
+ In order to answer that request B obviously needs a copy of C's server
+ descriptor. In the future we might amend RELAY_REQUEST_SD cells to
+ also contain the expected IP address and OR-port of server C (the
+ client learns them from the network status document), so that B no
+ longer needs to know all the descriptors of the entire network but
+ instead can simply go and ask C for its descriptor before passing it
+ back to the client.
+