17 years ago · bcd7357b71
--- a/doc/spec/proposals/000-index.txt
+++ b/doc/spec/proposals/000-index.txt
@@ -63,6 +63,7 @@ Proposals by number:
 138  Remove routers that are not Running from consensus documents [CLOSED]
 139  Download consensus documents only when it will be trusted [CLOSED]
 140  Provide diffs between consensuses [OPEN]
+141  Download server descriptors on demand [DRAFT]
 
 
 Proposals by status:
@@ -74,6 +75,7 @@ Proposals by status:
    132  A Tor Web Service For Verifying Correct Browser Configuration
    133  Incorporate Unreachable ORs into the Tor Network
    134  More robust consensus voting with diverse authority sets
+   141  Download server descriptors on demand
  OPEN:
    120  Shutdown descriptors when Tor servers stop
    121  Hidden Service Authentication
--- a/doc/spec/proposals/141-jit-sd-downloads.txt
+++ b/doc/spec/proposals/141-jit-sd-downloads.txt
@@ -0,0 +1,219 @@
+Filename: 141-jit-sd-downloads.txt
+Title: Download server descriptors on demand
+Version: $Revision$
+Last-Modified: $Date$
+Author: Peter Palfrader
+Created: 15-Jun-2008
+Status: Draft
+
+1. Overview
+
+  Downloading all server descriptors is the most expensive part
+  of bootstrapping a Tor client.  These server descriptors currently
+  amount to about 1.5 Megabytes of data, and this size will grow
+  linearly with network size.
+
+  Fetching all these server descriptors takes a long while for people
+  behind slow network connections.  It is also a considerable load on
+  our network of directory mirrors.
+
+  This document describes proposed changes to the Tor network and
+  directory protocol so that clients will no longer need to download
+  all server descriptors.
+
+  These changes consist of moving load balancing information into
+  network status documents, implementing a means to download server
+  descriptors on demand in an anonymity-preserving way, and dealing
+  with exit node selection.
+
+2. What is in a server descriptor
+
+  When a Tor client starts, the first thing it will try to get is a
+  current network status document, a consensus signed by a majority
+  of directory authorities.  This document is currently about 100
+  Kilobytes in size, though it will grow linearly with network size.
+  This document lists all servers currently running on the network.
+  The Tor client will then try to get a server descriptor for each
+  of the running servers.  All server descriptors currently amount
+  to about 1.5 Megabytes of downloads.
+
+  A Tor client learns several things about a server from its descriptor.
+  Some of these it already learned from the network status document
+  published by the authorities, but the server descriptor contains them
+  again in a single statement signed by the server itself, not just by
+  the directory authorities.
+
+  Tor clients use the information from server descriptors for
+  different purposes, which are considered in the following subsections.
+
+  #three ways:  One, to determine if a server will be able to handle
+  #this client's request; two, to actually communicate or use the server;
+  #three, for load balancing decisions.
+  #
+  #These three points are considered in the following subsections.
+
+2.1 Load balancing
+
+  The Tor load balancing mechanism is quite complex in its details, but
+  it has a simple goal: the more traffic a server can handle, the more
+  traffic it should get.  That means the more traffic a server can
+  handle, the more likely a client is to use it.
+
+  For this purpose, each server descriptor contains bandwidth information
+  that tries to convey the server's capacity to clients.
+
+  Currently we weigh servers differently for different purposes.  There
+  is one weight for when we use a server as a guard node (our entry to
+  the Tor network), another we assign to servers for exit duties, and a
+  third for when we need intermediate (middle) nodes.
+
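The goal stated in 2.1 (the more a server can handle, the more likely it is picked, with a separate weight per role) amounts to weighted random choice. A minimal Python sketch, assuming integer weights per role; the relay names, the numbers in the hypothetical `ROLE_WEIGHTS` table and the helper `choose_weighted` are illustrative only, and Tor's real path selection applies further rules not shown here:

```python
import random
from collections import Counter

def choose_weighted(weights: dict[str, int], rng: random.Random) -> str:
    """Return one relay name, chosen with probability proportional to its weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("no relay has a positive weight")
    point = rng.randrange(total)  # uniform integer in [0, total)
    cumulative = 0
    for name, weight in weights.items():
        cumulative += weight
        if point < cumulative:
            return name
    raise AssertionError("unreachable: point is always below total")

# Hypothetical per-role weights (e.g. derived from advertised bandwidth).
ROLE_WEIGHTS = {
    "guard":  {"relayA": 300, "relayB": 100, "relayC": 0},
    "exit":   {"relayA": 0,   "relayB": 100, "relayC": 50},
    "middle": {"relayA": 300, "relayB": 100, "relayC": 50},
}

if __name__ == "__main__":
    rng = random.Random(1)
    counts = Counter(choose_weighted(ROLE_WEIGHTS["guard"], rng) for _ in range(10_000))
    # relayA should come up roughly three times as often as relayB; relayC never.
    print(counts)
```

A relay with weight 0 for a role is never picked for that role, which is how the sketch models a server that should not serve as, say, a guard.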
+2.2 Exit information
+
+  When a Tor client wants to exit to some resource on the Internet,
+  it will build a circuit to an exit node that allows access to that
+  resource's IP address and TCP port.
+
+  When building that circuit, the client can make sure that the
+  circuit ends at a server that will be able to fulfill the request,
+  because the client has already learned all the servers' exit
+  policies from their descriptors.
+
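Checking whether a candidate exit will accept a request means evaluating its exit policy against the destination address and port. The sketch below assumes simplified IPv4 rules of the form `accept|reject ADDR[/BITS]:PORT[-PORT]`, evaluated first match wins; the helper names and the relays in `POLICIES` are hypothetical, and treating an address that matches no rule as rejected is an assumption of the sketch (the example policies end in an explicit catch-all anyway).

```python
import ipaddress

def parse_rule(line: str) -> tuple[bool, ipaddress.IPv4Network, int, int]:
    """Parse a simplified IPv4 rule 'accept|reject ADDR[/BITS]:PORT[-PORT]'."""
    action, target = line.split()
    addr, ports = target.rsplit(":", 1)
    network = ipaddress.IPv4Network("0.0.0.0/0" if addr == "*" else addr, strict=False)
    if ports == "*":
        low, high = 1, 65535
    elif "-" in ports:
        low, high = (int(p) for p in ports.split("-", 1))
    else:
        low = high = int(ports)
    return action == "accept", network, low, high

def policy_allows(policy: list[str], ip: str, port: int) -> bool:
    """Evaluate rules in order; the first rule matching (ip, port) decides."""
    address = ipaddress.IPv4Address(ip)
    for line in policy:
        accept, network, low, high = parse_rule(line)
        if address in network and low <= port <= high:
            return accept
    return False  # assumption of this sketch: no matching rule means reject

# Hypothetical relays and exit policies, as a client would learn them from descriptors.
POLICIES = {
    "relayA": ["reject 10.0.0.0/8:*", "accept *:80", "accept *:443", "reject *:*"],
    "relayB": ["accept *:6660-6669", "reject *:*"],
}

def exits_for(ip: str, port: int) -> list[str]:
    """Names of relays whose policy accepts a connection to ip:port."""
    return [name for name, policy in POLICIES.items() if policy_allows(policy, ip, port)]

if __name__ == "__main__":
    print(exits_for("192.0.2.1", 443))
```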
+2.3 Capability information
+
+  Server descriptors contain information about the specific versions of
+  the Tor protocol they understand [proposal 105].
+
+  Furthermore, the server descriptor contains the exact version of the
+  Tor software that the server is running, and some decisions are made
+  based on the server version number (for instance, a Tor client will
+  only make conditional consensus requests [proposal from 13 Apr 2008
+  that never got a number] when talking to Tor servers running version
+  0.2.1.1-alpha or later).
+
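The version gate described in 2.3 can be illustrated by comparing Tor version strings. This Python sketch parses `MAJOR.MINOR.MICRO[.PATCHLEVEL][-TAG]` and, as a simplification, compares only the numeric parts and ignores the status tag; `supports_conditional_consensus` is a hypothetical helper, not Tor's own code.

```python
import re

VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:\.(\d+))?(?:-(\S+))?$")

def numeric_version(version: str) -> tuple[int, int, int, int]:
    """Return (major, minor, micro, patchlevel); the status tag is dropped."""
    m = VERSION_RE.match(version)
    if m is None:
        raise ValueError(f"unparseable Tor version: {version!r}")
    major, minor, micro, patch, _tag = m.groups()
    return int(major), int(minor), int(micro), int(patch or 0)

CONDITIONAL_CONSENSUS_MIN = numeric_version("0.2.1.1-alpha")

def supports_conditional_consensus(server_version: str) -> bool:
    """True if the server is at least 0.2.1.1-alpha (numeric parts only)."""
    return numeric_version(server_version) >= CONDITIONAL_CONSENSUS_MIN

if __name__ == "__main__":
    print(supports_conditional_consensus("0.2.0.26-rc"))  # False
```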
+2.4 Contact/key information
+
+  A server descriptor lists a server's IP address and TCP ports on which
+  it accepts onion and directory connections.  Furthermore, it contains
+  the onion key, a short-lived RSA key to which clients encrypt CREATE
+  cells.
+
+2.5 Identity information
+
+  A Tor client learns the digest of a server's identity key from the
+  network status document.  Once it has a server descriptor, it also has
+  the full RSA identity key of the server.  Clients verify 1) that the
+  digest of the identity key matches the expected digest from the
+  consensus, and 2) that the signature on the descriptor made with that
+  key is valid.
+
+
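Check 1) from 2.5 is a digest comparison. A minimal sketch, assuming the digest in the consensus is SHA-1 over the encoded identity key bytes; the key bytes below are placeholders, the helper names are hypothetical, and check 2) is left out because verifying an RSA signature needs a crypto library beyond Python's standard library.

```python
import hashlib

def identity_digest(identity_key: bytes) -> bytes:
    """SHA-1 over the encoded identity key (encoding assumed done by the caller)."""
    return hashlib.sha1(identity_key).digest()

def identity_matches(identity_key: bytes, consensus_digest: bytes) -> bool:
    """Check 1) of 2.5: the key in the descriptor must hash to the consensus digest."""
    return identity_digest(identity_key) == consensus_digest

if __name__ == "__main__":
    key = b"placeholder for an encoded RSA identity key"
    listed = identity_digest(key)                  # what the consensus would list
    print(identity_matches(key, listed))           # True
    print(identity_matches(b"other key", listed))  # False
```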
+3. Doing away with the need for all server descriptors (SDs)
+
+3.1 Load balancing info in consensus documents
+
+  One of the reasons why clients download all server descriptors is to
+  do proper load balancing as described in 2.1.  For clients to no
+  longer require all server descriptors, this information will have to
+  move into the network status document.
+
+  [XXX Two open questions here:
+   a) how do we arrive at a consensus weight?
+   b) how to represent weights in the consensus?
+      Maybe "s Guard=0.13 Exit=0.02 Middle=0.00 Stable.."
+  ]
+
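One possible reading of the "Maybe" line in the XXX note above is a flags line that also carries NAME=VALUE weights. The format is only floated in this proposal, so the parser below, and its rule that bare tokens are flags while NAME=VALUE tokens are weights, is a hypothetical sketch:

```python
def parse_weight_line(line: str) -> tuple[dict[str, float], list[str]]:
    """Split a hypothetical 's Guard=0.13 Exit=0.02 ... Stable' line into weights and flags."""
    keyword, *tokens = line.split()
    if keyword != "s":
        raise ValueError(f"expected an 's' line, got {keyword!r}")
    weights: dict[str, float] = {}
    flags: list[str] = []
    for token in tokens:
        name, sep, value = token.partition("=")
        if sep:
            weights[name] = float(value)  # NAME=VALUE: a per-role weight
        else:
            flags.append(name)            # bare token: an ordinary flag
    return weights, flags

if __name__ == "__main__":
    print(parse_weight_line("s Guard=0.13 Exit=0.02 Middle=0.00 Stable"))
```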
+3.2 Fetching descriptors on demand
+
+  As described in 2.4, a descriptor lists a server's IP address, its
+  OR and Dir ports, and its onion key.
+
+  A client already knows the IP address and the ports from the consensus
+  documents, but without the onion key it will not be able to send
+  CREATE/EXTEND cells for that server.  Since the client needs the onion
+  key, it needs the descriptor.
+
+  If a client only downloaded a few descriptors in an observable manner,
+  then that would leak which nodes it was going to use.
+
+  This proposal suggests the following:
+
+  1) When connecting to a guard node for which the client does not
+     yet have a cached descriptor, it requests the descriptor it
+     expects by hash.  (The consensus document that the client holds
+     has a hash for the descriptor of this server.  We want exactly
+     that descriptor, not a different one.)
+
+     [XXX: How?  We could either come up with a new cell type,
+      RELAY_REQUEST_SD, that takes only a hash (of the SD), or use
+      RELAY_BEGIN_DIR.  The former is probably smarter since we will
+      want to use it later on as well, and there we will require
+      padding.]
+
+     A client MAY cache the descriptor of the guard node so that it does
+     not need to request it every single time it contacts the guard.
+
+  2) When a client wants to extend a circuit that currently ends in
+     server B to a new server C, the client will send a
+     RELAY_REQUEST_SD cell to server B.  This cell contains in its
+     payload the hash of the server descriptor the client would like
+     to obtain (C's server descriptor).  Server B sends back the
+     descriptor, and the client can now form a valid EXTEND/CREATE cell
+     encrypted to C's onion key.
+
+     Clients MUST NOT cache such descriptors.  If they did, they might
+     leak that they had already extended to that server at least once
+     before.
+
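The two client-side rules above can be sketched as follows. `request_sd` stands in for sending a RELAY_REQUEST_SD cell (or RELAY_BEGIN_DIR, per the open XXX question) through a given hop and is hypothetical, as are the class name, the choice to ask the guard itself for its own descriptor, and the use of SHA-1 over the raw descriptor bytes as the SD digest. Only guard descriptors are cached; descriptors fetched for later hops are requested afresh every time.

```python
import hashlib
from typing import Callable

# Hypothetical transport: (hop to ask, expected descriptor digest) -> descriptor bytes.
RequestSD = Callable[[str, bytes], bytes]

class DescriptorFetcher:
    """Client side of 3.2: guard descriptors may be cached, other hops never are."""

    def __init__(self, consensus_digests: dict[str, bytes], request_sd: RequestSD):
        self.consensus_digests = consensus_digests  # relay -> SD digest from the consensus
        self.request_sd = request_sd
        self.guard_cache: dict[str, bytes] = {}

    def _fetch_verified(self, ask: str, relay: str) -> bytes:
        expected = self.consensus_digests[relay]
        descriptor = self.request_sd(ask, expected)
        # We want exactly the descriptor the consensus names, not a different one.
        # Assumption: the SD digest is SHA-1 over the raw descriptor bytes.
        if hashlib.sha1(descriptor).digest() != expected:
            raise ValueError(f"descriptor for {relay} does not match the consensus")
        return descriptor

    def guard_descriptor(self, guard: str) -> bytes:
        """Rule 1): request the guard's descriptor by hash; caching is permitted (MAY)."""
        if guard not in self.guard_cache:
            self.guard_cache[guard] = self._fetch_verified(guard, guard)
        return self.guard_cache[guard]

    def next_hop_descriptor(self, last_hop: str, next_hop: str) -> bytes:
        """Rule 2): ask the current last hop B for C's descriptor; never cache it."""
        return self._fetch_verified(last_hop, next_hop)
```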
+  Replies to RELAY_REQUEST_SD requests need to be padded to some
+  constant upper limit in order to conceal a client's destination
+  from anybody who might be counting cells/bytes.
+
+  [XXX: detailed spec of RELAY_REQUEST_SD cell and its reply]
+  [XXX: figure out a decent padding size]
+
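Padding every reply to the same length means an observer counting cells sees the same number for every descriptor. A Python sketch of one way to do that, assuming a 4-byte big-endian length prefix followed by zero bytes; both that layout and the 8192-byte `PADDED_REPLY_LEN` are placeholders, since the reply format and padding size are still open questions above. The 498-byte figure is the data capacity of one relay cell.

```python
import math
import struct

RELAY_PAYLOAD_LEN = 498   # data bytes per relay cell (509-byte payload minus 11-byte relay header)
PADDED_REPLY_LEN = 8192   # hypothetical constant; the proposal leaves this open

def pad_reply(descriptor: bytes) -> bytes:
    """Length-prefix the descriptor and pad it with zero bytes to PADDED_REPLY_LEN."""
    body = struct.pack(">I", len(descriptor)) + descriptor
    if len(body) > PADDED_REPLY_LEN:
        raise ValueError("descriptor larger than the padding limit")
    return body + b"\x00" * (PADDED_REPLY_LEN - len(body))

def unpad_reply(reply: bytes) -> bytes:
    """Recover the descriptor from a padded reply."""
    (length,) = struct.unpack(">I", reply[:4])
    return reply[4:4 + length]

def cells_needed(reply: bytes) -> int:
    """Number of relay cells needed to carry the reply."""
    return math.ceil(len(reply) / RELAY_PAYLOAD_LEN)

if __name__ == "__main__":
    for size in (100, 5000):
        reply = pad_reply(b"d" * size)
        print(size, len(reply), cells_needed(reply))
```

With these placeholder numbers, a 100-byte and a 5000-byte descriptor both produce an 8192-byte reply that occupies 17 relay cells.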
+3.3 Protocol versions
+
+  [XXX: find out where we need the "opt protocols Link 1 2 Circuit 1"
+  information described in 2.3 above.  If we need it, it might have
+  to go into the consensus document.]
+
+  [XXX: Similarly, find out where we need the version number of a
+  remote Tor server.  This information is in the consensus, but
+  maybe we use it in some place where having it signed by the
+  server in question is really important?]
+
+3.4 Exit selection
+
+  Currently, finding an appropriate exit node for a user's request is
+  easy for a client because it has complete knowledge of all the exit
+  policies of all servers on the network.
+
+  [XXX: I have no finished ideas here yet.
+    - If clients only rely on the current Exit flag, they will
+      a) never use servers for exit purposes that don't have it, and
+      b) have a hard time finding a suitable exit node for
+         an unusual port that only a few servers allow.
+    - The authorities could create a new summary document that
+      lists all the exit policies and their nodes (by fingerprint).
+      I need to find out how large that document would be.
+    - Can we make the "Exit" flag more useful?  Can we come
+      up with some "standard policies" and have operators pick
+      one of the standards?
+  ]
+
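The summary-document idea in the XXX note lists each distinct exit policy once, followed by the fingerprints of the nodes using it, so its size depends on how many relays share identical policies. A hedged Python sketch for estimating that size; the layout (each policy line and each fingerprint on its own line) is an assumption, and the short names standing in for fingerprints are hypothetical:

```python
from collections import defaultdict

Policy = tuple[str, ...]  # exit policy lines, in order

def build_policy_summary(policies: dict[str, Policy]) -> dict[Policy, list[str]]:
    """Group fingerprints by identical exit policy: one entry per distinct policy."""
    summary: defaultdict[Policy, list[str]] = defaultdict(list)
    for fingerprint in sorted(policies):
        summary[policies[fingerprint]].append(fingerprint)
    return dict(summary)

def estimated_size(summary: dict[Policy, list[str]]) -> int:
    """Bytes for a hypothetical layout: each policy once, then its fingerprints, one per line."""
    size = 0
    for policy, fingerprints in summary.items():
        size += sum(len(line) + 1 for line in policy)
        size += sum(len(fp) + 1 for fp in fingerprints)
    return size

if __name__ == "__main__":
    web = ("accept *:80", "accept *:443", "reject *:*")
    irc = ("accept *:6667", "reject *:*")
    summary = build_policy_summary({"FP3": web, "FP1": web, "FP2": irc})
    print(len(summary), estimated_size(summary))  # 2 73
```

Feeding in the real policies from current descriptors would give the number the note asks for; the more relays share a policy, the smaller the document gets relative to all descriptors.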
+4. Future possibilities
+
+  This proposal still requires that all servers have the descriptors of
+  every other node in the network in order to answer RELAY_REQUEST_SD
+  cells.  These cells are sent when a circuit that ends at node B is
+  extended to a new node C.  In that case B would have to answer a
+  RELAY_REQUEST_SD cell that asks for C's server descriptor (by SD digest).
+
+  To answer that request, B needs a copy of C's server descriptor.
+  In the future we might amend RELAY_REQUEST_SD cells to also
+  contain the expected IP address and OR port of server C (the
+  client learns them from the network status document), so that B no
+  longer needs to know all the descriptors of the entire network but
+  can instead simply ask C for its descriptor before passing it
+  back to the client.
+
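The amendment suggested in section 4 changes what B does when a RELAY_REQUEST_SD cell arrives. A hedged sketch of B's side, in which the `address_hint` field, the `fetch_from_server` callable, and SHA-1 over raw descriptor bytes as the SD digest are all assumptions made for illustration:

```python
import hashlib
from typing import Callable, Optional

# Hypothetical: open a connection to (ip, or_port) and ask that server for its own descriptor.
FetchFromServer = Callable[[str, int], bytes]

def answer_request_sd(
    digest: bytes,
    known_descriptors: dict[bytes, bytes],
    address_hint: Optional[tuple[str, int]],
    fetch_from_server: FetchFromServer,
) -> bytes:
    """Relay B's side: serve C's descriptor from its own store, else ask C using the hint."""
    if digest in known_descriptors:
        return known_descriptors[digest]  # current proposal: B knows every descriptor
    if address_hint is None:
        raise LookupError("descriptor unknown and no IP/OR-port hint in the request")
    ip, or_port = address_hint
    descriptor = fetch_from_server(ip, or_port)
    # Assumption: the SD digest is SHA-1 over the raw descriptor bytes.
    if hashlib.sha1(descriptor).digest() != digest:
        raise ValueError("server returned a descriptor with an unexpected digest")
    return descriptor
```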