15 years ago · e84025bc2b
--- a/doc/spec/path-spec.txt
+++ b/doc/spec/path-spec.txt
@@ -175,6 +175,7 @@ of their choices.
 
				 
			
 
				    XXXX
			
 
				 
			
 
				+
			
 
				 2.2. Path selection and constraints
			
 
				 
			
 
				    We choose the path for each new circuit before we build it.  We choose the
			
@@ -295,8 +296,136 @@ of their choices.
 
				    at a given node -- either via the ".exit" notation or because the
			
 
				    destination is running at the same location as an exit node.
			
 
				 
			
 
				+2.4. Learning when to give up ("timeout") on circuit construction
			
 
				+
			
 
				+   Since version 0.2.2.8-alpha, Tor attempts to learn when to give up on
			
 
				+   circuits based on network conditions.
			
 
				+
			
 
				+2.4.1 Distribution choice and parameter estimation
			
 
				+
			
 
				+   Based on studies of build times, we found that the distribution of
			
 
				+   circuit build times appears to be a Frechet distribution. However,
			
 
				+   estimators and quantile functions of the Frechet distribution are
			
 
				+   difficult to work with and slow to converge. So instead, since we
			
 
				+   are only interested in the accuracy of the tail, we approximate
			
 
				+   the tail of the distribution with a Pareto curve.
			
 
				+
			
 
				+   We calculate the parameters for a Pareto distribution fitting the data
			
 
				+   using the estimators at
			
 
				+   http://en.wikipedia.org/wiki/Pareto_distribution#Parameter_estimation.
			
 
				+
			
 
				+   Because this is not a true Pareto distribution, we alter how Xm is
			
 
				+   computed. The Xm parameter is computed as the midpoint of the most
			
 
				+   frequently occurring 50ms histogram bin, until the point where 1000
			
 
				+   circuits are recorded. After this point, the weighted average of the top
			
 
				+   3 midpoint modes is used as Xm. All times below this value are counted
			
 
				+   as having the midpoint value of this weighted average bin.
			
 
				+
			
 
				+   The timeout itself is calculated by using the Pareto Quantile function (the
			
 
				+   inverted CDF) to give us the value on the CDF such that 80% of the mass
			
 
				+   of the distribution is below the timeout value.
			
 
				+
			
 
				+   Thus, we expect that the Tor client will accept the fastest 80% of
			
 
				+   the total number of paths on the network.
			
 
				+
			
 
				+2.4.2. How much data to record
			
 
				+
			
 
				+   From our observations, the minimum number of circuit build times for a
			
 
				+   reasonable fit appears to be on the order of 100. However, to keep a
			
 
				+   good fit over the long term, we store 1000 most recent circuit build times
			
 
				+   in a circular array.
			
 
				+
			
 
				+   The Tor client should build test circuits at a rate of one per
			
 
				+   minute up until 100 circuits are built. This allows a fresh Tor to have
			
 
				+   a CircuitBuildTimeout estimated within 1.5 hours after install,
			
 
				+   upgrade, or network change (see below).
			
 
				+
			
 
				+   Timeouts are stored on disk in a histogram of 50ms bin width, the same
			
 
				+   width used to calculate the Xm value above. This histogram must be shuffled
			
 
				+   after being read from disk, to preserve a proper expiration of old values
			
 
				+   after restart.
			
 
				+
			
 
				+2.4.3. How to record timeouts
			
 
				+
			
 
				+   Timeouts should be counted as the expectation of the region of
			
 
				+   of the Pareto distribution beyond the cutoff. This is done by
			
 
				+   generating a random sample for each timeout at points on the
			
 
				+   curve beyond the current timeout cutoff up to the 90% quantile marker.
			
 
				+
			
 
				+2.4.4. Detecting Changing Network Conditions
			
 
				+
			
 
				+   We attempt to detect both network connectivity loss and drastic
			
 
				+   changes in the timeout characteristics.
			
 
				+
			
 
				+   We assume that we've had network connectivity loss if 3 circuits
			
 
				+   timeout and we've received no cells or TLS handshakes since those
			
 
				+   circuits began. We then temporarily set the timeout to 60 seconds
			
 
				+   and stop counting timeouts.
			
 
				+
			
 
				+   If 3 more circuits timeout and the network still has not been
			
 
				+   live within this new 60 second timeout window, we then discard
			
 
				+   the previous timeouts during this period from our history.
			
 
				+
			
 
				+   To detect changing network conditions, we keep a history of
			
 
				+   the timeout or non-timeout status of the past 20 circuits that
			
 
				+   successfully completed at least one hop. If more than 90% of
			
 
				+   these circuits timeout, we discard all buildtimes history, reset
			
 
				+   the timeout to 60, and then begin recomputing the timeout.
			
 
				+
			
 
				+   If the timeout was already 60 or higher, we double the timeout.
			
 
				+
			
 
				+2.4.5. Consensus parameters governing behavior
			
 
				+
			
 
				+   Clients that implement circuit build timeout learning should obey the
			
 
				+   following consensus parameters that govern behavior, in order to allow
			
 
				+   us to handle bugs or other emergent behaviors due to client circuit
			
 
				+   construction. If these parameters are not present in the consensus,
			
 
				+   the listed default values should be used instead.
			
 
				+
			
 
				+      cbtrecentcount
			
 
				+        Default: 20
			
 
				+        Effect: This is the number of circuit build times to keep track of
			
 
				+                for the following option.
			
 
				+
			
 
				+      cbtmaxtimeouts
			
 
				+        Default: 18
			
 
				+        Effect: When this many timeouts happen in the last 'cbtrecentcount'
			
 
				+                circuit attempts, the client should discard all of its
			
 
				+                history and begin learning a fresh timeout value.
			
 
				+
			
 
				+      cbtmincircs
			
 
				+        Default: 100
			
 
				+        Effect: This is the minimum number of circuits to build before
			
 
				+                computing a timeout.
			
 
				+
			
 
				+      cbtquantile
			
 
				+        Default: 80
			
 
				+        Effect: This is the position on the quantile curve to use to set the
			
 
				+                timeout value. It is a percent (0-99).
			
 
				+
			
 
				+      cbtmaxsynthquantile
			
 
				+        Default: 90
			
 
				+        Effect: This is the maximum position on the quantile curve to use to
			
 
				+                generate synthetic circuit build times for timeouts. It is a
			
 
				+                percent (0-99).
			
 
				+
			
 
				+      cbttestfreq
			
 
				+        Default: 60
			
 
				+        Effect: Describes how often in seconds to build a test circuit to
			
 
				+                gather timeout values. Only applies if less than 'cbtmincircs'
			
 
				+                have been recorded.
			
 
				+
			
 
				+      cbtmintimeout
			
 
				+        Default: 2000
			
 
				+        Effect: This is the minimum allowed timeout value in milliseconds.
			
 
				+
			
 
				+      cbtinitialtimeout
			
 
				+        Default: 60000
			
 
				+        Effect: This is the timeout value to use before computing a timeout,
			
 
				+                in milliseconds.
			
 
				+
			
 
				 
			
 
				-2.4. Handling failure
			
 
				+2.5. Handling failure
			
 
				 
			
 
				    If an attempt to extend a circuit fails (either because the first create
			
 
				    failed or a subsequent extend failed) then the circuit is torn down and is
			
--- a/doc/spec/proposals/151-path-selection-improvements.txt
+++ b/doc/spec/proposals/151-path-selection-improvements.txt
@@ -3,6 +3,7 @@ Title: Improving Tor Path Selection
 
				 Author: Fallon Chen, Mike Perry
			
 
				 Created: 5-Jul-2008
			
 
				 Status: Finished
			
 
				+In-Spec: path-spec.txt
			
 
				 
			
 
				 Overview