16 лет назад · 81dc435ffa
--- a/doc/spec/proposals/151-path-selection-improvements.txt
+++ b/doc/spec/proposals/151-path-selection-improvements.txt
@@ -20,7 +20,7 @@ Motivation
 
				 
			
 
				 Implementation
			
 
				 
			
 
				-  Storing Build Times
			
 
				+  Gathering Build Times
			
 
				 
			
 
				     Circuit build times are stored in the circular array
			
 
				     'circuit_build_times' consisting of uint32_t elements as milliseconds.
			
@@ -30,8 +30,16 @@ Implementation
 
				     too large, because it will make it difficult for clients to adapt to
			
 
				     moving between different links.
			
 
				 
			
 
				-    From our observations, this value appears to be on the order of 1000,
			
 
				-    but is configurable in a #define NCIRCUITS_TO_OBSERVE.
			
 
				+    From our observations, the minimum value for a reasonable fit appears
			
 
				+    to be on the order of 500 (MIN_CIRCUITS_TO_OBSERVE). However, to keep
			
 
				+    a good fit over the long term, we store 5000 most recent circuits in
			
 
				+    the array (NCIRCUITS_TO_OBSERVE).
			
 
				+
			
 
				+    The Tor client will build test circuits at a rate of one per
			
 
				+    minute (BUILD_TIMES_TEST_FREQUENCY) up to the point of
			
 
				+    MIN_CIRCUITS_TO_OBSERVE. This allows a fresh Tor to have
			
 
				+    a CircuitBuildTimeout estimated within 8 hours after install,
			
 
				+    upgrade, or network change (see below).
			
 
				 
			
 
				   Long Term Storage
			
 
				 
			
@@ -43,9 +51,9 @@ Implementation
 
				     Example:
			
 
				 
			
 
				     TotalBuildTimes 100
			
 
				-    CircuitBuildTimeBin 0 50
			
 
				-    CircuitBuildTimeBin 50 25
			
 
				-    CircuitBuildTimeBin 100 13
			
 
				+    CircuitBuildTimeBin 25 50
			
 
				+    CircuitBuildTimeBin 75 25
			
 
				+    CircuitBuildTimeBin 125 13
			
 
				     ...
			
 
				 
			
 
				     Reading the histogram in will entail inserting <count> values
			
@@ -57,7 +65,12 @@ Implementation
 
				   Learning the CircuitBuildTimeout
			
 
				 
			
 
				     Based on studies of build times, we found that the distribution of
			
 
				-    circuit buildtimes appears to be a Pareto distribution.
			
 
				+    circuit buildtimes appears to be a Frechet distribution. However,
			
 
				+    estimators and quantile functions of the Frechet distribution are
			
 
				+    difficult to work with and slow to converge. So instead, since we
			
 
				+    are only interested in the accuracy of the tail, we approximate
			
 
				+    the tail of the distribution with a Pareto curve starting at
			
 
				+    the mode of the circuit build time sample set.
			
 
				 
			
 
				     We will calculate the parameters for a Pareto distribution
			
 
				     fitting the data using the estimators at
			
@@ -73,11 +86,8 @@ Implementation
 
				 
			
 
				   Detecting Changing Network Conditions
			
 
				 
			
 
				-    We attempt to detect both network connectivty loss and drastic
			
 
				-    changes in the timeout characteristics. Network connectivity loss
			
 
				-    is detected by recording a timestamp every time Tor either completes
			
 
				-    a TLS connection or receives a cell. If this timestamp is more than
			
 
				-    90 seconds in the past, circuit timeouts are no longer counted.
			
 
				+    We attempt to detect both network connectivity loss and drastic
			
 
				+    changes in the timeout characteristics.
			
 
				 
			
 
				     If more than MAX_RECENT_TIMEOUT_RATE (80%) of the past
			
 
				     RECENT_CIRCUITS (20) time out, we assume the network connection
			
@@ -86,6 +96,11 @@ Implementation
 
				     position on the Pareto Quartile function for the ratio of
			
 
				     timeouts.
			
 
				 
			
 
				+    Network connectivity loss is detected by recording a timestamp every
			
 
				+    time Tor either completes a TLS connection or receives a cell. If
			
 
				+    this timestamp is more than CircuitBuildTimeout*RECENT_CIRCUITS/3
			
 
				+    seconds in the past, circuit timeouts are no longer counted.
			
 
				+
			
 
				   Testing
			
 
				 
			
 
				     After circuit build times, storage, and learning are implemented,
			
@@ -96,44 +111,28 @@ Implementation
 
				     the python produces matches that which is output to the state file in Tor,
			
 
				     and verify that the Pareto parameters and cutoff points also match.
			
 
				 
			
 
				-  Soft timeout vs Hard Timeout
			
 
				-
			
 
				-    At some point, it may be desirable to change the cutoff from a
			
 
				-    single hard cutoff that destroys the circuit to a soft cutoff and
			
 
				-    a hard cutoff, where the soft cutoff merely triggers the building
			
 
				-    of a new circuit, and the hard cutoff triggers destruction of the
			
 
				-    circuit.
			
 
				-
			
 
				-    Good values for hard and soft cutoffs seem to be 80% and 60%
			
 
				-    respectively, but we should eventually justify this with observation.
			
 
				-
			
 
				-  When to Begin Calculation
			
 
				-
			
 
				-    The number of circuits to observe (NCIRCUITS_TO_CUTOFF) before
			
 
				-    changing the CircuitBuildTimeout will be tunable via a #define. From
			
 
				-    our measurements, a good value for NCIRCUITS_TO_CUTOFF appears to be
			
 
				-    on the order of 100.
			
 
				+    We will also verify that there are no unexpected large deviations from
			
 
				+    node selection, such as nodes from distant geographical locations being
			
 
				+    completely excluded.
			
 
				 
			
 
				   Dealing with Timeouts
			
 
				 
			
 
				     Timeouts should be counted as the expectation of the region of
			
 
				-    of the Pareto distribution beyond the cutoff. The proposal will
			
 
				-    be updated with this value soon.
			
 
				+    of the Pareto distribution beyond the cutoff. This is done by
			
 
				+    generating a random sample for each timeout at points on the
			
 
				+    curve beyond the current timeout cutoff.
			
 
				 
			
 
				-    Also, in the event of network failure, the observation mechanism
			
 
				-    should stop collecting timeout data.
			
 
				+  Future Work
			
 
				 
			
 
				-  Client Hints
			
 
				-
			
 
				-    Some research still needs to be done to provide initial values
			
 
				-    for CircuitBuildTimeout based on values learned from modem
			
 
				-    users, DSL users, Cable Modem users, and dedicated links. A
			
 
				-    radiobutton in Vidalia should eventually be provided that
			
 
				-    sets CircuitBuildTimeout to one of these values and also
			
 
				-    provide the option of purging all learned data, should any exist.
			
 
				+    At some point, it may be desirable to change the cutoff from a
			
 
				+    single hard cutoff that destroys the circuit to a soft cutoff and
			
 
				+    a hard cutoff, where the soft cutoff merely triggers the building
			
 
				+    of a new circuit, and the hard cutoff triggers destruction of the
			
 
				+    circuit.
			
 
				 
			
 
				-    These values can either be published in the directory, or
			
 
				-    shipped hardcoded for a particular Tor version.
			
 
				+    It may also be beneficial to learn separate timeouts for each
			
 
				+    guard node, as they will have slightly different distributions.
			
 
				+    This will take longer to generate initial values though.
			
 
				 
			
 
				 Issues