Update proposal to match implementation.

Mike Perry 15 years ago
parent
commit
81dc435ffa
1 changed files with 42 additions and 43 deletions

+ 42 - 43
doc/spec/proposals/151-path-selection-improvements.txt

@@ -20,7 +20,7 @@ Motivation
 
 Implementation
 
-  Storing Build Times
+  Gathering Build Times
 
     Circuit build times are stored in the circular array
     'circuit_build_times' consisting of uint32_t elements as milliseconds.
@@ -30,8 +30,16 @@ Implementation
     too large, because it will make it difficult for clients to adapt to
     moving between different links.
 
-    From our observations, this value appears to be on the order of 1000,
-    but is configurable in a #define NCIRCUITS_TO_OBSERVE.
+    From our observations, the minimum value for a reasonable fit appears
+    to be on the order of 500 (MIN_CIRCUITS_TO_OBSERVE). However, to keep
+    a good fit over the long term, we store 5000 most recent circuits in
+    the array (NCIRCUITS_TO_OBSERVE).
+
+    The Tor client will build test circuits at a rate of one per
+    minute (BUILD_TIMES_TEST_FREQUENCY) up to the point of
+    MIN_CIRCUITS_TO_OBSERVE. This allows a fresh Tor to have
+    a CircuitBuildTimeout estimated within 8 hours after install,
+    upgrade, or network change (see below).
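
The gathering rule added in this hunk can be sketched as follows. This is a minimal illustration, not Tor's implementation: the class name BuildTimes and the helper hours_to_first_estimate are hypothetical, and BUILD_TIMES_TEST_FREQUENCY is assumed to be expressed in seconds. Under those assumptions, 500 test circuits at one per minute take roughly 8.3 hours, which is where the "within 8 hours" figure comes from.

```python
from collections import deque

NCIRCUITS_TO_OBSERVE = 5000      # capacity of the circular array (from the proposal)
MIN_CIRCUITS_TO_OBSERVE = 500    # samples needed before a fit is attempted (from the proposal)
BUILD_TIMES_TEST_FREQUENCY = 60  # ASSUMPTION: seconds between test circuits ("one per minute")


class BuildTimes:
    """Hypothetical stand-in for the 'circuit_build_times' circular array."""

    def __init__(self):
        # A bounded deque drops the oldest entry once capacity is reached,
        # which is the behaviour of a circular array of recent samples.
        self.times_ms = deque(maxlen=NCIRCUITS_TO_OBSERVE)

    def add(self, build_time_ms):
        self.times_ms.append(build_time_ms)

    def enough_data(self):
        # True once a timeout estimate may be computed.
        return len(self.times_ms) >= MIN_CIRCUITS_TO_OBSERVE


def hours_to_first_estimate():
    """Time needed to gather the minimum sample set from test circuits alone."""
    return MIN_CIRCUITS_TO_OBSERVE * BUILD_TIMES_TEST_FREQUENCY / 3600.0
```

Circuits built for ordinary client traffic would also add samples, so the real time to a first estimate could be shorter than this bound.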
 
 
   Long Term Storage
 
@@ -43,9 +51,9 @@ Implementation
     Example:
 
     TotalBuildTimes 100
-    CircuitBuildTimeBin 0 50
-    CircuitBuildTimeBin 50 25
-    CircuitBuildTimeBin 100 13
+    CircuitBuildTimeBin 25 50
+    CircuitBuildTimeBin 75 25
+    CircuitBuildTimeBin 125 13
     ...
 
     Reading the histogram in will entail inserting <count> values
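
The change from 0/50/100 to 25/75/125 in the example above suggests that each CircuitBuildTimeBin line is now keyed by the midpoint of its bin rather than its lower edge. The sketch below illustrates that reading. It is hedged on two assumptions that the hunk does not state outright: a bin width of 50 ms (inferred from the spacing of the example values), and that reading a bin back inserts <count> copies of the midpoint. The function names are hypothetical.

```python
BIN_WIDTH_MS = 50  # ASSUMPTION: inferred from the 25, 75, 125 spacing in the example


def to_histogram(times_ms):
    """Bucket integer build times (ms) into (bin_midpoint, count) pairs."""
    counts = {}
    for t in times_ms:
        midpoint = (t // BIN_WIDTH_MS) * BIN_WIDTH_MS + BIN_WIDTH_MS // 2
        counts[midpoint] = counts.get(midpoint, 0) + 1
    return sorted(counts.items())


def format_state(hist):
    """Render the histogram in the state-file layout shown in the example."""
    total = sum(count for _, count in hist)
    lines = ["TotalBuildTimes %d" % total]
    lines += ["CircuitBuildTimeBin %d %d" % (mid, count) for mid, count in hist]
    return lines


def parse_state(lines):
    """Expand state-file lines back into a list of build times.

    ASSUMPTION: each bin contributes <count> copies of its midpoint.
    """
    times = []
    for line in lines:
        parts = line.split()
        if len(parts) == 3 and parts[0] == "CircuitBuildTimeBin":
            times.extend([int(parts[1])] * int(parts[2]))
    return times
```

Storing midpoints means a round trip through the state file shifts each sample by at most half a bin width, instead of biasing every sample toward the lower edge.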
@@ -57,7 +65,12 @@ Implementation
   Learning the CircuitBuildTimeout
 
     Based on studies of build times, we found that the distribution of
-    circuit buildtimes appears to be a Pareto distribution.
+    circuit buildtimes appears to be a Frechet distribution. However,
+    estimators and quantile functions of the Frechet distribution are
+    difficult to work with and slow to converge. So instead, since we
+    are only interested in the accuracy of the tail, we approximate
+    the tail of the distribution with a Pareto curve starting at
+    the mode of the circuit build time sample set.
 
     We will calculate the parameters for a Pareto distribution
     fitting the data using the estimators at
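
The tail approximation described in this hunk can be illustrated with a short sketch. It is not the estimator the proposal links to: it uses the textbook maximum-likelihood estimator for the Pareto shape parameter, takes the scale parameter Xm to be the midpoint of the most populated 50 ms bin, and fits only the samples at or above Xm. The bin width, the tie-breaking rule, and all function names are assumptions made for the example.

```python
import math
from collections import Counter


def mode_bin(times_ms, bin_width=50):
    """Midpoint of the most populated bin; ties go to the smaller midpoint."""
    bins = Counter((t // bin_width) * bin_width + bin_width // 2 for t in times_ms)
    midpoint, _ = max(bins.items(), key=lambda kv: (kv[1], -kv[0]))
    return midpoint


def fit_pareto_tail(times_ms, bin_width=50):
    """Return (xm, alpha) for a Pareto curve starting at the sample mode.

    ASSUMPTION: samples below the mode are ignored, since only the tail
    of the distribution is being approximated.
    """
    xm = mode_bin(times_ms, bin_width)
    tail = [t for t in times_ms if t >= xm]
    log_sum = sum(math.log(t / xm) for t in tail)
    if log_sum <= 0:
        raise ValueError("need at least one sample above the mode")
    alpha = len(tail) / log_sum  # maximum-likelihood shape estimate
    return xm, alpha


def pareto_quantile(xm, alpha, q):
    """Build time below which a fraction q (0 <= q < 1) of circuits complete."""
    return xm / (1.0 - q) ** (1.0 / alpha)
```

With a fit in hand, a timeout cutoff at a chosen quantile is a single call, for example pareto_quantile(xm, alpha, 0.8) for an 80% cutoff.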
@@ -73,11 +86,8 @@ Implementation
 
   Detecting Changing Network Conditions
 
-    We attempt to detect both network connectivty loss and drastic
-    changes in the timeout characteristics. Network connectivity loss
-    is detected by recording a timestamp every time Tor either completes
-    a TLS connection or receives a cell. If this timestamp is more than
-    90 seconds in the past, circuit timeouts are no longer counted.
+    We attempt to detect both network connectivity loss and drastic
+    changes in the timeout characteristics.
 
     If more than MAX_RECENT_TIMEOUT_RATE (80%) of the past
     RECENT_CIRCUITS (20) time out, we assume the network connection
@@ -86,6 +96,11 @@ Implementation
     position on the Pareto Quartile function for the ratio of
     timeouts.
 
+    Network connectivity loss is detected by recording a timestamp every
+    time Tor either completes a TLS connection or receives a cell. If
+    this timestamp is more than CircuitBuildTimeout*RECENT_CIRCUITS/3
+    seconds in the past, circuit timeouts are no longer counted.
+
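
The two checks in this section, the liveness window and the recent-timeout rate, can be sketched together. This is an illustration under stated assumptions rather than Tor's code: the names network_is_live and RecentCircuits are hypothetical, times are in seconds, successful builds are assumed to be recorded regardless of liveness, and the rate check is assumed to wait until a full window of RECENT_CIRCUITS outcomes exists.

```python
from collections import deque

RECENT_CIRCUITS = 20            # from the proposal
MAX_RECENT_TIMEOUT_RATE = 0.80  # from the proposal (80%)


def network_is_live(now, last_activity, circuit_build_timeout):
    """True if a TLS connection completed or a cell arrived recently enough.

    The window scales with the current timeout:
    CircuitBuildTimeout * RECENT_CIRCUITS / 3 seconds.
    """
    window = circuit_build_timeout * RECENT_CIRCUITS / 3.0
    return (now - last_activity) <= window


class RecentCircuits:
    """Hypothetical tracker for the outcomes of the last RECENT_CIRCUITS builds."""

    def __init__(self):
        self.outcomes = deque(maxlen=RECENT_CIRCUITS)

    def record(self, timed_out, live):
        # Timeouts seen while the network looks dead are not counted.
        if timed_out and not live:
            return
        self.outcomes.append(bool(timed_out))

    def too_many_timeouts(self):
        # ASSUMPTION: only judge the rate once the window is full.
        if len(self.outcomes) < RECENT_CIRCUITS:
            return False
        return sum(self.outcomes) / RECENT_CIRCUITS > MAX_RECENT_TIMEOUT_RATE
```

With a 60 second CircuitBuildTimeout the liveness window is 400 seconds, compared with the fixed 90 seconds in the text this commit removes.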
   Testing
 
     After circuit build times, storage, and learning are implemented,
@@ -96,44 +111,28 @@ Implementation
     the python produces matches that which is output to the state file in Tor,
     and verify that the Pareto parameters and cutoff points also match.
 
-  Soft timeout vs Hard Timeout
-
-    At some point, it may be desirable to change the cutoff from a
-    single hard cutoff that destroys the circuit to a soft cutoff and
-    a hard cutoff, where the soft cutoff merely triggers the building
-    of a new circuit, and the hard cutoff triggers destruction of the
-    circuit.
-
-    Good values for hard and soft cutoffs seem to be 80% and 60%
-    respectively, but we should eventually justify this with observation.
-
-  When to Begin Calculation
-
-    The number of circuits to observe (NCIRCUITS_TO_CUTOFF) before
-    changing the CircuitBuildTimeout will be tunable via a #define. From
-    our measurements, a good value for NCIRCUITS_TO_CUTOFF appears to be
-    on the order of 100.
+    We will also verify that there are no unexpected large deviations from
+    node selection, such as nodes from distant geographical locations being
+    completely excluded.
 
   Dealing with Timeouts
 
     Timeouts should be counted as the expectation of the region of
-    of the Pareto distribution beyond the cutoff. The proposal will
-    be updated with this value soon.
+    of the Pareto distribution beyond the cutoff. This is done by
+    generating a random sample for each timeout at points on the
+    curve beyond the current timeout cutoff.
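
One way to generate such a sample is inverse-transform sampling. The sketch below relies on a standard property of the Pareto distribution: conditioned on exceeding a cutoff, the tail is again Pareto with the same shape parameter and with the cutoff as its scale. The function name sample_timeout_ms and its parameters are hypothetical, and the hunk does not say that Tor draws its samples this way.

```python
import random


def sample_timeout_ms(cutoff_ms, alpha, rng=random):
    """Draw a synthetic build time for a circuit that timed out.

    cutoff_ms: the current timeout cutoff; the true build time is only
               known to be at least this large.
    alpha:     shape parameter of the fitted Pareto curve.
    rng:       any object with a random() method returning a float in [0, 1).
    """
    u = rng.random()
    # Inverse CDF of a Pareto distribution with scale cutoff_ms and shape alpha.
    # Since 1 - u lies in (0, 1], the result is never below the cutoff.
    return cutoff_ms / (1.0 - u) ** (1.0 / alpha)
```

Recording one such draw per timeout keeps censored observations in the sample set, so the fit is not biased toward circuits that happened to finish before the cutoff.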
 
 
-    Also, in the event of network failure, the observation mechanism
-    should stop collecting timeout data.
+  Future Work
 
-  Client Hints
-
-    Some research still needs to be done to provide initial values
-    for CircuitBuildTimeout based on values learned from modem
-    users, DSL users, Cable Modem users, and dedicated links. A
-    radiobutton in Vidalia should eventually be provided that
-    sets CircuitBuildTimeout to one of these values and also
-    provide the option of purging all learned data, should any exist.
+    At some point, it may be desirable to change the cutoff from a
+    single hard cutoff that destroys the circuit to a soft cutoff and
+    a hard cutoff, where the soft cutoff merely triggers the building
+    of a new circuit, and the hard cutoff triggers destruction of the
+    circuit.
 
-    These values can either be published in the directory, or
-    shipped hardcoded for a particular Tor version.
+    It may also be beneficial to learn separate timeouts for each
+    guard node, as they will have slightly different distributions.
+    This will take longer to generate initial values though.
 
 Issues