17 years ago · 5166e5ff55
--- a/doc/spec/proposals/151-path-selection-improvements.txt
+++ b/doc/spec/proposals/151-path-selection-improvements.txt
@@ -10,8 +10,8 @@ Overview
 
															   The performance of paths selected can be improved by adjusting the
														
 
															   CircuitBuildTimeout and avoiding failing guard nodes. This proposal
														
 
															-  describes a method of tracking buildtime statistics, and using those
														
 
															-  statistics to adjust the CircuitBuildTimeout and the number of guards.
														
 
															+  describes a method of tracking buildtime statistics at the client, and 
														
 
															+  using those statistics to adjust the CircuitBuildTimeout.
														
 
															 Motivation
														
@@ -22,71 +22,91 @@ Motivation
 
															 Implementation
														
 
															+  Storing Build Times
														
 
															+
														
 
															+    Circuit build times will be stored in the circular array
														
 
															+    'circuit_build_times' consisting of uint16_t elements as milliseconds.
														
 
															+    The total size of this array will be based on the number of circuits
														
 
															+    it takes to converge on a good fit of the long term distribution of
														
 
															+    the circuit builds for a fixed link. We do not want this value to be
														
 
															+    too large, because it will make it difficult for clients to adapt to
														
 
															+    moving between different links.
														
 
															+
														
 
															+    From our initial observations, this value appears to be on the order 
														
 
															+    of 1000, but will be configurable in a #define NCIRCUITS_TO_OBSERVE.
														
 
															+ 
														
 
															+  Long Term Storage
														
 
															+
														
 
															+    The long-term storage representation will be implemented by storing a 
														
 
															+    histogram with BUILDTIME_BIN_WIDTH millisecond buckets (default 50) when 
														
 
															+    writing out the statistics to disk. The format of this histogram on disk 
														
 
															+    is yet to be finalized, but it will likely be of the format 
														
 
															+    'CircuitBuildTime <bin> <count>'.
														
 
															+    Example:
														
 
															+
														
 
															+    CircuitBuildTimeBin 1 100
														
 
															+    CircuitBuildTimeBin 2 50
														
 
															+    ...
														
 
															+
														
 
															+    Reading the histogram in will entail multiplying each bin by the 
														
 
															+    BUILDTIME_BIN_WIDTH and then inserting <count> values into the 
														
 
															+    circuit_build_times array each with the value of
														
 
															+    <bin>*BUILDTIME_BIN_WIDTH.
														
 
															+
														
 
															   Learning the CircuitBuildTimeout
														
 
															     Based on studies of build times, we found that the distribution of
														
 
															-    circuit buildtimes appears to be a Pareto distribution. The number
														
 
															-    of circuits to observe (ncircuits_to_cutoff) before changing the
														
 
															-    CircuitBuildTimeout will be tunable. From out measurements, 
														
 
															-    ncircuits_to_cuttoff appears to be on the order of 100.
														
 
															- 
														
 
															-	In addition, the total number of circuits gathered
														
 
															-    (ncircuits_to_observe) will also be tunable. It is likely that
														
 
															-    ncircuits_to_observe will be somewhere on the order of 1000. The values
														
 
															-    can be represented compactly in Tor in milliseconds as a circular array
														
 
															-    of 16 bit integers. More compact long-term storage representations can
														
 
															-    be implemented by simply storing a histogram with 50 millisecond buckets
														
 
															-    when writing out the statistics to disk.
														
 
															-
														
 
															-  Calculating the preferred CircuitBuildTimeout
														
 
															-
														
 
															-    Circuits that have longer buildtimes than some x% of the estimated
														
 
															-    CDF of the Pareto distribution will be excluded. x will be tunable
														
 
															-    as well.
														
 
															-
														
 
															-  Circuit timeouts
														
 
															-
														
 
															-    In the event of a timeout, backoff values should include the 100-x%
														
 
															-    of expected CDF of timeouts.  Also, in the event of network failure,
														
 
															-    the observation mechanism should stop collecting timeout data.
														
 
															-
														
 
															-  Dropping Failed Guards
														
 
															-
														
 
															-    In addition, we have noticed that some entry guards are much more
														
 
															-    failure prone than others. In particular, the circuit failure rates for
														
 
															-    the fastest entry guards was approximately 20-25%, where as slower
														
 
															-    guards exhibit failure rates as high as 45-50%. In [1], it was
														
 
															-    demonstrated that failing guard nodes can deliberately bias path
														
 
															-    selection to improve their success at capturing traffic. For both these
														
 
															-    reasons, failing guards should be avoided. 
														
 
															-    
														
 
															-    We propose increasing the number of entry guards to five, and gathering
														
 
															-    circuit failure statistics on each entry guard. Any guards that exceed
														
 
															-    the average failure rate of all guards by 10% after we have
														
 
															-    gathered ncircuits_to_observe circuits will be replaced.
														
 
															-    
														
 
															+    circuit buildtimes appears to be a Pareto distribution. 
														
 
															-Issues
														
 
															+    We will calculate the parameters for a Pareto distribution
														
 
															+    fitting the data using the estimators at
														
 
															+    http://en.wikipedia.org/wiki/Pareto_distribution#Parameter_estimation.
														
 
															-  Impact on anonymity
														
 
															+    The timeout itself will be calculated by solving the CDF for the 
														
 
															+    a percentile cutoff BUILDTIME_PERCENT_CUTOFF. This value
														
 
															+    represents the percentage of paths the Tor client will accept out of
														
 
															+    the total number of paths. We have not yet determined a good
														
 
															+    cutoff for this mathematically, but 85% seems a good choice for now.
														
 
															-    Since this follows a Pareto distribution, large reductions on the
														
 
															-    timeout can be achieved without cutting off a great number of the
														
 
															-    total paths.  However, hard statistics on which cutoff percentage
														
 
															-    gives optimal performance have not yet been gathered.
														
 
															+    From http://en.wikipedia.org/wiki/Pareto_distribution#Definition,
														
 
															+    the calculation we need is pow(BUILDTIME_PERCENT_CUTOFF/100.0, k)/Xm. 
														
 
															+
														
 
															+  When to Begin Calculation
														
 
															+
														
 
															+    The number of circuits to observe (NCIRCUITS_TO_CUTOFF) before 
														
 
															+    changing the CircuitBuildTimeout will be tunable via a #define. From 
														
 
															+    our measurements, a good value for NCIRCUITS_TO_CUTOFF appears to be 
														
 
															+    on the order of 100.
														
 
															+
														
 
															+  Dealing with Timeouts
														
 
															+
														
 
															+    Timeouts should be counted as the expectation of the region of 
														
 
															+    of the Pareto distribution beyond the cutoff. The proposal will
														
 
															+    be updated with this value soon.
														
 
															+
														
 
															+    Also, in the event of network failure, the observation mechanism 
														
 
															+    should stop collecting timeout data.
														
 
															-  Guard Turnover
														
 
															+    Circuits that timeout will be destroyed, as this indicates one
														
 
															+    or more of their respective nodes are currently overloaded.
														
 
															-    We contend that the risk from failing guards biasing path selection
														
 
															-    outweighs the risk of exposure to larger portions of the network
														
 
															-    for the first hop. Furthermore, from our observations, it appears
														
 
															-    that circuit failure is strongly correlated to node load. Allowing
														
 
															-    clients to migrate away from failing guards should naturally
														
 
															-    rebalance the network, and eventually clients should converge on
														
 
															-    a stable set of reliable guards. It is also likely that once clients
														
 
															-    begin to migrate away from failing guards, their load should go
														
 
															-    down, causing their failure rates to drop as well.
														
 
															+  Client Hints
														
 
															+    Some research still needs to be done to provide initial values
														
 
															+    for CircuitBuildTimeout based on values learned from modem
														
 
															+    users, DSL users, Cable Modem users, and dedicated links. A 
														
 
															+    radiobutton in Vidalia should eventually be provided that
														
 
															+    sets CircuitBuildTimeout to one of these values and also 
														
 
															+    provide the option of purging all learned data, should any exist.
														
 
															-[1] http://www.crhc.uiuc.edu/~nikita/papers/relmix-ccs07.pdf
														
 
															+    These values can either be published in the directory, or
														
 
															+    shipped hardcoded for a particular Tor version.
														
 
															+    
														
 
															+Issues
														
 
															+
														
 
															+  Impact on anonymity
														
 
															+    Since this follows a Pareto distribution, large reductions on the
														
 
															+    timeout can be achieved without cutting off a great number of the
														
 
															+    total paths. This will eliminate a great deal of the performance
														
 
															+    variation of Tor usage.