Update proposal to bring it more in-line with implementation.

Mike Perry 15 years ago
parent
commit
fd412549fd
1 changed file with 40 additions and 40 deletions

+ 40 - 40
doc/spec/proposals/151-path-selection-improvements.txt

@@ -2,7 +2,7 @@ Filename: 151-path-selection-improvements.txt
 Title: Improving Tor Path Selection
 Author: Fallon Chen, Mike Perry
 Created: 5-Jul-2008
-Status: Draft
+Status: Implemented
 
 Overview
 
@@ -22,51 +22,37 @@ Implementation
 
   Storing Build Times
 
-    Circuit build times will be stored in the circular array
+    Circuit build times are stored in the circular array
-    'circuit_build_times' consisting of uint16_t elements as milliseconds.
+    'circuit_build_times' consisting of uint32_t elements as milliseconds.
-    The total size of this array will be based on the number of circuits
+    The total size of this array is based on the number of circuits
     it takes to converge on a good fit of the long term distribution of
     the circuit builds for a fixed link. We do not want this value to be
     too large, because it will make it difficult for clients to adapt to
     moving between different links.
 
-    From our initial observations, this value appears to be on the order 
+    From our observations, this value appears to be on the order of 1000, 
-    of 1000, but will be configurable in a #define NCIRCUITS_TO_OBSERVE.
+    but is configurable in a #define NCIRCUITS_TO_OBSERVE.
-    The exact value for this #define will be determined by performing
-    goodness of fit tests using measurments obtained from the shufflebt.py
-    script from TorFlow.
 
   Long Term Storage
 
-    The long-term storage representation will be implemented by storing a 
+    The long-term storage representation is implemented by storing a 
     histogram with BUILDTIME_BIN_WIDTH millisecond buckets (default 50) when
-    writing out the statistics to disk. The format of this histogram on disk 
+    writing out the statistics to disk. The format this takes in the
-    is yet to be finalized, but it will likely be of the format 
+    state file is 'CircuitBuildTimeBin <bin-ms> <count>', with the total
-    'CircuitBuildTime <bin> <count>', with the total specified as 
+    specified as 'TotalBuildTimes <total>'
-    'TotalBuildTimes <total>'
     Example:
 
     TotalBuildTimes 100
-    CircuitBuildTimeBin 1 50
+    CircuitBuildTimeBin 0 50
-    CircuitBuildTimeBin 2 25
+    CircuitBuildTimeBin 50 25
-    CircuitBuildTimeBin 3 13
+    CircuitBuildTimeBin 100 13
     ...
 
-    Reading the histogram in will entail multiplying each bin by the 
+    Reading the histogram in will entail inserting <count> values
-    BUILDTIME_BIN_WIDTH and then inserting <count> values into the 
+    into the circuit_build_times array each with the value of
-    circuit_build_times array each with the value of
+    <bin-ms> milliseconds. In order to evenly distribute the values
-    <bin>*BUILDTIME_BIN_WIDTH. In order to evenly distribute the 
+    in the circular array, the Fisher-Yates shuffle will be performed
-    values in the circular array, a form of index skipping must
+    after reading values from the bins.
-    be employed. Values from bin #N with bin count C and total T
-    will occupy indexes specified by N+((T/C)*k)-1, where k is the
-    set of integers ranging from 0 to C-1.
-
-    For example, this would mean that the values from bin 1 would
-    occupy indexes 1+(100/50)*k-1, or 0, 2, 4, 6, 8, 10 and so on.
-    The values for bin 2 would occupy positions 1, 5, 9, 13. Collisions
-    will be inserted at the first empty position in the array greater 
-    than the selected index (which may requiring looping around the 
-    array back to index 0).
 
   Learning the CircuitBuildTimeout
 
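Note on the hunk above: the new loading procedure (write <count> copies of each <bin-ms> value into the array, then shuffle) could be sketched in C roughly as follows. This is a minimal sketch under stated assumptions, not Tor's actual code: build_time_bin_t, expand_bins and fisher_yates_shuffle are hypothetical names, 1000 for NCIRCUITS_TO_OBSERVE is only the order of magnitude quoted in the proposal, and rand() stands in for whatever random source the real implementation uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Order of magnitude quoted in the proposal; the real value may differ. */
#define NCIRCUITS_TO_OBSERVE 1000

/* One parsed 'CircuitBuildTimeBin <bin-ms> <count>' line (hypothetical type). */
typedef struct {
  uint32_t bin_ms;
  uint32_t count;
} build_time_bin_t;

/* Expand the bins into 'out' (capacity 'cap'), writing 'count' copies of
 * each <bin-ms> value. Returns the number of values written; values that
 * do not fit in 'cap' are dropped. */
size_t
expand_bins(const build_time_bin_t *bins, size_t nbins,
            uint32_t *out, size_t cap)
{
  assert(bins != NULL && out != NULL);
  size_t n = 0;
  for (size_t i = 0; i < nbins; i++)
    for (uint32_t c = 0; c < bins[i].count && n < cap; c++)
      out[n++] = bins[i].bin_ms;
  return n;
}

/* Fisher-Yates shuffle: walk from the last element down to the second,
 * swapping each element with a randomly chosen element at or before it.
 * rand() is a stand-in (assumption); its modulo bias is ignored here. */
void
fisher_yates_shuffle(uint32_t *a, size_t n)
{
  if (n < 2)
    return;
  for (size_t i = n - 1; i > 0; i--) {
    size_t j = (size_t)rand() % (i + 1);
    uint32_t tmp = a[i];
    a[i] = a[j];
    a[j] = tmp;
  }
}
```

A caller would run expand_bins() on the parsed state-file lines and then fisher_yates_shuffle() on the first n entries. The shuffle only permutes the array, so the per-bin counts are unchanged; what changes is that values from one bin no longer sit in a contiguous run, which matters once the circular array starts overwriting old entries.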
@@ -77,14 +63,28 @@ Implementation
     fitting the data using the estimators at
     http://en.wikipedia.org/wiki/Pareto_distribution#Parameter_estimation.
 
-    The timeout itself will be calculated by solving the CDF for the 
+    The timeout itself is calculated by using the quantile function (the
-    a percentile cutoff BUILDTIME_PERCENT_CUTOFF. This value
+    inverted CDF) to give us the value on the CDF such that
-    represents the percentage of paths the Tor client will accept out of
+    BUILDTIME_PERCENT_CUTOFF (80%) of the mass of the distribution is
-    the total number of paths. We have not yet determined a good
+    below the timeout value.
-    cutoff for this mathematically, but 85% seems a good choice for now.
 
-    From http://en.wikipedia.org/wiki/Pareto_distribution#Definition,
+    Thus, we expect that the Tor client will accept the fastest 80% of 
-    the calculation we need is pow(BUILDTIME_PERCENT_CUTOFF/100.0, k)/Xm. 
+    the total number of paths on the network.
+
+  Detecting Changing Network Conditions
+
+    We attempt to detect both network connectivity loss and drastic
+    changes in the timeout characteristics. Network connectivity loss
+    is detected by recording a timestamp every time Tor either completes
+    a TLS connection or receives a cell. If this timestamp is more than 
+    90 seconds in the past, circuit timeouts are no longer counted.
+
+    If more than MAX_RECENT_TIMEOUT_RATE (80%) of the past 
+    RECENT_CIRCUITS (20) time out, we assume the network connection
+    has changed, and we discard all buildtimes history and compute
+    a new timeout by estimating a new Pareto curve using the
+    position on the Pareto quantile function for the ratio of
+    timeouts.
 
   Testing
 
@@ -104,7 +104,7 @@ Implementation
     of a new circuit, and the hard cutoff triggers destruction of the
     circuit.
 
-    Good values for hard and soft cutoffs seem to be 85% and 65% 
+    Good values for hard and soft cutoffs seem to be 80% and 60% 
     respectively, but we should eventually justify this with observation.
 
   When to Begin Calculation