| 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112 | Filename: 151-path-selection-improvements.txtTitle: Improving Tor Path SelectionVersion:Last-Modified:Author: Fallon Chen, Mike PerryCreated: 5-Jul-2008Status: DraftOverview  The performance of paths selected can be improved by adjusting the  CircuitBuildTimeout and avoiding failing guard nodes. This proposal  describes a method of tracking buildtime statistics at the client, and   using those statistics to adjust the CircuitBuildTimeout.Motivation  Tor's performance can be improved by excluding those circuits that  have long buildtimes (and by extension, high latency). For those Tor  users who require better performance and have lower requirements for  anonymity, this would be a very useful option to have.Implementation  Storing Build Times    Circuit build times will be stored in the circular array    'circuit_build_times' consisting of uint16_t elements as milliseconds.    The total size of this array will be based on the number of circuits    it takes to converge on a good fit of the long term distribution of    the circuit builds for a fixed link. We do not want this value to be    too large, because it will make it difficult for clients to adapt to    moving between different links.    From our initial observations, this value appears to be on the order     of 1000, but will be configurable in a #define NCIRCUITS_TO_OBSERVE.   Long Term Storage    The long-term storage representation will be implemented by storing a     histogram with BUILDTIME_BIN_WIDTH millisecond buckets (default 50) when     writing out the statistics to disk. The format of this histogram on disk     is yet to be finalized, but it will likely be of the format     'CircuitBuildTime <bin> <count>'.    Example:    CircuitBuildTimeBin 1 100    CircuitBuildTimeBin 2 50    ...    Reading the histogram in will entail multiplying each bin by the     BUILDTIME_BIN_WIDTH and then inserting <count> values into the     circuit_build_times array each with the value of    <bin>*BUILDTIME_BIN_WIDTH.  Learning the CircuitBuildTimeout    Based on studies of build times, we found that the distribution of    circuit buildtimes appears to be a Pareto distribution.     We will calculate the parameters for a Pareto distribution    fitting the data using the estimators at    http://en.wikipedia.org/wiki/Pareto_distribution#Parameter_estimation.    The timeout itself will be calculated by solving the CDF for the     a percentile cutoff BUILDTIME_PERCENT_CUTOFF. This value    represents the percentage of paths the Tor client will accept out of    the total number of paths. We have not yet determined a good    cutoff for this mathematically, but 85% seems a good choice for now.    From http://en.wikipedia.org/wiki/Pareto_distribution#Definition,    the calculation we need is pow(BUILDTIME_PERCENT_CUTOFF/100.0, k)/Xm.   When to Begin Calculation    The number of circuits to observe (NCIRCUITS_TO_CUTOFF) before     changing the CircuitBuildTimeout will be tunable via a #define. From     our measurements, a good value for NCIRCUITS_TO_CUTOFF appears to be     on the order of 100.  Dealing with Timeouts    Timeouts should be counted as the expectation of the region of     of the Pareto distribution beyond the cutoff. The proposal will    be updated with this value soon.    Also, in the event of network failure, the observation mechanism     should stop collecting timeout data.    Circuits that timeout will be destroyed, as this indicates one    or more of their respective nodes are currently overloaded.  Client Hints    Some research still needs to be done to provide initial values    for CircuitBuildTimeout based on values learned from modem    users, DSL users, Cable Modem users, and dedicated links. A     radiobutton in Vidalia should eventually be provided that    sets CircuitBuildTimeout to one of these values and also     provide the option of purging all learned data, should any exist.    These values can either be published in the directory, or    shipped hardcoded for a particular Tor version.    Issues  Impact on anonymity    Since this follows a Pareto distribution, large reductions on the    timeout can be achieved without cutting off a great number of the    total paths. This will eliminate a great deal of the performance    variation of Tor usage.
 |