123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112 |
- Filename: 151-path-selection-improvements.txt
- Title: Improving Tor Path Selection
- Version:
- Last-Modified:
- Author: Fallon Chen, Mike Perry
- Created: 5-Jul-2008
- Status: Draft
- Overview
- The performance of paths selected can be improved by adjusting the
- CircuitBuildTimeout and avoiding failing guard nodes. This proposal
- describes a method of tracking buildtime statistics at the client, and
- using those statistics to adjust the CircuitBuildTimeout.
- Motivation
- Tor's performance can be improved by excluding those circuits that
- have long buildtimes (and by extension, high latency). For those Tor
- users who require better performance and have lower requirements for
- anonymity, this would be a very useful option to have.
- Implementation
- Storing Build Times
- Circuit build times will be stored in the circular array
- 'circuit_build_times' consisting of uint16_t elements as milliseconds.
- The total size of this array will be based on the number of circuits
- it takes to converge on a good fit of the long term distribution of
- the circuit builds for a fixed link. We do not want this value to be
- too large, because it will make it difficult for clients to adapt to
- moving between different links.
- From our initial observations, this value appears to be on the order
- of 1000, but will be configurable in a #define NCIRCUITS_TO_OBSERVE.
-
- Long Term Storage
- The long-term storage representation will be implemented by storing a
- histogram with BUILDTIME_BIN_WIDTH millisecond buckets (default 50) when
- writing out the statistics to disk. The format of this histogram on disk
- is yet to be finalized, but it will likely be of the format
- 'CircuitBuildTime <bin> <count>'.
- Example:
- CircuitBuildTimeBin 1 100
- CircuitBuildTimeBin 2 50
- ...
- Reading the histogram in will entail multiplying each bin by the
- BUILDTIME_BIN_WIDTH and then inserting <count> values into the
- circuit_build_times array each with the value of
- <bin>*BUILDTIME_BIN_WIDTH.
- Learning the CircuitBuildTimeout
- Based on studies of build times, we found that the distribution of
- circuit buildtimes appears to be a Pareto distribution.
- We will calculate the parameters for a Pareto distribution
- fitting the data using the estimators at
- http://en.wikipedia.org/wiki/Pareto_distribution#Parameter_estimation.
- The timeout itself will be calculated by solving the CDF for the
- a percentile cutoff BUILDTIME_PERCENT_CUTOFF. This value
- represents the percentage of paths the Tor client will accept out of
- the total number of paths. We have not yet determined a good
- cutoff for this mathematically, but 85% seems a good choice for now.
- From http://en.wikipedia.org/wiki/Pareto_distribution#Definition,
- the calculation we need is pow(BUILDTIME_PERCENT_CUTOFF/100.0, k)/Xm.
- When to Begin Calculation
- The number of circuits to observe (NCIRCUITS_TO_CUTOFF) before
- changing the CircuitBuildTimeout will be tunable via a #define. From
- our measurements, a good value for NCIRCUITS_TO_CUTOFF appears to be
- on the order of 100.
- Dealing with Timeouts
- Timeouts should be counted as the expectation of the region of
- of the Pareto distribution beyond the cutoff. The proposal will
- be updated with this value soon.
- Also, in the event of network failure, the observation mechanism
- should stop collecting timeout data.
- Circuits that timeout will be destroyed, as this indicates one
- or more of their respective nodes are currently overloaded.
- Client Hints
- Some research still needs to be done to provide initial values
- for CircuitBuildTimeout based on values learned from modem
- users, DSL users, Cable Modem users, and dedicated links. A
- radiobutton in Vidalia should eventually be provided that
- sets CircuitBuildTimeout to one of these values and also
- provide the option of purging all learned data, should any exist.
- These values can either be published in the directory, or
- shipped hardcoded for a particular Tor version.
-
- Issues
- Impact on anonymity
- Since this follows a Pareto distribution, large reductions on the
- timeout can be achieved without cutting off a great number of the
- total paths. This will eliminate a great deal of the performance
- variation of Tor usage.
|