123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148 |
- Filename: 151-path-selection-improvements.txt
- Title: Improving Tor Path Selection
- Author: Fallon Chen, Mike Perry
- Created: 5-Jul-2008
- Status: Finished
- In-Spec: path-spec.txt
- Overview
- The performance of paths selected can be improved by adjusting the
- CircuitBuildTimeout and avoiding failing guard nodes. This proposal
- describes a method of tracking buildtime statistics at the client, and
- using those statistics to adjust the CircuitBuildTimeout.
- Motivation
- Tor's performance can be improved by excluding those circuits that
- have long buildtimes (and by extension, high latency). For those Tor
- users who require better performance and have lower requirements for
- anonymity, this would be a very useful option to have.
- Implementation
- Gathering Build Times
- Circuit build times are stored in the circular array
- 'circuit_build_times' consisting of uint32_t elements as milliseconds.
- The total size of this array is based on the number of circuits
- it takes to converge on a good fit of the long term distribution of
- the circuit builds for a fixed link. We do not want this value to be
- too large, because it will make it difficult for clients to adapt to
- moving between different links.
- From our observations, the minimum value for a reasonable fit appears
- to be on the order of 500 (MIN_CIRCUITS_TO_OBSERVE). However, to keep
- a good fit over the long term, we store 5000 most recent circuits in
- the array (NCIRCUITS_TO_OBSERVE).
- The Tor client will build test circuits at a rate of one per
- minute (BUILD_TIMES_TEST_FREQUENCY) up to the point of
- MIN_CIRCUITS_TO_OBSERVE. This allows a fresh Tor to have
- a CircuitBuildTimeout estimated within 8 hours after install,
- upgrade, or network change (see below).
- Long Term Storage
- The long-term storage representation is implemented by storing a
- histogram with BUILDTIME_BIN_WIDTH millisecond buckets (default 50) when
- writing out the statistics to disk. The format this takes in the
- state file is 'CircuitBuildTime <bin-ms> <count>', with the total
- specified as 'TotalBuildTimes <total>'
- Example:
- TotalBuildTimes 100
- CircuitBuildTimeBin 25 50
- CircuitBuildTimeBin 75 25
- CircuitBuildTimeBin 125 13
- ...
- Reading the histogram in will entail inserting <count> values
- into the circuit_build_times array each with the value of
- <bin-ms> milliseconds. In order to evenly distribute the values
- in the circular array, the Fisher-Yates shuffle will be performed
- after reading values from the bins.
- Learning the CircuitBuildTimeout
- Based on studies of build times, we found that the distribution of
- circuit buildtimes appears to be a Frechet distribution. However,
- estimators and quantile functions of the Frechet distribution are
- difficult to work with and slow to converge. So instead, since we
- are only interested in the accuracy of the tail, we approximate
- the tail of the distribution with a Pareto curve starting at
- the mode of the circuit build time sample set.
- We will calculate the parameters for a Pareto distribution
- fitting the data using the estimators at
- http://en.wikipedia.org/wiki/Pareto_distribution#Parameter_estimation.
- The timeout itself is calculated by using the Quartile function (the
- inverted CDF) to give us the value on the CDF such that
- BUILDTIME_PERCENT_CUTOFF (80%) of the mass of the distribution is
- below the timeout value.
- Thus, we expect that the Tor client will accept the fastest 80% of
- the total number of paths on the network.
- Detecting Changing Network Conditions
- We attempt to detect both network connectivity loss and drastic
- changes in the timeout characteristics.
- We assume that we've had network connectivity loss if 3 circuits
- timeout and we've received no cells or TLS handshakes since those
- circuits began. We then set the timeout to 60 seconds and stop
- counting timeouts.
- If 3 more circuits timeout and the network still has not been
- live within this new 60 second timeout window, we then discard
- the previous timeouts during this period from our history.
- To detect changing network conditions, we keep a history of
- the timeout or non-timeout status of the past RECENT_CIRCUITS (20)
- that successfully completed at least one hop. If more than 75%
- of these circuits timeout, we discard all buildtimes history,
- reset the timeout to 60, and then begin recomputing the timeout.
- Testing
- After circuit build times, storage, and learning are implemented,
- the resulting histogram should be checked for consistency by
- verifying it persists across successive Tor invocations where
- no circuits are built. In addition, we can also use the existing
- buildtime scripts to record build times, and verify that the histogram
- the python produces matches that which is output to the state file in Tor,
- and verify that the Pareto parameters and cutoff points also match.
- We will also verify that there are no unexpected large deviations from
- node selection, such as nodes from distant geographical locations being
- completely excluded.
- Dealing with Timeouts
- Timeouts should be counted as the expectation of the region of
- of the Pareto distribution beyond the cutoff. This is done by
- generating a random sample for each timeout at points on the
- curve beyond the current timeout cutoff.
- Future Work
- At some point, it may be desirable to change the cutoff from a
- single hard cutoff that destroys the circuit to a soft cutoff and
- a hard cutoff, where the soft cutoff merely triggers the building
- of a new circuit, and the hard cutoff triggers destruction of the
- circuit.
- It may also be beneficial to learn separate timeouts for each
- guard node, as they will have slightly different distributions.
- This will take longer to generate initial values though.
- Issues
- Impact on anonymity
- Since this follows a Pareto distribution, large reductions on the
- timeout can be achieved without cutting off a great number of the
- total paths. This will eliminate a great deal of the performance
- variation of Tor usage.
|