Filename: 151-path-selection-improvements.txt Title: Improving Tor Path Selection Author: Fallon Chen, Mike Perry Created: 5-Jul-2008 Status: Finished In-Spec: path-spec.txt Overview The performance of paths selected can be improved by adjusting the CircuitBuildTimeout and avoiding failing guard nodes. This proposal describes a method of tracking buildtime statistics at the client, and using those statistics to adjust the CircuitBuildTimeout. Motivation Tor's performance can be improved by excluding those circuits that have long buildtimes (and by extension, high latency). For those Tor users who require better performance and have lower requirements for anonymity, this would be a very useful option to have. Implementation Gathering Build Times Circuit build times are stored in the circular array 'circuit_build_times' consisting of uint32_t elements as milliseconds. The total size of this array is based on the number of circuits it takes to converge on a good fit of the long term distribution of the circuit builds for a fixed link. We do not want this value to be too large, because it will make it difficult for clients to adapt to moving between different links. From our observations, the minimum value for a reasonable fit appears to be on the order of 500 (MIN_CIRCUITS_TO_OBSERVE). However, to keep a good fit over the long term, we store 5000 most recent circuits in the array (NCIRCUITS_TO_OBSERVE). The Tor client will build test circuits at a rate of one per minute (BUILD_TIMES_TEST_FREQUENCY) up to the point of MIN_CIRCUITS_TO_OBSERVE. This allows a fresh Tor to have a CircuitBuildTimeout estimated within 8 hours after install, upgrade, or network change (see below). Long Term Storage The long-term storage representation is implemented by storing a histogram with BUILDTIME_BIN_WIDTH millisecond buckets (default 50) when writing out the statistics to disk. The format this takes in the state file is 'CircuitBuildTime ', with the total specified as 'TotalBuildTimes ' Example: TotalBuildTimes 100 CircuitBuildTimeBin 25 50 CircuitBuildTimeBin 75 25 CircuitBuildTimeBin 125 13 ... Reading the histogram in will entail inserting values into the circuit_build_times array each with the value of milliseconds. In order to evenly distribute the values in the circular array, the Fisher-Yates shuffle will be performed after reading values from the bins. Learning the CircuitBuildTimeout Based on studies of build times, we found that the distribution of circuit buildtimes appears to be a Frechet distribution. However, estimators and quantile functions of the Frechet distribution are difficult to work with and slow to converge. So instead, since we are only interested in the accuracy of the tail, we approximate the tail of the distribution with a Pareto curve starting at the mode of the circuit build time sample set. We will calculate the parameters for a Pareto distribution fitting the data using the estimators at http://en.wikipedia.org/wiki/Pareto_distribution#Parameter_estimation. The timeout itself is calculated by using the Quartile function (the inverted CDF) to give us the value on the CDF such that BUILDTIME_PERCENT_CUTOFF (80%) of the mass of the distribution is below the timeout value. Thus, we expect that the Tor client will accept the fastest 80% of the total number of paths on the network. Detecting Changing Network Conditions We attempt to detect both network connectivity loss and drastic changes in the timeout characteristics. We assume that we've had network connectivity loss if 3 circuits timeout and we've received no cells or TLS handshakes since those circuits began. We then set the timeout to 60 seconds and stop counting timeouts. If 3 more circuits timeout and the network still has not been live within this new 60 second timeout window, we then discard the previous timeouts during this period from our history. To detect changing network conditions, we keep a history of the timeout or non-timeout status of the past RECENT_CIRCUITS (20) that successfully completed at least one hop. If more than 75% of these circuits timeout, we discard all buildtimes history, reset the timeout to 60, and then begin recomputing the timeout. Testing After circuit build times, storage, and learning are implemented, the resulting histogram should be checked for consistency by verifying it persists across successive Tor invocations where no circuits are built. In addition, we can also use the existing buildtime scripts to record build times, and verify that the histogram the python produces matches that which is output to the state file in Tor, and verify that the Pareto parameters and cutoff points also match. We will also verify that there are no unexpected large deviations from node selection, such as nodes from distant geographical locations being completely excluded. Dealing with Timeouts Timeouts should be counted as the expectation of the region of of the Pareto distribution beyond the cutoff. This is done by generating a random sample for each timeout at points on the curve beyond the current timeout cutoff. Future Work At some point, it may be desirable to change the cutoff from a single hard cutoff that destroys the circuit to a soft cutoff and a hard cutoff, where the soft cutoff merely triggers the building of a new circuit, and the hard cutoff triggers destruction of the circuit. It may also be beneficial to learn separate timeouts for each guard node, as they will have slightly different distributions. This will take longer to generate initial values though. Issues Impact on anonymity Since this follows a Pareto distribution, large reductions on the timeout can be achieved without cutting off a great number of the total paths. This will eliminate a great deal of the performance variation of Tor usage.