151-path-selection-improvements.txt 6.1 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148
  1. Filename: 151-path-selection-improvements.txt
  2. Title: Improving Tor Path Selection
  3. Author: Fallon Chen, Mike Perry
  4. Created: 5-Jul-2008
  5. Status: Finished
  6. In-Spec: path-spec.txt
  7. Overview
  8. The performance of paths selected can be improved by adjusting the
  9. CircuitBuildTimeout and avoiding failing guard nodes. This proposal
  10. describes a method of tracking buildtime statistics at the client, and
  11. using those statistics to adjust the CircuitBuildTimeout.
  12. Motivation
  13. Tor's performance can be improved by excluding those circuits that
  14. have long buildtimes (and by extension, high latency). For those Tor
  15. users who require better performance and have lower requirements for
  16. anonymity, this would be a very useful option to have.
  17. Implementation
  18. Gathering Build Times
  19. Circuit build times are stored in the circular array
  20. 'circuit_build_times' consisting of uint32_t elements as milliseconds.
  21. The total size of this array is based on the number of circuits
  22. it takes to converge on a good fit of the long term distribution of
  23. the circuit builds for a fixed link. We do not want this value to be
  24. too large, because it will make it difficult for clients to adapt to
  25. moving between different links.
  26. From our observations, the minimum value for a reasonable fit appears
  27. to be on the order of 500 (MIN_CIRCUITS_TO_OBSERVE). However, to keep
  28. a good fit over the long term, we store 5000 most recent circuits in
  29. the array (NCIRCUITS_TO_OBSERVE).
  30. The Tor client will build test circuits at a rate of one per
  31. minute (BUILD_TIMES_TEST_FREQUENCY) up to the point of
  32. MIN_CIRCUITS_TO_OBSERVE. This allows a fresh Tor to have
  33. a CircuitBuildTimeout estimated within 8 hours after install,
  34. upgrade, or network change (see below).
  35. Long Term Storage
  36. The long-term storage representation is implemented by storing a
  37. histogram with BUILDTIME_BIN_WIDTH millisecond buckets (default 50) when
  38. writing out the statistics to disk. The format this takes in the
  39. state file is 'CircuitBuildTime <bin-ms> <count>', with the total
  40. specified as 'TotalBuildTimes <total>'
  41. Example:
  42. TotalBuildTimes 100
  43. CircuitBuildTimeBin 25 50
  44. CircuitBuildTimeBin 75 25
  45. CircuitBuildTimeBin 125 13
  46. ...
  47. Reading the histogram in will entail inserting <count> values
  48. into the circuit_build_times array each with the value of
  49. <bin-ms> milliseconds. In order to evenly distribute the values
  50. in the circular array, the Fisher-Yates shuffle will be performed
  51. after reading values from the bins.
  52. Learning the CircuitBuildTimeout
  53. Based on studies of build times, we found that the distribution of
  54. circuit buildtimes appears to be a Frechet distribution. However,
  55. estimators and quantile functions of the Frechet distribution are
  56. difficult to work with and slow to converge. So instead, since we
  57. are only interested in the accuracy of the tail, we approximate
  58. the tail of the distribution with a Pareto curve starting at
  59. the mode of the circuit build time sample set.
  60. We will calculate the parameters for a Pareto distribution
  61. fitting the data using the estimators at
  62. http://en.wikipedia.org/wiki/Pareto_distribution#Parameter_estimation.
  63. The timeout itself is calculated by using the Quartile function (the
  64. inverted CDF) to give us the value on the CDF such that
  65. BUILDTIME_PERCENT_CUTOFF (80%) of the mass of the distribution is
  66. below the timeout value.
  67. Thus, we expect that the Tor client will accept the fastest 80% of
  68. the total number of paths on the network.
  69. Detecting Changing Network Conditions
  70. We attempt to detect both network connectivity loss and drastic
  71. changes in the timeout characteristics.
  72. We assume that we've had network connectivity loss if 3 circuits
  73. timeout and we've received no cells or TLS handshakes since those
  74. circuits began. We then set the timeout to 60 seconds and stop
  75. counting timeouts.
  76. If 3 more circuits timeout and the network still has not been
  77. live within this new 60 second timeout window, we then discard
  78. the previous timeouts during this period from our history.
  79. To detect changing network conditions, we keep a history of
  80. the timeout or non-timeout status of the past RECENT_CIRCUITS (20)
  81. that successfully completed at least one hop. If more than 75%
  82. of these circuits timeout, we discard all buildtimes history,
  83. reset the timeout to 60, and then begin recomputing the timeout.
  84. Testing
  85. After circuit build times, storage, and learning are implemented,
  86. the resulting histogram should be checked for consistency by
  87. verifying it persists across successive Tor invocations where
  88. no circuits are built. In addition, we can also use the existing
  89. buildtime scripts to record build times, and verify that the histogram
  90. the python produces matches that which is output to the state file in Tor,
  91. and verify that the Pareto parameters and cutoff points also match.
  92. We will also verify that there are no unexpected large deviations from
  93. node selection, such as nodes from distant geographical locations being
  94. completely excluded.
  95. Dealing with Timeouts
  96. Timeouts should be counted as the expectation of the region of
  97. of the Pareto distribution beyond the cutoff. This is done by
  98. generating a random sample for each timeout at points on the
  99. curve beyond the current timeout cutoff.
  100. Future Work
  101. At some point, it may be desirable to change the cutoff from a
  102. single hard cutoff that destroys the circuit to a soft cutoff and
  103. a hard cutoff, where the soft cutoff merely triggers the building
  104. of a new circuit, and the hard cutoff triggers destruction of the
  105. circuit.
  106. It may also be beneficial to learn separate timeouts for each
  107. guard node, as they will have slightly different distributions.
  108. This will take longer to generate initial values though.
  109. Issues
  110. Impact on anonymity
  111. Since this follows a Pareto distribution, large reductions on the
  112. timeout can be achieved without cutting off a great number of the
  113. total paths. This will eliminate a great deal of the performance
  114. variation of Tor usage.