151-path-selection-improvements.txt 4.3 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112
  1. Filename: 151-path-selection-improvements.txt
  2. Title: Improving Tor Path Selection
  3. Version:
  4. Last-Modified:
  5. Author: Fallon Chen, Mike Perry
  6. Created: 5-Jul-2008
  7. Status: Draft
  8. Overview
  9. The performance of paths selected can be improved by adjusting the
  10. CircuitBuildTimeout and avoiding failing guard nodes. This proposal
  11. describes a method of tracking buildtime statistics at the client, and
  12. using those statistics to adjust the CircuitBuildTimeout.
  13. Motivation
  14. Tor's performance can be improved by excluding those circuits that
  15. have long buildtimes (and by extension, high latency). For those Tor
  16. users who require better performance and have lower requirements for
  17. anonymity, this would be a very useful option to have.
  18. Implementation
  19. Storing Build Times
  20. Circuit build times will be stored in the circular array
  21. 'circuit_build_times' consisting of uint16_t elements as milliseconds.
  22. The total size of this array will be based on the number of circuits
  23. it takes to converge on a good fit of the long term distribution of
  24. the circuit builds for a fixed link. We do not want this value to be
  25. too large, because it will make it difficult for clients to adapt to
  26. moving between different links.
  27. From our initial observations, this value appears to be on the order
  28. of 1000, but will be configurable in a #define NCIRCUITS_TO_OBSERVE.
  29. Long Term Storage
  30. The long-term storage representation will be implemented by storing a
  31. histogram with BUILDTIME_BIN_WIDTH millisecond buckets (default 50) when
  32. writing out the statistics to disk. The format of this histogram on disk
  33. is yet to be finalized, but it will likely be of the format
  34. 'CircuitBuildTime <bin> <count>'.
  35. Example:
  36. CircuitBuildTimeBin 1 100
  37. CircuitBuildTimeBin 2 50
  38. ...
  39. Reading the histogram in will entail multiplying each bin by the
  40. BUILDTIME_BIN_WIDTH and then inserting <count> values into the
  41. circuit_build_times array each with the value of
  42. <bin>*BUILDTIME_BIN_WIDTH.
  43. Learning the CircuitBuildTimeout
  44. Based on studies of build times, we found that the distribution of
  45. circuit buildtimes appears to be a Pareto distribution.
  46. We will calculate the parameters for a Pareto distribution
  47. fitting the data using the estimators at
  48. http://en.wikipedia.org/wiki/Pareto_distribution#Parameter_estimation.
  49. The timeout itself will be calculated by solving the CDF for the
  50. a percentile cutoff BUILDTIME_PERCENT_CUTOFF. This value
  51. represents the percentage of paths the Tor client will accept out of
  52. the total number of paths. We have not yet determined a good
  53. cutoff for this mathematically, but 85% seems a good choice for now.
  54. From http://en.wikipedia.org/wiki/Pareto_distribution#Definition,
  55. the calculation we need is pow(BUILDTIME_PERCENT_CUTOFF/100.0, k)/Xm.
  56. When to Begin Calculation
  57. The number of circuits to observe (NCIRCUITS_TO_CUTOFF) before
  58. changing the CircuitBuildTimeout will be tunable via a #define. From
  59. our measurements, a good value for NCIRCUITS_TO_CUTOFF appears to be
  60. on the order of 100.
  61. Dealing with Timeouts
  62. Timeouts should be counted as the expectation of the region of
  63. of the Pareto distribution beyond the cutoff. The proposal will
  64. be updated with this value soon.
  65. Also, in the event of network failure, the observation mechanism
  66. should stop collecting timeout data.
  67. Circuits that timeout will be destroyed, as this indicates one
  68. or more of their respective nodes are currently overloaded.
  69. Client Hints
  70. Some research still needs to be done to provide initial values
  71. for CircuitBuildTimeout based on values learned from modem
  72. users, DSL users, Cable Modem users, and dedicated links. A
  73. radiobutton in Vidalia should eventually be provided that
  74. sets CircuitBuildTimeout to one of these values and also
  75. provide the option of purging all learned data, should any exist.
  76. These values can either be published in the directory, or
  77. shipped hardcoded for a particular Tor version.
  78. Issues
  79. Impact on anonymity
  80. Since this follows a Pareto distribution, large reductions on the
  81. timeout can be achieved without cutting off a great number of the
  82. total paths. This will eliminate a great deal of the performance
  83. variation of Tor usage.