Title: Computing Bandwidth Adjustments
Filename: 161-computing-bandwidth-adjustments.txt
Author: Mike Perry
Created: 12-May-2009
Target: 0.2.2.x
Status: Open

1. Motivation

  There is high variance in the performance of the Tor network. Despite
  our efforts to balance load evenly across the Tor nodes, some nodes are
  significantly slower and more overloaded than others.

  Proposal 160 describes how we can augment the directory authorities to
  vote on measured bandwidths for routers. This proposal describes what
  goes into the measuring process.

2. Measurement Selection

  The general idea is to determine a load factor representing the ratio
  of the capacity of measured nodes to the rest of the network. This load
  factor could be computed from three potentially relevant statistics:
  circuit failure rates, circuit extend times, or stream capacity.

  Circuit failure rates and circuit extend times appear to be
  non-linearly proportional to node load. We've observed that the same
  nodes, when scanned at US nighttime hours (when load is presumably
  lower), exhibit almost no circuit failures and significantly faster
  extend times than when scanned during the day.

  Stream capacity, however, is much more uniform, even during US
  nighttime hours. Moreover, it is a more intuitive representation of
  node capacity, and it is also less dependent upon distance and latency
  if amortized over large stream fetches.

3. Average Stream Bandwidth Calculation

  The average stream bandwidths are obtained by dividing the network into
  slices of 50 nodes each, grouped according to advertised node bandwidth.
  Two-hop circuits are built using nodes from the same slice, and a large
  file is downloaded via these circuits. The file sizes are set based
  on node percentile rank as follows:

    0-10:   4M
    10-20:  2M
    20-30:  1M
    30-50:  512k
    50-75:  256k
    75-100: 128k

  These sizes are based on measurements performed during test scans.

  This process is repeated until each node has been chosen to participate
  in at least 5 circuits.
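
  For illustration, here is a minimal Python sketch of the percentile-to-
  file-size table above. The function and constant names are invented for
  this example; the proposal itself only specifies the ranges and sizes.

    # Map a node's percentile rank (as used in the table above, with the
    # 0-10 bucket receiving the largest files) to a download size in bytes.
    FILE_SIZES = [
        (10,  4 * 1024 * 1024),   # 0-10:   4M
        (20,  2 * 1024 * 1024),   # 10-20:  2M
        (30,  1 * 1024 * 1024),   # 20-30:  1M
        (50,  512 * 1024),        # 30-50:  512k
        (75,  256 * 1024),        # 50-75:  256k
        (100, 128 * 1024),        # 75-100: 128k
    ]

    def slice_file_size(percentile):
        """Return the test file size (bytes) for a node at this percentile."""
        for upper_bound, size in FILE_SIZES:
            if percentile <= upper_bound:
                return size
        return FILE_SIZES[-1][1]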

4. Ratio Calculation

  The ratios are calculated by dividing each measured value by the
  network-wide average.

5. Ratio Filtering

  After the base ratios are calculated, a second pass is performed
  to remove any streams that used nodes with ratios less than X=0.5 from
  the results of other nodes. In addition, all outlying streams with
  capacity more than one standard deviation below a node's average are
  also removed.

  The final ratio result will be the unfiltered ratio if it is closer
  to 1.0 than the filtered ratio; otherwise it will be the filtered
  ratio.

6. Pseudocode for Ratio Calculation Algorithm

  Here is the complete pseudocode for the ratio algorithm:

    Slices = {S | S is 50 nodes of similar consensus capacity}

    for S in Slices:
      while exists node N in S with circ_chosen(N) < 7:
        fetch_slice_file(build_2hop_circuit(N, (exit in S)))
      for N in S:
        BW_measured(N) = MEAN(b | b is bandwidth of a stream through N)
        BW_stddev(N)   = STDDEV(b | b is bandwidth of a stream through N)
      BW_avg(S) = MEAN(b | b = BW_measured(N) for all N in S)
      Normal_Routers(S) = {N | BW_measured(N)/BW_avg(S) > 0.5}
      for N in S:
        Normal_Streams(N) =
          {stream via N | all other nodes in stream are in Normal_Routers(S)
                          and bandwidth > BW_measured(N) - BW_stddev(N)}
        BW_Norm_measured(N) = MEAN(b | b is a bandwidth of Normal_Streams(N))

    BW_net_avg(Slices)      = MEAN(BW_measured(N) for all N in Slices)
    BW_Norm_net_avg(Slices) = MEAN(BW_Norm_measured(N) for all N in Slices)

    for N in all Slices:
      BW_net_ratio(N)      = BW_measured(N)/BW_net_avg(Slices)
      BW_Norm_net_ratio(N) = BW_Norm_measured(N)/BW_Norm_net_avg(Slices)

      ResultRatio(N) = ClosestToOne(BW_net_ratio(N), BW_Norm_net_ratio(N))
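
  As a rough, non-normative illustration, here is a compact Python sketch of
  the per-slice filtering and ratio computation above. It assumes the stream
  measurements have already been collected (circuit building and fetching
  are out of scope), and all identifiers (Stream, compute_ratios, etc.) are
  invented for this example rather than taken from any actual scanner code.

    from collections import namedtuple
    from statistics import mean, pstdev

    # A measured stream: the relay fingerprints it used and its bandwidth.
    Stream = namedtuple("Stream", ["nodes", "bandwidth"])

    def compute_ratios(slices):
        """slices: list of dicts mapping node id -> list of Stream objects.
        Returns {node id: ratio closest to 1.0}, per sections 4-6."""
        bw_measured = {}       # unfiltered per-node stream average
        bw_norm_measured = {}  # filtered ("normal stream") average

        for streams_by_node in slices:
            for node, streams in streams_by_node.items():
                bw_measured[node] = mean(s.bandwidth for s in streams)

            slice_avg = mean(bw_measured[n] for n in streams_by_node)
            # Nodes whose in-slice ratio exceeds 0.5 are "normal" (section 5).
            normal = {n for n in streams_by_node
                      if bw_measured[n] / slice_avg > 0.5}

            for node, streams in streams_by_node.items():
                dev = pstdev([s.bandwidth for s in streams])
                # Keep streams whose other nodes are all normal and whose
                # bandwidth is not more than one stddev below the node's
                # own average.
                kept = [s.bandwidth for s in streams
                        if all(m in normal for m in s.nodes if m != node)
                        and s.bandwidth > bw_measured[node] - dev]
                # Fall back to the unfiltered average if nothing survives.
                bw_norm_measured[node] = (mean(kept) if kept
                                          else bw_measured[node])

        net_avg = mean(bw_measured.values())
        norm_net_avg = mean(bw_norm_measured.values())

        results = {}
        for node in bw_measured:
            unfiltered = bw_measured[node] / net_avg
            filtered = bw_norm_measured[node] / norm_net_avg
            # ResultRatio: whichever ratio is closest to 1.0.
            results[node] = min((unfiltered, filtered),
                                key=lambda r: abs(r - 1.0))
        return results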

7. Security implications

  The ratio filtering will deal with cases of sabotage by dropping very
  slow outliers from a node's stream average, as well as by dropping
  streams that used very slow nodes from the calculations of other nodes.

  This scheme will not address nodes that try to game the system by
  providing better service to scanners. The scanners can be detected
  at the entry by IP address, and at the exit by the destination fetch
  IP.

  Measures can be taken to obfuscate and separate the scanners' source
  IP address from the directory authority IP address. For instance,
  scans can happen offsite and the results can be rsynced into the
  authorities. The destination server IP can also change.

  Neither of these methods is foolproof, but such nodes can already
  lie about their bandwidth to attract more traffic, so this solution
  does not set us back any in that regard.

8. Parallelization

  Because each slice takes as long as 6 hours to complete, we will want
  to parallelize as much as possible. This will be done by concurrently
  running multiple scanners from each authority to deal with different
  segments of the network. Each scanner piece will continually loop
  over a portion of the network, outputting files of the form:

    node_id=<idhex> SP strm_bw=<BW_measured(N)> SP
        filt_bw=<BW_Norm_measured(N)> SP ns_bw=<CurrentConsensusBw(N)> NL

  The most recent file from each scanner will be periodically gathered
  by another script that uses them to produce network-wide averages
  and calculate ratios as per the algorithm in section 6. Because nodes
  may shift in capacity, they may appear in more than one slice and/or
  appear more than once in the file set. The line that yields a ratio
  closest to 1.0 will be chosen in this case.
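
  As a sketch of that aggregation step, the following Python fragment parses
  lines of the above form and, when a node appears in more than one file,
  keeps the entry whose ratio is closest to 1.0. The parsing details and
  helper names are assumptions made for illustration.

    def parse_scanner_line(line):
        """Parse 'node_id=<idhex> strm_bw=... filt_bw=... ns_bw=...'."""
        fields = dict(kv.split("=", 1) for kv in line.split())
        return {
            "node_id": fields["node_id"],
            "strm_bw": float(fields["strm_bw"]),
            "filt_bw": float(fields["filt_bw"]),
            "ns_bw": float(fields["ns_bw"]),
        }

    def choose_best_lines(lines, strm_avg, filt_avg):
        """Keep, per node, the measurement whose ratio is closest to 1.0.

        strm_avg and filt_avg are the network-wide averages of strm_bw
        and filt_bw over all collected lines (see section 6)."""
        best = {}
        for entry in map(parse_scanner_line, lines):
            ratio = min(entry["strm_bw"] / strm_avg,
                        entry["filt_bw"] / filt_avg,
                        key=lambda r: abs(r - 1.0))
            node = entry["node_id"]
            if (node not in best
                    or abs(ratio - 1.0) < abs(best[node][0] - 1.0)):
                best[node] = (ratio, entry)
        return best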

9. Integration with Proposal 160

  The final results will be produced for the voting mechanism
  described in Proposal 160 by multiplying the derived ratio by
  the average published consensus bandwidth during the course of the
  scan, and taking the weighted average with the previous consensus
  bandwidth:

    Bw_new = Round((Bw_current * Alpha + Bw_scan_avg * Bw_ratio)/(Alpha + 1))

  The Alpha parameter is a smoothing parameter intended to prevent
  rapid oscillation between loaded and unloaded conditions. It is
  currently fixed at 0.333.

  The Round() step consists of rounding to the 3 most significant figures
  in base 10, and then rounding that result to the nearest 1000, with
  a minimum value of 1000.
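
  As a minimal sketch of this update step, assuming Bw_current, Bw_scan_avg,
  and Bw_ratio are already known (the function names below are invented for
  this example):

    from math import floor, log10

    ALPHA = 0.333  # smoothing parameter described above

    def round_bw(value):
        """Round to 3 significant figures in base 10, then to the nearest
        1000, with a minimum value of 1000."""
        if value < 1000:
            return 1000
        digits = floor(log10(value))        # position of the leading digit
        sig3 = round(value, -(digits - 2))  # keep 3 significant figures
        return max(1000, int(round(sig3 / 1000.0)) * 1000)

    def new_bandwidth(bw_current, bw_scan_avg, bw_ratio):
        """Weighted average of the previous consensus bandwidth and the
        scan-derived bandwidth, per the Bw_new formula above."""
        return round_bw((bw_current * ALPHA
                         + bw_scan_avg * bw_ratio) / (ALPHA + 1))

  For example, with Bw_current = 400000, Bw_scan_avg = 500000, and
  Bw_ratio = 1.2, the weighted value is roughly 550037, which Round()
  maps to 550000.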

  This will produce a new bandwidth value that will be output into a
  file consisting of lines of the form:

    node_id=<idhex> SP bw=<Bw_new> NL

  The first line of the file will contain a timestamp in UNIX time()
  seconds. This will be used by the authority to decide if the
  measured values are too old to use.
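
  A minimal sketch of writing this file, under the assumption that results
  maps node fingerprints to their Bw_new values (the function and variable
  names are illustrative only):

    import time

    def write_measured_file(path, results):
        """Write the UNIX timestamp header followed by one
        'node_id=<idhex> bw=<Bw_new>' line per measured router."""
        with open(path, "w") as out:
            out.write("%d\n" % int(time.time()))  # freshness timestamp
            for idhex, bw_new in sorted(results.items()):
                out.write("node_id=%s bw=%d\n" % (idhex, bw_new))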

  This file can be either copied or rsynced into a directory readable
  by the directory authority.