Title: Computing Bandwidth Adjustments
Filename: 161-computing-bandwidth-adjustments.txt
Author: Mike Perry
Created: 12-May-2009
Target: 0.2.2.x
Status: Open
1. Motivation

  There is high variance in the performance of the Tor network. Despite
  our efforts to balance load evenly across the Tor nodes, some nodes are
  significantly slower and more overloaded than others.

  Proposal 160 describes how we can augment the directory authorities to
  vote on measured bandwidths for routers. This proposal describes what
  goes into the measuring process.
2. Measurement Selection

  The general idea is to determine a load factor representing the ratio
  of the capacity of measured nodes to the rest of the network. This load
  factor could be computed from three potentially relevant statistics:
  circuit failure rates, circuit extend times, or stream capacity.

  Circuit failure rates and circuit extend times appear to grow
  non-linearly with node load. We've observed that the same nodes, when
  scanned during US nighttime hours (when load is presumably lower),
  exhibit almost no circuit failures and significantly faster extend
  times than when scanned during the day.

  Stream capacity, however, is much more uniform, even during US
  nighttime hours. Moreover, it is a more intuitive representation of
  node capacity, and it is also less dependent upon distance and latency
  when amortized over large stream fetches.
3. Average Stream Bandwidth Calculation

  The average stream bandwidths are obtained by dividing the network into
  slices of 50 nodes each, grouped according to advertised node bandwidth.
  Two-hop circuits are built using nodes from the same slice, and a large
  file is downloaded via these circuits. The file sizes are set based
  on node percentile rank as follows:

    0-10:   2M
    10-20:  1M
    20-30:  512k
    30-50:  256k
    50-100: 128k

  These sizes are based on measurements performed during test scans.

  This process is repeated until each node has been chosen to participate
  in at least 5 circuits.
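  As an illustration, the percentile-to-file-size table above can be
  expressed as a small lookup helper (the function name and the table
  literal are ours, not part of the proposal):

```python
# Map a node's bandwidth percentile rank to the test file size from the
# table above (sizes in bytes). Illustrative helper only.
FILE_SIZE_BY_PERCENTILE = [
    (10, 2 * 1024 * 1024),   # 0-10:   2M
    (20, 1024 * 1024),       # 10-20:  1M
    (30, 512 * 1024),        # 20-30:  512k
    (50, 256 * 1024),        # 30-50:  256k
    (100, 128 * 1024),       # 50-100: 128k
]

def slice_file_size(percentile):
    """Return the download size for a node at the given percentile rank."""
    for upper_bound, size in FILE_SIZE_BY_PERCENTILE:
        if percentile < upper_bound:
            return size
    return FILE_SIZE_BY_PERCENTILE[-1][1]
```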
4. Ratio Calculation

  The ratios are calculated by dividing each measured value by the
  network-wide average.
5. Ratio Filtering

  After the base ratios are calculated, a second pass is performed
  to remove any streams that used nodes with ratios less than X=0.5
  from the results of other nodes. In addition, all outlying streams
  with capacity more than one standard deviation below a node's average
  are also removed.

  The final ratio result will be the greater of the unfiltered ratio
  and the filtered ratio.
6. Pseudocode for Ratio Calculation Algorithm

  Here is the complete pseudocode for the ratio algorithm:

    Slices = {S | S is 50 nodes of similar consensus capacity}
    for S in Slices:
      while exists node N in S with circ_chosen(N) < 7:
        fetch_slice_file(build_2hop_circuit(N, (exit in S)))
      for N in S:
        BW_measured(N) = MEAN(b | b is bandwidth of a stream through N)
        Bw_stddev(N) = STDDEV(b | b is bandwidth of a stream through N)
      Bw_avg(S) = MEAN(b | b = BW_measured(N) for all N in S)
      for N in S:
        Normal_Streams(N) = {stream via N | bandwidth >= BW_measured(N)}
        BW_Norm_measured(N) = MEAN(b | b is a bandwidth of Normal_Streams(N))

    Bw_net_avg(Slices) = MEAN(BW_measured(N) for all N in Slices)
    Bw_Norm_net_avg(Slices) = MEAN(BW_Norm_measured(N) for all N in Slices)

    for N in all Slices:
      Bw_net_ratio(N) = Bw_measured(N)/Bw_net_avg(Slices)
      Bw_Norm_net_ratio(N) = BW_Norm_measured(N)/Bw_Norm_net_avg(Slices)
      ResultRatio(N) = MAX(Bw_net_ratio(N), Bw_Norm_net_ratio(N))
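  The ratio portion of the pseudocode (everything after measurement is
  complete) could be rendered in Python roughly as follows; this is a
  sketch assuming all per-node stream bandwidths have already been
  collected into one mapping, with slices merged:

```python
from statistics import mean

def compute_ratios(stream_bw):
    """Sketch of the ratio algorithm above.

    stream_bw maps each node id to the list of observed stream
    bandwidths for circuits through that node. Returns a mapping of
    node id -> ResultRatio(N).
    """
    # BW_measured(N): mean bandwidth over all streams through N.
    bw_measured = {n: mean(bws) for n, bws in stream_bw.items()}
    # BW_Norm_measured(N): mean over streams at or above the node's mean.
    bw_filtered = {
        n: mean(b for b in bws if b >= bw_measured[n])
        for n, bws in stream_bw.items()
    }
    net_avg = mean(bw_measured.values())
    filt_net_avg = mean(bw_filtered.values())
    # ResultRatio(N) is the greater of the two ratios.
    return {
        n: max(bw_measured[n] / net_avg, bw_filtered[n] / filt_net_avg)
        for n in stream_bw
    }
```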
7. Security implications

  The ratio filtering will deal with cases of sabotage by dropping
  very slow outliers from stream average calculations, as well as by
  dropping streams that used very slow nodes from the calculations
  of other nodes.

  This scheme will not address nodes that try to game the system by
  providing better service to scanners. The scanners can be detected
  at the entry by IP address, and at the exit by the destination fetch
  IP.

  Measures can be taken to obfuscate and separate the scanners' source
  IP address from the directory authority IP address. For instance,
  scans can happen offsite and the results can be rsynced into the
  authorities. The destination server IP can also change.

  Neither of these methods is foolproof, but such nodes can already
  lie about their bandwidth to attract more traffic, so this solution
  does not set us back any in that regard.
8. Parallelization

  Because each slice takes as long as 6 hours to complete, we will want
  to parallelize as much as possible. This will be done by concurrently
  running multiple scanners from each authority to deal with different
  segments of the network. Each scanner piece will continually loop
  over a portion of the network, outputting files of the form:

    node_id=<idhex> SP strm_bw=<BW_measured(N)> SP
    filt_bw=<BW_Norm_measured(N)> SP ns_bw=<CurrentConsensusBw(N)> NL

  The most recent file from each scanner will be periodically gathered
  by another script that uses them to produce network-wide averages
  and calculate ratios as per the algorithm in section 6. Because nodes
  may shift in capacity, they may appear in more than one slice and/or
  appear more than once in the file set. The most recently measured
  line will be chosen in this case.
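  A gathering script along these lines might parse the scanner output
  as follows (a minimal sketch; the function name and the assumption
  that lines arrive ordered oldest-to-newest are ours):

```python
def parse_scanner_lines(lines):
    """Parse 'node_id=<idhex> strm_bw=... filt_bw=... ns_bw=...' lines.

    `lines` should be ordered oldest to newest; when a node appears
    more than once, the most recently measured line wins, per the
    proposal.
    """
    nodes = {}
    for line in lines:
        fields = dict(kv.split("=", 1) for kv in line.split())
        node_id = fields.pop("node_id")
        # Later (more recent) entries overwrite earlier ones.
        nodes[node_id] = {k: int(v) for k, v in fields.items()}
    return nodes
```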
9. Integration with Proposal 160

  The final results will be produced for the voting mechanism
  described in Proposal 160 by multiplying the derived ratio by
  the average published consensus bandwidth during the course of the
  scan, and taking the weighted average with the previous consensus
  bandwidth:

    Bw_new = Round((Bw_current * Alpha + Bw_scan_avg*Bw_ratio)/(Alpha + 1))

  The Alpha parameter is a smoothing parameter intended to prevent
  rapid oscillation between loaded and unloaded conditions. It is
  currently fixed at 0.333.

  The Round() step consists of rounding to the 3 most significant figures
  in base 10, and then rounding that result to the nearest 1000, with
  a minimum value of 1000.
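  The formula and the Round() step can be sketched in Python as follows
  (our reading of the rounding rules; the function names are
  illustrative):

```python
def round_bw(bw):
    """Round() from the formula above: keep the 3 most significant
    figures in base 10, then round to the nearest 1000, floor 1000."""
    if bw <= 0:
        return 1000
    digits = len(str(int(bw)))
    sig3 = round(bw, 3 - digits)           # 3 significant figures
    return max(1000, int(round(sig3 / 1000.0)) * 1000)

def new_bandwidth(bw_current, bw_scan_avg, bw_ratio, alpha=0.333):
    """Weighted average of the previous consensus bandwidth and the
    measured value, smoothed by Alpha as in the formula above."""
    return round_bw(
        (bw_current * alpha + bw_scan_avg * bw_ratio) / (alpha + 1))
```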
  This will produce a new bandwidth value that will be output into a
  file consisting of lines of the form:

    node_id=<idhex> SP bw=<Bw_new> NL

  The first line of the file will contain a timestamp in UNIX time()
  seconds. This will be used by the authority to decide if the
  measured values are too old to use.

  This file can be either copied or rsynced into a directory readable
  by the directory authority.
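  A writer and a matching freshness check might look like the sketch
  below (file layout per the description above; the staleness threshold
  and function names are arbitrary illustrations, not specified by the
  proposal):

```python
import time

def write_bw_file(path, bandwidths, now=None):
    """Write the measured-bandwidth file: a UNIX timestamp line, then
    one 'node_id=<idhex> bw=<Bw_new>' line per node."""
    now = int(now if now is not None else time.time())
    with open(path, "w") as f:
        f.write("%d\n" % now)
        for node_id, bw in sorted(bandwidths.items()):
            f.write("node_id=%s bw=%d\n" % (node_id, bw))

def is_fresh(path, max_age=4 * 3600, now=None):
    """Check the first-line timestamp so an authority can reject
    measurements that are too old (max_age is an arbitrary choice)."""
    now = now if now is not None else time.time()
    with open(path) as f:
        stamp = int(f.readline())
    return now - stamp <= max_age
```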