|  | @@ -34,6 +34,9 @@ Implementation
 | 
	
		
			
				|  |  |  
 | 
	
		
			
				|  |  |      From our initial observations, this value appears to be on the order 
 | 
	
		
			
				|  |  |      of 1000, but will be configurable in a #define NCIRCUITS_TO_OBSERVE.
 | 
	
		
			
				|  |  | +    The exact value for this #define will be determined by performing
 | 
	
		
			
				|  |  | +    goodness of fit tests using measurments obtained from the shufflebt.py
 | 
	
		
			
				|  |  | +    script from TorFlow.
 | 
	
		
			
				|  |  |   
 | 
	
		
			
				|  |  |    Long Term Storage
 | 
	
		
			
				|  |  |  
 | 
	
	
		
			
				|  | @@ -41,17 +44,31 @@ Implementation
 | 
	
		
			
				|  |  |      histogram with BUILDTIME_BIN_WIDTH millisecond buckets (default 50) when 
 | 
	
		
			
				|  |  |      writing out the statistics to disk. The format of this histogram on disk 
 | 
	
		
			
				|  |  |      is yet to be finalized, but it will likely be of the format 
 | 
	
		
			
				|  |  | -    'CircuitBuildTime <bin> <count>'.
 | 
	
		
			
				|  |  | +    'CircuitBuildTime <bin> <count>', with the total specified as 
 | 
	
		
			
				|  |  | +    'TotalBuildTimes <total>'
 | 
	
		
			
				|  |  |      Example:
 | 
	
		
			
				|  |  |  
 | 
	
		
			
				|  |  | -    CircuitBuildTimeBin 1 100
 | 
	
		
			
				|  |  | -    CircuitBuildTimeBin 2 50
 | 
	
		
			
				|  |  | +    TotalBuildTimes 100
 | 
	
		
			
				|  |  | +    CircuitBuildTimeBin 1 50
 | 
	
		
			
				|  |  | +    CircuitBuildTimeBin 2 25
 | 
	
		
			
				|  |  | +    CircuitBuildTimeBin 3 13
 | 
	
		
			
				|  |  |      ...
 | 
	
		
			
				|  |  |  
 | 
	
		
			
				|  |  |      Reading the histogram in will entail multiplying each bin by the 
 | 
	
		
			
				|  |  |      BUILDTIME_BIN_WIDTH and then inserting <count> values into the 
 | 
	
		
			
				|  |  |      circuit_build_times array each with the value of
 | 
	
		
			
				|  |  | -    <bin>*BUILDTIME_BIN_WIDTH.
 | 
	
		
			
				|  |  | +    <bin>*BUILDTIME_BIN_WIDTH. In order to evenly distribute the 
 | 
	
		
			
				|  |  | +    values in the circular array, a form of index skipping must
 | 
	
		
			
				|  |  | +    be employed. Values from bin #N with bin count C and total T
 | 
	
		
			
				|  |  | +    will occupy indexes specified by N+((T/C)*k)-1, where k is the
 | 
	
		
			
				|  |  | +    set of integers ranging from 0 to C-1.
 | 
	
		
			
				|  |  | +
 | 
	
		
			
				|  |  | +    For example, this would mean that the values from bin 1 would
 | 
	
		
			
				|  |  | +    occupy indexes 1+(100/50)*k-1, or 0, 2, 4, 6, 8, 10 and so on.
 | 
	
		
			
				|  |  | +    The values for bin 2 would occupy positions 1, 5, 9, 13. Collisions
 | 
	
		
			
				|  |  | +    will be inserted at the first empty position in the array greater 
 | 
	
		
			
				|  |  | +    than the selected index (which may requiring looping around the 
 | 
	
		
			
				|  |  | +    array back to index 0).
 | 
	
		
			
				|  |  |  
 | 
	
		
			
				|  |  |    Learning the CircuitBuildTimeout
 | 
	
		
			
				|  |  |  
 | 
	
	
		
			
				|  | @@ -71,6 +88,27 @@ Implementation
 | 
	
		
			
				|  |  |      From http://en.wikipedia.org/wiki/Pareto_distribution#Definition,
 | 
	
		
			
				|  |  |      the calculation we need is pow(BUILDTIME_PERCENT_CUTOFF/100.0, k)/Xm. 
 | 
	
		
			
				|  |  |  
 | 
	
		
			
				|  |  | +  Testing
 | 
	
		
			
				|  |  | +
 | 
	
		
			
				|  |  | +    After circuit build times, storage, and learning are implemented,
 | 
	
		
			
				|  |  | +    the resulting histogram should be checked for consistency by
 | 
	
		
			
				|  |  | +    verifying it persists across successive Tor invocations where 
 | 
	
		
			
				|  |  | +    no circuits are built. In addition, we can also use the existing
 | 
	
		
			
				|  |  | +    buildtime scripts to record build times, and verify that the histogram 
 | 
	
		
			
				|  |  | +    the python produces matches that which is output to the state file in Tor,
 | 
	
		
			
				|  |  | +    and verify that the Pareto parameters and cutoff points also match.
 | 
	
		
			
				|  |  | +  
 | 
	
		
			
				|  |  | +  Soft timeout vs Hard Timeout
 | 
	
		
			
				|  |  | +   
 | 
	
		
			
				|  |  | +    At some point, it may be desirable to change the cutoff from a 
 | 
	
		
			
				|  |  | +    single hard cutoff that destroys the circuit to a soft cutoff and
 | 
	
		
			
				|  |  | +    a hard cutoff, where the soft cutoff merely triggers the building
 | 
	
		
			
				|  |  | +    of a new circuit, and the hard cutoff triggers destruction of the 
 | 
	
		
			
				|  |  | +    circuit.
 | 
	
		
			
				|  |  | +
 | 
	
		
			
				|  |  | +    Good values for hard and soft cutoffs seem to be 85% and 65% 
 | 
	
		
			
				|  |  | +    respectively, but we should eventually justify this with observation.
 | 
	
		
			
				|  |  | +
 | 
	
		
			
				|  |  |    When to Begin Calculation
 | 
	
		
			
				|  |  |  
 | 
	
		
			
				|  |  |      The number of circuits to observe (NCIRCUITS_TO_CUTOFF) before 
 | 
	
	
		
			
				|  | @@ -87,9 +125,6 @@ Implementation
 | 
	
		
			
				|  |  |      Also, in the event of network failure, the observation mechanism 
 | 
	
		
			
				|  |  |      should stop collecting timeout data.
 | 
	
		
			
				|  |  |  
 | 
	
		
			
				|  |  | -    Circuits that timeout will be destroyed, as this indicates one
 | 
	
		
			
				|  |  | -    or more of their respective nodes are currently overloaded.
 | 
	
		
			
				|  |  | -
 | 
	
		
			
				|  |  |    Client Hints
 | 
	
		
			
				|  |  |  
 | 
	
		
			
				|  |  |      Some research still needs to be done to provide initial values
 |