@@ -0,0 +1,194 @@
+Filename: xxx-microdescriptors.txt
+Title: Clients download consensus + microdescriptors
+Version: $Revision$
+Last-Modified: $Date$
+Author: Roger Dingledine
+Created: 17-Jan-2009
+Status: Open
+
+1. Overview
+
+  This proposal replaces section 3.2 of proposal 141, called "Fetching
+  descriptors on demand". Rather than modifying the circuit-building
+  protocol to fetch a server descriptor inline at each circuit extend,
+  we instead put all of the information that clients need either into
+  the consensus itself, or into a new set of data about each relay
+  called a microdescriptor. The goal is that descriptor elements that
+  are small and frequently changing should go in the consensus itself,
+  and descriptor elements that are small and relatively static should go
+  in the microdescriptor. If we ever end up with descriptor elements
+  that aren't small but that clients still need to know, we'll need to
+  resume considering some design like the one in proposal 141.
+
+2. Motivation
+
+  See
+  http://archives.seul.org/or/dev/Nov-2008/msg00000.html and
+  http://archives.seul.org/or/dev/Nov-2008/msg00001.html and especially
+  http://archives.seul.org/or/dev/Nov-2008/msg00007.html
+  for a discussion of the options and why this is currently the best
+  approach.
+
+3. Design
+
+  There are three pieces to the proposal. First, authorities will list
+  in their votes (and thus in the consensus) what relay elements are
+  included in the microdescriptor, and also list the expected hash of
+  the microdescriptor for each relay. Second, directory mirrors will
+  serve microdescriptors. Third, clients will ask for and cache them.
+
+3.1. Consensus changes
+
+  V3 votes should include a new line:
+    microdescriptor-elements bar baz foo
+
+  We also need to include the hash of each expected microdescriptor in
+  the routerstatus section. I suggest a new "m" line for each stanza,
+  with the base64 of the hash of the elements that the authority voted
+  for above.
+
+  The consensus microdescriptor-elements and "m" lines are then computed
+  as described in Section 3.1.2 below.
+
+  I believe that means we need a new consensus-method "6" that knows
+  how to compute the microdescriptor-elements and add "m" lines.
+
+3.1.1. Descriptor elements to include for now
+
+  To start, the element list that authorities suggest should be
+    family onion-key
+
+  (Note that the or-dev posts above only mention onion-key, but if
+  we don't also include family then clients will never learn it. It
+  seemed like it should be relatively static, so putting it in the
+  microdescriptor is smarter than trying to fit it into the consensus.)
+
+3.1.2. Computing consensus for microdescriptor-elements and "m" lines
+
+  One approach is for the consensus microdescriptor-elements line to
+  include all elements listed by a majority of authorities, sorted. The
+  problem here is that it will no longer be deterministic what the correct
+  hash for the "m" line should be. We could imagine telling the authority
+  to go look in its descriptor and produce the right hash itself, but
+  we don't want consensus calculation to be based on external data like
+  that. (Plus, the authority may not have the descriptor that everybody
+  else voted to use.)
+
+  The better approach is to take the exact set that has the most votes
+  (breaking ties by the set that has the most elements, and breaking
+  ties after that by whichever is alphabetically first). That will
+  increase the odds that we actually get a microdescriptor hash that
+  is both a) for the descriptor we're putting in the consensus, and b)
+  over the elements that we're declaring it should be for.
+
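The tie-breaking rule above can be sketched in Python as follows. This is
only an illustration, not the authorities' actual code: the function name
`choose_element_set` and the representation of each vote as a list of
element keywords are assumptions, and "alphabetically first" is read here
as comparing the sorted, space-joined element lines.

```python
from collections import Counter


def choose_element_set(votes):
    """Pick the consensus microdescriptor-elements set from authority votes.

    `votes` holds one entry per authority; each entry is that authority's
    list of element keywords (a hypothetical representation).
    Returns the winning set as a sorted tuple, or None if nobody voted.
    """
    # Count exact sets: two votes agree only if they list the same elements.
    counts = Counter(tuple(sorted(set(vote))) for vote in votes)
    if not counts:
        return None
    # Most votes wins; ties go to the larger set, then to the line that
    # sorts first alphabetically (an assumed reading of the tie-break).
    return min(counts, key=lambda s: (-counts[s], -len(s), " ".join(s)))


votes = [["family", "onion-key"], ["onion-key", "family"], ["onion-key"]]
print(choose_element_set(votes))
```

Counting exact sets, rather than per-element majorities, is what keeps the
result equal to a set that some authority actually hashed over.
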
+  Then the "m" line for a given relay is the one that gets the most votes
+  from authorities that a) voted for the microdescriptor-elements line
+  we're using, and b) voted for the descriptor we're using.
+
+  (If there's a tie, use the smaller hash. But really, if there are
+  multiple such votes and they differ about a microdescriptor, we caught
+  one of them lying or being buggy. We should log it to track down why.)
+
+  If there are no such votes, then we leave out the "m" line for that
+  relay. That means clients should avoid it for this time period. (As
+  an extension it could instead mean that clients should fetch the
+  descriptor and figure out its microdescriptor themselves. But let's
+  not get ahead of ourselves.)
+
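A minimal sketch of the per-relay "m" line selection described above,
including the tie and no-votes cases. The vote layout (a dict with
"elements", "descriptor" and "m" keys) and the function name are
hypothetical, and "smaller hash" is taken to mean the smaller string,
which the text does not specify.

```python
from collections import Counter


def choose_m_line(votes, chosen_elements, chosen_descriptor):
    """Return (m_hash, disagreement) for one relay.

    Each vote is a dict with hypothetical keys: "elements" (the
    microdescriptor-elements list the authority voted for), "descriptor"
    (the descriptor digest it voted for) and "m" (its base64 hash).
    m_hash is None when the "m" line should be left out.
    """
    wanted = sorted(set(chosen_elements))
    # Only count authorities that agree on both the element set and the
    # descriptor that the consensus is using.
    eligible = [
        v["m"]
        for v in votes
        if sorted(set(v["elements"])) == wanted
        and v["descriptor"] == chosen_descriptor
    ]
    if not eligible:
        return None, False
    counts = Counter(eligible)
    # Most votes wins; on a tie take the smaller hash, compared here as
    # strings (an assumption about what "smaller" means).
    best = min(counts, key=lambda h: (-counts[h], h))
    # More than one distinct hash means some authority is lying or buggy,
    # so the caller should log it.
    return best, len(counts) > 1
```

The second return value lets the caller log a disagreement without
changing which hash ends up in the consensus.
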
+  It would be nice to have a more foolproof way to agree on what
+  microdescriptor hash each authority should vote for, so we can avoid
+  missing "m" lines. Just switching to a new consensus-method each time
+  we change the set of microdescriptor-elements won't help though, since
+  each authority will still have to decide what hash to vote for before
+  knowing what consensus-method will be used.
+
+  Here's one way we could do that. Each vote / consensus includes both
+  the microdescriptor-elements that were used to compute the hashes,
+  and also a preferred-microdescriptor-elements set. If an authority
+  has a consensus from the previous period, then it should use the
+  consensus preferred-microdescriptor-elements when computing its votes
+  for microdescriptor-elements and the appropriate hashes in the upcoming
+  period. (If it has no previous consensus, then it just puts down its
+  own preferences in both lines.)
+
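One possible reading of the preferred-microdescriptor-elements idea, as a
small Python sketch. The function name `plan_vote` and the dict it returns
are hypothetical; the sketch also assumes that an authority always lists
its own preference on the preferred line, which the text implies but does
not state outright.

```python
def plan_vote(own_preference, previous_consensus_preferred=None):
    """Decide which element sets an authority puts in its next vote.

    `own_preference` is the element list this authority would like to see.
    `previous_consensus_preferred` is the preferred-microdescriptor-elements
    list from the previous consensus, or None if there is no such consensus.
    """
    own = sorted(set(own_preference))
    if previous_consensus_preferred is None:
        # No previous consensus: use our own preferences in both lines.
        used = own
    else:
        # Hash over what the last consensus preferred, so that all
        # authorities compute "m" hashes over the same elements.
        used = sorted(set(previous_consensus_preferred))
    return {
        "microdescriptor-elements": used,
        "preferred-microdescriptor-elements": own,
    }
```

Under this reading, a change in preferences takes one voting period to
reach the hashes, which is what lets authorities agree in advance.
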
+3.2. Directory mirrors serve microdescriptors
+
+  Directory mirrors should then read the microdescriptor-elements line
+  from the consensus, and learn how to answer requests.
+
+  The microdescriptors with hashes <D1>,<D2>,<D3> should be available at:
+    http://<hostname>/tor/micro/d/<D1>+<D2>+<D3>.z
+
+  All the microdescriptors from the current consensus should also be
+  available at:
+    http://<hostname>/tor/micro/all.z
+  so a client that's bootstrapping doesn't need to send a 70KB URL just
+  to name every microdescriptor it's looking for.
+
+  The format of a microdescriptor is the header line
+    "microdescriptor 1"
+  followed by each element (keyword and body), alphabetically. There's
+  no need to mention what hash it is, since you can hash the elements
+  to learn this.
+
+  (Do we need a footer line to show that it's over, or is the next
+  microdescriptor line or EOF enough of a hint? A footer line wouldn't
+  hurt much. Also, no fair voting for the microdescriptor-element
+  "microdescriptor".)
+
+  The hash of the microdescriptor is simply the hash of the concatenated
+  elements -- not counting the header line or hypothetical footer line.
+  Is this smart?
+
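To make the format and hash rule concrete, here is a rough Python sketch.
The proposal does not name a hash function, so SHA-256 below is purely an
assumption, as are the function names and the choice to pass each element
as its full text (keyword line plus body, newline-terminated).

```python
import base64
import hashlib

HEADER = "microdescriptor 1\n"


def build_microdescriptor(elements):
    """Return (full_text, hashed_body) for one relay.

    `elements` maps each element keyword to its full text, i.e. the
    keyword line plus any body, ending in a newline (assumed layout).
    """
    # Elements appear in alphabetical order of their keyword.
    body = "".join(elements[keyword] for keyword in sorted(elements))
    return HEADER + body, body


def microdescriptor_hash(body, digest=hashlib.sha256):
    """Base64 of the hash over the concatenated elements only.

    The header line is deliberately not covered. The digest algorithm is
    an assumption; the proposal does not specify one.
    """
    return base64.b64encode(digest(body.encode("ascii")).digest()).decode("ascii")


elements = {
    "onion-key": "onion-key\nKEYDATA\n",
    "family": "family relayA relayB\n",
}
text, body = build_microdescriptor(elements)
print(text, end="")
print(microdescriptor_hash(body))
```

Because the header is outside the hashed body, changing the "1" in the
header would not change any "m" line.
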
+  Note that I put a "1" up there in the header line. It isn't part
+  of what's hashed, though. Is there a way to put in a version that's
+  more useful?
+
+  Directory mirrors should check to make sure that the microdescriptors
+  they're about to serve match the right hashes (either the hashes from
+  the fetch URL or the hashes from the consensus, respectively).
+
+  We will probably want to consider some sort of smart data structure to
+  be able to quickly convert microdescriptor hashes into the appropriate
+  microdescriptor. Clients will want this anyway when they load their
+  microdescriptor cache and want to match it up with the consensus to
+  see what's missing.
+
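The check-before-serving rule and the hash lookup could be combined in
something as simple as a dictionary keyed by hash. The sketch below is one
such arrangement, not a proposed design: `MicrodescStore` and `hash_b64`
are hypothetical names, and SHA-256 again stands in for the unspecified
hash function.

```python
import base64
import hashlib


def hash_b64(body):
    """Base64 digest of a microdescriptor body (SHA-256 is an assumption)."""
    return base64.b64encode(hashlib.sha256(body.encode("ascii")).digest()).decode("ascii")


class MicrodescStore:
    """Hash-indexed store of microdescriptor bodies (hypothetical helper)."""

    def __init__(self):
        self._by_hash = {}

    def add(self, body):
        # Index by the recomputed hash, never by a hash someone else claims.
        h = hash_b64(body)
        self._by_hash[h] = body
        return h

    def lookup(self, wanted_hashes):
        """Return (found bodies, missing hashes), re-checking each hash."""
        found, missing = [], []
        for h in wanted_hashes:
            body = self._by_hash.get(h)
            # Re-verify before serving, so a corrupted entry is treated
            # as missing rather than handed out.
            if body is not None and hash_b64(body) == h:
                found.append(body)
            else:
                missing.append(h)
        return found, missing
```

A mirror would call `lookup` with the hashes from the fetch URL or from
the consensus; a client would call it with the consensus hashes and fetch
whatever comes back as missing.
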
+3.3. Clients fetch them and cache them
+
+  When a client gets a new consensus, it looks to see if there are any
+  microdescriptors it needs to learn. If it needs to learn more than
+  some threshold of the microdescriptors (half?), it requests 'all',
+  else it requests only the missing ones.
+
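The fetch decision above might look like the following in Python. The
function name `plan_fetch`, the default threshold of one half, and the
example hostname are illustrative assumptions. Hashes are treated as
opaque, URL-safe strings here, since the text does not say how base64
hashes (which can contain "+" and "/") would be encoded in the URL.

```python
def plan_fetch(consensus_hashes, cached_hashes, hostname, threshold=0.5):
    """Return the URL a client should request, or None if nothing is missing.

    `consensus_hashes` lists the "m" hashes from the new consensus and
    `cached_hashes` is the set of hashes already in the client's cache.
    """
    # Drop duplicates but keep consensus order.
    wanted = list(dict.fromkeys(consensus_hashes))
    missing = [h for h in wanted if h not in cached_hashes]
    if not missing:
        return None
    if len(missing) > threshold * len(wanted):
        # Too many to name individually; this also hides which ones we lack.
        return f"http://{hostname}/tor/micro/all.z"
    return f"http://{hostname}/tor/micro/d/" + "+".join(missing) + ".z"


print(plan_fetch(["A", "B", "C", "D"], {"A", "B", "C"}, "mirror.example"))
print(plan_fetch(["A", "B", "C", "D"], {"A"}, "mirror.example"))
```

With four hashes in the consensus, one missing hash produces a `/d/`
request and three missing hashes produce an `all.z` request.
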
+  Clients maintain a cache of microdescriptors along with metadata like
+  when each was last referenced by a consensus. They keep a microdescriptor
+  until it hasn't been mentioned in any consensus for a week.
+
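A minimal sketch of such a cache with the one-week expiry rule. The class
and method names are hypothetical, and timestamps are passed in explicitly
(as seconds) so the behaviour is easy to follow.

```python
WEEK = 7 * 24 * 60 * 60  # one week, in seconds


class MicrodescCache:
    """Client-side microdescriptor cache keyed by hash (hypothetical)."""

    def __init__(self):
        self._entries = {}  # hash -> (body, time last listed in a consensus)

    def __contains__(self, h):
        return h in self._entries

    def add(self, h, body, now):
        self._entries[h] = (body, now)

    def note_consensus(self, consensus_hashes, now):
        # Refresh the "last referenced" time of every cached entry that
        # the new consensus still mentions.
        for h in consensus_hashes:
            if h in self._entries:
                body, _ = self._entries[h]
                self._entries[h] = (body, now)

    def expire(self, now):
        """Drop entries not mentioned for a week; return their hashes."""
        stale = [h for h, (_, last) in self._entries.items() if now - last >= WEEK]
        for h in stale:
            del self._entries[h]
        return stale
```

Calling `note_consensus` on every new consensus and `expire` afterwards
keeps relays that drop out briefly from being refetched when they return.
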
+3.3.1. Information leaks from clients
+
+  If a client asks you for a set of microdescriptors, then you know she
+  didn't have them cached before. How much does that leak? What about
+  when we're all using our entry guards as directory guards, and we've
+  seen that user make a bunch of circuits already?
+
+  Fetching "all" when you need more than half is a good first-order fix,
+  but might not be all there is to it.
+
+4. Transition and deployment
+
+  Phase one, the directory authorities should start voting on
+  microdescriptor hashes and microdescriptor elements, and putting them in
+  the consensus. This should happen during the 0.2.1.x series, and should
+  be relatively easy to do.
+
+  Phase two, directory mirrors should learn how to serve them, and learn
+  how to read the consensus to find out what they should be serving. It
+  would be great if we could squeeze this in during 0.2.1.x also, so once
+  clients start to fetch them there will be many mirrors to choose from.
+
+  (Are there reasonable ways to build only part of phase two in 0.2.1.x?)
+
+  Phase three, clients should start fetching and caching them instead
+  of normal descriptors. This should happen post 0.2.1.x.
+