Filename: 158-microdescriptors.txt
Title: Clients download consensus + microdescriptors
Author: Roger Dingledine
Created: 17-Jan-2009
Status: Open

0. History

  15 May 2009: Substantially revised based on discussions on or-dev
  from late January.  Removed the notion of voting on how to choose
  microdescriptors; made it just a function of the consensus method.
  (This lets us avoid the possibility of "desynchronization.")
  Added suggestion to use a new consensus flavor.  Specified use of
  SHA256 for new hashes. -nickm

1. Overview

  This proposal replaces section 3.2 of proposal 141, which was
  called "Fetching descriptors on demand". Rather than modifying the
  circuit-building protocol to fetch a server descriptor inline at each
  circuit extend, we instead put all of the information that clients need
  either into the consensus itself, or into a new set of data about each
  relay called a microdescriptor. The microdescriptor is a direct
  transform from the relay descriptor, so relays don't even need to know
  this is happening.

  Descriptor elements that are small and frequently changing should go
  in the consensus itself, and descriptor elements that are small and
  relatively static should go in the microdescriptor. If we ever end up
  with descriptor elements that aren't small yet clients need to know
  them, we'll need to resume considering some design like the one in
  proposal 141.

  Note also that any descriptor element which clients need to use to
  decide which servers to fetch info about, or which servers to fetch
  info from, needs to stay in the consensus.

2. Motivation

  See
  http://archives.seul.org/or/dev/Nov-2008/msg00000.html and
  http://archives.seul.org/or/dev/Nov-2008/msg00001.html and especially
  http://archives.seul.org/or/dev/Nov-2008/msg00007.html
  for a discussion of the options and why this is currently the best
  approach.

3. Design

  There are three pieces to the proposal. First, authorities will list in
  their votes (and thus in the consensus) the expected hash of the
  microdescriptor for each relay. Second, directory mirrors will serve
  microdescriptors. Third, clients will ask for them and cache them.

3.1. Consensus changes

  If the authorities choose a consensus method of a given version or
  later, a microdescriptor format is implicit in that version.
  A microdescriptor should in every case be a pure function of the
  router descriptor and the consensus method.

  In votes, we need to include the hash of each expected microdescriptor
  in the routerstatus section. I suggest a new "m" line for each stanza,
  with the base64 of the SHA256 hash of the router's microdescriptor.

  For every consensus method that an authority supports, it includes a
  separate "m" line in each router section of its vote, containing:
    "m" SP methods SP digest NL
  where methods is a comma-separated list of the consensus methods
  that the authority believes will produce "digest".

  (As with base64 encoding of SHA1 hashes in consensuses, let's
  omit the trailing =s)

  The consensus microdescriptor-elements and "m" lines are then computed
  as described in Section 3.1.2 below.

  (This means we need a new consensus-method that knows
  how to compute the microdescriptor-elements and add "m" lines.)

3.1.1. Descriptor elements to include for now

  In the first version, the microdescriptor should contain the
  onion-key element and the family element from the router descriptor.

3.1.2. Computing consensus for microdescriptor-elements and "m" lines

  When we are generating a consensus, we use whichever m line
  unambiguously corresponds to the descriptor digest that will be
  included in the consensus.

  (If there are multiple m lines for that descriptor digest, we use
  whichever is most common.  If they are equally common, we break ties
  in favor of the lexically earliest.  Either way, we should log a
  warning: that's likely a bug.)

  The "m" lines in a consensus contain only the digest, not a list of
  consensus methods.

3.1.3. A new flavor of consensus

  Rather than inserting "m" lines in the current consensus format,
  they should be included in a new consensus flavor (see proposal
  162).

  This flavor can safely omit descriptor digests.

  We still need to decide whether to move ports into microdescriptors
  or not.  In either case, they can be removed from the current "ns"
  flavor of consensus, since no current clients use them, and they
  take up about 5% of the compressed consensus.

  This new consensus flavor should be signed with the sha256 signature
  format as documented in proposal 162.

3.2. Directory mirrors fetch, cache, and serve microdescriptors

  Directory mirrors should then read the microdescriptor-elements line
  from the consensus, and learn how to answer requests. (Directory mirrors
  continue to serve normal relay descriptors too, a) to serve old clients
  and b) to be able to construct microdescriptors on the fly.)

  The microdescriptors with base64 hashes <D1>,<D2>,<D3> should be
  available at:
    http://<hostname>/tor/micro/d/<D1>-<D2>-<D3>.z

  (We use base64 for size and for consistency with the consensus
  format. We use -s instead of +s to separate these items, since
  ... since...?)

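  As an illustration of the digest encoding from Section 3.1 and the
  request URL above, here is a minimal Python sketch.  The function
  names microdesc_digest and micro_url are hypothetical, not part of
  any Tor codebase, and the sketch assumes the microdescriptor is
  already available as a byte string and that "base64" means the
  standard alphabet.

```python
import base64
import hashlib


def microdesc_digest(microdesc: bytes) -> str:
    """Return the base64 SHA256 digest with the trailing '=' padding omitted.

    Assumption: the standard base64 alphabet (A-Z, a-z, 0-9, '+', '/').
    """
    raw = hashlib.sha256(microdesc).digest()
    return base64.b64encode(raw).decode("ascii").rstrip("=")


def micro_url(hostname: str, digests) -> str:
    """Build the request URL, joining the digests with '-' as in Section 3.2."""
    return "http://{}/tor/micro/d/{}.z".format(hostname, "-".join(digests))


if __name__ == "__main__":
    # Hypothetical microdescriptor bodies, used only to exercise the code.
    d1 = microdesc_digest(b"onion-key ...\n")
    d2 = microdesc_digest(b"onion-key ...\nfamily ...\n")
    print(micro_url("mirror.example", [d1, d2]))
```

  A SHA256 digest is 32 bytes, so each encoded digest is 43 characters
  once the single padding character is dropped.
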
  All the microdescriptors from the current consensus should also be
  available at:
    http://<hostname>/tor/micro/all.z
  so a client that's bootstrapping doesn't need to send a 70KB URL just
  to name every microdescriptor it's looking for.

  Microdescriptors have no header or footer.
  The hash of the microdescriptor is simply the hash of the concatenated
  elements.

  Directory mirrors should check to make sure that the microdescriptors
  they're about to serve match the right hashes (either the hashes from
  the fetch URL or the hashes from the consensus, respectively).

  We will probably want to consider some sort of smart data structure to
  be able to quickly convert microdescriptor hashes into the appropriate
  microdescriptor. Clients will want this anyway when they load their
  microdescriptor cache and want to match it up with the consensus to
  see what's missing.

3.3. Clients fetch them and cache them

  When a client gets a new consensus, it looks to see if there are any
  microdescriptors it needs to learn. If it needs to learn more than
  some threshold of the microdescriptors (half?), it requests 'all',
  else it requests only the missing ones.  Clients MAY try to
  determine whether the upload bandwidth for listing the
  microdescriptors they want is more or less than the download
  bandwidth for the microdescriptors they do not want.

  Clients maintain a cache of microdescriptors along with metadata like
  when each one was last referenced by a consensus, and which identity
  key it corresponds to.  They keep a microdescriptor
  until it hasn't been mentioned in any consensus for a week. Future
  clients might cache them for longer or shorter times.

3.3.1. Information leaks from clients

  If a client asks you for a set of microdescs, then you know she didn't
  have them cached before. How much does that leak? What about when
  we're all using our entry guards as directory guards, and we've seen
  that user make a bunch of circuits already?

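  To make the leak concrete, the fetch rule from Section 3.3 can be
  sketched in Python as below.  This is only a sketch: the name
  choose_request, the return values, and the default threshold of one
  half (the "half?" suggested in Section 3.3) are assumptions, not
  settled parts of this proposal.

```python
def choose_request(consensus_digests, cached_digests, threshold=0.5):
    """Decide what a client asks a directory mirror for.

    Returns ("none", []) if nothing is missing, ("all", []) if more than
    `threshold` of the listed microdescriptors are missing, and
    ("some", missing) otherwise.
    """
    # Drop duplicate digests while keeping consensus order.
    wanted = list(dict.fromkeys(consensus_digests))
    cached = set(cached_digests)
    missing = [d for d in wanted if d not in cached]

    if not missing:
        return ("none", [])
    if len(missing) > threshold * len(wanted):
        return ("all", [])
    # The list sent here is what reveals which entries were not cached.
    return ("some", missing)
```

  Only the "some" case exposes the exact set of missing digests to the
  mirror; the "all" case exposes just that more than the threshold was
  missing.
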
  Fetching "all" when you need at least half is a good first order fix,
  but might not be all there is to it.

  Another future option would be to fetch some of the microdescriptors
  anonymously (via a Tor circuit).

4. Transition and deployment

  Phase one, the directory authorities should start voting on
  microdescriptors and microdescriptor elements, and putting them in the
  consensus.

  Phase two, directory mirrors should learn how to serve them, and learn
  how to read the consensus to find out what they should be serving.

  Phase three, clients should start fetching and caching them instead
  of normal descriptors.