Filename: 158-microdescriptors.txt Title: Clients download consensus + microdescriptors Author: Roger Dingledine Created: 17-Jan-2009 Status: Open 0. History 15 May 2009: Substantially revised based on discussions on or-dev from late January. Removed the notion of voting on how to choose microdescriptors; made it just a function of the consensus method. (This lets us avoid the possibility of "desynchronization.") Added suggestion to use a new consensus flavor. Specified use of SHA256 for new hashes. -nickm 15 June 2009: Cleaned up based on comments from Roger. -nickm 1. Overview This proposal replaces section 3.2 of proposal 141, which was called "Fetching descriptors on demand". Rather than modifying the circuit-building protocol to fetch a server descriptor inline at each circuit extend, we instead put all of the information that clients need either into the consensus itself, or into a new set of data about each relay called a microdescriptor. Descriptor elements that are small and frequently changing should go in the consensus itself, and descriptor elements that are small and relatively static should go in the microdescriptor. If we ever end up with descriptor elements that aren't small yet clients need to know them, we'll need to resume considering some design like the one in proposal 141. Note also that any descriptor element which clients need to use to decide which servers to fetch info about, or which servers to fetch info from, needs to stay in the consensus. 2. Motivation See http://archives.seul.org/or/dev/Nov-2008/msg00000.html and http://archives.seul.org/or/dev/Nov-2008/msg00001.html and especially http://archives.seul.org/or/dev/Nov-2008/msg00007.html for a discussion of the options and why this is currently the best approach. 3. Design There are three pieces to the proposal. First, authorities will list in their votes (and thus in the consensus) the expected hash of microdescriptor for each relay. Second, authorities will serve microdescriptors, directory mirrors will cache and serve them. Third, clients will ask for them and cache them. 3.1. Consensus changes If the authorities choose a consensus method of a given version or later, a microdescriptor format is implicit in that version. A microdescriptor should in every case be a pure function of the router descriptor and the consensus method. In votes, we need to include the hash of each expected microdescriptor in the routerstatus section. I suggest a new "m" line for each stanza, with the base64 of the SHA256 hash of the router's microdescriptor. For every consensus method that an authority supports, it includes a separate "m" line in each router section of its vote, containing: "m" SP methods 1*(SP AlgorithmName "=" digest) NL where methods is a comma-separated list of the consensus methods that the authority believes will produce "digest". (As with base64 encoding of SHA1 hashes in consensuses, let's omit the trailing =s) The consensus microdescriptor-elements and "m" lines are then computed as described in Section 3.1.2 below. (This means we need a new consensus-method that knows how to compute the microdescriptor-elements and add "m" lines.) The microdescriptor consensus uses the directory-signature format from proposal 162, with the "sha256" algorithm. 3.1.1. Descriptor elements to include for now In the first version, the microdescriptor should contain the onion-key element, and the family element from the router descriptor, and the exit policy summary as currently specified in dir-spec.txt. 3.1.2. Computing consensus for microdescriptor-elements and "m" lines When we are generating a consensus, we use whichever m line unambiguously corresponds to the descriptor digest that will be included in the consensus. (If different votes have different microdescriptor digests for a single pair, then at least one of the authorities is broken. If this happens, the consensus should contain whichever microdescriptor digest is most common. If there is no winner, we break ties in the favor of the lexically earliest. Either way, we should log a warning: there is definitely a bug.) The "m" lines in a consensus contain only the digest, not a list of consensus methods. 3.1.3. A new flavor of consensus Rather than inserting "m" lines in the current consensus format, they should be included in a new consensus flavor (see proposal 162). This flavor can safely omit descriptor digests. When we implement this voting method, we can remove the exit policy summary from the current "ns" flavor of consensus, since no current clients use them, and they take up about 5% of the compressed consensus. This new consensus flavor should be signed with the sha256 signature format as documented in proposal 162. 3.2. Directory mirrors fetch, cache, and serve microdescriptors Directory mirrors should fetch, catch, and serve each microdescriptor from the authorities. (They need to continue to serve normal relay descriptors too, to handle old clients.) The microdescriptors with base64 hashes ,, should be available at: http:///tor/micro/d/--.z (We use base64 for size and for consistency with the consensus format. We use -s instead of +s to separate these items, since the + character is used in base64 encoding.) All the microdescriptors from the current consensus should also be available at: http:///tor/micro/all.z so a client that's bootstrapping doesn't need to send a 70KB URL just to name every microdescriptor it's looking for. Microdescriptors have no header or footer. The hash of the microdescriptor is simply the hash of the concatenated elements. Directory mirrors should check to make sure that the microdescriptors they're about to serve match the right hashes (either the hashes from the fetch URL or the hashes from the consensus, respectively). We will probably want to consider some sort of smart data structure to be able to quickly convert microdescriptor hashes into the appropriate microdescriptor. Clients will want this anyway when they load their microdescriptor cache and want to match it up with the consensus to see what's missing. 3.3. Clients fetch them and cache them When a client gets a new consensus, it looks to see if there are any microdescriptors it needs to learn. If it needs to learn more than some threshold of the microdescriptors (half?), it requests 'all', else it requests only the missing ones. Clients MAY try to determine whether the upload bandwidth for listing the microdescriptors they want is more or less than the download bandwidth for the microdescriptors they do not want. Clients maintain a cache of microdescriptors along with metadata like when it was last referenced by a consensus, and which identity key it corresponds to. They keep a microdescriptor until it hasn't been mentioned in any consensus for a week. Future clients might cache them for longer or shorter times. 3.3.1. Information leaks from clients If a client asks you for a set of microdescs, then you know she didn't have them cached before. How much does that leak? What about when we're all using our entry guards as directory guards, and we've seen that user make a bunch of circuits already? Fetching "all" when you need at least half is a good first order fix, but might not be all there is to it. Another future option would be to fetch some of the microdescriptors anonymously (via a Tor circuit). Another crazy option (Roger's phrasing) is to do decoy fetches as well. 4. Transition and deployment Phase one, the directory authorities should start voting on microdescriptors, and putting them in the consensus. Phase two, directory mirrors should learn how to serve them, and learn how to read the consensus to find out what they should be serving. Phase three, clients should start fetching and caching them instead of normal descriptors.