- Filename: 158-microdescriptors.txt
 
- Title: Clients download consensus + microdescriptors
 
- Version: $Revision$
 
- Last-Modified: $Date$
 
- Author: Roger Dingledine
 
- Created: 17-Jan-2009
 
- Status: Open
 
- 1. Overview
 
-   This proposal replaces section 3.2 of proposal 141, which was
 
-   called "Fetching descriptors on demand". Rather than modifying the
 
-   circuit-building protocol to fetch a server descriptor inline at each
 
-   circuit extend, we instead put all of the information that clients need
 
-   either into the consensus itself, or into a new set of data about each
 
-   relay called a microdescriptor. The microdescriptor is a direct
 
-   transform from the relay descriptor, so relays don't even need to know
 
-   this is happening.
 
-   Descriptor elements that are small and frequently changing should go
 
-   in the consensus itself, and descriptor elements that are small and
 
-   relatively static should go in the microdescriptor. If we ever end up
 
-   with descriptor elements that aren't small yet clients need to know
 
-   them, we'll need to resume considering some design like the one in
 
-   proposal 141.
 
- 2. Motivation
 
-   See
 
-   http://archives.seul.org/or/dev/Nov-2008/msg00000.html and
 
-   http://archives.seul.org/or/dev/Nov-2008/msg00001.html and especially
 
-   http://archives.seul.org/or/dev/Nov-2008/msg00007.html
 
-   for a discussion of the options and why this is currently the best
 
-   approach.
 
- 3. Design
 
-   There are three pieces to the proposal. First, authorities will list in
 
-   their votes (and thus in the consensus) what relay descriptor elements
 
-   are included in the microdescriptor, and also list the expected hash
 
-   of the microdescriptor for each relay. Second, directory mirrors will serve
 
-   microdescriptors. Third, clients will ask for them and cache them.
 
- 3.1. Consensus changes
 
-   V3 votes should include a new line:
 
-     microdescriptor-elements bar baz foo
 
-   listing each descriptor element (sorted alphabetically) that authority
 
-   included when it calculated its expected microdescriptor hashes.
 
-   We also need to include the hash of each expected microdescriptor in
 
-   the routerstatus section. I suggest a new "m" line for each stanza,
 
-   with the base64 of the hash of the elements that the authority voted
 
-   for above.
 
-   The consensus microdescriptor-elements and "m" lines are then computed
 
-   as described in Section 3.1.2 below.
 
-   I believe that means we need a new consensus-method "6" that knows
 
-   how to compute the microdescriptor-elements and add "m" lines.
 
- 3.1.1. Descriptor elements to include for now
 
-   To start, the element list that authorities suggest should be
 
-     family onion-key
 
-   (Note that the or-dev posts above only mention onion-key, but if
 
-   we don't also include family then clients will never learn it. It
 
-   seemed like it should be relatively static, so putting it in the
 
-   microdescriptor is smarter than trying to fit it into the consensus.)
 
-   We could imagine a config option taking a value like "family,onion-key" so authorities
 
-   could change their voted preferences without needing to upgrade.
 
- 3.1.2. Computing consensus for microdescriptor-elements and "m" lines
 
-   One approach is for the consensus microdescriptor-elements line to
 
-   include every element listed by a majority of authorities, sorted. The
 
-   problem here is that it will no longer be deterministic what the correct
 
-   hash for the "m" line should be. We could imagine telling the authority
 
-   to go look in its descriptor and produce the right hash itself, but
 
-   we don't want consensus calculation to be based on external data like
 
-   that. (Plus, the authority may not have the descriptor that everybody
 
-   else voted to use.)
 
-   The better approach is to take the exact set that has the most votes
 
-   (breaking ties by the set that has the most elements, and breaking
 
-   ties after that by whichever is alphabetically first). That will
 
-   increase the odds that we actually get a microdescriptor hash that
 
-   is both a) for the descriptor we're putting in the consensus, and b)
 
-   over the elements that we're declaring it should be for.
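
The selection rule above can be sketched as follows. This is only an illustration, not a reference implementation: the function name choose_elements is hypothetical, and "alphabetically first" is assumed here to mean comparing the space-joined, sorted element lists as strings.

```python
from collections import Counter

def choose_elements(votes):
    """Pick the consensus microdescriptor-elements set.

    votes: one list of element keywords per authority (hypothetical input shape).
    """
    # Normalize each vote to a sorted tuple so identical sets compare equal.
    counts = Counter(tuple(sorted(set(v))) for v in votes)
    # Most votes first, then most elements, then alphabetically first.
    # Assumption: "alphabetically" compares the space-joined element line.
    return min(counts, key=lambda s: (-counts[s], -len(s), " ".join(s)))
```

For example, two votes for "family onion-key" and one vote for "onion-key" alone yield ("family", "onion-key").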
 
-   Then the "m" line for a given relay is the one that gets the most votes
 
-   from authorities that both a) voted for the microdescriptor-elements
 
-   line we're using, and b) voted for the descriptor we're using.
 
-   (If there's a tie, use the smaller hash. But really, if there are
 
-   multiple such votes and they differ about a microdescriptor, we caught
 
-   one of them lying or being buggy. We should log it to track down why.)
 
-   If there are no such votes, then we leave out the "m" line for that
 
-   relay. That means clients should avoid it for this time period. (As
 
-   an extension it could instead mean that clients should fetch the
 
-   descriptor and figure out its microdescriptor themselves. But let's
 
-   not get ahead of ourselves.)
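
A minimal sketch of the "m" line rule, under stated assumptions: each vote is modeled as a dict with the hypothetical keys "elements", "descriptor" and "m", and "the smaller hash" is assumed to mean the lexicographically smaller base64 string. The proposal does not pin down either detail.

```python
import logging
from collections import Counter

def choose_m_line(votes, consensus_elements, consensus_descriptor):
    """Return the consensus "m" value for one relay, or None to omit the line."""
    # Only count authorities that voted for both the chosen element set
    # and the descriptor that the consensus is using for this relay.
    eligible = [
        v["m"] for v in votes
        if sorted(v["elements"]) == sorted(consensus_elements)
        and v["descriptor"] == consensus_descriptor
    ]
    if not eligible:
        return None  # no "m" line; clients avoid this relay for now
    counts = Counter(eligible)
    if len(counts) > 1:
        # Disagreement here means some authority is lying or buggy.
        logging.warning("conflicting microdescriptor hashes: %s", sorted(counts))
    # Most votes wins; ties go to the smaller hash (assumed: string order).
    return min(counts, key=lambda m: (-counts[m], m))
```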
 
-   It would be nice to have a more foolproof way to agree on what
 
-   microdescriptor hash each authority should vote for, so we can avoid
 
-   missing "m" lines. Just switching to a new consensus-method each time
 
-   we change the set of microdescriptor-elements won't help though, since
 
-   each authority will still have to decide what hash to vote for before
 
-   knowing what consensus-method will be used.
 
-   Here's one way we could do it. Each vote / consensus includes
 
-   the microdescriptor-elements that were used to compute the hashes,
 
-   and also a preferred-microdescriptor-elements set. If an authority
 
-   has a consensus from the previous period, then it should use the
 
-   consensus preferred-microdescriptor-elements when computing its votes
 
-   for microdescriptor-elements and the appropriate hashes in the upcoming
 
-   period. (If it has no previous consensus, then it just writes its
 
-   own preferences in both lines.)
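
One possible reading of this scheme, sketched below. The function name and the dict-based representation of a consensus are hypothetical, and I am assuming that the authority always writes its own preferences into the preferred-microdescriptor-elements line.

```python
def elements_to_vote(own_preferences, previous_consensus=None):
    """Return the two element lines an authority would put in its vote.

    previous_consensus: a dict holding the previous period's
    "preferred-microdescriptor-elements" list, or None if we have none.
    """
    if previous_consensus is None:
        # No previous consensus: our own preferences go in both lines.
        used = own_preferences
    else:
        # Otherwise compute hashes over what the last consensus preferred.
        used = previous_consensus["preferred-microdescriptor-elements"]
    return {
        "microdescriptor-elements": sorted(used),
        # Assumption: this line always carries the authority's own wishes.
        "preferred-microdescriptor-elements": sorted(own_preferences),
    }
```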
 
- 3.2. Directory mirrors serve microdescriptors
 
-   Directory mirrors should then read the microdescriptor-elements line
 
-   from the consensus, and learn how to answer requests. (Directory mirrors
 
-   continue to serve normal relay descriptors too, a) to serve old clients
 
-   and b) to be able to construct microdescriptors on the fly.)
 
-   The microdescriptors with hashes <D1>,<D2>,<D3> should be available at:
 
-     http://<hostname>/tor/micro/d/<D1>+<D2>+<D3>.z
 
-   All the microdescriptors from the current consensus should also be
 
-   available at:
 
-     http://<hostname>/tor/micro/all.z
 
-   so a client that's bootstrapping doesn't need to send a 70KB URL just
 
-   to name every microdescriptor it's looking for.
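
As a sketch, the two URL forms could be built like this. The helper names are hypothetical, and the hashes are treated as opaque strings: the proposal does not say how a hash is encoded inside the URL, so no encoding is applied here.

```python
def micro_fetch_url(hostname, hashes):
    """URL for a specific set of microdescriptors, named by hash."""
    if not hashes:
        raise ValueError("need at least one microdescriptor hash")
    # Hashes are joined with "+" as in the URL template above.
    return "http://%s/tor/micro/d/%s.z" % (hostname, "+".join(hashes))

def micro_all_url(hostname):
    """URL for every microdescriptor in the current consensus."""
    return "http://%s/tor/micro/all.z" % hostname
```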
 
-   The format of a microdescriptor is the header line
 
-   "microdescriptor-header"
 
-   followed by each element (keyword and body), alphabetically. There's
 
-   no need to mention what hash it's for, since it's self-identifying:
 
-   you can hash the elements to learn this.
 
-   (Do we need a footer line to show that it's over, or is the next
 
-   microdescriptor line or EOF enough of a hint? A footer line wouldn't
 
-   hurt much. Also, no fair voting for the microdescriptor-element
 
-   "microdescriptor-header".)
 
-   The hash of the microdescriptor is simply the hash of the concatenated
 
-   elements -- not counting the header line or hypothetical footer line.
 
-   Unless you prefer that?
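
To make the format and hash concrete, here is a rough sketch. Several things are assumptions rather than part of this proposal: the hash function (SHA-256 is used only as a placeholder, since the text above does not name one), standard padded base64 for the "m" value, and the input shape (a dict mapping each keyword to its full, newline-terminated text as it appears in the relay descriptor).

```python
import base64
import hashlib

HEADER = "microdescriptor-header"

def build_microdescriptor(descriptor_elements, wanted):
    """Return (full microdescriptor text, hashed body).

    descriptor_elements: keyword -> element text (keyword and body).
    wanted: keywords from the consensus microdescriptor-elements line.
    """
    # Elements appear alphabetically by keyword; absent ones are skipped.
    body = "".join(descriptor_elements[k]
                   for k in sorted(wanted) if k in descriptor_elements)
    return HEADER + "\n" + body, body

def microdescriptor_hash(body, hash_name="sha256"):
    """Base64 of the hash of the concatenated elements (header excluded)."""
    # Assumption: SHA-256 and padded base64; the proposal fixes neither.
    digest = hashlib.new(hash_name, body.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

A mirror could use the same two functions to check that what it is about to serve matches the requested or consensus hash.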
 
-   Is there a reasonable way to version these things? We could say that
 
-   the microdescriptor-header line can contain arguments which clients
 
-   must ignore if they don't understand them. Any better ways?
 
-   Directory mirrors should check to make sure that the microdescriptors
 
-   they're about to serve match the right hashes (either the hashes from
 
-   the fetch URL or the hashes from the consensus, respectively).
 
-   We will probably want to consider some sort of smart data structure to
 
-   be able to quickly convert microdescriptor hashes into the appropriate
 
-   microdescriptor. Clients will want this anyway when they load their
 
-   microdescriptor cache and want to match it up with the consensus to
 
-   see what's missing.
 
- 3.3. Clients fetch them and cache them
 
-   When a client gets a new consensus, it looks to see if there are any
 
-   microdescriptors it needs to learn. If it needs to learn more than
 
-   some threshold of the microdescriptors (half?), it requests 'all',
 
-   else it requests only the missing ones.
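
The client decision might look like the sketch below. The function name and return shape are hypothetical, and the threshold of one half is only the tentative value suggested above.

```python
def plan_fetch(consensus_hashes, cached_hashes, threshold=0.5):
    """Decide what to request after receiving a new consensus.

    Returns ("none", []), ("all", []), or ("some", [missing hashes]).
    """
    wanted = set(consensus_hashes)
    missing = wanted - set(cached_hashes)
    if not missing:
        return ("none", [])
    # Ask for everything once more than `threshold` of the set is missing.
    if len(missing) > threshold * len(wanted):
        return ("all", [])
    return ("some", sorted(missing))
```

With four microdescriptors in the consensus and one missing, this returns ("some", [...]); with three missing it returns ("all", []).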
 
-   Clients maintain a cache of microdescriptors along with metadata like
 
-   when each was last referenced by a consensus. They keep a microdescriptor
 
-   until it hasn't been mentioned in any consensus for a week. Future
 
-   clients might cache them for longer or shorter times.
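
A minimal sketch of such a cache, keyed by microdescriptor hash so that lookups and the "what's missing" check are cheap. The class and method names are hypothetical, and a plain dict stands in for whatever data structure an implementation would actually choose.

```python
WEEK = 7 * 24 * 60 * 60  # seconds

class MicrodescCache:
    def __init__(self):
        # hash -> [microdescriptor text, time last referenced by a consensus]
        self._entries = {}

    def add(self, md_hash, text, now):
        self._entries[md_hash] = [text, now]

    def get(self, md_hash):
        entry = self._entries.get(md_hash)
        return entry[0] if entry else None

    def note_consensus(self, consensus_hashes, now):
        """Refresh timestamps and return the hashes we do not have yet."""
        missing = []
        for h in consensus_hashes:
            if h in self._entries:
                self._entries[h][1] = now
            else:
                missing.append(h)
        return missing

    def expire(self, now, max_age=WEEK):
        """Drop entries no consensus has mentioned for max_age seconds."""
        stale = [h for h, (_, last) in self._entries.items()
                 if now - last > max_age]
        for h in stale:
            del self._entries[h]
        return stale
```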
 
- 3.3.1. Information leaks from clients
 
-   If a client asks you for a set of microdescs, then you know she didn't
 
-   have them cached before. How much does that leak? What about when
 
-   we're all using our entry guards as directory guards, and we've seen
 
-   that user make a bunch of circuits already?
 
-   Fetching "all" when you need at least half is a good first-order fix,
 
-   but might not be all there is to it.
 
-   Another future option would be to fetch some of the microdescriptors
 
-   anonymously (via a Tor circuit).
 
- 4. Transition and deployment
 
-   Phase one, the directory authorities should start voting on
 
-   microdescriptors and microdescriptor elements, and putting them in the
 
-   consensus. This should happen during the 0.2.1.x series, and should
 
-   be relatively easy to do.
 
-   Phase two, directory mirrors should learn how to serve them, and learn
 
-   how to read the consensus to find out what they should be serving. This
 
-   phase could be done either in 0.2.1.x or early in 0.2.2.x, depending
 
-   on how messy it turns out to be and how quickly we get around to it.
 
-   Phase three, clients should start fetching and caching them instead
 
-   of normal descriptors. This should happen post 0.2.1.x.
 
 