Filename: xxx-microdescriptors.txt
Title: Clients download consensus + microdescriptors
Version: $Revision$
Last-Modified: $Date$
Author: Roger Dingledine
Created: 17-Jan-2009
Status: Open

1. Overview

  This proposal replaces section 3.2 of proposal 141, which was
  called "Fetching descriptors on demand". Rather than modifying the
  circuit-building protocol to fetch a server descriptor inline at each
  circuit extend, we instead put all of the information that clients need
  either into the consensus itself, or into a new set of data about each
  relay called a microdescriptor.

  The goal is that descriptor elements that are small and frequently
  changing should go in the consensus itself, descriptor elements that
  are small and relatively static should go in the microdescriptor,
  and if we ever end up with descriptor elements that aren't small yet
  clients need to know them, we'll need to resume considering some design
  like the one in proposal 141.

2. Motivation

  See
  http://archives.seul.org/or/dev/Nov-2008/msg00000.html and
  http://archives.seul.org/or/dev/Nov-2008/msg00001.html and especially
  http://archives.seul.org/or/dev/Nov-2008/msg00007.html
  for a discussion of the options and why this is currently the best
  approach.

3. Design

  There are three pieces to the proposal. First, authorities will list in
  their votes (and thus in the consensus) what relay descriptor elements
  are included in the microdescriptor, and also list the expected hash
  of the microdescriptor for each relay. Second, directory mirrors will
  serve microdescriptors. Third, clients will ask for them and then cache
  them.

3.1. Consensus changes

  V3 votes should include a new line:

    microdescriptor-elements bar baz foo

  We also need to include the hash of each expected microdescriptor in
  the routerstatus section. I suggest a new "m" line for each stanza,
  with the base64 of the hash of the elements that the authority voted
  for above.

  The consensus microdescriptor-elements and "m" lines are then computed
  as described in Section 3.1.2 below.

  I believe that means we need a new consensus-method "6" that knows
  how to compute the microdescriptor-elements and add "m" lines.

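  As a rough illustration of the two new lines, here is a minimal Python
  sketch. The function names are made up for this example, and sorting the
  keywords in the vote line is an assumption based on the "bar baz foo"
  example above. The digest is passed in as raw bytes because this
  proposal does not say which hash function is used.

```python
import base64


def elements_line(elements):
    """Format a vote's microdescriptor-elements line.

    Assumption: keywords are deduplicated and sorted, matching the
    "bar baz foo" example in the text.
    """
    return "microdescriptor-elements " + " ".join(sorted(set(elements)))


def m_line(digest: bytes) -> str:
    """Format the routerstatus "m" line as the base64 of a raw digest."""
    return "m " + base64.b64encode(digest).decode("ascii")
```

  For instance, elements_line(["foo", "bar", "baz"]) gives the example
  line shown in the text.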
3.1.1. Descriptor elements to include for now

  To start, the element list that authorities suggest should be

    family onion-key

  (Note that the or-dev posts above only mention onion-key, but if
  we don't also include family then clients will never learn it. It
  seemed like it should be relatively static, so putting it in the
  microdescriptor is smarter than trying to fit it into the consensus.)

3.1.2. Computing consensus for microdescriptor-elements and "m" lines

  One approach is for the consensus microdescriptor-elements line to
  include all elements listed by a majority of authorities, sorted. The
  problem here is that it will no longer be deterministic what the correct
  hash for the "m" line should be. We could imagine telling the authority
  to go look in its descriptor and produce the right hash itself, but
  we don't want consensus calculation to be based on external data like
  that. (Plus, the authority may not have the descriptor that everybody
  else voted to use.)

  The better approach is to take the exact set that has the most votes
  (breaking ties by the set that has the most elements, and breaking
  ties after that by whichever is alphabetically first). That will
  increase the odds that we actually get a microdescriptor hash that
  is both a) for the descriptor we're putting in the consensus, and b)
  over the elements that we're declaring it should be for.

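  The "exact set with the most votes" rule could be sketched as follows.
  This is only an illustration: choose_element_set is a hypothetical name,
  and "alphabetically first" is interpreted here as comparing the sorted,
  space-joined keyword lists as strings, which the text does not spell out.

```python
from collections import Counter


def choose_element_set(votes):
    """Pick the consensus microdescriptor-elements set.

    `votes` holds one iterable of element keywords per authority.
    Returns the winning set as a sorted list, or None if nobody voted.
    """
    # Normalize each vote so that ordering and duplicates don't matter.
    counts = Counter(tuple(sorted(set(v))) for v in votes)
    if not counts:
        return None
    # Most votes first, then most elements, then alphabetically first
    # (assumption: compared as the space-joined line).
    best = min(counts, key=lambda s: (-counts[s], -len(s), " ".join(s)))
    return list(best)
```

  Note that a set only wins on exact matches: an authority voting for
  "onion-key" alone does not add weight to "family onion-key".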
  Then the "m" line for a given relay is the one that gets the most votes
  from authorities that both a) voted for the microdescriptor-elements
  line we're using, and b) voted for the descriptor we're using.

  (If there's a tie, use the smaller hash. But really, if there are
  multiple such votes and they differ about a microdescriptor, we caught
  one of them lying or being buggy. We should log it to track down why.)

  If there are no such votes, then we leave out the "m" line for that
  relay. That means clients should avoid it for this time period. (As
  an extension it could instead mean that clients should fetch the
  descriptor and figure out its microdescriptor themselves. But let's
  not get ahead of ourselves.)

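  A minimal sketch of the per-relay "m" line rule, under some assumptions:
  each vote is modeled as a dict with hypothetical keys "elements",
  "descriptor" and "m", and "smaller hash" is taken to mean the smaller
  value when the base64 is decoded back to raw bytes. Neither detail is
  fixed by this proposal.

```python
import base64
import logging
from collections import Counter

log = logging.getLogger("microdesc-consensus")


def _raw(b64_hash):
    # Decode base64, tolerating missing "=" padding.
    return base64.b64decode(b64_hash + "=" * (-len(b64_hash) % 4))


def choose_m_line(votes, consensus_elements, consensus_descriptor):
    """Return the base64 hash for a relay's "m" line, or None to omit it."""
    wanted = sorted(set(consensus_elements))
    # Only count authorities that agree on both the element set and
    # the descriptor that the consensus is using.
    eligible = [
        v["m"] for v in votes
        if sorted(set(v["elements"])) == wanted
        and v["descriptor"] == consensus_descriptor
        and v.get("m")
    ]
    if not eligible:
        return None  # no "m" line: clients avoid the relay this period
    counts = Counter(eligible)
    if len(counts) > 1:
        # Same elements, same descriptor, different hash: lying or buggy.
        log.warning("authorities disagree on microdescriptor hash: %s",
                    sorted(counts))
    # Most votes wins; ties go to the smaller hash (assumption: raw bytes).
    return min(counts, key=lambda h: (-counts[h], _raw(h)))
```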
  It would be nice to have a more foolproof way to agree on what
  microdescriptor hash each authority should vote for, so we can avoid
  missing "m" lines. Just switching to a new consensus-method each time
  we change the set of microdescriptor-elements won't help though, since
  each authority will still have to decide what hash to vote for before
  knowing what consensus-method will be used.

  Here's one way we could do it. Each vote / consensus includes
  the microdescriptor-elements that were used to compute the hashes,
  and also a preferred-microdescriptor-elements set. If an authority
  has a consensus from the previous period, then it should use the
  consensus preferred-microdescriptor-elements when computing its votes
  for microdescriptor-elements and the appropriate hashes in the upcoming
  period. (If it has no previous consensus, then it just puts down its
  own preferences in both lines.)

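  The two-line scheme might look like this from a single authority's point
  of view. The function and parameter names are invented for illustration,
  and the sketch assumes an authority always lists its own preference in
  the preferred-microdescriptor-elements line of its vote.

```python
def elements_to_vote(own_preference, previous_preferred=None):
    """Return (microdescriptor_elements, preferred_elements) for a vote.

    `previous_preferred` is the preferred-microdescriptor-elements set
    from the previous period's consensus, or None if the authority has
    no previous consensus.
    """
    preferred = sorted(set(own_preference))
    if previous_preferred is None:
        # No previous consensus: own preferences go in both lines.
        return preferred, preferred
    # Hash over what the last consensus said everyone should move to,
    # while still advertising our own preference for the next round.
    return sorted(set(previous_preferred)), preferred
```

  A change of element set then takes two periods: one for the new
  preference to win in the consensus, and one for authorities to start
  hashing over it.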
3.2. Directory mirrors serve microdescriptors

  Directory mirrors should then read the microdescriptor-elements line
  from the consensus, and learn how to answer requests.

  The microdescriptors with hashes <D1>,<D2>,<D3> should be available at:

    http://<hostname>/tor/micro/d/<D1>+<D2>+<D3>.z

  All the microdescriptors from the current consensus should also be
  available at:

    http://<hostname>/tor/micro/all.z

  so a client that's bootstrapping doesn't need to send a 70KB URL just
  to name every microdescriptor it's looking for.

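  Building the two request URLs is simple enough to sketch. microdesc_url
  is a hypothetical helper, and it takes the hashes as already-encoded
  strings, since the text above does not say how <D1> and friends are
  encoded in the URL (plain base64 would be awkward here, because "+" is
  the separator).

```python
def microdesc_url(hostname, hashes=None):
    """Build the fetch URL for specific microdescriptors, or for all.

    Assumption: `hashes` are strings already encoded in whatever
    URL-safe form the directory protocol ends up using.
    """
    if hashes is None:
        return f"http://{hostname}/tor/micro/all.z"
    if not hashes:
        raise ValueError("need at least one hash, or None for 'all'")
    return f"http://{hostname}/tor/micro/d/" + "+".join(hashes) + ".z"
```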
  The format of a microdescriptor is the header line

    "microdescriptor 1"

  followed by each element (keyword and body), alphabetically. There's
  no need to mention what hash it is, since you can hash the elements
  to learn this.

  (Do we need a footer line to show that it's over, or is the next
  microdescriptor line or EOF enough of a hint? A footer line wouldn't
  hurt much. Also, no fair voting for the microdescriptor-element
  "microdescriptor".)

  The hash of the microdescriptor is simply the hash of the concatenated
  elements -- not counting the header line or hypothetical footer line.
  Is this smart?

  Note that I put a "1" up there in the header line. It isn't part
  of what's hashed, though. Is there a way to put in a version that's
  more useful?

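  To make the format and hash rule concrete, here is a small sketch. The
  choice of SHA-256 is purely an assumption for the example (this proposal
  only says "the hash"), as is representing each element as its full text
  (keyword line plus body) keyed by keyword.

```python
import base64
import hashlib

HEADER = "microdescriptor 1\n"


def element_text(elements):
    """Concatenate element texts in alphabetical order of keyword.

    `elements` maps keyword -> full element text, newline-terminated.
    """
    return "".join(elements[k] for k in sorted(elements))


def format_microdesc(elements):
    """Header line followed by each element; no footer line."""
    return HEADER + element_text(elements)


def microdesc_hash(elements):
    """Base64 hash over the concatenated elements only, not the header.

    Assumption: SHA-256, standard base64 with padding.
    """
    digest = hashlib.sha256(element_text(elements).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

  Because the header is outside the hashed text, bumping the "1" would
  not change any "m" line, which is part of why the version question
  above matters.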
  Directory mirrors should check to make sure that the microdescriptors
  they're about to serve match the right hashes (either the hashes from
  the fetch URL or the hashes from the consensus, respectively).

  We will probably want to consider some sort of smart data structure to
  be able to quickly convert microdescriptor hashes into the appropriate
  microdescriptor. Clients will want this anyway when they load their
  microdescriptor cache and want to match it up with the consensus to
  see what's missing.

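  The simplest candidate for that data structure is a hash table keyed by
  microdescriptor hash. The sketch below combines it with the hash check
  from the previous paragraph. MicrodescIndex and md_hash are names made
  up for this example, and SHA-256 is again only an assumption.

```python
import base64
import hashlib


def md_hash(body: str) -> str:
    """Base64 hash of a microdescriptor's concatenated elements.

    Assumption: SHA-256; the proposal does not name the hash function.
    """
    digest = hashlib.sha256(body.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


class MicrodescIndex:
    """Map microdescriptor hashes to their element text."""

    def __init__(self):
        self._by_hash = {}

    def add(self, expected_hash, body):
        """Store `body` only if it really hashes to `expected_hash`."""
        if md_hash(body) != expected_hash:
            return False
        self._by_hash[expected_hash] = body
        return True

    def get(self, wanted_hash):
        return self._by_hash.get(wanted_hash)

    def missing(self, consensus_hashes):
        """Hashes listed in the consensus that we don't hold yet."""
        return [h for h in consensus_hashes if h not in self._by_hash]
```

  A mirror would call add() with the hash from the consensus (or from the
  fetch URL) before serving anything, and a client would call missing()
  after loading its cache.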
3.3. Clients fetch them and cache them

  When a client gets a new consensus, it looks to see if there are any
  microdescriptors it needs to learn. If it needs to learn more than
  some threshold of the microdescriptors (half?), it requests 'all',
  else it requests only the missing ones.

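  The client's decision could be sketched like this. plan_fetch is a
  hypothetical name, and the default threshold of one half is just the
  tentative value floated above, applied as "strictly more than".

```python
def plan_fetch(consensus_hashes, cached_hashes, threshold=0.5):
    """Decide what to request after receiving a new consensus.

    Returns ("none", []), ("all", []), or ("some", [missing hashes]).
    """
    wanted = list(dict.fromkeys(consensus_hashes))  # dedupe, keep order
    missing = [h for h in wanted if h not in cached_hashes]
    if not missing:
        return ("none", [])
    if len(missing) > threshold * len(wanted):
        return ("all", [])  # cheaper, and reveals less about the cache
    return ("some", missing)
```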
  Clients maintain a cache of microdescriptors along with metadata like
  when each one was last referenced by a consensus. They keep a
  microdescriptor until it hasn't been mentioned in any consensus for a
  week.

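  A minimal sketch of such a cache, assuming timestamps are plain seconds
  and that "last referenced" is refreshed each time a consensus listing
  the hash arrives. The class and method names are illustrative only.

```python
WEEK = 7 * 24 * 60 * 60  # seconds


class MicrodescCache:
    """Client-side cache keyed by microdescriptor hash."""

    def __init__(self):
        self._entries = {}  # hash -> (body, last_listed timestamp)

    def store(self, md_hash, body, now):
        self._entries[md_hash] = (body, now)

    def note_consensus(self, consensus_hashes, now):
        """Refresh last_listed for every cached hash in the consensus."""
        for h in consensus_hashes:
            if h in self._entries:
                body, _ = self._entries[h]
                self._entries[h] = (body, now)

    def expire(self, now):
        """Drop entries not listed in any consensus for a week."""
        stale = [h for h, (_, last) in self._entries.items()
                 if now - last >= WEEK]
        for h in stale:
            del self._entries[h]
        return stale

    def __contains__(self, md_hash):
        return md_hash in self._entries
```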
3.3.1. Information leaks from clients

  If a client asks you for a set of microdescriptors, then you know she
  didn't have them cached before. How much does that leak? What about
  when we're all using our entry guards as directory guards, and we've
  seen that user make a bunch of circuits already?

  Fetching "all" when you need at least half is a good first-order fix,
  but might not be all there is to it.

  Another future option would be to fetch some of the microdescriptors
  anonymously (via a Tor circuit).

4. Transition and deployment

  Phase one, the directory authorities should start voting on
  microdescriptors and microdescriptor elements, and putting them in the
  consensus. This should happen during the 0.2.1.x series, and should
  be relatively easy to do.

  Phase two, directory mirrors should learn how to serve them, and learn
  how to read the consensus to find out what they should be serving. It
  would be great if we can squeeze this in during 0.2.1.x also, so once
  clients start to fetch them there will be many mirrors to choose from.

  (Are there reasonable ways to build only part of phase two in 0.2.1.x?)

  Phase three, clients should start fetching and caching them instead
  of normal descriptors. This should happen post 0.2.1.x.