  1. Filename: 158-microdescriptors.txt
  2. Title: Clients download consensus + microdescriptors
  3. Author: Roger Dingledine
  4. Created: 17-Jan-2009
  5. Status: Open
  6. 0. History
  7. 15 May 2009: Substantially revised based on discussions on or-dev
  8. from late January. Removed the notion of voting on how to choose
  9. microdescriptors; made it just a function of the consensus method.
  10. (This lets us avoid the possibility of "desynchronization.")
  11. Added suggestion to use a new consensus flavor. Specified use of
  12. SHA256 for new hashes. -nickm
  13. 1. Overview
  14. This proposal replaces section 3.2 of proposal 141, which was
  15. called "Fetching descriptors on demand". Rather than modifying the
  16. circuit-building protocol to fetch a server descriptor inline at each
  17. circuit extend, we instead put all of the information that clients need
  18. either into the consensus itself, or into a new set of data about each
  19. relay called a microdescriptor. The microdescriptor is a direct
  20. transform from the relay descriptor, so relays don't even need to know
  21. this is happening.
  22. Descriptor elements that are small and frequently changing should go
  23. in the consensus itself, and descriptor elements that are small and
  24. relatively static should go in the microdescriptor. If we ever end up
  25. with descriptor elements that aren't small yet clients need to know
  26. them, we'll need to resume considering some design like the one in
  27. proposal 141.
  28. Note also that any descriptor element which clients need to use to
  29. decide which servers to fetch info about, or which servers to fetch
  30. info from, needs to stay in the consensus.
  31. 2. Motivation
  32. See
  33. http://archives.seul.org/or/dev/Nov-2008/msg00000.html and
  34. http://archives.seul.org/or/dev/Nov-2008/msg00001.html and especially
  35. http://archives.seul.org/or/dev/Nov-2008/msg00007.html
  36. for a discussion of the options and why this is currently the best
  37. approach.
  38. 3. Design
  39. There are three pieces to the proposal. First, authorities will list in
  40. their votes (and thus in the consensus) the expected hash of the
  41. microdescriptor for each relay. Second, directory mirrors will serve
  42. microdescriptors. Third, clients will ask for them and cache them.
  43. 3.1. Consensus changes
  44. If the authorities choose a consensus method of a given version or
  45. later, a microdescriptor format is implicit in that version.
  46. A microdescriptor should in every case be a pure function of the
  47. router descriptor and the consensus method.
  48. In votes, we need to include the hash of each expected microdescriptor
  49. in the routerstatus section. I suggest a new "m" line for each stanza,
  50. with the base64 of the SHA256 hash of the router's microdescriptor.
  51. For every consensus method that an authority supports, it includes a
  52. separate "m" line in each router section of its vote, containing:
  53. "m" SP methods SP digest NL
  54. where methods is a comma-separated list of the consensus methods
  55. that the authority believes will produce "digest".
  56. (As with base64 encoding of SHA1 hashes in consensuses, let's
  57. omit the trailing =s)
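As an illustration of the vote "m" line format above, here is a minimal Python sketch. The function names and the choice to sort the method list are assumptions made for the example; the proposal only fixes the line layout, the use of SHA256, and the unpadded base64 encoding.

```python
import base64
import hashlib


def microdesc_digest_b64(microdesc: bytes) -> str:
    """Return the base64 SHA256 digest with trailing '=' padding removed."""
    digest = hashlib.sha256(microdesc).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def format_m_line(methods, microdesc: bytes) -> str:
    """Build a vote line of the form: "m" SP methods SP digest NL.

    `methods` is an iterable of integer consensus methods that are believed
    to produce this microdescriptor (sorting them is our own choice).
    """
    method_list = ",".join(str(m) for m in sorted(methods))
    return "m {} {}\n".format(method_list, microdesc_digest_b64(microdesc))
```

A SHA256 digest is 32 bytes, so its base64 form carries exactly one "=" of padding; the unpadded digest is therefore always 43 characters long.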
  58. The consensus microdescriptor-elements and "m" lines are then computed
  59. as described in Section 3.1.2 below.
  60. (This means we need a new consensus-method that knows
  61. how to compute the microdescriptor-elements and add "m" lines.)
  62. 3.1.1. Descriptor elements to include for now
  63. In the first version, the microdescriptor should contain the
  64. onion-key element and the family element from the router descriptor.
  65. 3.1.2. Computing consensus for microdescriptor-elements and "m" lines
  66. When we are generating a consensus, we use whichever m line
  67. unambiguously corresponds to the descriptor digest that will be
  68. included in the consensus. (If there are multiple m lines for that
  69. descriptor digest, we use whichever is most common. If they are
  70. equally common, we break ties in favor of the lexically
  71. earliest digest. Either way, we should log a warning: That's likely a
  72. bug.)
  73. The "m" lines in a consensus contain only the digest, not a list of
  74. consensus methods.
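The selection rule in this section can be sketched as follows. This is a hedged illustration, not the authorities' actual code: `choose_m_digest` is a hypothetical name, and it assumes the caller has already collected, from all votes, the digests of the "m" lines that apply to the chosen descriptor digest and consensus method. "Lexically earliest" is taken here to mean ordinary string ordering of the base64 text.

```python
import logging
from collections import Counter

log = logging.getLogger("consensus")


def choose_m_digest(candidates):
    """Pick the microdescriptor digest to list in the consensus.

    `candidates` holds one digest string per vote that supplied an
    applicable "m" line. Returns None when no vote supplied one.
    """
    if not candidates:
        return None
    counts = Counter(candidates)
    if len(counts) > 1:
        # Votes disagree about a value that should be a pure function of
        # the descriptor and the consensus method; that is likely a bug.
        log.warning("Conflicting m digests: %s", sorted(counts))
    best = max(counts.values())
    # Most common wins; ties go to the lexically earliest digest.
    return min(d for d, n in counts.items() if n == best)
```

The consensus "m" line would then carry only the returned digest, without the method list.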
  75. 3.1.3. A new flavor of consensus
  76. Rather than inserting "m" lines in the current consensus format,
  77. they should be included in a new consensus flavor (see proposal
  78. 162).
  79. This flavor can safely omit descriptor digests.
  80. We still need to decide whether to move ports into microdescriptors
  81. or not. In either case, they can be removed from the current "ns"
  82. flavor of consensus, since no current clients use them, and they
  83. take up about 5% of the compressed consensus.
  84. This new consensus flavor should be signed with the SHA256 signature
  85. format as documented in proposal 162.
  86. 3.2. Directory mirrors serve microdescriptors
  87. Directory mirrors should then read the microdescriptor-elements line
  88. from the consensus, and learn how to answer requests. (Directory mirrors
  89. continue to serve normal relay descriptors too, a) to serve old clients
  90. and b) to be able to construct microdescriptors on the fly.)
  91. The microdescriptors with base64 hashes <D1>,<D2>,<D3> should be available at:
  92. http://<hostname>/tor/micro/d/<D1>-<D2>-<D3>.z
  93. (We use base64 for size and for consistency with the consensus
  94. format. We use -s instead of +s to separate these items, since the + character is used in base64 encoding.)
  95. All the microdescriptors from the current consensus should also be
  96. available at:
  97. http://<hostname>/tor/micro/all.z
  98. so a client that's bootstrapping doesn't need to send a 70KB URL just
  99. to name every microdescriptor it's looking for.
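A small Python sketch of how a client might form these request paths is shown below. The helper name `micro_fetch_path` and the constant `ALL_PATH` are illustrative assumptions; only the URL layout comes from the text above.

```python
ALL_PATH = "/tor/micro/all.z"


def micro_fetch_path(digests):
    """Return the request path for the given unpadded base64 digests.

    Digests are joined with '-' because '+' can occur inside base64 text.
    """
    digests = list(digests)
    if not digests:
        raise ValueError("need at least one digest")
    return "/tor/micro/d/" + "-".join(digests) + ".z"
```

For example, `micro_fetch_path(["D1", "D2", "D3"])` yields `/tor/micro/d/D1-D2-D3.z`, to be appended to `http://<hostname>`.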
  100. Microdescriptors have no header or footer.
  101. The hash of the microdescriptor is simply the hash of the concatenated
  102. elements.
  103. Directory mirrors should check to make sure that the microdescriptors
  104. they're about to serve match the right hashes (either the hashes from
  105. the fetch URL or the hashes from the consensus, respectively).
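The hashing and checking steps above might look like the following sketch. It assumes the microdescriptor body is the plain byte concatenation of its elements (no header or footer) and that digests are compared in their unpadded base64 form; the function names are hypothetical.

```python
import base64
import hashlib


def build_microdesc(elements):
    """Concatenate the element byte strings; there is no header or footer."""
    return b"".join(elements)


def digest_b64(body: bytes) -> str:
    """Unpadded base64 of the SHA256 hash of the microdescriptor body."""
    raw = hashlib.sha256(body).digest()
    return base64.b64encode(raw).decode("ascii").rstrip("=")


def matches_digest(expected: str, body: bytes) -> bool:
    """True if `body` hashes to the digest named in the URL or consensus."""
    return digest_b64(body) == expected.rstrip("=")
```

A mirror could run `matches_digest` on each microdescriptor just before serving it, and drop or log any that fail.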
  106. We will probably want to consider some sort of smart data structure to
  107. be able to quickly convert microdescriptor hashes into the appropriate
  108. microdescriptor. Clients will want this anyway when they load their
  109. microdescriptor cache and want to match it up with the consensus to
  110. see what's missing.
  111. 3.3. Clients fetch them and cache them
  112. When a client gets a new consensus, it looks to see if there are any
  113. microdescriptors it needs to learn. If it needs to learn more than
  114. some threshold of the microdescriptors (half?), it requests 'all',
  115. else it requests only the missing ones. Clients MAY try to
  116. determine whether the upload bandwidth for listing the
  117. microdescriptors they want is more or less than the download
  118. bandwidth for the microdescriptors they do not want.
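The threshold rule can be sketched in Python as below. The default threshold of one half mirrors the tentative "(half?)" above and is not a settled value; `plan_fetch` and its return shape are assumptions made for illustration.

```python
def plan_fetch(consensus_digests, cached_digests, threshold=0.5):
    """Decide what to request after receiving a new consensus.

    Returns ("none", []), ("all", []), or ("some", [missing digests]).
    """
    wanted = set(consensus_digests)
    missing = wanted - set(cached_digests)
    if not missing:
        return ("none", [])
    if len(missing) > threshold * len(wanted):
        # Cheaper, and less revealing, to fetch everything at once.
        return ("all", [])
    return ("some", sorted(missing))
```

A client that wants the optional bandwidth comparison could replace the fixed threshold with an estimate of request-URL size versus the size of the unwanted microdescriptors.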
  119. Clients maintain a cache of microdescriptors, along with per-entry
  120. metadata such as when each was last referenced by a consensus and which
  121. identity key it corresponds to. They keep a microdescriptor
  122. until it hasn't been mentioned in any consensus for a week. Future
  123. clients might cache them for longer or shorter times.
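One possible shape for such a cache, keyed by digest so that it can be matched quickly against a consensus, is sketched below. The class and field names are hypothetical, timestamps are plain seconds supplied by the caller, and the one-week lifetime is the value suggested above.

```python
from dataclasses import dataclass

WEEK = 7 * 24 * 60 * 60  # seconds


@dataclass
class CacheEntry:
    body: bytes
    identity: str        # identity key this microdescriptor belongs to
    last_listed: float   # when a consensus last mentioned this digest


class MicrodescCache:
    def __init__(self):
        self.entries = {}  # digest -> CacheEntry

    def add(self, digest, body, identity, now):
        self.entries[digest] = CacheEntry(body, identity, now)

    def note_consensus(self, digests, now):
        """Refresh entries the consensus lists; return digests not cached."""
        missing = set()
        for d in digests:
            entry = self.entries.get(d)
            if entry is None:
                missing.add(d)
            else:
                entry.last_listed = now
        return missing

    def expire(self, now, max_age=WEEK):
        """Drop entries unlisted for longer than max_age; return their digests."""
        stale = sorted(d for d, e in self.entries.items()
                       if now - e.last_listed > max_age)
        for d in stale:
            del self.entries[d]
        return stale
```

The set returned by `note_consensus` is what the client would feed into its fetch decision.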
  124. 3.3.1. Information leaks from clients
  125. If a client asks you for a set of microdescs, then you know she didn't
  126. have them cached before. How much does that leak? What about when
  127. we're all using our entry guards as directory guards, and we've seen
  128. that user make a bunch of circuits already?
  129. Fetching "all" when you need at least half is a good first order fix,
  130. but might not be all there is to it.
  131. Another future option would be to fetch some of the microdescriptors
  132. anonymously (via a Tor circuit).
  133. 4. Transition and deployment
  134. Phase one, the directory authorities should start voting on
  135. microdescriptors and microdescriptor elements, and putting them in the
  136. consensus.
  137. Phase two, directory mirrors should learn how to serve them, and learn
  138. how to read the consensus to find out what they should be serving.
  139. Phase three, clients should start fetching and caching them instead
  140. of normal descriptors.