Filename: 140-consensus-diffs.txt
Title: Provide diffs between consensuses
Author: Peter Palfrader
Created: 13-Jun-2008
Status: Accepted
Target: 0.2.2.x

0. History

  22-May-2009: Restricted the ed format even more strictly for ease of
  implementation. -nickm

1. Overview.

  Tor clients and servers need a list of which relays are on the
  network. This list, the consensus, is created by authorities
  hourly, and clients fetch a copy of it, with some delay, hourly.

  This proposal suggests that clients, once they have a consensus,
  download diffs between consensuses instead of downloading a full
  consensus every hour.

2. Numbers

  After implementing proposal 138, which removes nodes that are not
  running from the list, a consensus document is about 92 kilobytes
  in size after compression.

  The diff between two consecutive consensuses, in ed format, is on
  average 13 kilobytes compressed.

3. Proposal

3.1 Clients

  If a client has a consensus that is recent enough, it SHOULD
  try to download a diff to get the latest consensus rather than
  fetching a full one.

  [XXX: what is recent enough?

        time delta in hours    size of compressed diff (bytes)
                0                       20
                1                     9650
                2                    17011
                3                    23150
                4                    29813
                5                    36079
                6                    39455
                7                    43903
                8                    48907
                9                    54549
               10                    60057
               11                    67810
               12                    71171
               13                    73863
               14                    76048
               15                    80031
               16                    84686
               17                    89862
               18                    94760
               19                    94868
               20                    94223
               21                    93921
               22                    92144
               23                    90228

        [ size of gzip compressed "diff -e" between the consensus on
          2008-06-01-00:00:00 and the following consensuses that day.
          Consensuses have been modified to exclude down routers per
          proposal 138. ]

   Data suggests that for the first few hours diffs are very useful,
   saving about 60% for the first three hours, 30% for the first 10,
   and almost nothing once we are past 16 hours.
  ]
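
  As a rough aid for the open question above, the following Python
  sketch computes the per-age savings implied by the table and the
  largest age for which a diff still saves a chosen fraction. It is
  only an illustration: the names are hypothetical, and it assumes
  that "about 92 kilobytes" means 92,000 bytes for the full consensus.

```python
# Gzip-compressed "diff -e" sizes in bytes, indexed by age in hours
# (copied from the table in section 3.1).
DIFF_BYTES = [
    20, 9650, 17011, 23150, 29813, 36079, 39455, 43903,
    48907, 54549, 60057, 67810, 71171, 73863, 76048, 80031,
    84686, 89862, 94760, 94868, 94223, 93921, 92144, 90228,
]

# Assumption: "about 92 kilobytes" is taken as 92,000 bytes.
FULL_CONSENSUS_BYTES = 92 * 1000


def savings(age_hours, full_bytes=FULL_CONSENSUS_BYTES):
    """Fraction of bytes saved by fetching a diff instead of a full consensus."""
    return 1.0 - DIFF_BYTES[age_hours] / full_bytes


def max_useful_age(min_savings, full_bytes=FULL_CONSENSUS_BYTES):
    """Largest age h such that every age in 1..h saves at least min_savings."""
    age = 0
    for hours in range(1, len(DIFF_BYTES)):
        if savings(hours, full_bytes) < min_savings:
            break
        age = hours
    return age


if __name__ == "__main__":
    for threshold in (0.5, 0.25, 0.0):
        print(threshold, max_useful_age(threshold))
```

  A client could use a threshold of this kind to decide when its
  consensus is too old for a diff to be worthwhile; the right
  threshold is left open here, as in the note above.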

3.2 Servers

  Directory authorities and servers need to keep up to X [XXX: depends
  on how long clients try to download diffs per above] old consensus
  documents so they can build diffs. They should offer a diff to the
  most recent consensus at the URL

  http://tor.noreply.org/tor/status-vote/current/consensus/diff/<HASH>/<FPRLIST>

  where HASH is the full digest of the consensus the client currently
  has, and FPRLIST is a list of (abbreviated) fingerprints of
  authorities the client trusts.

  Servers will only return a consensus if more than half of the requested
  authorities have signed the document; otherwise, a 404 error will be sent
  back. The fingerprints can be shortened to a length of any multiple of
  two, using only the leftmost part of the encoded fingerprint. Tor uses
  3 bytes (6 hex characters) of the fingerprint. (This is just like the
  conditional consensus downloads that Tor supports starting with
  0.1.2.1-alpha.)
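
  The request path and the "more than half" rule could be sketched in
  Python as follows. This is an illustration, not a reference
  implementation: the function names are hypothetical, the
  fingerprints and digest in the usage lines are made up, and joining
  the fingerprints with "+" is an assumption that this text does not
  spell out.

```python
def abbreviate_fingerprint(fingerprint_hex, hex_chars=6):
    """Keep the leftmost hex_chars characters (an even number) of a fingerprint."""
    if hex_chars <= 0 or hex_chars % 2 != 0:
        raise ValueError("abbreviation length must be a positive multiple of two")
    return fingerprint_hex.replace(" ", "").upper()[:hex_chars]


def build_diff_path(consensus_digest_hex, authority_fingerprints, hex_chars=6):
    """Build the path part of the diff URL for the given digest and authorities."""
    # Assumption: fingerprints in FPRLIST are separated by "+".
    fprlist = "+".join(
        abbreviate_fingerprint(fpr, hex_chars) for fpr in authority_fingerprints
    )
    return "/tor/status-vote/current/consensus/diff/%s/%s" % (
        consensus_digest_hex,
        fprlist,
    )


def enough_signatures(requested_prefixes, signer_fingerprints):
    """True if more than half of the requested authorities signed the document."""
    signers = [fpr.upper() for fpr in signer_fingerprints]
    matched = sum(
        1
        for prefix in requested_prefixes
        if any(signer.startswith(prefix.upper()) for signer in signers)
    )
    return 2 * matched > len(requested_prefixes)


# Made-up example values, for illustration only.
FPR_A = "0123456789ABCDEF0123456789ABCDEF01234567"
FPR_B = "FEDCBA9876543210FEDCBA9876543210FEDCBA98"
DIGEST = "ABCD" * 10

print(build_diff_path(DIGEST, [FPR_A, FPR_B]))
print(enough_signatures(["012345", "FEDCBA", "111111"], [FPR_A, FPR_B]))
```

  A server that finds enough_signatures() false for the requested
  list would answer with a 404, as described above.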

  If a server cannot offer a diff from the consensus identified by the
  hash but has a current consensus, it MUST return the full consensus.

  [XXX: what should we do when the client already has the latest
  consensus?  I can think of the following options:
    - send back 3xx not modified
    - send back 200 ok and an empty diff
    - send back 404 nothing newer here.

    I currently lean towards the empty diff.]
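
  The server-side choice described in this section could look roughly
  like the Python sketch below. The function name and the returned
  labels are hypothetical, and the "empty-diff" branch simply follows
  the option the note above leans towards; that question is still
  open.

```python
def choose_response(client_digest, latest_digest, stored_digests):
    """Decide what to send for a diff request.

    client_digest:  digest of the consensus the client says it has
    latest_digest:  digest of the server's current consensus
    stored_digests: digests of the old consensuses the server has kept
    """
    if client_digest == latest_digest:
        # Open question in the text; the empty diff is the favoured option.
        return "empty-diff"
    if client_digest in stored_digests:
        # The server still has the client's consensus and can build a diff.
        return "diff"
    # No diff possible, but a current consensus exists: send it in full.
    return "full-consensus"


print(choose_response("old1", "new", {"old1", "old2"}))
```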

4. Diff Format

  Diffs start with the token "network-status-diff-version" followed by a
  space and the version number, currently "1".

  If a document does not start with network-status-diff, it is assumed
  to be a full consensus download and would therefore currently start
  with "network-status-version 3".
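
  A client could tell the two document types apart along these lines.
  This is a minimal Python sketch with a hypothetical function name;
  it follows the rule above that anything not starting with the diff
  token is treated as a full consensus.

```python
DIFF_TOKEN = "network-status-diff-version"


def classify_document(text):
    """Return ("diff", version) or ("consensus", None) based on the first line."""
    first_line = text.split("\n", 1)[0]
    if first_line.startswith(DIFF_TOKEN + " "):
        # Everything after the single space is the diff format version.
        version = first_line[len(DIFF_TOKEN) + 1:]
        return ("diff", version)
    # Not a diff: assumed to be a full consensus download.
    return ("consensus", None)


print(classify_document("network-status-diff-version 1\n1d\n"))
print(classify_document("network-status-version 3\n"))
```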

  Following the network-status-diff header line is a diff, or patch, in
  limited ed format. We choose this format because it is easy to create
  and process with standard tools (patch, diff -e, ed). This will help
  us in developing and testing this proposal and it should make future
  debugging easier.

  [ If at one point in the future we decide that the space benefits from
  a custom diff format outweigh these benefits, we can always
  introduce a new diff format and offer it at, for instance,
  ../diff2/... ]

  We support the following ed commands, each on a line by itself:
   - "<n1>d"          Delete line n1
   - "<n1>,<n2>d"     Delete lines n1 through n2, inclusive
   - "<n1>c"          Replace line n1 with the following block
   - "<n1>,<n2>c"     Replace lines n1 through n2, inclusive, with the
                      following block.
   - "<n1>a"          Append the following block after line n1.
   - "a"              Append the following block after the current line.
   - "s/.//"          Remove the first character in the current line.

  Note that line numbers always apply to the file after all previous
  commands have already been applied.

  The commands MUST apply to the file from back to front, such that
  lines are only ever referred to by their position in the original
  file.

  The "current line" is either the first line of the file, if this is
  the first command, the last line of a block we added in an append or
  change command, or the line immediately following a set of lines we just
  deleted (or the last line of the file if there are no lines after
  that).

  The replace and append commands take blocks. These blocks are simply
  appended to the diff after the line with the command. A line with
  just a period (".") ends the block (and is not part of the lines
  to add). Note that it is impossible to insert a line with just
  a single dot. The recommended procedure is to insert a line with
  two dots, then remove the first character of that line using s/.//.
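
  To make the rules in this section concrete, here is a small Python
  sketch that applies the body of such a diff (the lines after the
  header) to a list of lines. It is an illustration under stated
  assumptions, not Tor's implementation: the function name is
  hypothetical, "0a" is accepted for inserting at the top as in
  ordinary ed, and the back-to-front rule is enforced by requiring
  each numbered command to end before the start of the previous one.

```python
import re

# Matches "<n1>d", "<n1>,<n2>d", "<n1>c", "<n1>,<n2>c" and "<n1>a".
_NUMBERED = re.compile(r"^(\d+)(?:,(\d+))?([acd])$")


def apply_ed_diff(original_lines, diff_lines):
    """Apply the restricted ed commands to a list of lines (no newlines)."""
    lines = list(original_lines)
    current = 1          # 1-based "current line"; starts at the first line
    prev_start = None    # first line touched by the previous numbered command
    commands = iter(diff_lines)

    def read_block():
        # Collect lines up to, but not including, the terminating ".".
        block = []
        for text in commands:
            if text == ".":
                return block
            block.append(text)
        raise ValueError("block is missing its terminating '.'")

    for cmd in commands:
        if cmd == "s/.//":
            if not (1 <= current <= len(lines)) or not lines[current - 1]:
                raise ValueError("no character to remove on current line")
            lines[current - 1] = lines[current - 1][1:]
            continue
        if cmd == "a":
            op, start, end = "a", current, current
        else:
            match = _NUMBERED.match(cmd)
            if match is None or (match.group(2) and match.group(3) == "a"):
                raise ValueError("unsupported command: %r" % cmd)
            start = int(match.group(1))
            end = int(match.group(2)) if match.group(2) else start
            op = match.group(3)
            # Assumed check for the back-to-front rule: each numbered
            # command must end before the previous one started.
            if prev_start is not None and end >= prev_start:
                raise ValueError("commands are not ordered back to front")
            prev_start = start
        lowest = 0 if op == "a" else 1
        if not (lowest <= start <= end <= len(lines)):
            raise ValueError("line number out of range: %r" % cmd)
        if op == "d":
            del lines[start - 1:end]
            current = min(start, len(lines))
        elif op == "c":
            block = read_block()
            lines[start - 1:end] = block
            current = start - 1 + len(block)
        else:  # op == "a"
            block = read_block()
            lines[start:start] = block
            current = start + len(block)
    return lines


old = ["one", "two", "three", "four", "five"]
diff = ["5a", "..", ".", "s/.//", "3,4c", "X", ".", "1d"]
print(apply_ed_diff(old, diff))  # -> ['two', 'X', 'five', '.']
```

  The usage lines show the two-dot procedure: "5a" appends "..", and
  the following "s/.//" turns it into a line holding a single dot.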