175-automatic-node-promotion.txt 10 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239
  1. Filename: xxx-automatic-node-promotion.txt
  2. Title: Automatically promoting Tor clients to nodes
  3. Author: Steven Murdoch
  4. Created: 12-Mar-2010
  5. Status: Draft
  6. Target:
  7. 1. Overview
  8. This proposal describes how Tor clients could determine when they
  9. have sufficient bandwidth capacity and are sufficiently reliable to
  10. become either bridges or Tor relays. When they meet this
  11. criteria, they will automatically promote themselves, based on user
  12. preferences. The proposal also defines the new controller messages
  13. and options which will control this process.
  14. Note that for the moment, only transitions between client and
  15. bridge are being considered. Transitions to public relay will
  16. be considered at a future date, but will use the same
  17. infrastructure for measuring capacity and reliability.
  18. 2. Motivation and history
  19. Tor has a growing user-base and one of the major impediments to the
  20. quality of service offered is the lack of network capacity. This is
  21. particularly the case for bridges, because these are gradually
  22. being blocked, and thus no longer of use to people within some
  23. countries. By automatically promoting Tor clients to bridges, and
  24. perhaps also to full public relays, this proposal aims to solve
  25. these problems.
  26. Only Tor clients which are sufficiently useful should be promoted,
  27. and the process of determining usefulness should be performed
  28. without reporting the existence of the client to the central
  29. authorities. The criteria used for determining usefulness will be
  30. in terms of bandwidth capacity and uptime, but parameters should be
  31. specified in the directory consensus. State stored at the client
  32. should be in no more detail than necessary, to prevent sensitive
  33. information being recorded.
  34. 3. Design
  35. 3.x Opt-in state model
  36. Tor can be in one of five node-promotion states:
  37. - off (O): Currently a client, and will stay as such
  38. - auto (A): Currently a client, but will consider promotion
  39. - bridge (B): Currently a bridge, and will stay as such
  40. - auto-bridge (AB): Currently a bridge, but will consider promotion
  41. - relay (R): Currently a public relay, and will stay as such
  42. The state can be fully controlled from the configuration file or
  43. controller, but the normal state transitions are as follows:
  44. Any state -> off: User has opted out of node promotion
  45. Off -> any state: Only permitted with user consent
  46. Auto -> auto-bridge: Tor has detected that it is sufficiently
  47. reliable to be a *bridge*
  48. Auto -> bridge: Tor has detected that it is sufficiently reliable
  49. to be a *relay*, but the user has chosen to remain a *bridge*
  50. Auto -> relay: Tor has detected that it is sufficiently reliable
  51. to be *relay*, and will skip being a *bridge*
  52. Auto-bridge -> relay: Tor has detected that it is sufficiently
  53. reliable to be a *relay*
  54. Note that this model does not support automatic demotion. If this
  55. is desirable, there should be some memory as to whether the
  56. previous state was relay, bridge, or auto-bridge. Otherwise the
  57. user may be prompted to become a relay, although he has opted to
  58. only be a bridge.
  59. 3.x User interaction policy
  60. There are a variety of options in how to involve the user into the
  61. decision as to whether and when to perform node promotion. The
  62. choice also may be different when Tor is running from Vidalia (and
  63. thus can readily prompt the user for information), and standalone
  64. (where Tor can only log messages, which may or may not be read).
  65. The option requiring minimal user interaction is to automatically
  66. promote nodes according to reliability, and allow the user to opt
  67. out, by changing settings in the configuration file or Vidalia user
  68. interface.
  69. Alternatively, if a user interface is available, Tor could prompt
  70. the user when it detects that a transition is available, and allow
  71. the user to choose which of the available options to select. If
  72. Vidalia is not available, it still may be possible to solicit an
  73. email address on install, and contact the operator to ask whether
  74. a transition to bridge or relay is permitted.
  75. Finally, Tor could by default not make any transition, and the user
  76. would need to opt in by stating the maximum level (bridge or
  77. relay) to which the node may automatically promote itself.
  78. 3.x Performance monitoring model
  79. To prevent a large number of clients activating as relays, but
  80. being too unreliable to be useful, clients should measure their
  81. performance. If this performance meets a parameterized acceptance
  82. criteria, a client should consider promotion. To measure
  83. reliability, this proposal adopts a simple user model:
  84. - A user decides to use Tor at times which follow a Poisson
  85. distribution
  86. - At each time, the user will be happy if the bridge chosen has
  87. adequate bandwidth and is reachable
  88. - If the chosen bridge is down or slow too many times, the user
  89. will consider Tor to be bad
  90. If we additionally assume that the recent history of relay
  91. performance matches the current performance, we can measure
  92. reliability by simulating this simple user.
  93. The following parameters are distributed to clients in the
  94. directory consensus:
  95. - min_bandwidth: Minimum self-measured bandwidth for a node to be
  96. considered useful, in bytes per second
  97. - check_period: How long, in seconds, to wait between checking
  98. reachability and bandwidth (on average)
  99. - num_samples: Number of recent samples to keep
  100. - num_useful: Minimum number of recent samples where the node was
  101. reachable and had at least min_bandwidth capacity, for a client
  102. to consider promoting to a bridge
  103. A different set of parameters may be used for considering when to
  104. promote a bridge to a full relay, but this will be the subject of a
  105. future revision of the proposal.
  106. 3.x Performance monitoring algorithm
  107. The simulation described above can be implemented as follows:
  108. Every 60 seconds:
  109. 1. Tor generates a random floating point number x in
  110. the interval [0, 1).
  111. 2. If x > (1 / (check_period / 60)) GOTO end; otherwise:
  112. 3. Tor sets the value last_check to the current_time (in seconds)
  113. 4. Tor measures reachability
  114. 5. If the client is reachable, Tor measures its bandwidth
  115. 6. If the client is reachable and the bandwidth is >=
  116. min_bandwidth, the test has succeeded, otherwise it has failed.
  117. 7. Tor adds the test result to the end of a ring-buffer containing
  118. the last num_samples results: measurement_results
  119. 8. Tor saves last_check and measurements_results to disk
  120. 9. If the length of measurements_results == num_samples and
  121. the number of successes >= num_useful, Tor should consider
  122. promotion to a bridge
  123. end.
  124. When Tor starts, it must fill in the samples for which it was not
  125. running. This can only happen once the consensus has downloaded,
  126. because the value of check_period is needed.
  127. 1. Tor generates a random number y from the Poisson distribution [1]
  128. with lambda = (current_time - last_check) * (1 / check_period)
  129. 2. Tor sets the value last_check to the current_time (in seconds)
  130. 3. Add y test failures to the ring buffer measurements_results
  131. 4. Tor saves last_check and measurements_results to disk
  132. In this way, a Tor client will measure its bandwidth and
  133. reachability every check_period seconds, on average. Provided
  134. check_period is sufficiently greater than a minute (say, at least an
  135. hour), the times of check will follow a Poisson distribution. [2]
  136. While this does require that Tor does record the state of a client
  137. over time, this does not leak much information. Only a binary
  138. reachable/non-reachable is stored, and the timing of samples becomes
  139. increasingly fuzzy as the data becomes less recent.
  140. On IP address changes, Tor should clear the ring-buffer, because
  141. from the perspective of users with the old IP address, this node
  142. might as well be a new one with no history. This policy may change
  143. once we start allowing the bridge authority to hand out new IP
  144. addresses given the fingerprint.
  145. 3.x Bandwidth measurement
  146. Tor needs to measure its bandwidth to test the usefulness as a
  147. bridge. A non-intrusive way to do this would be to passively measure
  148. the peak data transfer rate since the last reachability test. Once
  149. this exceeds min_bandwidth, Tor can set a flag that this node
  150. currently has sufficient bandwidth to pass the bandwidth component
  151. of the upcoming performance measurement.
  152. For the first version we may simply skip the bandwidth test,
  153. because the existing reachability test sends 500 kB over several
  154. circuits, and checks whether the node can transfer at least 50
  155. kB/s. This is probably good enough for a bridge, so this test
  156. might be sufficient to record a success in the ring buffer.
  157. 3.x New options
  158. 3.x New controller message
  159. 4. Migration plan
  160. We should start by setting a high bandwidth and uptime requirement
  161. in the consensus, so as to avoid overloading the bridge authority
  162. with too many bridges. Once we are confident our systems can scale,
  163. the criteria can be gradually shifted down to gain more bridges.
  164. 5. Related proposals
  165. 6. Open questions:
  166. - What user interaction policy should we take?
  167. - When (if ever) should we turn a relay into an exit relay?
  168. - What should the rate limits be for auto-promoted bridges/relays?
  169. Should we prompt the user for this?
  170. - Perhaps the bridge authority should tell potential bridges
  171. whether to enable themselves, by taking into account whether
  172. their IP address is blocked
  173. - How do we explain the possible risks of running a bridge/relay
  174. * Use of bandwidth/congestion
  175. * Publication of IP address
  176. * Blocking from IRC (even for non-exit relays)
  177. - What feedback should we give to bridge relays, to encourage then
  178. e.g. number of recent users (what about reserve bridges)?
  179. - Can clients back-off from doing these tests (yes, we should do
  180. this)
  181. [1] For algorithms to generate random numbers from the Poisson
  182. distribution, see: http://en.wikipedia.org/wiki/Poisson_distribution#Generating_Poisson-distributed_random_variables
  183. [2] "The sample size n should be equal to or larger than 20 and the
  184. probability of a single success, p, should be smaller than or equal to
  185. .05. If n >= 100, the approximation is excellent if np is also <= 10."
  186. http://www.itl.nist.gov/div898/handbook/pmc/section3/pmc331.htm (e-Handbook of Statistical Methods)
  187. % vim: spell ai et: