xxx-bridge-disbursement.txt 6.4 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174
  1. How to hand out bridges.
  2. Divide bridges into 'strategies' as they come in. Do this uniformly
  3. at random for now.
  4. For each strategy, we'll hand out bridges in a different way to
  5. clients. This document describes two strategies: email-based and
  6. IP-based.
  7. 0. Notation:
  8. HMAC(k,v) : an HMAC of v using the key k.
  9. A|B: The string A concatenated with the string B.
  10. 1. Email-based.
  11. Goal: bootstrap based on one or more popular email service's sybil
  12. prevention algorithms.
  13. Parameters:
  14. HMAC -- an HMAC function
  15. P -- a time period
  16. K -- the number of bridges to send in a period.
  17. Setup: Generate two nonces, N and M.
  18. As bridges arrive, put them into a ring according to HMAC(N,ID)
  19. where ID is the bridges's identity digest.
  20. Divide time into divisions of length P.
  21. When we get an email:
  22. If it's not from a supported email service, reject it.
  23. If we already sent a response to that email address (normalized)
  24. in this period, send _exactly_ the same response.
  25. If it is from a supported service, generate X = HMAC(M,PS|E) where E
  26. is the lowercased normalized email address for the user, and
  27. where PS is the start of the currrent period. Send
  28. the first K bridges in the ring after point X.
  29. [If we want to make sure that repeat queries are given exactly the
  30. same results, then we can't let the ring change during the
  31. time period. For a long time period like a month, that's quite a
  32. hassle. How about instead just keeping a replay cache of addresses
  33. that have been answered, and sending them a "sorry, you already got
  34. your addresses for the time period; perhaps you should try these
  35. other fine distribution strategies while you wait?" response? This
  36. approach would also resolve the "Make sure you can't construct a
  37. distinct address to match an existing one" note below. -RD]
  38. [I think, if we get a replay, we need to sen back the same
  39. answer as we did the first time, not say "try again."
  40. Otherwise we need to worry that an attacker can keep people
  41. from getting bridges by preemtively asking for them,
  42. or that an attacker may force them to prove they haven't
  43. gotten any bridges by asking. -NM]
  44. [While we're at it, if we do the replay cache thing and don't need
  45. repeatable answers, we could just pick K random answers from the
  46. pool. Is it beneficial that a bridge user who knows about a clump of
  47. nodes will be sharing them with other users who know about a similar
  48. (overlapping) clump? One good aspect is against an adversary who
  49. learns about a clump this way and watches those bridges to learn
  50. other users and discover *their* bridges: he doesn't learn about
  51. as many new bridges as he might if they were randomly distributed.
  52. A drawback is against an adversary who happens to pick two email
  53. addresses in P that include overlapping answers: he can measure
  54. the difference in clumps and estimate how quickly the bridge pool
  55. is growing. -RD]
  56. [Random is one more darn thing to implement; rings are already
  57. there. -NM]
  58. [If we make the period P be mailbox-specific, and make it a random
  59. value around some mean, then we make it harder for an attacker to
  60. know when to try using his small army of gmail addresses to gather
  61. another harvest. But we also make it harder for users to know when
  62. they can try again. -RD]
  63. [Letting the users know about when they can try again seems
  64. worthwhile. Otherwise users and attackers will all probe and
  65. probe and probe until they get an answer. No additional
  66. security will be achieved, but bandwidth will be lost. -NM]
  67. To normalize an email address:
  68. Start with the RFC822 address. Consider only the mailbox {???}
  69. portion of the address (username@domain). Put this into lowercase
  70. ascii.
  71. Questions:
  72. What to do with weird character encodings? Look up the RFC.
  73. Notes:
  74. Make sure that you can't force a single email address to appear
  75. in lots of different ways. IOW, if nickm@freehaven.net and
  76. NICKM@freehaven.net aren't treated the same, then I can get lots
  77. more bridges than I should.
  78. Make sure you can't construct a distinct address to match an
  79. existing one. IOW, if we treat nickm@X and nickm@Y as the same
  80. user, then anybody can register nickm@Z and use it to tell which
  81. bridges nickm@X got (or would get).
  82. Make sure that we actually check headers so we can't be trivially
  83. used to spam people.
  84. 2. IP-based.
  85. Goal: avoid handing out all the bridges to users in a similar IP
  86. space and time.
  87. Parameters:
  88. T_Flush -- how long it should take a user on a single network to
  89. see a whole cluster of bridges.
  90. N_C
  91. K -- the number of bridges we hand out in response to a single
  92. request.
  93. Setup: using an AS map or a geoip map or some other flawed input
  94. source, divide IP space into "areas" such that surveying a large
  95. collection of "areas" is hard. For v0, use /24 address blocks.
  96. Group areas into N_C clusters.
  97. Generate secrets L, M, N.
  98. Set the period P such that P*(bridges-per-cluster/K) = T_flush.
  99. Don't set P to greater than a week, or less than three hours.
  100. When we get a bridge:
  101. Based on HMAC(L,ID), assign the bridge to a cluster. Within each
  102. cluster, keep the bridges in a ring based on HMAC(M,ID).
  103. [Should we re-sort the rings for each new time period, so the ring
  104. for a given cluster is based on HMAC(M,PS|ID)? -RD]
  105. When we get a connection:
  106. If it's http, redirect it to https.
  107. Let area be the incoming IP network. Let PS be the current
  108. period. Compute X = HMAC(N, PS|area). Return the next K bridges
  109. in the ring after X.
  110. [Don't we want to compute C = HMAC(key, area) to learn what cluster
  111. to answer from, and then X = HMAC(key, PS|area) to pick a point in
  112. that ring? -RD]
  113. Need to clarify that some HMACs are for rings, and some are for
  114. partitions. How rings scale is clear. How do we grow the number of
  115. partitions? Looking at successive bits from the HMAC output is one way.
  116. 3. Open issues
  117. Denial of service attacks
  118. A good view of network topology
  119. at some point we should learn some reliability stats on our bridges. when
  120. we say above 'give out k bridges', we might give out 2 reliable ones and
  121. k-2 others. we count around the ring the same way we do now, to find them.