Filename: xxx-crypto-migration.txt

Title: Initial thoughts on migrating Tor to new cryptography
Author: Nick Mathewson
Created: 12 December 2010
1. Introduction

   Tor currently uses AES-128, RSA-1024, and SHA1.  Even though these
   ciphers were a decent choice back in 2003, and even though attacking
   these algorithms is by no means the best way for a well-funded
   adversary to attack users (correlation attacks are still cheaper,
   even with pessimistic assumptions about the security of each
   cipher), we will want to move to better algorithms in the future.
   Indeed, if migrating to a new ciphersuite were simple, we would
   probably have already moved to RSA-2048/AES-256/SHA256 or something
   like that.

   So it's a good idea to start figuring out how we can move to better
   ciphers.  Unfortunately, this is a bit nontrivial, so before we
   start doing the design work here, we should start by examining the
   issues involved.  Robert Ransom and I both decided to spend this
   weekend writing up documents of this type so that we can see how
   much two people working independently agree on.  I know more Tor
   than Robert; Robert knows far more cryptography than I do.  With
   luck we'll complement each other's work nicely.

   A note on scope: This document WILL NOT attempt to pick a new
   cipher or set of ciphers.  Instead, it's about how to migrate to
   new ciphers in general.  Any algorithms mentioned other than those
   we use today are just for illustration.

   Also, I don't much consider the importance of updating each
   particular usage; only the methods that you'd use to do it.

   Also, this isn't a complete proposal.
2. General principles and tricks

   Before I get started, let's talk about some general design issues.

2.1. Many algorithms or few?

   Protocols like TLS and OpenPGP allow a wide choice of cryptographic
   algorithms; so long as the sender and receiver (or the responder
   and initiator) have at least one mutually acceptable algorithm,
   they can converge upon it and send each other messages.

   This isn't the best choice for anonymity designs.  If two clients
   support a different set of algorithms, then an attacker can tell
   them apart.  A protocol with N ciphersuites would in principle
   split clients into 2**N-1 sets.  (In practice, nearly all users
   will use the default, and most users who choose _not_ to use the
   default will do so without considering the loss of anonymity.  See
   "Anonymity Loves Company: Usability and the Network Effect".)

   On the other hand, building only one ciphersuite into Tor has a
   flaw of its own: it has proven difficult to migrate to another one.
   So perhaps instead of specifying only a single new ciphersuite, we
   should specify more than one, with plans to switch over (based on a
   flag in the consensus or some other secure signal) once the first
   choice of algorithms starts looking iffy.  This switch-based
   approach would seem especially easy for parameterizable stuff like
   key sizes.
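   To make the switch-based idea concrete, here is a minimal sketch in
   Python.  The "preferred-suite" consensus parameter and the suite
   definitions are made up for illustration; the point is only that
   every client derives the same choice from a shared secure signal
   rather than from local preference:

```python
# Sketch only: "preferred-suite" is a hypothetical consensus
# parameter, and these suite definitions are invented for
# illustration.

SUITES = {
    1: {"cipher": "AES-128-CTR", "pk_bits": 1024, "digest": "sha1"},
    2: {"cipher": "AES-256-CTR", "pk_bits": 2048, "digest": "sha256"},
}

def pick_suite(consensus_params, supported=(1, 2)):
    """Pick a ciphersuite based on the consensus, so that all clients
    switch together instead of partitioning themselves by individual
    preference."""
    preferred = consensus_params.get("preferred-suite", 1)
    if preferred in supported:
        return SUITES[preferred]
    return SUITES[1]  # unknown value: fall back to the original suite
```

   Until the consensus sets the parameter, everyone uses suite 1;
   once it flips, everyone who can switches at once.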
2.2. Waiting for old clients and servers to upgrade

   The easiest way to implement a shift in algorithms would be to
   declare a "flag day": once we have the new versions of the
   protocols implemented, pick a day by which everybody must upgrade
   to the new software.  Before this day, the software would have the
   old behavior; after this day, it would use the improved behavior.

   Tor tries to avoid flag days whenever possible; they have
   well-known issues.  First, since a number of our users don't
   automatically update, it can take a while for people to upgrade to
   new versions of our software.  Second and more worryingly, it's
   hard to get adequate testing for new behavior that is
   off-by-default.  Flag days in other systems have been known to
   leave whole networks more or less inoperable for months; we should
   not trust in our skill to avoid similar problems.

   So if we're avoiding flag days, what can we do?

     * We can add _support_ for new behavior early, and have clients
       use it where it's available.  (Clients know the advertised
       versions of the Tor servers they use -- but see 2.3 below for a
       danger here, and 2.4 for a bigger danger.)

     * We can remove misfeatures that _prevent_ deployment of new
       behavior.  For instance, if a certain key length has an
       arbitrary 1024-bit limit, we can remove that arbitrary
       limitation.

     * Once an optional new behavior is ubiquitous enough, the
       authorities can stop accepting descriptors from servers that do
       not have it until they upgrade.

   It is far easier to remove arbitrary limitations than to make other
   changes; such changes are generally safe to back-port to older
   stable release series.  But in general, it's much better to avoid
   any plans that require waiting for any version of Tor to no longer
   be in common use: a stable release can take on the order of 2.5
   years to start dropping off the radar.  Thandy might fix that, but
   even if a perfect Thandy release comes out tomorrow, we'll still
   have lots of older clients and relays not using it.

   We'll have to approach the migration problem on a case-by-case
   basis as we consider the algorithms used by Tor and how to change
   them.
2.3. Early adopters and other partitioning dangers

   It's pretty much unavoidable that clients running software that
   speaks the new version of any protocol will be distinguishable from
   those that cannot speak the new version.  This is inevitable,
   though we could try to minimize the number of such partitioning
   sets by having features turned on in the same release rather than
   one-at-a-time.

   Another option here is to have new protocols controlled by a
   configuration tri-state with values "on", "off", and "auto".  The
   "auto" value means to look at the consensus to decide whether to
   use the feature; the other two values are self-explanatory.  We'd
   ship clients with the feature set to "auto" by default, with people
   only using "on" for testing.

   If we're worried about early client-side implementations of a
   protocol turning out to be broken, we can have the consensus value
   say _which_ versions should turn on the protocol.
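   A sketch of the tri-state logic described above.  The consensus
   keys used here ("feature-enabled", "enabled-versions") and the
   version string are illustrative stand-ins, not real Tor fields:

```python
MY_VERSION = "0.2.3.1"  # hypothetical version string for this client

def use_feature(setting, consensus):
    """Resolve a tri-state feature switch.  "on" and "off" are
    explicit overrides (mostly for testing); "auto", the shipped
    default, defers to the consensus."""
    if setting == "on":
        return True
    if setting == "off":
        return False
    # "auto": obey the consensus.
    if not consensus.get("feature-enabled", False):
        return False
    # Optionally, the consensus can restrict the feature to versions
    # whose implementations are believed to be non-broken.
    ok_versions = consensus.get("enabled-versions")
    if ok_versions is not None:
        return MY_VERSION in ok_versions
    return True
```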
2.4. Avoid whole-circuit switches

   One risky kind of protocol migration is a feature that gets used
   only when all the routers in a circuit support it.  If such a
   feature is implemented by few relays, then each relay learns a lot
   about the rest of the path by seeing it used.  On the other hand,
   if the feature is implemented by most relays, then a relay learns a
   lot about the rest of the path when the feature is *not* used.

   It's okay to have a feature that can only be used if two
   consecutive routers in the path support it: each router knows the
   ones adjacent to it, after all, so knowing what version of Tor
   they're running is no big deal.
2.5. The Second System Effect rears its ugly head

   Any attempt at improving Tor's crypto is likely to involve changes
   throughout the Tor protocol.  We should be aware of the risks of
   falling into what Fred Brooks called the "Second System Effect":
   when redesigning a fielded system, it's always tempting to try to
   shovel in every possible change that one ever wanted to make to it.

   This is a fine time to make parts of our protocol that weren't
   previously versionable into ones that are easier to upgrade in the
   future.  This probably _isn't_ the time to redesign every aspect of
   the Tor protocol that anybody finds problematic.
2.6. Low-hanging fruit and well-lit areas

   Not all parts of Tor are tightly coupled.  If it's possible to
   upgrade different parts of the system at different rates from one
   another, we should consider doing the easier parts earlier.

   But remember the story of the policeman who finds a drunk under a
   streetlamp, staring at the ground?  The cop asks, "What are you
   doing?"  The drunk says, "I'm looking for my keys!"  "Oh, did you
   drop them around here?" says the policeman.  "No," says the drunk,
   "But the light is so much better here!"

   Or less proverbially: simply because a change is easiest does not
   mean it is the best use of our time.  We should avoid getting
   bogged down solving the _easy_ aspects of our system unless they
   happen also to be _important_.
2.7. Nice safe boring codes

   Let's avoid, to the extent that we can:

     - being the primary user of any cryptographic construction or
       protocol.
     - anything that hasn't gotten much attention in the literature.
     - anything we would have to implement from scratch.
     - anything without a nice BSD-licensed C implementation.

   Sometimes we'll have the choice of a more efficient algorithm or a
   more boring & well-analyzed one.  We should not even consider
   trading conservative design for efficiency unless we are firmly in
   the critical path.
2.8. Key restrictions

   Our spec says that RSA exponents should be 65537, but our code
   never checks for that.  If we want to bolster resistance against
   collision attacks, we could check this requirement.  To the best of
   my knowledge, nothing violates it except for tools like "shallot"
   that generate cute memorable .onion names.  If we want to be nice
   to shallot users, we could check the requirement for everything
   *except* hidden service identity keys.
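   The check itself would be trivial.  A sketch, assuming we already
   have the public exponent as an integer and a flag telling us
   whether the key is a hidden service identity key:

```python
def exponent_ok(e, is_hidden_service_key=False):
    """Enforce the spec's e == 65537 requirement, with the carve-out
    suggested above for hidden service identity keys, so that
    vanity-name generators like shallot keep working."""
    if is_hidden_service_key:
        return True
    return e == 65537
```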
3. Aspects of Tor's cryptography, and thoughts on how to upgrade them all

3.1. Link cryptography

   Tor uses TLS for its link cryptography; it is easy to add more
   ciphersuites to the acceptable list, or increase the length of
   link-crypto public keys, or increase the length of the DH
   parameter, or sign the X509 certificates with any digest algorithm
   that OpenSSL clients will support.  Current Tor versions do not
   check any of these against expected values.

   The identity key used to sign the second certificate in the current
   handshake protocol, however, is harder to change, since it needs to
   match up with what we see in the router descriptor for the router
   we're connecting to.  See notes on router identity below.  So long
   as the certificate chain is ultimately authenticated by an RSA-1024
   key, it's not clear whether making the link RSA key longer on its
   own really improves matters or not.

   Recall also that for anti-fingerprinting reasons, we're thinking of
   revising the protocol handshake sometime in the 0.2.3.x timeframe.
   If we do that, that might be a good time to make sure that we
   aren't limited by the old identity key size.
3.2. Circuit-extend crypto

   Currently, our code requires RSA onion keys to be 1024 bits long.
   Additionally, current nodes will not deliver an EXTEND cell unless
   it is the right length.

   For this, we might add a second, longer onion key to router
   descriptors, and a second CREATE2 cell to open new circuits using
   this key type.  It should contain not only the onionskin, but also
   information on onionskin version and ciphersuite.  Onionskins
   generated for CREATE2 cells should use a larger DH group as well,
   and keys should be derived from DH results using a better digest
   algorithm.
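   As an illustration only (the real cell layout would be fixed by
   whatever proposal actually introduces CREATE2), a versioned payload
   could be as simple as a version byte, a ciphersuite byte, and a
   length-prefixed onionskin:

```python
import struct

def build_create2_payload(version, suite, onionskin):
    """Pack a hypothetical CREATE2 payload: one byte of onionskin
    version, one byte of ciphersuite ID, a two-byte big-endian
    length, then the onionskin itself."""
    return struct.pack("!BBH", version, suite, len(onionskin)) + onionskin
```

   For a 256-byte onionskin this yields a 260-byte payload, and a
   receiving node can dispatch on the first two bytes without having
   to guess the onionskin format from its length.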
   We should remove the length limit on EXTEND cells, backported to
   all supported stable versions; call versions with this fix "lightly
   patched".  Call the new extend cells that carry CREATE2-style
   onionskins "EXTEND2" cells.  Clients could use the new
   EXTEND2/CREATE2 format whenever using a lightly patched or new
   server to extend to a new server, and the old EXTEND/CREATE format
   otherwise.

   The new onion skin format should try to avoid the design oddities
   of our old one.  Instead of its current iffy hybrid encryption
   scheme, it should probably do something more like a BEAR/LIONESS
   operation with a fixed key on the g^x value, followed by a public
   key encryption on the start of the encrypted data.  (Robert
   reminded me about this construction.)

   The current EXTEND cell format ends with a router identity
   fingerprint, which is used by the extended-from router to
   authenticate the extended-to router when it connects.  Changes to
   this will interact with changes to how long an identity key can be
   and to the link protocol; see notes on the link protocol above and
   about router identity below.
3.2.1. Circuit-extend crypto: fast case

   When we do unauthenticated circuit extends with
   CREATE/CREATED_FAST, the two input values are combined with SHA1.
   I believe that's okay; using any entropy here at all is overkill.
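   For reference, the combination works roughly like this sketch of
   the legacy key derivation (tor-spec's "KDF-TOR"): concatenate the
   two random values and expand them by hashing with an appended
   counter byte until there is enough key material:

```python
import hashlib

def kdf_tor(x, y, n_bytes):
    """Expand the client's random value X and the relay's random
    value Y into key material, in the style of tor-spec's legacy
    KDF-TOR: hash the concatenation with a one-byte counter appended,
    repeating until n_bytes of output are available."""
    k0 = x + y
    out = b""
    i = 0
    while len(out) < n_bytes:
        out += hashlib.sha1(k0 + bytes([i])).digest()
        i += 1
    return out[:n_bytes]
```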
3.3. Relay crypto

   Upon receiving a relay cell, a router transforms the payload
   portion of the cell with the appropriate key, sees if it recognizes
   the cell (the recognized field is zero, the digest field is
   correct, the cell is outbound), and passes it on if not.  It is
   possible for each hop in the circuit to handle the relay crypto
   differently; nobody but the client and the hop in question need to
   coordinate their operations.
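   The "recognized" test can be sketched as follows, using the relay
   header layout from tor-spec (one-byte command, two-byte recognized
   field, two-byte stream ID, four-byte digest); the real check also
   zeroes the digest field before recomputing the running hash, which
   this sketch omits:

```python
def looks_recognized(relay_header, running_digest):
    """Decide whether a decrypted relay cell is addressed to this
    hop: the two-byte "recognized" field must be zero and the
    four-byte digest field must match this hop's running digest of
    the circuit's traffic."""
    recognized = relay_header[1:3]  # after the 1-byte relay command
    digest = relay_header[5:9]      # after the 2-byte stream ID
    return recognized == b"\x00\x00" and digest == running_digest[:4]
```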
   It's not clear, though, whether updating the relay crypto
   algorithms would help anything, unless we changed the whole relay
   cell processing format too.  The stream cipher is good enough, and
   a 4-byte digest does not have enough bits to provide cryptographic
   strength, no matter what cipher we use.

   This is the likeliest area for the second-system effect to strike;
   there are lots of opportunities to try to be more clever than we
   are now.
3.4. Router identity

   This is one of the hardest things to change.  Right now, routers
   are identified by a "fingerprint" equal to the SHA1 hash of their
   1024-bit identity key as given in their router descriptor.  No
   existing Tor will accept any other size of identity key, or any
   other hash algorithm.  The identity key itself is used:

     - To sign the router descriptors
     - To sign link-key certificates
     - To determine the least significant bits of circuit IDs used on
       a Tor instance's links (see tor-spec §5.1)

   The fingerprint is used:

     - To identify a router identity key in EXTEND cells
     - To identify a router identity key in bridge lines
     - Throughout the controller interface
     - To fetch bridge descriptors for a bridge
     - To identify a particular router throughout the codebase
     - In the .exit notation
     - By the controller to identify nodes
     - To identify servers in the logs
     - Probably other places too
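   The fingerprint computation itself is just a SHA-1 hash of the
   (DER-encoded) public identity key, conventionally rendered as 40
   hex digits; it is the number of places the result is assumed, not
   the computation, that makes this hard to change:

```python
import hashlib

def fingerprint(identity_key_der):
    """Compute a router fingerprint as described above: the SHA-1
    hash of the DER-encoded public identity key, written as 40
    uppercase hex digits."""
    return hashlib.sha1(identity_key_der).hexdigest().upper()
```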
   To begin to allow other key types, key lengths, and hash functions,
   we would either need to wait till all current Tors are obsolete, or
   allow routers to have more than one identity for a while.

   To allow routers to have more than one identity, we need to
   cross-certify identity keys.  We can do this trivially, in theory,
   by listing both keys in the router descriptor and having both
   identities sign the descriptor.  In practice, we will need to
   analyze this pretty carefully to avoid attacks where one key is
   completely fake, aimed at tricking old clients somehow.

   Upgrading the hash algorithm once would be easy: just say that all
   new-type keys get hashed using the new hash algorithm.  Remaining
   future-proof could be tricky.

   This is one of the hardest areas to update; "SHA1 of identity key"
   is assumed in so many places throughout Tor that we'll probably
   need a lot of design work to work with something else.
3.5. Directory objects

   Fortunately, the problem is not so bad for consensuses themselves,
   because:

     - Authority identity keys are allowed to be RSA keys of any
       length; in practice I think they are all 3072 bits.

     - Authority signing keys are also allowed to be of any length.
       AFAIK the code works with longer signing keys just fine.

     - Currently, votes are hashed with both sha1 and sha256; adding
       more hash algorithms isn't so hard.

     - Microdescriptor consensuses are all signed using sha256.  While
       regular consensuses are signed using sha1, exploitable
       collisions are hard to come up with, since once you had a
       collision, you would need to get a majority of other
       authorities to agree to generate it.

   Router descriptors are currently identified by SHA1 digests of
   their identity keys and descriptor digests in regular consensuses,
   and by SHA1 digests of identity keys and SHA256 digests of
   microdescriptors in microdesc consensuses.  The consensus-flavors
   design allows us to generate new flavors of consensus that identify
   routers by new hashes of their identity keys.  Alternatively,
   existing consensuses could be expanded to contain more hashes,
   though that would have some space concerns.

   Router descriptors themselves are signed using RSA-1024 identity
   keys and SHA1.  For information on updating identity keys, see
   above.  Router descriptors and extra-info documents cross-certify
   one another using SHA1.

   Microdescriptors are currently specified to contain exactly one
   onion key, of length 1024 bits.
3.6. The directory protocol

   Most objects are indexed by SHA1 hash of an identity key or a
   descriptor object.  Adding more hash types wouldn't be a huge
   problem at the directory cache level.
3.7. The hidden service protocol

   Hidden services self-identify by a 1024-bit RSA key.  Other key
   lengths are not supported.  This key is turned into an 80-bit
   truncated SHA-1 hash to form hidden service names.
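   The name derivation can be sketched as: take the first 10 bytes
   (80 bits) of the SHA-1 hash of the public key and base32-encode
   them:

```python
import base64
import hashlib

def onion_address(identity_key_der):
    """Derive a hidden service name as described above: the first
    80 bits (10 bytes) of the SHA-1 hash of the public identity key,
    base32-encoded and lowercased."""
    half_hash = hashlib.sha1(identity_key_der).digest()[:10]
    return base64.b32encode(half_hash).decode("ascii").lower() + ".onion"
```

   Ten bytes encode to exactly 16 base32 characters with no padding,
   which is where the familiar 16-character .onion names come from.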
   The simplest change here would be to add an interface for putting
   the whole ugly SHA1 hash in the hidden service name.  Remember that
   this needs to coexist with the authentication system, which also
   uses .onion hostnames; hostnames top out around 255 characters and
   their components top out at 63.

   Currently, ESTABLISH_INTRO cells take a key length parameter, so in
   theory they allow longer keys.  The rest of the protocol assumes
   that this will be hashed into a 20-byte SHA1 identifier.  Changing
   that would require changes at the introduction point as well as the
   hidden service.

   The parsing code for hidden service descriptors currently enforces
   a 1024-bit identity key, though this does not seem to be described
   in the specification.  Changing that would be at least as hard as
   doing it for regular identity keys.

   Fortunately, hidden services are nearly completely orthogonal to
   everything else.