|
@@ -0,0 +1,384 @@
|
|
|
+
|
|
|
+Title: Initial thoughts on migrating Tor to new cryptography
|
|
|
+Author: Nick Mathewson
|
|
|
+Created: 12 December 2010
|
|
|
+
|
|
|
+1. Introduction
|
|
|
+
|
|
|
+ Tor currently uses AES-128, RSA-1024, and SHA1. Even though these
|
|
|
+ ciphers were a decent choice back in 2003, and even though attacking
|
|
|
+ these algorithms is by no means the best way for a well-funded
|
|
|
+ adversary to attack users (correlation attacks are still cheaper, even
|
|
|
+ with pessimistic assumptions about the security of each cipher), we
|
|
|
+ will want to move to better algorithms in the future. Indeed, if
|
|
|
+ migrating to a new ciphersuite were simple, we would probably have
|
|
|
+ already moved to RSA-1024/AES-128/SHA256 or something like that.
|
|
|
+
|
|
|
+ So it's a good idea to start figuring out how we can move to better
|
|
|
+ ciphers. Unfortunately, this is a bit nontrivial, so before we start
|
|
|
+ doing the design work here, we should start by examining the issues
|
|
|
+ involved. Robert Ransom and I both decided to spend this weekend
|
|
|
+ writing up documents of this type so that we can see how much two
|
|
|
+ people working independently agree on. I know more Tor than Robert;
|
|
|
+ Robert knows far more cryptography than I do. With luck we'll
|
|
|
+ complement each other's work nicely.
|
|
|
+
|
|
|
+ A note on scope: This document WILL NOT attempt to pick a new cipher
|
|
|
+ or set of ciphers. Instead, it's about how to migrate to new ciphers
|
|
|
+ in general. Any algorithms mentioned other than those we use today
|
|
|
+ are just for illustration.
|
|
|
+
|
|
|
+ Also, I don't much consider the importance of updating each particular
|
|
|
+ usage; only the methods that you'd use to do it.
|
|
|
+
|
|
|
+ Also, this isn't a complete proposal.
|
|
|
+
|
|
|
+2. General principles and tricks
|
|
|
+
|
|
|
+ Before I get started, let's talk about some general design issues.
|
|
|
+
|
|
|
+2.1. Many algorithms or few?
|
|
|
+
|
|
|
+ Protocols like TLS and OpenPGP allow a wide choice of cryptographic
|
|
|
+ algorithms; so long as the sender and receiver (or the responder and
|
|
|
+ initiator) have at least one mutually acceptable algorithm, they can
|
|
|
+ converge upon it and send each other messages.
|
|
|
+
|
|
|
+ This isn't the best choice for anonymity designs. If two clients
|
|
|
+ support a different set of algorithms, then an attacker can tell them
|
|
|
+ apart. A protocol with N ciphersuites would in principle split
|
|
|
+ clients into 2**N-1 sets. (In practice, nearly all users will use the
|
|
|
+ default, and most users who choose _not_ to use the default will do so
|
|
|
+ without considering the loss of anonymity. See "Anonymity Loves
|
|
|
+ Company: Usability and the Network Effect".)
|
|
|
+
|
|
|
+ On the other hand, building only one ciphersuite into Tor has a flaw
|
|
|
+ of its own: it has proven difficult to migrate to another one. So
|
|
|
+ perhaps instead of specifying only a single new ciphersuite, we should
|
|
|
+ specify more than one, with plans to switch over (based on a flag in
|
|
|
+ the consensus or some other secure signal) once the first choice of
|
|
|
+ algorithms start looking iffy. This switch-based approach would seem
|
|
|
+ especially easy for parameterizable stuff like key sizes.
|
|
|
+
|
|
|
+2.2. Waiting for old clients and servers to upgrade
|
|
|
+
|
|
|
+ The easiest way to implement a shift in algorithms would be to declare
|
|
|
+ a "flag day": once we have the new versions of the protocols
|
|
|
+ implemented, pick a day by which everybody must upgrade to the new
|
|
|
+ software. Before this day, the software would have the old behavior;
|
|
|
+ after this way, it would use the improved behavior.
|
|
|
+
|
|
|
+ Tor tries to avoid flag days whenever possible; they have well-known
|
|
|
+ issues. First, since a number of our users don't automatically
|
|
|
+ update, it can take a while for people to upgrade to new versions of
|
|
|
+ our software. Second and more worryingly, it's hard to get adequate
|
|
|
+ testing for new behavior that is off-by-default. Flag days in other
|
|
|
+ systems have been known to leave whole networks more or less
|
|
|
+ inoperable for months; we should not trust in our skill to avoid
|
|
|
+ similar problems.
|
|
|
+
|
|
|
+ So if we're avoiding flag days, what can we do?
|
|
|
+
|
|
|
+ * We can add _support_ for new behavior early, and have clients use it
|
|
|
+ where it's available. (Clients know the advertised versions of the
|
|
|
+ Tor servers they use-- but see 2.3 below for a danger here, and 2.4
|
|
|
+ for a bigger danger.)
|
|
|
+
|
|
|
+ * We can remove misfeatures that _prevent_ deployment of new
|
|
|
+ behavior. For instance, if a certain key length has an arbitrary
|
|
|
+ 1024-bit limit, we can remove that arbitrary limitation.
|
|
|
+
|
|
|
+ * Once an optional new behavior is ubiquitous enough, the authorities
|
|
|
+ can stop accepting descriptors from servers that do not have it
|
|
|
+ until they upgrade.
|
|
|
+
|
|
|
+ It is far easier to remove arbitrary limitations than to make other
|
|
|
+ changes; such changes are generally safe to back-port to older stable
|
|
|
+ release series. But in general, it's much better to avoid any plans
|
|
|
+ that require waiting for any version of Tor to no longer be in common
|
|
|
+ use: a stable release can take on the order of 2.5 years to start
|
|
|
+ dropping off the radar. Thandy might fix that, but even if a perfect
|
|
|
+ Thandy release comes out tomorrow, we'll still have lots of older
|
|
|
+ clients and relays not using it.
|
|
|
+
|
|
|
+ We'll have to approach the migration problem on a case-by-case basis
|
|
|
+ as we consider the algorithms used by Tor and how to change them.
|
|
|
+
|
|
|
+2.3. Early adopters and other partitioning dangers
|
|
|
+
|
|
|
+ It's pretty much unavoidable that clients running software that speak
|
|
|
+ the new version of any protocol will be distinguishable from those
|
|
|
+ that cannot speak the new version. This is inevitable, though we
|
|
|
+ could try to minimize the number of such partitioning sets by having
|
|
|
+ features turned on in the same release rather than one-at-a-time.
|
|
|
+
|
|
|
+ Another option here is to have new protocols controlled by a
|
|
|
+ configuration tri-state with values "on", "off", and "auto". The
|
|
|
+ "auto" value means to look at the consensus to decide wither to use
|
|
|
+ the feature; the other two values are self-explanatory. We'd ship
|
|
|
+ clients with the feature set to "auto" by default, with people only
|
|
|
+ using "on" for testing.
|
|
|
+
|
|
|
+ If we're worried about early client-side implementations of a protocol
|
|
|
+ turning out to be broken, we can have the consensus value say _which_
|
|
|
+ versions should turn on the protocol.
|
|
|
+
|
|
|
+2.4. Avoid whole-circuit switches
|
|
|
+
|
|
|
+ One risky kind of protocol migration is a feature that gets used only
|
|
|
+ when all the routers in a circuit support it. If such a feature is
|
|
|
+ implemented by few relays, then each relay learns a lot about the rest
|
|
|
+ of the path by seeing it used. On the other hand, if the feature is
|
|
|
+ implemented by most relays, then a relay learns a lot about the rest of
|
|
|
+ the path when the feature is *not* used.
|
|
|
+
|
|
|
+ It's okay to have a feature that can be only used if two consecutive
|
|
|
+ routers in the patch support it: each router knows the ones adjacent
|
|
|
+ to it, after all, so knowing what version of Tor they're running is no
|
|
|
+ big deal.
|
|
|
+
|
|
|
+2.5. The Second System Effect rears its ugly head
|
|
|
+
|
|
|
+ Any attempt at improving Tor's crypto is likely to involve changes
|
|
|
+ throughout the Tor protocol. We should be aware of the risks of
|
|
|
+ falling into what Fred Brooks called the "Second System Effect": when
|
|
|
+ redesigning a fielded system, it's always tempting to try to shovel in
|
|
|
+ every possible change that one ever wanted to make to it.
|
|
|
+
|
|
|
+ This is a fine time to make parts of our protocol that weren't
|
|
|
+ previously versionable into ones that are easier to upgrade in the
|
|
|
+ future. This probably _isn't_ time to redesign every aspect of the
|
|
|
+ Tor protocol that anybody finds problematic.
|
|
|
+
|
|
|
+2.6. Low-hanging fruit and well-lit areas
|
|
|
+
|
|
|
+ Not all parts of Tor are tightly covered. If it's possible to upgrade
|
|
|
+ different parts of the system at different rates from one another, we
|
|
|
+ should consider doing the stuff we can do easier, earlier.
|
|
|
+
|
|
|
+ But remember the story of the policeman who finds a drunk under a
|
|
|
+ streetlamp, staring at the ground? The cop asks, "What are you
|
|
|
+ doing?" The drunk says, "I'm looking for my keys!" "Oh, did you drop
|
|
|
+ them around here?" says the policeman. "No," says the drunk, "But the
|
|
|
+ light is so much better here!"
|
|
|
+
|
|
|
+ Or less proverbially: Simply because a change is easiest, does not
|
|
|
+ mean it is the best use of our time. We should avoid getting bogged
|
|
|
+ down solving the _easy_ aspects of our system unless they happen also
|
|
|
+ to be _important_.
|
|
|
+
|
|
|
+2.7. Nice safe boring codes
|
|
|
+
|
|
|
+ Let's avoid, to the extent that we can:
|
|
|
+ - being the primary user of any cryptographic construction or
|
|
|
+ protocol.
|
|
|
+ - anything that hasn't gotten much attention in the literature.
|
|
|
+ - anything we would have to implement from scratch
|
|
|
+ - anything without a nice BSD-licensed C implementation
|
|
|
+
|
|
|
+ Sometimes we'll have the choice of a more efficient algorithm or a
|
|
|
+ more boring & well-analyzed one. We should not even consider trading
|
|
|
+ conservative design for efficiency unless we are firmly in the
|
|
|
+ critical path.
|
|
|
+
|
|
|
+2.8. Key restrictions
|
|
|
+
|
|
|
+ Our spec says that RSA exponents should be 65537, but our code never
|
|
|
+ checks for that. If we want to bolster resistance against collision
|
|
|
+ attacks, we could check this requirement. To the best of my
|
|
|
+ knowledge, nothing violates it except for tools like "shallot" that
|
|
|
+ generate cute memorable .onion names. If we want to be nice to
|
|
|
+ shallot users, we could check the requirement for everything *except*
|
|
|
+ hidden service identity keys.
|
|
|
+
|
|
|
+3. Aspects of Tor's cryptography, and thoughts on how to upgrade them all
|
|
|
+
|
|
|
+3.1. Link cryptography
|
|
|
+
|
|
|
+ Tor uses TLS for its link cryptography; it is easy to add more
|
|
|
+ ciphersuites to the acceptable list, or increase the length of
|
|
|
+ link-crypto public keys, or increase the length of the DH parameter,
|
|
|
+ or sign the X509 certificates with any digest algorithm that OpenSSL
|
|
|
+ clients will support. Current Tor versions do not check any of these
|
|
|
+ against expected values.
|
|
|
+
|
|
|
+ The identity key used to sign the second certificate in the current
|
|
|
+ handshake protocol, however, is harder to change, since it needs to
|
|
|
+ match up with what we see in the router descriptor for the router
|
|
|
+ we're connecting to. See notes on router identity below. So long as
|
|
|
+ the certificate chain is ultimately authenticated by a RSA-1024 key,
|
|
|
+ it's not clear whether making the link RSA key longer on its own
|
|
|
+ really improves matters or not.
|
|
|
+
|
|
|
+ Recall also that for anti-fingerprinting reasons, we're thinking of
|
|
|
+ revising the protocol handshake sometime in the 0.2.3.x timeframe.
|
|
|
+ If we do that, that might be a good time to make sure that we aren't
|
|
|
+ limited by the old identity key size.
|
|
|
+
|
|
|
+3.2. Circuit-extend crypto
|
|
|
+
|
|
|
+ Currently, our code requires RSA onion keys to be 1024 bits long.
|
|
|
+ Additionally, current nodes will not deliver an EXTEND cell unless it
|
|
|
+ is the right length.
|
|
|
+
|
|
|
+ For this, we might add a second, longer onion-key to router
|
|
|
+ descriptors, and a second CREATE2 cell to open new circuits
|
|
|
+ using this key type. It should contain not only the onionskin, but
|
|
|
+ also information on onionskin version and ciphersuite. Onionskins
|
|
|
+ generated for CREATE2 cells should use a larger DH group as well, and
|
|
|
+ keys should be derived from DH results using a better digest algorithm.
|
|
|
+
|
|
|
+ We should remove the length limit on EXTEND cells, backported to all
|
|
|
+ supported stable versions; call these "EXTEND2" cells. Call these
|
|
|
+ "lightly patched". Clients could use the new EXTEND2/CREATE2 format
|
|
|
+ whenever using a lightly patched or new server to extend to a new
|
|
|
+ server, and the old EXTEND/CREATE format otherwise.
|
|
|
+
|
|
|
+ The new onion skin format should try to avoid the design oddities of
|
|
|
+ our old one. Instead of its current iffy hybrid encryption scheme, it
|
|
|
+ should probably do something more like a BEAR/LIONESS operation with a
|
|
|
+ fixed key on the g^x value, followed by a public key encryption on the
|
|
|
+ start of the encrypted data. (Robert reminded me about this
|
|
|
+ construction.)
|
|
|
+
|
|
|
+ The current EXTEND cell format ends with a router identity
|
|
|
+ fingerprint, which is used by the extended-from router to authenticate
|
|
|
+ the extended-to router when it connects. Changes to this will
|
|
|
+ interact with changes to how long an identity key can be and to the
|
|
|
+ link protocol; see notes on the link protocol above and about router
|
|
|
+ identity below.
|
|
|
+
|
|
|
+3.2.1. Circuit-extend crypto: fast case
|
|
|
+
|
|
|
+ When we do unauthenticated circuit extends with CREATE/CREATED_FAST,
|
|
|
+ the two input values are combined with SHA1. I believe that's okay;
|
|
|
+ using any entropy here at all is overkill.
|
|
|
+
|
|
|
+3.3. Relay crypto
|
|
|
+
|
|
|
+ Upon receiving relay cells, a router transforms the payload portion of
|
|
|
+ the cell with the appropriate key appropriate key, sees if it
|
|
|
+ recognizes the cell (the recognized field is zero, the digest field is
|
|
|
+ correct, the cell is outbound), and passes them on if not. It is
|
|
|
+ possible for each hop in the circuit to handle the relay crypto
|
|
|
+ differently; nobody but the client and the hop in question need to
|
|
|
+ coordinate their operations.
|
|
|
+
|
|
|
+ It's not clear, though, whether updating the relay crypto algorithms
|
|
|
+ would help anything, unless we changed the whole relay cell processing
|
|
|
+ format too. The stream cipher is good enough, and the use of 4 bytes
|
|
|
+ of digest does not have enough bits to provide cryptographic strength,
|
|
|
+ no matter what cipher we use.
|
|
|
+
|
|
|
+ This is the likeliest area for the second-system effect to strike;
|
|
|
+ there are lots of opportunities to try to be more clever than we are
|
|
|
+ now.
|
|
|
+
|
|
|
+3.4. Router identity
|
|
|
+
|
|
|
+ This is one of the hardest things to change. Right now, routers are
|
|
|
+ identified by a "fingerprint" equal to the SHA1 hash of their 1024-bit
|
|
|
+ identity key as given in their router descriptor. No existing Tor
|
|
|
+ will accept any other size of identity key, or any other hash
|
|
|
+ algorithm. The identity key itself is used:
|
|
|
+ - To sign the router descriptors
|
|
|
+ - To sign link-key certificates
|
|
|
+ - To determine the least significant bits of circuit IDs used on a
|
|
|
+ Tor instance's links (see tor-spec §5.1)
|
|
|
+
|
|
|
+ The fingerprint is used:
|
|
|
+ - To identify a router identity key in EXTEND cells
|
|
|
+ - To identify a router identity key in bridge lines
|
|
|
+ - Throughout the controller interface
|
|
|
+ - To fetch bridge descriptors for a bridge
|
|
|
+ - To identify a particular router throughout the codebase
|
|
|
+ - In the .exit notation.
|
|
|
+ - By the controller to identify nodes
|
|
|
+ - To identify servers in the logs
|
|
|
+ - Probably other places too
|
|
|
+
|
|
|
+ To begin to allow other key types, key lengths, and hash functions, we
|
|
|
+ would either need to wait till all current Tors are obsolete, or allow
|
|
|
+ routers to have more than one identity for a while.
|
|
|
+
|
|
|
+ To allow routers to have more than one identity, we need to
|
|
|
+ cross-certify identity keys. We can do this trivially, in theory, by
|
|
|
+ listing both keys in the router descriptor and having both identities
|
|
|
+ sign the descriptor. In practice, we will need to analyze this pretty
|
|
|
+ carefully to avoid attacks where one key is completely fake aimed to
|
|
|
+ trick old clients somehow.
|
|
|
+
|
|
|
+ Upgrading the hash algorithm once would be easy: just say that all
|
|
|
+ new-type keys get hashed using the new hash algorithm. Remaining
|
|
|
+ future-proof could be tricky.
|
|
|
+
|
|
|
+ This is one of the hardest areas to update; "SHA1 of identity key" is
|
|
|
+ assumed in so many places throughout Tor that we'll probably need a
|
|
|
+ lot of design work to work with something else.
|
|
|
+
|
|
|
+3.5. Directory objects
|
|
|
+
|
|
|
+ Fortunately, the problem is not so bad for consensuses themselves,
|
|
|
+ because:
|
|
|
+ - Authority identity keys are allowed to be RSA keys of any length;
|
|
|
+ in practice I think they are all 3072 bits.
|
|
|
+ - Authority signing keys are also allowed to be of any length.
|
|
|
+ AFAIK the code works with longer signing keys just fine.
|
|
|
+ - Currently, votes are hashed with both sha1 and sha256; adding
|
|
|
+ more hash algorithms isn't so hard.
|
|
|
+ - Microdescriptor consensuses are all signed using sha256. While
|
|
|
+ regular consensuses are signed using sha1, exploitable collisions
|
|
|
+ are hard to come up with, since once you had a collision, you
|
|
|
+ would need to get a majority of other authorities to agree to
|
|
|
+ generate it.
|
|
|
+
|
|
|
+ Router descriptors are currently identified by SHA1 digests of their
|
|
|
+ identity keys and descriptor digests in regular consensuses, and by
|
|
|
+ SHA1 digests of identity keys and SHA256 digests of microdescriptors
|
|
|
+ in microdesc consensuses. The consensus-flavors design allows us to
|
|
|
+ generate new flavors of consensus that identity routers by new hashes
|
|
|
+ of their identity keys. Alternatively, existing consensuses could be
|
|
|
+ expanded to contain more hashes, though that would have some space
|
|
|
+ concerns.
|
|
|
+
|
|
|
+ Router descriptors themselves are signed using RSA-1024 identity keys
|
|
|
+ and SHA1. For information on updating identity keys, see above.
|
|
|
+
|
|
|
+ Router descriptors and extra-info documents cross-certify one another
|
|
|
+ using SHA1.
|
|
|
+
|
|
|
+ Microdescriptors are currently specified to contain exactly one
|
|
|
+ onion key, of length 1024 bits.
|
|
|
+
|
|
|
+3.6. The directory protocol
|
|
|
+
|
|
|
+ Most objects are indexed by SHA1 hash of an identity key or a
|
|
|
+ descriptor object. Adding more hash types wouldn't be a huge problem
|
|
|
+ at the directory cache level.
|
|
|
+
|
|
|
+3.7. The hidden service protocol
|
|
|
+
|
|
|
+ Hidden services self-identify by a 1024-bit RSA key. Other key
|
|
|
+ lengths are not supported. This key is turned into an 80 bit half
|
|
|
+ SHA-1 hash for hidden service names.
|
|
|
+
|
|
|
+ The most simple change here would be to set an interface for putting
|
|
|
+ the whole ugly SHA1 hash in the hidden service name. Remember that
|
|
|
+ this needs to coexist with the authentication system which also uses
|
|
|
+ .onion hostnames; that hostnames top out around 255 characters and and
|
|
|
+ their components top out at 63.
|
|
|
+
|
|
|
+ Currently, ESTABLISH_INTRO cells take a key length parameter, so in
|
|
|
+ theory they allow longer keys. The rest of the protocol assumes that
|
|
|
+ this will be hashed into a 20-byte SHA1 identifier. Changing that
|
|
|
+ would require changes at the introduction point as well as the hidden
|
|
|
+ service.
|
|
|
+
|
|
|
+ The parsing code for hidden service descriptors currently enforce a
|
|
|
+ 1024-bit identity key, though this does not seem to be described in
|
|
|
+ the specification. Changing that would be at least as hard as doing
|
|
|
+ it for regular identity keys.
|
|
|
+
|
|
|
+ Fortunately, hidden services are nearly completely orthogonal to
|
|
|
+ everything else.
|
|
|
+
|