13 years ago · d9e9938b0c
--- a/doc/spec/proposals/ideas/xxx-crypto-migration.txt
+++ b/doc/spec/proposals/ideas/xxx-crypto-migration.txt
@@ -0,0 +1,384 @@
+
+Title: Initial thoughts on migrating Tor to new cryptography
+Author: Nick Mathewson
+Created: 12 December 2010
+
+1. Introduction
+
+  Tor currently uses AES-128, RSA-1024, and SHA1.  Even though these
+  ciphers were a decent choice back in 2003, and even though attacking
+  these algorithms is by no means the best way for a well-funded
+  adversary to attack users (correlation attacks are still cheaper, even
+  with pessimistic assumptions about the security of each cipher), we
+  will want to move to better algorithms in the future.  Indeed, if
+  migrating to a new ciphersuite were simple, we would probably have
+  already moved to RSA-1024/AES-128/SHA256 or something like that.
+
+  So it's a good idea to start figuring out how we can move to better
+  ciphers.  Unfortunately, this is a bit nontrivial, so before we start
+  doing the design work here, we should start by examining the issues
+  involved.  Robert Ransom and I both decided to spend this weekend
+  writing up documents of this type so that we can see how much two
+  people working independently agree on.  I know more about Tor than
+  Robert does; Robert knows far more cryptography than I do.  With luck
+  we'll complement each other's work nicely.
+
+  A note on scope: This document WILL NOT attempt to pick a new cipher
+  or set of ciphers.  Instead, it's about how to migrate to new ciphers
+  in general.  Any algorithms mentioned other than those we use today
+  are just for illustration.
+
+  Also, I don't spend much time on how important it is to update each
+  particular usage, only on the methods you'd use to do it.
+
+  Also, this isn't a complete proposal.
+
+2. General principles and tricks
+
+  Before I get started, let's talk about some general design issues.
+
+2.1. Many algorithms or few?
+
+  Protocols like TLS and OpenPGP allow a wide choice of cryptographic
+  algorithms; so long as the sender and receiver (or the responder and
+  initiator) have at least one mutually acceptable algorithm, they can
+  converge upon it and send each other messages.
+
+  This isn't the best choice for anonymity designs.  If two clients
+  support different sets of algorithms, then an attacker can tell them
+  apart.  A protocol with N ciphersuites would in principle split
+  clients into 2**N-1 sets, one for each nonempty subset of the
+  ciphersuites.  (In practice, nearly all users will use the default,
+  and most users who choose _not_ to use the default will do so without
+  considering the loss of anonymity.  See "Anonymity Loves Company:
+  Usability and the Network Effect".)
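To make the 2**N-1 count concrete: if each client's supported set is written as an N-bit mask (bit i set meaning "supports ciphersuite i"), every nonempty mask is a class of clients an observer could tell apart. The C sketch below, with hypothetical names, simply counts those masks; it assumes N < 64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration: a client's supported ciphersuites as an
 * N-bit mask (bit i set = "supports suite i").  Every client supports
 * at least one suite, so the empty mask never occurs.
 * Requires n_suites < 64. */
static uint64_t
count_client_classes(unsigned n_suites)
{
  uint64_t count = 0;
  /* Walk every possible mask and count the nonempty ones; each one is
   * a set of clients distinguishable from all the others. */
  for (uint64_t mask = 0; mask < (UINT64_C(1) << n_suites); ++mask) {
    if (mask != 0)
      ++count;
  }
  return count;
}
```

For N = 3 this returns 7, matching 2**3-1.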
			
 
+
+  On the other hand, building only one ciphersuite into Tor has a flaw
+  of its own: it has proven difficult to migrate to another one.  So
+  perhaps instead of specifying only a single new ciphersuite, we should
+  specify more than one, with plans to switch over (based on a flag in
+  the consensus or some other secure signal) once the first choice of
+  algorithms starts looking iffy.  This switch-based approach would seem
+  especially easy for parameterizable stuff like key sizes.
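One way the switch-based approach might look, sketched in C under stated assumptions: the consensus carries a hypothetical integer parameter naming an entry in a fixed, pre-specified list, so every client that sees the same consensus picks the same entry and the switch itself does not partition clients. The suite names are placeholders for illustration, not proposals.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pre-specified ciphersuites, in order of planned use.
 * The names are placeholders only. */
static const char *const known_suites[] = {
  "RSA1024-AES128-SHA1",   /* today's default */
  "RSA2048-AES128-SHA256", /* first planned switch */
  "RSA3072-AES256-SHA256", /* second planned switch */
};
#define N_KNOWN_SUITES (sizeof(known_suites) / sizeof(known_suites[0]))

/* Pick the suite named by a (hypothetical) consensus parameter.
 * Out-of-range values, e.g. from a consensus newer than this client,
 * fall back to the default at index 0. */
static const char *
choose_suite(long consensus_suite_index)
{
  if (consensus_suite_index < 0 ||
      (size_t)consensus_suite_index >= N_KNOWN_SUITES)
    return known_suites[0];
  return known_suites[consensus_suite_index];
}
```

Falling back to the default for an unknown index is only one possible choice; a real design would have to decide whether an unrecognized value should instead select the strongest suite the client knows.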
			
 
+
+2.2. Waiting for old clients and servers to upgrade
+
+  The easiest way to implement a shift in algorithms would be to declare
+  a "flag day": once we have the new versions of the protocols
+  implemented, pick a day by which everybody must upgrade to the new
+  software.  Before this day, the software would have the old behavior;
+  after this day, it would use the improved behavior.
+
+  Tor tries to avoid flag days whenever possible; they have well-known
+  issues.  First, since a number of our users don't automatically
+  update, it can take a while for people to upgrade to new versions of
+  our software.  Second and more worryingly, it's hard to get adequate
+  testing for new behavior that is off-by-default.  Flag days in other
+  systems have been known to leave whole networks more or less
+  inoperable for months; we should not trust our own skill to avoid
+  similar problems.
+
+  So if we're avoiding flag days, what can we do?
+
+  * We can add _support_ for new behavior early, and have clients use it
+    where it's available.  (Clients know the advertised versions of the
+    Tor servers they use -- but see 2.3 below for a danger here, and 2.4
+    for a bigger danger.)
+
+  * We can remove misfeatures that _prevent_ deployment of new
+    behavior.  For instance, if a certain key length has an arbitrary
+    1024-bit limit, we can remove that arbitrary limitation.
+
+  * Once an optional new behavior is ubiquitous enough, the authorities
+    can stop accepting descriptors from servers that do not have it
+    until they upgrade.
+
+  It is far easier to remove arbitrary limitations than to make other
+  changes; such removals are generally safe to back-port to older stable
+  release series.  But in general, it's much better to avoid any plans
+  that require waiting for any version of Tor to no longer be in common
+  use: a stable release can take on the order of 2.5 years to start
+  dropping off the radar.  Thandy might fix that, but even if a perfect
+  Thandy release comes out tomorrow, we'll still have lots of older
+  clients and relays not using it.
+
+  We'll have to approach the migration problem on a case-by-case basis
+  as we consider the algorithms used by Tor and how to change them.
+
+2.3. Early adopters and other partitioning dangers
+
+  It's pretty much unavoidable that clients running software that speaks
+  the new version of any protocol will be distinguishable from those
+  that cannot speak the new version.  We could, however, try to minimize
+  the number of such partitioning sets by having features turned on in
+  the same release rather than one at a time.
+
+  Another option here is to have new protocols controlled by a
+  configuration tri-state with values "on", "off", and "auto".  The
+  "auto" value means to look at the consensus to decide whether to use
+  the feature; the other two values are self-explanatory.  We'd ship
+  clients with the feature set to "auto" by default, with people only
+  using "on" for testing.
+
+  If we're worried about early client-side implementations of a protocol
+  turning out to be broken, we can have the consensus value say _which_
+  versions should turn on the protocol.
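A minimal C sketch of the on/off/auto tri-state, assuming the consensus publishes both an enable flag and a minimum client version for the feature (one possible way for it to say which versions should turn the protocol on). The type and function names are hypothetical, and the version struct is a simplified stand-in for real Tor version parsing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state setting for a new protocol feature. */
typedef enum { FEATURE_OFF, FEATURE_ON, FEATURE_AUTO } feature_setting_t;

/* Simplified stand-in for a Tor version: major.minor.micro.patch. */
typedef struct { int major, minor, micro, patch; } version_sketch_t;

/* Compare two versions; returns <0, 0, or >0 like strcmp. */
static int
version_cmp(version_sketch_t a, version_sketch_t b)
{
  if (a.major != b.major) return a.major < b.major ? -1 : 1;
  if (a.minor != b.minor) return a.minor < b.minor ? -1 : 1;
  if (a.micro != b.micro) return a.micro < b.micro ? -1 : 1;
  if (a.patch != b.patch) return a.patch < b.patch ? -1 : 1;
  return 0;
}

/* Decide whether this client should use the feature. */
static bool
should_use_feature(feature_setting_t setting,
                   bool consensus_enabled,
                   version_sketch_t consensus_min_version,
                   version_sketch_t my_version)
{
  if (setting == FEATURE_ON)
    return true;    /* "on": for testers only */
  if (setting == FEATURE_OFF)
    return false;
  /* "auto" (the shipped default): follow the consensus, and stay off
   * if our version is older than the minimum it names, so that
   * known-broken early implementations never switch on. */
  return consensus_enabled &&
         version_cmp(my_version, consensus_min_version) >= 0;
}
```

Because every "auto" client reading the same consensus reaches the same answer, the feature switches on for the whole default population at once, which limits the partitioning this section worries about.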
			
 
+
+2.4. Avoid whole-circuit switches
+
+  One risky kind of protocol migration is a feature that gets used only
+  when all the routers in a circuit support it.  If such a feature is
+  implemented by few relays, then each relay learns a lot about the rest
+  of the path by seeing it used.  On the other hand, if the feature is
+  implemented by most relays, then a relay learns a lot about the rest of
+  the path when the feature is *not* used.
+
+  It's okay to have a feature that can only be used if two consecutive
+  routers in the path support it: each router knows the ones adjacent
+  to it, after all, so knowing what version of Tor they're running is no
+  big deal.
+
+2.5. The Second System Effect rears its ugly head
+
+  Any attempt at improving Tor's crypto is likely to involve changes
+  throughout the Tor protocol.  We should be aware of the risks of
+  falling into what Fred Brooks called the "Second System Effect": when
+  redesigning a fielded system, it's always tempting to try to shovel in
+  every possible change that one ever wanted to make to it.
+
+  This is a fine time to make parts of our protocol that weren't
+  previously versionable into ones that are easier to upgrade in the
+  future.  This probably _isn't_ the time to redesign every aspect of the
+  Tor protocol that anybody finds problematic.
+
+2.6. Low-hanging fruit and well-lit areas
+
+  Not all parts of Tor are tightly coupled.  If it's possible to upgrade
+  different parts of the system at different rates from one another, we
+  should consider doing the easier parts earlier.
+
+  But remember the story of the policeman who finds a drunk under a
+  streetlamp, staring at the ground?  The cop asks, "What are you
+  doing?"  The drunk says, "I'm looking for my keys!"  "Oh, did you drop
+  them around here?" says the policeman.  "No," says the drunk, "But the
+  light is so much better here!"
+
+  Or less proverbially: the fact that a change is easiest does not mean
+  it is the best use of our time.  We should avoid getting bogged down
+  solving the _easy_ aspects of our system unless they happen also to be
+  _important_.
+
+2.7. Nice safe boring codes
+
+  Let's avoid, to the extent that we can:
+    - being the primary user of any cryptographic construction or
+      protocol.
+    - anything that hasn't gotten much attention in the literature.
+    - anything we would have to implement from scratch.
+    - anything without a nice BSD-licensed C implementation.
+
+  Sometimes we'll have the choice of a more efficient algorithm or a
+  more boring & well-analyzed one.  We should not even consider trading
+  conservative design for efficiency unless we are firmly in the
+  critical path.
+
+2.8. Key restrictions
+
+  Our spec says that RSA exponents should be 65537, but our code never
+  checks for that.  If we want to bolster resistance against collision
+  attacks, we could check this requirement.  To the best of my
+  knowledge, nothing violates it except for tools like "shallot" that
+  generate cute memorable .onion names.  If we want to be nice to
+  shallot users, we could check the requirement for everything *except*
+  hidden service identity keys.
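A sketch of the exponent check with the hidden-service exemption, in C. The key-purpose enum and function name are hypothetical, and the exponent is passed as a plain uint64_t for brevity; a real check would read e out of the parsed key (for example via OpenSSL's RSA_get0_key in OpenSSL 1.1.0 and later) and would also have to cope with exponents too large for 64 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for what an RSA key is used for. */
typedef enum {
  KEY_ROUTER_IDENTITY,
  KEY_ONION,
  KEY_LINK,
  KEY_HS_IDENTITY,
} key_purpose_t;

#define REQUIRED_RSA_EXPONENT UINT64_C(65537)

/* Accept a public exponent only if it is 65537, except for
 * hidden-service identity keys, which vanity-name tools such as
 * shallot generate with other exponents. */
static bool
rsa_exponent_ok(uint64_t e, key_purpose_t purpose)
{
  if (purpose == KEY_HS_IDENTITY)
    return true;
  return e == REQUIRED_RSA_EXPONENT;
}
```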
			
 
+
+3. Aspects of Tor's cryptography, and thoughts on how to upgrade them all
+
+3.1. Link cryptography
+
+  Tor uses TLS for its link cryptography; it is easy to add more
+  ciphersuites to the acceptable list, or increase the length of
+  link-crypto public keys, or increase the length of the DH parameter,
+  or sign the X.509 certificates with any digest algorithm that OpenSSL
+  clients will support.  Current Tor versions do not check any of these
+  against expected values.
+
+  The identity key used to sign the second certificate in the current
+  handshake protocol, however, is harder to change, since it needs to
+  match up with what we see in the router descriptor for the router
+  we're connecting to.  See notes on router identity below.  So long as
+  the certificate chain is ultimately authenticated by an RSA-1024 key,
+  it's not clear whether making the link RSA key longer on its own
+  really improves matters or not.
+
+  Recall also that for anti-fingerprinting reasons, we're thinking of
+  revising the protocol handshake sometime in the 0.2.3.x timeframe.
+  If we do that, that might be a good time to make sure that we aren't
+  limited by the old identity key size.
+
+3.2. Circuit-extend crypto
+
+  Currently, our code requires RSA onion keys to be 1024 bits long.
+  Additionally, current nodes will not deliver an EXTEND cell unless it
+  is the right length.
+
+  For this, we might add a second, longer onion-key to router
+  descriptors, and a new CREATE2 cell type to open new circuits using
+  this key type.  It should contain not only the onionskin, but also
+  information on onionskin version and ciphersuite.  Onionskins
+  generated for CREATE2 cells should use a larger DH group as well, and
+  keys should be derived from DH results using a better digest algorithm.
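To make "information on onionskin version and ciphersuite" concrete, here is one possible CREATE2 body layout, sketched in C: a 2-byte handshake type, a 2-byte length, then the onionskin. This layout and the function names are assumptions for illustration only; the text does not fix a format. An explicit length field is what would let a new onionskin be longer than today's without yet another cell type. Integers are big-endian, as in Tor's other cell formats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical CREATE2 body layout (an assumption, not a spec):
 *   HTYPE [2 bytes]     onionskin version / ciphersuite identifier
 *   HLEN  [2 bytes]     length of HDATA
 *   HDATA [HLEN bytes]  the onionskin itself
 * Returns the number of bytes written, or -1 if it does not fit. */
static int
create2_encode(uint8_t *out, size_t out_len,
               uint16_t htype, const uint8_t *hdata, uint16_t hlen)
{
  if (out_len < 4 || (size_t)hlen > out_len - 4)
    return -1;
  out[0] = (uint8_t)(htype >> 8);
  out[1] = (uint8_t)(htype & 0xff);
  out[2] = (uint8_t)(hlen >> 8);
  out[3] = (uint8_t)(hlen & 0xff);
  if (hlen > 0)
    memcpy(out + 4, hdata, hlen);
  return 4 + (int)hlen;
}

/* Parse the same layout.  Returns 0 on success, -1 if malformed
 * (too short, or HLEN claims more bytes than are present). */
static int
create2_decode(const uint8_t *in, size_t in_len, uint16_t *htype_out,
               const uint8_t **hdata_out, uint16_t *hlen_out)
{
  if (in_len < 4)
    return -1;
  uint16_t htype = (uint16_t)((in[0] << 8) | in[1]);
  uint16_t hlen = (uint16_t)((in[2] << 8) | in[3]);
  if ((size_t)hlen > in_len - 4)
    return -1;
  *htype_out = htype;
  *hlen_out = hlen;
  *hdata_out = in + 4;
  return 0;
}
```

A relay that receives an HTYPE it does not recognize can then refuse the cell cleanly instead of misparsing it.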
			
 
+
+  We should remove the length limit on EXTEND cells and backport that
+  change to all supported stable versions; call the longer cells
+  "EXTEND2" cells, and call the backported versions "lightly patched".
+  Clients could use the new EXTEND2/CREATE2 format whenever using a
+  lightly patched or new server to extend to a new server, and the old
+  EXTEND/CREATE format otherwise.
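The client-side rule in the paragraph above reduces to a small predicate. A C sketch, with a hypothetical three-way classification of relays; in practice this would be derived from the version each relay advertises.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of relays by what they can handle. */
typedef enum {
  RELAY_OLD,             /* enforces the old EXTEND length limit */
  RELAY_LIGHTLY_PATCHED, /* forwards longer EXTEND2 cells, but does
                          * not itself understand CREATE2 */
  RELAY_NEW,             /* forwards EXTEND2 and answers CREATE2 */
} relay_capability_t;

/* Use EXTEND2/CREATE2 only when the hop doing the extending will pass
 * the longer cell along and the hop being extended to understands
 * CREATE2; otherwise fall back to EXTEND/CREATE. */
static bool
use_extend2(relay_capability_t extend_from, relay_capability_t extend_to)
{
  return extend_from != RELAY_OLD && extend_to == RELAY_NEW;
}
```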
			
 
+
+  The new onionskin format should try to avoid the design oddities of
+  our old one.  Instead of its current iffy hybrid encryption scheme, it
+  should probably do something more like a BEAR/LIONESS operation with a
+  fixed key on the g^x value, followed by a public key encryption on the
+  start of the encrypted data.  (Robert reminded me about this
+  construction.)
+
+  The current EXTEND cell format ends with a router identity
+  fingerprint, which is used by the extended-from router to authenticate
+  the extended-to router when it connects.  Changes to this will
+  interact with changes to how long an identity key can be and to the
+  link protocol; see notes on the link protocol above and about router
+  identity below.
+
+3.2.1. Circuit-extend crypto: fast case
+
+  When we do unauthenticated circuit extends with
+  CREATE_FAST/CREATED_FAST, the two input values are combined with
+  SHA1.  I believe that's okay; using any entropy here at all is
+  overkill.
+
+3.3. Relay crypto
+
+  Upon receiving relay cells, a router transforms the payload portion of
+  the cell with the appropriate key, sees if it recognizes the cell (the
+  recognized field is zero, the digest field is correct, the cell is
+  outbound), and passes it on if not.  It is possible for each hop in
+  the circuit to handle the relay crypto differently; nobody but the
+  client and the hop in question needs to coordinate their operations.
+
+  It's not clear, though, whether updating the relay crypto algorithms
+  would help anything, unless we changed the whole relay cell processing
+  format too.  The stream cipher is good enough, and the 4-byte digest
+  does not have enough bits to provide cryptographic strength, no matter
+  what cipher we use.
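For reference, a C sketch of the recognition test, using the relay header layout from tor-spec (command[1], recognized[2], stream ID[2], digest[4], length[2], then data, in a 509-byte payload). The running SHA-1 digest is not computed here: `expected_digest` stands in for its first four bytes, the direction check is left out, and the function name is hypothetical. The 32-bit comparison is exactly the weakness noted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Offsets within the decrypted relay cell payload (per tor-spec). */
#define RELAY_PAYLOAD_LEN     509
#define RELAY_OFF_RECOGNIZED  1
#define RELAY_OFF_DIGEST      5
#define RELAY_DIGEST_LEN      4

/* Is this cell addressed to us?  `expected_digest` is a stand-in for
 * the first four bytes of this hop's running digest, computed over the
 * cell with its digest field zeroed (not shown here).  Only 32 bits
 * are compared, so a random cell passes with probability about 2^-32
 * once the recognized field is zero. */
static bool
relay_cell_recognized(const uint8_t payload[RELAY_PAYLOAD_LEN],
                      const uint8_t expected_digest[RELAY_DIGEST_LEN])
{
  if (payload[RELAY_OFF_RECOGNIZED] != 0 ||
      payload[RELAY_OFF_RECOGNIZED + 1] != 0)
    return false;
  return memcmp(payload + RELAY_OFF_DIGEST, expected_digest,
                RELAY_DIGEST_LEN) == 0;
}
```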
			
 
+
+  This is the likeliest area for the second-system effect to strike;
+  there are lots of opportunities to try to be more clever than we are
+  now.
+
+3.4. Router identity
+
+  This is one of the hardest things to change.  Right now, routers are
+  identified by a "fingerprint" equal to the SHA1 hash of their 1024-bit
+  identity key as given in their router descriptor.  No existing Tor
+  will accept any other size of identity key, or any other hash
+  algorithm.  The identity key itself is used:
+    - To sign the router descriptors
+    - To sign link-key certificates
+    - To determine the most significant bit of circuit IDs used on a
+      Tor instance's links (see tor-spec §5.1)
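For the circuit-ID item, the tor-spec rule for 2-byte circuit IDs is that the node with the lower identity key picks IDs whose most significant bit is 0 and the other node picks IDs whose MSB is 1, so the two ends of a link never collide. A C sketch, with the key comparison abstracted into a boolean and a hypothetical per-link counter; unlike real code, it does not check for IDs already in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pick the next 2-byte circuit ID for a link.  The low 15 bits come
 * from a hypothetical per-link counter; the top bit is fixed by which
 * side has the lower identity key.  ID 0 is never handed out. */
static uint16_t
pick_circ_id(bool we_have_lower_key, uint16_t *counter)
{
  uint16_t id;
  do {
    id = (uint16_t)(*counter & 0x7fff);
    (*counter)++;
  } while (id == 0);
  if (!we_have_lower_key)
    id |= 0x8000;
  return id;
}
```

Since this bit depends on comparing the two relays' identity keys, changing the identity-key format also touches the link protocol.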
			
 
+
+  The fingerprint is used:
+    - To identify a router identity key in EXTEND cells
+    - To identify a router identity key in bridge lines
+    - Throughout the controller interface
+    - To fetch bridge descriptors for a bridge
+    - To identify a particular router throughout the codebase
+    - In the .exit notation
+    - By the controller to identify nodes
+    - To identify servers in the logs
+    - Probably other places too
+
+  To begin to allow other key types, key lengths, and hash functions, we
+  would either need to wait until all current Tors are obsolete, or allow
+  routers to have more than one identity for a while.
+
+  To allow routers to have more than one identity, we need to
+  cross-certify identity keys.  We can do this trivially, in theory, by
+  listing both keys in the router descriptor and having both identities
+  sign the descriptor.  In practice, we will need to analyze this pretty
+  carefully to avoid attacks where one key is completely fake and
+  designed to trick old clients somehow.
+
+  Upgrading the hash algorithm once would be easy: just say that all
+  new-type keys get hashed using the new hash algorithm.  Remaining
+  future-proof could be tricky.
+
+  This is one of the hardest areas to update; "SHA1 of identity key" is
+  assumed in so many places throughout Tor that we'll probably need a
+  lot of design work to work with something else.
+
+3.5. Directory objects
+
+  Fortunately, the problem is not so bad for consensuses themselves,
+  because:
+    - Authority identity keys are allowed to be RSA keys of any length;
+      in practice I think they are all 3072 bits.
+    - Authority signing keys are also allowed to be of any length.
+      AFAIK the code works with longer signing keys just fine.
+    - Currently, votes are hashed with both sha1 and sha256; adding
+      more hash algorithms isn't so hard.
+    - Microdescriptor consensuses are all signed using sha256.  While
+      regular consensuses are signed using sha1, exploitable collisions
+      are hard to come up with, since once you had a collision, you
+      would need to get a majority of other authorities to agree to
+      generate it.
+
+  Router descriptors are currently identified by SHA1 digests of their
+  identity keys and descriptor digests in regular consensuses, and by
+  SHA1 digests of identity keys and SHA256 digests of microdescriptors
+  in microdesc consensuses.  The consensus-flavors design allows us to
+  generate new flavors of consensus that identify routers by new hashes
+  of their identity keys.  Alternatively, existing consensuses could be
+  expanded to contain more hashes, though that would raise some space
+  concerns.
+
+  Router descriptors themselves are signed using RSA-1024 identity keys
+  and SHA1.  For information on updating identity keys, see above.
+
+  Router descriptors and extra-info documents cross-certify one another
+  using SHA1.
+
+  Microdescriptors are currently specified to contain exactly one
+  onion key, of length 1024 bits.
+
+3.6. The directory protocol
+
+  Most objects are indexed by SHA1 hash of an identity key or a
+  descriptor object.  Adding more hash types wouldn't be a huge problem
+  at the directory cache level.
+
+3.7. The hidden service protocol
+
+  Hidden services self-identify by a 1024-bit RSA key.  Other key
+  lengths are not supported.  This key is turned into an 80-bit
+  truncated (half-length) SHA-1 hash for hidden service names.
+
+  The simplest change here would be to define a way to put the whole
+  ugly SHA1 hash in the hidden service name.  Remember that this needs
+  to coexist with the authentication system, which also uses .onion
+  hostnames, and that hostnames top out around 255 characters and
+  their components top out at 63.
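A rough sketch of the size arithmetic, in C: base32 turns every 5 bytes into 8 characters, so today's 80-bit (10-byte) name is 16 characters and a full 20-byte SHA1 digest would be 32, both under the 63-character label limit. The encoder uses the lowercase RFC 4648 alphabet that .onion names use; the function names are hypothetical, and inputs are restricted to multiples of 5 bytes to avoid padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DNS_MAX_LABEL_LEN 63

/* Base32-encode `len` bytes (len must be a multiple of 5) using the
 * lowercase RFC 4648 alphabet.  `out` needs len*8/5 + 1 bytes. */
static void
base32_encode_lower(char *out, const uint8_t *in, size_t len)
{
  static const char alphabet[] = "abcdefghijklmnopqrstuvwxyz234567";
  size_t o = 0;
  for (size_t i = 0; i < len; i += 5) {
    /* Pack 5 bytes (40 bits), then emit eight 5-bit groups, high first. */
    uint64_t bits = 0;
    for (size_t j = 0; j < 5; ++j)
      bits = (bits << 8) | in[i + j];
    for (int k = 7; k >= 0; --k)
      out[o++] = alphabet[(bits >> (5 * k)) & 0x1f];
  }
  out[o] = '\0';
}

/* Turn the first `digest_bytes` bytes of an identity digest into an
 * onion label.  Returns 0 on success, or -1 if the length is not a
 * multiple of 5, the label would exceed one DNS label, or `out` is
 * too small. */
static int
onion_label_from_digest(char *out, size_t out_len,
                        const uint8_t *digest, size_t digest_bytes)
{
  size_t label_len = digest_bytes * 8 / 5;
  if (digest_bytes % 5 != 0 || label_len > DNS_MAX_LABEL_LEN ||
      out_len < label_len + 1)
    return -1;
  base32_encode_lower(out, digest, digest_bytes);
  return 0;
}
```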
			
 
+
+  Currently, ESTABLISH_INTRO cells take a key length parameter, so in
+  theory they allow longer keys.  The rest of the protocol assumes that
+  this will be hashed into a 20-byte SHA1 identifier.  Changing that
+  would require changes at the introduction point as well as the hidden
+  service.
+
+  The parsing code for hidden service descriptors currently enforces a
+  1024-bit identity key, though this does not seem to be described in
+  the specification.  Changing that would be at least as hard as doing
+  it for regular identity keys.
+
+  Fortunately, hidden services are nearly completely orthogonal to
+  everything else.
+
--- a/doc/spec/proposals/ideas/xxx-crypto-requirements.txt
+++ b/doc/spec/proposals/ideas/xxx-crypto-requirements.txt
@@ -0,0 +1,72 @@
+Title: Requirements for Tor's circuit cryptography
+Author: Robert Ransom
+Created: 12 December 2010
+
+Overview
+
+  This draft is intended to specify the meaning of 'secure' for a Tor
+  circuit protocol, hopefully in enough detail that
+  mathematically inclined cryptographers can use this definition to
+  prove that a Tor circuit protocol (or component thereof) is secure
+  under reasonably well-accepted assumptions.
+
+  Tor's current circuit protocol consists of the CREATE, CREATED, RELAY,
+  DESTROY, CREATE_FAST, CREATED_FAST, and RELAY_EARLY cells (including
+  all subtypes of RELAY and RELAY_EARLY cells).  Tor currently has two
+  circuit-extension handshake protocols: one consists of the CREATE and
+  CREATED cells; the other, used only over the TLS connection to the
+  first node in a circuit, consists of the CREATE_FAST and CREATED_FAST
+  cells.
+
+Requirements
+
+  1. Every circuit-extension handshake protocol must provide forward
+  secrecy -- the protocol must allow both the client and the relay to
+  destroy, immediately after a circuit is closed, enough key material
+  that no attacker who can eavesdrop on all handshake and circuit cells
+  and who can seize and inspect the client and relay after the circuit
+  is closed will be able to decrypt any non-handshake data sent along
+  the circuit.
+
+  In particular, the protocol must not require that a key which can be
+  used to decrypt non-handshake data be stored for a predetermined
+  period of time, as such a key must be written to persistent storage.
+
+  2. Every circuit-extension handshake protocol must specify what key
+  material must be used only once in order to allow unlinkability of
+  circuit-extension handshakes.
+
+  3. Every circuit-extension handshake protocol must authenticate the relay
+  to the client -- an attacker who can eavesdrop on all handshake and
+  circuit cells and who can participate in handshakes with the client
+  must not be able to determine a symmetric session key that a circuit
+  will use without either knowing a secret key corresponding to a
+  handshake-authentication public key published by the relay or breaking
+  a cryptosystem for which the relay published a
+  handshake-authentication public key.
+
+  4. Every circuit-extension handshake protocol must ensure that neither
+  the client nor the relay can cause the handshake to result in a
+  predetermined symmetric session key.
+
+  5. Every circuit-extension handshake protocol should ensure that an
+  attacker who can predict the relay's ephemeral secret input to the
+  handshake and can eavesdrop on all handshake and circuit cells, but
+  does not know a secret key corresponding to the
+  handshake-authentication public key used in the handshake, cannot
+  break the handshake-authentication public key's cryptosystem, and
+  cannot predict the client's ephemeral secret input to the handshake,
+  cannot predict the symmetric session keys used for the resulting
+  circuit.
+
+  6. The circuit protocol must specify an end-to-end flow-control
+  mechanism, and must allow for the addition of new mechanisms.
+
+  7. The circuit protocol should specify the statistics to be exchanged
+  between circuit endpoints in order to support end-to-end flow control,
+  and should specify how such statistics can be verified.
+
+  8. The circuit protocol should allow an endpoint to verify that the other
+  endpoint is participating in an end-to-end flow-control protocol
+  honestly.
--- a/doc/spec/proposals/ideas/xxx-pluggable-transport.txt
+++ b/doc/spec/proposals/ideas/xxx-pluggable-transport.txt
@@ -0,0 +1,306 @@
+Filename: xxx-pluggable-transport.txt
+Title: Pluggable transports for circumvention
+Author: Jacob Appelbaum, Nick Mathewson
+Created: 15-Oct-2010
+Status: Draft
+
+Overview
+
+  This proposal describes a way to decouple protocol-level obfuscation
+  from the core Tor protocol in order to better resist client-bridge
+  censorship.  Our approach is to specify a means to add pluggable
+  transport implementations to Tor clients and bridges so that they can
+  negotiate a superencipherment for the Tor protocol.
+
+Scope
+
+  This is a document about transport plugins; it does not cover
+  discovery improvements or bridgedb improvements.  While these
+  requirements might be solved by a program that also functions as a
+  transport plugin, this proposal only covers the requirements and
+  operation of transport plugins.
+
+Motivation
+
+  Frequently, people want to try a novel circumvention method to help
+  users connect to Tor bridges.  Some of these methods are already
+  pretty easy to deploy: if the user knows an unblocked VPN or open
+  SOCKS proxy, they can just use that with the Tor client today.
+
+  Less easy to deploy are methods that require participation by both the
+  client and the bridge.  In order of increasing sophistication, we
+  might want to support:
+
+  1. A protocol obfuscation tool that transforms the output of a TLS
+     connection into something that looks like HTTP as it leaves the
+     client, and back to TLS as it arrives at the bridge.
+  2. An additional authentication step that a client would need to
+     perform for a given bridge before being allowed to connect.
+  3. An information passing system that uses a side-channel in some
+     existing protocol to convey traffic between a client and a bridge
+     without the two of them ever communicating directly.
+  4. A set of clients to tunnel client->bridge traffic over an existing
+     large p2p network, such that the bridge is known by an identifier
+     in that network rather than by an IP address.
+
+  We could in theory support these almost adequately with Tor as it
+  stands today: every Tor client can take a SOCKS proxy to use for its
+  outgoing traffic, so a suitable client proxy could handle the client's
+  traffic and connections on its behalf, while a corresponding program
+  on the bridge side could handle the bridge's side of the protocol
+  transformation.  Nevertheless, there are some reasons to add support
+  for transport plugins to Tor itself:
+
+  1. It would be good for bridges to have a standard way to advertise
+     which transports they support, so that clients can have multiple
+     local transport proxies, and automatically use the right one for
+     the right bridge.
+
+  2. There are some changes to our architecture that we'll need for a
+     system like this to work.  For one thing, if a bridge blocks off
+     its regular ORPort and instead has an obfuscated ORPort, the bridge
+     authority has no way to test its reachability.  Also, unless the
+     bridge has some way to tell that the bridge-side proxy at 127.0.0.1
+     is not the origin of all the connections it is relaying, it might
+     decide that there are too many connections from 127.0.0.1, and
+     start paring them down to avoid a DoS.
+
+  3. Censorship and anticensorship techniques often evolve faster than
+     the typical Tor release cycle.  As such, it's a good idea to
+     provide ways to test out new anticensorship mechanisms on a more
+     rapid basis.
+
+  4. Transport obfuscation is a relatively distinct problem
+     from the other privacy problems that Tor tries to solve, and it
+     requires a fairly distinct skill-set from hacking the rest of Tor.
+     By decoupling transport obfuscation from the Tor core, we hope to
+     encourage people working on transport obfuscation who would
+     otherwise not be interested in hacking Tor.
+
+  5. Finally, we hope that defining a generic transport obfuscation plugin
+     mechanism will be useful to other anticensorship projects.
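Reason 1 amounts to a lookup on the client side: given the transport a bridge advertises, find the local transport proxy that implements it. A minimal C sketch; the table type, function name, transport names, and ports are all hypothetical, and how a bridge advertises its transport is not specified here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical entry describing one local transport proxy. */
typedef struct {
  const char *transport;   /* transport name (illustrative only) */
  uint16_t socks_port;     /* localhost port the proxy listens on */
} client_proxy_t;

/* Return the local SOCKS port of a proxy implementing the transport
 * the bridge advertises, or 0 if we have none (in which case this
 * client cannot use that bridge). */
static uint16_t
find_client_proxy(const client_proxy_t *proxies, size_t n,
                  const char *bridge_transport)
{
  for (size_t i = 0; i < n; ++i) {
    if (strcmp(proxies[i].transport, bridge_transport) == 0)
      return proxies[i].socks_port;
  }
  return 0;
}
```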
			
 
				+
			
 
				+Non-Goals
				+
				+  We're not going to talk about automatic verification of plugin
				+  correctness and safety via sandboxing, proof-carrying code, or
				+  whatever.
				+
				+  We need to do more with discovery and distribution, but that's not
				+  what this proposal is about.  We're pretty convinced that the problems
				+  are sufficiently orthogonal that we should be fine so long as we don't
				+  preclude a single program from implementing both transport and
				+  discovery extensions.
				+
				+  This proposal is not about what transport plugins are the best ones
				+  for people to write.  We do, however, make some general
				+  recommendations for plugin authors in an appendix.
				+
				+  We've considered issues involved with completely replacing Tor's TLS
				+  with another encryption layer, rather than layering it inside the
				+  obfuscation layer.  We describe how to do this in an appendix to the
				+  current proposal, though we are not currently sure whether it's a good
				+  idea to implement.
				+
				+  We deliberately reject any design that would involve linking more code
				+  into Tor's process space.
				+
				+Design overview
				+
				+  To write a new transport protocol, an implementer must provide two
				+  pieces: a "Client Proxy" to run at the initiator side, and a "Server
				+  Proxy" to run at the server side.  These two pieces may or may not be
				+  implemented by the same program.
				+
				+  Each client may run any number of Client Proxies.  Each one acts like
				+  a SOCKS proxy that accepts connections on localhost.  Each one runs
				+  on a different port and implements one or more transport methods.
				+  If the protocol has any parameters, they are passed from Tor inside
				+  the regular username/password parts of the SOCKS protocol.
				+
				+  Bridges (and maybe relays) may run any number of Server Proxies: these
				+  programs provide an interface like stunnel in server mode.  They get
				+  connections from the network (typically by listening for connections
				+  on the network) and relay them to the Bridge's real ORPort.
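At its simplest, a Server Proxy that applies no obfuscation at all is just a TCP relay into the ORPort. The following is a minimal sketch using Python's asyncio, not a reference implementation: the function names are hypothetical, the spot where a real transport would de-obfuscate bytes is only marked with a comment, and `demo()` uses an echo server on an ephemeral port as a stand-in for the bridge's real ORPort.

```python
import asyncio
import functools


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes in one direction until EOF, then pass the EOF along."""
    try:
        while True:
            data = await reader.read(65536)
            if not data:
                break
            # A real transport would de-obfuscate (or obfuscate) `data` here.
            writer.write(data)
            await writer.drain()
        if writer.can_write_eof():
            writer.write_eof()
    except OSError:
        pass  # the peer went away; relay() closes both ends


async def relay(client_reader, client_writer, or_host: str, or_port: int) -> None:
    """Forward one incoming connection, unchanged, to the bridge's ORPort."""
    try:
        or_reader, or_writer = await asyncio.open_connection(or_host, or_port)
    except OSError:
        client_writer.close()
        return
    try:
        await asyncio.gather(pipe(client_reader, or_writer),
                             pipe(or_reader, client_writer))
    finally:
        client_writer.close()
        or_writer.close()


async def start_null_server_proxy(listen_port: int, or_host: str, or_port: int):
    """Accept connections from the network and relay each one to the ORPort."""
    handler = functools.partial(relay, or_host=or_host, or_port=or_port)
    return await asyncio.start_server(handler, "0.0.0.0", listen_port)


async def demo() -> bytes:
    """Send bytes through the relay to an echo server standing in for the ORPort."""
    async def echo(reader, writer):
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
        writer.close()

    orport = await asyncio.start_server(echo, "127.0.0.1", 0)
    or_port = orport.sockets[0].getsockname()[1]
    proxy = await asyncio.start_server(
        functools.partial(relay, or_host="127.0.0.1", or_port=or_port),
        "127.0.0.1", 0)
    proxy_port = proxy.sockets[0].getsockname()[1]

    reader, writer = await asyncio.open_connection("127.0.0.1", proxy_port)
    writer.write(b"hello")
    writer.write_eof()           # we are done sending
    reply = await reader.read()  # read until the relay passes EOF back
    writer.close()
    proxy.close()
    orport.close()
    return reply
```

To run it as an actual server, one could await `start_null_server_proxy(...)` with placeholder ports of one's choosing and then await `serve_forever()` on the returned server object. Passing EOF along in each direction separately, rather than closing everything at the first EOF, keeps half-closed TCP connections working.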
				+
				+  To configure one of these programs, it should be sufficient simply to
				+  list it in your torrc.  The program tells Tor which transports it
				+  provides.
				+
				+  Bridges (and maybe relays) report in their descriptors which transport
				+  protocols they support.  This information can be copied into bridge
				+  lines.  Bridges using a transport protocol may have multiple bridge
				+  lines.
				+
				+  Any methods that are wildly successful, we can bake into Tor.
				+
				+Specifications: Client behavior
				+
				+  Bridge lines can now follow the extended format "bridge method
				+  address:port [[keyid=]id-fingerprint] [k=v] [k=v] [k=v]".  To connect
				+  to such a bridge, a client must open a local connection to the SOCKS
				+  proxy for "method" and ask it to connect to address:port.  If
				+  [id-fingerprint] is provided, the client should expect the public
				+  identity key on the TLS connection to match the digest provided in
				+  [id-fingerprint].  If any [k=v] items are provided, they are
				+  configuration parameters for the proxy: Tor should separate them with
				+  NUL bytes and put them in the user and password fields of the request,
				+  splitting them across the fields as necessary.  The "id-fingerprint"
				+  field is always provided in a field named "keyid", if it was given.
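Parsing the extended bridge line could be sketched as follows. This is an illustration under stated assumptions, not Tor's parser: the names are hypothetical, the line is assumed to always carry a method, a bare token is accepted as the fingerprint only directly after address:port, and a fingerprint is taken to be 40 hex digits (a SHA-1 digest). Per the rule above, the fingerprint is forwarded to the proxy as "keyid"; putting it first is this sketch's choice.

```python
import string
from dataclasses import dataclass, field
from typing import Dict, List, Optional

HEX = set(string.hexdigits)


def is_fingerprint(s: str) -> bool:
    """An identity fingerprint: 40 hex digits (a SHA-1 digest)."""
    return len(s) == 40 and set(s) <= HEX


@dataclass
class BridgeLine:
    method: str
    address: str
    port: int
    keyid: Optional[str] = None
    params: Dict[str, str] = field(default_factory=dict)

    def proxy_params(self) -> List[str]:
        """k=v items for the client proxy; 'keyid' goes first when known."""
        items = [] if self.keyid is None else [f"keyid={self.keyid}"]
        return items + [f"{k}={v}" for k, v in self.params.items()]


def parse_bridge_line(line: str) -> BridgeLine:
    """Parse 'bridge method address:port [[keyid=]fingerprint] [k=v]...'."""
    tokens = line.split()
    if len(tokens) < 3 or tokens[0] != "bridge":
        raise ValueError("expected 'bridge method address:port ...'")
    method, addrport, extras = tokens[1], tokens[2], tokens[3:]
    address, sep, port_text = addrport.rpartition(":")
    if not sep or not address or not port_text.isdigit():
        raise ValueError(f"bad address:port {addrport!r}")
    port = int(port_text)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")

    bridge = BridgeLine(method, address, port)
    for i, tok in enumerate(extras):
        key, eq, value = tok.partition("=")
        if not eq:
            # A bare token may only be the fingerprint, right after address:port.
            if i != 0 or not is_fingerprint(tok):
                raise ValueError(f"unexpected token {tok!r}")
            bridge.keyid = tok
        elif key == "keyid":
            if not is_fingerprint(value):
                raise ValueError(f"bad keyid {value!r}")
            bridge.keyid = value
        else:
            bridge.params[key] = value
    return bridge
```

Keeping the fingerprint out of `params` lets Tor use it for its own TLS identity check while still handing it to the proxy as "keyid".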
				+
				+  Example: if the bridge line is "bridge trebuchet www.example.com:3333
				+     rocks=20 height=5.6m" AND if the Tor client knows that the
				+     'trebuchet' method is provided by a SOCKS5 proxy on
				+     127.0.0.1:19999, the client should connect to that proxy, ask it to
				+     connect to www.example.com:3333, and provide the string
				+     "rocks=20\0height=5.6m" as the username, the password, or split
				+     across the username and password.
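The NUL-joining and splitting rule might look like this in Python. It is a minimal sketch with hypothetical function names: it fills the username first and spills the remainder into the password, and because RFC 1929 requires both UNAME and PASSWD to be 1 to 255 bytes, it pads an otherwise empty password with a single NUL byte, which a receiver that rejoins the fields and splits on NUL sees as an empty item. For the example above, "rocks=20\0height=5.6m" is 20 bytes, so it fits entirely in the username.

```python
MAX_FIELD = 255  # RFC 1929: UNAME and PASSWD are each 1..255 bytes long


def pack_proxy_params(params):
    """Join k=v items with NUL bytes and split them across (username, password)."""
    blob = b"\0".join(p.encode("utf-8") for p in params)
    if not blob:
        raise ValueError("no parameters to send")
    if len(blob) > 2 * MAX_FIELD:
        raise ValueError(f"parameters too long: {len(blob)} bytes")
    username, password = blob[:MAX_FIELD], blob[MAX_FIELD:]
    if not password:
        # PASSWD may not be empty, so send one NUL byte; after the receiver
        # rejoins the two fields it shows up as an empty item and is dropped.
        password = b"\0"
    return username, password


def rfc1929_auth_request(username: bytes, password: bytes) -> bytes:
    """Build the RFC 1929 username/password sub-negotiation message."""
    for name, value in (("username", username), ("password", password)):
        if not 1 <= len(value) <= MAX_FIELD:
            raise ValueError(f"{name} must be 1..255 bytes")
    return bytes([0x01, len(username)]) + username + bytes([len(password)]) + password
```

With these two fields the parameter budget is at most 510 bytes per connection, which bounds how much configuration a bridge line can carry.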
				+
				+  There are two ways to tell Tor clients about protocol proxies:
				+  external proxies and managed proxies.  An external proxy is configured
				+  with "ClientTransportPlugin trebuchet socks5 127.0.0.1:9999".  This
				+  tells Tor that another program is already running to handle
				+  'trebuchet' connections, and Tor doesn't need to worry about it.  A
				+  managed proxy is configured with "ClientTransportPlugin trebuchet
				+  /usr/libexec/tor-proxies/trebuchet [options]", and tells Tor to launch
				+  an external program on demand to provide a SOCKS proxy for 'trebuchet'
				+  connections.  The Tor client only launches one instance of each
				+  external program, even if the same executable is listed for more than
				+  one method.
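Telling the two kinds of ClientTransportPlugin lines apart, and launching each managed program only once, can be sketched as below. Assumptions, all illustrative: a line counts as external when it has exactly "socks4" or "socks5" followed by one host:port, and everything else is a managed command line; two lines name the same program when their whole command line matches (the draft only says "the same executable"). The method names catapult and ballista and the path /usr/libexec/tor-proxies/siege are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class ExternalProxy:
    method: str
    socks_version: str  # "socks4" or "socks5"
    host: str
    port: int


def parse_transport_plugins(lines: Iterable[str]):
    """Split ClientTransportPlugin lines into external and managed proxies.

    Returns (external, managed).  `managed` maps each distinct command line
    (a tuple of argv strings) to the methods it provides, so each program is
    launched once no matter how many methods name it.
    """
    external: List[ExternalProxy] = []
    managed: Dict[Tuple[str, ...], List[str]] = {}
    for line in lines:
        tokens = line.split()
        if len(tokens) < 3 or tokens[0] != "ClientTransportPlugin":
            raise ValueError(f"not a ClientTransportPlugin line: {line!r}")
        method, rest = tokens[1], tokens[2:]
        if (len(rest) == 2 and rest[0].lower() in ("socks4", "socks5")
                and ":" in rest[1]):
            host, _, port = rest[1].rpartition(":")
            external.append(ExternalProxy(method, rest[0].lower(), host, int(port)))
        else:
            methods = managed.setdefault(tuple(rest), [])
            if method not in methods:
                methods.append(method)
    return external, managed
```

Each key of `managed` then corresponds to exactly one child process, and its value lists the methods Tor expects that child to announce.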
				+
				+  The same program can implement a managed or an external proxy: it just
				+  needs to take an argument saying which one to be.
				+
				+Client proxy behavior
				+
				+   When launched from the command-line by a Tor client, a transport
				+   proxy needs to tell Tor which methods and ports it supports.  It does
				+   this by printing one or more CMETHOD: lines to its stdout.  These look
				+   like
				+
				+   CMETHOD: trebuchet SOCKS5 127.0.0.1:19999 ARGS:rocks,height \
				+              OPT-ARGS:tensile-strength
				+
				+   The ARGS field lists mandatory parameters that must appear in every
				+   bridge line for this method. The OPT-ARGS field lists optional
				+   parameters.  If no ARGS or OPT-ARGS field is provided, Tor should not
				+   check the parameters in bridge lines for this method.
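A sketch of how Tor might parse a CMETHOD line and apply the ARGS/OPT-ARGS rule to a bridge line's k=v parameters. Assumptions: the proxy prints each CMETHOD on a single line (the backslash in the example is only line-wrapping), and when either field is present, any parameter outside ARGS and OPT-ARGS is reported as unknown, which the draft implies but does not state. The names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass
class CMethod:
    name: str
    socks_version: str
    host: str
    port: int
    args: Optional[FrozenSet[str]] = None      # mandatory parameters
    opt_args: Optional[FrozenSet[str]] = None  # optional parameters


def parse_cmethod(line: str) -> CMethod:
    """Parse 'CMETHOD: name SOCKSn host:port [ARGS:a,b] [OPT-ARGS:c,d]'."""
    tokens = line.split()
    if len(tokens) < 4 or tokens[0] != "CMETHOD:":
        raise ValueError(f"not a CMETHOD line: {line!r}")
    host, _, port = tokens[3].rpartition(":")
    method = CMethod(tokens[1], tokens[2], host, int(port))
    for tok in tokens[4:]:
        key, _, value = tok.partition(":")
        names = frozenset(n for n in value.split(",") if n)
        if key == "ARGS":
            method.args = names
        elif key == "OPT-ARGS":
            method.opt_args = names
        # Other KEY:value fields are ignored here.
    return method


def check_bridge_params(method: CMethod, params: Dict[str, str]) -> List[str]:
    """List problems with a bridge line's k=v parameters (empty if fine)."""
    if method.args is None and method.opt_args is None:
        return []  # the proxy advertised neither field: don't check
    required = method.args or frozenset()
    allowed = required | (method.opt_args or frozenset())
    given = set(params)
    problems = [f"missing {n}" for n in sorted(required - given)]
    problems += [f"unknown {n}" for n in sorted(given - allowed)]
    return problems
```

Distinguishing "field absent" (None) from "field present but empty" is what lets the no-check rule apply only when the proxy said nothing at all.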
				+
				+   The proxy should print a single "METHODS:DONE" line after it is
				+   finished telling Tor about the methods it provides.
				+
				+   The transport proxy MUST exit cleanly when it receives a SIGTERM from
				+   Tor.
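On the proxy side, the startup announcement and the SIGTERM rule might look like this. It is a sketch, not a template any Tor release ships: the function names are hypothetical, `signal.pause()` is POSIX-only and merely stands in for the proxy's real accept loop, and the explicit flush matters because Tor reads these lines through a pipe, where Python block-buffers stdout.

```python
import signal
import sys
from typing import List, Sequence


def cmethod_line(name: str, socks_version: str, host: str, port: int,
                 args: Sequence[str] = (), opt_args: Sequence[str] = ()) -> str:
    """Format one CMETHOD line; ARGS and OPT-ARGS are left out when empty."""
    line = f"CMETHOD: {name} {socks_version} {host}:{port}"
    if args:
        line += " ARGS:" + ",".join(args)
    if opt_args:
        line += " OPT-ARGS:" + ",".join(opt_args)
    return line


def announcement(lines: List[str]) -> str:
    """All CMETHOD lines followed by the single METHODS:DONE terminator."""
    return "".join(line + "\n" for line in lines) + "METHODS:DONE\n"


def on_sigterm(signum, frame) -> None:
    """Exit cleanly when Tor sends SIGTERM."""
    # A real proxy would close its listening sockets here first.
    sys.exit(0)


def main() -> None:
    """Entry point for a hypothetical 'trebuchet' client proxy."""
    signal.signal(signal.SIGTERM, on_sigterm)
    sys.stdout.write(announcement([
        cmethod_line("trebuchet", "SOCKS5", "127.0.0.1", 19999,
                     args=["rocks", "height"], opt_args=["tensile-strength"]),
    ]))
    sys.stdout.flush()  # stdout is a pipe, so it is block-buffered
    signal.pause()      # POSIX-only stand-in for the proxy's accept loop
```

Installing the SIGTERM handler before announcing anything means Tor can never catch the proxy in a window where the signal would kill it uncleanly.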
				+
				+   The Tor client MUST ignore lines beginning with a keyword and a colon
				+   if it does not recognize the keyword.
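On the Tor side, reading a managed proxy's startup output could follow this pattern: collect CMETHOD lines, stop at METHODS:DONE, and silently skip any other KEYWORD: line. Skipping blank lines, treating any other line without a keyword as a protocol error, and treating EOF before METHODS:DONE as a failed launch are assumptions of this sketch; the function name is hypothetical.

```python
import re
from typing import Iterable, List

KEYWORD = re.compile(r"^([A-Za-z0-9_-]+):")


def read_method_lines(lines: Iterable[str]) -> List[str]:
    """Collect a proxy's CMETHOD lines up to METHODS:DONE.

    Lines that start with an unrecognized KEYWORD: are skipped, as the
    draft requires.  Blank lines are skipped too; anything else is an error.
    """
    cmethods = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        match = KEYWORD.match(line)
        if match is None:
            raise ValueError(f"malformed line from proxy: {line!r}")
        if match.group(1) == "CMETHOD":
            cmethods.append(line)
        elif line == "METHODS:DONE":
            return cmethods
        # Any other KEYWORD: line is ignored.
    raise EOFError("proxy exited or closed stdout before METHODS:DONE")
```

For a managed proxy, `lines` would be the child's stdout, for example the `.stdout` of `subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)`. Ignoring unknown keywords is what lets newer proxies add fields without breaking older Tor clients.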
				+
				+   In the future, if we need a control mechanism, we can use the
				+   stdin/stdout from Tor to the transport proxy.
				+
				+   A transport proxy MUST handle SOCKS connect requests using the SOCKS
				+   version it advertises.
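The proxy's half of the exchange, for a method advertised as SOCKS5, could be sketched as below: recover the k=v parameters from the RFC 1929 message, then read the CONNECT request defined by RFC 1928. This assumes the client joined its parameters with NUL bytes and split them across the two fields, so the proxy rejoins username and password before splitting and drops empty items (such as a one-byte NUL password used as padding). Function names are hypothetical; only the IPv4, domain-name and IPv6 address types that RFC 1928 defines are handled.

```python
import ipaddress
from typing import Dict, Tuple


def decode_auth_request(msg: bytes) -> Dict[str, str]:
    """Recover k=v parameters from an RFC 1929 username/password message.

    The username and password are rejoined and split on NUL bytes; empty
    items (such as a one-byte NUL password used as padding) are dropped.
    """
    if len(msg) < 3 or msg[0] != 0x01:
        raise ValueError("not an RFC 1929 username/password message")
    ulen = msg[1]
    if len(msg) < 3 + ulen:
        raise ValueError("truncated username")
    uname, plen = msg[2:2 + ulen], msg[2 + ulen]
    passwd = msg[3 + ulen:3 + ulen + plen]
    if len(passwd) != plen:
        raise ValueError("truncated password")
    params = {}
    for item in (uname + passwd).split(b"\0"):
        if item:
            key, eq, value = item.decode("utf-8").partition("=")
            if not eq:
                raise ValueError(f"parameter without '=': {key!r}")
            params[key] = value
    return params


def decode_connect_request(msg: bytes) -> Tuple[str, int]:
    """Parse a SOCKS5 CONNECT request (RFC 1928) into (host, port)."""
    if len(msg) < 7:
        raise ValueError("truncated request")
    ver, cmd, _reserved, atyp = msg[:4]
    if ver != 5 or cmd != 1:
        raise ValueError("only SOCKS5 CONNECT is handled in this sketch")
    if atyp == 1:    # IPv4: 4 bytes
        host, rest = str(ipaddress.IPv4Address(msg[4:8])), msg[8:]
    elif atyp == 3:  # domain name: 1-byte length, then the name
        n = msg[4]
        host, rest = msg[5:5 + n].decode("ascii"), msg[5 + n:]
    elif atyp == 4:  # IPv6: 16 bytes
        host, rest = str(ipaddress.IPv6Address(msg[4:20])), msg[20:]
    else:
        raise ValueError(f"unknown address type {atyp}")
    if len(rest) != 2:
        raise ValueError("malformed address or port")
    return host, int.from_bytes(rest, "big")
```

Because the split point between the two fields carries no meaning, the proxy gets the same parameters whether the client used only the username or spread them across both fields.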
				+
				+   Tor clients SHOULD NOT use any method from a client proxy unless it
				+   is both listed as a possible method for that proxy in torrc, and it
				+   is listed by the proxy as a method it supports.
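This rule amounts to a set intersection per proxy. A minimal sketch, assuming proxies are keyed by some identifier such as their command line; the function name and the method and proxy names in the usage are hypothetical.

```python
from typing import Dict, Iterable, Set


def usable_methods(configured: Dict[str, Iterable[str]],
                   advertised: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Return, per proxy, the methods that both torrc and the proxy list.

    Both arguments map a proxy identifier (here, any string such as its
    command line) to method names.  A proxy that advertised nothing, or
    nothing that torrc asked for, contributes no usable methods.
    """
    usable = {}
    for proxy, wanted in configured.items():
        common = set(wanted) & set(advertised.get(proxy, ()))
        if common:
            usable[proxy] = common
    return usable
```

For example, if torrc lists trebuchet and catapult for one program but it announces only trebuchet and ballista, only trebuchet is usable; a method the proxy offers unasked is never used.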
				+
				+   [XXXX say something about versioning.]
				+
				+Server behavior
				+
				+   Server proxies are configured similarly to client proxies.
				+
				+Server proxy behavior
				+
				+   [So, we can have this work like client proxies, where the bridge
				+   launches some programs, and they tell the bridge, "I am giving you
				+   method X with parameters Y"?  Do you have to take all the methods? If
				+   not, which do you specify?]
				+
				+   [Do we allow programs that get started independently?]
				+
				+   [We'll need to figure out how this works with port forwarding.  Is
				+   port forwarding the bridge's problem, the proxy's problem, or some
				+   combination of the two?]
				+
				+   [If we're using the bridge authority/bridgedb system for distributing
				+   bridge info, the right place to advertise bridge lines is probably
				+   the extrainfo document.  We also need a way to tell the bridge
				+   authority "don't give out a default bridge line for me".]
				+
				+Bridge authority behavior
				+
				+Implementation plan
				+
				+   Turn this into a draft proposal.
				+
				+   Circulate and discuss on or-dev.
				+
				+   We should ship a couple of null plugin implementations in one or two
				+   popular, portable languages so that people get an idea of how to
				+   write the stuff.
				+
				+   1. We should have one that's just a proof of concept that does
				+      nothing but transfer bytes back and forth.
				+
				+   2. We should not do a rot13 one.
				+
				+   3. We should implement a basic proxy that does not transform the
				+      bytes at all.
				+
				+   4. We should implement DNS or HTTP using other software (as Goodell
				+      did years ago with DNS) as an example of wrapping existing code into
				+      our plugin model.
				+
				+   5. The obfuscated-ssh superencipherment is pretty trivial and pretty
				+      useful.  It makes the protocol stringwise unfingerprintable.
				+
				+      5.1. Nick needs to be told firmly not to bikeshed the obfuscated-ssh
				+           superencipherment too badly.
				+
				+           5.1.1. Go ahead, bikeshed my day.
				+
				+   6. If we do a raw-traffic proxy, OpenSSH tunnels would be the logical
				+      choice.
				+
				+Appendix: recommendations for transports
				+
				+  Be free/open-source software.  Also, if you think your code might
				+  someday do so well at circumvention that it should be implemented
				+  inside Tor, it should use the same license as Tor.
				+
				+  Use libraries that Tor already requires.  (You can rely on OpenSSL and
				+  libevent being present if current Tor is present.)
				+
				+  Be portable: most Tor users are on Windows, and most Tor developers
				+  are not, so designing your code for just one of these platforms will
				+  leave it with either a small user base or poor auditing.
				+
				+  Think secure: if your code is in a C-like language, and it's hard to
				+  read it and become convinced it's safe, then it's probably not safe.
				+
				+  Think small: we want to minimize the bytes that a Windows user needs
				+  to download for a transport client.
				+
				+  Specify: if you can't come up with a good explanation
				+
				+  Avoid security-through-obscurity if possible.  Specify.
				+
				+  Resist trivial fingerprinting: There should be no good string or regex
				+  to search for to distinguish your protocol from protocols permitted by
				+  censors.
				+
				+  Imitate a real profile: There are many ways to implement most
				+  protocols, and in many cases most possible variants of a given
				+  protocol won't actually exist in the wild.
				+
				+Appendix: Raw-traffic transports
				+
				+  This section describes an optional extension to the proposal above.
				+  We are not sure whether it is a good idea.