20 years ago · a3d1671f95
--- a/doc/tor-spec-udp.txt
+++ b/doc/tor-spec-udp.txt
@@ -0,0 +1,366 @@
 
				+[This proposed Tor extension has not been implemented yet. It is currently
			
 
				+in request-for-comments state. -RD]
			
 
				+
			
 
				+                  Tor Unreliable Datagram Extension Proposal
			
 
				+
			
 
				+                               Marc Liberatore
			
 
				+
			
 
				+Abstract
			
 
				+
			
 
				+Contents
			
 
				+
			
 
				+0. Introduction
			
 
				+
			
 
				+  Tor is a distributed overlay network designed to anonymize low-latency
			
 
				+  TCP-based applications.  The current tor specification supports only
			
 
				+  TCP-based traffic.  This limitation prevents the use of tor to anonymize
			
 
				+  other important applications, notably voice over IP software.  This document
			
 
				+  is a proposal to extend the tor specification to support UDP traffic.
			
 
				+
			
 
				+  The basic design philosophy of this extension is to add support for
			
 
				+  tunneling unreliable datagrams through tor with as few modifications to the
			
 
				+  protocol as possible.  As currently specified, tor cannot directly support
			
 
				+  such tunneling, as connections between nodes are built using transport layer
			
 
				+  security (TLS) atop TCP.  The latency incurred by TCP is likely unacceptable
			
 
				+  to the operation of most UDP-based application level protocols.
			
 
				+
			
 
				+  Thus, we propose the addition of links between nodes using datagram
			
 
				+  transport layer security (DTLS).  These links allow packets to traverse a
			
 
				+  route through tor quickly, but their unreliable nature requires minor
			
 
				+  changes to the tor protocol.  This proposal outlines the necessary
			
 
				+  additions and changes to the tor specification to support UDP traffic.
			
 
				+
			
 
				+  We note that a separate set of DTLS links between nodes creates a second
			
 
				+  overlay, distinct from that composed of TLS links.  This separation and
			
 
				+  resulting decrease in each anonymity set's size will make certain attacks
			
 
				+  easier.  However, it is our belief that VoIP support in tor will
			
 
				+  dramatically increase its appeal, and correspondingly, the size of its user
			
 
				+  base, number of deployed nodes, and total traffic relayed.  These increases
			
 
				+  should help offset the loss of anonymity that two distinct networks imply.
			
 
				+
			
 
				+1. Overview of Tor-UDP and its complications
			
 
				+
			
 
				+  As described above, this proposal extends the Tor specification to support
			
 
				+  UDP with as few changes as possible.  Tor's overlay network is managed
			
 
				+  through TLS based connections; we will re-use this control plane to set up
			
 
				+  and tear down circuits that relay UDP traffic.  These circuits will be built atop
			
 
				+  DTLS, in a fashion analogous to how Tor currently sends TCP traffic over
			
 
				+  TLS.
			
 
				+
			
 
				+  The unreliability of DTLS circuits creates problems for Tor at two levels:
			
 
				+
			
 
				+      1. Tor's encryption of the relay layer does not allow independent
			
 
				+      decryption of individual records. If record N is not received, then
			
 
				+      record N+1 will not decrypt correctly, as the counter for AES/CTR is
			
 
				+      maintained implicitly.
			
 
				+
			
 
				+      2. Tor's end-to-end integrity checking works under the assumption that
			
 
				+      all RELAY cells are delivered.  This assumption is invalid when cells
			
 
				+      are sent over DTLS.
			
 
				+
			
 
				+  The fix for the first problem is straightforward: add an explicit sequence
			
 
				+  number to each cell.  To fix the second problem, we introduce a
			
 
				+  system of nonces and hashes to RELAY packets.
			
 
				+
			
 
				+  In the following sections, we mirror the layout of the Tor Protocol
			
 
				+  Specification, presenting the necessary modifications to the Tor protocol as
			
 
				+  a series of deltas.
			
 
				+
			
 
				+2. Connections
			
 
				+
			
 
				+  Tor-UDP uses DTLS for encryption of some links.  All DTLS links must have
			
 
				+  corresponding TLS links, as all control messages are sent over TLS.  All
			
 
				+  implementations MUST support the DTLS ciphersuite "[TODO]".
			
 
				+
			
 
				+  DTLS connections are formed using the same protocol as TLS connections.
			
 
				+  This occurs upon request, following a CREATE_UDP or CREATE_FAST_UDP cell,
			
 
				+  as detailed in section 4.6.
			
 
				+
			
 
				+  Once a paired TLS/DTLS connection is established, the two sides send cells
			
 
				+  to one another.  All but two types of cells are sent over TLS links.  RELAY
			
 
				+  cells containing the commands RELAY_DATA_UDP and RELAY_DROP_UDP, specified
			
 
				+  below, are sent over DTLS links.  [Should all cells still be 512 bytes long?
			
 
				+  Perhaps upon completion of a preliminary implementation, we should do a
			
 
				+  performance evaluation for some class of UDP traffic, such as VoIP. - ML]
			
 
				+  Cells may be sent embedded in TLS or DTLS records of any size or divided
			
 
				+  across such records.  The framing of these records MUST NOT leak any more
			
 
				+  information than the above differentiation on the basis of cell type.  [I am
			
 
				+  uncomfortable with this leakage, but don't see any simple, elegant way
			
 
				+  around it. -ML]
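
The split between the two link types can be sketched as a small lookup. This is only an illustration: the numeric values are taken from the command tables in sections 3 and 5.1, while the function name `link_for` and the string labels are hypothetical. It describes the choice made by the edge that builds the cell, since only that node sees the relay command in the clear.

```python
# Values copied from the tables in sections 3 and 5.1 of this proposal.
CELL_RELAY = 3
RELAY_DATA_UDP = 14
RELAY_DROP_UDP = 17


def link_for(cell_command, relay_command=None):
    """Return which link of a paired TLS/DTLS connection carries a cell.

    Hypothetical helper: only RELAY cells whose relay command is
    RELAY_DATA_UDP or RELAY_DROP_UDP go over DTLS; everything else,
    including all circuit control cells, stays on TLS.
    """
    if cell_command == CELL_RELAY and relay_command in (RELAY_DATA_UDP, RELAY_DROP_UDP):
        return "DTLS"
    return "TLS"
```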
			
 
				+
			
 
				+  As with TLS connections, DTLS connections are not permanent.
			
 
				+
			
 
				+3. Cell format
			
 
				+
			
 
				+  Each cell contains the following fields:
			
 
				+
			
 
				+        CircID                                [2 bytes]
			
 
				+        Command                               [1 byte]
			
 
				+        Sequence Number                       [2 bytes]
			
 
				+        Payload (padded with 0 bytes)         [507 bytes]
			
 
				+                                         [Total size: 512 bytes]
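
As an illustration of the layout above, the following sketch packs and unpacks a cell with Python's `struct` module. The function names are hypothetical, and big-endian (network) byte order is an assumption; this section does not state the byte order.

```python
import struct

CELL_LEN = 512
PAYLOAD_LEN = 507
# CircID (2 bytes), Command (1 byte), Sequence Number (2 bytes).
# "!" selects network byte order with no padding (an assumption here).
_HEADER = struct.Struct("!HBH")


def pack_cell(circ_id, command, seq, payload=b""):
    """Build a 512-byte cell, padding the payload with 0 bytes."""
    if len(payload) > PAYLOAD_LEN:
        raise ValueError("payload longer than 507 bytes")
    return _HEADER.pack(circ_id, command, seq) + payload.ljust(PAYLOAD_LEN, b"\x00")


def unpack_cell(cell):
    """Split a cell into (circ_id, command, seq, payload)."""
    if len(cell) != CELL_LEN:
        raise ValueError("cell must be exactly 512 bytes")
    circ_id, command, seq = _HEADER.unpack_from(cell)
    return circ_id, command, seq, cell[_HEADER.size:]
```

For example, `pack_cell(0x1234, 7, 42)` would produce a CREATE_UDP cell on circuit 0x1234 with sequence number 42 and an all-zero payload.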
			
 
				+
			
 
				+  The 'Command' field holds one of the following values:
			
 
				+       0 -- PADDING         (Padding)                     (See Sec 6.2)
			
 
				+       1 -- CREATE          (Create a circuit)            (See Sec 4)
			
 
				+       2 -- CREATED         (Acknowledge create)          (See Sec 4)
			
 
				+       3 -- RELAY           (End-to-end data)             (See Sec 5)
			
 
				+       4 -- DESTROY         (Stop using a circuit)        (See Sec 4)
			
 
				+       5 -- CREATE_FAST     (Create a circuit, no PK)     (See Sec 4)
			
 
				+       6 -- CREATED_FAST    (Circuit created, no PK)      (See Sec 4)
			
 
				+       7 -- CREATE_UDP      (Create a UDP circuit)        (See Sec 4)
			
 
				+       8 -- CREATED_UDP     (Acknowledge UDP create)      (See Sec 4)
			
 
				+       9 -- CREATE_FAST_UDP (Create a UDP circuit, no PK) (See Sec 4)
			
 
				+      10 -- CREATED_FAST_UDP(UDP circuit created, no PK)  (See Sec 4)
			
 
				+
			
 
				+  The sequence number allows for AES/CTR decryption of RELAY cells
			
 
				+  independently of one another; this functionality is required to support
			
 
				+  cells sent over DTLS.  The sequence number is described in more detail in
			
 
				+  section 4.5.
			
 
				+
			
 
				+  [Should the sequence number only appear in RELAY packets?  The overhead is
			
 
				+  small, and I'm hesitant to force more code paths on the implementor.]
			
 
				+
			
 
				+  [Having separate commands for UDP circuits seems necessary, unless we can
			
 
				+  assume a flag day event for a large number of tor nodes.]
			
 
				+
			
 
				+4. Circuit management
			
 
				+
			
 
				+4.2. Setting circuit keys
			
 
				+
			
 
				+  Keys are set up for UDP circuits in the same fashion as for TCP circuits.
			
 
				+  Each UDP circuit shares keys with its corresponding TCP circuit.
			
 
				+
			
 
				+4.3. Creating circuits
			
 
				+
			
 
				+  UDP circuits are created in the same way as TCP circuits, using the *_UDP cells as
			
 
				+  appropriate.
			
 
				+
			
 
				+4.4. Tearing down circuits
			
 
				+
			
 
				+  UDP circuits are torn down in the same way as TCP circuits, using the *_UDP cells as
			
 
				+  appropriate.
			
 
				+
			
 
				+4.5. Routing relay cells
			
 
				+
			
 
				+  When an OR receives a RELAY cell, it checks the cell's circID and
			
 
				+  determines whether it has a corresponding circuit along that
			
 
				+  connection.  If not, the OR drops the RELAY cell.
			
 
				+
			
 
				+  Otherwise, if the OR is not at the OP edge of the circuit (that is,
			
 
				+  either an 'exit node' or a non-edge node), it de/encrypts the payload
			
 
				+  with AES/CTR, as follows:
			
 
				+       'Forward' relay cell (same direction as CREATE):
			
 
				+           Use Kf as key; decrypt, using sequence number to synchronize
			
 
				+           ciphertext and keystream.
			
 
				+       'Back' relay cell (opposite direction from CREATE):
			
 
				+           Use Kb as key; encrypt, using sequence number to synchronize
			
 
				+           ciphertext and keystream.
			
 
				+  Note that in counter mode, decrypt and encrypt are the same operation.
			
 
				+
			
 
				+  Each stream encrypted by a Kf or Kb has a corresponding unique state,
			
 
				+  captured by a sequence number; the originator of each such stream chooses
			
 
				+  the initial sequence number randomly, and increments it only with RELAY
			
 
				+  cells.  [This counts cells; unlike, say, TCP, tor uses fixed-size cells, so
			
 
				+  there's no need for counting bytes directly.  Right? - ML]
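
The sketch below illustrates why an explicit sequence number makes each cell independently decryptable. Several things in it are assumptions and not part of the proposal: the keystream is a SHA-256 based stand-in (NOT AES/CTR) so that the example runs with only the standard library, and the mapping from sequence number to keystream position (cells since the initial sequence number, times the 507-byte payload size) is one hypothetical choice that the text above does not specify.

```python
import hashlib

PAYLOAD_LEN = 507        # cell payload size from section 3
SEQ_MODULUS = 1 << 16    # the Sequence Number field is 2 bytes wide


def keystream(key, offset, length):
    """Return keystream bytes [offset, offset + length).

    Stand-in for AES/CTR: hashes (key || block counter) with SHA-256.
    Like a real counter mode, any position can be produced directly.
    """
    out = bytearray()
    block, skip = divmod(offset, 32)
    while len(out) < skip + length:
        out += hashlib.sha256(key + block.to_bytes(8, "big")).digest()
        block += 1
    return bytes(out[skip:skip + length])


def crypt_payload(key, initial_seq, seq, payload):
    """Encrypt or decrypt one payload; in counter mode both are one XOR."""
    # Hypothetical mapping from sequence number to keystream position.
    cells_in = (seq - initial_seq) % SEQ_MODULUS
    ks = keystream(key, cells_in * PAYLOAD_LEN, len(payload))
    return bytes(a ^ b for a, b in zip(payload, ks))
```

With this arrangement a receiver that never sees cell N can still decrypt cell N+1, because the position in the keystream comes from the cell itself. Note that a 2-byte sequence number wraps after 65536 cells; the sketch does not address how keystream positions would be disambiguated after a wrap.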
			
 
				+
			
 
				+  The OR then decides whether it recognizes the relay cell, by
			
 
				+  inspecting the payload as described in section 5.1 below.  If the OR
			
 
				+  recognizes the cell, it processes the contents of the relay cell.
			
 
				+  Otherwise, it passes the decrypted relay cell along the circuit if
			
 
				+  the circuit continues.  If the OR at the end of the circuit
			
 
				+  encounters an unrecognized relay cell, an error has occurred: the OR
			
 
				+  sends a DESTROY cell to tear down the circuit.
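
The decision procedure an OR follows for an incoming RELAY cell can be summarized as below. This is a sketch of the control flow only; the `Action` names and the boolean parameters are hypothetical, and the decryption and "recognized" check themselves are assumed to have been done by the caller.

```python
from enum import Enum


class Action(Enum):
    DROP = "drop"          # no circuit for this circID on this connection
    PROCESS = "process"    # cell is recognized at this hop
    FORWARD = "forward"    # pass the decrypted cell along the circuit
    DESTROY = "destroy"    # unrecognized cell at the end of the circuit


def route_relay_cell(has_circuit, recognized, circuit_continues):
    """Return what an OR does with a RELAY cell, per section 4.5."""
    if not has_circuit:
        return Action.DROP
    if recognized:
        return Action.PROCESS
    if circuit_continues:
        return Action.FORWARD
    return Action.DESTROY
```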
			
 
				+
			
 
				+  When a relay cell arrives at an OP, the OP decrypts the payload
			
 
				+  with AES/CTR as follows:
			
 
				+        OP receives data cell:
			
 
				+           For I=N...1,
			
 
				+               Decrypt with Kb_I, using the sequence number as above.  If the
			
 
				+               payload is recognized (see section 5.1), then stop and process
			
 
				+               the payload.
			
 
				+
			
 
				+  For more information, see section 5 below.
			
 
				+
			
 
				+4.6. CREATE_UDP and CREATED_UDP cells
			
 
				+
			
 
				+  Users set up UDP circuits incrementally.  The procedure is similar to that
			
 
				+  for TCP circuits, as described in section 4.1.  In addition to the TLS
			
 
				+  connection to the first node, the OP also attempts to open a DTLS
			
 
				+  connection.  If this succeeds, the OP sends a CREATE_UDP cell, with a
			
 
				+  payload in the same format as a CREATE cell.  To extend a UDP circuit past
			
 
				+  the first hop, the OP sends an EXTEND_UDP relay cell (see section 5) which
			
 
				+  instructs the last node in the circuit to send a CREATE_UDP cell to extend
			
 
				+  the circuit.
			
 
				+
			
 
				+  The relay payload for an EXTEND_UDP relay cell consists of:
			
 
				+         Address                       [4 bytes]
			
 
				+         TCP port                      [2 bytes]
			
 
				+         UDP port                      [2 bytes]
			
 
				+         Onion skin                    [186 bytes]
			
 
				+         Identity fingerprint          [20 bytes]
			
 
				+  
			
 
				+  The address field and ports denote the IPv4 address and ports of the next OR
			
 
				+  in the circuit.
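
A minimal sketch of building this relay payload follows. The function name is hypothetical and network byte order for the two port fields is an assumption; the field sizes are those listed above and sum to 214 bytes.

```python
import socket
import struct

# Address (4), TCP port (2), UDP port (2), Onion skin (186), Fingerprint (20).
_EXTEND_UDP = struct.Struct("!4sHH186s20s")


def pack_extend_udp(address, tcp_port, udp_port, onion_skin, fingerprint):
    """Build the 214-byte relay payload of an EXTEND_UDP relay cell."""
    if len(onion_skin) != 186:
        raise ValueError("onion skin must be 186 bytes")
    if len(fingerprint) != 20:
        raise ValueError("identity fingerprint must be 20 bytes")
    return _EXTEND_UDP.pack(
        socket.inet_aton(address),  # dotted-quad string to 4 raw bytes
        tcp_port,
        udp_port,
        onion_skin,
        fingerprint,
    )
```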
			
 
				+
			
 
				+  The payload for a CREATED_UDP cell or the relay payload for an
			
 
				+  RELAY_EXTENDED_UDP cell is identical to that of the corresponding CREATED or
			
 
				+  RELAY_EXTENDED cell.  Both circuits are established using the same key.
			
 
				+
			
 
				+  Note that the existence of a UDP circuit implies the
			
 
				+  existence of a corresponding TCP circuit, sharing keys, sequence numbers,
			
 
				+  and any other relevant state.
			
 
				+
			
 
				+4.6.1 CREATE_FAST_UDP/CREATED_FAST_UDP cells
			
 
				+
			
 
				+  As above, the OP must successfully connect using DTLS before attempting to
			
 
				+  send a CREATE_FAST_UDP cell.  Otherwise, the procedure is the same as in
			
 
				+  section 4.1.1.
			
 
				+
			
 
				+5. Application connections and stream management
			
 
				+
			
 
				+5.1. Relay cells
			
 
				+
			
 
				+  Within a circuit, the OP and the exit node use the contents of RELAY cells
			
 
				+  to tunnel end-to-end commands, TCP connections ("Streams"), and UDP packets
			
 
				+  across circuits.  End-to-end commands and UDP packets can be initiated by
			
 
				+  either edge; streams are initiated by the OP.
			
 
				+
			
 
				+  The payload of each unencrypted RELAY cell consists of:
			
 
				+        Relay command           [1 byte]
			
 
				+        'Recognized'            [2 bytes]
			
 
				+        StreamID                [2 bytes]
			
 
				+        Digest                  [4 bytes]
			
 
				+        Length                  [2 bytes]
			
 
				+        Data                    [496 bytes]
			
 
				+
			
 
				+  The relay commands are:
			
 
				+        1 -- RELAY_BEGIN        [forward]
			
 
				+        2 -- RELAY_DATA         [forward or backward]
			
 
				+        3 -- RELAY_END          [forward or backward]
			
 
				+        4 -- RELAY_CONNECTED    [backward]
			
 
				+        5 -- RELAY_SENDME       [forward or backward]
			
 
				+        6 -- RELAY_EXTEND       [forward]
			
 
				+        7 -- RELAY_EXTENDED     [backward]
			
 
				+        8 -- RELAY_TRUNCATE     [forward]
			
 
				+        9 -- RELAY_TRUNCATED    [backward]
			
 
				+       10 -- RELAY_DROP         [forward or backward]
			
 
				+       11 -- RELAY_RESOLVE      [forward]
			
 
				+       12 -- RELAY_RESOLVED     [backward]
			
 
				+       13 -- RELAY_BEGIN_UDP    [forward]
			
 
				+       14 -- RELAY_DATA_UDP     [forward or backward]
			
 
				+       15 -- RELAY_EXTEND_UDP   [forward]
			
 
				+       16 -- RELAY_EXTENDED_UDP [backward]
			
 
				+       17 -- RELAY_DROP_UDP     [forward or backward]
			
 
				+
			
 
				+  Commands labelled as "forward" must only be sent by the originator
			
 
				+  of the circuit. Commands labelled as "backward" must only be sent by
			
 
				+  other nodes in the circuit back to the originator. Commands marked
			
 
				+  as either can be sent either by the originator or other nodes.
			
 
				+
			
 
				+  The 'recognized' field in any unencrypted relay payload is always set to
			
 
				+  zero. 
			
 
				+
			
 
				+  The 'digest' field can have two meanings.  For all cells sent over TLS
			
 
				+  connections (that is, all commands and all non-UDP RELAY data), it is
			
 
				+  computed as the first four bytes of the running SHA-1 digest of all the
			
 
				+  bytes that have been sent reliably and have been destined for this hop of
			
 
				+  the circuit or originated from this hop of the circuit, seeded from Df or Db
			
 
				+  respectively (obtained in section 4.2 above), and including this RELAY
			
 
				+  cell's entire payload (taken with the digest field set to zero).  Cells sent
			
 
				+  over DTLS connections do not affect this running digest.  Each cell sent
			
 
				+  over DTLS (that is, RELAY_DATA_UDP and RELAY_DROP_UDP) has the digest field
			
 
				+  set to the first four bytes of the SHA-1 digest of the current RELAY cell's entire payload, with the
			
 
				+  digest field set to zero.  Coupled with a randomly-chosen streamID, this
			
 
				+  provides per-cell integrity checking on UDP cells.
			
 
				+
			
 
				+  When the 'recognized' field of a RELAY cell is zero, and the digest
			
 
				+  is correct, the cell is considered "recognized" for the purposes of
			
 
				+  decryption (see section 4.5 above).
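
The per-cell digest for cells sent over DTLS, and the matching "recognized" test, might look like the sketch below. The function names are hypothetical. Two assumptions are made: the 4-byte digest field holds the first four bytes of the SHA-1 output (mirroring the TLS case), and the data area is whatever remains of the 507-byte cell payload from section 3 after the 11-byte relay header. The running digest used for cells sent over TLS is not shown.

```python
import hashlib
import struct

PAYLOAD_LEN = 507                       # cell payload size from section 3
# Relay command, 'Recognized', StreamID, Digest, Length (11 bytes in total).
_RELAY_HDR = struct.Struct("!BHH4sH")
DATA_MAX = PAYLOAD_LEN - _RELAY_HDR.size
_ZERO_DIGEST = b"\x00" * 4


def build_udp_relay_payload(command, stream_id, data):
    """Build an unencrypted relay payload carrying a per-cell digest."""
    if stream_id == 0:
        raise ValueError("UDP tunnels use a nonzero streamID")
    if len(data) > DATA_MAX:
        raise ValueError("data does not fit in one cell")
    body = data.ljust(DATA_MAX, b"\x00")
    # Digest is computed over the whole payload with the digest field zeroed.
    zeroed = _RELAY_HDR.pack(command, 0, stream_id, _ZERO_DIGEST, len(data)) + body
    digest = hashlib.sha1(zeroed).digest()[:4]
    return _RELAY_HDR.pack(command, 0, stream_id, digest, len(data)) + body


def is_recognized(payload):
    """True if 'recognized' is zero and the per-cell digest matches."""
    _, recognized, _, digest, _ = _RELAY_HDR.unpack_from(payload)
    if recognized != 0:
        return False
    zeroed = payload[:5] + _ZERO_DIGEST + payload[9:]
    return hashlib.sha1(zeroed).digest()[:4] == digest
```

Because the check depends only on the cell in hand, a lost cell does not affect verification of the cells that follow it.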
			
 
				+
			
 
				+  (The digest does not include any bytes from relay cells that do
			
 
				+  not start or end at this hop of the circuit. That is, it does not
			
 
				+  include forwarded data. Therefore if 'recognized' is zero but the
			
 
				+  digest does not match, the running digest at that node should
			
 
				+  not be updated, and the cell should be forwarded on.)
			
 
				+
			
 
				+  All RELAY cells pertaining to the same tunneled TCP stream have the
			
 
				+  same streamID.  Such streamIDs are chosen arbitrarily by the OP.  RELAY
			
 
				+  cells that affect the entire circuit rather than a particular
			
 
				+  stream use a StreamID of zero.
			
 
				+
			
 
				+  All RELAY cells pertaining to the same UDP tunnel have the same streamID.
			
 
				+  This streamID is chosen randomly by the OP, but cannot be zero.
			
 
				+
			
 
				+  The 'Length' field of a relay cell contains the number of bytes in
			
 
				+  the relay payload which contain real payload data. The remainder of
			
 
				+  the payload is padded with NUL bytes.
			
 
				+
			
 
				+  If the RELAY cell is recognized but the relay command is not
			
 
				+  understood, the cell must be dropped and ignored. Its contents
			
 
				+  still count with respect to the digests, though. [Before
			
 
				+  0.1.1.10, Tor closed circuits when it received an unknown relay
			
 
				+  command. Perhaps this will be more forward-compatible. -RD]
			
 
				+
			
 
				+5.2.1.  Opening UDP tunnels and transferring data
			
 
				+
			
 
				+  To open a new anonymized UDP connection, the OP chooses an open
			
 
				+  circuit to an exit that may be able to connect to the destination
			
 
				+  address, selects a random streamID not yet used on that circuit,
			
 
				+  and constructs a RELAY_BEGIN_UDP cell with a payload encoding the address
			
 
				+  and port of the destination host.  The payload format is:
			
 
				+
			
 
				+        ADDRESS | ':' | PORT | [00]
			
 
				+
			
 
				+  where  ADDRESS can be a DNS hostname, or an IPv4 address in
			
 
				+  dotted-quad format, or an IPv6 address surrounded by square brackets;
			
 
				+  and where PORT is encoded in decimal.
			
 
				+
			
 
				+  [What is the [00] for? -NM]
			
 
				+  [It's so the payload is easy to parse out with string funcs -RD]
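
The payload format above can be sketched as a pair of helpers. The function names are hypothetical, and adding brackets around a bare IPv6 literal is a convenience of this sketch, not something the text requires of an encoder.

```python
def encode_begin_udp(address, port):
    """Encode ADDRESS ':' PORT followed by a NUL byte."""
    if not 0 < port < 65536:
        raise ValueError("port out of range")
    # IPv6 literals must be surrounded by square brackets.
    if ":" in address and not address.startswith("["):
        address = "[" + address + "]"
    return f"{address}:{port}".encode("ascii") + b"\x00"


def decode_begin_udp(payload):
    """Return (address, port) from a RELAY_BEGIN_UDP payload."""
    text, nul, _ = payload.partition(b"\x00")
    if not nul:
        raise ValueError("missing NUL terminator")
    # Split on the last colon so bracketed IPv6 addresses stay intact.
    host, colon, port = text.decode("ascii").rpartition(":")
    if not colon or not port.isdigit():
        raise ValueError("malformed ADDRESS:PORT")
    return host, int(port)
```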
			
 
				+
			
 
				+  Upon receiving this cell, the exit node resolves the address as necessary.
			
 
				+  If the address cannot be resolved, the exit node replies with a RELAY_END
			
 
				+  cell.  (See 5.4 below.)  Otherwise, the exit node replies with a
			
 
				+  RELAY_CONNECTED cell, whose payload is in one of the following formats:
			
 
				+      The IPv4 address to which the connection was made [4 octets]
			
 
				+      A number of seconds (TTL) for which the address may be cached [4 octets]
			
 
				+   or
			
 
				+      Four zero-valued octets [4 octets]
			
 
				+      An address type (6)     [1 octet]
			
 
				+      The IPv6 address to which the connection was made [16 octets]
			
 
				+      A number of seconds (TTL) for which the address may be cached [4 octets]
			
 
				+  [XXXX Versions of Tor before 0.1.1.6 ignore and do not generate the TTL
			
 
				+  field.  No version of Tor currently generates the IPv6 format.]
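
A parser for the two RELAY_CONNECTED formats could be sketched as follows. The function name is hypothetical, and the sketch assumes the caller has already trimmed the relay payload to the value of its 'Length' field, so the two formats are told apart by their exact sizes (8 and 25 octets).

```python
import ipaddress
import struct


def parse_connected(payload):
    """Return (address, ttl_seconds) from a RELAY_CONNECTED payload."""
    if len(payload) == 8:
        # IPv4 address [4 octets] followed by TTL [4 octets].
        addr = ipaddress.IPv4Address(payload[:4])
        (ttl,) = struct.unpack("!I", payload[4:8])
        return addr, ttl
    if len(payload) == 25 and payload[:4] == b"\x00" * 4 and payload[4] == 6:
        # Four zero octets, address type 6, IPv6 address [16], TTL [4].
        addr = ipaddress.IPv6Address(payload[5:21])
        (ttl,) = struct.unpack("!I", payload[21:25])
        return addr, ttl
    raise ValueError("unrecognized RELAY_CONNECTED payload")
```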
			
 
				+
			
 
				+  The OP waits for a RELAY_CONNECTED cell before sending any data.
			
 
				+  Once a connection has been established, the OP and exit node
			
 
				+  package UDP data in RELAY_DATA_UDP cells, and upon receiving such
			
 
				+  cells, echo their contents to the corresponding socket.
			
 
				+  RELAY_DATA_UDP cells sent to unrecognized streams are dropped.
			
 
				+
			
 
				+  RELAY_DROP_UDP cells are long-range dummies; upon receiving such
			
 
				+  a cell, the OR or OP must drop it.
			
 
				+
			
 
				+5.3. Closing streams
			
 
				+
			
 
				+  UDP tunnels are closed in a fashion corresponding to TCP connections.
			
 
				+
			
 
				+6. Flow Control
			
 
				+
			
 
				+  UDP streams are not subject to flow control.
			
 
				+
			
 
				+7.2. Router descriptor format.
			
 
				+
			
 
				+The items' formats are as follows:
			
 
				+   "router" nickname address ORPort SocksPort DirPort UDPPort
			
 
				+
			
 
				+      Indicates the beginning of a router descriptor.  "address" must be
			
 
				+      an IPv4 address in dotted-quad format. The last four numbers
			
 
				+      indicate the ports (TCP, except for UDPPort) at which this OR exposes
			
 
				+      functionality. ORPort is a port at which this OR accepts TLS
			
 
				+      connections for the main OR protocol; SocksPort is deprecated and
			
 
				+      should always be 0; DirPort is the port at which this OR accepts
			
 
				+      directory-related HTTP connections; and UDPPort is a port at which
			
 
				+      this OR accepts DTLS connections for UDP data.  If any port is not
			
 
				+      supported, the value 0 is given instead of a port number.
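
As an illustration, a "router" line in the extended format could be parsed as below. The `RouterLine` type and the function name are hypothetical, and the sample values in the usage note are invented for the example.

```python
from typing import NamedTuple


class RouterLine(NamedTuple):
    nickname: str
    address: str
    or_port: int
    socks_port: int
    dir_port: int
    udp_port: int    # 0 means the OR does not accept DTLS connections


def parse_router_line(line):
    """Parse: "router" nickname address ORPort SocksPort DirPort UDPPort."""
    parts = line.split()
    if len(parts) != 7 or parts[0] != "router":
        raise ValueError("not a router line with a UDPPort field")
    ports = [int(p) for p in parts[3:]]
    if any(not 0 <= p <= 65535 for p in ports):
        raise ValueError("port out of range")
    return RouterLine(parts[1], parts[2], *ports)
```

For instance, `parse_router_line("router example 192.0.2.10 9001 0 9030 9002")` yields a record whose `udp_port` is 9002.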