22 gadi atpakaļ · 961f338386
--- a/doc/tor-design.tex
+++ b/doc/tor-design.tex
@@ -526,11 +526,12 @@ privileges.  Currently, each OR maintains a long-term TLS \cite{TLS}
 
				 connection to every other
			
 
				 OR.  (We examine some ways to relax this clique-topology assumption in
			
 
				 Section~\ref{subsec:restricted-routes}.) A subset of the ORs also act as
			
 
				-directory servers, tracking which routers are currently in the network;
			
 
				-see Section~\ref{subsec:dirservers} for directory server details. Users
			
 
				-run local software called an onion proxy (OP) to fetch directories,
			
 
				+directory servers, tracking which routers are in the network;
			
 
				+see Section~\ref{subsec:dirservers} for directory server details.
			
 
				+Each user
			
 
				+runs local software called an onion proxy (OP) to fetch directories,
			
 
				 establish paths (called \emph{virtual circuits}) across the network,
			
 
				-and handle connections from user applications. Onion proxies accept
			
 
				+and handle connections from user applications.  These onion proxies accept
			
 
				 TCP streams and multiplex them across the virtual circuit. The onion
			
 
				 router on the other side 
			
 
				 % I don't mean other side, I mean wherever it is on the circuit. But
			
@@ -547,8 +548,8 @@ the identity key of a router is considered equivalent to creating a
 
				 new router. The onion (decryption) key is used for decrypting requests
			
 
				 from users to set up a circuit and negotiate ephemeral keys. Finally,
			
 
				 link keys are used by the TLS protocol when communicating between
			
 
				-onion routers.  We discuss rotating these keys in
			
 
				-Section~\ref{subsec:rotating-keys}.
			
 
				+onion routers.  Both short-term keys are rotated periodically and
			
 
				+independantly, to limit the impact of compromised keys.
			
 
				 
			
 
				 Section~\ref{subsec:cells} discusses the structure of the fixed-size
			
 
				 \emph{cells} that are the unit of communication in Tor. We describe
			
@@ -561,34 +562,39 @@ fairness issues.
 
				 \SubSection{Cells}
			
 
				 \label{subsec:cells}
			
 
				 
			
 
				-% I think we should describe connections before cells. -NM
			
 
				-
			
 
				-Traffic passes from one OR to another, or between a user's OP and an OR,
			
 
				-in fixed-size cells. Each cell is 256 bytes (but see
			
 
				-Section~\ref{sec:conclusion}
			
 
				-for a discussion of allowing large cells and small cells on the same
			
 
				-network), and consists of a header and a payload. The header includes an
			
 
				-anonymous circuit identifier (ACI) that specifies which circuit the
			
 
				-% Should we replace ACI with circID ? What is this 'anonymous circuit'
			
 
				-% thing anyway? -RD
			
 
				-cell refers to
			
 
				-(many circuits can be multiplexed over the single TCP connection between
			
 
				-ORs or between an OP and an OR), and a command to describe what to do
			
 
				-with the cell's payload. Cells are either \emph{control} cells, which are
			
 
				-interpreted by the node that receives them, or \emph{relay} cells,
			
 
				-which carry end-to-end stream data. Controls cells can be one of:
			
 
				+ORs communicate with one another, and with users' OPs, via TLS
			
 
				+connections with ephemeral keys.  This prevents an attacker from
			
 
				+impersonating an OR, conceals the contents of the connection with
			
 
				+perfect forward secrecy, and prevents an attacker from modifying data
			
 
				+on the wire.
			
 
				+
			
 
				+Traffic passes along these connections in fixed-size cells.  Each cell
			
 
				+is 256 bytes (but see Section~\ref{sec:conclusion} for a discussion of
			
 
				+allowing large cells and small cells on the same network), and
			
 
				+consists of a header and a payload. The header includes a circuit
			
 
				+identifier (circID) that specifies which circuit the cell refers to
			
 
				+(many circuits are be multiplexed over the single TLS connection), and
			
 
				+a command to describe what to do with the cell's payload.  (Circuit
			
 
				+identifiers are connection-specific; a single circuit has a different
			
 
				+circID on each connection it uses.)
			
 
				+% XXX Say that each OR can have many circuits with same circID, so
			
 
				+% XXX long as they're on different connections, and that ORs know 
			
 
				+% XXX which circIDs/connection pairs are linked by a circuit.
			
 
				+Based on their command, cells are either \emph{control} cells, which are
			
 
				+always interpreted by the node that receives them, or \emph{relay} cells,
			
 
				+which carry end-to-end stream data.   The controls cells commands are:
			
 
				 \emph{padding} (currently used for keepalive, but also usable for link
			
 
				 padding); \emph{create} or \emph{created} (used to set up a new circuit);
			
 
				-or \emph{destroy} (to tear down a circuit).
			
 
				-% We need to say that ACIs are connection-specific: each circuit has
			
 
				-% a different ACI along each connection. -NM
			
 
				-% agreed -RD
			
 
				+and \emph{destroy} (to tear down a circuit).
			
 
				 
			
 
				 Relay cells have an additional header (the relay header) after the
			
 
				 cell header, containing the stream identifier (many streams can
			
 
				 be multiplexed over a circuit); an end-to-end checksum for integrity
			
 
				-checking; the length of the relay payload; and a relay command. Relay
			
 
				-commands can be one of: \emph{relay
			
 
				+checking; the length of the relay payload; and a relay command.  
			
 
				+% XXX Mention _here_ that relay headers are {en|de}crypted as they
			
 
				+% XXX progress along the circuit.
			
 
				+The
			
 
				+relay commands are: \emph{relay
			
 
				 data} (for data flowing down the stream), \emph{relay begin} (to open a
			
 
				 stream), \emph{relay end} (to close a stream cleanly), \emph{relay
			
 
				 teardown} (to close a broken stream), \emph{relay connected}
			
@@ -599,7 +605,7 @@ and to acknowledge), \emph{relay truncate} and \emph{relay truncated}
 
				 sendme} (used for congestion control), and \emph{relay drop} (used to
			
 
				 implement long-range dummies).
			
 
				 
			
 
				-We describe each of these cell types in more detail below.
			
 
				+We describe each of these cell types and commands in more detail below.
			
 
				 
			
 
				 \SubSection{Circuits and streams}
			
 
				 \label{subsec:circuits}
			
@@ -614,41 +620,60 @@ open many TCP streams.
 
				 
			
 
				 In Tor, each circuit can be shared by many TCP streams.  To avoid
			
 
				 delays, users construct circuits preemptively.  To limit linkability
			
 
				-among the streams, users rotate connections by building a new circuit
			
 
				+among their streams, users' OPs build a new circuit
			
 
				 periodically if the previous one has been used,
			
 
				-and expire old used circuits that are no longer in use. Tor considers
			
 
				-making a new circuit once a minute: thus
			
 
				+and expire old used circuits that no longer have any open streams.  
			
 
				+OPs consider making a new circuit once a minute: thus
			
 
				 even heavy users spend a negligible amount of time and CPU in
			
 
				 building circuits, but only a limited number of requests can be linked
			
 
				-to each other by a given exit node. Also, because circuits are built
			
 
				-in the background, failed routers do not affect user experience.
			
 
				+to each other through a given exit node. Also, because circuits are built
			
 
				+in the background, OPs can recover from failed circuit creation
			
 
				+without delaying streams and thereby harming user experience.
			
 
				 
			
 
				 \subsubsection{Constructing a circuit}
			
 
				 \label{subsubsec:constructing-a-circuit}
			
 
				 
			
 
				+%XXXX Discuss what happens with circIDs here.
			
 
				+
			
 
				 Users construct a circuit incrementally, negotiating a symmetric key with
			
 
				-each hop one at a time. To begin creating a new circuit, the user
			
 
				+each OR on the circuit, one hop at a time. To begin creating a new
			
 
				+circuit, the user
			
 
				 (call her Alice) sends a \emph{create} cell to the first node in her
			
 
				-chosen path. The cell's payload is the first half of the
			
 
				-Diffie-Hellman handshake, encrypted to the onion key of the OR (call
			
 
				+chosen path. This cell's payload contains the first half of the
			
 
				+Diffie-Hellman handshake ($g^x$), encrypted to the onion key of the OR (call
			
 
				 him Bob). Bob responds with a \emph{created} cell containing the second
			
 
				 half of the DH handshake, along with a hash of the negotiated key
			
 
				 $K=g^{xy}$.
			
 
				 
			
 
				-To extend a circuit past the first hop, Alice sends a \emph{relay extend}
			
 
				-cell to the last node in the circuit, specifying the address of the new
			
 
				-OR and an encrypted $g^x$ for it. That node copies the half-handshake
			
 
				-into a \emph{create} cell, and passes it to the new OR to extend the
			
 
				-circuit. When it responds with a \emph{created} cell, the penultimate OR
			
 
				-copies the payload into a \emph{relay extended} cell and passes it back.
			
 
				-% Nick: please fix my "that OR" pronouns -RD
			
 
				-
			
 
				-The onion-level handshake protocol achieves unilateral entity
			
 
				-authentication (Alice knows she's handshaking with Bob, Bob doesn't
			
 
				-care who is opening the circuit---Alice has no key and is trying to
			
 
				-remain anonymous) and unilateral key authentication (Alice and Bob
			
 
				-agree on a key, and Alice knows Bob is the only other person who should
			
 
				-know it). We also want perfect forward secrecy and key freshness.
			
 
				+Once the circuit has been established, Alice and Bob can send one
			
 
				+another relay cells encrypted with the negotiated
			
 
				+key.\footnote{Actually, the negotiated key is used to derive two
			
 
				+  symmetric keys: one for each direction.}  More detail is given in
			
 
				+the next section.
			
 
				+
			
 
				+To extend the circuit further, Alice sends a \emph{relay extend} cell
			
 
				+to Bob, specifying the address of the next OR (call her Carol), and
			
 
				+an encrypted $g^{x_2}$ for her.  Bob copies the half-handshake into a
			
 
				+\emph{create} cell, and passes it to Carol to extend the circuit.
			
 
				+When Carol responds with a \emph{created} cell, Bob wraps the payload
			
 
				+into a \emph{relay extended} cell and passes it back to Alice.  Now
			
 
				+the circuit is extended to Carol, and Alice and Carol share a common key
			
 
				+$K_2 = g^{x_2 y_2}$.
			
 
				+
			
 
				+In order to extend the circuit to a third node or beyond, Alice
			
 
				+proceeds as above, always telling the last node in the circuit to
			
 
				+extend one hop further.
			
 
				+% XXX Briefly mention path selection.
			
 
				+
			
 
				+This circuit-level handshake protocol achieves unilateral entity
			
 
				+authentication (Alice knows she's handshaking with Bob/Carol, but
			
 
				+Bob/Carol doesn't care who is opening the circuit---Alice has no key
			
 
				+and is trying to remain anonymous) and unilateral key authentication
			
 
				+(Alice and Bob/Carol agree on a key, and Alice knows Bob/Carol is the
			
 
				+only other person who should know it). It also achieves forward
			
 
				+secrecy and key freshness.  Formally, the protocol is as follows
			
 
				+(Where $E_{PK_{Bob}}(\cdot)$ is encryption with Bob's public key,
			
 
				+$H$ is a secure hash function, and $|$ is concatenation.)
			
 
				 
			
 
				 \begin{equation}
			
 
				 \begin{aligned}
			
@@ -657,20 +682,28 @@ know it). We also want perfect forward secrecy and key freshness.
 
				 \end{aligned}
			
 
				 \end{equation}
			
 
				 
			
 
				-The second step shows both that it was Bob
			
 
				-who received $g^x$, and that it was Bob who came up with $y$. We use
			
 
				-PK encryption in the first step (rather than, say, using the first two
			
 
				-steps of STS, which has a signature in the second step) because we
			
 
				-don't have enough room in a single cell for a public key and also a
			
 
				-signature. Preliminary analysis with the NRL protocol analyzer \cite{meadows96}
			
 
				-shows the above protocol to be secure (including providing PFS) under the
			
 
				-traditional Dolev-Yao model.
			
 
				+In the second step, Bob proves that it was he who who received $g^x$,
			
 
				+and who came up with $y$. We use PK encryption in the first step
			
 
				+(rather than, say, using the first two steps of STS, which has a
			
 
				+signature in the second step) because a single cell is too small to
			
 
				+hold both a public key and a signature. Preliminary analysis with the
			
 
				+NRL protocol analyzer \cite{meadows96} shows the above protocol to be
			
 
				+secure (including providing PFS) under the traditional Dolev-Yao
			
 
				+model.
			
 
				 
			
 
				 \subsubsection{Relay cells}
			
 
				-Once Alice has established the circuit (so she shares a key with each
			
 
				+Once Alice has established the circuit (so she shares keys with each
			
 
				 OR on the circuit), she can send relay cells.
			
 
				-The stream ID in the relay header indicates to which stream the cell belongs.
			
 
				-A relay cell can be addressed to any of the ORs on the circuit. To
			
 
				+% XXX Describe _here_ what happens with relay cells that are not 
			
 
				+% XXX targeted at a given node; how they're decrypted; how they're
			
 
				+% XXX encrypted.  The easiest expository order should probably be: What ORs
			
 
				+% XXX Do With Unrecognized Streams; What Alice Does To Build Relay
			
 
				+% XXX Cells; What ORs Do With Streams They Recognize.
			
 
				+Recall that every relay header has a stream ID in the relay header
			
 
				+that indicates to
			
 
				+which stream the cell belongs.
			
 
				+This stream ID allows a relay cell to be addressed to any of the ORs
			
 
				+on the circuit. To
			
 
				 construct a relay cell addressed to a given OR, Alice iteratively
			
 
				 encrypts the cell payload (that is, the relay header and payload)
			
 
				 with the symmetric key of each hop up to that OR. Then, at each hop
			
@@ -685,18 +718,22 @@ Alice may choose different exit points because of their exit policies,
 
				 or to keep the ORs from knowing that two streams
			
 
				 originate at the same person.
			
 
				 
			
 
				-To tear down a circuit, Alice sends a destroy control cell. Each OR
			
 
				-in the circuit receives the destroy cell, closes all open streams on
			
 
				-that circuit, and passes a new destroy cell forward. But since circuits
			
 
				+To tear down a whole circuit, Alice sends a \emph{destroy} control
			
 
				+cell. Each OR
			
 
				+in the circuit receives the \emph{destroy} cell, closes all open streams on
			
 
				+that circuit, and passes a new \emph{destroy} cell forward. But since circuits
			
 
				 can be built incrementally, they can also be torn down incrementally:
			
 
				 Alice can instead send a relay truncate cell to a node along the circuit. That
			
 
				-node will send a destroy cell forward, and reply with an acknowledgment
			
 
				-(relay truncated). Alice might truncate her circuit so she can extend it
			
 
				+node will send a \emph{destroy} cell forward, and reply with an acknowledgment
			
 
				+(a \emph{relay truncated} cell).  Alice might truncate her circuit so
			
 
				+she can extend it
			
 
				 to different nodes without signaling to the first few nodes (or somebody
			
 
				 observing them) that she is changing her circuit. That is, nodes in the
			
 
				-middle are not even aware that the circuit was truncated, because the
			
 
				-relay cells are encrypted. Similarly, if a node on the circuit goes down,
			
 
				-the adjacent node can send a relay truncated back to Alice. Thus the
			
 
				+middle of a truncated are not even aware when the circuit is
			
 
				+truncated, because they see only the encrypted relay cells.
			
 
				+Similarly, if a node on the circuit goes down,
			
 
				+the adjacent node can send a \emph{relay truncated} cell back to
			
 
				+Alice.  Thus the
			
 
				 ``break a node and see which circuits go down'' attack is weakened.
			
 
				 
			
 
				 \SubSection{Opening and closing streams}
			
@@ -882,6 +919,7 @@ Currently, non-data relay cells do not affect the windows. Thus we
 
				 avoid potential deadlock issues, e.g. because a stream can't send a
			
 
				 relay sendme cell because its packaging window is empty.
			
 
				 
			
 
				+% XXX Bad heading
			
 
				 \subsubsection{Needs more research}
			
 
				 
			
 
				 We don't need to reimplement full TCP windows (with sequence numbers,
			
@@ -1892,6 +1930,7 @@ issues remaining to be ironed out. In particular:
 
				   robustness/latency trade-offs, our performance trade-offs (including
			
 
				   cell size), our abuse-prevention mechanisms, and
			
 
				   our overall usability.
			
 
				+% XXX large and small cells on same network.
			
 
				 % XXX work with morphmix spec
			
 
				 \end{tightlist}
			
 
				 
			
@@ -1933,6 +1972,8 @@ issues remaining to be ironed out. In particular:
 
				 %     Hyphens are for multi-part words; en dashs imply movement or
			
 
				 %        opposition (The Alice--Bob connection); and em dashes are
			
 
				 %        for punctuation---like that.
			
 
				+%     A relay cell; a control cell; a \emph{create} cell; a
			
 
				+%     \emph{relay truncated} cell.  Never ``a \emph{relay truncated}.''
			
 
				 %
			
 
				 %     'Substitute ``Damn'' every time you're inclined to write ``very;'' your
			
 
				 %     editor will delete it and the writing will be just as it should be.'