22 years ago · 27dd67e3a0
--- a/doc/tor-design.tex
+++ b/doc/tor-design.tex
@@ -40,6 +40,7 @@
 
				 %\fi
			
 
				 
			
 
				 \title{Tor: The Second-Generation Onion Router}
			
 
				+% Putting the 'Private' back in 'Virtual Private Network'
			
 
				 
			
 
				 %\author{Roger Dingledine \\ The Free Haven Project \\ arma@freehaven.net \and
			
 
				 %Nick Mathewson \\ The Free Haven Project \\ nickm@freehaven.net \and
			
@@ -52,14 +53,12 @@
 
				 We present Tor, a circuit-based low-latency anonymous communication
			
 
				 system. Tor is the successor to Onion Routing
			
 
				 and addresses many limitations in the original Onion Routing design.
			
 
				-Tor works in a real-world Internet environment,
			
 
				-% it's user-space too
			
 
				+Tor works in a real-world Internet environment, requires no special
			
 
				+privileges such as root- or kernel-level access,
			
 
				 requires little synchronization or coordination between nodes, and
			
 
				-provides a reasonable tradeoff between anonymity and usability/efficiency
			
 
				-%protects against known anonymity-breaking attacks as well
			
 
				-%as or better than other systems with similar design parameters.
			
 
				-% and we present a big list of open problems at the end
			
 
				-% and we present a new practical design for rendezvous points
			
 
				+provides a reasonable tradeoff between anonymity and usability/efficiency.
			
 
				+We include a new practical design for rendezvous points, as well
			
 
				+as a big list of open problems.
			
 
				 \end{abstract}
			
 
				 
			
 
				 %\begin{center}
			
@@ -205,7 +204,7 @@ unreliable nodes in the first place.
 
				 %We further provide a
			
 
				 %simple mechanism that allows connections to be established despite recent
			
 
				 %node failure or slightly dated information from a directory server. Tor
			
 
				-%permits onion routers to have \emph{router twins} --- nodes that share
			
 
				+%permits onion routers to have \emph{router twins}---nodes that share
			
 
				 %the same private decryption key. Note that because connections now have
			
 
				 %perfect forward secrecy, an onion router still cannot read the traffic
			
 
				 %on a connection established through its twin even while that connection
			
@@ -365,6 +364,30 @@ Cebolla \cite{cebolla}, and AnonNet \cite{anonnet} build the circuit
 
				 in stages, extending it one hop at a time. This approach makes perfect
			
 
				 forward secrecy feasible.
			
 
				 
			
 
				+Circuit-based anonymity designs must choose which protocol layer
			
 
				+to anonymize. They may choose to intercept IP packets directly, and
			
 
				+relay them whole (stripping the source address) as the contents of
			
 
				+the circuit \cite{tarzan:ccs02,freedom2-arch}.  Alternatively, like
			
 
				+Tor, they may accept TCP streams and relay the data in those streams
			
 
				+along the circuit, ignoring the breakdown of that data into TCP segments
			
 
				+\cite{anonnet,morphmix:fc04}. Finally, they may accept application-level
			
 
				+protocols (such as HTTP) and relay the application requests themselves
			
 
				+along the circuit.  
			
 
				+This protocol-layer decision represents a compromise between flexibility
			
 
				+and anonymity.  For example, a system that understands HTTP can strip
			
 
				+identifying information from those requests; can take advantage of caching
			
 
				+to limit the number of requests that leave the network; and can batch
			
 
				+or encode those requests in order to minimize the number of connections.
			
 
				+On the other hand, an IP-level anonymizer can handle nearly any protocol,
			
 
				+even ones unforeseen by its designers (though these systems require
			
 
				+kernel-level modifications to some operating systems, and so are more
			
 
				+complex and less portable). TCP-level anonymity networks like Tor present
			
 
				+a middle approach: they are fairly application neutral (so long as the
			
 
				+application supports, or can be tunneled across, TCP), but by treating
			
 
				+application connections as data streams rather than raw TCP packets,
			
 
				+they avoid the well-known inefficiencies of tunneling TCP over TCP
			
 
				+\cite{tcp-over-tcp-is-bad}. [XXX what's a better cite?]
			
 
				+
			
 
				 Distributed-trust anonymizing systems need to prevent attackers from
			
 
				 adding too many servers and thus compromising too many user paths.
			
 
				 Tor relies on a centrally maintained set of well-known servers. Tarzan
			
@@ -768,7 +791,7 @@ more complex.
 
				 Rather than doing integrity checking of the relay cells at each hop,
			
 
				 which would increase packet size
			
 
				 by a function of path length\footnote{This is also the argument against
			
 
				-using recent cipher modes like EAX \cite{eax} --- we don't want the added
			
 
				+using recent cipher modes like EAX \cite{eax}---we don't want the added
			
 
				 message-expansion overhead at each hop, and we don't want to leak the path
			
 
				 length (or pad to some max path length).}, we choose to
			
 
				 % accept passive timing attacks, 
			
@@ -904,8 +927,9 @@ see Section~\ref{sec:maintaining-anonymity} for more discussion.
 
				 Providing Tor as a public service provides many opportunities for an
			
 
				 attacker to mount denial-of-service attacks against the network.  While
			
 
				 flow control and rate limiting (discussed in
			
 
				-section~\ref{subsec:congestion}) prevents users from consuming more
			
 
				-bandwidth than nodes are willing to provide, opportunities remain for
			
 
				+Section~\ref{subsec:congestion}) prevent users from consuming more
			
 
				+bandwidth than routers are willing to provide, opportunities remain for
			
 
				+users to
			
 
				 consume more network resources than their fair share, or to render the
			
 
				 network unusable for other users.
			
 
				 
			
@@ -913,85 +937,44 @@ First of all, there are a number of CPU-consuming denial-of-service
 
				 attacks wherein an attacker can force an OR to perform expensive
			
 
				 cryptographic operations.  For example, an attacker who sends a
			
 
				 \emph{create} cell full of junk bytes can force an OR to perform an RSA
			
 
				-decrypt its half of the Diffie-Helman handshake.  Similarly, an attacker
			
 
				+decrypt.  Similarly, an attacker can
			
 
				 fake the start of a TLS handshake, forcing the OR to carry out its
			
 
				 (comparatively expensive) half of the handshake at no real computational
			
 
				 cost to the attacker.
			
 
				 
			
 
				-To address these attacks, several approaches exist.  First, ORs may
			
 
				+Several approaches exist to address these attacks. First, ORs may
			
 
				 demand proof-of-computation tokens \cite{hashcash} before beginning new
			
 
				 TLS handshakes or accepting \emph{create} cells.  So long as these
			
 
				 tokens are easy to verify and computationally expensive to produce, this
			
 
				 approach limits the DoS attack multiplier.  Additionally, ORs may limit
			
 
				 the rate at which they accept create cells and TLS connections, so that
			
 
				-the computational work of doing so does not drown out the (comparatively
			
 
				-inexpensive) work of symmetric cryptography needed to keep users'
			
 
				-packets flowing.  This rate limiting could, however, allows an attacker
			
 
				-to slow down other users as they build new circuits.
			
 
				+the computational work of processing them does not drown out the (comparatively
			
 
				+inexpensive) work of symmetric cryptography needed to keep cells
			
 
				+flowing.  This rate limiting could, however, allow an attacker
			
 
				+to slow down other users when they build new circuits.
			
 
				 
			
 
				 % What about link-to-link rate limiting?
			
 
				 
			
 
				-% This paragraph needs more references.
			
 
				 More worrisome are distributed denial of service attacks wherein an
			
 
				 attacker uses a large number of compromised hosts throughout the network
			
 
				 to consume the Tor network's resources.  Although these attacks are not
			
 
				 new to the networking literature, some proposed approaches are a poor
			
 
				 fit to anonymous networks.  For example, solutions based on backtracking
			
 
				-harmful traffic present a significant risk that an anonymity-breaking
			
 
				-adversary could exploit the backtracking mechanism to compromise users'
			
 
				-anonymity.  [XXX So, what should we say here? -NM]
			
 
				-
			
 
				-% Now would be a good point to talk about twins.   What the do, what
			
 
				-% they can't.
			
 
				+harmful traffic \cite{XXX} could allow an anonymity-breaking
			
 
				+adversary to exploit the backtracking mechanism.
			
 
				 
			
 
				 Attackers also have an opportunity to attack the Tor network by mounting
			
 
				-attacks on the hosts and network links running it. If an attacker can
			
 
				-successfully disrupt a single circuit or link along a virtual circuit,
			
 
				-all currently open streams passing along that part of the circuit
			
 
				-become unrecoverable, and are closed.  The current Tor design treats
			
 
				-such attacks as intermittent network failures, and depends on users and
			
 
				-applications to respond or recover as appropriate.  A possible future
			
 
				-design could use an end-to-end based TCP-like acknowledgment protocol,
			
 
				-so that no streams are lost unless the entry or exit point themselves
			
 
				-are disrupted.  This solution would require more buffering at exits,
			
 
				-however, and its network properties still need to be investigated. [XXX
			
 
				-  That sounds really evasive. We should say more.]
			
 
				-
			
 
				-%[XXX Mention that OR-to-OR connections should be highly reliable
			
 
				-%  (whatever that means).  If they aren't, everything can stall.]
			
 
				-
			
 
				-%=====================
			
 
				-% This stuff should go elsewhere.  Probably section 2.
			
 
				-
			
 
				-Channel-based anonymity designs must choose which protocol layer to
			
 
				-anonymize.  They may choose to intercept IP packets directly, and relay
			
 
				-them whole (stripping the source address) as the contents of the
			
 
				-circuit \cite{tarzan:ccs02,freedom2-arch}.  Alternatively,
			
 
				-they may
			
 
				-accept TCP streams and relay the data in those streams along the
			
 
				-circuit, ignoring the breakdown of that data into TCP frames. (Tor
			
 
				-takes this approach, as does Rennhard's anonymity network \cite{anonnet}
			
 
				-and MorphMix \cite{morphmix:fc04}.)  Finally, they may accept
			
 
				-application-level protocols (such as HTTP) and relay the application
			
 
				-requests themselves along the circuit.
			
 
				-
			
 
				-This protocol-layer decision represents a compromise between flexibility
			
 
				-and anonymity.  For example, a system that understands HTTP can strip
			
 
				-identifying information from those requests; can take advantage of
			
 
				-caching to limit the number of requests that leave the network; and can
			
 
				-batch or encode those requests in order to minimize the number of
			
 
				-connections.  On the other hand, an IP-level anonymizer can handle
			
 
				-nearly any protocol, even ones unforeseen by their designers.  TCP-level
			
 
				-anonymity networks like Tor present a middle approach: they are fairly
			
 
				-application neutral (so long as the application supports, or can be
			
 
				-tunneled across, TCP), but by treating application connections as data
			
 
				-streams rather than raw TCP packets, they avoid the well-known
			
 
				-inefficiencies of tunneling TCP over TCP \cite{tcp-over-tcp-is-bad}.
			
 
				-% Is there a better tcp-over-tcp-is-bad reference?
			
 
				-
			
 
				-%Also mention that weirdo IP trickery requires kernel patches to most
			
 
				-%operating systems? -NM
			
 
				-
			
 
				+attacks on its hosts and network links. Disrupting a single circuit or
			
 
				+link breaks all currently open streams passing along that part of the
			
 
				+circuit. Indeed, this same loss of service occurs when a router crashes
			
 
				+or its operator restarts it. The current Tor design treats such attacks
			
 
				+as intermittent network failures, and depends on users and applications
			
 
				+to respond or recover as appropriate. A future design could use an
			
 
				+end-to-end TCP-like acknowledgment protocol, so that no streams are
			
 
				+lost unless the entry or exit point itself is disrupted. This solution
			
 
				+would require more buffering at the network edges, however, and the
			
 
				+performance and anonymity implications of this extra complexity still
			
 
				+require investigation.
			
 
				 
			
 
				 \SubSection{Exit policies and abuse}
			
 
				 \label{subsec:exitpolicies}