22 years ago · 933d531f15
--- a/doc/tor-design.tex
+++ b/doc/tor-design.tex
@@ -81,7 +81,7 @@ build a \emph{circuit}, in which each node (or ``onion router'' or ``OR'')
 in the path knows its predecessor and successor, but no other nodes in
 the circuit.  Traffic flowing down the circuit is sent in fixed-size
 \emph{cells}, which are unwrapped by a symmetric key at each node
-(like the layers of an onion) and relayed downstream. The 
+(like the layers of an onion) and relayed downstream. The
 Onion Routing project published several design and analysis papers
 \cite{or-ih96,or-jsac98,or-discex00,or-pet00}. While a wide area Onion
 Routing network was deployed briefly, the only long-running and
@@ -144,7 +144,7 @@ streams along each circuit to improve efficiency and anonymity.
 
 \textbf{Leaky-pipe circuit topology:} Through in-band signaling
 within the circuit, Tor initiators can direct traffic to nodes partway
-down the circuit. This novel approach 
+down the circuit. This novel approach
 allows traffic to exit the circuit from the middle---possibly
 frustrating traffic shape and volume attacks based on observing the end
 of the circuit. (It also allows for long-range padding if
@@ -257,7 +257,7 @@ difficult for them to prevent an attacker who can eavesdrop both ends of the
 communication from correlating the timing and volume
 of traffic entering the anonymity network with traffic leaving it.  These
 protocols are also vulnerable against active attacks in which an
-adversary introduces timing patterns into traffic entering the network and 
+adversary introduces timing patterns into traffic entering the network and
 looks
 for correlated patterns among exiting traffic.
 Although some work has been done to frustrate
@@ -274,7 +274,7 @@ confirmation (cf.\ Section~\ref{subsec:threat-model}).
 The simplest low-latency designs are single-hop proxies such as the
 {\bf Anonymizer} \cite{anonymizer}, wherein a single trusted server strips the
 data's origin before relaying it.  These designs are easy to
-analyze, but users must trust the anonymizing proxy. 
+analyze, but users must trust the anonymizing proxy.
 Concentrating the traffic to a single point increases the anonymity set
 (the people a given user is hiding among), but it is vulnerable if the
 adversary can observe all traffic going into and out of the proxy.
@@ -294,7 +294,7 @@ The {\bf Java Anon Proxy} (also known as JAP or Web MIXes) uses fixed shared
 routes known as \emph{cascades}.  As with a single-hop proxy, this
 approach aggregates users into larger anonymity sets, but again an
 attacker only needs to observe both ends of the cascade to bridge all
-the system's traffic.  The Java Anon Proxy's design 
+the system's traffic.  The Java Anon Proxy's design
 calls for padding between end users and the head of the cascade
 \cite{web-mix}. However, it is not demonstrated whether the current
 implementation's padding policy improves anonymity.
@@ -340,7 +340,7 @@ Tor, they may accept TCP streams and relay the data in those streams
 along the circuit, ignoring the breakdown of that data into TCP segments
 \cite{morphmix:fc04,anonnet}. Finally, they may accept application-level
 protocols (such as HTTP) and relay the application requests themselves
-along the circuit.  
+along the circuit.
 Making this protocol-layer decision requires a compromise between flexibility
 and anonymity.  For example, a system that understands HTTP, such as Crowds,
 can strip
@@ -449,7 +449,7 @@ normalization} like Privoxy or the Anonymizer. If anonymization from
 the responder is desired for complex and variable
 protocols like HTTP, Tor must be layered with a filtering proxy such
 as Privoxy to hide differences between clients, and expunge protocol
-features that leak identity. 
+features that leak identity.
 Note that by this separation Tor can also provide services that
 are anonymous to the network yet authenticated to the responder, like
 SSH. Similarly, Tor does not currently integrate
@@ -473,7 +473,7 @@ compromise some fraction of the onion routers.
 In low-latency anonymity systems that use layered encryption, the
 adversary's typical goal is to observe both the initiator and the
 responder. By observing both ends, passive attackers can confirm a
-suspicion that Alice is 
+suspicion that Alice is
 talking to Bob if the timing and volume patterns of the traffic on the
 connection are distinct enough; active attackers can induce timing
 signatures on the traffic to force distinct patterns. Rather
@@ -509,7 +509,7 @@ each of these attacks.
 \Section{The Tor Design}
 \label{sec:design}
 
-The Tor network is an overlay network; each onion router (OR) 
+The Tor network is an overlay network; each onion router (OR)
 runs as a normal
 user-level process without any special privileges.
 Each onion router maintains a long-term TLS \cite{TLS}
@@ -524,7 +524,7 @@ runs local software called an onion proxy (OP) to fetch directories,
 establish circuits across the network,
 and handle connections from user applications.  These onion proxies accept
 TCP streams and multiplex them across the circuits. The onion
-router on the other side 
+router on the other side
 of the circuit connects to the destinations of
 the TCP streams and relays data.
 
@@ -578,8 +578,8 @@ and \emph{destroy} (to tear down a circuit).
 Relay cells have an additional header (the relay header) after the
 cell header, containing a stream identifier (many streams can
 be multiplexed over a circuit); an end-to-end checksum for integrity
-checking; the length of the relay payload; and a relay command.  
-The entire contents of the relay header and the relay cell payload 
+checking; the length of the relay payload; and a relay command.
+The entire contents of the relay header and the relay cell payload
 are encrypted or decrypted together as the relay cell moves along the
 circuit, using the 128-bit AES cipher in counter mode to generate a
 cipher stream.
@@ -622,7 +622,7 @@ without delaying streams and thereby harming user experience.\\
 A user's OP constructs circuits incrementally, negotiating a
 symmetric key with each OR on the circuit, one hop at a time. To begin
 creating a new circuit, the OP (call her Alice) sends a
-\emph{create} cell to the first node in her chosen path (call him Bob).  
+\emph{create} cell to the first node in her chosen path (call him Bob).
 (She chooses a new
 circID $C_{AB}$ not currently used on the connection from her to Bob.)
 The \emph{create} cell's
@@ -694,7 +694,7 @@ whether the decrypted streamID is recognized---either because it
 corresponds to an open stream at this OR for the given circuit, or because
 it is the control streamID (zero).  If the OR recognizes the
 streamID, it accepts the relay cell and processes it as described
-below.  Otherwise, 
+below.  Otherwise,
 the OR looks up the circID and OR for the
 next step in the circuit, replaces the circID as appropriate, and
 sends the decrypted relay cell to the next OR.  (If the OR at the end
@@ -713,19 +713,19 @@ encrypts the cell payload (that is, the relay header and payload) with
 the symmetric key of each hop up to that OR.  Because the streamID is
 encrypted to a different value at each step, only at the targeted OR
 will it have a meaningful value.\footnote{
-  % Should we just say that 2^56 is itself negligible?  
-  % Assuming 4-hop circuits with 10 streams per hop, there are 33 
+  % Should we just say that 2^56 is itself negligible?
+  % Assuming 4-hop circuits with 10 streams per hop, there are 33
   % possible bad streamIDs before the last circuit.  This still
   % gives an error only once every 2 million terabytes (approx).
 With 56 bits of streamID per cell, the probability of an accidental
 collision is far lower than the chance of hardware failure.}
 This \emph{leaky pipe} circuit topology
-allows Alice's streams to exit at different ORs on a single circuit.  
+allows Alice's streams to exit at different ORs on a single circuit.
 Alice may choose different exit points because of their exit policies,
 or to keep the ORs from knowing that two streams
 originate from the same person.
 
-When an OR later replies to Alice with a relay cell, it 
+When an OR later replies to Alice with a relay cell, it
 encrypts the cell's relay header and payload with the single key it
 shares with Alice, and sends the cell back toward Alice along the
 circuit.  Subsequent ORs add further layers of encryption as they
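The hunks around this point describe the leaky-pipe mechanism. Alice layers encryption with the key of every hop up to the OR she targets. Each OR removes one layer and accepts the cell only if the decrypted streamID is recognized. The following is a minimal sketch of that forward path, not Tor's implementation, and it rests on several stated assumptions. A SHA-256 counter keystream stands in for the 128-bit AES counter mode named in the text, because the Python standard library has no AES. The streamID is assumed to occupy the first 7 bytes (56 bits) of the cell. The names `Hop`, `onion_wrap`, `deliver`, the keys, and the stream numbers are all hypothetical, and the backward direction and real relay header layout are not modeled.

```python
import hashlib

STREAMID_LEN = 7  # assumption: 56-bit streamID at the start of the cell


def keystream(key: bytes, offset: int, length: int) -> bytes:
    """Stand-in for AES-CTR: SHA-256(key || block index) blocks, sliced from `offset`."""
    start, block = offset % 32, offset // 32
    out = bytearray()
    while len(out) < start + length:
        out += hashlib.sha256(key + block.to_bytes(8, "big")).digest()
        block += 1
    return bytes(out[start:start + length])


def xor(data: bytes, pad: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, pad))


class Hop:
    """One OR on Alice's circuit: shared key, keystream position, open streams."""

    def __init__(self, name: str, key: bytes, open_streams: set):
        self.name, self.key, self.pos = name, key, 0
        self.open_streams = set(open_streams)

    def peel(self, cell: bytes) -> bytes:
        # Remove this hop's layer and advance its position in the keystream.
        plain = xor(cell, keystream(self.key, self.pos, len(cell)))
        self.pos += len(cell)
        return plain

    def recognizes(self, cell: bytes) -> bool:
        # Meaningful only after every layer up to this hop has been removed.
        sid = int.from_bytes(cell[:STREAMID_LEN], "big")
        return sid == 0 or sid in self.open_streams


def onion_wrap(alice_state: list, target: int, cell: bytes) -> bytes:
    """Alice adds one layer per hop, from the targeted OR back to the first."""
    for i in range(target, -1, -1):
        key, pos = alice_state[i]
        cell = xor(cell, keystream(key, pos, len(cell)))
        alice_state[i] = (key, pos + len(cell))  # mirror that hop's position
    return cell


def deliver(hops: list, cell: bytes):
    """Each OR peels a layer; the first one that recognizes the streamID accepts."""
    for hop in hops:
        cell = hop.peel(cell)
        if hop.recognizes(cell):
            return hop.name, cell
    return None, cell  # no OR on the circuit recognized the cell


keys = [b"key-shared-with-B", b"key-shared-with-C", b"key-shared-with-D"]
hops = [Hop("B", keys[0], {5}), Hop("C", keys[1], {7}), Hop("D", keys[2], {9})]
alice = [(k, 0) for k in keys]

# Stream 7 exits from the middle of the circuit, stream 9 from its end.
cell1 = (7).to_bytes(STREAMID_LEN, "big") + b"GET / HTTP/1.0"
cell2 = (9).to_bytes(STREAMID_LEN, "big") + b"another stream"
exit1, plain1 = deliver(hops, onion_wrap(alice, 1, cell1))
exit2, plain2 = deliver(hops, onion_wrap(alice, 2, cell2))
print(exit1, exit2)
```

With XOR layering, the order in which layers are added does not matter. The sketch still wraps from the target outward to match the description, and it keeps Alice's per-hop positions in lockstep with each OR's.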
@@ -836,7 +836,7 @@ Thus, we check integrity only at the edges of each stream. When Alice
 negotiates a key with a new hop, they each initialize a SHA-1
 digest with a derivative of that key,
 thus beginning with randomness that only the two of them know. From
-then on they each incrementally add to the SHA-1 digest the contents of 
+then on they each incrementally add to the SHA-1 digest the contents of
 all relay cells they create, and include with each relay cell the
 first four bytes of the current digest.  Each also keeps a SHA-1
 digest of data received, to verify that the received hashes are correct.
@@ -851,7 +851,7 @@ of computing the digests is minimal compared to doing the AES
 encryption performed at each hop of the circuit. We use only four
 bytes per cell to minimize overhead; the chance that an adversary will
 correctly guess a valid hash
-%, plus the payload the current cell, 
+%, plus the payload the current cell,
 is
 acceptably low, given that Alice or Bob tear down the circuit if they
 receive a bad hash.
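The two hunks above describe an end-to-end integrity check built from a running SHA-1 digest. Each endpoint seeds the digest from the negotiated key and folds in every relay cell. It sends the first four bytes of the current digest with each cell and checks received cells against its own digest of incoming data. The Python sketch below carries its own assumptions. The seed derivation (`b"digest-seed" + key`) is hypothetical. Which bytes of a relay cell are hashed is not spelled out here, so the sketch hashes the payload only. For brevity, both directions are seeded from the same key. `RunningDigest` and `Endpoint` are illustrative names.

```python
import hashlib
import hmac

TAG_LEN = 4  # "the first four bytes of the current digest"


class RunningDigest:
    """SHA-1 over every relay cell sent in one direction, seeded from the key."""

    def __init__(self, shared_key: bytes):
        # Hypothetical derivation: hash the key with a label to get the seed.
        seed = hashlib.sha1(b"digest-seed" + shared_key).digest()
        self._h = hashlib.sha1(seed)

    def add(self, cell: bytes) -> bytes:
        """Fold one cell into the running digest; return its 4-byte tag."""
        self._h.update(cell)
        return self._h.digest()[:TAG_LEN]


class Endpoint:
    """Alice, or the OR at the other edge of the stream."""

    def __init__(self, shared_key: bytes):
        self.sent = RunningDigest(shared_key)
        self.received = RunningDigest(shared_key)

    def package(self, payload: bytes):
        return payload, self.sent.add(payload)

    def accept(self, payload: bytes, tag: bytes) -> bytes:
        expected = self.received.add(payload)
        if not hmac.compare_digest(expected, tag):
            raise ConnectionAbortedError("bad hash: tearing down circuit")
        return payload


key = b"key negotiated with the exit hop"  # hypothetical shared key
alice, exit_or = Endpoint(key), Endpoint(key)

accepted = [exit_or.accept(*alice.package(m)) for m in (b"cell one", b"cell two")]

payload, tag = alice.package(b"cell three")
try:
    exit_or.accept(b"cell thr33", tag)  # simulates a cell altered in transit
    rejected = False
except ConnectionAbortedError:
    rejected = True
print(accepted, rejected)
```

The digest runs over the whole history of the stream, not each cell in isolation. As a result, a dropped, reordered, or altered cell changes every later tag as well. A forger still has roughly a 1 in 2^32 chance of guessing a single four-byte tag, which is the tradeoff the text accepts.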
@@ -861,7 +861,7 @@ receive a bad hash.
 
 Volunteers are generally more willing to run services that can limit
 their own bandwidth usage. To accommodate them, Tor servers use a
-token bucket approach \cite{tannenbaum96} to 
+token bucket approach \cite{tannenbaum96} to
 enforce a long-term average rate of incoming bytes, while still
 permitting short-term bursts above the allowed bandwidth. Current bucket
 sizes are set to ten seconds' worth of traffic.
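The hunk above describes rate limiting with a token bucket. The bucket refills at the long-term average rate, and its capacity of ten seconds' worth of traffic bounds how large a burst can be. A minimal Python sketch follows. It assumes a caller-supplied clock (which keeps the example deterministic) and a bucket that starts full. The class and method names are hypothetical, and the text does not describe how Tor actually schedules refills.

```python
class TokenBucket:
    """Admit bytes at a long-term average `rate`, allowing bursts up to `capacity`."""

    def __init__(self, rate: float, burst_seconds: float = 10.0, now: float = 0.0):
        self.rate = rate                      # bytes per second
        self.capacity = rate * burst_seconds  # ten seconds' worth of traffic
        self.tokens = self.capacity           # assumption: start full
        self.last = now

    def _refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def try_consume(self, nbytes: int, now: float) -> bool:
        """Spend tokens and return True if `nbytes` may be read now, else False."""
        self._refill(now)
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


bucket = TokenBucket(rate=1000.0)  # hypothetical 1000 bytes/s limit
results = [
    bucket.try_consume(10_000, now=0.0),  # True: a 10 s burst fits a full bucket
    bucket.try_consume(1, now=0.0),       # False: the burst emptied the bucket
    bucket.try_consume(500, now=0.5),     # True: half a second refilled 500 bytes
]
print(results)  # [True, False, True]
```

Over any long interval, the bytes admitted cannot exceed `capacity + rate * elapsed`. The average rate therefore converges to `rate`, while short bursts up to the bucket size still pass.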
@@ -908,7 +908,7 @@ reimplement full TCP windows (with sequence numbers,
 the ability to drop cells when we're full and retransmit later, and so
 on),
 because TCP already guarantees in-order delivery of each
-cell. 
+cell.
 %But we need to investigate further the effects of the current
 %parameters on throughput and latency, while also keeping privacy in mind;
 %see Section~\ref{sec:maintaining-anonymity} for more discussion.
@@ -950,9 +950,9 @@ Currently, non-data relay cells do not affect the windows. Thus we
 avoid potential deadlock issues, for example, arising because a stream
 can't send a \emph{relay sendme} cell when its packaging window is empty.
 
-These arbitrarily chosen parameters 
+These arbitrarily chosen parameters
 %are probably not optimal; more
-%research remains to find which parameters 
+%research remains to find which parameters
 seem to give tolerable throughput and delay; more research remains.
 
 \Section{Other design decisions}
@@ -1042,7 +1042,7 @@ given host or network---an external adversary cannot eavesdrop traffic
 between the private exit and the final destination, and so is less sure of
 Alice's destination and activities. Most onion routers will function as
 \emph{restricted exits} that permit connections to the world at large,
-but prevent access to certain abuse-prone addresses and services. 
+but prevent access to certain abuse-prone addresses and services.
 Additionally, in some cases the OR can authenticate clients to
 prevent exit abuse without harming anonymity \cite{or-discex00}.
 
@@ -1134,7 +1134,7 @@ an adversary could take over the network by creating many servers
 server administrator before they are included. Mechanisms for automated
 node approval are an area of active research, and are discussed more
 in Section~\ref{sec:maintaining-anonymity}.
-  
+
 Of course, a variety of attacks remain. An adversary who controls
 a directory server can track clients by providing them different
 information---perhaps by listing only nodes under its control, or by
@@ -1214,7 +1214,7 @@ identity even in the presence of router failure. Bob's service must
 not be tied to a single OR, and Bob must be able to tie his service
 to new ORs. \textbf{Smear-resistant:}
 A social attacker who offers an illegal or disreputable location-hidden
-service should not be able to ``frame'' a rendezvous router by 
+service should not be able to ``frame'' a rendezvous router by
 making observers believe the router created that service.
 %slander-resistant? defamation-resistant?
 \textbf{Application-transparent:} Although we require users
@@ -1257,7 +1257,7 @@ application integration is described more fully below.
       rendezvous cookie that it will use to recognize Bob.
 \item Alice opens an anonymous stream to one of Bob's introduction
       points, and gives it a message (encrypted to Bob's public key)
-      which tells him 
+      which tells him
       about herself, her chosen RP and the rendezvous cookie, and the
       first half of a DH
       handshake. The introduction point sends the message to Bob.
@@ -1296,7 +1296,7 @@ service. During normal situations, Bob's service might simply be offered
 directly from mirrors, while Bob gives out tokens to high-priority users. If
 the mirrors are knocked down,
 %by distributed DoS attacks or even
-%physical attack, 
+%physical attack,
 those users can switch to accessing Bob's service via
 the Tor rendezvous system.
 
@@ -1369,7 +1369,7 @@ reveal traffic patterns (both sent and received). Profiling via user
 connection patterns requires further processing, because multiple
 application streams may be operating simultaneously or in series over
 a single circuit.
-  
+
 \emph{Observing user content.} While content at the user end is encrypted,
 connections to responders may not be (indeed, the responding website
 itself may be hostile). While filtering content is not a primary goal
@@ -1394,20 +1394,20 @@ by running the OP on the Tor node or behind a firewall. This approach
 requires an observer to separate traffic originating at the onion
 router from traffic passing through it: a global observer can do this,
 but it might be beyond a limited observer's capabilities.
-  
+
 \emph{End-to-end size correlation.} Simple packet counting
 will also be effective in confirming
 endpoints of a stream. However, even without padding, we have some
 limited protection: the leaky pipe topology means different numbers
 of packets may enter one end of a circuit than exit at the other.
-  
+
 \emph{Website fingerprinting.} All the effective passive
 attacks above are traffic confirmation attacks,
 which puts them outside our design goals. There is also
 a passive traffic analysis attack that is potentially effective.
 Rather than searching exit connections for timing and volume
 correlations, the adversary may build up a database of
-``fingerprints'' containing file sizes and access patterns for 
+``fingerprints'' containing file sizes and access patterns for
 targeted websites. He can later confirm a user's connection to a given
 site simply by consulting the database. This attack has
 been shown to be effective against SafeWeb \cite{hintz-pet02}.
@@ -1415,7 +1415,7 @@ It may be less effective against Tor, since
 streams are multiplexed within the same circuit, and
 fingerprinting will be limited to
 the granularity of cells (currently 256 bytes). Additional
-defenses could include 
+defenses could include
 larger cell sizes, padding schemes to group websites
 into large sets, and link
 padding or long-range dummies.\footnote{Note that this fingerprinting
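The hunk above argues that fingerprinting is limited to cell granularity: an observer counting cells sees object sizes rounded up to a whole number of cells. Here is a minimal Python sketch of that rounding. It assumes every cell carries a full 256 bytes of the object, ignores multiplexing, and uses a hypothetical name `observed_size`. Real cells spend part of their 256 bytes on headers, so actual boundaries fall elsewhere.

```python
CELL_BYTES = 256  # "the granularity of cells (currently 256 bytes)"


def observed_size(object_bytes: int) -> int:
    """Bytes a cell-counting observer sees, assuming full 256-byte data cells."""
    cells = -(-object_bytes // CELL_BYTES)  # ceiling division
    return cells * CELL_BYTES


for size in (1000, 1020, 1025):
    print(size, observed_size(size))
# 1000 1024
# 1020 1024
# 1025 1280
```

Two objects that differ by a few bytes can therefore look identical on the wire. Larger cells, one of the defenses listed in the hunk, widen these buckets further.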
@@ -1464,7 +1464,7 @@ connection.  There is also a danger that application
 protocols and associated programs can be induced to reveal information
 about the initiator. Tor depends on Privoxy and similar protocol cleaners
 to solve this latter problem.
-  
+
 \emph{Run an onion proxy.} It is expected that end users will
 nearly always run their own local onion proxy. However, in some
 settings, it may be necessary for the proxy to run
@@ -1478,7 +1478,7 @@ of the Tor network can increase the value of this traffic
 by attacking non-observed nodes to shut them down, reduce
 their reliability, or persuade users that they are not trustworthy.
 The best defense here is robustness.
-  
+
 \emph{Run a hostile OR.}  In addition to being a local observer,
 an isolated hostile node can create circuits through itself, or alter
 traffic patterns to affect traffic at other nodes. Nonetheless, a hostile
@@ -1488,8 +1488,8 @@ run multiple ORs, and can persuade the directory servers
 that those ORs are trustworthy and independent, then occasionally
 some user will choose one of those ORs for the start and another
 as the end of a circuit. If an adversary
-controls $m>1$ out of $N$ nodes, he should be able to correlate at most 
-$\left(\frac{m}{N}\right)^2$ of the traffic in this way---although an 
+controls $m>1$ out of $N$ nodes, he should be able to correlate at most
+$\left(\frac{m}{N}\right)^2$ of the traffic in this way---although an
 adversary
 could possibly attract a disproportionately large amount of traffic
 by running an OR with an unusually permissive exit policy, or by
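The (m/N)^2 bound in the hunk above follows from a simple model. Assume Alice picks the first and last ORs uniformly at random from N ORs, m of them hostile. When the two must be different nodes, the chance that both are hostile is (m/N) times (m-1)/(N-1). Because (m-1)/(N-1) <= m/N whenever m <= N, this never exceeds (m/N)^2, which is why the text says "at most". The sketch below computes both quantities exactly. `both_ends_hostile` is a hypothetical name, and the uniform-selection assumption is exactly what the hunk's caveat about attracting extra traffic breaks.

```python
from fractions import Fraction


def both_ends_hostile(m: int, N: int, distinct: bool = True) -> Fraction:
    """Chance that a circuit's first and last ORs are both among m hostile ORs,
    assuming each is picked uniformly at random from N ORs."""
    if distinct:
        # Entry and exit must differ: draw without replacement.
        return Fraction(m, N) * Fraction(m - 1, N - 1)
    # Independent draws: the (m/N)^2 figure from the text.
    return Fraction(m, N) ** 2


print(both_ends_hostile(10, 100))                  # 1/110
print(both_ends_hostile(10, 100, distinct=False))  # 1/100
```

For 10 hostile ORs out of 100, the adversary sees both ends of about 0.9% of circuits, just under the 1% given by (m/N)^2.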
@@ -1497,7 +1497,7 @@ degrading the reliability of other routers.
 
 \emph{Introduce timing into messages.} This is simply a stronger
 version of passive timing attacks already discussed earlier.
-  
+
 \emph{Tagging attacks.} A hostile node could ``tag'' a
 cell by altering it. If the
 stream were, for example, an unencrypted request to a Web site,
@@ -1506,14 +1506,14 @@ the association. However, integrity checks on cells prevent
 this attack.
 
 \emph{Replace contents of unauthenticated protocols.}  When
-relaying an unauthenticated protocol like HTTP, a hostile exit node 
+relaying an unauthenticated protocol like HTTP, a hostile exit node
 can impersonate the target server. Clients
 should prefer protocols with end-to-end authentication.
 
 \emph{Replay attacks.} Some anonymity protocols are vulnerable
 to replay attacks.  Tor is not; replaying one side of a handshake
 will result in a different negotiated session key, and so the rest
-of the recorded session can't be used.  
+of the recorded session can't be used.
 
 \emph{Smear attacks.} An attacker could use the Tor network for
 socially disapproved acts, to bring the
@@ -1558,7 +1558,7 @@ ORs in the final directory as he wishes. We must ensure that directory
 server operators are independent and attack-resistant.
 
 \emph{Encourage directory server dissent.}  The directory
-agreement protocol assumes that directory server operators agree on 
+agreement protocol assumes that directory server operators agree on
 the set of directory servers.  An adversary who can persuade some
 of the directory server operators to distrust one another could
 split the quorum into mutually hostile camps, thus partitioning
@@ -1567,7 +1567,7 @@ this attack.
 
 \emph{Trick the directory servers into listing a hostile OR.}
 Our threat model explicitly assumes directory server operators will
-be able to filter out most hostile ORs. 
+be able to filter out most hostile ORs.
 % If this is not true, an
 % attacker can flood the directory with compromised servers.
 
@@ -1579,7 +1579,7 @@ accepting TLS connections from ORs but ignoring all cells. Directory
 servers must actively test ORs by building circuits and streams as
 appropriate.  The tradeoffs of a similar approach are discussed in
 \cite{mix-acc}.\\
-  
+
 \noindent{\large\bf Attacks against rendezvous points}\\
 \emph{Make many introduction requests.}  An attacker could
 try to deny Bob service by flooding his introduction points with
@@ -1587,7 +1587,7 @@ requests.  Because the introduction points can block requests that
 lack authorization tokens, however, Bob can restrict the volume of
 requests he receives, or require a certain amount of computation for
 every request he receives.
-  
+
 \emph{Attack an introduction point.} An attacker could
 disrupt a location-hidden service by disabling its introduction
 points.  But because a service's identity is attached to its public
@@ -1612,7 +1612,7 @@ with a session key shared by Alice and Bob.
 
 \Section{Open Questions in Low-latency Anonymity}
 \label{sec:maintaining-anonymity}
- 
+
 In addition to the non-goals in
 Section~\ref{subsec:non-goals}, many other questions must be solved
 before we can be confident of Tor's security.
@@ -1645,7 +1645,7 @@ three nodes unrelated to herself and her destination.
 %
 %Thus normally she chooses
 %three nodes, but if she is running an OR and her destination is on an OR,
-%she uses five. 
+%she uses five.
 Should Alice choose a nondeterministic path length (say,
 increasing it from a geometric distribution) to foil an attacker who
 uses timing to learn that he is the fifth hop and thus concludes that
@@ -1684,7 +1684,7 @@ immediately beneficial because of real-world adversaries that can't
 observe Alice's router, but can run routers of their own?
 
 To scale to many users, and to prevent an attacker from observing the
-whole network at once, it may be necessary 
+whole network at once, it may be necessary
 to support far more servers than Tor currently anticipates.
 This introduces several issues.  First, if approval by a centralized set
 of directory servers is no longer feasible, what mechanism should be used
@@ -1724,7 +1724,7 @@ Tor brings together many innovations into a unified deployable system. The
 next immediate steps include:
 
 \emph{Scalability:} Tor's emphasis on deployability and design simplicity
-has led us to adopt a clique topology, semi-centralized 
+has led us to adopt a clique topology, semi-centralized
 directories, and a full-network-visibility model for client
 knowledge. These properties will not scale past a few hundred servers.
 Section~\ref{sec:maintaining-anonymity} describes some promising
@@ -1831,7 +1831,7 @@ our overall usability.
 %     'Cypherpunk', 'Cypherpunks', 'Cypherpunk remailer'
 %     'Onion Routing design', 'onion router' [note capitalization]
 %     'SOCKS'
-%     Try not to use \cite as a noun.  
+%     Try not to use \cite as a noun.
 %     'Authorizating' sounds great, but it isn't a word.
 %     'First, second, third', not 'Firstly, secondly, thirdly'.
 %     'circuit', not 'channel'
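In every `-`/`+` pair in this commit, the two lines differ only in trailing spaces. Lines that contained only whitespace become empty lines. The commit does not say what tool produced it, so the following is only a minimal Python sketch of the same normalization. It assumes UTF-8 text with LF line endings and strips only spaces and tabs. The script name `strip_ws.py` used below is hypothetical.

```python
import sys


def strip_trailing_whitespace(text: str) -> str:
    """Remove spaces and tabs at the end of every line, keeping the line breaks."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


def main(paths: list) -> None:
    for path in paths:
        # newline="" disables newline translation, so line endings are kept as-is.
        with open(path, encoding="utf-8", newline="") as f:
            original = f.read()
        cleaned = strip_trailing_whitespace(original)
        if cleaned != original:
            with open(path, "w", encoding="utf-8", newline="") as f:
                f.write(cleaned)
            print(f"cleaned {path}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Running `python strip_ws.py doc/tor-design.tex` would rewrite the file only if some line ended in a space or tab. Applied to the pre-commit file, it should produce changes of the kind shown in the hunks above.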