21 years ago · bcb084d3ba
--- a/doc/design-paper/challenges.tex
+++ b/doc/design-paper/challenges.tex
@@ -48,7 +48,7 @@ Anonymous communication is full of surprises.  This paper discusses some
 
				 unexpected challenges arising from our experiences deploying Tor, a
			
 
				 low-latency general-purpose anonymous communication system.  We will discuss
			
 
				 some of the difficulties we have experienced and how we have met them (or how
			
 
				-we plan to meet them, if we know).  We will also discuss some less
			
 
				+we plan to meet them, if we know).  We also discuss some less
			
 
				 troublesome open problems that we must nevertheless eventually address.
			
 
				 %We will describe both those future challenges that we intend to explore and
			
 
				 %those that we have decided not to explore and why.
			
@@ -56,15 +56,15 @@ troublesome open problems that we must nevertheless eventually address.
 
				 Tor is an overlay network for anonymizing TCP streams over the
			
 
				 Internet~\cite{tor-design}.  It addresses limitations in earlier Onion
			
 
				 Routing designs~\cite{or-ih96,or-jsac98,or-discex00,or-pet00} by adding
			
 
				-perfect forward secrecy, congestion control, directory servers, integrity
			
 
				-checking, configurable exit policies, and location-hidden services using
			
 
				+perfect forward secrecy, congestion control, directory servers, data
			
 
				+integrity, configurable exit policies, and location-hidden services using
			
 
				 rendezvous points.  Tor works on the real-world Internet, requires no special
			
 
				 privileges or kernel modifications, requires little synchronization or
			
 
				 coordination between nodes, and provides a reasonable tradeoff between
			
 
				 anonymity, usability, and efficiency.
			
 
				 
			
 
				-We first publicly deployed a Tor network in October 2003; since then it has
			
 
				-grown to over a hundred volunteer Tor nodes
			
 
				+We first deployed a public Tor network in October 2003; since then it has
			
 
				+grown to over a hundred volunteer-operated nodes
			
 
				 and as much as 80 megabits of
			
 
				 average traffic per second.  Tor's research strategy has focused on deploying
			
 
				 a network to as many users as possible; thus, we have resisted designs that
			
@@ -72,21 +72,19 @@ would compromise deployability by imposing high resource demands on node
 
				 operators, and designs that would compromise usability by imposing
			
 
				 unacceptable restrictions on which applications we support.  Although this
			
 
				 strategy has
			
 
				-its drawbacks (including a weakened threat model, as discussed below), it has
			
 
				+drawbacks (including a weakened threat model, as discussed below), it has
			
 
				 made it possible for Tor to serve many thousands of users and attract
			
 
				 funding from diverse sources whose goals range from security on a
			
 
				-national scale down to the liberties of each individual.
			
 
				+national scale down to individual liberties.
			
 
				 
			
 
				-While~\cite{tor-design} gives an overall view of Tor's
			
 
				-design and goals, this paper describes policy, social, and technical
			
 
				+In~\cite{tor-design} we gave an overall view of Tor's
			
 
				+design and goals.  Here we describe some policy, social, and technical
			
 
				 issues that we face as we continue deployment.
			
 
				-Rather than trying to provide complete solutions to every problem here, we
			
 
				-lay out the assumptions and constraints that we have observed while
			
 
				-deploying Tor in the wild.  In doing so, we aim to create a research agenda
			
 
				-for others to help in addressing these issues.  We believe that the issues
			
 
				-described here will be of general interest to any and all
			
 
				-projects attempting to build
			
 
				-and deploy practical, useable anonymity networks in the wild.
			
 
				+Rather than providing complete solutions to every problem, we
			
 
				+instead lay out the challenges and constraints that we have observed while
			
 
				+deploying Tor in the wild.  In doing so, we aim to provide a research agenda
			
 
				+of general interest to projects attempting to build
			
 
				+and deploy practical, usable anonymity networks in the wild.
			
 
				 
			
 
				 %While the Tor design paper~\cite{tor-design} gives an overall view its
			
 
				 %design and goals,
			
@@ -122,46 +120,48 @@ compare Tor to other low-latency anonymity designs.
 
				 Tor provides \emph{forward privacy}, so that users can connect to
			
 
				 Internet sites without revealing their logical or physical locations
			
 
				 to those sites or to observers.  It also provides \emph{location-hidden
			
 
				-services}, so that critical servers can support authorized users without
			
 
				-giving adversaries an effective vector for physical or online attacks.
			
 
				-The design provides these protections even when a portion of its own
			
 
				-infrastructure is controlled by an adversary.
			
 
				-
			
 
				-To create a private network pathway with Tor, the client software
			
 
				-incrementally builds a \emph{circuit} of encrypted connections through
			
 
				-Tor nodes on the network. The circuit is extended one hop at a time, and
			
 
				-each node along the way knows only which node gave it data and which
			
 
				-node it is giving data to. No individual Tor node ever knows the complete
			
 
				-path that a data packet has taken. The client negotiates a separate set
			
 
				-of encryption keys for each hop along the circuit. % to ensure that each
			
 
				-%hop can't trace these connections as they pass through.
			
 
				-Because each node sees no more than one hop in the
			
 
				-circuit, neither an eavesdropper nor a compromised node can use traffic
			
 
				-analysis to link the connection's source and destination.
			
 
				-For efficiency, the Tor software uses the same circuit for all the TCP
			
 
				-connections that happen within the same short period.
			
 
				-Later requests use a new
			
 
				+services}, so that servers can support authorized users without
			
 
				+giving an effective vector for physical or online attackers.
			
 
				+Tor provides these protections even when a portion of its
			
 
				+infrastructure is compromised.
			
 
				+
			
 
				+To connect to a remote server via Tor, the client software learns a signed
			
 
				+list of Tor nodes from one of several central \emph{directory servers}, and
			
 
				+incrementally creates a private pathway or \emph{circuit} of encrypted
			
 
				+connections through authenticated Tor nodes on the network, negotiating a
			
 
				+separate set of encryption keys for each hop along the circuit.  The circuit
			
 
				+is extended one node at a time, and each node along the way knows only the
			
 
				+immediately previous and following nodes in the circuit, so no individual Tor
			
 
				+node knows the complete path that each fixed-sized data packet (or
			
 
				+\emph{cell}) will take.
			
 
				+%Because each node sees no more than one hop in the
			
 
				+%circuit,
			
 
				+Thus, neither an eavesdropper nor a compromised node can
			
 
				+see both the connection's source and destination.  Later requests use a new
			
 
				 circuit, to complicate long-term linkability between different actions by
			
 
				 a single user.
			
 
				 
			
 
				-Tor also makes it possible for users to hide their locations while
			
 
				-offering various kinds of services, such as web publishing or an instant
			
 
				-messaging server. Using ``rendezvous points'', other Tor users can
			
 
				-connect to these hidden services, each without knowing the other's network
			
 
				-identity.
			
 
				+Tor also helps servers hide their locations while
			
 
				+providing services such as web publishing or instant
			
 
				+messaging.  Using ``rendezvous points'', other Tor users can
			
 
				+connect to these authenticated hidden services, neither one learning the
			
 
				+other's network identity.
			
 
				 
			
 
				 Tor attempts to anonymize the transport layer, not the application layer.
			
 
				-This is useful for applications such as ssh
			
 
				+This approach is useful for applications such as SSH
			
 
				 where authenticated communication is desired. However, when anonymity from
			
 
				 those with whom we communicate is desired,
			
 
				 application protocols that include personally identifying information need
			
 
				 additional application-level scrubbing proxies, such as
			
 
				-Privoxy~\cite{privoxy} for HTTP\@.  Furthermore, Tor does not permit arbitrary
			
 
				-IP packets; it only anonymizes TCP streams and DNS request, and only supports
			
 
				-connections via SOCKS (see Section~\ref{subsec:tcp-vs-ip}).
			
 
				-
			
 
				-Most node operators do not want to allow arbitary TCP connections to leave
			
 
				-their server.  To address this, Tor provides \emph{exit policies} so that
			
 
				+Privoxy~\cite{privoxy} for HTTP\@.  Furthermore, Tor does not relay arbitrary
			
 
				+IP packets; it only anonymizes TCP streams and DNS requests
			
 
				+%, and only supports
			
 
				+%connections via SOCKS
			
 
				+(but see Section~\ref{subsec:tcp-vs-ip}).
			
 
				+
			
 
				+Most node operators do not want to allow arbitrary TCP traffic.% to leave
			
 
				+%their server.
			
 
				+To address this, Tor provides \emph{exit policies} so
			
 
				 each exit node can block the IP addresses and ports it is unwilling to allow.
			
 
				 Tor nodes advertise their exit policies to the directory servers, so that
			
 
				 clients can tell which nodes will support their connections.
			
@@ -169,18 +169,20 @@ client can tell which nodes will support their connections.
 
				 As of January 2005, the Tor network has grown to around a hundred nodes
			
 
				 on four continents, with a total capacity exceeding 1Gbit/s. Appendix A
			
 
				 shows a graph of the number of working nodes over time, as well as a
			
 
				-graph of the number of bytes being handled by the network over time. At
			
 
				-this point the network is sufficiently diverse for further development
			
 
				-and testing; but of course we always encourage and welcome new nodes
			
 
				-to join the network.
			
 
				+graph of the number of bytes being handled by the network over time.
			
 
				+The network is now sufficiently diverse for further development
			
 
				+and testing; but of course we always encourage new nodes
			
 
				+to join.
			
 
				 
			
 
				 Tor research and development has been funded by ONR and DARPA
			
 
				 for use in securing government
			
 
				 communications, and by the Electronic Frontier Foundation, for use
			
 
				 in maintaining civil liberties for ordinary citizens online. The Tor
			
 
				 protocol is one of the leading choices
			
 
				-to be the anonymizing layer in the European Union's PRIME directive to
			
 
				-help maintain privacy in Europe. The University of Dresden in Germany
			
 
				+for the anonymizing layer in the European Union's PRIME directive to
			
 
				+help maintain privacy in Europe.
			
 
				+% XXXX We should credit the specific group, not the whole university.
			
 
				+The University of Dresden in Germany
			
 
				 has integrated an independent implementation of the Tor protocol into
			
 
				 their popular Java Anon Proxy anonymizing client.
			
 
				 % This wide variety of
			
@@ -192,16 +194,16 @@ their popular Java Anon Proxy anonymizing client.
 
				 {\bf Threat models and design philosophy.}
			
 
				 The ideal Tor network would be practical, useful, and anonymous. When
			
 
				 trade-offs arise between these properties, Tor's research strategy has been
			
 
				-to insist on remaining useful enough to attract many users,
			
 
				+to remain useful enough to attract many users,
			
 
				 and practical enough to support them.  Only subject to these
			
 
				-constraints do we aim to maximize
			
 
				+constraints do we try to maximize
			
 
				 anonymity.\footnote{This is not the only possible
			
 
				 direction in anonymity research: designs exist that provide more anonymity
			
 
				 than Tor at the expense of significantly increased resource requirements, or
			
 
				 decreased flexibility in application support (typically because of increased
			
 
				 latency).  Such research does not typically abandon aspirations towards
			
 
				 deployability or utility, but instead tries to maximize deployability and
			
 
				-utility subject to a certain degree of inherent anonymity (inherent because
			
 
				+utility subject to a certain degree of structural anonymity (structural because
			
 
				 usability and practicality affect usage which affects the actual anonymity
			
 
				 provided by the network \cite{econymics,back01}).}
			
 
				 %{We believe that these
			
@@ -210,38 +212,63 @@ provided by the network \cite{econymics,back01}).}
 
				 %of what makes a system ``practical'' for volunteer operators and ``useful''
			
 
				 %for home users, and helps illuminate undernoticed issues which any deployed
			
 
				 %volunteer anonymity network will need to address.}
			
 
				-Because of this strategy, Tor has a weaker threat model than many anonymity
			
 
				-designs in the literature.   In particular, because we
			
 
				+Because of our strategy, Tor has a weaker threat model than many designs in
			
 
				+the literature.  In particular, because we
			
 
				 support interactive communications without impractically expensive padding,
			
 
				 we fall prey to a variety
			
 
				 of intra-network~\cite{back01,attack-tor-oak05,flow-correlation04} and
			
 
				 end-to-end~\cite{danezis-pet2004,SS03} anonymity-breaking attacks.
			
 
				 
			
 
				-
			
 
				 Tor does not attempt to defend against a global observer.  In general, an
			
 
				 attacker who can observe both ends of a connection through the Tor network
			
 
				 can correlate the timing and volume of data on that connection as it enters
			
 
				-and leaves the network, and so link a user to her chosen communication
			
 
				-parties.  Known solutions to this attack would seem to require introducing a
			
 
				+and leaves the network, and so link communication partners.
			
 
				+Known solutions to this attack would seem to require introducing a
			
 
				 prohibitive degree of traffic padding between the user and the network, or
			
 
				 introducing an unacceptable degree of latency (but see Section
			
 
				 \ref{subsec:mid-latency}).  Also, it is not clear that these methods would
			
 
				-work at all against even a minimally active adversary that can introduce timing
			
 
				+work at all against even a minimally active adversary who could introduce timing
			
 
				 patterns or additional traffic.  Thus, Tor only attempts to defend against
			
 
				-external observers who cannot observe both sides of a user's connection.
			
 
				-
			
 
				-The distinction between traffic correlation and traffic analysis is
			
 
				-not as cut and dried as we might wish. In \cite{hintz-pet02} it was
			
 
				-shown that if data volumes of various popular
			
 
				-responder destinations are catalogued, it may not be necessary to
			
 
				-observe both ends of a stream to learn a source-destination link.
			
 
				-This should be fairly effective without simultaneously observing both
			
 
				-ends of the connection. However, it is still essentially confirming
			
 
				-suspected communicants where the responder suspects are ``stored'' rather
			
 
				-than observed at the same time as the client.
			
 
				+external observers who cannot observe both sides of a user's connections.
			
 
				+
			
 
				+
			
 
				+Against internal attackers who sign up Tor nodes, the situation is more
			
 
				+complicated.  In the simplest case, if an adversary has compromised $c$ of
			
 
				+$n$ nodes on the Tor network, then the adversary will be able to compromise
			
 
				+a random circuit with probability $\frac{c^2}{n^2}$ (since the circuit
			
 
				+initiator chooses hops randomly).  But there are
			
 
				+complicating factors:
			
 
				+(1)~If the user continues to build random circuits over time, an adversary
			
 
				+  is pretty certain to see a statistical sample of the user's traffic, and
			
 
				+  thereby can build an increasingly accurate profile of her behavior.  (See
			
 
				+  Section~\ref{subsec:helper-nodes} for possible solutions.)
			
 
				+(2)~An adversary who controls a popular service outside the Tor network
			
 
				+  can be certain to observe all connections to that service; he
			
 
				+  can therefore trace connections to that service with probability
			
 
				+  $\frac{c}{n}$.
			
 
				+(3)~Users do not in fact choose nodes with uniform probability; they
			
 
				+  favor nodes with high bandwidth or uptime, and exit nodes that
			
 
				+  permit connections to their favorite services.
			
 
				+See Section~\ref{subsec:routing-zones} for discussion of larger
			
 
				+adversaries and our dispersal goals.
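As a rough worked illustration of these figures (the numbers are hypothetical,
chosen only for arithmetic convenience, and assume that entry and exit nodes
are picked independently and uniformly): with $n = 100$ nodes of which
$c = 10$ are compromised,
\[
  \frac{c^2}{n^2} = \frac{10^2}{100^2} = 0.01,
  \qquad
  \frac{c}{n} = \frac{10}{100} = 0.1 .
\]
Under the same assumptions, and treating successive circuits as independent,
the chance that at least one of $k$ circuits is compromised is
\[
  1 - \left(1 - \frac{c^2}{n^2}\right)^{k},
\]
which for $k = 100$ is $1 - 0.99^{100} \approx 0.63$.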
			
 
				+
			
 
				+% I'm trying to make this paragraph work without reference to the
			
 
				+% analysis/confirmation distinction, which we haven't actually introduced
			
 
				+% yet, and which we realize isn't very stable anyway.  Also, I don't want to
			
 
				+% deprecate these attacks if we can't demonstrate that they don't work, since
			
 
				+% in case they *do* turn out to work well against Tor, we'll look pretty
			
 
				+% foolish. -NM
			
 
				+More powerful attacks may exist. In \cite{hintz-pet02} it was
			
 
				+shown that an attacker who can catalog data volumes of popular
			
 
				+responder destinations (say, websites with consistent data volumes) may not
			
 
				+need to
			
 
				+observe both ends of a stream to learn source-destination links for those
			
 
				+responders.
			
 
				+%However, it is still essentially confirming
			
 
				+%suspected communicants where the responder suspects are ``stored'' rather
			
 
				+%than observed at the same time as the client.
			
 
				 Similarly latencies of going through various routes can be
			
 
				-catalogued~\cite{back01} to connect endpoints.
			
 
				-This is likely to entail high variability and massive storage since
			
 
				+cataloged~\cite{back01} to connect endpoints.
			
 
				 % XXX hintz-pet02 just looked at data volumes of the sites. this
			
 
				 % doesn't require much variability or storage. I think it works
			
 
				 % quite well actually. Also, \cite{kesdogan:pet2002} takes the
			
@@ -251,52 +278,26 @@ This is likely to entail high variability and massive storage since
 
				 % I was trying to be terse and simultaneously referring to both the
			
 
				 % Hintz stuff and the Back et al. stuff from Info Hiding 01. I've
			
 
				 % separated the two and added the references. -PFS
			
 
				-routes through the network to each site will be random even if they
			
 
				-have relatively unique latency characteristics. So this does not seem
			
 
				-an immediate practical threat. Further along similar lines, the same
			
 
				+It has not yet been shown whether these attacks will succeed or fail
			
 
				+in the presence of the variability and volume quantization introduced by the
			
 
				+Tor network, but it seems likely that these factors will at best delay
			
 
				+rather than halt the attacks in the cases where they succeed.
			
 
				+%likely to entail high variability and massive storage since
			
 
				+%routes through the network to each site will be random even if they
			
 
				+%have relatively unique latency characteristics. So this does not seem
			
 
				+%an immediate practical threat.
			
 
				+Along similar lines, the same
			
 
				 paper suggested a ``clogging attack''. In \cite{attack-tor-oak05}, a
			
 
				 version of this was demonstrated to be practical against portions of
			
 
				 the fifty node Tor network as deployed in mid 2004. There it was shown
			
 
				 that an outside attacker can trace a stream through the Tor network
			
 
				-while a stream is still active simply by observing the latency of his
			
 
				+while a stream is still active by observing the latency of his
			
 
				 own traffic sent through various Tor nodes. These attacks do not show
			
 
				-the client address, only the first node within the Tor network, making
			
 
				-helper nodes all the more worthy of exploration. (See
			
 
				-Section~\ref{subsec:helper-nodes}.)
			
 
				-
			
 
				-Against internal attackers who sign up Tor nodes, the situation is more
			
 
				-complicated.  In the simplest case, if an adversary has compromised $c$ of
			
 
				-$n$ nodes on the Tor network, then the adversary will be able to compromise
			
 
				-a random circuit with probability $\frac{c^2}{n^2}$ (since the circuit
			
 
				-initiator chooses hops randomly).  But there are
			
 
				-complicating factors:
			
 
				-(1)~If the user continues to build random circuits over time, an adversary
			
 
				-  is pretty certain to see a statistical sample of the user's traffic, and
			
 
				-  thereby can build an increasingly accurate profile of her behavior.  (See
			
 
				-  Section~\ref{subsec:helper-nodes} for possible solutions.)
			
 
				-(2)~An adversary who controls a popular service outside of the Tor network
			
 
				-  can be certain of observing all connections to that service; he
			
 
				-  therefore will trace connections to that service with probability
			
 
				-  $\frac{c}{n}$.
			
 
				-(3)~Users do not in fact choose nodes with uniform probability; they
			
 
				-  favor nodes with high bandwidth or uptime, and exit nodes that
			
 
				-  permit connections to their favorite services. 
			
 
				-(See Section~\ref{subsec:routing-zones} for discussion of how larger
			
 
				-adversaries affect our dispersal goals.)
			
 
				-
			
 
				-%\begin{tightlist}
			
 
				-%\item If the user continues to build random circuits over time, an adversary
			
 
				-%  is pretty certain to see a statistical sample of the user's traffic, and
			
 
				-%  thereby can build an increasingly accurate profile of her behavior.  (See
			
 
				-%  \ref{subsec:helper-nodes} for possible solutions.)
			
 
				-%\item An adversary who controls a popular service outside of the Tor network
			
 
				-%  can be certain of observing all connections to that service; he
			
 
				-%  therefore will trace connections to that service with probability
			
 
				-%  $\frac{c}{n}$.
			
 
				-%\item Users do not in fact choose nodes with uniform probability; they
			
 
				-%  favor nodes with high bandwidth or uptime, and exit nodes that
			
 
				-%  permit connections to their favorite services.
			
 
				-%\end{tightlist}
			
 
				+client and server addresses, only the first and last nodes within the Tor
			
 
				+network, so it is still necessary to observe those nodes to complete the
			
 
				+attacks.  This may make
			
 
				+helper nodes all the more worthy of exploration (see
			
 
				+Section~\ref{subsec:helper-nodes}).
			
 
				 
			
 
				 %discuss $\frac{c^2}{n^2}$, except how in practice the chance of owning
			
 
				 %the last hop is not $c/n$ since that doesn't take the destination (website)
			
@@ -335,25 +336,19 @@ adversaries affect our dispersal goals.)
 
				 %see Section~\ref{subsec:helper-nodes} for discussion of some ways to
			
 
				 %address this issue.
			
 
				 
			
 
				-
			
 
				 \medskip
			
 
				 \noindent
			
 
				 {\bf Distributed trust.}
			
 
				-In practice Tor's threat model is based entirely on the goal of
			
 
				+In practice Tor's threat model is based on
			
 
				 dispersal and diversity.
			
 
				-Tor's defense lies in having a diverse enough set of nodes
			
 
				+Our defense lies in having a diverse enough set of nodes
			
 
				 to prevent most real-world
			
 
				-adversaries from being in the right places to attack users.
			
 
				-Tor aims to resist observers and insiders by distributing each transaction
			
 
				+adversaries from being in the right places to attack users,
			
 
				+by distributing each transaction
			
 
				 over several nodes in the network.  This ``distributed trust'' approach
			
 
				 means the Tor network can be safely operated and used by a wide variety
			
 
				-of mutually distrustful users, providing more sustainability and security
			
 
				-than some previous attempts at anonymizing networks.
			
 
				-The Tor network has a broad range of users, including ordinary citizens
			
 
				-concerned about their privacy, corporations
			
 
				-who don't want to reveal information to their competitors, and law
			
 
				-enforcement and government intelligence agencies who need
			
 
				-to do operations on the Internet without being noticed.
			
 
				+of mutually distrustful users, providing sustainability and security.
			
 
				+%than some previous attempts at anonymizing networks.
			
 
				 
			
 
				 No organization can achieve this security on its own.  If a single
			
 
				 corporation or government agency were to build a private network to
			
@@ -368,6 +363,11 @@ and who is looking for what information.  %By bringing more users onto
 
				 %the network, all users become more secure~\cite{econymics}.
			
 
				 %[XXX I feel uncomfortable saying this last sentence now. -RD]
			
 
				 %[So, I took it out. I think we can do without it. -PFS]
			
 
				+The Tor network has a broad range of users, including ordinary citizens
			
 
				+concerned about their privacy, corporations
			
 
				+who don't want to reveal information to their competitors, and law
			
 
				+enforcement and government intelligence agencies who need
			
 
				+to do operations on the Internet without being noticed.
			
 
				 Naturally, organizations will not want to depend on others for their
			
 
				 security.  If most participating providers are reliable, Tor tolerates
			
 
				 some hostile infiltration of the network.  For maximum protection,
			
@@ -382,28 +382,28 @@ Tor is not the only anonymity system that aims to be practical and useful.
 
				 Commercial single-hop proxies~\cite{anonymizer}, as well as unsecured
			
 
				 open proxies around the Internet, can provide good
			
 
				 performance and some security against a weaker attacker. The Java
			
 
				-Anon Proxy~\cite{web-mix} provides similar functionality to Tor but only
			
 
				-handles web browsing rather than arbitrary TCP\@.
			
 
				+Anon Proxy~\cite{web-mix} provides similar functionality to Tor but
			
 
				+handles only web browsing rather than arbitrary TCP\@.
			
 
				 %Some peer-to-peer file-sharing overlay networks such as
			
 
				 %Freenet~\cite{freenet} and Mute~\cite{mute}
			
 
				 Zero-Knowledge Systems' commercial Freedom
			
 
				 network~\cite{freedom21-security} was even more flexible than Tor in
			
 
				-that it could transport arbitrary IP packets, and it also supported
			
 
				-pseudonymous access rather than just anonymous access; but it had
			
 
				+transporting arbitrary IP packets, and also supported
			
 
				+pseudonymity in addition to anonymity; but it had
			
 
				 a different approach to sustainability (collecting money from users
			
 
				-and paying ISPs to run Tor nodes), and was shut down due to financial
			
 
				+and paying ISPs to run Tor nodes), and was eventually shut down due to financial
			
 
				 load.  Finally, potentially
			
 
				-more scalable designs like Tarzan~\cite{tarzan:ccs02} and
			
 
				+more scalable peer-to-peer designs like Tarzan~\cite{tarzan:ccs02} and
			
 
				 MorphMix~\cite{morphmix:fc04} have been proposed in the literature, but
			
 
				-have not yet been fielded. All of these systems differ somewhat
			
 
				+have not yet been fielded. These systems differ somewhat
			
 
				 in threat model and presumably practical resistance to threats.
			
 
				-Morphmix is very close to Tor in circuit setup. And, by separating
			
 
				+MorphMix is close to Tor in circuit setup, and, by separating
			
 
				 node discovery from route selection from circuit setup, Tor is
			
 
				 flexible enough to potentially contain a MorphMix experiment within
			
 
				-it. We direct the interested reader to Section
			
 
				-2 of~\cite{tor-design} for a more in-depth review of related work.
			
 
				+it. We direct the interested reader
			
 
				+to~\cite{tor-design} for a more in-depth review of related work.
			
 
				 
			
 
				-Tor differs from other deployed systems for traffic analysis resistance
			
 
				+Tor also differs from other deployed systems for traffic analysis resistance
			
 
				 in its security and flexibility.  Mix networks such as
			
 
				 Mixmaster~\cite{mixmaster-spec} or its successor Mixminion~\cite{minion-design}
			
 
				 gain the highest degrees of anonymity at the expense of introducing highly
			
@@ -440,18 +440,19 @@ Tor's interaction with other services on the Internet.
 
				 \subsection{Communicating security}
			
 
				 
			
 
				 Usability for anonymity systems
			
 
				-contributes directly to their security, because how usable the system
			
 
				-is impacts the possible anonymity set~\cite{econymics,back01}. Or
			
 
				-conversely, an unusable system attracts few users and thus can't provide
			
 
				+contributes directly to their security, because usability
			
 
				+affects the possible anonymity set~\cite{econymics,back01}.
			
 
				+Conversely, an unusable system attracts few users and thus can't provide
			
 
				 much anonymity.
			
 
				 
			
 
				 This phenomenon has a second-order effect: knowing this, users should
			
 
				 choose which anonymity system to use based in part on how usable
			
 
				+and secure
			
 
				 \emph{others} will find it, in order to get the protection of a larger
			
 
				-anonymity set. Thus we might replace the adage ``usability is a security
			
 
				+anonymity set. Thus we might supplement the adage ``usability is a security
			
 
				 parameter''~\cite{back01} with a new one: ``perceived usability is a
			
 
				 security parameter.'' From here we can better understand the effects
			
 
				-of publicity and advertising on security: the more convincing your
			
 
				+of publicity on security: the more convincing your
			
 
				 advertising, the more likely people will believe you have users, and thus
			
 
				 the more users you will attract. Perversely, over-hyped systems (if they
			
 
				 are not too broken) may be a better choice than modestly promoted ones,
			
@@ -473,26 +474,26 @@ other, there's an arms race between end-to-end statistical attacks and
 
				 counter-strategies~\cite{statistical-disclosure,minion-design,e2e-traffic,trickle02}.
			
 
				 But for low-latency systems like Tor, end-to-end \emph{traffic
			
 
				 correlation} attacks~\cite{danezis-pet2004,defensive-dropping,SS03}
			
 
				-allow an attacker who can measure both ends of a communication
			
 
				-to match packet timing and volume, quickly linking
			
 
				-the initiator to her destination. This is why Tor's threat model is
			
 
				-based on preventing the adversary from observing both the initiator and
			
 
				-the responder.
			
 
				+allow an attacker who can observe both ends of a communication
			
 
				+to correlate packet timing and volume, quickly linking
			
 
				+the initiator to her destination.% This is why Tor's threat model is
			
 
				+%based on preventing the adversary from observing both the initiator and
			
 
				+%the responder.
			
 
				 
			
 
				 Like Tor, the current JAP implementation does not pad connections
			
 
				-(apart from using small fixed-size cells for transport). In fact,
			
 
				-JAP's cascade-based network topology may be even more vulnerable to these
			
 
				+apart from using small fixed-size cells for transport. In fact,
			
 
				+JAP's cascade-based network topology may be more vulnerable to these
			
 
				 attacks, because the network has fewer edges. JAP was born out of
			
 
				 the ISDN mix design~\cite{isdn-mixes}, where padding made sense because
			
 
				 every user had a fixed bandwidth allocation and altering the timing
			
 
				 pattern of packets could be immediately detected, but in its current context
			
 
				 as a general Internet web anonymizer, adding sufficient padding to JAP
			
 
				-would be prohibitively expensive and probably ineffective against a
			
 
				+would probably be prohibitively expensive and ineffective against a
			
 
				 minimally active attacker.\footnote{Even if JAP could
			
 
				 fund higher-capacity nodes indefinitely, our experience
			
 
				 suggests that many users would not accept the increased per-user
			
 
				 bandwidth requirements, leading to an overall much smaller user base. But
			
 
				-cf.\ Section~\ref{subsec:mid-latency}.} Therefore, since under this threat
			
 
				+see Section~\ref{subsec:mid-latency}.} Therefore, since under this threat
			
 
				 model the number of concurrent users does not seem to have much impact
			
 
				 on the anonymity provided, we suggest that JAP's anonymity meter is not
			
 
				 accurately communicating security levels to its users.
			
@@ -509,17 +510,17 @@ on the network. We investigate this issue next.
 
				 Another factor impacting the network's security is its reputability:
			
 
				 the perception of its social value based on its current user base. If Alice is
			
 
				 the only user who has ever downloaded the software, it might be socially
			
 
				-accepted, but she's not getting much anonymity. Add a thousand animal rights
			
 
				-activists, and she's anonymous, but everyone thinks she's a Bambi lover (or
			
 
				-NRA member if you prefer a contrasting example). Add a thousand
			
 
				+accepted, but she's not getting much anonymity. Add a thousand
			
 
				+activists, and she's anonymous, but everyone thinks she's an activist too.
			
 
				+Add a thousand
			
 
				 diverse citizens (cancer survivors, privacy enthusiasts, and so on)
			
 
				 and now she's harder to profile.
			
 
				 
			
 
				-Furthermore, the network's reputability affects its node base: more people
			
 
				+Furthermore, the network's reputability affects its operator base: more people
			
 
				 are willing to run a service if they believe it will be used by human rights
			
 
				 workers than if they believe it will be used exclusively for disreputable
			
 
				 ends.  This effect becomes stronger if node operators themselves think they
			
 
				-will be associated with these disreputable ends.
			
 
				+will be associated with their users' disreputable ends.
			
 
				 
			
 
				 So the more cancer survivors on Tor, the better for the human rights
			
 
				 activists. The more malicious hackers, the worse for the normal users. Thus,
			
@@ -532,7 +533,7 @@ political attacks, since it will attract fewer supporters.
 
				 While people therefore have an incentive for the network to be used for
			
 
				 ``more reputable'' activities than their own, there are still tradeoffs
			
 
				 involved when it comes to anonymity. To follow the above example, a
			
 
				-network used entirely by cancer survivors might welcome some NRA members
			
 
				+network used entirely by cancer survivors might welcome file sharers
			
 
				 onto the network, though of course they'd prefer a wider
			
 
				 variety of users.
			
 
				 
			
@@ -592,7 +593,7 @@ hardly likely to tell us specifics if they are.
 
				 Tor exit node operators do attain a degree of
			
 
				 ``deniability'' for traffic that originates at that exit node.  For
			
 
				   example, it is likely in practice that HTTP requests from a Tor node's IP
			
 
				-  will be assumed to be from the Tor network. 
			
 
				+  will be assumed to be from the Tor network.
			
 
				   More significantly, people and organizations who use Tor for
			
 
				   anonymity depend on the
			
 
				   continued existence of the Tor network to do so; running a node helps to
			
@@ -625,20 +626,18 @@ abuse complaints. (See Section~\ref{subsec:tor-and-blacklists}.)
 
				 %[We can enforce incentives; see Section 6.1. We can rate-limit clients.
			
 
				 %  We can put "top bandwidth nodes lists" up a la seti@home.]
			
 
				 
			
 
				-
			
 
				 \subsection{Bandwidth and file-sharing}
			
 
				 \label{subsec:bandwidth-and-file-sharing}
			
 
				 %One potentially problematical area with deploying Tor has been our response
			
 
				 %to file-sharing applications.
			
 
				 Once users have configured their applications to work with Tor, the largest
			
 
				 remaining usability issue is performance.  Users begin to suffer
			
 
				-when websites ``feel slow''.
			
 
				+when websites ``feel slow.''
			
 
				 Clients currently try to build their connections through nodes that they
			
 
				 guess will have enough bandwidth.  But even if capacity is allocated
			
 
				 optimally, it seems unlikely that the current network architecture will have
			
 
				 enough capacity to provide every user with as much bandwidth as she would
			
 
				-receive if she weren't using Tor, unless far more nodes join the network
			
 
				-(see above).
			
 
				+receive if she weren't using Tor, unless far more nodes join the network.
			
 
				 
			
 
				 %Limited capacity does not destroy the network, however.  Instead, usage tends
			
 
				 %towards an equilibrium: when performance suffers, users who value performance
			
@@ -650,31 +649,32 @@ Much of Tor's recent bandwidth difficulties have come from file-sharing
 
				 applications.  These applications provide two challenges to
			
 
				 any anonymizing network: their intensive bandwidth requirement, and the
			
 
				 degree to which they are associated (correctly or not) with copyright
			
 
				-violation.
			
 
				+infringement.
			
 
				 
			
 
				 As noted above, high-bandwidth protocols can make the network unresponsive,
			
 
				-but tend to be somewhat self-correcting.  Issues of copyright violation,
			
 
				+but tend to be somewhat self-correcting as lack of bandwidth drives away
			
 
				+users who need it.  Issues of copyright violation,
			
 
				 however, are more interesting.  Typical exit node operators want to help
			
 
				 people achieve private and anonymous speech, not to help people (say) host
			
 
				 Vin Diesel movies for download; and typical ISPs would rather not
			
 
				-deal with customers who incur them the overhead of getting menacing letters
			
 
				+deal with customers who draw menacing letters
			
 
				 from the MPAA\@.  While it is quite likely that the operators are doing nothing
			
 
				 illegal, many ISPs have policies of dropping users who get repeated legal
			
 
				 threats regardless of the merits of those threats, and many operators would
			
 
				-prefer to avoid receiving legal threats even if those threats have little
			
 
				-merit.  So when the letters arrive, operators are likely to face
			
 
				+prefer to avoid receiving even meritless legal threats.
			
 
				+So when letters arrive, operators are likely to face
			
 
				 pressure to block file-sharing applications entirely, in order to avoid the
			
 
				 hassle.
			
 
				 
			
 
				-But blocking file-sharing would not necessarily be easy; most popular
			
 
				-protocols have evolved to run on a variety of non-standard ports in order to
			
 
				-get around other port-based bans.  Thus, exit node operators who wanted to
			
 
				+But blocking file-sharing would not necessarily be easy; many popular
			
 
				+protocols have evolved to run on non-standard ports in order to
			
 
				+get around other port-based bans.  Thus, exit node operators who want to
			
 
				 block file-sharing would have to find some way to integrate Tor with a
			
 
				 protocol-aware exit filter.  This could be a technically expensive
			
 
				 undertaking, and one with poor prospects: it is unlikely that Tor exit nodes
			
 
				 would succeed where so many institutional firewalls have failed.  Another
			
 
				 possibility for sensitive operators is to run a restrictive node that
			
 
				-only permits exit connections to a restricted range of ports which are
			
 
				+only permits exit connections to a restricted range of ports that are
			
 
				 not frequently associated with file sharing.  There are increasingly few such
			
 
				 ports.
			
 
				 
			
@@ -703,7 +703,7 @@ file-sharing protocols that have separate control and data channels.
 
				 \subsection{Tor and blacklists}
			
 
				 \label{subsec:tor-and-blacklists}
			
 
				 
			
 
				-It was long expected that, alongside Tor's legitimate users, it would also
			
 
				+It was long expected that, alongside legitimate users, Tor would also
			
 
				 attract troublemakers who exploited Tor in order to abuse services on the
			
 
				 Internet with vandalism, rude mail, and so on.
			
 
				 %[XXX we're not talking bandwidth abuse here, we're talking vandalism,
			
@@ -713,7 +713,7 @@ to allow individual Tor nodes to block access to specific IP/port ranges.
 
				 This approach aims to make operators more willing to run Tor by allowing
			
 
				 them to prevent their nodes from being used for abusing particular
			
 
				 services.  For example, all Tor nodes currently block SMTP (port 25), in
			
 
				-order to avoid being used to send spam.
			
 
				+order to avoid being used for spam.
			
 
				 
			
 
				 This approach is useful, but is insufficient for two reasons.  First, since
			
 
				 it is not possible to force all nodes to block access to any given service,
			
@@ -722,18 +722,19 @@ blockable is important to being good netizens, we would like to encourage
 
				 services to allow anonymous access; services should not need to decide
			
 
				 between blocking legitimate anonymous use and allowing unlimited abuse.
			
 
				 
			
 
				-This is potentially a bigger problem than it may appear. 
			
 
				-On the one hand, if people want to refuse connections from your address to
			
 
				-their servers it would seem that they should be allowed.  But, it's not just
			
 
				-for himself that the individual node administrator is deciding when he decides
			
 
				-if he wants to post to Wikipedia from his Tor node address or allow
			
 
				+This is potentially a bigger problem than it may appear.
			
 
				+On the one hand, people should be allowed to refuse connections to
			
 
				+their services.  But, it's not just
			
 
				+for himself that a node administrator is deciding when he decides
			
 
				+whether he prefers to be able to post to Wikipedia from his Tor node address,
			
 
				+or to allow
			
 
				 people to read Wikipedia anonymously through his Tor node. (Wikipedia
			
 
				-has blocked all posting from all Tor nodes based on IP address.) If e.g.,
			
 
				-s/he comes through a campus or corporate NAT, then the decision must
			
 
				-be to have the entire population behind it able to have a Tor exit
			
 
				-node or to have write access to Wikipedia. This is a loss for both Tor
			
 
				-and Wikipedia. We don't want to compete for (or divvy up) the NAT
			
 
				-protected entities of the world.
			
 
				+has blocked all posting from all Tor nodes based on IP addresses.) If
			
 
				+the Tor node shares an address with a campus or corporate NAT,
			
 
				+then the decision can prevent the entire population from posting.
			
 
				+This is a loss for both Tor
			
 
				+and Wikipedia: we don't want to compete for (or divvy up) the
			
 
				+NAT-protected entities of the world.
			
 
				 
			
 
				 Worse, many IP blacklists are not terribly fine-grained.
			
 
				 No current IP blacklist, for example, allows a service provider to blacklist
			
@@ -812,35 +813,37 @@ be investigated as the network develops.
 
				 \label{subsec:tcp-vs-ip}
			
 
				 
			
 
				 Tor transports streams; it does not tunnel packets.
			
 
				-Developers of the old Freedom network~\cite{freedom21-security}
			
 
				-keep telling us that IP addresses should ``obviously'' be anonymized
			
 
				-at the IP layer. These issues need to be resolved before
			
 
				-Tor will be ready to carry arbitrary IP traffic:
			
 
				+It has often been suggested that, like the old Freedom
			
 
				+network~\cite{freedom21-security}, Tor should
			
 
				+``obviously'' anonymize IP traffic
			
 
				+at the IP layer. Before this could be done, many issues would need to be resolved:
			
 
				 
			
 
				 \begin{enumerate}
			
 
				 \setlength{\itemsep}{0mm}
			
 
				 \setlength{\parsep}{0mm}
			
 
				-\item \emph{IP packets reveal OS characteristics.} We still need to do
			
 
				-IP-level packet normalization, to stop things like IP fingerprinting
			
 
				-attacks. There likely exist libraries that can help with this.
			
 
				+\item \emph{IP packets reveal OS characteristics.}  We would still need to do
			
 
				+IP-level packet normalization, to stop things like TCP fingerprinting
			
 
				+attacks.%There likely exist libraries that can help with this.
			
 
				+This is unlikely to be a trivial task, given the diversity and complexity of
			
 
				+various TCP stacks.
			
 
				 \item \emph{Application-level streams still need scrubbing.} We still need
			
 
				 Tor to be easy to integrate with user-level application-specific proxies
			
 
				 such as Privoxy. So it's not just a matter of capturing packets and
			
 
				 anonymizing them at the IP layer.
			
 
				-\item \emph{Certain protocols will still leak information.} For example,
			
 
				-we must rewrite DNS requests so they are
			
 
				-delivered to an unlinkable DNS server; so we must
			
 
				-understand the protocols we are transporting.
			
 
				+\item \emph{Certain protocols will still leak information.} For example, we
			
 
				+must rewrite DNS requests so they are delivered to an unlinkable DNS server
			
 
				+rather than a DNS server at a user's ISP; thus, we must understand the
			
 
				+protocols we are transporting.
			
 
				 \item \emph{The crypto is unspecified.} First we need a block-level encryption
			
 
				 approach that can provide security despite
			
 
				 packet loss and out-of-order delivery. Freedom allegedly had one, but it was
			
 
				 never publicly specified.
			
 
				-Also, TLS over UDP is not implemented or even
			
 
				+Also, TLS over UDP is not yet implemented or
			
 
				 specified, though some early work has begun on that~\cite{dtls}.
			
 
				-\item \emph{We'll still need to tune network parameters}. Since the above
			
 
				+\item \emph{We'll still need to tune network parameters.} Since the above
			
 
				 encryption system will likely need sequence numbers (and maybe more) to do
			
 
				-replay detection, handle duplicate frames, etc., we will be reimplementing
			
 
				-a subset of TCP anyway.
			
 
				+replay detection, handle duplicate frames, and so on, we will be reimplementing
			
 
				+a subset of TCP anyway---a notoriously tricky path.
			
 
				 \item \emph{Exit policies for arbitrary IP packets mean building a secure
			
 
				 IDS\@.}  Our node operators tell us that exit policies are one of
			
 
				 the main reasons they're willing to run Tor.
			
@@ -854,9 +857,11 @@ we become able to transport IP packets. We also need to compactly
 
				 describe exit policies so clients can predict
			
 
				 which nodes will allow which packets to exit.
			
 
				 \item \emph{The Tor-internal name spaces would need to be redesigned.} We
			
 
				-support hidden service {\tt{.onion}} addresses, and other special addresses
			
 
				-like {\tt{.exit}} for the user to request a particular exit node,
			
 
				+support hidden service {\tt{.onion}} addresses (and other special addresses,
			
 
				+like {\tt{.exit}}, which lets the user request a particular exit node),
			
 
				 by intercepting the addresses when they are passed to the Tor client.
			
 
				+Doing so at the IP level would require a more complex interface between
			
 
				+Tor and the local DNS resolver.
			
 
				 \end{enumerate}
			
 
				 
			
 
				 This list is discouragingly long, but being able to transport more
			
@@ -866,14 +871,14 @@ items are actual roadblocks and which are easier to resolve than we think.
 
				 To be fair, Tor's stream-based approach has run into
			
 
				 stumbling blocks as well. While Tor supports the SOCKS protocol,
			
 
				 which provides a standardized interface for generic TCP proxies, many
			
 
				-applications do not support SOCKS\@. For them we must
			
 
				+applications do not support SOCKS\@. For them we already need to
			
 
				 replace the networking system calls with SOCKS-aware
			
 
				 versions, or run a SOCKS tunnel locally, neither of which is
			
 
				 easy for the average user. %---even with good instructions.
			
 
				-Even when applications do use SOCKS, they often make DNS requests
			
 
				-themselves before handing the address to Tor, which advertises
			
 
				+Even when applications can use SOCKS, they often make DNS requests
			
 
				+themselves before handing an IP address to Tor, which advertises
			
 
				 where the user is about to connect.
			
 
				-We are still working on usable solutions.
			
 
				+We are still working on more usable solutions.
			
 
				 
			
 
				 %So in order to actually provide good anonymity, we need to make sure that
			
 
				 %users have a practical way to use Tor anonymously.  Possibilities include
			
@@ -893,14 +898,15 @@ require increasingly more data~\cite{e2e-traffic}. Can we improve Tor's
 
				 resistance without losing too much usability?
			
 
				 
			
 
				 We need to learn whether we can trade a small increase in latency
			
 
				-for a large anonymity increase, or if we'll end up trading a lot of
			
 
				-latency for a small security gain. A trade could be worthwhile even if we
			
 
				-can only protect certain use cases, such as infrequent short-duration
			
 
				+for a large anonymity increase, or if we'd end up trading a lot of
			
 
				+latency for only a minimal security gain. A trade-off might be worthwhile
			
 
				+even if we
			
 
				+could only protect certain use cases, such as infrequent short-duration
			
 
				 transactions. % To answer this question
			
 
				 We might adapt the techniques of~\cite{e2e-traffic} to a lower-latency mix
			
 
				 network, where the messages are batches of cells in temporally clustered
			
 
				 connections. These large fixed-size batches can also help resist volume
			
 
				-signature attacks~\cite{hintz-pet02}. We can also experiment with traffic
			
 
				+signature attacks~\cite{hintz-pet02}. We could also experiment with traffic
			
 
				 shaping to get a good balance of throughput and security.
			
 
				 %Other padding regimens might supplement the
			
 
				 %mid-latency option; however, we should continue the caution with which
			
@@ -908,7 +914,7 @@ shaping to get a good balance of throughput and security.
 
				 %performance or too many volunteers.
			
 
				 
			
 
				 We must keep usability in mind too. How much can latency increase
			
 
				-before we drive away our users? We're already being forced to increase
			
 
				+before we drive users away? We've already been forced to increase
			
 
				 latency slightly, as our growing network incorporates more DSL and
			
 
				 cable-modem nodes and more nodes in distant continents. Perhaps we can
			
 
				 harness this increased latency to improve anonymity rather than just
			
@@ -950,7 +956,8 @@ order). Using randomized path lengths may help some, since the attacker
 
				 will never be certain he has identified all nodes in the path, but as
			
 
				 long as the network remains small this attack will still be feasible.
			
 
				 
			
 
				-Helper nodes also aim to help Tor clients, because choosing entry and exit points
			
 
				+Helper nodes also aim to help Tor clients, because choosing entry and exit
			
 
				+points
			
 
				 randomly and changing them frequently allows an attacker who controls
			
 
				 even a few nodes to eventually link some of their destinations. The goal
			
 
				 is to take the risk once and for all about choosing a bad entry node,
			
@@ -1507,10 +1514,10 @@ minute burst in each 4 hour period.}
 
				 
			
 
				 \end{document}
			
 
				 
			
 
				-Making use of nodes with little bandwidth, or high latency/packet loss.
			
 
				+%Making use of nodes with little bandwidth, or high latency/packet loss.
			
 
				 
			
 
				-Running Tor nodes behind NATs, behind great-firewalls-of-China, etc.
			
 
				-Restricted routes. How to propagate to everybody the topology? BGP
			
 
				-style doesn't work because we don't want just *one* path. Point to
			
 
				-Geoff's stuff.
			
 
				+%Running Tor nodes behind NATs, behind great-firewalls-of-China, etc.
			
 
				+%Restricted routes. How to propagate to everybody the topology? BGP
			
 
				+%style doesn't work because we don't want just *one* path. Point to
			
 
				+%Geoff's stuff.