|
@@ -1158,109 +1158,120 @@ inefficiencies of tunneling TCP over TCP \cite{tcp-over-tcp-is-bad}.
|
|
|
\SubSection{Exit policies and abuse}
|
|
|
\label{subsec:exitpolicies}
|
|
|
|
|
|
-Exit abuse is a serious barrier to wide-scale Tor deployment --- we
|
|
|
-must block or limit attacks and other abuse that users can do through
|
|
|
+Exit abuse is a serious barrier to wide-scale Tor deployment. Not
|
|
|
+only does anonymity present would-be vandals and abusers with an
|
|
|
+opportunity to hide the origins of their activities, but also
|
|
|
+existing sanctions against abuse present an easy way for attackers to
|
|
|
+harm the Tor network by implicating exit servers for their abuse.
|
|
|
+Thus, we must block or limit attacks and other abuse that travel through
|
|
|
the Tor network.
|
|
|
|
|
|
-Each onion router's \emph{exit policy} describes to which external
|
|
|
-addresses and ports the router will permit stream connections. On one end
|
|
|
-of the spectrum are \emph{open exit} nodes that will connect anywhere;
|
|
|
-on the other end are \emph{middleman} nodes that only relay traffic to
|
|
|
-other Tor nodes, and \emph{private exit} nodes that only connect locally
|
|
|
-or to addresses internal to that node's organization.
|
|
|
-This private exit
|
|
|
-node configuration is more secure for clients --- the adversary cannot
|
|
|
-see plaintext traffic leaving the network (e.g. to a webserver), so he
|
|
|
-is less sure of Alice's destination. More generally, nodes can require
|
|
|
-a variety of forms of traffic authentication \cite{or-discex00}.
|
|
|
-Most onnion routers will function as \emph{limited exits} that permit
|
|
|
-connections to the world at large, but restrict access to certain abuse-prone
|
|
|
-addresses and services.
|
|
|
-
|
|
|
-Tor offers more reliability than the high-latency fire-and-forget
|
|
|
-anonymous email networks, because the sender opens a TCP stream
|
|
|
-with the remote mail server and receives an explicit confirmation of
|
|
|
-acceptance. But ironically, the private exit node model works poorly for
|
|
|
-email, when Tor nodes are run on volunteer machines that also do other
|
|
|
-things, because it's quite hard to configure mail transport agents so
|
|
|
-normal users can send mail normally, but the Tor process can only deliver
|
|
|
-mail locally. Further, most organizations have specific hosts that will
|
|
|
-deliver mail on behalf of certain IP ranges; Tor operators must be aware
|
|
|
-of these hosts and consider putting them in the Tor exit policy.
|
|
|
-
|
|
|
-The abuse issues on closed (e.g. military) networks are different
|
|
|
-from the abuse on open networks like the Internet. While these IP-based
|
|
|
-access controls are still commonplace on the Internet, on closed networks,
|
|
|
-nearly all participants will be honest, and end-to-end authentication
|
|
|
-can be assumed for anything important.
|
|
|
-
|
|
|
-Tor is harder than minion because tcp doesn't include an abuse
|
|
|
-address. you could reach inside the http stream and change the agent
|
|
|
-or something, but that's a specific case and probably won't help
|
|
|
-much anyway.
|
|
|
-And volunteer nodes don't resolve to anonymizer.mit.edu so it never
|
|
|
-even occurs to people that it wasn't you.
|
|
|
-
|
|
|
-Preventing abuse of open exit nodes is an unsolved problem. Princeton's
|
|
|
-CoDeeN project \cite{darkside} gives us a glimpse of what we're in for.
|
|
|
-% This is more speculative than a description of our design.
|
|
|
-
|
|
|
-but their solutions, which mainly involve rate limiting and blacklisting
|
|
|
-nodes which do bad things, don't translate directly to Tor. Rate limiting
|
|
|
-still works great, but Tor intentionally separates sender from recipient,
|
|
|
-so it's hard to know which sender was the one who did the bad thing,
|
|
|
-without just making the whole network wide open.
|
|
|
-
|
|
|
-even limiting most nodes to allow http, ssh, and aim to exit and reject
|
|
|
-all other stuff is sketchy, because plenty of abuse can happen over
|
|
|
-port 80. but it's a surprisingly good start, because it blocks most things,
|
|
|
-and because people are more used to the concept of port 80 abuse not
|
|
|
+Also, applications that commonly use IP-based authentication (such as
|
|
|
+institutional mail or web servers) can be fooled by the fact that
|
|
|
+anonymous connections appear to originate at the exit OR. Rather than
|
|
|
+expose a private service, an administrator may prefer to prevent Tor
|
|
|
+users from connecting to those services from a local OR.
|
|
|
+
|
|
|
+To mitigate abuse, each Tor onion router's \emph{exit
|
|
|
+ policy} describes to which external addresses and ports the router
|
|
|
+will permit stream connections. On one end of the spectrum are
|
|
|
+\emph{open exit} nodes that will connect anywhere; on the other end
|
|
|
+are \emph{middleman} nodes that only relay traffic to other Tor
|
|
|
+nodes, and \emph{private exit} nodes that only connect to a local
|
|
|
+host or network. As a compromise, most onion routers function as
|
|
|
+\emph{restricted exits} that permit connections to the world at
|
|
|
+large, but prevent access to certain abuse-prone addresses and
|
|
|
+services. (Using a private exit, if one exists, is a more secure way
|
|
|
+for a client to connect to a given host or network---an external
|
|
|
+adversary cannot eavesdrop traffic between the private exit and the
|
|
|
+final destination, and so is less sure of Alice's destination and
|
|
|
+activities.) More generally,
|
|
|
+nodes can require a variety of forms of traffic authentication
|
|
|
+\cite{or-discex00}.
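As a concrete illustration, an exit policy can be modeled as an ordered list of accept/reject rules over address patterns and port ranges, with the first matching rule deciding a stream's fate. This is a sketch only; the rule representation here is assumed, not Tor's actual format.

```python
import ipaddress

# Simplified exit-policy matcher: rules are checked in order, and the first
# rule whose network pattern and port range both match decides the stream.
# (Illustrative; real exit policies are richer than this.)
def exit_allows(policy, addr, port):
    for action, network, port_range in policy:
        lo, hi = port_range
        if ipaddress.ip_address(addr) in ipaddress.ip_network(network) \
                and lo <= port <= hi:
            return action == "accept"
    return False  # default-reject when no rule matches

# A "restricted exit": serve the web at large, but refuse SMTP relaying.
policy = [
    ("reject", "0.0.0.0/0", (25, 25)),    # no outgoing mail
    ("accept", "0.0.0.0/0", (80, 80)),    # HTTP
    ("accept", "0.0.0.0/0", (443, 443)),  # HTTPS
]

assert not exit_allows(policy, "198.51.100.7", 25)
assert exit_allows(policy, "198.51.100.7", 80)
```

A fully open exit would carry a single accept-everything rule, and a middleman node would simply reject everything.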
|
|
|
+
|
|
|
+%Tor offers more reliability than the high-latency fire-and-forget
|
|
|
+%anonymous email networks, because the sender opens a TCP stream
|
|
|
+%with the remote mail server and receives an explicit confirmation of
|
|
|
+%acceptance. But ironically, the private exit node model works poorly for
|
|
|
+%email, when Tor nodes are run on volunteer machines that also do other
|
|
|
+%things, because it's quite hard to configure mail transport agents so
|
|
|
+%normal users can send mail normally, but the Tor process can only deliver
|
|
|
+%mail locally. Further, most organizations have specific hosts that will
|
|
|
+%deliver mail on behalf of certain IP ranges; Tor operators must be aware
|
|
|
+%of these hosts and consider putting them in the Tor exit policy.
|
|
|
+
|
|
|
+%The abuse issues on closed (e.g. military) networks are different
|
|
|
+%from the abuse on open networks like the Internet. While these IP-based
|
|
|
+%access controls are still commonplace on the Internet, on closed networks,
|
|
|
+%nearly all participants will be honest, and end-to-end authentication
|
|
|
+%can be assumed for important traffic.
|
|
|
+
|
|
|
+Many administrators will use port restrictions to support only a
|
|
|
+limited set of well-known services, such as HTTP, SSH, or AIM.
|
|
|
+This is not a complete solution, since abuse opportunities for these
|
|
|
+protocols are still well known. Nonetheless, the benefits are real,
|
|
|
+since administrators seem used to the concept of port 80 abuse not
|
|
|
coming from the machine's owner.
|
|
|
|
|
|
-we could also run intrusion detection system (IDS) modules at each tor
|
|
|
-node, to dynamically monitor traffic streams for attack signatures. it
|
|
|
-can even react when it sees a signature by closing the stream. but IDS's
|
|
|
-don't actually work most of the time, and besides, how do you write a
|
|
|
-signature for "is sending a mean mail"?
|
|
|
-
|
|
|
-we should run a squid at each exit node, to provide comparable anonymity
|
|
|
-to private exit nodes for cache hits, to speed everything up, and to
|
|
|
-have a buffer for funny stuff coming out of port 80. we could similarly
|
|
|
-have other exit proxies for other protocols, like mail, to check
|
|
|
-delivered mail for being spam.
|
|
|
-
|
|
|
-[XXX Um, I'm uncomfortable with this for several reasons.
|
|
|
-It's not good for keeping honest nodes honest about discarding
|
|
|
-state after it's no longer needed. Granted it keeps an external
|
|
|
-observer from noticing how often sites are visited, but it also
|
|
|
-allows fishing expeditions. ``We noticed you went to this prohibited
|
|
|
-site an hour ago. Kindly turn over your caches to the authorities.''
|
|
|
-I previously elsewhere suggested bulk transfer proxies to carve
|
|
|
-up big things so that they could be downloaded in less noticeable
|
|
|
-pieces over several normal looking connections. We could suggest
|
|
|
-similarly one or a handful of squid nodes that might serve up
|
|
|
-some of the more sensitive but common material, especially if
|
|
|
-the relevant sites didn't want to or couldn't run their own OR.
|
|
|
-This would be better than having everyone run a squid which would
|
|
|
-just help identify after the fact the different history of that
|
|
|
-node's activity. All this kind of speculation needs to move to
|
|
|
-future work section I guess. -PS]
|
|
|
+A further solution may be to use proxies to clean traffic for certain
|
|
|
+protocols as it leaves the network. For example, much abusive HTTP
|
|
|
+behavior (such as exploiting buffer overflows or well-known script
|
|
|
+vulnerabilities) can be detected in a straightforward manner.
|
|
|
+Similarly, one could run automatic spam filtering software (such as
|
|
|
+SpamAssassin) on email exiting the OR network. A generic
|
|
|
+intrusion detection system (IDS) could be adapted to these purposes.
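A minimal sketch of such exit-side scrubbing follows, with an invented signature list and interface; real filtering would be far more involved, and, as noted, many kinds of abuse have no detectable signature at all.

```python
import re

# Toy signature filter for traffic exiting an OR: a stream whose payload
# matches a known attack pattern is dropped before it leaves the network.
# (Patterns and API invented for this sketch.)
ATTACK_SIGNATURES = [
    re.compile(rb"\.\./\.\./"),        # path-traversal probe
    re.compile(rb"cmd\.exe", re.I),    # classic worm probe
]

def clean_exit_payload(payload: bytes):
    """Return the payload if it looks benign, or None to close the stream."""
    for sig in ATTACK_SIGNATURES:
        if sig.search(payload):
            return None
    return payload

assert clean_exit_payload(b"GET /index.html HTTP/1.0\r\n") is not None
assert clean_exit_payload(b"GET /../../etc/passwd HTTP/1.0\r\n") is None
```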
|
|
|
+
|
|
|
+ORs may also choose to rewrite exiting traffic in order to append
|
|
|
+headers or other information to indicate that the traffic has passed
|
|
|
+through an anonymity service. This approach is commonly used, with some
|
|
|
+success, by email-only anonymity systems. When possible, ORs can also
|
|
|
+run on servers with hostnames such as {\it anonymous}, to further
|
|
|
+alert abuse targets to the nature of the anonymous traffic.
|
|
|
+
|
|
|
+%we should run a squid at each exit node, to provide comparable anonymity
|
|
|
+%to private exit nodes for cache hits, to speed everything up, and to
|
|
|
+%have a buffer for funny stuff coming out of port 80. we could similarly
|
|
|
+%have other exit proxies for other protocols, like mail, to check
|
|
|
+%delivered mail for being spam.
|
|
|
+
|
|
|
+%[XXX Um, I'm uncomfortable with this for several reasons.
|
|
|
+%It's not good for keeping honest nodes honest about discarding
|
|
|
+%state after it's no longer needed. Granted it keeps an external
|
|
|
+%observer from noticing how often sites are visited, but it also
|
|
|
+%allows fishing expeditions. ``We noticed you went to this prohibited
|
|
|
+%site an hour ago. Kindly turn over your caches to the authorities.''
|
|
|
+%I previously elsewhere suggested bulk transfer proxies to carve
|
|
|
+%up big things so that they could be downloaded in less noticeable
|
|
|
+%pieces over several normal looking connections. We could suggest
|
|
|
+%similarly one or a handful of squid nodes that might serve up
|
|
|
+%some of the more sensitive but common material, especially if
|
|
|
+%the relevant sites didn't want to or couldn't run their own OR.
|
|
|
+%This would be better than having everyone run a squid which would
|
|
|
+%just help identify after the fact the different history of that
|
|
|
+%node's activity. All this kind of speculation needs to move to
|
|
|
+%future work section I guess. -PS]
|
|
|
|
|
|
A mixture of open and restricted exit nodes will allow the most
|
|
|
flexibility for volunteers running servers. But while a large number
|
|
|
of middleman nodes is useful to provide a large and robust network,
|
|
|
-a small number of exit nodes still simplifies traffic analysis because
|
|
|
-there are fewer nodes the adversary needs to monitor, and also puts a
|
|
|
-greater burden on the exit nodes.
|
|
|
-The JAP cascade model is really nice because they only need one node to
|
|
|
-take the heat per cascade. On the other hand, a hydra scheme could work
|
|
|
-better (it's still hard to watch all the clients).
|
|
|
-
|
|
|
-Discuss importance of public perception, and how abuse affects it.
|
|
|
-``Usability is a security parameter''. ``Public Perception is also a
|
|
|
-security parameter.''
|
|
|
-
|
|
|
-Discuss smear attacks.
|
|
|
+having only a small number of exit nodes reduces the number of nodes
|
|
|
+an adversary needs to monitor for traffic analysis, and places a
|
|
|
+greater burden on the exit nodes. This tension can be seen in the JAP
|
|
|
+cascade model, wherein only one node in each cascade needs to handle
|
|
|
+abuse complaints---but an adversary only needs to observe the entry
|
|
|
+and exit of a cascade to perform traffic analysis on all that
|
|
|
+cascade's users. The Hydra model (many entries, few exits) presents a
|
|
|
+different compromise: only a few exit nodes are needed, but an
|
|
|
+adversary needs to work harder to watch all the clients.
|
|
|
+
|
|
|
+Finally, we note that exit abuse must not be dismissed as a peripheral
|
|
|
+issue: when a system's public image suffers, it can reduce the number
|
|
|
+and diversity of that system's users, and thereby reduce the anonymity
|
|
|
+of the system itself. Like usability, public perception is also a
|
|
|
+security parameter. Sadly, preventing abuse of open exit nodes is an
|
|
|
+unsolved problem, and will probably remain an arms race for the
|
|
|
+foreseeable future. The abuse problems faced by Princeton's CoDeeN
|
|
|
+project \cite{darkside} give us a glimpse of likely issues.
|
|
|
|
|
|
\SubSection{Directory Servers}
|
|
|
\label{subsec:dirservers}
|
|
@@ -1270,30 +1281,40 @@ First-generation Onion Routing designs \cite{or-jsac98,freedom2-arch} did
|
|
|
in-band network status updates: each router flooded a signed statement
|
|
|
to its neighbors, which propagated it onward. But anonymizing networks
|
|
|
have different security goals than typical link-state routing protocols.
|
|
|
-For example, we worry more about delays (accidental or intentional)
|
|
|
+For example, delays (accidental or intentional)
|
|
|
that can cause different parts of the network to have different pictures
|
|
|
-of link-state and topology. We also worry about attacks to deceive a
|
|
|
+of link-state and topology are not only inconvenient---they give
|
|
|
+attackers an opportunity to exploit differences in client knowledge.
|
|
|
+We also worry about attacks to deceive a
|
|
|
client about the router membership list, topology, or current network
|
|
|
state. Such \emph{partitioning attacks} on client knowledge help an
|
|
|
adversary with limited resources to efficiently deploy those resources
|
|
|
when attacking a target.
|
|
|
|
|
|
-Instead, Tor uses a small group of redundant directory servers to
|
|
|
-track network topology and node state such as current keys and exit
|
|
|
-policies. The directory servers are normal onion routers, but there are
|
|
|
-only a few of them and they are more trusted. They listen on a separate
|
|
|
-port as an HTTP server, both so participants can fetch current network
|
|
|
-state and router lists (a \emph{directory}), and so other onion routers
|
|
|
-can upload their router descriptors.
|
|
|
-
|
|
|
-[[mention that descriptors are signed with long-term keys; ORs publish
|
|
|
- regularly to dirservers; policies for generating directories; key
|
|
|
- rotation (link, onion, identity); Everybody already know directory
|
|
|
- keys; how to approve new nodes (advogato, sybil, captcha (RTT));
|
|
|
- policy for handling connections with unknown ORs; diff-based
|
|
|
- retrieval; diff-based consensus; separate liveness from descriptor
|
|
|
- list]]
|
|
|
-
|
|
|
+Instead of flooding, Tor uses a small group of redundant, well-known,
|
|
|
+mostly-trusted onion routers as directory servers to track changes in
|
|
|
+network topology and node state, including keys and exit
|
|
|
+policies. These directory servers listen on a
|
|
|
+separate port as an HTTP server, so that participants can fetch
|
|
|
+current network state and router lists (a \emph{directory}), and so
|
|
|
+that other onion routers can upload their router descriptors. Onion
|
|
|
+routers now periodically publish signed statements of their state to
|
|
|
+the directories only. The directories themselves combine this state
|
|
|
+information with their own views of network liveness, and generate a
|
|
|
+signed description of the entire network state whenever that state
|
|
|
+changes. Client software is pre-loaded with a list of the
|
|
|
+directory servers and their keys, and uses this information to
|
|
|
+bootstrap each client's view of the network.
|
|
|
+
|
|
|
+When a directory receives a signed statement from an onion router, it
|
|
|
+recognizes the onion router by its identity (signing) key.
|
|
|
+Directories do not automatically advertise ORs that they do not
|
|
|
+recognize. (If they did, an adversary could take over the network by
|
|
|
+creating many servers \cite{sybil}.) Instead, new nodes must be
|
|
|
+approved by the directory administrator before they are included.
|
|
|
+Mechanisms for automated node approval are an area of active research,
|
|
|
+and are discussed more in section~\ref{sec:maintaining-anonymity}.
|
|
|
+
|
|
|
Of course, a variety of attacks remain. An adversary who controls a
|
|
|
directory server can track certain clients by providing different
|
|
|
information --- perhaps by listing only nodes under its control
|
|
@@ -1308,17 +1329,18 @@ software is distributed with the signature public key of each directory
|
|
|
server, and directories must be signed by a threshold of these keys.
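The acceptance test implied here can be sketched as follows. This is illustrative only; the `verify` callback stands in for whatever signature scheme the directories actually use.

```python
# Sketch: a client accepts a fetched directory only if at least `threshold`
# of its pre-loaded directory-server keys produced a valid signature on it.
# `verify(key, data, sig)` abstracts the signature scheme (invented here).
def directory_trusted(directory, signatures, trusted_keys, threshold, verify):
    valid = sum(
        1 for key in trusted_keys
        if key in signatures and verify(key, directory, signatures[key])
    )
    return valid >= threshold

# Toy "signatures": a sig is valid iff it equals hash((key, directory)).
toy_verify = lambda key, d, sig: sig == hash((key, d))
keys = ["dir1", "dir2", "dir3"]
sigs = {"dir1": hash(("dir1", "netstate")),
        "dir2": hash(("dir2", "netstate"))}

assert directory_trusted("netstate", sigs, keys, 2, toy_verify)
assert not directory_trusted("netstate", sigs, keys, 3, toy_verify)
```

With three directory servers and a threshold of two, a single compromised or unreachable directory cannot by itself feed a client a false network view.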
|
|
|
|
|
|
The directory servers in Tor are modeled after those in Mixminion
|
|
|
-\cite{minion-design}, but our situation is easier. Firstly, we make the
|
|
|
-simplifying assumption that all participants agree on who the directory
|
|
|
-servers are. Secondly, Mixminion needs to predict node behavior ---
|
|
|
-that is, build a reputation system for guessing future performance of
|
|
|
-nodes based on past performance, and then figure out a way to build
|
|
|
-a threshold consensus of these predictions. Tor just needs to get a
|
|
|
-threshold consensus of the current state of the network.
|
|
|
+\cite{minion-design}, but our situation is easier. First, we make the
|
|
|
+simplifying assumption that all participants agree on who the
|
|
|
+directory servers are. Second, Mixminion needs to predict node
|
|
|
+behavior, whereas Tor only needs a threshold consensus of the current
|
|
|
+state of the network.
|
|
|
+% Cite dir-spec or dir-agreement?
|
|
|
|
|
|
The threshold consensus can be reached with standard Byzantine agreement
|
|
|
-techniques \cite{castro-liskov}.
|
|
|
+techniques \cite{castro-liskov}.
|
|
|
% Should I just stop the section here? Is the rest crap? -RD
|
|
|
+% IMO this graf makes me uncomfortable. It picks a fight with the
|
|
|
+% Byzantine people for no good reason. -NM
|
|
|
But this library, while more efficient than previous Byzantine agreement
|
|
|
systems, is still complex and heavyweight for our purposes: we only need
|
|
|
to compute a single algorithm, and we do not require strict in-order
|
|
@@ -1361,15 +1383,18 @@ their existence to any central point.
|
|
|
% the dirservers but discard all other traffic.
|
|
|
% in some sense they're like reputation servers in \cite{mix-acc} -RD
|
|
|
|
|
|
+
|
|
|
\Section{Rendezvous points: location privacy}
|
|
|
\label{sec:rendezvous}
|
|
|
|
|
|
-Rendezvous points are a building block for \emph{location-hidden services}
|
|
|
-(aka responder anonymity) in the Tor network. Location-hidden services
|
|
|
-means Bob can offer a TCP service, such as a webserver, without revealing
|
|
|
-the IP of that service. One motivation for location privacy is to provide
|
|
|
-protection against DDoS attacks: attackers are forced to attack the
|
|
|
-onion routing network as a whole rather than just Bob's IP.
|
|
|
+Rendezvous points are a building block for \emph{location-hidden
|
|
|
+services} (also known as ``responder anonymity'') in the Tor
|
|
|
+network. Location-hidden services allow a server Bob to offer a TCP
|
|
|
+service, such as a webserver, without revealing the IP of his service.
|
|
|
+Besides allowing Bob to provide services anonymously, location
|
|
|
+privacy also seeks to provide some protection against DDoS attacks:
|
|
|
+attackers are forced to attack the onion routing network as a whole
|
|
|
+rather than just Bob's IP.
|
|
|
|
|
|
\subsection{Goals for rendezvous points}
|
|
|
\label{subsec:rendezvous-goals}
|
|
@@ -1392,52 +1417,58 @@ properties in our design for location-hidden servers:
|
|
|
\end{tightlist}
|
|
|
|
|
|
\subsection{Rendezvous design}
|
|
|
-We provide location-hiding for Bob by allowing him to advertise several onion
|
|
|
-routers (his \emph{Introduction Points}) as his public location. (He may do
|
|
|
-this on any robust efficient distributed key-value lookup system with
|
|
|
-authenticated updates, such as CFS \cite{cfs:sosp01}.)
|
|
|
-Alice, the client, chooses a node for her \emph{Meeting
|
|
|
-Point}. She connects to one of Bob's introduction points, informs him
|
|
|
-about her rendezvous point, and then waits for him to connect to the
|
|
|
-rendezvous
|
|
|
-point. This extra level of indirection means Bob's introduction points
|
|
|
-don't open themselves up to abuse by serving files directly, eg if Bob
|
|
|
-chooses a node in France to serve material distateful to the French,
|
|
|
-%
|
|
|
-% We need a more legitimate-sounding reason here.
|
|
|
-%
|
|
|
-or if Bob's service tends to get DDoS'ed by script kiddies.
|
|
|
+We provide location-hiding for Bob by allowing him to advertise
|
|
|
+several onion routers (his \emph{Introduction Points}) as his public
|
|
|
+location. (He may do this on any robust efficient distributed
|
|
|
+key-value lookup system with authenticated updates, such as CFS
|
|
|
+\cite{cfs:sosp01}\footnote{
|
|
|
+Each onion router could run a node in this lookup
|
|
|
+system; also note that as a stopgap measure, we can start by running a
|
|
|
+simple lookup system on the directory servers.})
|
|
|
+Alice, the client, chooses a node for her
|
|
|
+\emph{Meeting Point}. She connects to one of Bob's introduction
|
|
|
+points, informs him about her rendezvous point, and then waits for him
|
|
|
+to connect to the rendezvous point. This extra level of indirection
|
|
|
+helps Bob's introduction points avoid problems associated with serving
|
|
|
+unpopular files directly, as could occur, for example, if Bob chooses
|
|
|
+an introduction point in Texas to serve anti-ranching propaganda,
|
|
|
+or if Bob's service tends to get DDoS'ed by network vandals.
|
|
|
The extra level of indirection also allows Bob to respond to some requests
|
|
|
and ignore others.
|
|
|
|
|
|
-We provide the necessary glue so that Alice can view webpages from Bob's
|
|
|
-location-hidden webserver with minimal invasive changes. Both Alice and
|
|
|
-Bob must run local onion proxies.
|
|
|
-
|
|
|
-The steps of a rendezvous:
|
|
|
+The steps of a rendezvous are as follows. These steps are performed on
|
|
|
+behalf of Alice and Bob by their local onion proxies, which they both
|
|
|
+must run; application integration is described more fully below.
|
|
|
\begin{tightlist}
|
|
|
-\item Bob chooses some Introduction Points, and advertises them on a
|
|
|
- Distributed Hash Table (DHT).
|
|
|
-\item Bob establishes onion routing connections to each of his
|
|
|
+\item Bob chooses some introduction points, and advertises them via
|
|
|
+ CFS (or some other distributed key-value publication system).
|
|
|
+\item Bob establishes a Tor virtual circuit to each of his
|
|
|
Introduction Points, and waits.
|
|
|
\item Alice learns about Bob's service out of band (perhaps Bob told her,
|
|
|
or she found it on a website). She looks up the details of Bob's
|
|
|
- service from the DHT.
|
|
|
-\item Alice chooses and establishes a Rendezvous Point (RP) for this
|
|
|
- transaction.
|
|
|
-\item Alice goes to one of Bob's Introduction Points, and gives it a blob
|
|
|
- (encrypted for Bob) which tells him about herself, the RP
|
|
|
- she chose, and the first half of an ephemeral key handshake. The
|
|
|
- Introduction Point sends the blob to Bob.
|
|
|
-\item Bob chooses whether to ignore the blob, or to onion route to RP.
|
|
|
- Let's assume the latter.
|
|
|
-\item RP plugs together Alice and Bob. Note that RP can't recognize Alice,
|
|
|
+ service from CFS.
|
|
|
+\item Alice chooses an OR to serve as a Rendezvous Point (RP) for this
|
|
|
+ transaction. She establishes a virtual circuit to her RP, and
|
|
|
+ tells it to wait for connections. [XXX how?]
|
|
|
+\item Alice opens an anonymous stream to one of Bob's Introduction
|
|
|
+ Points, and gives it a message (encrypted for Bob) which tells him
|
|
|
+ about herself, her chosen RP, and the first half of an ephemeral
|
|
|
+ key handshake. The Introduction Point sends the message to Bob.
|
|
|
+\item Bob may decide to ignore Alice's request. [XXX Based on what?]
|
|
|
+ Otherwise, he creates a new virtual circuit to Alice's RP, and
|
|
|
+ authenticates himself. [XXX how?]
|
|
|
+\item If the authentication is successful, the RP connects Alice's
|
|
|
+ virtual circuit to Bob's. Note that RP can't recognize Alice,
|
|
|
Bob, or the data they transmit (they share a session key).
|
|
|
-\item Alice sends a Begin cell along the circuit. It arrives at Bob's
|
|
|
+\item Alice now sends a Begin cell along the circuit. It arrives at Bob's
|
|
|
onion proxy. Bob's onion proxy connects to Bob's webserver.
|
|
|
-\item Data goes back and forth as usual.
|
|
|
+\item An anonymous stream has been established, and Alice and Bob
|
|
|
+ communicate as normal.
|
|
|
\end{tightlist}
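To make step 5 concrete, the message Alice relays through the introduction point might be assembled as below. The field names, the `encrypt` stand-in, and the layout are all invented for illustration; the actual contents of this message are described more precisely later in this section.

```python
import hashlib, json, os

# Toy construction of the message Alice gives Bob's introduction point.
# Real Tor would use genuine public-key encryption and a DH handshake.
def encrypt(pk, plaintext):
    return f"enc[{pk}]({plaintext})"   # stand-in for public-key encryption

def make_intro_message(bob_pk, auth_token, rp_name):
    return {
        # the hash of Bob's public key identifies the service
        "service_id": hashlib.sha1(bob_pk.encode()).hexdigest(),
        # an optional token lets the introduction point pre-screen requests
        "auth_token": auth_token,
        # only Bob learns where to meet Alice, plus her half of the handshake
        "sealed": encrypt(bob_pk, json.dumps(
            {"rendezvous_point": rp_name,
             "dh_half": os.urandom(16).hex()})),
    }

msg = make_intro_message("bob-public-key", "cookie123", "rp-node-7")
assert set(msg) == {"service_id", "auth_token", "sealed"}
```

Note that the rendezvous point's identity travels only inside the sealed portion, so the introduction point never learns where Alice and Bob will meet.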
|
|
|
|
|
|
+[XXX We need to modify the above to refer people down to these next
|
|
|
+ paragraphs. -NM]
|
|
|
+
|
|
|
When establishing an introduction point, Bob provides the onion router
|
|
|
with a public ``introduction'' key. The hash of this public key
|
|
|
identifies a unique service, and (since Bob is required to sign his
|
|
@@ -1445,7 +1476,7 @@ messages) prevents anybody else from usurping Bob's introduction point
|
|
|
in the future. Bob uses the same public key when establishing the other
|
|
|
introduction points for that service.
|
|
|
|
|
|
-The blob that Alice gives the introduction point includes a hash of Bob's
|
|
|
+The message that Alice gives the introduction point includes a hash of Bob's
|
|
|
public key to identify the service, an optional initial authentication
|
|
|
token (the introduction point can do prescreening, eg to block replays),
|
|
|
and (encrypted to Bob's public key) the location of the rendezvous point,
|
|
@@ -1458,55 +1489,55 @@ other half of the DH key exchange.
|
|
|
The authentication tokens can be used to provide selective access to users
|
|
|
proportional to how important it is that they maintain uninterrupted access
|
|
|
to the service. During normal situations, Bob's service might simply be
|
|
|
-offered directly from mirrors; Bob also gives out authentication cookies
|
|
|
-to special users. When those mirrors are knocked down by DDoS attacks,
|
|
|
-those special users can switch to accessing Bob's service via the Tor
|
|
|
+offered directly from mirrors; Bob can also give out authentication cookies
|
|
|
+to high-priority users. If those mirrors are knocked down by DDoS attacks,
|
|
|
+those users can switch to accessing Bob's service via the Tor
|
|
|
rendezvous system.
|
|
|
|
|
|
\SubSection{Integration with user applications}
|
|
|
|
|
|
For each service Bob offers, he configures his local onion proxy to know
|
|
|
-the local IP and port of the server, a strategy for authorizating Alices,
|
|
|
-and a public key. (Each onion router could run a node in this lookup
|
|
|
-system; also note that as a stopgap measure, we can just run a simple
|
|
|
-lookup system on the directory servers.) Bob publishes into the DHT
|
|
|
-(indexed by the hash of the public key) the public key, an expiration
|
|
|
-time (``not valid after''), and the current introduction points for that
|
|
|
-service. Note that Bob's webserver is unmodified, and doesn't even know
|
|
|
+the local IP and port of the server, a strategy for authorizing Alices,
|
|
|
+and a public key. Bob publishes
|
|
|
+the public key, an expiration
|
|
|
+time (``not valid after''), and the current introduction points for
|
|
|
+his
|
|
|
+service into CFS, all indexed by the hash of the public key.
|
|
|
+Note that Bob's webserver is unmodified, and doesn't even know
|
|
|
that it's hidden behind the Tor network.
|
|
|
|
|
|
-As far as Alice's experience goes, we require that her client interface
|
|
|
-remain a SOCKS proxy, and we require that she shouldn't have to modify
|
|
|
-her applications. Thus we encode all of the necessary information into
|
|
|
-the hostname (more correctly, fully qualified domain name) that Alice
|
|
|
-uses, eg when clicking on a url in her browser. Location-hidden services
|
|
|
-use the special top level domain called `.onion': thus hostnames take the
|
|
|
-form x.y.onion where x encodes the hash of PK, and y is the authentication
|
|
|
-cookie. Alice's onion proxy examines hostnames and recognizes when they're
|
|
|
-destined for a hidden server. If so, it decodes the PK and starts the
|
|
|
-rendezvous as described in the table above.
|
|
|
+Because Alice's applications must work unchanged, her client interface
|
|
|
+remains a SOCKS proxy. Thus we must encode all of the necessary
|
|
|
+information into the fully qualified domain name Alice uses when
|
|
|
+establishing her connections. Location-hidden services use a virtual
|
|
|
+top level domain called `.onion': thus hostnames take the form
|
|
|
+x.y.onion where x encodes the hash of PK, and y is the authentication
|
|
|
+cookie. Alice's onion proxy examines hostnames and recognizes when
|
|
|
+they're destined for a hidden server. If so, it decodes the PK and
|
|
|
+starts the rendezvous as described in the table above.
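The onion proxy's hostname handling can be sketched as follows. The label layout matches the x.y.onion form above; the decoding details are assumptions for illustration.

```python
# Sketch of the onion proxy's hostname check: in x.y.onion, x encodes the
# hash of the service's public key and y is the authentication cookie.
# (Exact encodings assumed; this only shows the dispatch logic.)
def parse_onion_hostname(hostname):
    """Return (pk_hash, auth_cookie) for a hidden-service address,
    or None for an ordinary hostname (handled as a normal exit stream)."""
    labels = hostname.lower().split(".")
    if len(labels) == 3 and labels[2] == "onion":
        return labels[0], labels[1]
    return None

assert parse_onion_hostname("ab12cd.secret.onion") == ("ab12cd", "secret")
assert parse_onion_hostname("www.example.com") is None
```

When the parse succeeds, the proxy begins the rendezvous protocol with the recovered key hash and cookie instead of opening an ordinary exit stream.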
|
|
|
|
|
|
\subsection{Previous rendezvous work}
|
|
|
|
|
|
Ian Goldberg developed a similar notion of rendezvous points for
|
|
|
-low-latency anonymity systems \cite{ian-thesis}. His ``service tag''
|
|
|
-is the same concept as our ``hash of service's public key''. We make it
|
|
|
-a hash of the public key so it can be self-authenticating, and so the
|
|
|
-client can recognize the same service with confidence later on. His
|
|
|
-design differs from ours in the following ways though. Firstly, Ian
|
|
|
-suggests that the client should manually hunt down a current location of
|
|
|
-the service via Gnutella; whereas our use of the DHT makes lookup faster,
|
|
|
-more robust, and transparent to the user. Secondly, in Tor the client
|
|
|
-and server can share ephemeral DH keys, so at no point in the path is
|
|
|
-the plaintext
|
|
|
-exposed. Thirdly, our design is much more practical for deployment in a
|
|
|
-volunteer network, in terms of getting volunteers to offer introduction
|
|
|
-and rendezvous point services. The introduction points do not output any
|
|
|
-bytes to the clients, and the rendezvous points don't know the client,
|
|
|
-the server, or the stuff being transmitted. The indirection scheme
|
|
|
-is also designed with authentication/authorization in mind -- if the
|
|
|
-client doesn't include the right cookie with its request for service,
|
|
|
-the server doesn't even acknowledge its existence.
|
|
|
+low-latency anonymity systems \cite{ian-thesis}. His ``service tags''
|
|
|
+play the same role in his design as the hashes of services' public
|
|
|
+keys play in ours. We use public key hashes so that they can be
|
|
|
+self-authenticating, and so the client can recognize the same service
|
|
|
+with confidence later on. His design also differs from ours in the
|
|
|
+following ways: First, Goldberg suggests that the client should
|
|
|
+manually hunt down a current location of the service via Gnutella;
|
|
|
+whereas our use of the DHT makes lookup faster, more robust, and
|
|
|
+transparent to the user. Second, in Tor the client and server
|
|
|
+negotiate ephemeral keys via Diffie-Hellman, so at no point in the
|
|
|
+path is the plaintext exposed. Third, our design tries to minimize the
|
|
|
+exposure associated with running the service, so as to make volunteers
|
|
|
+more willing to offer introduction and rendezvous point services.
|
|
|
+Tor's introduction points do not output any bytes to the clients, and
|
|
|
+the rendezvous points don't know the client, the server, or the data
|
|
|
+being transmitted. The indirection scheme is also designed to include
|
|
|
+authentication/authorization---if the client doesn't include the right
|
|
|
+cookie with its request for service, the server need not even
|
|
|
+acknowledge its existence.
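The acknowledge-nothing behavior can be sketched as below; `EXPECTED_COOKIE` and the silent-drop policy are illustrative assumptions, not the protocol's wire format. A constant-time comparison is used so the check itself does not leak information:

```python
import hmac
import os

# Hypothetical cookie the hidden server distributed to authorized clients.
EXPECTED_COOKIE = os.urandom(16)

def handle_introduction(request_cookie):
    """Respond only to authorized requests; otherwise stay silent,
    so the service never acknowledges its own existence."""
    # hmac.compare_digest avoids leaking where a mismatch occurs.
    if request_cookie is None or not hmac.compare_digest(request_cookie, EXPECTED_COOKIE):
        return None  # drop silently: no error message, no acknowledgment
    return "proceed-to-rendezvous"
```

An unauthorized probe thus gets exactly the same (non-)response as a request to a nonexistent service.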
|
|
|
|
|
|
\Section{Analysis}
|
|
|
\label{sec:analysis}
|
|
@@ -1544,6 +1575,9 @@ Pull attacks and defenses into analysis as a subsection
|
|
|
|
|
|
\Section{Open Questions in Low-latency Anonymity}
|
|
|
\label{sec:maintaining-anonymity}
|
|
|
+
|
|
|
+
|
|
|
+
|
|
|
|
|
|
% There must be a better intro than this! -NM
|
|
|
In addition to the open problems discussed in
|
|
@@ -1644,19 +1678,24 @@ for low-latency anonymity systems to support far more servers than Tor
|
|
|
currently anticipates. This introduces several issues. First, if
|
|
|
approval by a centralized set of directory servers is no longer
|
|
|
feasible, what mechanism should be used to prevent adversaries from
|
|
|
-signing up many spurious servers? (Tarzan and Morphmix present
|
|
|
-possible solutions.) Second, if clients can no longer have a complete
|
|
|
-picture of the network at all times how do we prevent attackers from
|
|
|
-manipulating client knowledge? Third, if there are to many servers
|
|
|
+signing up many spurious servers?
|
|
|
+Second, if clients can no longer have a complete
|
|
|
+picture of the network at all times, how should they perform
|
|
|
+discovery while preventing attackers from manipulating or exploiting
|
|
|
+gaps in client knowledge? Third, if there are too many servers
|
|
|
for every server to constantly communicate with every other, what kind
|
|
|
-of non-clique topology should the network use? [XXX cite george's
|
|
|
- restricted-routes paper] (Whatever topology we choose, we need some
|
|
|
-way to keep attackers from manipulating their position within it.)
|
|
|
+of non-clique topology should the network use? Restricted-route
|
|
|
+topologies promise comparable anonymity with better scalability
|
|
|
+\cite{danezis-pets03}, but whatever topology we choose, we need some
|
|
|
+way to keep attackers from manipulating their position within it.
|
|
|
Fourth, since no centralized authority is tracking server reliability,
|
|
|
how do we prevent unreliable servers from rendering the network
|
|
|
unusable? Fifth, do clients receive so much anonymity benefit from
|
|
|
running their own servers that we should expect them all to do so, or
|
|
|
do we need to find another incentive structure to motivate them?
|
|
|
+(Tarzan and Morphmix present possible solutions.)
|
|
|
+
|
|
|
+[[ XXX how to approve new nodes (advogato, sybil, captcha (RTT));]
|
|
|
|
|
|
Alternatively, it may be the case that one of these problems proves
|
|
|
intractable, or that the drawbacks to many-server systems prove
|
|
@@ -1791,7 +1830,7 @@ keys)
|
|
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
|
|
|
|
-\Section{Future Directions and Open Problems}
|
|
|
+\Section{Future Directions}
|
|
|
\label{sec:conclusion}
|
|
|
|
|
|
% Mention that we need to do TCP over tor for reliability.
|
|
@@ -1801,39 +1840,53 @@ a unified deployable system. But there are still several attacks that
|
|
|
work quite well, as well as a number of sustainability and run-time
|
|
|
issues remaining to be ironed out. In particular:
|
|
|
|
|
|
+% Many of these (Scalability, cover traffic) are duplicates from open problems.
|
|
|
+%
|
|
|
\begin{itemize}
|
|
|
-\item \emph{Scalability:} Since Tor's emphasis currently is on simplicity
|
|
|
-of design and deployment, the current design won't easily handle more
|
|
|
-than a few hundred servers, because of its clique topology. Restricted
|
|
|
-route topologies \cite{danezis-pets03} promise comparable anonymity
|
|
|
-with much better scaling properties, but we must solve problems like
|
|
|
-how to randomly form the network without introducing net attacks.
|
|
|
-% [cascades are a restricted route topology too. we must mention
|
|
|
-% earlier why we're not satisfied with the cascade approach.]-RD
|
|
|
-% [We do. At least
|
|
|
+\item \emph{Scalability:} Tor's emphasis on design simplicity and
|
|
|
+ deployability has led us to adopt a clique topology, a
|
|
|
+ semi-centralized model for directories and trusts, and a
|
|
|
+ full-network-visibility model for client knowledge. None of these
|
|
|
+ properties will scale to more than a few hundred servers, at most.
|
|
|
+ Promising approaches to better scalability exist (see
|
|
|
+ section~\ref{sec:maintaining-anonymity}), but more deployment
|
|
|
+ experience would be helpful in learning the relative importance of
|
|
|
+ these bottlenecks.
|
|
|
\item \emph{Cover traffic:} Currently we avoid cover traffic because
|
|
|
-it introduces clear performance and bandwidth costs, but and its
|
|
|
-security properties are not well understood. With more research
|
|
|
-\cite{SS03,defensive-dropping}, the price/value ratio may change, both for
|
|
|
-link-level cover traffic and also long-range cover traffic. In particular,
|
|
|
-we expect restricted route topologies to reduce the cost of cover traffic
|
|
|
-because there are fewer links to cover.
|
|
|
+ of its clear costs in performance and bandwidth, and because its
|
|
|
+ security benefits are not yet well understood. With more research
|
|
|
+ \cite{SS03,defensive-dropping}, the price/value ratio may change,
|
|
|
+ both for link-level cover traffic and also long-range cover traffic.
|
|
|
\item \emph{Better directory distribution:} Even with the threshold
|
|
|
-directory agreement algorithm described in \ref{subsec:dirservers},
|
|
|
-the directory servers are still trust bottlenecks. We must find more
|
|
|
-decentralized yet practical ways to distribute up-to-date snapshots of
|
|
|
-network status without introducing new attacks.
|
|
|
+ directory agreement algorithm described in Section~\ref{subsec:dirservers},
|
|
|
+ the directory servers are still trust bottlenecks. We must find more
|
|
|
+ decentralized yet practical ways to distribute up-to-date snapshots of
|
|
|
+ network status without introducing new attacks. Also, directory
|
|
|
+ retrieval presents a scaling problem, since clients currently
|
|
|
+ download a description of the entire network state every 15
|
|
|
+ minutes. As the state grows larger and clients more numerous, we
|
|
|
+ may need to move to a solution in which clients only receive
|
|
|
+ incremental updates to directory state, or where directories are
|
|
|
+ cached at the ORs to avoid high loads on the directory servers.
|
|
|
\item \emph{Implementing location-hidden servers:} While
|
|
|
-Section~\ref{sec:rendezvous} provides a design for rendezvous points and
|
|
|
-location-hidden servers, this feature has not yet been implemented.
|
|
|
-We will likely encounter additional issues, both in terms of usability
|
|
|
-and anonymity, that must be resolved.
|
|
|
+ Section~\ref{sec:rendezvous} describes a design for rendezvous
|
|
|
+ points and location-hidden servers, this feature has not yet been
|
|
|
+ implemented. As we do so, we will likely encounter additional
|
|
|
+ issues, both in terms of usability and anonymity, that must be
|
|
|
+ resolved.
|
|
|
+\item \emph{Further specification review:} Although we have a public,
|
|
|
+ byte-level specification for the Tor protocols, this protocol has
|
|
|
+ not received extensive external review. We hope that as Tor
|
|
|
+ becomes more widely deployed, more people will become interested in
|
|
|
+ examining our specification.
|
|
|
\item \emph{Wider-scale deployment:} The original goal of Tor was to
|
|
|
-gain experience in deploying an anonymizing overlay network, and learn
|
|
|
-from having actual users. We are now at the point where we can start
|
|
|
-deploying a wider network. We will see what happens!
|
|
|
-% ok, so that's hokey. fix it. -RD
|
|
|
-\item \emph{Further specification review:} Foo.
|
|
|
+ gain experience in deploying an anonymizing overlay network, and
|
|
|
+ learn from having actual users. We are now at the point in design
|
|
|
+ and development where we can start deploying a wider network. Once
|
|
|
+ we have are ready for actual users, we will doubtlessly be better
|
|
|
+ able to evaluate some of our design decisions, including our
|
|
|
+ robustness/latency tradeoffs, our abuse-prevention mechanisms, and
|
|
|
+ our overall usability.
|
|
|
\end{itemize}
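One way to make the incremental directory updates mentioned above concrete is a simple snapshot diff; the dict-of-descriptors format here is a hypothetical stand-in for the real directory encoding:

```python
def directory_delta(old, new):
    """Compute an incremental update between two directory snapshots,
    each a dict mapping server nickname -> descriptor. The dict is a
    stand-in for the actual directory format."""
    added = {n: d for n, d in new.items() if n not in old}
    changed = {n: d for n, d in new.items() if n in old and old[n] != d}
    removed = [n for n in old if n not in new]
    return {"added": added, "changed": changed, "removed": removed}

def apply_delta(snapshot, delta):
    """Clients apply the delta rather than re-downloading the whole
    network state every 15 minutes."""
    result = dict(snapshot)
    result.update(delta["added"])
    result.update(delta["changed"])
    for name in delta["removed"]:
        result.pop(name, None)
    return result
```

The bandwidth saving comes from deltas typically being far smaller than full snapshots; caching such deltas at the ORs would further offload the directory servers.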
|
|
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
@@ -1865,6 +1918,8 @@ deploying a wider network. We will see what happens!
|
|
|
% 'Onion Routing design', 'onion router' [note capitalization]
|
|
|
% 'SOCKS'
|
|
|
% Try not to use \cite as a noun.
|
|
|
+% 'Authorizating' sounds great, but it isn't a word.
|
|
|
+% 'First, second, third', not 'Firstly, secondly, thridly'.
|
|
|
%
|
|
|
% 'Substitute ``Damn'' every time you're inclined to write ``very;'' your
|
|
|
% editor will delete it and the writing will be just as it should be.'
|