22 years ago · c826c5a95c
--- a/doc/tor-design.tex
+++ b/doc/tor-design.tex
@@ -476,6 +476,7 @@ Tor's evolution.
 
															 \end{description}
														
 
															 \SubSection{Non-goals}
														
 
															+\label{subsec:non-goals}
														
 
															 In favoring conservative, deployable designs, we have explicitly deferred
														
 
															 a number of goals. Many of these goals are desirable in anonymity systems,
														
 
															 but we choose to defer them either because they are solved elsewhere,
														
@@ -1539,124 +1540,161 @@ Mention jurisdictional arbitrage.
 
															 Pull attacks and defenses into analysis as a subsection
														
 
															-\Section{Maintaining anonymity in Tor}
														
 
															+\Section{Open Questions in Low-latency Anonymity}
														
 
															 \label{sec:maintaining-anonymity}
														
 
															-\footnote{The first Onion Routing design \cite{or-ih96} protected against
														
 
															-this threat to some
														
 
															-extent by requiring users to hide network access behind an onion
														
 
															-router/firewall that was also forwarding traffic from other nodes.
														
 
															-However, it is desirable for users to
														
 
															-benefit from Onion Routing even when they can't run their own
														
 
															-onion routers.
														
 
															-%Such users, especially if they engage in certain unusual
														
 
															-%communication behaviors, may be identifiable \cite{wright03}.
														
 
															-%To
														
 
															-%complicate the possibility of such attacks Tor multiplexes many
														
 
															-%stream down each circuit, but still rotates the circuit
														
 
															-%periodically to avoid too much linkability from requests on a single
														
 
															-%circuit.
														
 
															-}
														
 
															-
														
 
															-I probably should have noted that this means loops will be on at least
														
 
															-five hop routes, which should be rare given the distribution.  I'm    
														
 
															-realizing that this is reproducing some of the thought that led to a  
														
 
															-default of five hops in the original onion routing design.  There were
														
 
															-some different assumptions, which I won't spell out now.  Note that   
														
 
															-enclave level protections really change these assumptions.  If most   
														
 
															-circuits are just two hops, then just a single link observer will be  
														
 
															-able to tell that two enclaves are communicating with high probability.
														
 
															-So, it would seem that enclaves should have a four node minimum circuit
														
 
															-to prevent trivial circuit insider identification of the whole circuit,
														
 
															-and three hop minimum for circuits from an enclave to some nonclave    
														
 
															-responder. But then... we would have to make everyone obey these rules 
														
 
															-or a node that through timing inferred it was on a four hop circuit    
														
 
															-would know that it was probably carrying enclave to enclave traffic.   
														
 
															-Which... if there were even a moderate number of bad nodes in the      
														
 
															-network would make it advantageous to break the connection to conduct  
														
 
															-a reformation intersection attack. Ahhh! I gotta stop thinking         
														
 
															-about this and work on the paper some before the family wakes up.  
														
 
															-On Sat, Oct 25, 2003 at 06:57:12AM -0400, Paul Syverson wrote:
														
 
															-> Which... if there were even a moderate number of bad nodes in the
														
 
															-> network would make it advantageous to break the connection to conduct
														
 
															-> a reformation intersection attack. Ahhh! I gotta stop thinking
														
 
															-> about this and work on the paper some before the family wakes up. 
														
 
															-This is the sort of issue that should go in the 'maintaining anonymity
														
 
															-with tor' section towards the end. :)
														
 
															-Email from between roger and me to beginning of section above. Fix and move.
														
 
															-
														
 
															-
														
 
															-[Put as much of this as a part of open issues as is possible.]
														
 
															-
														
 
															-[what's an anonymity set?]
														
 
															-
														
 
															-packet counting attacks work great against initiators. need to do some
														
 
															-level of obfuscation for that. standard link padding for passive link
														
 
															-observers. long-range padding for people who own the first hop. are
														
 
															-we just screwed against people who insert timing signatures into your
														
 
															-traffic?
														
 
															-
														
 
															-Even regardless of link padding from Alice to the cloud, there will be
														
 
															-times when Alice is simply not online. Link padding, at the edges or
														
 
															-inside the cloud, does not help for this.
														
 
															-
														
 
															-how often should we pull down directories? how often send updated
														
 
															-server descs?
														
 
															-
														
 
															-when we start up the client, should we build a circuit immediately,
														
 
															-or should the default be to build a circuit only on demand? should we
														
 
															-fetch a directory immediately?
														
 
															-
														
 
															-would we benefit from greater synchronization, to blend with the other
														
 
															-users? would the reduced speed hurt us more?
														
 
															-
														
 
															-does the "you can't see when i'm starting or ending a stream because
														
 
															-you can't tell what sort of relay cell it is" idea work, or is just
														
 
															-a distraction?
														
 
															-
														
 
															-does running a server actually get you better protection, because traffic
														
 
															-coming from your node could plausibly have come from elsewhere? how
														
 
															-much mixing do you need before this is actually plausible, or is it
														
 
															-immediately beneficial because many adversary can't see your node?
														
 
															-
														
 
															-do different exit policies at different exit nodes trash anonymity sets,
														
 
															-or not mess with them much?
														
 
															-
														
 
															-do we get better protection against a realistic adversary by having as
														
 
															-many nodes as possible, so he probably can't see the whole network,
														
 
															-or by having a small number of nodes that mix traffic well? is a
														
 
															-cascade topology a more realistic way to get defenses against traffic
														
 
															-confirmation? does the hydra (many inputs, few outputs) topology work
														
 
															-better? are we going to get a hydra anyway because most nodes will be
														
 
															+% There must be a better intro than this! -NM
														
 
															+In addition to the open problems discussed in
														
 
															+section~\ref{subsec:non-goals}, many other questions remain to be
														
 
															+solved by future research before we can be truly confident that we
														
 
															+have built a secure low-latency anonymity service.
														
 
															+
														
 
															+Many of these open issues are questions of balance.  For example,
														
 
															+how often should users rotate to fresh circuits?  Too-frequent
														
 
															+rotation is inefficient and expensive, but too-infrequent rotation
														
 
															+makes the user's traffic linkable.   Instead of opening a fresh
														
 
															+circuit, clients can also limit linkability by exiting from a middle point
														
 
															+of the circuit, or by truncating and re-extending the circuit, but
														
 
															+more analysis is needed to determine the proper trade-off.
														
 
															+[XXX mention predecessor attacks?]
														
 
															+
														
 
															+A similar question surrounds timing of directory operations:
														
 
															+how often should directories be updated?  With too-infrequent
														
 
															+updates clients receive an inaccurate picture of the network; with
														
 
															+too-frequent updates the directory servers are overloaded.
														
 
															+
														
 
															+%do different exit policies at different exit nodes trash anonymity sets,
														
 
															+%or not mess with them much?
														
 
															+%
														
 
															+%% Why would they?  By routing traffic to certain nodes preferentially?
														
 
															+
														
 
															+[XXX Choosing paths and path lengths: I'm not writing this bit till
														
 
															+  Arma's pathselection stuff is in. -NM]
														
 
															+
														
 
															+%%%% Roger said that he'd put a path selection paragraph into section
														
 
															+%%%% 4 that would replace this.
														
 
															+%
														
 
															+%I probably should have noted that this means loops will be on at least
														
 
															+%five hop routes, which should be rare given the distribution.  I'm    
														
 
															+%realizing that this is reproducing some of the thought that led to a  
														
 
															+%default of five hops in the original onion routing design.  There were
														
 
															+%some different assumptions, which I won't spell out now.  Note that   
														
 
															+%enclave level protections really change these assumptions.  If most   
														
 
															+%circuits are just two hops, then just a single link observer will be  
														
 
															+%able to tell that two enclaves are communicating with high probability.
														
 
															+%So, it would seem that enclaves should have a four node minimum circuit
														
 
															+%to prevent trivial circuit insider identification of the whole circuit,
														
 
															+%and three hop minimum for circuits from an enclave to some nonclave    
														
 
															+%responder. But then... we would have to make everyone obey these rules 
														
 
															+%or a node that through timing inferred it was on a four hop circuit    
														
 
															+%would know that it was probably carrying enclave to enclave traffic.   
														
 
															+%Which... if there were even a moderate number of bad nodes in the      
														
 
															+%network would make it advantageous to break the connection to conduct  
														
 
															+%a reformation intersection attack. Ahhh! I gotta stop thinking         
														
 
															+%about this and work on the paper some before the family wakes up.  
														
 
															+%On Sat, Oct 25, 2003 at 06:57:12AM -0400, Paul Syverson wrote:
														
 
															+%> Which... if there were even a moderate number of bad nodes in the
														
 
															+%> network would make it advantageous to break the connection to conduct
														
 
															+%> a reformation intersection attack. Ahhh! I gotta stop thinking
														
 
															+%> about this and work on the paper some before the family wakes up. 
														
 
															+%This is the sort of issue that should go in the 'maintaining anonymity
														
 
															+%with tor' section towards the end. :)
														
 
															+%Email from between roger and me to beginning of section above. Fix and move.
														
 
															+
														
 
															+Throughout this paper, we have assumed that end-to-end traffic
														
 
															+analysis cannot yet be defeated.  But even high-latency anonymity
														
 
															+systems can be vulnerable to end-to-end traffic analysis, if the
														
 
															+traffic volumes are high enough, and if users' habits are sufficiently
														
 
															+distinct \cite{disclosure,statistical-disclosure}.  \emph{What can be
														
 
															+  done to limit the effectiveness of these attacks against low-latency
														
 
															+  systems?}  Tor already makes some effort to conceal the starts and
														
 
															+ends of streams by wrapping all long-range control commands in
														
 
															+identical-looking relay cells, but more analysis is needed.  Link
														
 
															+padding could frustrate passive observers who count packets; long-range
														
 
															+padding could work against observers who own the first hop in a
														
 
															+circuit.  But more research needs to be done in order to find an
														
 
															+efficient and practical approach.  Volunteers prefer not to run
														
 
															+constant-bandwidth padding, but more sophisticated traffic shaping
														
 
															+approaches remain somewhat unanalyzed. [XXX is this so?] Recent work
														
 
															+on long-range padding \cite{long-range-padding} shows promise.  One
														
 
															+could also try to reduce correlation in packet timing by batching and
														
 
															+re-ordering packets, but it is unclear whether this could improve
														
 
															+anonymity without introducing so much latency as to render the
														
 
															+network unusable.
														
 
															+
														
 
															+Even if passive timing attacks were wholly solved, active timing
														
 
															+attacks would remain.  \emph{What can
														
 
															+  be done to address attackers who can introduce timing patterns into
														
 
															+  a user's traffic?}  [XXX mention likely approaches]
														
 
															+
														
 
															+%%% I think we cover this by framing the problem as ``Can we make 
														
 
															+%%% end-to-end characteristics of low-latency systems as good as
														
 
															+%%% those of high-latency systems?''  Eliminating long-term
														
 
															+%%% intersection is a hard problem.
														
 
															+%
														
 
															+%Even regardless of link padding from Alice to the cloud, there will be
														
 
															+%times when Alice is simply not online. Link padding, at the edges or
														
 
															+%inside the cloud, does not help for this.
														
 
															+
														
 
															+In order to scale to large numbers of users, and to prevent an
														
 
															+attacker from observing the whole network at once, it may be necessary
														
 
															+for low-latency anonymity systems to support far more servers than Tor
														
 
															+currently anticipates.  This introduces several issues.  First, if
														
 
															+approval by a centralized set of directory servers is no longer
														
 
															+feasible, what mechanism should be used to prevent adversaries from
														
 
															+signing up many spurious servers?  (Tarzan and MorphMix present
														
 
															+possible solutions.)  Second, if clients can no longer have a complete
														
 
															+picture of the network at all times, how do we prevent attackers from
														
 
															+manipulating client knowledge?  Third, if there are too many servers
														
 
															+for every server to constantly communicate with every other, what kind
														
 
															+of non-clique topology should the network use?  [XXX cite george's
														
 
															+  restricted-routes paper] (Whatever topology we choose, we need some
														
 
															+way to keep attackers from manipulating their position within it.)
														
 
															+Fourth, since no centralized authority is tracking server reliability,
														
 
															+how do we prevent unreliable servers from rendering the network
														
 
															+unusable?  Fifth, do clients receive so much anonymity benefit from
														
 
															+running their own servers that we should expect them all to do so, or
														
 
															+do we need to find another incentive structure to motivate them?
														
 
															+
														
 
															+Alternatively, it may be the case that one of these problems proves
														
 
															+intractable, or that the drawbacks to many-server systems prove
														
 
															+greater than the benefits.  Nevertheless, we may still do well to
														
 
															+consider non-clique topologies.  A cascade topology may provide more
														
 
															+defense against traffic confirmation.
														
 
															+% Why would it?   Cite.  -NM
														
 
															+Does the hydra (many inputs, few outputs) topology work
														
 
															+better? Are we going to get a hydra anyway because most nodes will be
														
 
															 middleman nodes?
														
 
															-using a circuit many times is good because it's less cpu work.
														
 
															-  good because of predecessor attacks with path rebuilding.
														
 
															-  bad because predecessor attacks can be more likely to link you with a
														
 
															-    previous circuit since you're so verbose.
														
 
															-  bad because each thing you do on that circuit is linked to the other
														
 
															-    things you do on that circuit.
														
 
															-  how often to rotate?
														
 
															-  how to decide when to exit from middle?
														
 
															-  when to truncate and re-extend versus when to start new circuit?
														
 
															-
														
 
															-Because Tor runs over TCP, when one of the servers goes down it seems
														
 
															-that all the circuits (and thus streams) going over that server must
														
 
															-break. This reduces anonymity because everybody needs to reconnect
														
 
															-right then (does it? how much?) and because exit connections all break
														
 
															-at the same time, and it also reduces usability. It seems the problem
														
 
															-is even worse in a p2p environment, because so far such systems don't
														
 
															-really provide an incentive for nodes to stay connected when they're
														
 
															-done browsing, so we would expect a much higher churn rate than for
														
 
															-onion routing. Are there ways of allowing streams to survive the loss
														
 
															-of a node in the path?
														
 
															-
														
 
															-discuss topologies. Cite George's non-freeroutes paper.  Maybe this
														
 
															-graf goes elsewhere.
														
 
															-
														
 
															-discuss attracting users; incentives; usability.
														
 
															-
														
 
															-Choosing paths and path lengths.
														
 
															+%%% Do more with this paragraph once the TCP-over-TCP paragraph is
														
 
															+%%% more integrated into Related works.
														
 
															+%
														
 
															+As mentioned in section~\ref{where-is-it-now}, Tor could improve its
														
 
															+robustness against node failure by buffering stream data at the
														
 
															+network's edges, and performing end-to-end acknowledgments.  The
														
 
															+efficacy of this approach remains to be tested, however, and there
														
 
															+may be more effective means for ensuring reliable connections in the
														
 
															+presence of unreliable nodes.
														
 
															+
														
 
															+%%% Keeping this original paragraph for a little while, since it 
														
 
															+%%% is not the same as what's written there now.
														
 
															+%
														
 
															+%Because Tor depends on TLS and TCP to provide a reliable transport,
														
 
															+%when one of the servers goes down, all the circuits (and thus streams)
														
 
															+%traveling over that server must break.  This reduces anonymity because
														
 
															+%everybody needs to reconnect right then (does it? how much?)  and
														
 
															+%because exit connections all break at the same time, and it also harms
														
 
															+%usability. It seems the problem is even worse in a peer-to-peer
														
 
															+%environment, because so far such systems don't really provide an
														
 
															+%incentive for nodes to stay connected when they're done browsing, so
														
 
															+%we would expect a much higher churn rate than for onion routing.
														
 
															+%Are there ways of allowing streams to survive the loss of a node in the
														
 
															+%path?
														
 
															+
														
 
															+% Roger or Paul suggested that we say something about incentives,
														
 
															+% too, but I think that's a better candidate for our future work
														
 
															+% section.  After all, we will doubtlessly learn very much about why
														
 
															+% people do or don't run and use Tor in the near future. -NM
														
 
															 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%