22 years ago · c826c5a95c
--- a/doc/tor-design.tex
+++ b/doc/tor-design.tex
@@ -476,6 +476,7 @@ Tor's evolution.
 
				 \end{description}
			
 
				 
			
 
				 \SubSection{Non-goals}
			
 
				+\label{subsec:non-goals}
			
 
				 In favoring conservative, deployable designs, we have explicitly deferred
			
 
				 a number of goals. Many of these goals are desirable in anonymity systems,
			
 
				 but we choose to defer them either because they are solved elsewhere,
			
@@ -1539,124 +1540,161 @@ Mention jurisdictional arbitrage.
 
				 
			
 
				 Pull attacks and defenses into analysis as a subsection
			
 
				 
			
 
				-\Section{Maintaining anonymity in Tor}
			
 
				+\Section{Open Questions in Low-latency Anonymity}
			
 
				 \label{sec:maintaining-anonymity}
			
 
				 
			
 
				-\footnote{The first Onion Routing design \cite{or-ih96} protected against
			
 
				-this threat to some
			
 
				-extent by requiring users to hide network access behind an onion
			
 
				-router/firewall that was also forwarding traffic from other nodes.
			
 
				-However, it is desirable for users to
			
 
				-benefit from Onion Routing even when they can't run their own
			
 
				-onion routers.
			
 
				-%Such users, especially if they engage in certain unusual
			
 
				-%communication behaviors, may be identifiable \cite{wright03}.
			
 
				-%To
			
 
				-%complicate the possibility of such attacks Tor multiplexes many
			
 
				-%stream down each circuit, but still rotates the circuit
			
 
				-%periodically to avoid too much linkability from requests on a single
			
 
				-%circuit.
			
 
				-}
			
 
				-
			
 
				-I probably should have noted that this means loops will be on at least
			
 
				-five hop routes, which should be rare given the distribution.  I'm    
			
 
				-realizing that this is reproducing some of the thought that led to a  
			
 
				-default of five hops in the original onion routing design.  There were
			
 
				-some different assumptions, which I won't spell out now.  Note that   
			
 
				-enclave level protections really change these assumptions.  If most   
			
 
				-circuits are just two hops, then just a single link observer will be  
			
 
				-able to tell that two enclaves are communicating with high probability.
			
 
				-So, it would seem that enclaves should have a four node minimum circuit
			
 
				-to prevent trivial circuit insider identification of the whole circuit,
			
 
				-and three hop minimum for circuits from an enclave to some nonclave    
			
 
				-responder. But then... we would have to make everyone obey these rules 
			
 
				-or a node that through timing inferred it was on a four hop circuit    
			
 
				-would know that it was probably carrying enclave to enclave traffic.   
			
 
				-Which... if there were even a moderate number of bad nodes in the      
			
 
				-network would make it advantageous to break the connection to conduct  
			
 
				-a reformation intersection attack. Ahhh! I gotta stop thinking         
			
 
				-about this and work on the paper some before the family wakes up.  
			
 
				-On Sat, Oct 25, 2003 at 06:57:12AM -0400, Paul Syverson wrote:
			
 
				-> Which... if there were even a moderate number of bad nodes in the
			
 
				-> network would make it advantageous to break the connection to conduct
			
 
				-> a reformation intersection attack. Ahhh! I gotta stop thinking
			
 
				-> about this and work on the paper some before the family wakes up. 
			
 
				-This is the sort of issue that should go in the 'maintaining anonymity
			
 
				-with tor' section towards the end. :)
			
 
				-Email from between roger and me to beginning of section above. Fix and move.
			
 
				-
			
 
				-
			
 
				-[Put as much of this as a part of open issues as is possible.]
			
 
				-
			
 
				-[what's an anonymity set?]
			
 
				-
			
 
				-packet counting attacks work great against initiators. need to do some
			
 
				-level of obfuscation for that. standard link padding for passive link
			
 
				-observers. long-range padding for people who own the first hop. are
			
 
				-we just screwed against people who insert timing signatures into your
			
 
				-traffic?
			
 
				-
			
 
				-Even regardless of link padding from Alice to the cloud, there will be
			
 
				-times when Alice is simply not online. Link padding, at the edges or
			
 
				-inside the cloud, does not help for this.
			
 
				-
			
 
				-how often should we pull down directories? how often send updated
			
 
				-server descs?
			
 
				-
			
 
				-when we start up the client, should we build a circuit immediately,
			
 
				-or should the default be to build a circuit only on demand? should we
			
 
				-fetch a directory immediately?
			
 
				-
			
 
				-would we benefit from greater synchronization, to blend with the other
			
 
				-users? would the reduced speed hurt us more?
			
 
				-
			
 
				-does the "you can't see when i'm starting or ending a stream because
			
 
				-you can't tell what sort of relay cell it is" idea work, or is just
			
 
				-a distraction?
			
 
				-
			
 
				-does running a server actually get you better protection, because traffic
			
 
				-coming from your node could plausibly have come from elsewhere? how
			
 
				-much mixing do you need before this is actually plausible, or is it
			
 
				-immediately beneficial because many adversary can't see your node?
			
 
				-
			
 
				-do different exit policies at different exit nodes trash anonymity sets,
			
 
				-or not mess with them much?
			
 
				-
			
 
				-do we get better protection against a realistic adversary by having as
			
 
				-many nodes as possible, so he probably can't see the whole network,
			
 
				-or by having a small number of nodes that mix traffic well? is a
			
 
				-cascade topology a more realistic way to get defenses against traffic
			
 
				-confirmation? does the hydra (many inputs, few outputs) topology work
			
 
				-better? are we going to get a hydra anyway because most nodes will be
			
 
				+% There must be a better intro than this! -NM
			
 
				+In addition to the open problems discussed in
			
 
				+section~\ref{subsec:non-goals}, many other questions remain to be
			
 
				+solved by future research before we can be truly confident that we
			
 
				+have built a secure low-latency anonymity service.
			
 
				+
			
 
				+Many of these open issues are questions of balance.  For example,
			
 
				+how often should users rotate to fresh circuits?  Too-frequent
			
 
				+rotation is inefficient and expensive, but too-infrequent rotation
			
 
				+makes the user's traffic linkable.   Instead of opening a fresh
			
 
				+circuit, clients can also limit linkability by exiting from a middle point
			
 
				+of the circuit, or by truncating and re-extending the circuit, but
			
 
				+more analysis is needed to determine the proper trade-off.
			
 
				+[XXX mention predecessor attacks?]
			
 
				+
			
 
				+A similar question surrounds timing of directory operations:
			
 
				+how often should directories be updated?  With too-infrequent
			
 
				+updates clients receive an inaccurate picture of the network; with
			
 
				+too-frequent updates the directory servers are overloaded.
			
 
				+
			
 
				+%do different exit policies at different exit nodes trash anonymity sets,
			
 
				+%or not mess with them much?
			
 
				+%
			
 
				+%% Why would they?  By routing traffic to certain nodes preferentially?
			
 
				+
			
 
				+[XXX Choosing paths and path lengths: I'm not writing this bit till
			
 
				+  Arma's pathselection stuff is in. -NM]
			
 
				+
			
 
				+%%%% Roger said that he'd put a path selection paragraph into section
			
 
				+%%%% 4 that would replace this.
			
 
				+%
			
 
				+%I probably should have noted that this means loops will be on at least
			
 
				+%five hop routes, which should be rare given the distribution.  I'm    
			
 
				+%realizing that this is reproducing some of the thought that led to a  
			
 
				+%default of five hops in the original onion routing design.  There were
			
 
				+%some different assumptions, which I won't spell out now.  Note that   
			
 
				+%enclave level protections really change these assumptions.  If most   
			
 
				+%circuits are just two hops, then just a single link observer will be  
			
 
				+%able to tell that two enclaves are communicating with high probability.
			
 
				+%So, it would seem that enclaves should have a four node minimum circuit
			
 
				+%to prevent trivial circuit insider identification of the whole circuit,
			
 
				+%and three hop minimum for circuits from an enclave to some nonclave    
			
 
				+%responder. But then... we would have to make everyone obey these rules 
			
 
				+%or a node that through timing inferred it was on a four hop circuit    
			
 
				+%would know that it was probably carrying enclave to enclave traffic.   
			
 
				+%Which... if there were even a moderate number of bad nodes in the      
			
 
				+%network would make it advantageous to break the connection to conduct  
			
 
				+%a reformation intersection attack. Ahhh! I gotta stop thinking         
			
 
				+%about this and work on the paper some before the family wakes up.  
			
 
				+%On Sat, Oct 25, 2003 at 06:57:12AM -0400, Paul Syverson wrote:
			
 
				+%> Which... if there were even a moderate number of bad nodes in the
			
 
				+%> network would make it advantageous to break the connection to conduct
			
 
				+%> a reformation intersection attack. Ahhh! I gotta stop thinking
			
 
				+%> about this and work on the paper some before the family wakes up. 
			
 
				+%This is the sort of issue that should go in the 'maintaining anonymity
			
 
				+%with tor' section towards the end. :)
			
 
				+%Email from between roger and me to beginning of section above. Fix and move.
			
 
				+
			
 
				+Throughout this paper, we have assumed that end-to-end traffic
			
 
				+analysis cannot yet be defeated.  But even high-latency anonymity
			
 
				+systems can be vulnerable to end-to-end traffic analysis, if the
			
 
				+traffic volumes are high enough, and if users' habits are sufficiently
			
 
				+distinct \cite{disclosure,statistical-disclosure}.  \emph{What can be
			
 
				+  done to limit the effectiveness of these attacks against low-latency
			
 
				+  systems?}  Tor already makes some effort to conceal the starts and
			
 
				+ends of streams by wrapping all long-range control commands in
			
 
				+identical-looking relay cells, but more analysis is needed.  Link
			
 
				+padding could frustrate passive observers who count packets; long-range
			
 
				+padding could work against observers who own the first hop in a
			
 
				+circuit.  But more research needs to be done in order to find an
			
 
				+efficient and practical approach.  Volunteers prefer not to run
			
 
				+constant-bandwidth padding, but more sophisticated traffic shaping
			
 
				+approaches remain somewhat unanalyzed. [XXX is this so?] Recent work
			
 
				+on long-range padding \cite{long-range-padding} shows promise.  One
			
 
				+could also try to reduce correlation in packet timing by batching and
			
 
				+re-ordering packets, but it is unclear whether this could improve
			
 
				+anonymity without introducing so much latency as to render the
			
 
				+network unusable.
			
 
				+
			
 
				+Even if passive timing attacks were wholly solved, active timing
			
 
				+attacks would remain.  \emph{What can
			
 
				+  be done to address attackers who can introduce timing patterns into
			
 
				+  a user's traffic?}  [XXX mention likely approaches]
			
 
				+
			
 
				+%%% I think we cover this by framing the problem as ``Can we make 
			
 
				+%%% end-to-end characteristics of low-latency systems as good as
			
 
				+%%% those of high-latency systems?''  Eliminating long-term
			
 
				+%%% intersection is a hard problem.
			
 
				+%
			
 
				+%Even regardless of link padding from Alice to the cloud, there will be
			
 
				+%times when Alice is simply not online. Link padding, at the edges or
			
 
				+%inside the cloud, does not help for this.
			
 
				+
			
 
				+In order to scale to large numbers of users, and to prevent an
			
 
				+attacker from observing the whole network at once, it may be necessary
			
 
				+for low-latency anonymity systems to support far more servers than Tor
			
 
				+currently anticipates.  This introduces several issues.  First, if
			
 
				+approval by a centralized set of directory servers is no longer
			
 
				+feasible, what mechanism should be used to prevent adversaries from
			
 
				+signing up many spurious servers?  (Tarzan and Morphmix present
			
 
				+possible solutions.)  Second, if clients can no longer have a complete
			
 
				+picture of the network at all times, how do we prevent attackers from
			
 
				+manipulating client knowledge?  Third, if there are too many servers
			
 
				+for every server to constantly communicate with every other, what kind
			
 
				+of non-clique topology should the network use?  [XXX cite george's
			
 
				+  restricted-routes paper] (Whatever topology we choose, we need some
			
 
				+way to keep attackers from manipulating their position within it.)
			
 
				+Fourth, since no centralized authority is tracking server reliability,
			
 
				+how do we prevent unreliable servers from rendering the network
			
 
				+unusable?  Fifth, do clients receive so much anonymity benefit from
			
 
				+running their own servers that we should expect them all to do so, or
			
 
				+do we need to find another incentive structure to motivate them?
			
 
				+
			
 
				+Alternatively, it may be the case that one of these problems proves
			
 
				+intractable, or that the drawbacks to many-server systems prove
			
 
				+greater than the benefits.  Nevertheless, we may still do well to
			
 
				+consider non-clique topologies.  A cascade topology may provide more
			
 
				+defense against traffic confirmation.
			
 
				+% Why would it?   Cite.  -NM
			
 
				+Does the hydra (many inputs, few outputs) topology work
			
 
				+better? Are we going to get a hydra anyway because most nodes will be
			
 
				 middleman nodes?
			
 
				 
			
 
				-using a circuit many times is good because it's less cpu work.
			
 
				-  good because of predecessor attacks with path rebuilding.
			
 
				-  bad because predecessor attacks can be more likely to link you with a
			
 
				-    previous circuit since you're so verbose.
			
 
				-  bad because each thing you do on that circuit is linked to the other
			
 
				-    things you do on that circuit.
			
 
				-  how often to rotate?
			
 
				-  how to decide when to exit from middle?
			
 
				-  when to truncate and re-extend versus when to start new circuit?
			
 
				-
			
 
				-Because Tor runs over TCP, when one of the servers goes down it seems
			
 
				-that all the circuits (and thus streams) going over that server must
			
 
				-break. This reduces anonymity because everybody needs to reconnect
			
 
				-right then (does it? how much?) and because exit connections all break
			
 
				-at the same time, and it also reduces usability. It seems the problem
			
 
				-is even worse in a p2p environment, because so far such systems don't
			
 
				-really provide an incentive for nodes to stay connected when they're
			
 
				-done browsing, so we would expect a much higher churn rate than for
			
 
				-onion routing. Are there ways of allowing streams to survive the loss
			
 
				-of a node in the path?
			
 
				-
			
 
				-discuss topologies. Cite George's non-freeroutes paper.  Maybe this
			
 
				-graf goes elsewhere.
			
 
				-
			
 
				-discuss attracting users; incentives; usability.
			
 
				-
			
 
				-Choosing paths and path lengths.
			
 
				+%%% Do more with this paragraph once the TCP-over-TCP paragraph is
			
 
				+%%% more integrated into Related works.
			
 
				+%
			
 
				+As mentioned in section~\ref{where-is-it-now}, Tor could improve its
			
 
				+robustness against node failure by buffering stream data at the
			
 
				+network's edges, and performing end-to-end acknowledgments.  The
			
 
				+efficacy of this approach remains to be tested, however, and there
			
 
				+may be more effective means for ensuring reliable connections in the
			
 
				+presence of unreliable nodes.
			
 
				+
			
 
				+%%% Keeping this original paragraph for a little while, since it 
			
 
				+%%% is not the same as what's written there now.
			
 
				+%
			
 
				+%Because Tor depends on TLS and TCP to provide a reliable transport,
			
 
				+%when one of the servers goes down, all the circuits (and thus streams)
			
 
				+%traveling over that server must break.  This reduces anonymity because
			
 
				+%everybody needs to reconnect right then (does it? how much?)  and
			
 
				+%because exit connections all break at the same time, and it also harms
			
 
				+%usability. It seems the problem is even worse in a peer-to-peer
			
 
				+%environment, because so far such systems don't really provide an
			
 
				+%incentive for nodes to stay connected when they're done browsing, so
			
 
				+%we would expect a much higher churn rate than for onion routing.
			
 
				+%Are there ways of allowing streams to survive the loss of a node in the
			
 
				+%path?
			
 
				+
			
 
				+% Roger or Paul suggested that we say something about incentives,
			
 
				+% too, but I think that's a better candidate for our future work
			
 
				+% section.  After all, we will doubtlessly learn very much about why
			
 
				+% people do or don't run and use Tor in the near future. -NM
			
 
				 
			
 
				 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%