Filename: 115-two-hop-paths.txt
Title: Two Hop Paths
Version: $Revision$
Last-Modified: $Date$
Author: Mike Perry
Created:
Status: Open
Supersedes: 112


Overview:

  The idea is that users should be able to choose whether they would
  like to use two-hop or three-hop paths through the Tor network.

  This value should be modifiable from the controller, and should be
  available from Vidalia.


Motivation:

  The Tor network is slow and overloaded. Increasingly often I hear
  stories about friends and friends of friends who are behind firewalls,
  annoying censorware, or under surveillance that interferes with their
  productivity and Internet usage, or chills their speech. These people
  know about Tor, but they choose to put up with the censorship because
  Tor is too slow to be usable for them. In fact, it took me three tries
  to download a fresh, complete copy of levine-timing.pdf over Tor for
  the Theoretical Argument section of this proposal.

  Furthermore, the biggest current problem with Tor's anonymity for
  those who really need it is not someone attacking the network to
  discover who they are. It is instead the extreme danger that so few
  people use Tor, because it is so slow, that those who do use it have
  essentially no confusion set.

  The recent case in which a professor and a rogue Tor user were the
  only Tor users on campus, and were therefore suspected in an incident
  involving Tor and that university, underscores this point: "That was
  why the police had come to see me. They told me that only two people
  on our campus were using Tor: me and someone they suspected of
  engaging in an online scam. The detectives wanted to know whether the
  other user was a former student of mine, and why I was using Tor"[1].

  Not only does Tor provide no anonymity if you use it to be anonymous
  but are obviously from a certain institution, location, or
  circumstance, it is also dangerous to use Tor at all, because you risk
  being accused of having something significant enough to hide that you
  are willing to put up with the horrible performance.

  There are many ways to improve the speed problem, and of course we
  should and will implement as many as we can. Johannes's GSoC project
  and my reputation system are longer-term, higher-effort efforts that
  will still provide benefits independent of this proposal.

  However, reducing the path length to 2 for those who do not need the
  (questionable) extra anonymity that 3 hops provide not only improves
  their Tor experience but also reduces their load on the Tor network by
  33%, and can be done in fewer than 10 lines of code (not counting
  various security enhancements). That's not just Win-Win, it's
  Win-Win-Win.
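
  The 33% figure follows from a simple model, assuming each relay on
  the path forwards every byte of the stream once and ignoring protocol
  and directory overhead: if a stream carries B bytes over a k-hop path,
  the relays together forward about kB bytes, so

```latex
\frac{\mathrm{load}(3) - \mathrm{load}(2)}{\mathrm{load}(3)}
  = \frac{3B - 2B}{3B} = \frac{1}{3} \approx 33\%
```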


Theoretical Argument:

  It has long been established that timing attacks against mix and
  onion routing networks are extremely effective, and that regardless
  of path length, if the adversary has compromised the first and last
  hop of your path, you can assume they have compromised your identity
  for that connection.

  In fact, it has been demonstrated that for all but the slowest,
  lossiest networks, error rates for false positives and false negatives
  were very near zero[2]. Only for constant streams of traffic over slow
  and (more importantly) extremely lossy network links did the error
  rate hit 20%. For loss rates typical of the Internet, even the error
  rate for slow nodes with constant traffic streams was 13%.

  When you take into account that most Tor streams are not constant,
  but are probably much more like the paper's "HomeIP" dataset, which
  consists mostly of web traffic that exists over finite intervals at
  specific times, error rates drop to fractions of 1%, even for the
  "worst" network nodes.

  Therefore, the user gains little from the extra hop, assuming the
  adversary performs timing correlation on their nodes. Since timing
  correlation is simply an implementation issue and is most likely a
  single up-front cost (and one that is likely quite a bit cheaper than
  the cost of the machines purchased to host the nodes used to mount an
  attack), the real protection is the low probability of getting both
  the first and last hop of a client's stream.
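
  As a simplified model of that last point, assume the adversary's
  relays carry a fraction f = c/n of the network bandwidth and that the
  client picks its first and last hops independently, weighted by
  bandwidth (real path selection adds constraints, such as never
  choosing the same relay twice). Then

```latex
P(\text{first and last hop compromised}) = f \cdot f = \left(\frac{c}{n}\right)^{2}
```

  The expression contains no term for the middle hop, so it is the same
  for 2-hop and 3-hop paths. With f = 0.1, for example, about 1 circuit
  in 100 is exposed.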


Practical Issues:

  Theoretical issues aside, there are several practical issues with the
  implementation of Tor that need to be addressed to ensure that
  identity information is not leaked by the implementation.

  Exit policy issues:

  If a client chooses an exit with a very restrictive exit policy
  (such as one that accepts only a single IP or IP range), the first hop
  then knows a good deal about the destination. For this reason, clients
  should not select exits whose policy matches their destination IP with
  anything other than "*".

  Partitioning:

  Partitioning attacks form another concern. Since Tor uses telescoping
  to build circuits, the entry node and observers on the local network
  can tell that a user is constructing only two-hop paths. An external
  adversary can potentially differentiate 2-hop and 3-hop users, and
  decide that all IP addresses connecting to Tor and using 3 hops have
  something to hide, and should be scrutinized more closely or outright
  apprehended.

  One solution to this is to use the "leaky-circuit" method of attaching
  streams: the user always creates 3-hop circuits, but if the option
  is enabled, they always exit from their 2nd hop. The ideal solution
  would be to create a RELAY_SHISHKABOB cell that contains onion
  skins for every host along the path, but this requires protocol
  changes at the nodes to support.

  Guard nodes:

  Since guard nodes do rotate due to network failure, node upgrades, and
  other issues, if you amortize the risk a user is exposed to over any
  reasonable duration of Tor usage (on the order of a year), it is the
  same with or without guard nodes. Assume an adversary controls a
  fraction c/n of the network bandwidth, and guards rotate on average
  with period R. Statistically speaking, it is merely a question of
  whether the user wishes their risk to be concentrated (probability c/n
  over an expected period of R*c, and probability 0 over an expected
  period of R*(n-c)) versus a continuous risk of (c/n)^2. So
  statistically speaking, guards only create a time tradeoff of risk
  over the long run for normal Tor usage. They do not reduce risk for
  normal client usage in the long term.[3]
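
  Written out under the paragraph's own assumptions (each rotation picks
  a compromised guard with probability c/n, so over n rotations of
  period R the user expects c compromised periods, and each circuit's
  last hop is compromised with probability c/n), the long-run average
  risk per circuit with guards is

```latex
\bar{p}_{\text{guards}}
  = \frac{Rc \cdot \frac{c}{n} + R(n-c) \cdot 0}{Rc + R(n-c)}
  = \frac{Rc^{2}/n}{Rn}
  = \left(\frac{c}{n}\right)^{2}
  = \bar{p}_{\text{no guards}}
```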

  Guard nodes do offer a measure of accountability of sorts. If a user
  was using a small set of guard nodes and is then suddenly apprehended
  as a result of Tor usage, having a fixed set of entry points to suspect
  is a lot better than suspecting the whole network.

  It has been speculated that a set of guard nodes can be used to
  fingerprint a user (presumably by a local adversary) when they move
  about. However, it is precisely this activity of moving your laptop
  that causes guards to be marked as down by the Tor client, which then
  chooses new ones.

  All of this is not terribly relevant to this proposal, but it is worth
  bearing in mind, since guard nodes have somewhat more ability to wreak
  havoc with two hops than with three.

  Two-hop paths allow malicious guards to get considerably more benefit
  from failing circuits whose exit hop is not one of their colluding
  peers. Since guards can detect the number of hops in a path either by
  timing or by statistical analysis of the exit policy of the 2nd hop,
  they can perform this attack predominantly against 2-hop users.

  This can be addressed by completely abandoning an entry guard after a
  certain ratio of extend or general circuit failures with respect to
  non-failed circuits. The proper value for this ratio can be determined
  experimentally with TorFlow. There is the possibility that the local
  network can abuse this feature to cause certain guards to be dropped,
  but it can do that anyway with the current Tor by just making guards
  it doesn't like unreachable. With this mechanism, Tor will complain
  loudly if any guard's failure rate exceeds the expected rate in any
  failure case, local or remote.

  Eliminating guards entirely would actually not address this issue,
  due to the time-tradeoff nature of risk. In fact, it would just make
  it worse. Without guard nodes, it becomes much more difficult for
  clients to be alerted to Tor entry points that are failing circuits to
  make sure that they only devote bandwidth to carrying traffic for
  streams whose both ends they observe.

  For this reason, guard nodes should remain enabled for 2-hop users,
  at least until an IP-independent, undetectable guard scanner can
  be created. TorFlow can scan for failing guards, but after a while,
  its unique behavior gives away the fact that its IP is a scanner, and
  it can be given selective service.


Why not fix Pathlen=2?:

  The main reason I am not advocating that we always use 2 hops is that
  in some situations, timing correlation evidence by itself may not be
  considered as solid and convincing as an actual, uninterrupted, fully
  traced path. Are these timing attacks as effective on a real network
  as they are in simulation? Maybe Tor's circuit multiplexing can serve
  to frustrate them to a degree? Would an extralegal adversary or
  authoritarian government even care? In the face of these
  situation-dependent unknowns, it should be up to the user to decide
  whether this is a concern for them.

  It should probably also be noted that even a false positive rate of
  1% for a 200k concurrent-user network could mean that, for a given
  node, a given stream could be confused with something like 10 users,
  assuming ~200 nodes carry most of the traffic (i.e., 1000 users each).
  Of course, to really know for sure, someone would unfortunately need
  to carry out an attack on a real network.
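
  The arithmetic behind that estimate, taking its own figures as given
  (a 1% false-positive rate, 200k concurrent users, and ~200 nodes
  carrying most of the traffic, with users spread evenly across them):

```latex
\frac{200\,000\ \text{users}}{200\ \text{nodes}} = 1000\ \text{users per node},
\qquad
0.01 \times 1000 = 10\ \text{candidate users}
```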

  Additionally, at some point cover traffic schemes may be implemented
  to frustrate timing attacks on the first hop. It is possible that some
  expert users already do this ad hoc, and they may wish to continue
  using 3 hops for this reason.


Who will enable this option?

  This is the crux of the proposal. Admittedly, there is some anonymity
  loss, and some decrease in the investment the adversary needs to
  attack 2-hop users versus 3-hop users, even if it is minimal and
  limited mostly to up-front costs and false positives.

  The key questions are:

  1. Are these users in a class such that their risk is significantly
     less than the amount of this anonymity loss?

  2. Are these users able to identify themselves?

  Many users of Tor are not at risk from an adversary capturing c/n of
  the network's nodes just to see what they do. These users use Tor to
  circumvent aggressive content filters, or simply to keep their IP out
  of marketing and search engine databases. Most content filters have
  no interest in running Tor nodes to catch violators, and marketers
  certainly would never consider such a thing, on either a cost basis
  or a legal one.

  In a sense, these users face an alternate threat model; they are not
  at risk under Tor's normal threat model.

  It should be evident to these users that they fall into this class.
  All that should be needed is a radio button:

   * "I use Tor for censorship resistance and IP obfuscation, not anonymity.
      Speed is more important to me than high anonymity."
   * "I use Tor for anonymity. I need more protection at the cost of speed."

  and then some explanation in the help of exactly what this means, and
  of the risks involved in eliminating the adversary's need for timing
  attacks with respect to false positives.


Implementation:

  new_route_len() can be modified directly with a check of the
  Pathlen option.
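
  A minimal sketch of that check, assuming a hypothetical
  sketch_options_t in place of Tor's real configuration structure and
  ignoring whatever other inputs the real new_route_len() takes. The
  names and constants below are illustrative only, not Tor's API.

```c
/* Assumed values for this sketch; Tor's own constants may differ. */
#define SKETCH_DEFAULT_ROUTE_LEN 3
#define SKETCH_SHORT_ROUTE_LEN   2

/* Hypothetical stand-in for the client configuration. */
typedef struct {
  int pathlen; /* proposed 'Pathlen' option; 0 means "not set" */
} sketch_options_t;

/* Return the number of hops for a new general-purpose circuit.
 * Only 2 and 3 are honored; anything else falls back to the default,
 * so a bad config value can never produce a 1-hop path. */
static int
sketch_new_route_len(const sketch_options_t *options)
{
  if (options->pathlen == SKETCH_SHORT_ROUTE_LEN ||
      options->pathlen == SKETCH_DEFAULT_ROUTE_LEN)
    return options->pathlen;
  return SKETCH_DEFAULT_ROUTE_LEN;
}
```

  Falling back to 3 for unrecognized values is a choice made in this
  sketch, so that the option can only ever shorten paths to 2.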

  The exit policy hack is a bit more tricky. compare_addr_to_addr_policy()
  needs to return an alternate ADDR_POLICY_ACCEPTED_WILDCARD or
  ADDR_POLICY_ACCEPTED_SPECIFIC return value for use in
  circuit_is_acceptable().
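
  A sketch of the split return value, assuming a simplified first-match
  IPv4 policy. The rule structure and the sketch_* names are
  hypothetical, and Tor's real policy types carry more fields. The only
  point illustrated is that a 2-hop client accepts an exit when the rule
  that accepted the destination was a "*" wildcard.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the policy result type, with "accepted"
 * split into the two variants the proposal names. */
typedef enum {
  ADDR_POLICY_REJECTED,
  ADDR_POLICY_ACCEPTED_WILDCARD, /* accepted by a rule whose address is "*" */
  ADDR_POLICY_ACCEPTED_SPECIFIC, /* accepted by a rule naming an IP or range */
} sketch_policy_result_t;

/* Hypothetical exit policy rule (IPv4, host byte order). */
typedef struct {
  bool accept;           /* accept rule (true) or reject rule (false) */
  bool addr_is_wildcard; /* address written as "*" */
  uint32_t addr, mask;   /* used only when addr_is_wildcard is false */
  uint16_t port_min, port_max;
} sketch_policy_rule_t;

/* First matching rule wins; no match means reject. */
static sketch_policy_result_t
sketch_compare_addr_to_policy(uint32_t addr, uint16_t port,
                              const sketch_policy_rule_t *rules, size_t n)
{
  for (size_t i = 0; i < n; i++) {
    const sketch_policy_rule_t *r = &rules[i];
    bool addr_ok = r->addr_is_wildcard ||
                   ((addr & r->mask) == (r->addr & r->mask));
    bool port_ok = port >= r->port_min && port <= r->port_max;
    if (!addr_ok || !port_ok)
      continue;
    if (!r->accept)
      return ADDR_POLICY_REJECTED;
    return r->addr_is_wildcard ? ADDR_POLICY_ACCEPTED_WILDCARD
                               : ADDR_POLICY_ACCEPTED_SPECIFIC;
  }
  return ADDR_POLICY_REJECTED;
}

/* 2-hop clients only use exits that accepted the destination via "*". */
static bool
sketch_exit_ok_for_two_hop(sketch_policy_result_t r)
{
  return r == ADDR_POLICY_ACCEPTED_WILDCARD;
}
```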

  The leaky exit is trickier still. handle_control_attachstream()
  already allows streams to exit at a given hop. Presumably something
  similar can be done in connection_ap_handshake_process_socks() and
  elsewhere. Circuit construction would also have to be performed such
  that the 2nd hop's exit policy is what is considered, not the 3rd's.
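
  A sketch of the leaky-circuit bookkeeping, assuming a hypothetical
  sketch_relay_t whose exit-policy check has already been computed. It
  shows only which hop's policy matters; actually attaching the stream
  at that hop is left to the real stream-attachment code.

```c
#include <stdbool.h>

/* Leaky circuits are always built with 3 hops, so observers cannot
 * tell 2-hop users from 3-hop users by circuit length. */
#define SKETCH_CIRC_LEN 3

/* Hypothetical relay record holding only what this sketch needs. */
typedef struct {
  const char *nickname;
  bool exit_policy_ok; /* precomputed result of the applicable exit check */
} sketch_relay_t;

/* 1-based index of the hop the stream exits from. */
static int
sketch_exit_hop(bool two_hop_enabled)
{
  return two_hop_enabled ? 2 : SKETCH_CIRC_LEN;
}

/* A circuit is usable only if the relay at the exit hop (the 2nd for
 * leaky circuits, the 3rd otherwise) passes the exit-policy check. */
static bool
sketch_circuit_usable(const sketch_relay_t path[SKETCH_CIRC_LEN],
                      bool two_hop_enabled)
{
  const sketch_relay_t *exit_relay =
      &path[sketch_exit_hop(two_hop_enabled) - 1];
  return exit_relay->exit_policy_ok;
}
```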

  The entry_guard_t structure could have num_circ_failed and
  num_circ_succeeded members such that if a guard exceeds an F% rate of
  circuit extend failures to a second hop, it is removed from the entry
  list.

  F should be high enough to avoid churn from normal Tor circuit
  failure, as determined by TorFlow scans.
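
  A sketch of those counters, assuming a hypothetical
  sketch_entry_guard_t that holds only the two proposed members; F is
  passed in as max_fail_pct. The minimum sample size
  SKETCH_MIN_ATTEMPTS is an added assumption, not part of the proposal,
  so that a new guard is not dropped after its first failure; its value
  here is arbitrary.

```c
#include <stdbool.h>

/* Hypothetical subset of entry_guard_t with the two proposed counters. */
typedef struct {
  const char *nickname;
  unsigned num_circ_failed;    /* circuits that failed to extend past the guard */
  unsigned num_circ_succeeded; /* circuits that extended successfully */
} sketch_entry_guard_t;

/* Assumed minimum number of attempts before a guard is judged. */
#define SKETCH_MIN_ATTEMPTS 20

/* Record the outcome of one attempt to extend to the second hop. */
static void
sketch_note_circ_result(sketch_entry_guard_t *g, bool extended_ok)
{
  if (extended_ok)
    g->num_circ_succeeded++;
  else
    g->num_circ_failed++;
}

/* True if the guard's failure rate exceeds max_fail_pct (F%) and it
 * should be removed from the entry list. */
static bool
sketch_guard_should_be_dropped(const sketch_entry_guard_t *g,
                               unsigned max_fail_pct)
{
  unsigned total = g->num_circ_failed + g->num_circ_succeeded;
  if (total < SKETCH_MIN_ATTEMPTS)
    return false;
  /* failed / total > F / 100, compared without floating point. */
  return (unsigned long long)g->num_circ_failed * 100 >
         (unsigned long long)max_fail_pct * total;
}
```

  For example, a guard with 15 failures out of 30 attempts (50%) would
  be dropped with F = 40 but kept with F = 60.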

  The Vidalia option should be presented as a radio button.


Migration:

  Phase 1: Adjust exit policy checks if Pathlen is set. Modify
  new_route_len() to obey a 'Pathlen' config option.

  Phase 2: Implement leaky circuit ability.

  Phase 3: Experiment to determine the proper ratio of circuit
  failures used to expire garbage or malicious guards via TorFlow
  (pending Bug #440 backport+adoption).

  Phase 4: Implement guard expiration code to kick off failure-prone
  guards and warn the user.

  Phase 5: Make a radio button in Vidalia, along with a help entry
  that explains the risks involved in layman's terms.

  Phase 6: Allow the user to specify path length by HTTP URL suffix.


[1] http://p2pnet.net/story/11279
[2] http://www.cs.umass.edu/~mwright/papers/levine-timing.pdf
[3] Proof available upon request ;)