  1. \documentclass{llncs}
  2. \usepackage{url}
  3. \usepackage{amsmath}
  4. \usepackage{epsfig}
  5. \newenvironment{tightlist}{\begin{list}{$\bullet$}{
  6. \setlength{\itemsep}{0mm}
  7. \setlength{\parsep}{0mm}
  8. % \setlength{\labelsep}{0mm}
  9. % \setlength{\labelwidth}{0mm}
  10. % \setlength{\topsep}{0mm}
  11. }}{\end{list}}
  12. \begin{document}
  13. \title{Challenges in practical low-latency stream anonymity (DRAFT)}
  14. \author{Roger Dingledine and Nick Mathewson}
  15. \institute{The Free Haven Project\\
  16. \email{\{arma,nickm\}@freehaven.net}}
  17. \maketitle
  18. \pagestyle{empty}
  19. \begin{abstract}
  20. foo
  21. \end{abstract}
  22. \section{Introduction}
  23. Tor is a low-latency anonymous communication overlay network
  24. \cite{tor-design} designed to be practical and usable for securing TCP
  25. streams over the Internet. We have been operating a publicly deployed
  26. Tor network since October 2003.
  27. Tor aims to resist observers and insiders by distributing each transaction
  28. over several nodes in the network. This ``distributed trust'' approach
  29. means the Tor network can be safely operated and used by a wide variety
  30. of mutually distrustful users, providing more sustainability and security
  31. than previous attempts at anonymizing networks.
  32. The Tor network has a broad range of users, including ordinary citizens
  33. who want to avoid being profiled for targeted advertisements, corporations
  34. who don't want to reveal information to their competitors, and law
  35. enforcement and government intelligence agencies who need
  36. to do operations on the Internet without being noticed.
  37. Tor has been funded by the U.S. Navy, for use in securing government
  38. communications, and also by the Electronic Frontier Foundation, for use
  39. in maintaining civil liberties for ordinary citizens online. The Tor
  40. protocol is one of the leading choices
  41. to be the anonymizing layer in the European Union's PRIME directive to
  42. help maintain privacy in Europe. The University of Dresden in Germany
  43. has integrated an independent implementation of the Tor protocol into
  44. their popular Java Anon Proxy anonymizing client. This wide variety of
  45. interests helps maintain both the stability and the security of the
  46. network.
  47. Tor has a weaker threat model than many anonymity designs in the
  48. literature. This is because our primary requirements are to have a
  49. practical and useful network, and from there we aim to provide as much
  50. anonymity as we can.
  51. %need to discuss how we take the approach of building the thing, and then
  52. %assuming that, how much anonymity can we get. we're not here to model or
  53. %to simulate or to produce equations and formulae. but those have their
  54. %roles too.
  55. This paper aims to give the reader enough information to understand the
  56. technical and policy issues that Tor faces as we continue deployment,
  57. and to lay a research agenda for others to help in addressing some of
  58. these issues. Section \ref{sec:what-is-tor} gives an overview of the Tor
  59. design and our goals. We go on in Section \ref{sec:related} to describe
  60. Tor's context in the anonymity space. Sections \ref{sec:crossroads-policy}
  61. and \ref{sec:crossroads-design} describe the practical challenges,
  62. policy and technical respectively, that stand in the way of moving
  63. from a practical useful network to a practical useful anonymous network.
  64. \section{What Is Tor}
  65. \label{sec:what-is-tor}
  66. \subsection{Distributed trust: safety in numbers}
  67. Tor provides \emph{forward privacy}, so that users can connect to
  68. Internet sites without revealing their logical or physical locations
  69. to those sites or to observers. It also provides \emph{location-hidden
  70. services}, so that critical servers can support authorized users without
  71. giving adversaries an effective vector for physical or online attacks.
  72. Our design provides this protection even when a portion of its own
  73. infrastructure is controlled by an adversary.
  74. To make private connections in Tor, users incrementally build a path or
  75. \emph{circuit} of encrypted connections through servers on the network,
  76. extending it one step at a time so that each server in the circuit only
  77. learns which server extended to it and which server it has been asked
  78. to extend to. The client negotiates a separate set of encryption keys
  79. for each step along the circuit.
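As an illustration of how the per-hop keys are used, consider a three-hop
circuit. The notation here is ours and is only a sketch: $k_1$, $k_2$, and
$k_3$ stand for the keys the client negotiated with the first, second, and
third server, $E_k$ for symmetric encryption under key $k$, and $m$ for the
application data. The client sends
\begin{equation*}
  E_{k_1}\bigl(E_{k_2}\bigl(E_{k_3}(m)\bigr)\bigr)
\end{equation*}
to the first server. Each server removes exactly one layer and forwards the
result, so only the last server in the circuit recovers $m$, and no server
before it can read what it is relaying.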
  80. Once a circuit has been established, the client software waits for
  81. applications to request TCP connections, and directs these application
  82. streams along the circuit. Many streams can be multiplexed along a single
  83. circuit, so applications don't need to wait for keys to be negotiated
  84. every time they open a connection. Because each server sees no
  85. more than one end of the connection, a local eavesdropper or a compromised
  86. server cannot use traffic analysis to link the connection's source and
  87. destination. The Tor client software rotates circuits periodically
  88. to prevent long-term linkability between different actions by a
  89. single user.
  90. Tor differs from other deployed systems for traffic analysis resistance
  91. in its security and flexibility. Mix networks such as Mixmaster or its
  92. successor Mixminion \cite{minion-design}
  93. gain the highest degrees of anonymity at the expense of introducing highly
  94. variable delays, thus making them unsuitable for applications such as web
  95. browsing that require quick response times. Commercial single-hop proxies
  96. such as {\url{anonymizer.com}} present a single point of failure, where
  97. a single compromise can expose all users' traffic, and a single-point
  98. eavesdropper can perform traffic analysis on the entire network.
  99. Also, their proprietary implementations place any infrastructure that
  100. depends on these single-hop solutions at the mercy of their providers'
  101. financial health. Tor can handle any TCP-based protocol, such as web
  102. browsing, instant messaging and chat, and secure shell login; and it is
  103. the only implemented anonymizing design with an integrated system for
  104. secure location-hidden services.
  105. No organization can achieve this security on its own. If a single
  106. corporation or government agency were to build a private network to
  107. protect its operations, any connections entering or leaving that network
  108. would be obviously linkable to the controlling organization. The members
  109. and operations of that agency would be easier, not harder, to distinguish.
  110. Instead, to protect our networks from traffic analysis, we must
  111. collaboratively blend the traffic from many organizations and private
  112. citizens, so that an eavesdropper can't tell which users are which,
  113. and who is looking for what information. By bringing more users onto
  114. the network, all users become more secure \cite{econymics}.
  115. Naturally, organizations will not want to depend on others for their
  116. security. If most participating providers are reliable, Tor tolerates
  117. some hostile infiltration of the network. For maximum protection,
  118. the Tor design includes an enclave approach that lets data be encrypted
  119. (and authenticated) end-to-end, so high-sensitivity users can be sure it
  120. hasn't been read or modified. This even works for Internet services that
  121. don't have built-in encryption and authentication, such as unencrypted
  122. HTTP or chat, and it requires no modification of those services to do so.
  123. weasel's graph of \# nodes and of bandwidth, ideally from week 0.
  124. Tor has the following goals.
  125. and we made these assumptions when trying to design the thing.
  126. \section{Tor's position in the anonymity field}
  127. \label{sec:related}
  128. There are many other classes of systems: single-hop proxies, open proxies,
  129. jap, mixminion, flash mixes, freenet, i2p, mute/ants/etc, tarzan,
  130. morphmix, freedom. Give brief descriptions and brief characterizations
  131. of how we differ. This is not the breakthrough stuff and we only have
  132. a page or two for it.
  133. have a serious discussion of morphmix's assumptions, since they would
  134. seem to be the direct competition. in fact tor is a flexible architecture
  135. that would encompass morphmix, and they're nearly identical except for
  136. path selection and node discovery. and the trust system morphmix has
  137. seems overkill (and/or insecure) based on the threat model we've picked.
  138. \section{Threat model}
  139. discuss $\frac{c^2}{n^2}$, except how in practice the chance of owning
  140. the last hop is not c/n since that doesn't take the destination (website)
  141. into account. so in cases where the adversary does not also control the
  142. final destination we're in good shape, but if he \emph{does} then we'd be better
  143. off with a system that lets each hop choose a path.
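As a worked instance of the figure above, under simplifying assumptions that
are ours: the adversary controls $c$ of the $n$ servers, and the client picks
its first and last hop independently and uniformly at random. Then
\begin{equation*}
  \Pr[\text{first and last hop both compromised}]
    = \frac{c}{n} \cdot \frac{c}{n} = \frac{c^2}{n^2}.
\end{equation*}
With hypothetical values $c = 10$ and $n = 100$, this gives $1/100$. If the
adversary also controls the destination, he no longer needs the last hop, and
under the same assumptions the chance rises to that of owning the first hop
alone, $c/n = 1/10$.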
  144. in practice tor's threat model is based entirely on the goal of dispersal
  145. and diversity. george and steven describe an attack \cite{draft} that
  146. lets them determine the nodes used in a circuit; yet they can't identify
  147. alice or bob through this attack. so it's really just the endpoints that
  148. remain secure. and the enclave model seems particularly threatened by
  149. this, since this attack lets us identify endpoints when they're servers.
  150. see \ref{subsec:helper-nodes} for discussion of some ways to address this
  151. issue.
  152. see \ref{subsec:routing-zones} for discussion of larger
  153. adversaries and our dispersal goals.
  154. \section{Crossroads: Policy issues}
  155. \label{sec:crossroads-policy}
  156. Many of the issues the Tor project needs to address are not just a
  157. matter of system design or technology development. In particular, the
  158. Tor project's \emph{image} with respect to its users and the rest of
  159. the Internet impacts the security it can provide.
  160. As an example to motivate this section, some U.S.~Department of Energy
  161. penetration testing engineers are tasked with compromising DoE computers
  162. from the outside. They only have a limited number of ISPs from which to
  163. launch their attacks, and they found that the defenders were recognizing
  164. attacks because they came from the same IP space. These engineers wanted
  165. to use Tor to hide their tracks. First, from a technical standpoint,
  166. Tor does not support the variety of IP packets they would like to use in
  167. such attacks (see Section \ref{subsec:ip-vs-tcp}). But aside from this,
  168. we also decided that it would probably be poor precedent to encourage
  169. such use -- even legal use that improves national security -- and managed
  170. to dissuade them.
  171. With this image issue in mind, here we discuss the Tor user base and
  172. Tor's interaction with other services on the Internet.
  173. \subsection{Usability}
  174. Usability: fc03 paper was great, except the lower latency you are the
  175. less useful it seems it is.
  176. A Tor gui, how jap's gui is nice but does not reflect the security
  177. they provide.
  178. Public perception, and thus advertising, is a security parameter.
  179. \subsection{Image, usability, and sustainability}
  180. Image: substantial non-infringing uses. Image is a security parameter,
  181. since it impacts user base and perceived sustainability.
  182. Sustainability. Previous attempts have been commercial, which we think
  183. adds a lot of unnecessary complexity and accountability. Freedom didn't
  184. collect enough money to pay its servers; JAP bandwidth is supported by
  185. continued money, and they periodically ask what they will do when it
  186. dries up.
  187. good uses are kept private, bad uses are publicized. not good.
  188. \subsection{Reputability}
  189. Yet another factor in the safety of a given network is its reputability:
  190. the perception of its social value based on its current users. If I'm
  191. the only user of a system, it might be socially accepted, but I'm not
  192. getting any anonymity. Add a thousand Communists, and I'm anonymous,
  193. but everyone thinks I'm a Commie. Add a thousand random citizens (cancer
  194. survivors, privacy enthusiasts, and so on) and now I'm hard to profile.
  195. The more cancer survivors on Tor, the better for the human rights
  196. activists. The more script kiddies, the worse for the normal users. Thus,
  197. reputability is an anonymity issue for two reasons. First, it impacts
  198. the sustainability of the network: a network that's always about to be
  199. shut down has difficulty attracting and keeping users, so its anonymity
  200. set suffers. Second, a disreputable network attracts the attention of
  201. powerful attackers who may not mind revealing the identities of all the
  202. users to uncover the few bad ones.
  203. While people therefore have an incentive for the network to be used for
  204. ``more reputable'' activities than their own, there are still tradeoffs
  205. involved when it comes to anonymity. To follow the above example, a
  206. network used entirely by cancer survivors might welcome some Communists
  207. onto the network, though of course they'd prefer a wider variety of users.
  208. The impact of public perception on security is especially important
  209. during the bootstrapping phase of the network, where the first few
  210. widely publicized uses of the network can dictate the types of users it
  211. attracts next.
  212. \subsection{Tor and file-sharing}
  213. BitTorrent and DMCA. Should we add an IDS to autodetect protocols and
  214. snipe them?
  215. \subsection{Tor and blacklists}
  216. Takedowns and EFnet abuse and Wikipedia complaints and IRC
  217. networks.
  218. It was long expected that, alongside Tor's legitimate users, it would also
  219. attract troublemakers who exploited Tor in order to abuse services on the
  220. Internet. Our initial answer to this situation was to use ``exit policies''
  221. to allow individual Tor servers to block access to specific IP/port ranges.
  222. This approach was meant to make operators more willing to run Tor by allowing
  223. them to prevent their servers from being used for abusing particular
  224. services. For example, all Tor servers currently block SMTP (port 25), in
  225. order to avoid being used to send spam.
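For illustration, an operator who wants to refuse outgoing mail but allow
everything else might configure an exit policy along the following lines.
This is a sketch that assumes the \texttt{ExitPolicy} option of the Tor
configuration file, with rules checked in order and the first match winning;
the rules any given server actually uses are its operator's choice.
\begin{verbatim}
ExitPolicy reject *:25
ExitPolicy accept *:*
\end{verbatim}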
  226. This approach is useful, but is insufficient for two reasons. First, since
  227. it is not possible to force all ORs to block access to any given service,
  228. many of those services try to block Tor instead. More broadly, while being
  229. blockable is important to being good netizens, we would like to encourage
  230. services to allow anonymous access; services should not need to decide
  231. between blocking legitimate anonymous use and allowing unlimited abuse.
  232. This is potentially a bigger problem than it may appear.
  233. On the one hand, if people want to refuse connections from you on
  234. their servers, it would seem that they should be allowed to. But a
  235. possible major problem with the blocking of Tor is that it is not just
  236. the decision of the individual server administrator who is deciding
  237. whether to post to Wikipedia from his Tor node address or to allow
  238. people to read Wikipedia anonymously through his Tor node. If, e.g.,
  239. he or she comes through a campus or corporate NAT, then the decision
  240. is whether the entire population behind it gets a Tor exit
  241. node or write access to Wikipedia. This is a loss for both of us (Tor
  242. and Wikipedia). We don't want to compete for (or divvy up) the
  243. NAT-protected entities of the world.
  244. (A related problem is that many IP blacklists are not terribly fine-grained.
  245. No current IP blacklist, for example, allows a service provider to blacklist
  246. only those Tor servers that allow access to a specific IP or port, even
  247. though this information is readily available. One IP blacklist even bans
  248. every class C network that contains a Tor server, and recommends banning SMTP
  249. from these networks even though Tor does not allow SMTP at all.)
  250. Problems of abuse occur mainly with services such as IRC networks and
  251. Wikipedia, which rely on IP-blocking to ban abusive users. While at first
  252. blush this practice might seem to depend on the anachronistic assumption that
  253. each IP is an identifier for a single user, it is actually more reasonable in
  254. practice: it assumes that non-proxy IPs are a costly resource, and that an
  255. abuser cannot change IPs at will. By blocking IPs which are used by Tor
  256. servers, open proxies, and service abusers, these systems hope to make
  257. ongoing abuse difficult. Although the system is imperfect, it works
  258. tolerably well for them in practice.
  259. But of course, we would prefer that legitimate anonymous users be able to
  260. access abuse-prone services. One conceivable approach would be to require
  261. would-be IRC users, for instance, to register accounts if they wanted to
  262. access the IRC network from Tor. But in practice, this would not
  263. significantly impede abuse if creating new accounts were easily automatable;
  264. this is why services use IP blocking. In order to deter abuse, pseudonymous
  265. identities need to impose a significant switching cost in resources or human
  266. time.
  267. One approach, similar to that taken by Freedom, would be to bootstrap some
  268. non-anonymous costly identification mechanism to allow access to a
  269. blind-signature pseudonym protocol. This would effectively create costly
  270. pseudonyms, which services could require in order to allow anonymous access.
  271. This approach has difficulties in practice, however:
  272. \begin{tightlist}
  273. \item Unlike Freedom, Tor is not a commercial service. Therefore, it would
  274. be a shame to require payment in order to make Tor useful, or to make
  275. non-paying users second-class citizens.
  276. \item It is hard to think of an underlying resource that would actually work.
  277. We could use IP addresses, but that's the problem, isn't it?
  278. \item Managing single sign-on services is not considered a well-solved
  279. problem in practice. If Microsoft can't get universal acceptance for
  280. Passport, why do we think that a Tor-specific solution would do any good?
  281. \item Even if we came up with a perfect authentication system for our needs,
  282. there's no guarantee that any service would actually start using it. It
  283. would require a nonzero effort for them to support it, and it might just
  284. be less hassle for them to block Tor anyway.
  285. \end{tightlist}
  286. Squishy IP-based ``authentication'' and ``authorization'' is a reality
  287. we must contend with. We should say something more about the analogy
  288. with SSNs.
  289. \subsection{Other}
  290. Tor's scope: How much should Tor aim to do? Applications that leak
  291. data: we can say they're not our problem, but they're somebody's problem.
  292. Also, the more widely deployed Tor becomes, the more people who need a
  293. deployed overlay network tell us they'd like to use us if only we added
  294. a few more features. For example, Blossom \cite{blossom} and
  295. random community wireless projects both want source-routable overlay
  296. networks for their own purposes. Fortunately, our modular design separates
  297. routing from node discovery; so we could implement Morphmix in Tor just
  298. by implementing the Morphmix-specific node discovery and path selection
  299. pieces. On the other hand, we could easily get distracted building a
  300. general-purpose overlay library, and we're only a few developers.
  301. Should we allow revocation of anonymity if a threshold of
  302. servers want to?
  303. Logging. Making logs not revealing. A happy coincidence that verbose
  304. logging is our \#2 performance bottleneck. Is there a way to detect
  305. modified servers, or to have them volunteer the information that they're
  306. logging verbosely? Would that actually solve any attacks?
  307. \section{Crossroads: Scaling and Design choices}
  308. \label{sec:crossroads-design}
  309. \subsection{Transporting the stream vs transporting the packets}\label{subsec:ip-vs-tcp}
  310. We periodically run into ex-ZKS employees who tell us that the process of
  311. anonymizing IPs should ``obviously'' be done at the IP layer. Here are
  312. the issues that need to be resolved before we'll be ready to switch Tor
  313. over to arbitrary IP traffic.
  314. \begin{enumerate}
  315. \setlength{\itemsep}{0mm}
  316. \setlength{\parsep}{0mm}
  317. \item [IP packets reveal OS characteristics.] We still need to do
  318. IP-level packet normalization, to stop things like IP fingerprinting
  319. \cite{ip-fingerprinting}. There exist libraries \cite{ip-normalizing}
  320. that can help with this.
  321. \item [Application-level streams still need scrubbing.] We still need
  322. Tor to be easy to integrate with user-level application-specific proxies
  323. such as Privoxy. So it's not just a matter of capturing packets and
  324. anonymizing them at the IP layer.
  325. \item [Certain protocols will still leak information.] For example,
  326. DNS requests destined for my local DNS servers need to be rewritten
  327. to be delivered to some other unlinkable DNS server. This requires
  328. understanding the protocols we are transporting.
  329. \item [The crypto is unspecified.] First we need a block-level encryption
  330. approach that can provide security despite
  331. packet loss and out-of-order delivery. Freedom allegedly had one, but it was
  332. never publicly specified, and we believe it's likely vulnerable to tagging
  333. attacks \cite{tor-design}. Also, TLS over UDP is not implemented or even
  334. specified, though some early work has begun on that \cite{ben-tls-udp}.
  335. \item [We'll still need to tune network parameters.] Since the above
  336. encryption system will likely need sequence numbers and maybe more to do
  337. replay detection, handle duplicate frames, etc., we will be reimplementing
  338. some subset of TCP anyway to manage throughput, congestion control, etc.
  339. \item [Exit policies for arbitrary IP packets mean building a secure
  340. IDS.] Our server operators tell us that exit policies are one of
  341. the main reasons they're willing to run Tor over previous attempts
  342. at anonymizing networks. Adding an IDS to handle exit policies would
  343. increase the security complexity of Tor, and would likely not work anyway,
  344. as evidenced by the entire field of IDS and counter-IDS papers. Many
  345. potential abuse issues are resolved by the fact that Tor only transports
  346. valid TCP streams (as opposed to arbitrary IP including malformed packets
  347. and IP floods), so exit policies become even \emph{more} important as
  348. we become able to transport IP packets. We also need a way to compactly
  349. characterize the exit policies and let clients parse them to decide
  350. which nodes will allow which packets to exit.
  351. \item [The Tor-internal name spaces would need to be redesigned.] We
  352. support hidden service \texttt{.onion} addresses, and other special addresses
  353. like \texttt{.exit} (see Section \ref{subsec:}), by intercepting the addresses
  354. when they are passed to the Tor client.
  355. \end{enumerate}
  356. This list is discouragingly long right now, but we recognize that it
  357. would be good to investigate each of these items in further depth and to
  358. understand which are actual roadblocks and which are easier to resolve
  359. than we think. We certainly wouldn't mind if Tor one day is able to
  360. transport a greater variety of protocols.
  361. \subsection{Mid-latency}
  362. Mid-latency. Can we do traffic shaping to get any defense against George's
  363. PET2004 paper? Will padding or long-range dummies do anything then? Will
  364. it kill the user base or can we get both approaches to play well together?
  365. explain what mid-latency is. propose a single network where users of
  366. varying latency goals can combine.
  367. Note that in practice as the network is growing and we accept cable
  368. modem and DSL nodes, and nodes in other continents, we're \emph{already}
  369. looking at many-second delays for some transactions. The engineering
  370. required to get this lower is going to be extremely hard. It's worth
  371. considering how hard it would be to accept the fixed (higher) latency
  372. and improve the protection we get from it.
  373. % can somebody besides arma flesh this section out?
  374. %\subsection{The DNS problem in practice}
  375. \subsection{Measuring performance and capacity}
  376. How to measure performance without letting people selectively deny service
  377. by distinguishing pings. Heck, just how to measure performance at all. In
  378. practice people have funny firewalls that don't match up to their exit
  379. policies and Tor doesn't deal.
  380. Network investigation: Is all this bandwidth publishing thing a good idea?
  381. How can we collect stats better? Note weasel's smokeping, at
  382. \url{http://seppia.noreply.org/cgi-bin/smokeping.cgi?target=Tor}
  383. which probably gives george and steven enough info to break tor?
  384. \subsection{Plausible deniability}
  385. Does running a server help you or harm you? George's Oakland attack.
  386. Plausible deniability -- without even running your traffic through Tor! We
  387. have to pick the path length so adversary can't distinguish client from
  388. server (how many hops is good?).
  389. \subsection{Helper nodes}\label{subsec:helper-nodes}
  390. When does fixing your entry or exit node help you?
  391. Helper nodes in the literature don't deal with churn, and
  392. especially active attacks to induce churn.
  393. Do general DoS attacks have anonymity implications? See e.g. Adam
  394. Back's IH paper, but I think there's more to be pointed out here.
  395. \subsection{Location-hidden services}
  396. Survivable services are new in practice, yes? Hidden services seem
  397. less hidden than we'd like, since they stay in one place and get used
  398. a lot. They're the epitome of the need for helper nodes. This means
  399. that using Tor as a building block for Free Haven is going to be really
  400. hard. Also, they're brittle in terms of intersection and observation
  401. attacks. Would be nice to have hot-swap services, but hard to design.
  402. \subsection{Trust and discovery}
  403. The published Tor design adopted a deliberately simplistic design for
  404. authorizing new nodes and informing clients about servers and their status.
  405. In the early Tor designs, all ORs periodically uploaded a signed description
  406. of their locations, keys, and capabilities to each of several well-known {\it
  407. directory servers}. These directory servers constructed a signed summary
  408. of all known ORs (a ``directory''), and a signed statement of which ORs they
  409. believed to be operational at any given time (a ``network status''). Clients
  410. periodically downloaded a directory in order to learn the latest ORs and
  411. keys, and more frequently downloaded a network status to learn which ORs are
  412. likely to be running. ORs also operate as directory caches, in order to
  413. lighten the bandwidth load on the authoritative directory servers.
  414. In order to prevent Sybil attacks (wherein an adversary signs up many
  415. purportedly independent servers in order to increase her chances of observing
  416. a stream as it enters and leaves the network), the early Tor directory design
  417. required the operators of the authoritative directory servers to manually
  418. approve new ORs. Unapproved ORs were included in the directory, but clients
  419. did not use them at the start or end of their circuits. In practice,
  420. directory administrators performed little actual verification, and tended to
  421. approve any OR whose operator could compose a coherent email. This procedure
  422. may have prevented trivial automated Sybil attacks, but would do little
  423. against a clever attacker.
  424. There are a number of flaws in this system that need to be addressed as we
  425. move forward. They include:
  426. \begin{tightlist}
  427. \item Each directory server represents an independent point of failure; if
  428. any one were compromised, it could immediately compromise all of its users
  429. by recommending only compromised ORs.
  430. \item The more servers join the network, the more unreasonable it
  431. becomes to expect clients to know about them all. Directories
  432. become infeasibly large, and downloading the list of servers becomes
  433. burdensome.
  434. \item The validation scheme may do as much harm as it does good. It is not
  435. only incapable of preventing clever attackers from mounting Sybil attacks,
  436. but may deter server operators from joining the network. (For instance, if
  437. they expect the validation process to be difficult, or if they do not share
  438. any languages in common with the directory server operators.)
  439. \end{tightlist}
We could try to move the system in several directions, depending on our
choice of threat model and requirements. If we did not need to increase
network capacity in order to support more users, there would be no reason not
to adopt even stricter validation requirements, and reduce the number of
servers in the network to a trusted minimum. But since we want Tor to work
for as many users as it can, we need XXXXX

In order to address the first two issues, it seems wise to move to a system
including a number of semi-trusted directory servers, no one of which can
compromise a user on its own. Ultimately, of course, we cannot escape the
problem of a first introducer: since most users will run Tor in whatever
configuration the software ships with, the Tor distribution itself will
remain a potential single point of failure so long as it includes the seed
keys for directory servers, a list of directory servers, or any other means
to learn which servers are on the network. But omitting this information
from the Tor distribution would only delegate the trust problem to the
individual users, most of whom are presumably less informed about how to make
trust decisions than the Tor developers.

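As a rough illustration of the ``no single directory server'' property,
suppose (as an assumption for this sketch, not a description of a deployed
protocol) that a client knows a set $D$ of directory servers, that each
$d \in D$ publishes a list $L_d$ of ORs, and that the client uses an OR $r$
only if a strict majority of directories list it:
\[
  \mbox{accept}(r) \iff |\{ d \in D : r \in L_d \}| > \frac{|D|}{2}.
\]
Under this rule, with $|D| = 5$, an adversary must compromise at least three
directory servers to force a chosen OR into the client's view, and when the
honest servers agree, a single compromised server can neither add nor remove
an OR on its own. The symbols $D$ and $L_d$ and the majority threshold are
illustrative choices; other thresholds trade availability against resistance
to compromise.
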
%Network discovery, sybil, node admission, scaling. It seems that the code
%will ship with something and that's our trust root. We could try to get
%people to build a web of trust, but no. Where we go from here depends
%on what threats we have in mind. Really decentralized if your threat is
%RIAA; less so if threat is to application data or individuals or...

Game theory for helper nodes: if Alice offers a hidden service on a
server (enclave model), and nobody ever uses helper nodes, then against
George+Steven's attack she's totally nailed. If only Alice uses a helper
node, then she's still identified as the source of the data. If everybody
uses a helper node (including Alice), then the attack identifies the
helper node and also Alice, and knows which one is which. If everybody
uses a helper node (but not Alice), then the attacker figures the real
source was a client that is using Alice as a helper node. [How's my
logic here?]

In practice, sites like Bloggers Without Borders (www.b19s.org) are
running Tor servers but, more importantly, are advertising a hidden-service
address on their front page. Doing this can provide increased robustness
if they use the dual-IP approach we describe in tor-design, but in
practice they do it (a) to increase visibility of the Tor project and their
support for privacy, and (b) to offer a way for their users, using vanilla
software, to get end-to-end encryption and end-to-end authentication to
their website.

\section{Crossroads: Scaling}
%\label{sec:crossroads-scaling}
%P2P + anonymity issues:

Tor is running today with hundreds of servers and tens of thousands of
users, but it will certainly not scale to millions.

Scaling Tor involves three main challenges. First is safe server
discovery, both bootstrapping -- how a Tor client can robustly find an
initial server list -- and ongoing -- how a Tor client can learn about
a fair sample of honest servers and not let the adversary control his
circuits (see Section x). Second is detecting and handling the speed
and reliability of the variety of servers we must use if we want to
accept many servers (see Section y).
Since the speed and reliability of a circuit are limited by its worst link,
we must learn to track and predict performance. Finally, in order to get
a large set of servers in the first place, we must address incentives
for users to carry traffic for others (see Section incentives).

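To make the ``worst link'' observation concrete, consider a circuit through
servers $s_1, \ldots, s_k$, where $b_i$ is the bandwidth that $s_i$ can
devote to the circuit and $p_i$ is the probability that $s_i$ stays up for
the circuit's lifetime. These symbols, and the independence of failures, are
simplifying assumptions made for this sketch:
\[
  B_{\mathrm{circuit}} \le \min_{1 \le i \le k} b_i,
  \qquad
  P_{\mathrm{circuit}} = \prod_{i=1}^{k} p_i \le \min_{1 \le i \le k} p_i.
\]
For example, a three-hop circuit with $b = (500, 20, 300)$~KB/s can carry at
most $20$~KB/s, and with $p = (0.99, 0.80, 0.99)$ it survives with
probability about $0.78$, so one slow or flaky server dominates both figures.
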
\subsection{Incentives}

There are three behaviors we need to encourage for each server: relaying
traffic; providing good throughput and reliability while doing it;
and allowing traffic to exit the network from that server.

We encourage these behaviors through \emph{indirect} incentives, that
is, designing the system and educating users in such a way that users
with certain goals will choose to relay traffic. In practice, the
main incentive for running a Tor server is social benefit: volunteers
altruistically donate their bandwidth and time. We also keep public
rankings of the throughput and reliability of servers, much like
seti@home. We further explain to users that they can get \emph{better
security} by operating a server, because they get plausible deniability
(indeed, they may not need to route their own traffic through Tor at all
-- blending directly with other traffic exiting Tor may be sufficient
protection for them), and because they can use their own Tor server
as an entry or exit point and be confident it's not run by the adversary.
Finally, we can improve the usability and feature set of the software:
rate limiting support and easy packaging decrease the hassle of
maintaining a server, and our configurable exit policies allow each
operator to advertise a policy describing the hosts and ports to which
he feels comfortable connecting.

Beyond these, however, there is also a need for \emph{direct} incentives:
providing payment or other resources in return for high-quality service.
Paying actual money is problematic: decentralized e-cash systems are
not yet practical, and a centralized collection system not only reduces
robustness, but also has failed in the past (the history of commercial
anonymizing networks is littered with failed attempts). A more promising
option is to use a tit-for-tat incentive scheme: provide better service
to nodes that have provided good service to you.

Unfortunately, such an approach introduces new anonymity problems.
Does the incentive system enable the adversary to attract more traffic by
performing well? Typically a user who chooses evenly from all options is
most resistant to an adversary targeting him, but that approach prevents
us from handling heterogeneous servers \cite{casc-rep}.

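The tension can be illustrated with a simple calculation. Assume, purely for
illustration, that Alice picks her entry and exit servers independently, that
the adversary runs $c$ of the $n$ servers, and that those $c$ servers supply
a fraction $\beta$ of the total advertised bandwidth. The probability that
both ends of a circuit are compromised is then
\[
  \Pr[\mbox{both ends compromised}] =
  \left\{
  \begin{array}{ll}
    (c/n)^2 & \mbox{if servers are chosen uniformly,} \\
    \beta^2 & \mbox{if servers are chosen in proportion to bandwidth.}
  \end{array}
  \right.
\]
With $n = 100$ and $c = 5$, uniform choice gives $0.25\%$; if those five
servers advertise $30\%$ of the bandwidth, bandwidth-weighted choice gives
$9\%$. Under these assumptions, rewarding performance lets a well-provisioned
adversary attract a larger share of circuits, while uniform choice ignores
real differences in capacity.
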
When a server (call him Steve) performs well for Alice, does Steve gain
reputation with the entire system, or just with Alice? If the entire
system, how does Alice tell everybody about her experience in a way that
prevents her from lying about it yet still protects her identity? If
Steve's behavior only affects Alice's behavior, does this allow Steve to
selectively perform only for Alice, and then break her anonymity later
when somebody (presumably Alice) routes through his node?

These are difficult and open questions, yet choosing not to scale means
leaving most users to a less secure network or no anonymizing network
at all. We will start with a simplified approach to the tit-for-tat
incentive scheme based on two rules: (1) each node should measure the
service it receives from adjacent nodes, and provide service relative to
the received service, but (2) when a node is making decisions that affect
its own security (e.g. when building a circuit for its own application
connections), it should choose evenly from a sufficiently large set of
nodes that meet some minimum service threshold. This approach allows us
to discourage bad service without opening Alice up as much to attacks.

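One way to write the two rules down, using notation introduced here only for
illustration: let $r_{j \to i}$ be the service node $i$ has measured from
adjacent node $j$, let $C_i$ be the capacity $i$ is willing to relay, let
$q_j$ be $i$'s service estimate for node $j$, and let $\tau$ and $k$ be a
minimum service threshold and a minimum set size. Rule (1) allocates relayed
capacity $s_{i \to j}$ in proportion to received service, and rule (2)
selects uniformly from the qualifying set $S$:
\[
  s_{i \to j} = C_i \, \frac{r_{j \to i}}{\sum_{m} r_{m \to i}},
  \qquad
  S = \{ j : q_j \ge \tau \}, \quad
  \Pr[\mbox{choose } j] = \frac{1}{|S|} \mbox{ for } j \in S, \quad |S| \ge k.
\]
The proportional form for rule (1) is only one possible reading of
``relative to the received service''; the text does not fix a particular
formula, and the values of $\tau$ and $k$ are left open.
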
%XXX rewrite the above so it sounds less like a grant proposal and
%more like a "if somebody were to try to solve this, maybe this is a
%good first step".

%We should implement the above incentive scheme in the
%deployed Tor network, in conjunction with our plans to add the necessary
%associated scalability mechanisms. We will do experiments (simulated
%and/or real) to determine how much the incentive system improves
%efficiency over baseline, and also to determine how far we are from
%optimal efficiency (what we could get if we ignored the anonymity goals).

\subsection{Peer-to-peer / practical issues}

Making use of servers with little bandwidth. How to handle hammering by
certain applications.

Handling servers that are far away from the rest of the network, e.g. on
the continents that aren't North America and Europe. High latency,
often high packet loss.

Running Tor servers behind NATs, behind great-firewalls-of-China, etc.
Restricted routes. How to propagate the topology to everybody? BGP
style doesn't work because we don't want just \emph{one} path. Point to
Geoff's stuff.

\subsection{ISP-class adversaries}

Routing-zones. It seems that our threat model comes down to diversity and
dispersal. But hard for Alice to know how to act. Many questions remain.

\subsection{The China problem}

We have lots of users in Iran and similar (we stopped
logging, so it's hard to know now, but there are many Persian sites on how
to use Tor), and they seem to be doing ok. But the China problem is bigger.
Cite Stefan's paper, and talk about how we need to route through clients,
and maybe we should start with a time-release IP publishing system +
Advogato-based reputation system, to bound the number of IPs leaked to the
adversary.

\section{The Future}
\label{sec:conclusion}

\bibliographystyle{plain} \bibliography{tor-design}

\end{document}