tor-design.tex 17 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427
  1. \documentclass[times,10pt,twocolumn]{article}
  2. \usepackage{latex8}
  3. %\usepackage{times}
  4. \usepackage{url}
  5. \usepackage{graphics}
  6. \usepackage{amsmath}
  7. \pagestyle{empty}
  8. \renewcommand\url{\begingroup \def\UrlLeft{<}\def\UrlRight{>}\urlstyle{tt}\Url}
  9. \newcommand\emailaddr{\begingroup \def\UrlLeft{<}\def\UrlRight{>}\urlstyle{tt}\Url}
  10. % If an URL ends up with '%'s in it, that's because the line *in the .bib/.tex
  11. % file* is too long, so break it there (it doesn't matter if the next line is
  12. % indented with spaces). -DH
  13. %\newif\ifpdf
  14. %\ifx\pdfoutput\undefined
  15. % \pdffalse
  16. %\else
  17. % \pdfoutput=1
  18. % \pdftrue
  19. %\fi
  20. \begin{document}
  21. %% Use dvipdfm instead. --DH
  22. %\ifpdf
  23. % \pdfcompresslevel=9
  24. % \pdfpagewidth=\the\paperwidth
  25. % \pdfpageheight=\the\paperheight
  26. %\fi
  27. \title{Tor: Design of a Next-Generation Onion Router}
  28. \author{Anonymous}
  29. %\author{Roger Dingledine \\ The Free Haven Project \\ arma@freehaven.net \and
  30. %Nick Mathewson \\ The Free Haven Project \\ nickm@freehaven.net \and
  31. %Paul Syverson \\ Naval Research Lab \\ syverson@itd.nrl.navy.mil}
  32. \maketitle
  33. \thispagestyle{empty}
  34. \begin{abstract}
  35. We present Tor, a connection-based low-latency anonymous communication
  36. system which addresses many limitations in the original onion routing design.
  37. Tor works in a real-world Internet environment,
  38. requires little synchronization or coordination between nodes, and
  39. protects against known anonymity-breaking attacks as well
  40. as or better than other systems with similar design parameters.
  41. \end{abstract}
  42. %\begin{center}
  43. %\textbf{Keywords:} anonymity, peer-to-peer, remailer, nymserver, reply block
  44. %\end{center}
  45. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  46. \Section{Overview}
  47. \label{sec:intro}
  48. Onion routing is a distributed overlay network designed to anonymize
  49. low-latency TCP-based applications such as web browsing, secure shell,
  50. and instant messaging. Users choose a path through the network and
  51. build a \emph{virtual circuit}, in which each node in the path knows its
  52. predecessor and successor, but no others. Traffic flowing down the circuit
  53. is sent in fixed-size \emph{cells}, which are unwrapped by a symmetric key
  54. at each node, revealing the downstream node. The original onion routing
  55. project published several design and analysis papers
  56. \cite{or-jsac98,or-discex00,or-ih96,or-pet02}. While there was briefly
  57. a network of about a dozen nodes at three widely distributed sites,
  58. the only long-running and publicly accessible
  59. implementation was a fragile proof-of-concept that ran on a single
  60. machine. Many critical design and deployment issues were never implemented,
  61. and the design has not been updated in several years.
  62. Here we describe Tor, a protocol for asynchronous, loosely
  63. federated onion routers that provides the following improvements over
  64. the old onion routing design:
  65. \begin{itemize}
  66. \item \textbf{Perfect forward secrecy:} The original onion routing
  67. design is vulnerable to a single hostile node recording traffic and later
  68. forcing successive nodes in the circuit to decrypt it. Rather than using
  69. onions to lay the circuits, Tor uses an incremental or \emph{telescoping}
  70. path-building design, where the initiator negotiates session keys with
  71. each successive hop in the circuit. Onion replay detection is no longer
  72. necessary, and the network as a whole is more reliable to boot, since
  73. the initiator knows which hop failed and can try extending to a new node.
  74. \item \textbf{Applications talk to the onion proxy via Socks:}
  75. The original onion routing design required a separate proxy for each
  76. supported application protocol, resulting in a lot of extra code (most
  77. of which was never written) and also meaning that a lot of TCP-based
  78. applications were not supported. Tor uses the unified and standard Socks
  79. \cite{socks4,socks5} interface, allowing us to support any TCP-based
  80. program without modification.
  81. \item \textbf{Many applications can share one circuit:} The original
  82. onion routing design built one circuit for each request. Aside from the
  83. performance issues of doing public key operations for every request, it
  84. also turns out that regular communications patterns mean building lots
  85. of circuits, which can endanger anonymity \cite{wright03}. [XXX Was this
  86. supposed to be Wright02 or Wright03. In any case I am hesitant to cite
  87. that work in this context. While the point is valid in general, that
  88. work is predicated on assumptions that I don't think typically apply
  89. to onion routing (whether old or new design).]
  90. Tor multiplexes many
  91. connections down each circuit, but still rotates the circuit periodically
  92. to avoid too much linkability.
  93. \item \textbf{No mixing or traffic shaping:} The original onion routing
  94. design called for full link padding both between onion routers and between
  95. onion proxies (that is, users) and onion routers \cite{or-jsac98}. The
  96. later analysis paper \cite{or-pet02} suggested \emph{traffic shaping}
  97. to provide similar protection but use less bandwidth, but did not go
  98. into detail. However, recent research \cite{econymics} and deployment
  99. experience \cite{freedom} indicate that this level of resource
  100. use is not practical or economical; and even full link padding is still
  101. vulnerable to active attacks \cite{defensive-dropping}. [XXX what is being
  102. referenced here, Dogan?]
  103. \item \textbf{Leaky pipes:} Through in-band signalling within the circuit,
  104. Tor initiators can direct traffic to nodes partway down the circuit. This
  105. allows for long-range padding to frustrate timing attacks at the initiator
  106. \cite{defensive-dropping}, but because circuits are used by more than
  107. one application, it also allows traffic to exit the circuit from the
  108. middle -- thus frustrating timing attacks based on observing exit points.
  109. %Or something like that. hm.
  110. \item \textbf{Congestion control:} Earlier anonymity designs do not
  111. address traffic bottlenecks. Unfortunately, typical approaches to load
  112. balancing and flow control in overlay networks involve inter-node control
  113. communication and global views of traffic. Our decentralized ack-based
  114. congestion control maintains reasonable anonymity while allowing nodes
  115. at the edges of the network to detect congestion or flooding attacks
  116. and send less data until the congestion subsides.
  117. \item \textbf{Directory servers:} Rather than attempting to flood
  118. link-state information through the network, which can be unreliable and
  119. open to partitioning attacks or outright deception, Tor takes a simplified
  120. view towards distributing link-state information. Certain more trusted
  121. onion routers also serve as directory servers; they provide signed
  122. \emph{directories} describing all routers they know about, and which
  123. are currently up. Users periodically download these directories via HTTP.
  124. \item \textbf{End-to-end integrity checking:} Without integrity checking
  125. on traffic going through the network, an onion router can change the
  126. contents of cells as they pass by, e.g. by redirecting a connection on
  127. the fly so it connects to a different webserver, or by tagging encrypted
  128. traffic and looking for traffic at the network edges that has been
  129. tagged \cite{minion-design}.
  130. \item \textbf{Robustness to node failure:} router twins
  131. \item \textbf{Exit policies:}
  132. Tor provides a consistent mechanism for each node to specify and
  133. advertise an exit policy.
  134. \item \textbf{Rendezvous points:}
  135. location-protected servers
  136. \end{itemize}
  137. We review previous work in Section \ref{sec:background}, describe
  138. our goals and assumptions in Section \ref{sec:assumptions},
  139. and then address the above list of improvements in Sections
  140. \ref{sec:design}-\ref{sec:maintaining-anonymity}. We then summarize
  141. how our design stands up to known attacks, and conclude with a list of
  142. open problems.
  143. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  144. \Section{Background and threat model}
  145. \label{sec:background}
  146. \SubSection{Related work}
  147. \label{sec:related-work}
  148. Modern anonymity designs date to Chaum's Mix-Net\cite{chaum-mix} design of
  149. 1981. Chaum proposed hiding sender-recipient connections by wrapping
  150. messages in several layers of public key cryptography, and relaying them
  151. through a path composed of Mix servers. Mix servers in turn decrypt, delay,
  152. and re-order messages, before relay them along the path towards their
  153. destinations.
  154. Subsequent relay-based anonymity designs have diverged in two
  155. principal directions. Some have attempted to maximize anonymity at
  156. the cost of introducing comparatively large and variable latencies,
  157. for example, Babel\cite{babel}, Mixmaster\cite{mixmaster-spec}, and
  158. Mixminion\cite{minion-design}. Because of this
  159. decision, such \emph{high-latency} networks are well-suited for anonymous
  160. email, but introduce too much lag for interactive tasks such as web browsing,
  161. internet chat, or SSH connections.
  162. Tor belongs to the second category: \emph{low-latency} designs that
  163. attempt to anonymize interactive network traffic. Because such
  164. traffic tends to involve a relatively large numbers of packets, it is
  165. difficult to prevent an attacker who can eavesdrop entry and exit
  166. points from correlating packets entering the anonymity network with
  167. packets leaving it. Although some work has been done to frustrate
  168. these attacks, most designs protect primarily against traffic analysis
  169. rather than traffic confirmation \cite{or-jsac98}. One can pad and
  170. limit communication to a constant rate or at least to control the
  171. variation in traffic shape. This can have prohibitive bandwidth costs
  172. and/or performance limitations. One can also use a cascade (fixed
  173. shared route) with a relatively fixed set of users. This assumes a
  174. degree of agreement and provides an easier target for an active
  175. attacker since the endpoints are generally known. However, a practical
  176. network with both of these features has been run for many years
  177. \cite{web-mix}.
  178. they still...
  179. [XXX go on to explain how the design choices implied in low-latency result in
  180. significantly different designs.]
  181. The simplest low-latency designs are single-hop proxies such as the
  182. Anonymizer \cite{anonymizer}, wherein a single trusted server removes
  183. identifying users' data before relaying it. These designs are easy to
  184. analyze, but require end-users to trust the anonymizing proxy.
  185. More complex are distributed-trust, channel-based anonymizing systems. In
  186. these designs, a user establishes one or more medium-term bidirectional
  187. end-to-end tunnels to exit servers, and uses those tunnels to deliver a
  188. number of low-latency packets to and from one or more destinations per
  189. tunnel. Establishing tunnels is comparatively expensive and typically
  190. requires public-key cryptography, whereas relaying packets along a tunnel is
  191. comparatively inexpensive. Because a tunnel crosses several servers, no
  192. single server can learn the user's communication partners.
  193. Systems such as earlier versions of Freedom and onion routing
  194. build the anonymous channel all at once (using an onion). Later
  195. designs of each of these build the channel in stages as does AnonNet
  196. \cite{anonnet}. Amongst other things, this makes perfect forward
  197. secrecy feasible.
  198. Some systems, such as Crowds \cite{crowds-tissec}, do not rely on the
  199. changing appearance of packets to hide the path; rather they employ
  200. mechanisms so that an intermediary cannot be sure when it is
  201. receiving/sending to the ultimate initiator. There is no public-key
  202. encryption needed for Crowds, but the responder and all data are
  203. visible to all nodes on the path so that anonymity of connection
  204. initiator depends on filtering all identifying information from the
  205. data stream. Crowds is also designed only for HTTP traffic.
  206. Hordes \cite{hordes-jcs} is based on Crowds but also uses multicast
  207. responses to hide the initiator. Some systems go even further
  208. requiring broadcast \cite{herbivore,p5} although tradeoffs are made to
  209. make this more practical. Both Herbivore and P5 are designed primarily
  210. for communication between communicating peers, although Herbivore
  211. permits external connections by requesting a peer to serve as a proxy.
  212. Allowing easy connections to nonparticipating responders or recipients
  213. is a practical requirement for many users, e.g., to visit
  214. nonparticipating Web sites or to send mail to nonparticipating
  215. recipients.
  216. Distributed-trust anonymizing systems differ in how they prevent attackers
  217. from controlling too many servers and thus compromising too many user paths.
  218. Some protocols rely on a centrally maintained set of well-known anonymizing
  219. servers. Others (such as Tarzan and MorphMix) allow unknown users to run
  220. servers, while using a limited resource (DHT space for Tarzan; IP space for
  221. MorphMix) to prevent an attacker from owning too much of the network.
  222. [XXX what else? What does (say) crowds do?]
  223. All of the above systems Several systems with varying design goals
  224. and capabilities but all of which require that communicants be
  225. intentionally participating are mentioned here.
  226. Some involve multicast or more to work
  227. herbivore
  228. There are also many systems which are intended for anonymous
  229. and/or censorship resistant file sharing. [XXX Should we list all these
  230. or just say it's out of scope for the paper?
  231. eternity, gnunet, freenet, freehaven, publius, tangler, taz/rewebber]
  232. [XXX Should we add a paragraph dividing servers by all-at-once approach to
  233. tunnel-building (OR1,Freedom1) versus piecemeal approach
  234. (OR2,Anonnet?,Freedom2) ?]
  235. Channel-based anonymizing systems also differ in their use of dummy traffic.
  236. [XXX]
  237. Finally, several systems provide low-latency anonymity without channel-based
  238. communication. Crowds and [XXX] provide anonymity for HTTP requests; [...]
  239. [XXX Mention error recovery?]
  240. Web-MIXes \cite{web-mix} (also known as the Java Anon Proxy or JAP)
  241. use a cascade architecture with relatively constant groups of users
  242. sending and receiving at a constant rate.
  243. Some, such as Crowds \cite{crowds-tissec}, do nothing against such
  244. confirmation but still make it difficult for nodes along a connection to
  245. perform timing confirmations that would more easily identify when
  246. the immediate predecessor is the initiator of a connection, which in
  247. Crowds would reveal both initiator and responder to the attacker.
  248. anonymizer
  249. pipenet
  250. freedom v1
  251. freedom v2
  252. onion routing v1
  253. isdn-mixes
  254. crowds
  255. real-time mixes, web mixes
  256. anonnet (marc rennhard's stuff)
  257. morphmix
  258. P5
  259. gnunet
  260. rewebbers
  261. tarzan
  262. herbivore
  263. hordes
  264. cebolla (?)
  265. [XXX Close by mentioning where Tor fits.]
  266. \SubSection{Our threat model}
  267. \label{subsec:threat-model}
  268. \SubSection{Known attacks against low-latency anonymity systems}
  269. \label{subsec:known-attacks}
  270. We discuss each of these attacks in more detail below, along with the
  271. aspects of the Tor design that provide defense. We provide a summary
  272. of the attacks and our defenses against them in Section \ref{sec:attacks}.
  273. Passive attacks:
  274. simple observation,
  275. timing correlation,
  276. size correlation,
  277. option distinguishability,
  278. Active attacks:
  279. key compromise,
  280. iterated subpoena,
  281. run recipient,
  282. run a hostile node,
  283. compromise entire path,
  284. selectively DOS servers,
  285. introduce timing into messages,
  286. directory attacks,
  287. tagging attacks
  288. \Section{Design goals and assumptions}
  289. \label{sec:assumptions}
  290. [XXX Perhaps the threat model belongs here.]
  291. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  292. \Section{The Tor Design}
  293. \label{sec:design}
  294. \Section{Other design decisions}
  295. \SubSection{Exit policies and abuse}
  296. \label{subsec:exitpolicies}
  297. \SubSection{Directory Servers}
  298. \label{subsec:dir-servers}
  299. \Section{Rendezvous points: pseudonyms with responder anonymity}
  300. \label{sec:rendezvous}
  301. \Section{Maintaining anonymity sets}
  302. \label{sec:maintaining-anonymity}
  303. \SubSection{Using a circuit many times}
  304. \label{subsec:many-messages}
  305. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  306. \Section{Attacks and Defenses}
  307. \label{sec:attacks}
  308. Below we summarize a variety of attacks and how well our design withstands
  309. them.
  310. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  311. \Section{Future Directions and Open Problems}
  312. \label{sec:conclusion}
  313. Tor brings together many innovations from many different projects into
  314. a unified deployable system. But there are still several attacks that
  315. work quite well, as well as a number of sustainability and run-time
  316. issues remaining to be ironed out. In particular:
  317. \begin{itemize}
  318. \item foo
  319. \end{itemize}
  320. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  321. \Section{Acknowledgments}
  322. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  323. \bibliographystyle{latex8}
  324. \bibliography{tor-design}
  325. \end{document}
  326. % Style guide:
  327. % U.S. spelling
  328. % avoid contractions (it's, can't, etc.)
  329. % 'mix', 'mixes' (as noun)
  330. % 'mix-net'
  331. % 'mix', 'mixing' (as verb)
  332. % 'Mixminion Project'
  333. % 'Mixminion' (meaning the protocol suite or the network)
  334. % 'Mixmaster' (meaning the protocol suite or the network)
  335. % 'middleman' [Not with a hyphen; the hyphen has been optional
  336. % since Middle English.]
  337. % 'nymserver'
  338. % 'Cypherpunk', 'Cypherpunks', 'Cypherpunk remailer'
  339. %
  340. % 'Whenever you are tempted to write 'Very', write 'Damn' instead, so
  341. % your editor will take it out for you.' -- Misquoted from Mark Twain