\documentclass{llncs}
\usepackage{url}
\usepackage{amsmath}
\usepackage{epsfig}
\newenvironment{tightlist}{\begin{list}{$\bullet$}{
\setlength{\itemsep}{0mm}
\setlength{\parsep}{0mm}
% \setlength{\labelsep}{0mm}
% \setlength{\labelwidth}{0mm}
% \setlength{\topsep}{0mm}
}}{\end{list}}
\begin{document}
\title{Challenges in practical low-latency stream anonymity (DRAFT)}
\author{Roger Dingledine and Nick Mathewson}
\institute{The Free Haven Project\\
\email{\{arma,nickm\}@freehaven.net}}
\maketitle
\pagestyle{empty}
\begin{abstract}
foo
\end{abstract}
\section{Introduction}
Tor is a low-latency anonymous communication overlay network
\cite{tor-design} designed to be practical and usable for securing TCP
streams over the Internet. We have been operating a publicly deployed
Tor network since October 2003.

Tor aims to resist observers and insiders by distributing each transaction
over several nodes in the network. This ``distributed trust'' approach
means the Tor network can be safely operated and used by a wide variety
of mutually distrustful users, providing more sustainability and security
than previous attempts at anonymizing networks.

The Tor network has a broad range of users, including ordinary citizens
who want to avoid being profiled for targeted advertisements, corporations
that don't want to reveal information to their competitors, and law
enforcement and government intelligence agencies that need
to conduct operations on the Internet without being noticed.

Tor has been funded by the U.S. Navy, for use in securing government
communications, and also by the Electronic Frontier Foundation, for use
in maintaining civil liberties for ordinary citizens online. The Tor
protocol is one of the leading choices
to be the anonymizing layer in the European Union's PRIME directive to
help maintain privacy in Europe. The University of Dresden in Germany
has integrated an independent implementation of the Tor protocol into
its popular Java Anon Proxy anonymizing client. This wide variety of
interests helps maintain both the stability and the security of the
network.

Tor has a weaker threat model than many anonymity designs in the
literature. This is because our primary requirement is to have a
practical and useful network; from there, we aim to provide as much
anonymity as we can.
%need to discuss how we take the approach of building the thing, and then
%assuming that, how much anonymity can we get. we're not here to model or
%to simulate or to produce equations and formulae. but those have their
%roles too.

This paper aims to give the reader enough information to understand the
technical and policy issues that Tor faces as we continue deployment,
and to lay out a research agenda for others to help address some of
these issues. Section \ref{sec:what-is-tor} gives an overview of the Tor
design and our goals. In Section \ref{sec:related} we describe
Tor's context in the anonymity space. Sections \ref{sec:crossroads-policy}
and \ref{sec:crossroads-design} describe the practical challenges,
policy and technical respectively, that stand in the way of moving
from a practical, useful network to a practical, useful anonymous network.

\section{What Is Tor}
\label{sec:what-is-tor}
\subsection{Distributed trust: safety in numbers}
Tor provides \emph{forward privacy}, so that users can connect to
Internet sites without revealing their logical or physical locations
to those sites or to observers. It also provides \emph{location-hidden
services}, so that critical servers can support authorized users without
giving adversaries an effective vector for physical or online attacks.
Our design provides this protection even when a portion of its own
infrastructure is controlled by an adversary.

To make private connections in Tor, users incrementally build a path or
\emph{circuit} of encrypted connections through servers on the network,
extending it one step at a time so that each server in the circuit only
learns which server extended to it and which server it has been asked
to extend to. The client negotiates a separate set of encryption keys
for each step along the circuit.
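
Negotiating a separate key with each server lets the client wrap its data
in one layer of encryption per hop, with each server able to remove only
its own layer. The Python sketch below illustrates that layering as a toy:
it is not Tor's cell format or cipher. The SHA-256 counter-mode keystream
is an assumption standing in for a real stream cipher, and the function
names are hypothetical.

\begin{verbatim}
```python
import hashlib
import os


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || counter) blocks. A stand-in, not a real cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one layer: XOR with the same keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def client_wrap(hop_keys: list, payload: bytes) -> bytes:
    """Apply the exit's layer first, so the entry's layer ends up outermost."""
    for key in reversed(hop_keys):
        payload = xor_layer(key, payload)
    return payload


# One key per hop, as if negotiated separately with entry, middle, and exit.
hop_keys = [os.urandom(16) for _ in range(3)]
cell = client_wrap(hop_keys, b"GET / HTTP/1.0")
for key in hop_keys:          # each server, in path order, peels its own layer
    cell = xor_layer(key, cell)
print(cell)                   # b'GET / HTTP/1.0'
```
\end{verbatim}

Because XOR layers commute, this toy does not enforce the peeling order,
and reusing one keystream for many cells, as it would, is insecure; it
also omits integrity protection entirely.
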

Once a circuit has been established, the client software waits for
applications to request TCP connections, and directs these application
streams along the circuit. Many streams can be multiplexed along a single
circuit, so applications don't need to wait for keys to be negotiated
every time they open a connection. Because each server sees no
more than one end of the connection, a local eavesdropper or a compromised
server cannot use traffic analysis to link the connection's source and
destination. The Tor client software rotates circuits periodically
to prevent long-term linkability between different actions by a
single user.
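
The bookkeeping described in this paragraph can be sketched as follows: the
client reuses its current circuit for each new stream, numbering streams
per circuit, and builds a fresh circuit once the current one is older than
some rotation period. The class names, the global circuit counter, and the
600-second period are illustrative assumptions, not values from Tor's
specification.

\begin{verbatim}
```python
import itertools
from typing import Optional


class Circuit:
    _ids = itertools.count(1)      # illustrative global circuit numbering

    def __init__(self, created_at: float):
        self.circ_id = next(Circuit._ids)
        self.created_at = created_at
        self.streams = {}          # stream id -> destination

    def attach(self, destination: str) -> int:
        """Multiplex a new application stream onto this circuit."""
        stream_id = len(self.streams) + 1
        self.streams[stream_id] = destination
        return stream_id


class Client:
    def __init__(self, max_circuit_age: float):
        self.max_circuit_age = max_circuit_age   # hypothetical rotation period
        self.circuit: Optional[Circuit] = None

    def open_stream(self, destination: str, now: float) -> tuple:
        # Reuse the current circuit (no new key negotiation) unless it is too
        # old; rotating limits how many actions share one circuit.
        if self.circuit is None or now - self.circuit.created_at >= self.max_circuit_age:
            self.circuit = Circuit(created_at=now)
        return self.circuit.circ_id, self.circuit.attach(destination)


client = Client(max_circuit_age=600.0)
a = client.open_stream("example.com:80", now=0.0)
b = client.open_stream("example.org:443", now=30.0)
c = client.open_stream("example.net:22", now=700.0)
print(a, b, c)                     # (1, 1) (1, 2) (2, 1)
```
\end{verbatim}

Rotating on a timer is only one possible policy; the point of the sketch
is that key negotiation happens once per circuit rather than once per
stream.
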

Tor differs from other deployed systems for resisting traffic analysis
in its security and flexibility. Mix networks such as Mixmaster or its
successor Mixminion \cite{minion-design}
gain the highest degrees of anonymity at the expense of introducing highly
variable delays, thus making them unsuitable for applications such as web
browsing that require quick response times. Commercial single-hop proxies
such as {\url{anonymizer.com}} present a single point of failure, where
a single compromise can expose all users' traffic, and a single-point
eavesdropper can perform traffic analysis on the entire network.
Also, their proprietary implementations place any infrastructure that
depends on these single-hop solutions at the mercy of their providers'
financial health. Tor can handle any TCP-based protocol, such as web
browsing, instant messaging and chat, and secure shell login; and it is
the only implemented anonymizing design with an integrated system for
secure location-hidden services.

No organization can achieve this security on its own. If a single
corporation or government agency were to build a private network to
protect its operations, any connections entering or leaving that network
would be obviously linkable to the controlling organization. The members
and operations of that agency would be easier, not harder, to distinguish.
Instead, to protect our networks from traffic analysis, we must
collaboratively blend the traffic from many organizations and private
citizens, so that an eavesdropper can't tell which users are which,
or who is looking for what information. By bringing more users onto
the network, all users become more secure \cite{econymics}.
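
One way to make this claim concrete (a common metric from the anonymity
literature, used here only as an illustration) is to measure a user's
anonymity as the entropy of the adversary's probability distribution
$p_1, \ldots, p_N$ over the $N$ users who might have initiated a given
connection:
\begin{equation*}
H = -\sum_{i=1}^{N} p_i \log_2 p_i \;\le\; \log_2 N,
\end{equation*}
with equality exactly when all $N$ users are equally likely. Under that
idealized assumption, doubling the number of indistinguishable users adds
one bit of anonymity for everyone, since $\log_2(2N) = \log_2 N + 1$; in
practice the $p_i$ are rarely uniform, so $H$ falls short of this bound.
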

Naturally, organizations will not want to depend on others for their
security. If most participating providers are reliable, Tor tolerates
some hostile infiltration of the network. For maximum protection,
the Tor design includes an enclave approach that lets data be encrypted
(and authenticated) end-to-end, so high-sensitivity users can be sure it
hasn't been read or modified. This even works for Internet services that
don't have built-in encryption and authentication, such as unencrypted
HTTP or chat, and it requires no modification of those services to do so.

Weasel's graph of the number of nodes and of total bandwidth over time,
ideally from week 0.

Tor has the following goals, and we made the following assumptions when
designing it.

\section{Tor's position in the anonymity field}
\label{sec:related}
There are many other classes of systems: single-hop proxies, open proxies,
JAP, Mixminion, flash mixes, Freenet, I2P, MUTE/ANTs/etc., Tarzan,
MorphMix, Freedom. Give brief descriptions and brief characterizations
of how we differ. This is not the breakthrough material, and we only have
a page or two for it.

Have a serious discussion of MorphMix's assumptions, since it would
seem to be the direct competition. In fact, Tor is a flexible architecture
that would encompass MorphMix, and the two are nearly identical except for
path selection and node discovery. The trust system MorphMix has
seems overkill (and/or insecure) given the threat model we've picked.

\section{Crossroads: Policy issues}
\label{sec:crossroads-policy}
BitTorrent and the DMCA. Should we add an IDS to autodetect protocols and
snipe them? Takedowns, EFnet abuse, Wikipedia complaints, and IRC
networks. Should we allow revocation of anonymity if a threshold of
servers wants to?

Image: substantial non-infringing uses. Image is a security parameter,
since it impacts user base and perceived sustainability.
Good uses are kept private, while bad uses are publicized. This is not good.

Sustainability. Previous attempts have been commercial, which we think
adds a lot of unnecessary complexity and accountability. Freedom didn't
collect enough money to pay its servers; JAP's bandwidth is supported by
continued funding, and they periodically ask what they will do when it
dries up.

How much should Tor aim to do? Applications that leak data. We can say
they're not our problem, but they're somebody's problem.

Logging. How do we make logs unrevealing? It is a happy coincidence that
verbose logging is our \#2 performance bottleneck. Is there a way to detect
modified servers, or to have them volunteer the information that they're
logging verbosely? Would that actually solve any attacks?
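
The first half of this question, making our own logs less revealing, has
a simple partial answer: scrub addresses before a line is ever written.
The Python sketch below replaces IPv4 addresses with a placeholder; the
regular expression, the \texttt{[scrubbed]} placeholder, and the function
name are illustrative assumptions. It does nothing about modified servers
that choose to log verbosely, which is the harder question.

\begin{verbatim}
```python
import re

# Dotted-quad IPv4 addresses (simplified: does not check that octets are <= 255).
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def scrub(line: str, placeholder: str = "[scrubbed]") -> str:
    """Replace every IPv4 address in a log line before it reaches disk."""
    return IPV4.sub(placeholder, line)


print(scrub("Accepted connection from 192.0.2.7 to 198.51.100.23:9001"))
# Accepted connection from [scrubbed] to [scrubbed]:9001
```
\end{verbatim}

IPv6 addresses and hostnames would need similar treatment.
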

\section{Crossroads: Scaling and Design choices}
\label{sec:crossroads-design}
\subsection{Transporting the stream vs transporting the packets}
We periodically run into ZKS people who tell us that the process of
anonymizing IPs should ``obviously'' be done at the IP layer. Here are
the issues that need to be resolved before we'll be ready to switch Tor
over to arbitrary IP traffic.
\begin{enumerate}
\item We still need to do IP-level packet normalization, to stop things
like IP fingerprinting. This is doable.
\item We still need to be easy to integrate with user-level applications,
so they can do application-level scrubbing. So we will still need
application-specific proxies.
\item We need a block-level encryption approach that can provide security
despite packet loss and out-of-order delivery. Freedom allegedly had one,
but it was never publicly specified. (We also believe that the Freedom and
Cebolla designs are vulnerable to tagging attacks.)
\item We still need to tune parameters for throughput, congestion control,
and so on, since we need sequence numbers (and maybe more) to do replay
detection, and just to handle duplicate frames. So we would be
reimplementing some subset of TCP anyway.
\item TLS over UDP is not implemented or even specified.
\item Exit policies over arbitrary IP packets seem to be an IDS-hard
problem. I don't want to build an IDS into Tor.
\item Certain protocols are going to leak information at the IP layer
anyway. For example, if we anonymize your DNS requests, but they still go
to Comcast's DNS servers, that's bad.
\item Hidden services, .exit addresses, and so on break unless we have some
way to reach into the application-level protocol and determine the hostname
it's trying to reach.
\end{enumerate}
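
Item 4 is where the reimplementation cost shows up. Even the simplest
piece, rejecting duplicated or replayed frames when frames may also arrive
late or out of order, needs per-flow state. The Python sketch below uses a
sliding bitmap window in the style of the IPsec anti-replay window; the
class name and the 64-frame window are illustrative assumptions, not part
of any Tor design.

\begin{verbatim}
```python
class ReplayWindow:
    """Accept each sequence number at most once, tolerating reordering
    within the last `size` sequence numbers."""

    def __init__(self, size: int = 64):
        self.size = size
        self.highest = -1     # highest sequence number accepted so far
        self.seen = 0         # bit i set => sequence number (highest - i) was accepted

    def accept(self, seq: int) -> bool:
        if seq > self.highest:
            # Slide the window forward; bits shifted past `size` are forgotten.
            shift = seq - self.highest
            self.seen = ((self.seen << shift) | 1) & ((1 << self.size) - 1)
            self.highest = seq
            return True
        offset = self.highest - seq
        if offset >= self.size:
            return False      # too old to check: reject conservatively
        if self.seen & (1 << offset):
            return False      # duplicate or replay
        self.seen |= 1 << offset
        return True           # late but new


w = ReplayWindow(size=64)
print([w.accept(s) for s in [0, 2, 1, 1, 0, 3]])
# [True, True, True, False, False, True]
```
\end{verbatim}

This covers only replay and duplicate detection; retransmission and
congestion control, the rest of the TCP subset mentioned above, are not
addressed.
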

\subsection{Mid-latency}
Can we use traffic shaping to get any defense against George's
PET 2004 paper? Will padding or long-range dummies do anything then? Will
it kill the user base, or can we get both approaches to play well together?
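
To make the question concrete, the simplest form of traffic shaping sends
cells at a constant rate, filling empty slots with dummy cells, so an
observer sees the same pattern regardless of when the application produced
data. The Python sketch below shows the mechanism and its two costs: real
cells wait in a queue (added latency) and dummy cells consume bandwidth.
Whether anything like this defends against the attack in George's paper is
exactly the open question; the one-cell-per-tick schedule and the names
are illustrative assumptions.

\begin{verbatim}
```python
from collections import deque

DUMMY = "dummy"   # on the wire, cells are fixed-size and encrypted, so a dummy
                  # would look the same as a real cell to an observer


def shape(arrivals: dict, ticks: int) -> list:
    """Send exactly one cell per tick: the oldest queued real cell, else a dummy.

    `arrivals` maps a tick number to the list of real cells produced at that tick.
    """
    queue = deque()
    sent = []
    for t in range(ticks):
        queue.extend(arrivals.get(t, []))
        sent.append(queue.popleft() if queue else DUMMY)
    return sent


print(shape({0: ["a"], 1: ["b", "c"]}, ticks=5))
# ['a', 'b', 'c', 'dummy', 'dummy']
```
\end{verbatim}
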

%\subsection{The DNS problem in practice}
\subsection{Measuring performance and capacity}
How can we measure performance without letting people selectively deny
service by distinguishing pings? Indeed, how can we measure performance at
all? In practice, people have funny firewalls that don't match up to their
exit policies, and Tor doesn't deal with this.
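
Part of the mismatch is that an exit policy is only what a server
advertises: an ordered list of accept and reject rules, where the first
rule matching a destination address and port decides, which clients
consult when choosing an exit. A local firewall can still block a
connection the policy allows, and nothing in the policy reveals that. The
Python sketch below evaluates such a policy; it is simplified (exact
addresses or \texttt{*} only, no netmasks), and rejecting when no rule
matches is an assumption of the sketch rather than Tor's documented
default.

\begin{verbatim}
```python
def parse_rule(rule: str) -> tuple:
    """Parse 'accept|reject ADDR:PORT', where PORT is '*', 'N', or 'N-M'."""
    action, target = rule.split()
    addr, port = target.rsplit(":", 1)
    if port == "*":
        lo, hi = 1, 65535
    elif "-" in port:
        lo, hi = (int(p) for p in port.split("-"))
    else:
        lo = hi = int(port)
    return action == "accept", addr, lo, hi


def exit_allowed(policy: list, addr: str, port: int) -> bool:
    """First matching rule wins; if nothing matches, this sketch rejects."""
    for rule in policy:
        allow, rule_addr, lo, hi = parse_rule(rule)
        if rule_addr in ("*", addr) and lo <= port <= hi:
            return allow
    return False


policy = ["reject *:25", "accept *:80", "accept *:443", "reject *:*"]
print(exit_allowed(policy, "192.0.2.10", 80))   # True
print(exit_allowed(policy, "192.0.2.10", 25))   # False
```
\end{verbatim}
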

Network investigation: is all this bandwidth publishing a good idea?
How can we collect statistics better? Note weasel's smokeping, at
\url{http://seppia.noreply.org/cgi-bin/smokeping.cgi?target=Tor},
which probably gives George and Steven enough information to break Tor?

\subsection{Plausible deniability}
Does running a server help you or harm you? George's Oakland attack.
Plausible deniability, without even running your traffic through Tor! We
have to pick the path length so that the adversary can't distinguish a
client from a server (how many hops is good?).

\subsection{Helper nodes}
When does fixing your entry or exit node help you?
Helper nodes in the literature don't deal with churn, and
especially not with active attacks to induce churn.
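
A back-of-the-envelope model shows both why fixing the entry node can help
and why churn undermines it. Assume, as an idealization, that a fraction
$f$ of candidate entry nodes is hostile and that every entry choice is
independent and uniform. A client that picks a fresh entry for each of $k$
circuits is eventually seen by a hostile entry with probability
$1-(1-f)^k$, which approaches 1 as $k$ grows. A client with $h$ fixed
helpers has exposure $1-(1-f)^h$, but each time churn, or an attacker
knocking helpers offline, forces it to redraw its helper set, the exponent
grows again. The Python sketch below computes this; all parameter values
are hypothetical.

\begin{verbatim}
```python
def p_exposed(f: float, draws: int) -> float:
    """Probability that at least one of `draws` independent, uniform entry
    choices lands on a hostile node, when a fraction f of nodes is hostile."""
    return 1 - (1 - f) ** draws


f = 0.05          # hypothetical fraction of hostile entry nodes
circuits = 100    # hypothetical number of circuits a client builds over time
helpers = 3       # hypothetical helper-set size

print("fresh entry per circuit:", p_exposed(f, circuits))
for redraws in (0, 10, 30):
    # Each redraw (natural churn or induced failure) picks a whole new helper set.
    print("helpers,", redraws, "redraws:", p_exposed(f, helpers * (redraws + 1)))
```
\end{verbatim}

The model ignores bandwidth-weighted selection and the fact that a hostile
helper sees only the circuits that pass through it.
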

Do general DoS attacks have anonymity implications? See, e.g., Adam
Back's IH paper, but I think there's more to be pointed out here.

\subsection{Location-hidden services}
Survivable services are new in practice, yes? Hidden services seem
less hidden than we'd like, since they stay in one place and get used
a lot. They're the epitome of the need for helper nodes. This means
that using Tor as a building block for Free Haven is going to be really
hard. Also, they're brittle in terms of intersection and observation
attacks. It would be nice to have hot-swap services, but they are hard
to design.
%\section{Crossroads: Scaling}
%\label{sec:crossroads-scaling}
%P2P + anonymity issues:

Incentives. Copy the page I wrote for the NSF proposal, and maybe extend
it if we're feeling smart.

Usability: the FC03 paper was great, except that the lower your latency,
the less useful it seems to be.

A Tor GUI: note how JAP's GUI is nice but does not reflect the security
they provide.

Public perception, and thus advertising, is a security parameter.

Peer-to-peer / practical issues:

Network discovery, Sybil attacks, node admission, scaling. It seems that
the code will ship with something, and that's our trust root. We could try
to get people to build a web of trust, but no. Where we go from here
depends on what threats we have in mind: really decentralized if your
threat is the RIAA; less so if the threat is to application data or
individuals or\ldots

Making use of servers with little bandwidth. How do we handle hammering by
certain applications?
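
One common way to make use of low-bandwidth servers without letting them
become bottlenecks is to choose each hop with probability proportional to
the server's advertised bandwidth, so a slow server carries a
correspondingly small share of the load. The Python sketch below does this
with the standard library's \texttt{random.choices}; relay names and
bandwidth figures are hypothetical. Note that it trusts self-reported
bandwidth, which is exactly the concern about bandwidth publishing raised
earlier in this section.

\begin{verbatim}
```python
import random


def choose_relay(relays: dict, rng: random.Random) -> str:
    """Pick one relay with probability proportional to its advertised bandwidth."""
    names = list(relays)
    return rng.choices(names, weights=[relays[n] for n in names], k=1)[0]


relays = {"fast": 1000.0, "medium": 250.0, "slow": 50.0}   # hypothetical KB/s
rng = random.Random(1)                                      # seeded for repeatability
picks = [choose_relay(relays, rng) for _ in range(10000)]
print({name: picks.count(name) for name in relays})
# Expect roughly 77% / 19% / 4% of picks, matching the bandwidth shares.
```
\end{verbatim}
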

Handling servers that are far away from the rest of the network, e.g.\ on
continents other than North America and Europe: high latency, often high
packet loss.

Running Tor servers behind NATs, behind great-firewalls-of-China, etc.

Restricted routes. How do we propagate the topology to everybody?
BGP-style routing doesn't work because we don't want just \emph{one} path.
Point to Geoff's stuff.
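
To illustrate why a single best route is not what we want: if clients knew
which servers can reach which others, they could pick a random path through
that graph, so that different circuits use different routes. The Python
sketch below does a random walk over a known link map; the topology is
hypothetical, and the sketch assumes the very thing this paragraph asks how
to achieve, namely that every client already knows the links.

\begin{verbatim}
```python
import random
from typing import Optional


def random_path(links: dict, start: str, hops: int,
                rng: random.Random) -> Optional[list]:
    """Random walk of `hops` relays from `start`, never revisiting a relay.
    Returns None if the walk reaches a dead end in the restricted topology."""
    path = [start]
    while len(path) < hops:
        options = sorted(links.get(path[-1], set()) - set(path))
        if not options:
            return None
        path.append(rng.choice(options))
    return path


# Hypothetical topology: which relays can open connections to which.
links = {"A": {"B", "C"}, "B": {"A", "C", "D"}, "C": {"A", "B", "D"}, "D": {"B", "C"}}
rng = random.Random(0)
found = {tuple(random_path(links, "A", 3, rng)) for _ in range(200)}
print(sorted(found))
# [('A', 'B', 'C'), ('A', 'B', 'D'), ('A', 'C', 'B'), ('A', 'C', 'D')]
```
\end{verbatim}
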

Routing zones. It seems that our threat model comes down to diversity and
dispersal. But it is hard for Alice to know how to act. Many questions
remain.
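
One crude way Alice could act on ``diversity and dispersal'' is to refuse
paths that use two servers from the same zone. The Python sketch below
treats the /16 network containing a server's address as its zone; that
choice of zone is purely an assumption for illustration, since what the
right zones are (networks, autonomous systems, jurisdictions) is one of the
open questions.

\begin{verbatim}
```python
import ipaddress
import random


def zone(addr: str, prefix: int = 16) -> ipaddress.IPv4Network:
    """Hypothetical zone: the /prefix network containing the address."""
    return ipaddress.ip_network(f"{addr}/{prefix}", strict=False)


def diverse_path(relays: list, hops: int, rng: random.Random) -> list:
    """Pick `hops` relays in random order, using at most one relay per zone."""
    pool = relays[:]
    rng.shuffle(pool)
    path, used = [], set()
    for addr in pool:
        if zone(addr) not in used:
            path.append(addr)
            used.add(zone(addr))
            if len(path) == hops:
                return path
    raise ValueError("not enough distinct zones for the requested path length")


relays = ["192.0.2.7", "192.0.2.9", "198.51.100.4", "203.0.113.5", "203.0.113.77"]
path = diverse_path(relays, 3, random.Random(0))
print(path, [str(zone(a)) for a in path])
```
\end{verbatim}
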

The China problem. We have lots of users in Iran and similar countries (we
stopped logging, so it's hard to know now, but there are many Persian sites
on how to use Tor), and they seem to be doing OK. But the China problem is
bigger. Cite Stefan's paper, and talk about how we need to route through
clients, and how maybe we should start with a time-release IP publishing
system plus an Advogato-based reputation system, to bound the number of IPs
leaked to the adversary.
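
The bound we want can be illustrated as follows. Suppose addresses are
released in batches over time, and each requester (assumed here to be an
identity already admitted by the reputation system, which the sketch does
not model) deterministically receives the same $k$ released addresses for a
given epoch, no matter how often it asks. Then an adversary controlling $m$
admitted identities learns at most $mk$ addresses per epoch. The Python
sketch below is a hypothetical construction that ranks addresses by a hash
of requester, epoch, and address; it is not a specified Tor mechanism.

\begin{verbatim}
```python
import hashlib


def released_by(epoch: int, pool: list, per_epoch: int) -> list:
    """Time-release: only the first per_epoch * (epoch + 1) addresses are public."""
    return pool[: per_epoch * (epoch + 1)]


def addresses_for(requester: str, epoch: int, released: list, k: int = 2) -> list:
    """Give `requester` the same k released addresses every time it asks
    within one epoch, by ranking addresses on a hash of (requester, epoch, addr)."""
    def rank(addr: str) -> bytes:
        return hashlib.sha256(f"{requester}|{epoch}|{addr}".encode()).digest()
    return sorted(released, key=rank)[:k]


pool = [f"192.0.2.{i}" for i in range(1, 21)]      # hypothetical client addresses
released = released_by(0, pool, per_epoch=5)
adversary_ids = ["sybil-1", "sybil-2", "sybil-3"]  # m = 3 admitted identities
leaked = set()
for who in adversary_ids:
    for _ in range(10):                            # asking repeatedly does not help
        leaked.update(addresses_for(who, 0, released))
print(len(leaked) <= 2 * len(adversary_ids))       # True
```
\end{verbatim}
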

\section{The Future}
\label{sec:conclusion}
\bibliographystyle{plain} \bibliography{tor-design}
\end{document}