- \documentclass{llncs}
- \usepackage{url}
- \usepackage{amsmath}
- \usepackage{epsfig}
- \newenvironment{tightlist}{\begin{list}{$\bullet$}{%
- \setlength{\itemsep}{0mm}%
- \setlength{\parsep}{0mm}%
- }}{\end{list}}
- \begin{document}
- \title{Challenges in practical low-latency stream anonymity (DRAFT)}
- \author{Roger Dingledine and Nick Mathewson}
- \institute{The Free Haven Project\\
- \email{\{arma,nickm\}@freehaven.net}}
- \maketitle
- \pagestyle{empty}
- \begin{abstract}
- Tor is a low-latency anonymous communication overlay network designed to be
- practical and usable for securing TCP streams over the Internet, and it has
- been publicly deployed since October 2003. We describe the policy and
- technical challenges that stand in the way of moving from a practical, useful
- network to a practical, useful, anonymous network, and we lay out a research
- agenda for addressing them.
- \end{abstract}
- \section{Introduction}
- Tor is a low-latency anonymous communication overlay network
- \cite{tor-design} designed to be practical and usable for securing TCP
- streams over the Internet. We have been operating a publicly deployed
- Tor network since October 2003.
- Tor aims to resist observers and insiders by distributing each transaction
- over several nodes in the network. This ``distributed trust'' approach
- means the Tor network can be safely operated and used by a wide variety
- of mutually distrustful users, providing more sustainability and security
- than previous attempts at anonymizing networks.
- The Tor network has a broad range of users, including ordinary citizens
- who want to avoid being profiled for targeted advertisements, corporations
- who don't want to reveal information to their competitors, and law
- enforcement and government intelligence agencies who need
- to do operations on the Internet without being noticed.
- Tor has been funded by the U.S. Navy, for use in securing government
- communications, and also by the Electronic Frontier Foundation, for use
- in maintaining civil liberties for ordinary citizens online. The Tor
- protocol is one of the leading choices
- to be the anonymizing layer in the European Union's PRIME directive to
- help maintain privacy in Europe. The University of Dresden in Germany
- has integrated an independent implementation of the Tor protocol into
- their popular Java Anon Proxy anonymizing client. This wide variety of
- interests helps maintain both the stability and the security of the
- network.
- Tor has a weaker threat model than many anonymity designs in the
- literature, because our primary requirement is to have a
- practical and useful network; from there we aim to provide as much
- anonymity as we can.
- This paper aims to give the reader enough information to understand the
- technical and policy issues that Tor faces as we continue deployment,
- and to lay a research agenda for others to help in addressing some of
- these issues. Section \ref{sec:what-is-tor} gives an overview of the Tor
- design and our goals. We go on in Section \ref{sec:related} to describe
- Tor's context in the anonymity space. Sections \ref{sec:crossroads-policy}
- and \ref{sec:crossroads-design} describe the practical challenges,
- both policy and technical respectively, that stand in the way of moving
- from a practical, useful network to a practical, useful, anonymous network.
- \section{What Is Tor}
- \label{sec:what-is-tor}
- \subsection{Distributed trust: safety in numbers}
- Tor provides \emph{forward privacy}, so that users can connect to
- Internet sites without revealing their logical or physical locations
- to those sites or to observers. It also provides \emph{location-hidden
- services}, so that critical servers can support authorized users without
- giving adversaries an effective vector for physical or online attacks.
- Our design provides this protection even when a portion of its own
- infrastructure is controlled by an adversary.
- To make private connections in Tor, users incrementally build a path or
- \emph{circuit} of encrypted connections through servers on the network,
- extending it one step at a time so that each server in the circuit only
- learns which server extended to it and which server it has been asked
- to extend to. The client negotiates a separate set of encryption keys
- for each step along the circuit.
- Once a circuit has been established, the client software waits for
- applications to request TCP connections, and directs these application
- streams along the circuit. Many streams can be multiplexed along a single
- circuit, so applications don't need to wait for keys to be negotiated
- every time they open a connection. Because each server sees no
- more than one end of the connection, a local eavesdropper or a compromised
- server cannot use traffic analysis to link the connection's source and
- destination. The Tor client software rotates circuits periodically
- to prevent long-term linkability between different actions by a
- single user.
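- The layered encryption that makes this work can be illustrated with a short
- sketch. The following Python fragment is a minimal illustration of our own,
- not Tor's implementation: it borrows an off-the-shelf symmetric cipher
- (Fernet) and skips the key-negotiation handshake, fixed-size cells, and
- integrity checks that the real protocol uses.
- \begin{verbatim}
# Minimal sketch of layered ("onion") encryption along a circuit.
# Toy version: real Tor uses keys negotiated via a DH handshake,
# fixed-size cells, and integrity checking, none of which appear here.
from cryptography.fernet import Fernet


class Hop:
    """One relay in the circuit; it knows only its own key."""
    def __init__(self):
        self.key = Fernet.generate_key()
        self.cipher = Fernet(self.key)

    def unwrap(self, cell: bytes) -> bytes:
        # Each relay removes exactly one layer of encryption.
        return self.cipher.decrypt(cell)


class Client:
    """Holds one key per hop (here simply copied; the real client
    negotiates a separate key with each hop as the circuit extends)."""
    def __init__(self, hops):
        self.hops = hops

    def wrap(self, payload: bytes) -> bytes:
        # Encrypt for the last hop first and work backwards, so the
        # first hop's layer ends up outermost.
        cell = payload
        for hop in reversed(self.hops):
            cell = Fernet(hop.key).encrypt(cell)
        return cell


# A three-hop circuit: the entry hop sees the client but not the payload,
# the exit sees the payload but not the client.
circuit = [Hop(), Hop(), Hop()]
cell = Client(circuit).wrap(b"GET / HTTP/1.0\r\n\r\n")
for hop in circuit:
    cell = hop.unwrap(cell)
assert cell == b"GET / HTTP/1.0\r\n\r\n"
- \end{verbatim}
- Each relay can strip exactly one layer, so it learns only its predecessor and
- successor; only the client ever holds all of the keys.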
- Tor differs from other deployed systems for traffic analysis resistance
- in its security and flexibility. Mix networks such as Mixmaster or its
- successor Mixminion \cite{minion-design}
- gain the highest degrees of anonymity at the expense of introducing highly
- variable delays, thus making them unsuitable for applications such as web
- browsing that require quick response times. Commercial single-hop proxies
- such as {\url{anonymizer.com}} present a single point of failure, where
- a single compromise can expose all users' traffic, and a single-point
- eavesdropper can perform traffic analysis on the entire network.
- Also, their proprietary implementations place any infrastructure that
- depends on these single-hop solutions at the mercy of their providers'
- financial health. Tor can handle any TCP-based protocol, such as web
- browsing, instant messaging and chat, and secure shell login; and it is
- the only implemented anonymizing design with an integrated system for
- secure location-hidden services.
- No organization can achieve this security on its own. If a single
- corporation or government agency were to build a private network to
- protect its operations, any connections entering or leaving that network
- would be obviously linkable to the controlling organization. The members
- and operations of that agency would be easier, not harder, to distinguish.
- Instead, to protect our networks from traffic analysis, we must
- collaboratively blend the traffic from many organizations and private
- citizens, so that an eavesdropper can't tell which users are which,
- and who is looking for what information. By bringing more users onto
- the network, all users become more secure \cite{econymics}.
- Naturally, organizations will not want to depend on others for their
- security. If most participating providers are reliable, Tor tolerates
- some hostile infiltration of the network. For maximum protection,
- the Tor design includes an enclave approach that lets data be encrypted
- (and authenticated) end-to-end, so high-sensitivity users can be sure it
- hasn't been read or modified. This even works for Internet services that
- don't have built-in encryption and authentication, such as unencrypted
- HTTP or chat, and it requires no modification of those services to do so.
- We should include weasel's graph of the number of nodes and of total
- bandwidth, ideally from week 0, and then lay out Tor's goals and the
- assumptions we made when designing it.
- \section{Tor's position in the anonymity field}
- \label{sec:related}
- There are many other classes of systems: single-hop proxies, open proxies,
- JAP, Mixminion, flash mixes, Freenet, I2P, MUTE/ANts and the like, Tarzan,
- MorphMix, and Freedom. We give brief descriptions and brief characterizations
- of how we differ. This is not the breakthrough material, and we only have
- a page or two for it.
- We should have a serious discussion of MorphMix's assumptions, since it would
- seem to be the direct competition. In fact Tor is a flexible architecture
- that could encompass MorphMix; the two are nearly identical except for
- path selection and node discovery, and the trust system MorphMix has
- seems like overkill (and/or insecure) given the threat model we've picked.
- \section{Threat model}
- We discuss $\frac{c^2}{n^2}$, except that in practice the chance of owning
- the last hop is not $c/n$, since that does not take the destination (website)
- into account. So in cases where the adversary does not also control the
- final destination we are in good shape, but if he \emph{does}, then we would
- be better off with a system that lets each hop choose a path.
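- Concretely (our own illustration of the figure above): if the adversary
- controls $c$ of the $n$ nodes and path positions are chosen uniformly at
- random, then
- \[
-   \Pr[\text{first and last hop both compromised}]
-     \approx \left(\frac{c}{n}\right)^{2} = \frac{c^2}{n^2},
- \]
- whereas an adversary who also controls the destination only needs to own the
- first hop, so his success probability rises to roughly $c/n$.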
- In practice Tor's threat model is based entirely on the goal of dispersal
- and diversity. George and Steven describe an attack \cite{draft} that
- lets them determine the nodes used in a circuit, yet they cannot identify
- Alice or Bob through this attack. So it is really just the endpoints that
- remain secure. See Section~\ref{subsec:routing-zones} for discussion of
- larger adversaries and our dispersal goals.
- \section{Crossroads: Policy issues}
- \label{sec:crossroads-policy}
- Many of the issues the Tor project needs to address are not just a
- matter of system design or technology development. In particular, the
- Tor project's \emph{image} with respect to its users and the rest of
- the Internet impacts the security it can provide.
- As an example to motivate this section, some U.S.~Department of Energy
- penetration testing engineers are tasked with compromising DoE computers
- from the outside. They only have a limited number of ISPs from which to
- launch their attacks, and they found that the defenders were recognizing
- attacks because they came from the same IP space. These engineers wanted
- to use Tor to hide their tracks. First, from a technical standpoint,
- Tor does not support the variety of IP packets they would like to use in
- such attacks (see Section \ref{subsec:ip-vs-tcp}). But aside from this,
- we also decided that it would probably be poor precedent to encourage
- such use -- even legal use that improves national security -- and managed
- to dissuade them.
- With this image issue in mind, here we discuss the Tor user base and
- Tor's interaction with other services on the Internet.
- \subsection{Usability}
- Usability: the FC03 paper was great, except that the lower latency you
- are, the less useful it seems to be.
- A Tor GUI: JAP's GUI is nice, but it does not reflect the security
- they provide.
- Public perception, and thus advertising, is a security parameter.
- \subsection{Image, usability, and sustainability}
- Image: substantial non-infringing uses. Image is a security parameter,
- since it impacts user base and perceived sustainability.
- Sustainability: previous attempts have been commercial, which we think
- adds a lot of unnecessary complexity and accountability. Freedom didn't
- collect enough money to pay its servers; JAP's bandwidth is supported by
- continued funding, and they periodically ask what they will do when it
- dries up.
- Good uses are kept private while bad uses are publicized; that is not good
- for the network's image.
- \subsection{Reputability}
- Yet another factor in the safety of a given network is its reputability:
- the perception of its social value based on its current users. If I'm
- the only user of a system, it might be socially accepted, but I'm not
- getting any anonymity. Add a thousand Communists, and I'm anonymous,
- but everyone thinks I'm a Commie. Add a thousand random citizens (cancer
- survivors, privacy enthusiasts, and so on) and now I'm hard to profile.
- The more cancer survivors on Tor, the better for the human rights
- activists. The more script kiddies, the worse for the normal users. Thus,
- reputability is an anonymity issue for two reasons. First, it impacts
- the sustainability of the network: a network that's always about to be
- shut down has difficulty attracting and keeping users, so its anonymity
- set suffers. Second, a disreputable network attracts the attention of
- powerful attackers who may not mind revealing the identities of all the
- users to uncover the few bad ones.
- While people therefore have an incentive for the network to be used for
- ``more reputable'' activities than their own, there are still tradeoffs
- involved when it comes to anonymity. To follow the above example, a
- network used entirely by cancer survivors might welcome some Communists
- onto the network, though of course they'd prefer a wider variety of users.
- The impact of public perception on security is especially important
- during the bootstrapping phase of the network, where the first few
- widely publicized uses of the network can dictate the types of users it
- attracts next.
- \subsection{Tor and file-sharing}
- BitTorrent and the DMCA. Should we add an IDS to auto-detect protocols and
- snipe them?
- \subsection{Tor and blacklists}
- Takedowns, EFnet abuse, Wikipedia complaints, and IRC
- networks.
- \subsection{Other}
- Tor's scope: How much should Tor aim to do? Applications that leak
- data. We can say they're not our problem, but they're somebody's problem.
- Should we allow revocation of anonymity if a threshold of
- servers want to?
- Logging: making logs non-revealing. It is a happy coincidence that verbose
- logging is our \#2 performance bottleneck. Is there a way to detect
- modified servers, or to have them volunteer the information that they're
- logging verbosely? Would that actually solve any attacks?
- \section{Crossroads: Scaling and Design choices}
- \label{sec:crossroads-design}
- \subsection{Transporting the stream vs transporting the packets}
- We periodically run into ex-ZKS employees who tell us that the process of
- anonymizing IPs should ``obviously'' be done at the IP layer. Here are
- the issues that need to be resolved before we'll be ready to switch Tor
- over to arbitrary IP traffic.
- \begin{enumerate}
- \setlength{\itemsep}{0mm}
- \setlength{\parsep}{0mm}
- \item [IP packets reveal OS characteristics.] We still need to do
- IP-level packet normalization, to stop things like IP fingerprinting
- \cite{ip-fingerprinting}. There exist libraries \cite{ip-normalizing}
- that can help with this.
- \item [Application-level streams still need scrubbing.] We still need
- Tor to be easy to integrate with user-level application-specific proxies
- such as Privoxy. So it's not just a matter of capturing packets and
- anonymizing them at the IP layer.
- \item [Certain protocols will still leak information.] For example,
- DNS requests destined for my local DNS servers need to be rewritten
- to be delivered to some other unlinkable DNS server. This requires
- understanding the protocols we are transporting.
- \item [The crypto is unspecified.] First we need a block-level encryption
- approach that can provide security despite
- packet loss and out-of-order delivery. Freedom allegedly had one, but it was
- never publicly specified, and we believe it's likely vulnerable to tagging
- attacks \cite{tor-design}. Also, TLS over UDP is not implemented or even
- specified, though some early work has begun on that \cite{ben-tls-udp}.
- \item [We'll still need to tune network parameters.] Since the above
- encryption system will likely need sequence numbers and maybe more to do
- replay detection, handle duplicate frames, etc, we will be reimplementing
- some subset of TCP anyway to manage throughput, congestion control, etc.
- \item [Exit policies for arbitrary IP packets mean building a secure
- IDS.] Our server operators tell us that exit policies are one of
- the main reasons they're willing to run Tor over previous attempts
- at anonymizing networks. Adding an IDS to handle exit policies would
- increase the security complexity of Tor, and would likely not work anyway,
- as evidenced by the entire field of IDS and counter-IDS papers.
- \item [The Tor-internal name spaces would need to be redesigned.] We
- support hidden service \texttt{.onion} addresses, and other special addresses
- like \texttt{.exit} (see Section \ref{subsec:}), by intercepting the addresses
- when they are passed to the Tor client (see the sketch after this list).
- \end{enumerate}
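- As a rough illustration of the last item, the following sketch (our own
- invention, not Tor's code; the function name and return values are
- hypothetical) shows how a client-side proxy might intercept Tor-internal
- names before they ever reach DNS or an exit node.
- \begin{verbatim}
# Hypothetical sketch of Tor-internal name interception at the client.
# Real Tor performs this check while parsing SOCKS requests; names and
# return values here are invented for illustration.
def classify_address(hostname: str):
    """Decide how a requested hostname should be handled."""
    host = hostname.lower().rstrip(".")
    if host.endswith(".onion"):
        # Hidden-service address: resolved via the hidden-service
        # protocol, never sent to DNS or to an exit node.
        return ("hidden-service", host[:-len(".onion")])
    if host.endswith(".exit"):
        # "hostname.nodename.exit": the user pins a particular exit node.
        rest, exit_node = host[:-len(".exit")].rsplit(".", 1)
        return ("pinned-exit", rest, exit_node)
    # Ordinary hostname: resolved remotely at whatever exit is chosen.
    return ("external", host)

assert classify_address("example-hidden-service.onion")[0] == "hidden-service"
assert classify_address("www.example.com.somenode.exit") == \
    ("pinned-exit", "www.example.com", "somenode")
assert classify_address("www.example.com")[0] == "external"
- \end{verbatim}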
- \subsection{Mid-latency}
- Mid-latency: can we do traffic shaping to get any defense against George's
- PET 2004 paper? Will padding or long-range dummies do anything then? Will
- it kill the user base, or can we get both approaches to play well together?
- We should explain what mid-latency is, and propose a single network where
- users with varying latency goals can combine.
- Note that in practice, as the network grows and we accept cable-modem
- and DSL nodes, and nodes on other continents, we are \emph{already}
- looking at many-second delays for some transactions. The engineering
- required to get this latency lower is going to be extremely hard. It is
- worth considering how hard it would be to accept the fixed (higher) latency
- and improve the protection we get from it.
- \subsection{Measuring performance and capacity}
- How do we measure performance without letting people selectively deny service
- by distinguishing pings? For that matter, how do we measure performance at
- all? In practice people have odd firewalls that don't match up with their
- exit policies, and Tor doesn't deal with that.
- Network investigation: is all this bandwidth publishing a good idea?
- How can we collect statistics better? Note weasel's smokeping at
- \url{http://seppia.noreply.org/cgi-bin/smokeping.cgi?target=Tor},
- which probably gives George and Steven enough information to break Tor?
- \subsection{Plausible deniability}
- Does running a server help you or harm you? George's Oakland attack.
- Plausible deniability -- without even running your traffic through Tor! We
- have to pick the path length so the adversary can't distinguish a client from
- a server (how many hops is enough?).
- \subsection{Helper nodes}
- When does fixing your entry or exit node help you?
- Helper nodes in the literature don't deal with churn, and
- especially active attacks to induce churn.
- Do general DoS attacks have anonymity implications? See e.g. Adam
- Back's IH paper, but I think there's more to be pointed out here.
- \subsection{Location-hidden services}
- Survivable services are new in practice, yes? Hidden services seem
- less hidden than we'd like, since they stay in one place and get used
- a lot. They're the epitome of the need for helper nodes. This means
- that using Tor as a building block for Free Haven is going to be really
- hard. Also, they're brittle in terms of intersection and observation
- attacks. Would be nice to have hot-swap services, but hard to design.
- In practice, sites like Bloggers Without Borders (www.b19s.org) are
- running Tor servers but, more importantly, are advertising a hidden-service
- address on their front page. Doing this could provide increased robustness
- if they used the dual-IP approach we describe in tor-design, but in
- practice they do it to (a) increase the visibility of the Tor project and
- their support for privacy, and (b) offer a way for their users, using vanilla
- software, to get end-to-end encryption and end-to-end authentication to
- their website.
- \section{Crossroads: Scaling}
- Tor is running today with hundreds of servers and tens of thousands of
- users, but it will certainly not scale to millions.
- Scaling Tor involves three main challenges. First is safe server
- discovery, both bootstrapping -- how a Tor client can robustly find an
- initial server list -- and ongoing -- how a Tor client can learn about
- a fair sample of honest servers and not let the adversary control his
- circuits (see Section x). Second is detecting and handling the speed
- and reliability of the variety of servers we must use if we want to
- accept many servers (see Section y).
- Since the speed and reliability of a circuit is limited by its worst link,
- we must learn to track and predict performance. Finally, in order to get
- a large set of servers in the first place, we must address incentives
- for users to carry traffic for others (see Section~\ref{subsec:incentives}).
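- To make the ``worst link'' point above concrete (a simple formalization of
- our own, not a measured result): for a circuit through relays $1,\dots,k$
- with bandwidths $B_i$ and latencies $\ell_i$,
- \[
-   B_{\mathrm{circuit}} \approx \min_{i} B_i,
-   \qquad
-   \ell_{\mathrm{circuit}} \approx \sum_{i=1}^{k} \ell_i,
- \]
- so throughput is set by the slowest relay while latency accumulates across
- all of them; a single overloaded node drags down an otherwise fast circuit.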
- \subsection{Incentives}
- \label{subsec:incentives}
- There are three behaviors we need to encourage for each server: relaying
- traffic; providing good throughput and reliability while doing it;
- and allowing traffic to exit the network from that server.
- We encourage these behaviors through \emph{indirect} incentives, that
- is, designing the system and educating users in such a way that users
- with certain goals will choose to relay traffic. In practice, the
- main incentive for running a Tor server is social benefit: volunteers
- altruistically donate their bandwidth and time. We also keep public
- rankings of the throughput and reliability of servers, much like
- seti@home. We further explain to users that they can get \emph{better
- security} by operating a server, because they get plausible deniability
- (indeed, they may not need to route their own traffic through Tor at all
- -- blending directly with other traffic exiting Tor may be sufficient
- protection for them), and because they can use their own Tor server
- as entry or exit point and be confident it's not run by the adversary.
- Finally, we can improve the usability and feature set of the software:
- rate limiting support and easy packaging decrease the hassle of
- maintaining a server, and our configurable exit policies allow each
- operator to advertise a policy describing the hosts and ports to which
- he feels comfortable connecting.
- Beyond these, however, there is also a need for \emph{direct} incentives:
- providing payment or other resources in return for high-quality service.
- Paying actual money is problematic: decentralized e-cash systems are
- not yet practical, and a centralized collection system not only reduces
- robustness, but also has failed in the past (the history of commercial
- anonymizing networks is littered with failed attempts). A more promising
- option is to use a tit-for-tat incentive scheme: provide better service
- to nodes that have provided good service to you.
- Unfortunately, such an approach introduces new anonymity problems.
- Does the incentive system enable the adversary to attract more traffic by
- performing well? Typically a user who chooses evenly from all options is
- most resistant to an adversary targeting him, but that approach prevents
- us from handling heterogeneous servers \cite{casc-rep}.
- When a server (call him Steve) performs well for Alice, does Steve gain
- reputation with the entire system, or just with Alice? If the entire
- system, how does Alice tell everybody about her experience in a way that
- prevents her from lying about it yet still protects her identity? If
- Steve's behavior only affects Alice's behavior, does this allow Steve to
- selectively perform only for Alice, and then break her anonymity later
- when somebody (presumably Alice) routes through his node?
- These are difficult and open questions, yet choosing not to scale means
- leaving most users to a less secure network or no anonymizing network
- at all. We will start with a simplified approach to the tit-for-tat
- incentive scheme based on two rules: (1) each node should measure the
- service it receives from adjacent nodes, and provide service relative to
- the received service, but (2) when a node is making decisions that affect
- its own security (e.g. when building a circuit for its own application
- connections), it should choose evenly from a sufficiently large set of
- nodes that meet some minimum service threshold. This approach allows us
- to discourage bad service without opening Alice up as much to attacks.
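- A minimal sketch of these two rules, assuming invented placeholder names,
- thresholds, and a measurement oracle (an illustration of the idea, not a
- finished design):
- \begin{verbatim}
import random

# Sketch of the simplified tit-for-tat incentive scheme described above.
# MIN_SERVICE, the default for strangers, and the measurement itself are
# invented placeholders.
MIN_SERVICE = 0.5          # threshold for "adequate" observed service
DEFAULT_FOR_STRANGERS = 0.3


class Node:
    def __init__(self, name):
        self.name = name
        self.observed = {}   # neighbor name -> measured service in [0, 1]

    def record_service(self, neighbor, level):
        """Rule 1a: measure the service each adjacent node provides."""
        self.observed[neighbor] = level

    def service_to_give(self, neighbor):
        """Rule 1b: relay for a neighbor in proportion to the service it
        has given us; unknown neighbors get a modest default."""
        return self.observed.get(neighbor, DEFAULT_FOR_STRANGERS)

    def pick_own_circuit(self, candidates, path_len=3):
        """Rule 2: for our own circuits, choose *uniformly* among all
        nodes meeting the minimum threshold, rather than favoring the
        best performers."""
        adequate = [c for c in candidates
                    if self.observed.get(c, 0.0) >= MIN_SERVICE]
        if len(adequate) < path_len:
            return None      # set too small; don't build an unsafe circuit
        return random.sample(adequate, path_len)
- \end{verbatim}
- The point of rule (2) is that observed performance governs how much we relay
- for a neighbor, but not which nodes we pick for our own circuits beyond a
- minimum bar, so a fast adversary cannot attract a disproportionate share of
- our traffic.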
- \subsection{Peer-to-peer / practical issues}
- Network discovery, sybil, node admission, scaling. It seems that the code
- will ship with something and that's our trust root. We could try to get
- people to build a web of trust, but no. Where we go from here depends
- on what threats we have in mind. Really decentralized if your threat is
- RIAA; less so if threat is to application data or individuals or...
- Making use of servers with little bandwidth. How to handle hammering by
- certain applications.
- Handling servers that are far away from the rest of the network, e.g. on
- the continents that aren't North America and Europe. High latency,
- often high packet loss.
- Running Tor servers behind NATs, behind great-firewalls-of-China, etc.
- Restricted routes: how do we propagate the topology to everybody? A BGP-style
- approach doesn't work because we don't want just \emph{one} path. Point to
- Geoff's work.
- \subsection{ISP-class adversaries}
- \label{subsec:routing-zones}
- Routing zones: it seems that our threat model comes down to diversity and
- dispersal, but it is hard for Alice to know how to act. Many questions remain.
- \subsection{The China problem}
- We have many users in Iran and similar countries (we stopped
- logging, so it's hard to know now, but there are many Persian sites on how
- to use Tor), and they seem to be doing OK. But the China problem is bigger.
- Cite Stefan's paper, and talk about how we need to route through clients,
- and how maybe we should start with a time-release IP publishing system plus
- an Advogato-based reputation system, to bound the number of IPs leaked to the
- adversary.
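- One way to read the time-release idea is a distributor that reveals only a
- small, stable subset of server addresses to each requester per time period,
- so an enumerating adversary learns addresses slowly. The sketch below is our
- own speculative illustration (all names and parameters are invented) and
- omits the reputation component entirely.
- \begin{verbatim}
import hashlib
import time

# Speculative sketch of rate-limited ("time-release") address publishing.
# Parameters and function names are invented for illustration.
ADDRESSES_PER_PERIOD = 3
PERIOD_SECONDS = 24 * 60 * 60


def addresses_for(requester_id, all_addresses, now=None):
    """Return the small subset of server addresses this requester may
    learn in the current period. The subset is a deterministic function
    of (requester, period), so asking again reveals nothing new until
    the period rolls over; enumerating every address therefore requires
    many identities or a long wait."""
    now = time.time() if now is None else now
    period = int(now // PERIOD_SECONDS)
    ranked = []
    for addr in sorted(all_addresses):
        digest = hashlib.sha256(
            f"{requester_id}|{period}|{addr}".encode()).digest()
        ranked.append((digest, addr))
    ranked.sort()
    return [addr for _, addr in ranked[:ADDRESSES_PER_PERIOD]]
- \end{verbatim}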
- \section{The Future}
- \label{sec:conclusion}
- \bibliographystyle{plain} \bibliography{tor-design}
- \end{document}