- \documentclass{llncs}
- \usepackage{url}
- \usepackage{amsmath}
- \usepackage{epsfig}
- \newenvironment{tightlist}{\begin{list}{$\bullet$}{
- \setlength{\itemsep}{0mm}
- \setlength{\parsep}{0mm}
-
-
-
- }}{\end{list}}
- \begin{document}
- \title{Challenges in practical low-latency stream anonymity (DRAFT)}
- \author{Roger Dingledine and Nick Mathewson}
- \institute{The Free Haven Project\\
- \email{\{arma,nickm\}@freehaven.net}}
- \maketitle
- \pagestyle{empty}
- \begin{abstract}
- foo
- \end{abstract}
- \section{Introduction}
- Tor is a low-latency anonymous communication overlay network designed
- to be practical and usable for protecting TCP streams over the
- Internet~\cite{tor-design}. We have been operating a publicly deployed
- Tor network since October 2003 that has grown to over a hundred volunteer
- nodes and carries on average over 70 megabits of traffic per second.
- Tor has a weaker threat model than many anonymity designs in the
- literature, because our foremost goal is to deploy a
- practical and useful network for interactive (low-latency) communications.
- Subject to this restriction, we try to
- provide as much anonymity as we can. In particular, because we
- support interactive communications without impractically expensive padding,
- we fall prey to a variety
- of intra-network~\cite{attack-tor-oak05,flow-correlation04,bar} and
- end-to-end~\cite{danezis-pet2004,SS03} anonymity-breaking attacks.
- Tor is secure so long as adversaries are unable to
- observe connections as they both enter and leave the Tor network.
- Therefore, Tor's defense lies in having a diverse enough set of servers
- that most real-world
- adversaries are unlikely to be in the right places to attack users.
- Specifically,
- Tor aims to resist observers and insiders by distributing each transaction
- over several nodes in the network. This ``distributed trust'' approach
- means the Tor network can be safely operated and used by a wide variety
- of mutually distrustful users, providing more sustainability and security
- than some previous attempts at anonymizing networks.
- The Tor network has a broad range of users, including ordinary citizens
- concerned about their privacy, corporations
- that do not want to reveal information to their competitors, and law
- enforcement and government intelligence agencies that need
- to conduct operations on the Internet without being noticed.
- Tor research and development has been funded by the U.S. Navy, for use
- in securing government
- communications, and also by the Electronic Frontier Foundation, for use
- in maintaining civil liberties for ordinary citizens online. The Tor
- protocol is one of the leading choices
- to be the anonymizing layer in the European Union's PRIME directive to
- help maintain privacy in Europe. The University of Dresden in Germany
- has integrated an independent implementation of the Tor protocol into
- their popular Java Anon Proxy anonymizing client. This wide variety of
- interests helps maintain both the stability and the security of the
- network.
- Tor's principal research strategy, in attempting to deploy a network that is
- practical, useful, and anonymous, has been to insist, when trade-offs arise
- between these properties, on remaining useful enough to attract many users,
- and practical enough to support them. Subject to these
- constraints, we aim to maximize anonymity. This is not the only possible
- direction in anonymity research: designs exist that provide more anonymity
- than Tor at the expense of significantly increased resource requirements, or
- decreased flexibility in application support (typically because of increased
- latency). Such research does not typically abandon aspirations towards
- deployability or utility, but instead tries to maximize deployability and
- utility subject to a certain degree of inherent anonymity (inherent because
- usability and practicality affect usage which affects the actual anonymity
- provided by the network \cite{back01,econymics}). We believe that these
- approaches can be promising and useful, but that by focusing on deploying a
- usable system in the wild, Tor helps us experiment with the actual parameters
- of what makes a system ``practical'' for volunteer operators and ``useful''
- for home users, and helps illuminate undernoticed issues which any deployed
- volunteer anonymity network will need to address.
- While~\cite{tor-design} gives an overall view of the Tor design and goals,
- this paper describes the policy and technical issues that Tor faces as
- we continue deployment. Rather than trying to provide complete solutions
- to every problem here, we lay out the assumptions and constraints
- that we have observed through deploying Tor in the wild. In doing so, we
- aim to create a research agenda for others to
- help in addressing these issues. Section~\ref{sec:what-is-tor} gives an
- overview of the Tor
- design and our goals. Sections~\ref{sec:crossroads-policy}
- and~\ref{sec:crossroads-technical} go on to describe the practical challenges,
- policy and technical respectively, that stand in the way of moving
- from a practical, useful network to a practical, useful, anonymous network.
- \section{Distributed trust: safety in numbers}
- \label{sec:what-is-tor}
- Here we give a basic overview of the Tor design and its properties. For
- details on the design, assumptions, and security arguments, we refer
- the reader to the Tor design paper~\cite{tor-design}.
- Tor provides \emph{forward privacy}, so that users can connect to
- Internet sites without revealing their logical or physical locations
- to those sites or to observers. It also provides \emph{location-hidden
- services}, so that critical servers can support authorized users without
- giving adversaries an effective vector for physical or online attacks.
- The design provides this protection even when a portion of its own
- infrastructure is controlled by an adversary.
- To create a private network pathway with Tor, the user's software (client)
- incrementally builds a \emph{circuit} of encrypted connections through
- servers on the network. The circuit is extended one hop at a time, and
- each server along the way knows only which server gave it data and which
- server it is giving data to. No individual server ever knows the complete
- path that a data packet has taken. The client negotiates a separate set
- of encryption keys for each hop along the circuit to ensure that each
- hop can't trace these connections as they pass through.
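- As a rough illustration of this layering (the notation is ours and purely
- illustrative: $k_1, k_2, k_3$ stand for the hypothetical per-hop keys of a
- three-hop circuit, and $E_k$, $D_k$ for symmetric encryption and decryption
- under key $k$), a client sending a payload $m$ wraps it once per hop, and
- each server strips exactly one layer:
- \begin{align*}
- c_0 &= E_{k_1}\bigl(E_{k_2}(E_{k_3}(m))\bigr), \\
- c_1 &= D_{k_1}(c_0) = E_{k_2}(E_{k_3}(m)), \\
- c_2 &= D_{k_2}(c_1) = E_{k_3}(m), \\
- m &= D_{k_3}(c_2).
- \end{align*}
- In this sketch the first server sees only $c_0$ and $c_1$, and so learns
- neither $m$ nor anything beyond its immediate neighbors; the actual
- cell format and key negotiation are specified in~\cite{tor-design}.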
- Once a circuit has been established, many kinds of data can be exchanged
- and several different sorts of software applications can be deployed over
- the Tor network. Because each server sees no more than one hop in the
- circuit, neither an eavesdropper nor a compromised server can use traffic
- analysis to link the connection's source and destination. Tor only works
- for TCP streams and can be used by any application with SOCKS support.
- For efficiency, the Tor software uses the same circuit for connections
- that happen within the same minute or so. Later requests are given a new
- circuit, to prevent long-term linkability between different actions by
- a single user.
- Tor also makes it possible for users to hide their locations while
- offering various kinds of services, such as web publishing or an instant
- messaging server. Using Tor ``rendezvous points'', other Tor users can
- connect to these hidden services, each without knowing the other's network
- identity.
- Tor attempts to anonymize the transport layer, not the application layer, so
- application protocols that include personally identifying information need
- additional application-level scrubbing proxies, such as
- Privoxy~\cite{privoxy} for HTTP. Furthermore, Tor does not permit arbitrary
- IP packets; it only anonymizes TCP and DNS, and only supports connections via
- SOCKS (see Section \ref{subsec:tcp-vs-ip}).
- Tor differs from other deployed systems for traffic analysis resistance
- in its security and flexibility. Mix networks such as
- Mixmaster~\cite{mixmaster-spec} or its successor Mixminion~\cite{minion-design}
- gain the highest degrees of anonymity at the expense of introducing highly
- variable delays, thus making them unsuitable for applications such as web
- browsing that require quick response times. Commercial single-hop
- proxies~\cite{anonymizer} present a single point of failure, where
- a single compromise can expose all users' traffic, and a single-point
- eavesdropper can perform traffic analysis on the entire network.
- Also, their proprietary implementations place any infrastructure that
- depends on these single-hop solutions at the mercy of their providers'
- financial health as well as network security.
- No organization can achieve this security on its own. If a single
- corporation or government agency were to build a private network to
- protect its operations, any connections entering or leaving that network
- would be obviously linkable to the controlling organization. The members
- and operations of that agency would be easier, not harder, to distinguish.
- Instead, to protect our networks from traffic analysis, we must
- collaboratively blend the traffic from many organizations and private
- citizens, so that an eavesdropper can't tell which users are which,
- and who is looking for what information. By bringing more users onto
- the network, all users become more secure~\cite{econymics}.
- Naturally, organizations will not want to depend on others for their
- security. If most participating providers are reliable, Tor tolerates
- some hostile infiltration of the network. For maximum protection,
- the Tor design includes an enclave approach that lets data be encrypted
- (and authenticated) end-to-end, so high-sensitivity users can be sure it
- hasn't been read or modified. This even works for Internet services that
- don't have built-in encryption and authentication, such as unencrypted
- HTTP or chat, and it requires no modification of those services to do so.
- As of January 2005, the Tor network has grown to around a hundred servers
- on four continents, with a total capacity exceeding 1Gbit/s. Appendix A
- shows a graph of the number of working servers over time, as well as a
- graph of the number of bytes being handled by the network over time. At
- this point the network is sufficiently diverse for further development
- and testing; but of course we always encourage and welcome new servers
- to join the network.
- Tor is not the only anonymity system that aims to be practical and useful.
- Commercial single-hop proxies~\cite{anonymizer}, as well as unsecured
- open proxies around the Internet~\cite{open-proxies}, can provide good
- performance and some security against a weaker attacker. Dresden's Java
- Anon Proxy~\cite{web-mix} provides similar functionality to Tor but only
- handles web browsing rather than arbitrary TCP. Also, JAP's network
- topology uses cascades (fixed routes through the network). Without
- end-to-end padding, JAP is just as vulnerable as Tor to end-to-end timing
- attacks, and because a cascade network has fewer endpoints, its
- dispersal properties are worse than Tor's.
- Zero-Knowledge Systems' commercial Freedom
- network~\cite{freedom21-security} was even more flexible than Tor in
- that it could transport arbitrary IP packets, and it also supported
- pseudonymous access rather than just anonymous access; but it had
- a different approach to sustainability (collecting money from users
- and paying ISPs to run servers), and has shut down due to financial
- load. Finally, more scalable designs like Tarzan~\cite{tarzan:ccs02} and
- MorphMix~\cite{morphmix:fc04} have been proposed in the literature, but
- have not yet been fielded. We direct the interested reader to Section
- 2 of~\cite{tor-design} for a more in-depth review of related work.
- [Have a serious discussion of MorphMix's assumptions, since it would
- seem to be the direct competition. In fact Tor is a flexible architecture
- that would encompass MorphMix, and they're nearly identical except for
- path selection and node discovery. And the trust system MorphMix has
- seems overkill (and/or insecure) based on the threat model we've picked.]
- \section{Threat model}
- \label{sec:threat-model}
- Tor does not attempt to defend against a global observer. Any adversary who
- can see a user's connection to the Tor network, and who can see the
- corresponding connection as it exits the Tor network, can use the timing
- correlation between the two connections to confirm the user's chosen
- communication partners. Defeating this attack would seem to require
- introducing a prohibitive degree of traffic padding between the user and the
- network, or introducing an unacceptable degree of latency (but see
- Section \ref{subsec:mid-latency}). Thus, Tor only
- attempts to defend against external observers who cannot observe both sides of a
- user's connection.
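- To give a sense of why timing correlation is so effective, consider one
- simple statistic such an observer might compute (an illustrative sketch
- only; the attacks in the literature use more refined techniques). Let $x_t$
- be the number of bytes seen entering the network from the user during time
- window $t$, let $y_t$ be the number seen leaving toward a candidate
- destination, and let $d$ be an assumed network delay measured in windows.
- The observer computes the sample correlation
- \[
- r = \frac{\sum_t (x_t - \bar{x})(y_{t+d} - \bar{y})}
- {\sqrt{\sum_t (x_t - \bar{x})^2}\,\sqrt{\sum_t (y_{t+d} - \bar{y})^2}},
- \]
- where $\bar{x}$ and $\bar{y}$ are the means of the two series. A value of
- $r$ near 1 for one candidate pair, and near 0 for the others, strongly
- supports the suspected link.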
- Against internal attackers, who sign up Tor servers, the situation is more
- complicated. In the simplest case, if an adversary has compromised $c$ of
- $n$ servers on the Tor network, then the adversary will be able to compromise
- a random circuit with probability $\frac{c^2}{n^2}$ (since the circuit
- initiator chooses hops randomly). But there are
- complicating factors:
- \begin{tightlist}
- \item If the user continues to build random circuits over time, an adversary
- will, with high probability, eventually see a statistical sample of the user's traffic, and
- thereby can build an increasingly accurate profile of her behavior. (See
- Section~\ref{subsec:helper-nodes} for possible solutions.)
- \item If an adversary controls a popular service outside of the Tor network,
- he can be certain of observing all connections to that service; he
- therefore will trace connections to that service with probability
- $\frac{c}{n}$.
- \item Users do not in fact choose servers with uniform probability; they
- favor servers with high bandwidth, and exit servers that permit connections
- to their favorite services.
- \end{tightlist}
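- As a worked example of these figures, under the simplifying (and, per the
- last point above, unrealistic) assumption that hops are chosen uniformly and
- independently, the hypothetical values $c = 10$ and $n = 100$ give
- \[
- \frac{c^2}{n^2} = \frac{100}{10000} = 0.01, \qquad \frac{c}{n} = \frac{10}{100} = 0.1 .
- \]
- A single random circuit is thus compromised about 1\% of the time, whereas a
- connection to an adversary-run service is traced about 10\% of the time. If
- the user builds $k$ such circuits independently, the probability that at
- least one of them is compromised is
- \[
- 1 - \left(1 - \frac{c^2}{n^2}\right)^{k},
- \]
- which for $k = 100$ is $1 - 0.99^{100} \approx 0.63$.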
- In practice Tor's threat model is based entirely on the goal of dispersal
- and diversity. Murdoch and Danezis describe an attack~\cite{attack-tor-oak05} that
- lets them determine the nodes used in a circuit; yet they can't identify
- Alice or Bob through this attack. So it's really just the endpoints that
- remain secure. And the enclave model seems particularly threatened by
- this, since this attack lets us identify endpoints when they're servers.
- See Section~\ref{subsec:helper-nodes} for discussion of some ways to address this
- issue.
- See Section~\ref{subsec:routing-zones} for discussion of larger
- adversaries and our dispersal goals.
- [this section will get written once the rest of the paper is farther along]
- \section{Crossroads: Policy issues}
- \label{sec:crossroads-policy}
- Many of the issues the Tor project needs to address are not just a
- matter of system design or technology development. In particular, the
- Tor project's \emph{image} with respect to its users and the rest of
- the Internet impacts the security it can provide.
- As an example to motivate this section, some U.S.~Department of Energy
- penetration testing engineers are tasked with compromising DoE computers
- from the outside. They only have a limited number of ISPs from which to
- launch their attacks, and they found that the defenders were recognizing
- attacks because they came from the same IP space. These engineers wanted
- to use Tor to hide their tracks. First, from a technical standpoint,
- Tor does not support the variety of IP packets one would like to use in
- such attacks (see Section~\ref{subsec:tcp-vs-ip}). But aside from this,
- we also decided that it would probably be poor precedent to encourage
- such use---even legal use that improves national security---and managed
- to dissuade them.
- With this image issue in mind, this section discusses the Tor user base and
- Tor's interaction with other services on the Internet.
- \subsection{Image and security}
- A growing body of work argues that usability for anonymity systems
- contributes directly to their security, because how usable the system
- is impacts the possible anonymity set~\cite{back01,econymics}.
- Conversely, an unusable system attracts few users and thus can't provide
- much anonymity.
- This phenomenon has a second-order effect: knowing this, users should
- choose which anonymity system to use based in part on how usable
- \emph{others} will find it, in order to get the protection of a larger
- anonymity set. Thus we might replace the adage ``usability is a security
- parameter''~\cite{back01} with a new one: ``perceived usability is a
- security parameter.'' From here we can better understand the effects
- of publicity and advertising on security: the more convincing your
- advertising, the more likely people will believe you have users, and thus
- the more users you will attract. Perversely, over-hyped systems (if they
- are not too broken) may be a better choice than modestly promoted ones,
- if the hype attracts more users~\cite{usability-network-effect}.
- So it follows that we should come up with ways to accurately communicate
- the available security levels to the user, so she can make informed
- decisions. Dresden's JAP project aims to do this, by including a
- comforting `anonymity meter' dial in the software's graphical interface,
- giving the user an impression of the level of protection for her current
- traffic.
- However, there's a catch. For users to share the same anonymity set,
- they need to act like each other. An attacker who can distinguish
- a given user's traffic from the rest of the traffic will not be
- distracted by other users on the network. For high-latency systems like
- Mixminion, where the threat model is based on mixing messages with each
- other, there's an arms race between end-to-end statistical attacks and
- counter-strategies~\cite{statistical-disclosure,minion-design,e2e-traffic,trickle02}.
- But for low-latency systems like Tor, end-to-end \emph{traffic
- confirmation} attacks~\cite{danezis-pet2004,SS03,defensive-dropping}
- allow an attacker who watches or controls both ends of a communication
- to use statistics to correlate packet timing and volume, quickly linking
- the initiator to her destination. This is why Tor's threat model is
- based on preventing the adversary from observing both the initiator and
- the responder.
- Like Tor, the current JAP implementation does not pad connections
- (apart from using small fixed-size cells for transport). In fact,
- its cascade-based network topology may be even more vulnerable to these
- attacks, because the network has fewer endpoints. JAP was born out of
- the ISDN mix design~\cite{isdn-mixes}, where padding made sense because
- every user had a fixed bandwidth allocation, but in its current context
- as a general Internet web anonymizer, adding sufficient padding to JAP
- would be prohibitively expensive.\footnote{Even if they could find and
- maintain extra funding to run higher-capacity nodes, our experience with
- users suggests that many users would not accept the increased per-user
- bandwidth requirements, leading to an overall much smaller user base. But
- see Section \ref{subsec:mid-latency}.} Therefore, since under this threat
- model the number of concurrent users does not seem to have much impact
- on the anonymity provided, we suggest that JAP's anonymity meter is not
- correctly communicating security levels to its users.
- On the other hand, while the number of active concurrent users may not
- matter as much as we'd like, it still helps to have some other users
- who use the network. We investigate this issue in the next section.
- \subsection{Reputability}
- Another factor impacting the network's security is its reputability:
- the perception of its social value based on its current user base. If I'm
- the only user who has ever downloaded the software, it might be socially
- accepted, but I'm not getting much anonymity. Add a thousand Communists,
- and I'm anonymous, but everyone thinks I'm a Commie. Add a thousand
- random citizens (cancer survivors, privacy enthusiasts, and so on)
- and now I'm harder to profile.
- The more cancer survivors on Tor, the better for the human rights
- activists. The more script kiddies, the worse for the normal users. Thus,
- reputability is an anonymity issue for two reasons. First, it impacts
- the sustainability of the network: a network that's always about to be
- shut down has difficulty attracting and keeping users, so its anonymity
- set suffers. Second, a disreputable network attracts the attention of
- powerful attackers who may not mind revealing the identities of all the
- users to uncover a few bad ones.
- While people therefore have an incentive for the network to be used for
- ``more reputable'' activities than their own, there are still tradeoffs
- involved when it comes to anonymity. To follow the above example, a
- network used entirely by cancer survivors might welcome some Communists
- onto the network, though of course they'd prefer a wider variety of users.
- Reputability becomes even more tricky in the case of privacy networks,
- since the good uses of the network (such as publishing by journalists in
- dangerous countries) are typically kept private, whereas network abuses
- or other problems tend to be more widely publicized.
- The impact of public perception on security is especially important
- during the bootstrapping phase of the network, where the first few
- widely publicized uses of the network can dictate the types of users it
- attracts next.
- \subsection{Usability and bandwidth and sustainability and incentives}
- low-pain-threshold users go away until all users are willing to use it
- Sustainability. Previous attempts have been commercial which we think
- adds a lot of unnecessary complexity and accountability. Freedom didn't
- collect enough money to pay its servers; JAP bandwidth is supported by
- continued money, and they periodically ask what they will do when it
- dries up.
- "outside of academia, jap has just lost, permanently"
- [nick will write this section]
- \subsection{Tor and file-sharing}
- [nick will write this section]
- BitTorrent and the DMCA. Should we add an IDS to autodetect protocols and
- snipe them?
- Because only at the exit is it evident what port or protocol a given
- Tor stream is, you can't choose not to carry file-sharing traffic.
- Hibernation vs.\ rate-limiting: do we want diversity or throughput? I
- think we're shifting back to wanting diversity.
- \subsection{Tor and blacklists}
- Takedowns and EFnet abuse and Wikipedia complaints and IRC
- networks.
- It was long expected that, alongside Tor's legitimate users, it would also
- attract troublemakers who exploited Tor in order to abuse services on the
- Internet. Our initial answer to this situation was to use ``exit policies''
- to allow individual Tor servers to block access to specific IP/port ranges.
- This approach was meant to make operators more willing to run Tor by allowing
- them to prevent their servers from being used for abusing particular
- services. For example, all Tor servers currently block SMTP (port 25), in
- order to avoid being used to send spam.
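- As a sketch, an operator's configuration expressing such a policy might
- look like the following (torrc-style syntax; the exact option names and
- defaults depend on the Tor version, so treat this as illustrative rather
- than as a recommended configuration):
- \begin{verbatim}
- ExitPolicy reject *:25
- ExitPolicy accept *:*
- \end{verbatim}
- Rules are matched in order, so the first line refuses SMTP and the second
- permits everything else.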
- This approach is useful, but is insufficient for two reasons. First, since
- it is not possible to force all ORs to block access to any given service,
- many of those services try to block Tor instead. More broadly, while being
- blockable is important to being good netizens, we would like to encourage
- services to allow anonymous access; services should not need to decide
- between blocking legitimate anonymous use and allowing unlimited abuse.
- This is potentially a bigger problem than it may appear.
- On the one hand, if people want to refuse connections from you on
- their servers, it would seem that they should be allowed to. But a
- major problem with blocking Tor is that the cost does not fall only on
- the individual server administrator, who must decide whether
- he wants to post to Wikipedia from his Tor node's address or to allow
- people to read Wikipedia anonymously through his Tor node. If, for example,
- he comes through a campus or corporate NAT, then the choice is made for
- the entire population behind it: either they can host a Tor exit
- node, or they keep write access to Wikipedia. This is a loss for both of us (Tor
- and Wikipedia). We don't want to compete for (or divvy up) the
- NAT-protected entities of the world.
- (A related problem is that many IP blacklists are not terribly fine-grained.
- No current IP blacklist, for example, allows a service provider to blacklist
- only those Tor servers that allow access to a specific IP or port, even
- though this information is readily available. One IP blacklist even bans
- every class C network that contains a Tor server, and recommends banning SMTP
- from these networks even though Tor does not allow SMTP at all.)
- Problems of abuse occur mainly with services such as IRC networks and
- Wikipedia, which rely on IP-blocking to ban abusive users. While at first
- blush this practice might seem to depend on the anachronistic assumption that
- each IP is an identifier for a single user, it is actually more reasonable in
- practice: it assumes that non-proxy IPs are a costly resource, and that an
- abuser cannot change IPs at will. By blocking IPs which are used by Tor
- servers, open proxies, and service abusers, these systems hope to make
- ongoing abuse difficult. Although the system is imperfect, it works
- tolerably well for them in practice.
- But of course, we would prefer that legitimate anonymous users be able to
- access abuse-prone services. One conceivable approach would be to require
- would-be IRC users, for instance, to register accounts if they wanted to
- access the IRC network from Tor. But in practice, this would not
- significantly impede abuse if creating new accounts were easily automatable;
- this is why services use IP blocking. In order to deter abuse, pseudonymous
- identities need to impose a significant switching cost in resources or human
- time.
- One approach, similar to that taken by Freedom, would be to bootstrap some
- non-anonymous costly identification mechanism to allow access to a
- blind-signature pseudonym protocol. This would effectively create costly
- pseudonyms, which services could require in order to allow anonymous access.
- This approach has difficulties in practice, however:
- \begin{tightlist}
- \item Unlike Freedom, Tor is not a commercial service. Therefore, it would
- be a shame to require payment in order to make Tor useful, or to make
- non-paying users second-class citizens.
- \item It is hard to think of an underlying resource that would actually work.
- We could use IP addresses, but that's the problem, isn't it?
- \item Managing single sign-on services is not considered a well-solved
- problem in practice. If Microsoft can't get universal acceptance for
- Passport, why do we think that a Tor-specific solution would do any good?
- \item Even if we came up with a perfect authentication system for our needs,
- there's no guarantee that any service would actually start using it. It
- would require a nonzero effort for them to support it, and it might just
- be less hassle for them to block Tor anyway.
- \end{tightlist}
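- For concreteness, the blind-signature step in the approach above could be
- instantiated with Chaum's RSA blind signatures (one standard construction,
- chosen here for illustration; we do not claim it is what Freedom deployed).
- Let the issuer hold an RSA key with public part $(N, e)$ and private
- exponent $d$, let $p$ be the user's chosen pseudonym, and let $H$ be a hash
- function mapping into the integers modulo $N$; all of these names are
- hypothetical. The user picks a random $r$ coprime to $N$, and the parties
- compute
- \begin{align*}
- b &= H(p)\, r^{e} \bmod N && \text{(user blinds and sends $b$)}, \\
- s' &= b^{d} \bmod N = H(p)^{d}\, r \bmod N && \text{(issuer signs, after the costly identification)}, \\
- s &= s'\, r^{-1} \bmod N = H(p)^{d} \bmod N && \text{(user unblinds)}.
- \end{align*}
- A service can verify that $s^{e} \equiv H(p) \pmod{N}$, while the issuer,
- having seen only $b$, cannot link $p$ to the identified user.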
- Squishy IP-based ``authentication'' and ``authorization'' is a reality
- we must contend with. We should say something more about the analogy
- with SSNs.
- \subsection{Other}
- [Once you build a generic overlay network, everybody wants to use it.]
- Tor's scope: How much should Tor aim to do? Applications that leak
- data: we can say they're not our problem, but they're somebody's problem.
- Also, the more widely deployed Tor becomes, the more people who need a
- deployed overlay network tell us they'd like to use us if only we added
- a few more features. For example, Blossom~\cite{blossom} and
- random community wireless projects both want source-routable overlay
- networks for their own purposes. Fortunately, our modular design separates
- routing from node discovery; so we could implement MorphMix in Tor just
- by implementing the MorphMix-specific node discovery and path selection
- pieces. On the other hand, we could easily get distracted building a
- general-purpose overlay library, and we're only a few developers.
- [arma will work on this]
- Logging. Making logs not revealing. A happy coincidence that verbose
- logging is our \#2 performance bottleneck. Is there a way to detect
- modified servers, or to have them volunteer the information that they're
- logging verbosely? Would that actually solve any attacks?
- \section{Crossroads: Scaling and Design choices}
- \label{sec:crossroads-design}
- \subsection{Transporting the stream vs transporting the packets}
- \label{subsec:tcp-vs-ip}
- We periodically run into ex-ZKS employees who tell us that the process of
- anonymizing IPs should ``obviously'' be done at the IP layer. Here are
- the issues that need to be resolved before we'll be ready to switch Tor
- over to arbitrary IP traffic.
- \begin{enumerate}
- \setlength{\itemsep}{0mm}
- \setlength{\parsep}{0mm}
- \item \emph{IP packets reveal OS characteristics.} We still need to do
- IP-level packet normalization, to stop things like IP fingerprinting
- attacks. There likely exist libraries that can help with this.
- \item \emph{Application-level streams still need scrubbing.} We still need
- Tor to be easy to integrate with user-level application-specific proxies
- such as Privoxy. So it's not just a matter of capturing packets and
- anonymizing them at the IP layer.
- \item \emph{Certain protocols will still leak information.} For example,
- DNS requests destined for my local DNS servers need to be rewritten
- to be delivered to some other unlinkable DNS server. This requires
- understanding the protocols we are transporting.
- \item \emph{The crypto is unspecified.} First we need a block-level encryption
- approach that can provide security despite
- packet loss and out-of-order delivery. Freedom allegedly had one, but it was
- never publicly specified.
- Also, TLS over UDP is not implemented or even
- specified, though some early work has begun on that~\cite{dtls}.
- \item \emph{We'll still need to tune network parameters}. Since the above
- encryption system will likely need sequence numbers (and maybe more) to do
- replay detection, handle duplicate frames, etc., we will be reimplementing
- some subset of TCP anyway.
- \item \emph{Exit policies for arbitrary IP packets mean building a secure
- IDS.} Our server operators tell us that exit policies are one of
- the main reasons they're willing to run Tor.
- Adding an Intrusion Detection System to handle exit policies would
- increase the security complexity of Tor, and would likely not work anyway,
- as evidenced by the entire field of IDS and counter-IDS papers. Many
- potential abuse issues are resolved by the fact that Tor only transports
- valid TCP streams (as opposed to arbitrary IP including malformed packets
- and IP floods), so exit policies become even \emph{more} important as
- we become able to transport IP packets. We also need a way to compactly
- characterize the exit policies and let clients parse them to decide
- which nodes will allow which packets to exit.
- \item \emph{The Tor-internal name spaces would need to be redesigned.} We
- support hidden service {\tt{.onion}} addresses, and other special addresses
- like {\tt{.exit}} (see Section \ref{subsec:}), by intercepting the addresses
- when they are passed to the Tor client.
- \end{enumerate}
- This list is discouragingly long right now, but we recognize that it
- would be good to investigate each of these items in further depth and to
- understand which are actual roadblocks and which are easier to resolve
- than we think. We certainly wouldn't mind if Tor one day is able to
- transport a greater variety of protocols.
- \subsection{Mid-latency}
- \label{subsec:mid-latency}
- Though Tor has always been designed to be practical and usable first
- with as much anonymity as can be built in subject to those goals, we
- have contemplated that users might need resistance to at least simple
- traffic confirmation attacks. Raising the latency of communication
- slightly might make this feasible. If the latency could be kept to two
- or three times its current overhead, this might be acceptable to the
- majority of Tor users. However, it might also destroy much of the user
- base, and it is difficult to know in advance. Note also that in
- practice, as the network grows and we accept cable modem and DSL
- nodes, as well as more nodes on various continents, we're \emph{already}
- looking at many-second delays for some transactions. The engineering
- required to get this lower is going to be extremely hard. It's worth
- considering how hard it would be to accept the fixed (higher) latency
- and improve the protection we get from it. Thus, it may be most
- practical to run a mid-latency option over the Tor network for those
- users either willing to experiment or in need of more a priori
- anonymity in the network. This will allow us to experiment with both
- the anonymity provided and the interest on the part of users.
- Adding a mid-latency option should not require significant fundamental
- change to the Tor client or server design; circuits can be labeled as
- low or mid latency on servers as they are set up. Low-latency traffic
- would be processed as now. Packets on circuits that are mid-latency
- would be sent in uniform size chunks at synchronized intervals. To
- some extent the chunking is already done because traffic moves through
- the network in uniform size cells, but this would occur at a coarser
- granularity. If servers forward these chunks in roughly synchronous
- fashion, it will increase the similarity of data stream timing
- signatures. By experimenting with the granularity of data chunks and
- of synchronization we can attempt once again to optimize for both
- usability and anonymity. Unlike in \cite{sync-batching}, it may be
- impractical to synchronize on network batches by dropping chunks from
- a batch that arrive late at a given node---unless Tor moves away from
- stream processing to a more loss-tolerant processing of traffic (cf.\
- Section~\ref{subsec:tcp-vs-ip}). In other words, there would
- probably be no direct attempt to synchronize on batches of data
- entering the Tor network at the same time. Rather, it is the link
- level batching that will add noise to the traffic patterns exiting the
- network. Similarly, if end-to-end traffic confirmation is the
- concern, there is little point in mixing. It might also be feasible to
- pad chunks to uniform size as is done now for cells; if this is link
- padding rather than end-to-end, then it will take less overhead,
- especially in bursty environments. This is another way in which it
- would be fairly practical to set up a mid-latency option within the
- existing Tor network. Other padding regimens might supplement the
- mid-latency option; however, we should continue the caution with which
- we have always approached padding lest the overhead cost us too much
- performance or too many volunteers.
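- As a rough illustration of the link-level chunking described above, the
- following sketch queues fixed-size cells on one link and, at each
- synchronized interval, emits exactly one uniform chunk, padding with dummy
- cells when too little traffic is waiting. The class name
- {\tt MidLatencyLink}, the chunk granularity of four cells, and the choice to
- send an all-padding chunk on an idle link are assumptions made for the
- example, not part of any existing Tor code; the 512-byte cell size follows
- the published Tor design.

```python
from collections import deque

CELL_SIZE = 512                 # fixed cell size, as in the published Tor design
CELLS_PER_CHUNK = 4             # hypothetical chunk granularity (a tunable)
PADDING_CELL = bytes(CELL_SIZE) # placeholder; real padding would be encrypted
                                # so that it is indistinguishable from data


class MidLatencyLink:
    """Hypothetical per-link batcher for circuits labeled mid-latency."""

    def __init__(self, cells_per_chunk=CELLS_PER_CHUNK):
        self.cells_per_chunk = cells_per_chunk
        self.queue = deque()

    def enqueue(self, cell):
        """Queue one cell for the next synchronized send."""
        if len(cell) != CELL_SIZE:
            raise ValueError("cells must be exactly CELL_SIZE bytes")
        self.queue.append(cell)

    def tick(self):
        """Called once per synchronized interval.

        Returns a list of exactly cells_per_chunk cells: queued cells in
        FIFO order, followed by link padding if the queue ran short.
        Cells beyond one chunk stay queued for the next interval.
        """
        chunk = []
        while self.queue and len(chunk) < self.cells_per_chunk:
            chunk.append(self.queue.popleft())
        missing = self.cells_per_chunk - len(chunk)
        chunk.extend([PADDING_CELL] * missing)
        return chunk
```

- Because every interval produces a chunk of the same size on the link, an
- observer of that link sees the same timing and volume regardless of how
- bursty the underlying stream is; the cost is the added queueing delay and
- the padding overhead, both of which depend on the chosen granularity.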
- The distinction between traffic confirmation and traffic analysis is
- not as practically cut and dried as we might wish. In \cite{hintz-pet02} it was
- shown that if latencies to and/or data volumes of various popular
- responder destinations are catalogued, it may not be necessary to
- observe both ends of a stream to confirm a source-destination link.
- Such catalogues are likely to entail high variability and massive storage since
- routes through the network to each site will be random even if the sites
- have relatively unique latency or volume characteristics. So these attacks do
- not seem an immediate practical threat. Further along similar lines, in
- \cite{attack-tor-oak05}, it was shown that an outside attacker can
- trace a stream through the Tor network while a stream is still active
- simply by observing the latency of his own traffic sent through
- various Tor nodes. These attacks are especially significant since they
- counter previous results that running one's own onion router protects
- better than using the network from the outside. The attacks do not
- show the client address, only the first server within the Tor network,
- making helper nodes all the more worthy of exploration for enclave
- protection. Setting up a mid-latency subnet as described above would
- be another significant step to evaluating resistance to such attacks.
- The attacks in \cite{attack-tor-oak05} are also dependent on
- cooperation of the responding application or the ability to modify or
- monitor the responder stream, in order of decreasing attack
- effectiveness. So, another way to counter these attacks in some cases
- would be to employ caching of responses. This is infeasible for
- application data that is not relatively static and from frequently
- visited sites; however, it might be useful for DNS lookups. This is
- also likely to be trading one practical threat for another. To be
- useful, such caches would need to be distributed to any likely exit
- nodes for recurring requests for the same data. Aside from the logistical
- difficulties and overhead of distribution, they constitute a collected
- record of destinations and/or data visited by Tor users. While
- limited to network insiders, given the need for wide distribution
- they could serve as useful data to an attacker deciding which locations
- to target for confirmation.
- [nick will work on this]
- \subsection{Application support: SOCKS doesn't solve all our problems}
- SOCKS4a isn't everywhere. The DNS problem. Etc.
- nick will work on this.
- \subsection{Measuring performance and capacity}
- How to measure performance without letting people selectively deny service
- by distinguishing pings. Heck, just how to measure performance at all. In
- practice people have funny firewalls that don't match up to their exit
- policies and Tor doesn't deal.
- Network investigation: Is this whole bandwidth-publishing thing a good idea?
- How can we collect stats better? Note weasel's smokeping, at
- http://seppia.noreply.org/cgi-bin/smokeping.cgi?target=Tor
- which probably gives George and Steven enough info to break Tor?
- [nick will work on this section, unless arma gets there first]
- \subsection{Anonymity benefits for running a server}
- Does running a server help you or harm you? George's Oakland attack.
- Plausible deniability -- without even running your traffic through Tor!
- But nobody knows about Tor, and the legal situation is fuzzy, so this
- isn't very true really.
- We have to pick the path length so the adversary can't distinguish client from
- server (how many hops is good?).
- In practice, plausible deniability is hypothetical and doesn't seem very
- convincing. If ISPs find the activity antisocial, they don't care \emph{why}
- your computer is behaving that way.
- [arma will write this section]
- \subsection{Helper nodes}
- When does fixing your entry or exit node help you?
- Helper nodes in the literature don't deal with churn, and
- especially active attacks to induce churn.
- Do general DoS attacks have anonymity implications? See e.g. Adam
- Back's IH paper, but I think there's more to be pointed out here.
- Game theory for helper nodes: if Alice offers a hidden service on a
- server (enclave model), and nobody ever uses helper nodes, then against
- George+Steven's attack she's totally nailed. If only Alice uses a helper
- node, then she's still identified as the source of the data. If everybody
- uses a helper node (including Alice), then the attack identifies the
- helper node and also Alice, and knows which one is which. If everybody
- uses a helper node (but not Alice), then the attacker figures the real
- source was a client that is using Alice as a helper node. [How's my
- logic here?]
- point to routing-zones section re: helper nodes to defend against
- big stuff.
- [nick will write this section]
- \subsection{Location-hidden services}
- [arma will write this section]
- Survivable services are new in practice, yes? Hidden services seem
- less hidden than we'd like, since they stay in one place and get used
- a lot. They're the epitome of the need for helper nodes. This means
- that using Tor as a building block for Free Haven is going to be really
- hard. Also, they're brittle in terms of intersection and observation
- attacks. Would be nice to have hot-swap services, but hard to design.
- People are using hidden services as a poor man's VPN and firewall-buster.
- Rather than playing with dyndns and trying to pierce holes in their
- firewall (say, so they can ssh in from the outside), they run a hidden
- service on the inside and then rendezvous with that hidden service
- externally.
- In practice, sites like Bloggers Without Borders (www.b19s.org) are
- running Tor servers but, more importantly, are advertising a hidden-service
- address on their front page. Doing this could provide increased robustness
- if they used the dual-IP approach we describe in tor-design, but in
- practice they do it to a) increase visibility of the Tor project and their
- support for privacy, and b) offer a way for their users, using vanilla
- software, to get end-to-end encryption and end-to-end authentication to
- their website.
- \subsection{Trust and discovery}
- [arma will edit this and expand/retract it]
- The published Tor design adopted a deliberately simplistic design for
- authorizing new nodes and informing clients about servers and their status.
- In the early Tor designs, all ORs periodically uploaded a signed description
- of their locations, keys, and capabilities to each of several well-known {\it
- directory servers}. These directory servers constructed a signed summary
- of all known ORs (a ``directory''), and a signed statement of which ORs they
- believed to be operational at any given time (a ``network status''). Clients
- periodically downloaded a directory in order to learn the latest ORs and
- keys, and more frequently downloaded a network status to learn which ORs are
- likely to be running. ORs also operated as directory caches, in order to
- lighten the bandwidth load on the authoritative directory servers.
- In order to prevent Sybil attacks (wherein an adversary signs up many
- purportedly independent servers in order to increase her chances of observing
- a stream as it enters and leaves the network), the early Tor directory design
- required the operators of the authoritative directory servers to manually
- approve new ORs. Unapproved ORs were included in the directory, but clients
- did not use them at the start or end of their circuits. In practice,
- directory administrators performed little actual verification, and tended to
- approve any OR whose operator could compose a coherent email. This procedure
- may have prevented trivial automated Sybil attacks, but would do little
- against a clever attacker.
- There are a number of flaws in this system that need to be addressed as we
- move forward. They include:
- \begin{tightlist}
- \item Each directory server represents an independent point of failure; if
- any one were compromised, it could immediately compromise all of its users
- by recommending only compromised ORs.
- \item The more servers join the network, the more unreasonable it
- becomes to expect clients to know about them all. Directories
- become infeasibly large, and downloading the list of servers becomes
- burdensome.
- \item The validation scheme may do as much harm as it does good. It is not
- only incapable of preventing clever attackers from mounting Sybil attacks,
- but may deter server operators from joining the network. (For instance, if
- they expect the validation process to be difficult, or if they do not share
- any languages in common with the directory server operators.)
- \end{tightlist}
- We could try to move the system in several directions, depending on our
- choice of threat model and requirements. If we did not need to increase
- network capacity in order to support more users, there would be no reason not
- to adopt even stricter validation requirements, and reduce the number of
- servers in the network to a trusted minimum. But since we want Tor to work
- for as many users as it can, we need XXXXX
- In order to address the first two issues, it seems wise to move to a system
- including a number of semi-trusted directory servers, no one of which can
- compromise a user on its own. Ultimately, of course, we cannot escape the
- problem of a first introducer: since most users will run Tor in whatever
- configuration the software ships with, the Tor distribution itself will
- remain a potential single point of failure so long as it includes the seed
- keys for directory servers, a list of directory servers, or any other means
- to learn which servers are on the network. But omitting this information
- from the Tor distribution would only delegate the trust problem to the
- individual users, most of whom are presumably less informed about how to make
- trust decisions than the Tor developers.
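- To make the ``no one of which can compromise a user on its own'' property
- concrete, here is a minimal sketch in which a client accepts an OR only if
- more than half of the directory servers it knows about list that OR. The
- function name {\tt consensus\_ors}, the majority threshold, and the
- representation of a network status as a plain set of OR identities are
- all assumptions for illustration; this is not a specified Tor protocol.

```python
from collections import Counter


def consensus_ors(statuses, threshold=None):
    """Return the ORs listed by at least `threshold` directory servers.

    statuses:  dict mapping a directory server name to the set of OR
               identities that server currently lists (hypothetical format).
    threshold: minimum number of directories that must agree; defaults to
               a strict majority of the directories supplied.
    """
    if not statuses:
        return set()
    if threshold is None:
        threshold = len(statuses) // 2 + 1
    votes = Counter()
    for listed in statuses.values():
        votes.update(set(listed))   # each directory votes once per OR
    return {or_id for or_id, count in votes.items() if count >= threshold}
```

- Under this rule a single compromised directory server can neither
- introduce an OR that the others do not list nor remove one that a
- majority still vouches for. It does nothing, of course, about the
- first-introducer problem: the client must still learn the set of
- directory servers from somewhere.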
- \section{Crossroads: Scaling}
- Tor is running today with hundreds of servers and tens of thousands of
- users, but it will certainly not scale to millions.
- Scaling Tor involves three main challenges. First is safe server
- discovery, both bootstrapping -- how a Tor client can robustly find an
- initial server list -- and ongoing -- how a Tor client can learn about
- a fair sample of honest servers and not let the adversary control his
- circuits (see Section x). Second is detecting and handling the speed
- and reliability of the variety of servers we must use if we want to
- accept many servers (see Section y).
- Since the speed and reliability of a circuit is limited by its worst link,
- we must learn to track and predict performance. Finally, in order to get
- a large set of servers in the first place, we must address incentives
- for users to carry traffic for others (see Section incentives).
- \subsection{Incentives by Design}
- [nick will try to make this section shorter and more to the point.]
- [most of the technical incentive schemes in the literature introduce
- anonymity issues which we don't understand yet, and we seem to be doing
- ok without them]
- There are three behaviors we need to encourage for each server: relaying
- traffic; providing good throughput and reliability while doing it;
- and allowing traffic to exit the network from that server.
- We encourage these behaviors through \emph{indirect} incentives, that
- is, designing the system and educating users in such a way that users
- with certain goals will choose to relay traffic. In practice, the
- main incentive for running a Tor server is social benefit: volunteers
- altruistically donate their bandwidth and time. We also keep public
- rankings of the throughput and reliability of servers, much like
- SETI@home. We further explain to users that they can get \emph{better
- security} by operating a server, because they get plausible deniability
- (indeed, they may not need to route their own traffic through Tor at all
- -- blending directly with other traffic exiting Tor may be sufficient
- protection for them), and because they can use their own Tor server
- as entry or exit point and be confident it's not run by the adversary.
- Finally, we can improve the usability and feature set of the software:
- rate limiting support and easy packaging decrease the hassle of
- maintaining a server, and our configurable exit policies allow each
- operator to advertise a policy describing the hosts and ports to which
- he feels comfortable connecting.
- Beyond these, however, there is also a need for \emph{direct} incentives:
- providing payment or other resources in return for high-quality service.
- Paying actual money is problematic: decentralized e-cash systems are
- not yet practical, and a centralized collection system not only reduces
- robustness, but also has failed in the past (the history of commercial
- anonymizing networks is littered with failed attempts). A more promising
- option is to use a tit-for-tat incentive scheme: provide better service
- to nodes that have provided good service to you.
- Unfortunately, such an approach introduces new anonymity problems.
- Does the incentive system enable the adversary to attract more traffic by
- performing well? Typically a user who chooses evenly from all options is
- most resistant to an adversary targeting him, but that approach prevents
- us from handling heterogeneous servers \cite{casc-rep}.
- When a server (call him Steve) performs well for Alice, does Steve gain
- reputation with the entire system, or just with Alice? If the entire
- system, how does Alice tell everybody about her experience in a way that
- prevents her from lying about it yet still protects her identity? If
- Steve's behavior only affects Alice's behavior, does this allow Steve to
- selectively perform only for Alice, and then break her anonymity later
- when somebody (presumably Alice) routes through his node?
- These are difficult and open questions, yet choosing not to scale means
- leaving most users to a less secure network or no anonymizing network
- at all. We will start with a simplified approach to the tit-for-tat
- incentive scheme based on two rules: (1) each node should measure the
- service it receives from adjacent nodes, and provide service relative to
- the received service, but (2) when a node is making decisions that affect
- its own security (e.g. when building a circuit for its own application
- connections), it should choose evenly from a sufficiently large set of
- nodes that meet some minimum service threshold. This approach allows us
- to discourage bad service without opening Alice up as much to attacks.
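- The two rules can be sketched as follows. Rule (1) is shown as a
- proportional split of a node's relay capacity among its neighbors, and
- rule (2) as a uniform choice among all nodes above a service threshold,
- refusing to build a circuit if that pool is too small. The function
- names, the proportional split, and the default pool size of ten are
- assumptions chosen for the example; the text above does not fix any of
- them.

```python
import random


def allocate_bandwidth(received, capacity):
    """Rule 1: serve each adjacent node in proportion to service received.

    received: dict mapping neighbor name to measured service received from
              that neighbor (any consistent unit, e.g. bytes relayed).
    capacity: total relay capacity this node is willing to give out.
    """
    if not received:
        return {}
    total = sum(received.values())
    if total == 0:
        # No measurements yet: fall back to an even split.
        share = capacity / len(received)
        return {name: share for name in received}
    return {name: capacity * amount / total
            for name, amount in received.items()}


def choose_circuit_nodes(scores, min_score, hops=3, min_pool=10, rng=random):
    """Rule 2: for our own circuits, pick uniformly among adequate nodes.

    Service scores only decide who is eligible; they do not weight the
    choice, so performing well does not attract extra traffic.
    """
    pool = [name for name, score in scores.items() if score >= min_score]
    if len(pool) < max(min_pool, hops):
        raise ValueError("eligible pool too small to choose from safely")
    return rng.sample(pool, hops)
```

- The point of keeping the two decisions separate is that a node's
- reputation with Alice affects how much she relays for it, but never
- which nodes she selects when her own anonymity is at stake.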
- \subsection{Peer-to-peer / practical issues}
- [leave this section for now, and make sure things here are covered
- elsewhere. then remove it.]
- Making use of servers with little bandwidth. How to handle hammering by
- certain applications.
- Handling servers that are far away from the rest of the network, e.g. on
- the continents that aren't North America and Europe. High latency,
- often high packet loss.
- Running Tor servers behind NATs, behind great-firewalls-of-China, etc.
- Restricted routes. How to propagate the topology to everybody? BGP
- style doesn't work because we don't want just \emph{one} path. Point to
- Geoff's stuff.
- \subsection{Location diversity and ISP-class adversaries}
- \label{subsec:routing-zones}
- Anonymity networks have long relied on diversity of node location for
- protection against attacks---typically an adversary who can observe a
- larger fraction of the network can launch a more effective attack. One
- way to achieve dispersal involves growing the network so a given adversary
- sees less. Alternately, we can arrange the topology so traffic can enter
- or exit at many places (for example, by using a free-route network
- like Tor rather than a cascade network like JAP). Lastly, we can use
- distributed trust to spread each transaction over multiple jurisdictions.
- But how do we decide whether two nodes are in related locations?
- Feamster and Dingledine defined a \emph{location diversity} metric
- in \cite{feamster:wpes2004}, and began investigating a variant of location
- diversity based on the fact that the Internet is divided into thousands of
- independently operated networks called {\em autonomous systems} (ASes).
- The key insight from their paper is that while we typically think of a
- connection as going directly from the Tor client to her first Tor node,
- actually it traverses many different ASes on each hop. An adversary at
- any of these ASes can monitor or influence traffic. Specifically, given
- plausible initiators and recipients and random path selection,
- some ASes in the simulation were able to observe 10\% to 30\% of the
- transactions (that is, learn both the origin and the destination) on
- the deployed Tor network (33 nodes as of June 2004).
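- The notion of an AS ``observing a transaction'' can be made concrete with
- a small sketch. For each transaction we take the list of ASes on the path
- from the client to the entry node and the list of ASes on the path from
- the exit node to the destination; an AS observes the transaction if it
- appears on both. The function name and the input format are assumptions
- for illustration, and obtaining the AS paths themselves is the hard part
- that this sketch does not address.

```python
from collections import Counter


def per_as_observation(transactions):
    """Fraction of transactions each AS can observe at both ends.

    transactions: list of (entry_path, exit_path) pairs, where each path
                  is a list of AS numbers traversed on that segment.
    Returns a dict mapping AS number to the fraction of transactions in
    which that AS lies on both the entry-side and the exit-side path.
    """
    if not transactions:
        return {}
    counts = Counter()
    for entry_path, exit_path in transactions:
        # An AS on both segments sees origin and destination.
        for asn in set(entry_path) & set(exit_path):
            counts[asn] += 1
    total = len(transactions)
    return {asn: count / total for asn, count in counts.items()}
```

- ASes that never appear on both sides of any transaction are simply
- absent from the result.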
- The paper concludes that for best protection against the AS-level
- adversary, nodes should be in ASes that have the most links to other ASes:
- Tier-1 ISPs such as AT\&T and Abovenet. Further, a given transaction
- is safest when it starts or ends in a Tier-1 ISP. Therefore, assuming
- initiator and responder are both in the U.S., it actually \emph{hurts}
- our location diversity to add far-flung nodes in continents like Asia
- or South America.
- Many open questions remain. First, it will be an immense engineering
- challenge to get an entire BGP routing table to each Tor client, or at
- least summarize it sufficiently. Without a local copy, clients won't be
- able to safely predict what ASes will be traversed on the various paths
- through the Tor network to the final destination. Tarzan~\cite{tarzan:ccs02}
- and MorphMix~\cite{morphmix:fc04} suggest that we compare IP prefixes to
- determine location diversity; but the above paper showed that in practice
- many of the Mixmaster nodes that share a single AS have entirely different
- IP prefixes. When the network has scaled to thousands of nodes, does IP
- prefix comparison become a more useful approximation?
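- The gap between the two approximations is easy to see in a toy example.
- The sketch below compares two nodes first by shared IP prefix and then by
- AS membership. The node addresses come from the documentation address
- ranges, the AS numbers from the private-use range, and the lookup table
- {\tt NODE\_AS} is entirely hypothetical; a real client would need BGP
- data to build it.

```python
import ipaddress

# Hypothetical node-to-AS table; a real one would be derived from BGP data.
NODE_AS = {
    "192.0.2.10": 64512,
    "198.51.100.7": 64512,   # different prefix, same AS as the node above
    "203.0.113.9": 64513,
}


def same_prefix(addr_a, addr_b, prefix_len=16):
    """True if both addresses fall in the same /prefix_len network."""
    network = ipaddress.ip_network(f"{addr_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(addr_b) in network


def same_as(addr_a, addr_b, table=NODE_AS):
    """True if the table maps both addresses to the same AS."""
    return table[addr_a] == table[addr_b]


def prefix_check_misleads(addr_a, addr_b, prefix_len=16, table=NODE_AS):
    """True when prefix comparison calls two nodes diverse but they share an AS."""
    return (not same_prefix(addr_a, addr_b, prefix_len)
            and same_as(addr_a, addr_b, table))
```

- For the first two nodes in the table, prefix comparison reports two
- distinct locations while the AS view reports one, which is the failure
- mode described above.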
- Second, can we take advantage of caching certain content at the exit nodes, to
- limit the number of requests that need to leave the network at all?
- What about taking advantage of caches like Akamai's or Google's? What
- about treating them as adversaries?
- Third, if we follow the paper's recommendations and tailor path selection
- to avoid choosing endpoints in similar locations, how much are we hurting
- anonymity against larger real-world adversaries who can take advantage
- of knowing our algorithm?
- Lastly, can we use this knowledge to figure out which gaps in our network
- would most improve our robustness to this class of attack, and go recruit
- new servers with those ASes in mind?
- Tor's security relies in large part on the dispersal properties of its
- network. We need to be more aware of the anonymity properties of various
- approaches so we can make better design decisions in the future.
- \subsection{The China problem}
- \label{subsec:china}
- Citizens in a variety of countries, such as most recently China and
- Iran, are periodically blocked from accessing various sites outside
- their country. These users try to find any tools available to allow
- them to get around these firewalls. Some anonymity networks, such as
- Six-Four~\cite{six-four}, are designed specifically with this goal in
- mind; others like the Anonymizer~\cite{anonymizer} are paid by sponsors
- such as Voice of America to set up a network to encourage `Internet
- freedom'~\cite{voice-of-america-anonymizer}. Even though Tor wasn't
- designed with ubiquitous access to the network in mind, thousands of
- users across the world are trying to use it for exactly this purpose.
- Anti-censorship networks hoping to bridge country-level blocks face
- a variety of challenges. One of these is that they need to find enough
- exit nodes---servers on the `free' side that are willing to relay
- arbitrary traffic from users to their final destinations. Anonymizing
- networks including Tor are well-suited to this task, since we have
- already gathered a set of exit nodes that are willing to tolerate some
- political heat.
- The other main challenge is to distribute a list of reachable relays
- to the users inside the country, and give them software to use them,
- without letting the authorities also enumerate this list and block each
- relay. Anonymizer solves this by buying lots of seemingly-unrelated IP
- addresses (or having them donated), abandoning old addresses as they are
- `used up', and telling a few users about the new ones. Distributed
- anonymizing networks again have an advantage here, in that we already
- have tens of thousands of separate IP addresses whose users might
- volunteer to provide this service since they've already installed and use
- the software for their own privacy~\cite{koepsell:wpes2004}. Because
- the Tor protocol separates routing from network discovery (see Section
- \ref{do-we-discuss-this?}), volunteers could configure their Tor clients
- to generate server descriptors and send them to a special directory
- server that gives them out to dissidents who need to get around blocks.
- Of course, this still doesn't prevent the adversary
- from enumerating all the volunteer relays and blocking them preemptively.
- Perhaps a tiered-trust system could be built where a few individuals are
- given relays' locations, and they recommend other individuals by telling them
- those addresses, thus providing a built-in incentive to avoid letting the
- adversary intercept them. Max-flow trust algorithms~\cite{advogato}
- might help to bound the number of IP addresses leaked to the adversary. Groups
- like the W3C are looking into using Tor as a component in an overall system to
- help address censorship; we wish them luck.
- \subsection{Non-clique topologies}
- [nick will try to shrink this section]
- Because its threat model is substantially weaker than that of high-latency
- mixnets, Tor is actually in a potentially better position to
- scale, at least initially. From the perspective of a mix network, one
- of the worst things that can happen is partitioning. The more
- potential senders of messages entering the network, the better the
- anonymity. Roughly, if a network is, e.g., split in half, then your
- anonymity is cut in half. Attacks become half as hard (if they're
- linear in network size), etc. In some sense this is still true for
- Tor: if you want to know who Alice is talking to, you can watch her
- for one end of a circuit. For a half-size network, you then only have
- to examine half as many nodes by brute force to find the other end. But
- Tor is not meant to cope with someone directly attacking many dozens
- of nodes in a few minutes. It was meant to cope with traffic
- confirmation attacks. And, these are independent of the size of the
- network. So, a simple possibility when the scale of a Tor network
- exceeds some size is to simply split it. Care could be taken in
- allocating which nodes go to which network along the lines of
- \cite{casc-rep} to ensure that collaborating hostile nodes are not
- able to gain any advantage in network splitting that they do not
- already have in joining a network.
- The attacks in \cite{attack-tor-oak05} show that certain types of
- brute force attacks are in fact feasible; however, they make the
- above point stronger, not weaker. The attacks do not appear to be
- significantly more difficult to mount against a network that is
- twice the size. Also, they only identify the Tor nodes used in a
- circuit, not the client. Finally note that even if the network is split,
- a client does not need to use just one of the two resulting networks.
- Alice could use either of them, and it would not be difficult to make
- the Tor client able to access several such networks on a per-circuit
- basis. More analysis is needed; we simply note here that splitting
- a Tor network is an easy way to achieve moderate scalability and that
- it does not necessarily have the same implications as splitting a mixnet.
- Alternatively, we can try to scale a single network. Some issues for
- scaling include how many neighbors nodes can support and how many
- users (and how much application traffic capacity) the network can
- handle for each new node that comes into the network. This depends on
- many things, most notably the traffic capacity of the new nodes. We
- can observe, however, that adding a Tor node of any feasible bandwidth
- will increase the traffic capacity of the network. This means that, as
- a first step to scaling, we can focus on the interconnectivity of the
- nodes, followed by directories, discovery, etc.
- By reducing the connectivity of the network we increase the total
- number of nodes that the network can contain. Anonymity implications
- of restricted routes for mix networks have already been explored by
- Danezis~\cite{danezis-pets03}. That paper explicitly considered only
- traffic analysis resistance provided by a mix network and sidestepped
- questions of traffic confirmation resistance. But Tor is designed
- only to resist traffic confirmation. For this and other reasons, we
- cannot simply apply his mixnet results to onion routing networks. If
- an attacker gains only a minimal increase in the likelihood of compromising
- the endpoints of a Tor circuit through a sparse network (vs.\ a clique
- on the same node set), then the restriction will have had minimal
- impact on the anonymity provided by that network.
- The approach Danezis describes is based on expander graphs, i.e.,
- graphs in which any subgraph of nodes is likely to have lots of nodes
- as neighbors. For Tor, we may not need to have an expander per se; it
- may be enough to have a single subnet that is highly connected. As an
- example, assume fifty nodes of relatively high traffic capacity. This
- \emph{center} forms a clique. Assume each center node can
- handle 200 connections to other nodes (including the other ones in the
- center). Assume every noncenter node connects to three nodes in the
- center and to any nodes outside the center that it wants to. Then the
- network easily scales to c.\ 2500 nodes with a commensurate increase in
- bandwidth. There are many open questions: how directory information
- is distributed (presumably information about the center nodes could
- be given to any new nodes with their codebase), whether center nodes
- will need to function as a `backbone', etc. As above, the point is
- that this would create problems for the expected anonymity of a mixnet,
- but for an onion routing network, where anonymity derives largely from
- the edges, it may be feasible.
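- As a rough check on the figure of c.\ 2500 nodes, assume (as a
- simplification not stated above) that only connections involving the
- center count against a center node's limit and that noncenter nodes
- spread their three center connections evenly. Each of the fifty center
- nodes spends 49 of its 200 connections on the rest of the clique, so
- \[
-   \frac{50 \times (200 - 49)}{3} = \frac{7550}{3} \approx 2516
- \]
- noncenter nodes can attach, for roughly 2566 nodes in total.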
- Another point is that we already have a non-clique topology.
- Individuals can set up and run Tor nodes without informing the
- directory servers. This will allow, e.g., dissident groups to run a
- local Tor network of such nodes that connects to the public Tor
- network. This network is hidden behind the Tor network, and its
- only visible connection to Tor is at those points where it connects.
- As far as the public network, or anyone observing it, is concerned,
- they are running clients.
- \section{The Future}
- \label{sec:conclusion}
- We should put random thoughts here until there are enough for a
- conclusion.
- Will our sustainability approach work? We'll see.
- ``These are difficult and open questions, yet choosing not to solve them
- means leaving most users to a less secure network or no anonymizing
- network at all.''
- \bibliographystyle{plain} \bibliography{tor-design}
- \clearpage
- \appendix
- \begin{figure}[t]
- \centering
- \mbox{\epsfig{figure=graphnodes,width=5in}}
- \caption{Number of servers over time. The lowest line is the number of exit
- nodes that allow connections to port 80. The middle line is the total number
- of verified (registered) servers. The line above that represents servers
- that are not yet registered.}
- \label{fig:graphnodes}
- \end{figure}
- \begin{figure}[t]
- \centering
- \mbox{\epsfig{figure=graphtraffic,width=5in}}
- \caption{The sum of traffic reported by each server over time. The bottom
- pair shows average throughput, and the top pair represents the largest
- 15-minute burst in each 4-hour period.}
- \label{fig:graphtraffic}
- \end{figure}
- \end{document}