- \documentclass{llncs}
- \usepackage{url}
- \usepackage{amsmath}
- \usepackage{epsfig}
- \setlength{\textwidth}{5.9in}
- \setlength{\textheight}{8.4in}
- \setlength{\topmargin}{.5cm}
- \setlength{\oddsidemargin}{1cm}
- \setlength{\evensidemargin}{1cm}
- \newenvironment{tightlist}{\begin{list}{$\bullet$}{
- \setlength{\itemsep}{0mm}
- \setlength{\parsep}{0mm}
- % \setlength{\labelsep}{0mm}
- % \setlength{\labelwidth}{0mm}
- % \setlength{\topsep}{0mm}
- }}{\end{list}}
- \begin{document}
- \title{Design of a blocking-resistant anonymity system\\DRAFT}
- %\author{Roger Dingledine\inst{1} \and Nick Mathewson\inst{1}}
- \author{Roger Dingledine \and Nick Mathewson}
- \institute{The Free Haven Project\\
- \email{\{arma,nickm\}@freehaven.net}}
- \maketitle
- \pagestyle{plain}
- \begin{abstract}
- Websites around the world are increasingly being blocked by
- government-level firewalls. Many people use anonymizing networks like
- Tor to contact sites without letting an attacker trace their activities,
- and as an added benefit they are no longer affected by local censorship.
- But if the attacker simply denies access to the Tor network itself,
- blocked users can no longer benefit from the security Tor offers.
- Here we describe a design that builds upon the current Tor network
- to provide an anonymizing network that resists blocking
- by government-level attackers.
- \end{abstract}
- \section{Introduction and Goals}
- Anonymizing networks such as Tor~\cite{tor-design} bounce traffic around
- a network of relays. They aim to hide not only what is being said, but
- also who is communicating with whom, which users are using which websites,
- and so on. These systems have a broad range of users, including ordinary
- citizens who want to avoid being profiled for targeted advertisements,
- corporations who don't want to reveal information to their competitors,
- and law enforcement and government intelligence agencies who need to conduct
- operations on the Internet without being noticed.
- Historically, research on anonymizing systems has focused on a passive
- attacker who monitors the user (call her Alice) and tries to discover her
- activities, yet lets her reach any piece of the network. In more modern
- threat models such as Tor's, the adversary is allowed to perform active
- attacks such as modifying communications to trick Alice
- into revealing her destination, or intercepting some connections
- to run a man-in-the-middle attack. But these systems still assume that
- Alice can eventually reach the anonymizing network.
- An increasing number of users are using the Tor software
- less for its anonymity properties than for its censorship
- resistance properties---if they use Tor to access Internet sites like
- Wikipedia
- and Blogspot, they are no longer affected by local censorship
- and firewall rules. In fact, an informal user study (described in
- Appendix~\ref{app:geoip}) showed China as the third largest user base
- for Tor clients, with perhaps ten thousand people accessing the Tor
- network from China each day.
- The current Tor design is easy to block if the attacker controls Alice's
- connection to the Tor network---by blocking the directory authorities,
- by blocking all the server IP addresses in the directory, or by filtering
- based on the signature of the Tor TLS handshake. Here we describe a
- design that builds upon the current Tor network to provide an anonymizing
- network that also resists this blocking. Specifically,
- Section~\ref{sec:adversary} discusses our threat model---that is,
- the assumptions we make about our adversary; Section~\ref{sec:current-tor}
- describes the components of the current Tor design and how they can be
- leveraged for a new blocking-resistant design; Section~\ref{sec:related}
- explains the features and drawbacks of the currently deployed solutions;
- and ...
- % The other motivation is for places where we're concerned they will
- % try to enumerate a list of Tor users. So even if they're not blocking
- % the Tor network, it may be smart to not be visible as connecting to it.
- %And adding more different classes of users and goals to the Tor network
- %improves the anonymity for all Tor users~\cite{econymics,usability:weis2006}.
- % Adding use classes for countering blocking as well as anonymity has
- % benefits too. Should add something about how providing undetected
- % access to Tor would facilitate people talking to, e.g., govt. authorities
- % about threats to public safety etc. in an environment where Tor use
- % is not otherwise widespread and would make one stand out.
- \section{Adversary assumptions}
- \label{sec:adversary}
- The history of blocking-resistance designs is littered with conflicting
- assumptions about what adversaries to expect and what problems are
- in the critical path to a solution. Here we try to enumerate our best
- understanding of the current situation around the world.
- In the traditional security style, we aim to describe a strong
- attacker---if we can defend against this attacker, we inherit protection
- against weaker attackers as well. After all, we want a general design
- that will work for citizens of China, Iran, Thailand, and other censored
- countries; for
- whistleblowers in firewalled corporate networks; and for people in
- unanticipated oppressive situations. In fact, by designing with
- a variety of adversaries in mind, we can take advantage of the fact that
- adversaries will be in different stages of the arms race at each location,
- so a server blocked in one locale can still be useful in others.
- We assume there are three main network attacks in use by censors
- currently~\cite{clayton:pet2006}:
- \begin{tightlist}
- \item Block a destination or type of traffic by automatically searching for
- certain strings or patterns in TCP packets.
- \item Block a destination by manually listing its IP address at the
- firewall.
- \item Intercept DNS requests and give bogus responses for certain
- destination hostnames.
- \end{tightlist}
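- As a rough illustration of these three attacks, the Python sketch below
- models a censor that applies each rule in turn. The class
- \texttt{ToyCensor}, its method names, and the example addresses (taken
- from the reserved documentation ranges) are assumptions made for this
- sketch; it does not describe any deployed firewall.

```python
from typing import Dict, List, Set


class ToyCensor:
    """Hypothetical censor applying the three rules listed above."""

    def __init__(self, keywords: List[bytes], blocked_ips: Set[str],
                 poisoned_dns: Dict[str, str]):
        self.keywords = keywords          # strings searched for in packet payloads
        self.blocked_ips = blocked_ips    # manually listed destination addresses
        self.poisoned_dns = poisoned_dns  # hostname -> bogus answer

    def allows_packet(self, dst_ip: str, payload: bytes) -> bool:
        """Drop the packet if its destination is listed or a keyword appears."""
        if dst_ip in self.blocked_ips:
            return False
        return not any(word in payload for word in self.keywords)

    def resolve(self, hostname: str, real_ip: str) -> str:
        """Return a bogus address for targeted hostnames, else the real one."""
        return self.poisoned_dns.get(hostname, real_ip)


censor = ToyCensor(
    keywords=[b"forbidden-topic"],
    blocked_ips={"192.0.2.10"},
    poisoned_dns={"blocked.example": "198.51.100.1"},
)
```

- Note that the keyword rule only helps the censor when the payload is
- readable, which is one reason an encrypted first hop matters.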
- We assume the network firewall has limited CPU and memory per
- connection~\cite{clayton:pet2006}. Against an adversary who carefully
- examines the contents of every packet, we would need
- some stronger mechanism such as steganography, which introduces its
- own problems~\cite{active-wardens,tcpstego,bar}.
- More broadly, we assume that the authorities are more likely to
- block a given system as its popularity grows. That is, a system
- used by only a few users will probably never be blocked, whereas a
- well-publicized system with many users will receive much more scrutiny.
- We assume that readers of blocked content are not in as much danger
- as publishers. So far in places like China, the authorities mainly go
- after people who publish materials and coordinate organized
- movements~\cite{mackinnon}.
- If they find that a user happens
- to be reading a site that should be blocked, the typical response is
- simply to block the site. Of course, even with an encrypted connection,
- the adversary may be able to distinguish readers from publishers by
- observing whether Alice is mostly downloading bytes or mostly uploading
- them---we discuss this issue more in Section~\ref{subsec:upload-padding}.
- We assume that while different regimes can coordinate and share
- notes, there will be a time lag between one attacker learning
- how to overcome a facet of our design and other attackers picking it up.
- Similarly, we assume that in the early stages of deployment the insider
- threat isn't as high a risk, because no attackers have put serious
- effort into breaking the system yet.
- We do not assume that government-level attackers are always uniform across
- the country. For example, there is no single centralized place in China
- that coordinates its specific censorship decisions and steps.
- We assume that our users have control over their hardware and
- software---they don't have any spyware installed, there are no
- cameras watching their screens, etc. Unfortunately, in many situations
- these threats are real~\cite{zuckerman-threatmodels}; yet
- software-based security systems like ours are poorly equipped to handle
- a user who is entirely observed and controlled by the adversary. See
- Section~\ref{subsec:cafes-and-livecds} for more discussion of what little
- we can do about this issue.
- We assume that widespread access to the Internet is economically,
- politically, and/or
- socially valuable to the policymakers of each deployment country. After
- all, if censorship
- is more important than Internet access, the firewall administrators have
- an easy job: they should simply block everything. The corollary to this
- assumption is that we should design so that increased blocking of our
- system results in increased economic damage or public outcry.
- We assume that the user will be able to fetch a genuine
- version of Tor, rather than one supplied by the adversary; see
- Section~\ref{subsec:trust-chain} for discussion on helping the user
- confirm that he has a genuine version and that he can connect to the
- real Tor network.
- \section{Components of the current Tor design}
- \label{sec:current-tor}
- Tor is popular and sees a lot of use. It's the largest anonymity
- network of its kind.
- Tor has attracted more than 800 volunteer-operated routers from around the
- world. Tor protects users by routing their traffic through a multiply
- encrypted ``circuit'' built of a few randomly selected servers, each of which
- can remove only a single layer of encryption. Each server sees only the step
- before it and the step after it in the circuit, and so no single server can
- learn the connection between a user and her chosen communication partners.
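- To make the layering concrete, the following minimal Python sketch wraps
- a payload in one layer per hop and lets each hop strip exactly one layer.
- It is a toy under stated assumptions: XOR with a SHA-256 keystream stands
- in for real encryption, and the names \texttt{wrap} and \texttt{peel} are
- ours for illustration, not part of Tor.

```python
import hashlib
from typing import List


def keystream(key: bytes, length: int) -> bytes:
    """Derive a toy keystream from SHA-256 over key || counter (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def peel(key: bytes, cell: bytes) -> bytes:
    """Remove (or add) one layer; XOR with the keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(cell, keystream(key, len(cell))))


def wrap(payload: bytes, hop_keys: List[bytes]) -> bytes:
    """Client side: add one layer per hop, innermost layer for the last hop."""
    cell = payload
    for key in reversed(hop_keys):
        cell = peel(key, cell)
    return cell


# Hypothetical per-hop keys that the client shares with each server.
hop_keys = [b"first-hop-key", b"middle-hop-key", b"last-hop-key"]
payload = b"GET /index.html"
cell = wrap(payload, hop_keys)

# Each server removes only its own layer and forwards the result.
views = []
for key in hop_keys:
    cell = peel(key, cell)
    views.append(cell)
```

- Only the last entry of \texttt{views} equals the original payload; the
- earlier hops see data that is still wrapped in the remaining layers.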
- In this section, we examine some of the reasons why Tor has become popular,
- with particular emphasis on how we can take advantage of these properties
- for a blocking-resistance design.
- Tor aims to provide three security properties:
- \begin{tightlist}
- \item 1. A local network attacker can't learn, or influence, your
- destination.
- \item 2. No single router in the Tor network can link you to your
- destination.
- \item 3. The destination, or somebody watching the destination,
- can't learn your location.
- \end{tightlist}
- For blocking-resistance, we care most clearly about the first
- property. But as the arms race progresses, the second property
- will become important---for example, to discourage an adversary
- from volunteering a relay in order to learn that Alice is reading
- or posting to certain websites. The third property helps keep users safe from
- collaborating websites: consider websites and other Internet services
- that have been pressured
- recently into revealing the identity of bloggers~\cite{arrested-bloggers}
- or treating clients differently depending on their network
- location~\cite{google-geolocation}.
- % and cite{goodell-syverson06} once it's finalized.
- The Tor design provides other features as well that are not typically
- present in manual or ad hoc circumvention techniques.
- First, the Tor directory authorities automatically aggregate, test,
- and publish signed summaries of the available Tor routers. Tor clients
- can fetch these summaries to learn which routers are available and
- which routers are suitable for their needs. Directory information is cached
- throughout the Tor network, so once clients have bootstrapped they never
- need to interact with the authorities directly. (To tolerate a minority
- of compromised directory authorities, we use a threshold trust scheme---
- see Section~\ref{subsec:trust-chain} for details.)
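- The idea behind a threshold trust scheme can be sketched as follows. This
- is a simplified illustration, not the actual Tor directory protocol: we
- assume signature checking has already happened, so each vote is just the
- digest an authority vouches for, and the function name
- \texttt{majority\_digest} is hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Set


def majority_digest(votes: Dict[str, str], trusted: Set[str]) -> Optional[str]:
    """Return the directory digest backed by a strict majority of the
    trusted authorities, or None if no digest reaches that threshold.

    votes maps an authority name to the digest it vouches for; votes from
    names outside the trusted set are ignored.
    """
    counts = Counter(digest for name, digest in votes.items() if name in trusted)
    if not counts:
        return None
    digest, supporters = counts.most_common(1)[0]
    # Strict majority of all trusted authorities, not just of those that voted.
    return digest if supporters > len(trusted) // 2 else None


trusted = {"auth1", "auth2", "auth3", "auth4", "auth5"}
honest_majority = {
    "auth1": "digest-A", "auth2": "digest-A", "auth3": "digest-A",
    "auth4": "digest-B", "auth5": "digest-B",
}
accepted = majority_digest(honest_majority, trusted)
```

- With five trusted authorities, two compromised ones cannot make a client
- accept their digest, since three matching votes are required.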
- Second, Tor clients can be configured to use any directory authorities
- they want. They use the default authorities if no others are specified,
- but it's easy to start a separate (or even overlapping) Tor network just
- by running a different set of authorities and convincing users to prefer
- a modified client. For example, we could launch a distinct Tor network
- inside China; some users could even use an aggregate network made up of
- both the main network and the China network. (But we should not be too
- quick to create other Tor networks---part of Tor's anonymity comes from
- users behaving like other users, and there are many unsolved anonymity
- questions if different users know about different pieces of the network.)
- Third, in addition to automatically learning from the chosen directories
- which Tor routers are available and working, Tor takes care of building
- paths through the network and rebuilding them as needed. So the user
- never has to know how paths are chosen, never has to manually pick
- working proxies, and so on. More generally, at its core the Tor protocol
- is simply a tool that can build paths given a set of routers. Tor is
- quite flexible about how it learns about the routers and how it chooses
- the paths. Harvard's Blossom project~\cite{blossom-thesis} makes this
- flexibility more concrete: Blossom makes use of Tor not for its security
- properties but for its reachability properties. It runs a separate set
- of directory authorities, its own set of Tor routers (called the Blossom
- network), and uses Tor's flexible path-building to let users view Internet
- resources from any point in the Blossom network.
- Fourth, Tor separates the role of \emph{internal relay} from the
- role of \emph{exit relay}. That is, some volunteers choose just to relay
- traffic between Tor users and Tor routers, and others choose to also allow
- connections to external Internet resources. Because we don't force all
- volunteers to play both roles, we end up with more relays. This increased
- diversity in turn is what gives Tor its security: the more options the
- user has for her first hop, and the more options she has for her last hop,
- the less likely it is that a given attacker will be watching both ends
- of her circuit~\cite{tor-design}. As a bonus, because our design attracts
- more internal relays that want to help out but don't want to deal with
- being an exit relay, we end up with more options for the first hop---the
- one most critical to being able to reach the Tor network.
- Fifth, Tor is sustainable. Zero-Knowledge Systems offered the commercial
- but now defunct Freedom Network~\cite{freedom21-security}, a design with
- security comparable to Tor's, but its funding model relied on collecting
- money from users to pay relay operators. Modern commercial proxy systems
- similarly
- need to keep collecting money to support their infrastructure. On the
- other hand, Tor has built a self-sustaining community of volunteers who
- donate their time and resources. This community trust is rooted in Tor's
- open design: we tell the world exactly how Tor works, and we provide all
- the source code. Users can decide for themselves, or pay any security
- expert to decide, whether it is safe to use. Further, Tor's modularity
- as described above, along with its open license, means that its impact
- will continue to grow.
- Sixth, Tor has an established user base of hundreds of
- thousands of people from around the world. This diversity of
- users contributes to sustainability as above: Tor is used by
- ordinary citizens, activists, corporations, law enforcement, and
- even government and military users~\cite{tor-use-cases}, and they can
- only achieve their security goals by blending together in the same
- network~\cite{econymics,usability:weis2006}. This user base also provides
- something else: hundreds of thousands of different and often-changing
- addresses that we can leverage for our blocking-resistance design.
- We discuss and adapt these components further in
- Section~\ref{sec:bridges}. But first we examine the strengths and
- weaknesses of other blocking-resistance approaches, so we can expand
- our repertoire of building blocks and ideas.
- \section{Current proxy solutions}
- \label{sec:related}
- Relay-based blocking-resistance schemes generally have two main
- components: a relay component and a discovery component. The relay part
- encompasses the process of establishing a connection, sending traffic
- back and forth, and so on---everything that's done once the user knows
- where she's going to connect. Discovery is the step before that: the
- process of finding one or more usable relays.
- For example, we can divide the pieces of Tor in the previous section
- into the process of building paths and sending
- traffic over them (relay) and the process of learning from the directory
- servers about what routers are available (discovery). With this distinction
- in mind, we now examine several categories of relay-based schemes.
- \subsection{Centrally-controlled shared proxies}
- Existing commercial anonymity solutions (like Anonymizer.com) are based
- on a set of single-hop proxies. In these systems, each user connects to
- a single proxy, which then relays traffic between the user and her
- destination. These public proxy
- systems are typically characterized by two features: they control and
- operate the proxies centrally, and many different users get assigned
- to each proxy.
- In terms of the relay component, single proxies provide weak security
- compared to systems that distribute trust over multiple relays, since a
- compromised proxy can trivially observe all of its users' actions, and
- an eavesdropper only needs to watch a single proxy to perform timing
- correlation attacks against all its users' traffic and thus learn where
- everyone is connecting. Worse, all users
- need to trust the proxy company to have good security itself as well as
- to not reveal user activities.
- On the other hand, single-hop proxies are easier to deploy, and they
- can provide better performance than distributed-trust designs like Tor,
- since traffic only goes through one relay. They're also more convenient
- from the user's perspective---since users entirely trust the proxy,
- they can just use their web browser directly.
- Whether public proxy schemes are more or less scalable than Tor is
- still up for debate: commercial anonymity systems can use some of their
- revenue to provision more bandwidth as they grow, whereas volunteer-based
- anonymity systems can attract thousands of fast relays to spread the load.
- The discovery piece can take several forms. Most commercial anonymous
- proxies have one or a handful of commonly known websites, and their users
- log in to those websites and relay their traffic through them. When
- these websites get blocked (generally soon after the company becomes
- popular), if the company cares about users in the blocked areas, they
- start renting lots of disparate IP addresses and rotating through them
- as they get blocked. They notify their users of new addresses (by email,
- for example). It's an arms race, since attackers can sign up to receive the
- email too, but operators have one nice trick available to them: because they
- have a list of paying subscribers, they can notify certain subscribers
- about updates earlier than others.
- Access control systems on the proxy let them provide service only to
- users with certain characteristics, such as paying customers or people
- from certain IP address ranges.
- Discovery in the face of a government-level firewall is a complex and
- unsolved
- topic, and we're stuck in this same arms race ourselves; we explore it
- in more detail in Section~\ref{sec:discovery}. But first we examine the
- other end of the spectrum---getting volunteers to run the proxies,
- and telling only a few people about each proxy.
- \subsection{Independent personal proxies}
- Personal proxies such as Circumventor~\cite{circumventor} and
- CGIProxy~\cite{cgiproxy} use the same technology as the public ones as
- far as the relay component goes, but they use a different strategy for
- discovery. Rather than managing a few centralized proxies and constantly
- getting new addresses for them as the old addresses are blocked, they
- aim to have a large number of entirely independent proxies, each managing
- its own (much smaller) set of users.
- As the Circumventor site~\cite{circumventor} explains, ``You don't
- actually install the Circumventor \emph{on} the computer that is blocked
- from accessing Web sites. You, or a friend of yours, has to install the
- Circumventor on some \emph{other} machine which is not censored.''
- This tactic has great advantages in terms of blocking-resistance---recall
- our assumption in Section~\ref{sec:adversary} that the attention
- a system attracts from the attacker is proportional to its number of
- users and level of publicity. If each proxy only has a few users, and
- there is no central list of proxies, most of them will never get noticed by
- the censors.
- On the other hand, there's a huge scalability question that so far has
- prevented these schemes from being widely useful: how does the fellow
- in China find a person in Ohio who will run a Circumventor for him? In
- some cases he may know and trust some people on the outside, but in many
- cases he's just out of luck. Just as hard, how does a new volunteer in
- Ohio find a person in China who needs it?
- % another key feature of a proxy run by your uncle is that you
- % self-censor, so you're unlikely to bring abuse complaints onto
- % your uncle. self-censoring clearly has a downside too, though.
- This challenge leads to a hybrid design---centrally-distributed
- personal proxies---which we will investigate in more detail in
- Section~\ref{sec:discovery}.
- \subsection{Open proxies}
- Yet another currently used approach to bypassing firewalls is to locate
- open and misconfigured proxies on the Internet. A quick Google search
- for ``open proxy list'' yields a wide variety of freely available lists
- of HTTP, HTTPS, and SOCKS proxies. Many small companies have sprung up
- providing more refined lists to paying customers.
- There are some downsides to using these open proxies though. First,
- the proxies are of widely varying quality in terms of bandwidth and
- stability, and many of them are entirely unreachable. Second, unlike
- networks of volunteers like Tor, the legality of routing traffic through
- these proxies is questionable: it's widely believed that most of them
- don't realize what they're offering, and probably wouldn't allow it if
- they realized. Third, in many cases the connection to the proxy is
- unencrypted, so firewalls that filter based on keywords in IP packets
- will not be hindered. And last, many users are suspicious that some
- open proxies are a little \emph{too} convenient: are they run by the
- adversary, in which case they get to monitor all the user's requests
- just as single-hop proxies can?
- A distributed-trust design like Tor resolves each of these issues for
- the relay component, but a constantly changing set of thousands of open
- relays is clearly a useful idea for a discovery component. For example,
- users might be able to make use of these proxies to bootstrap their
- first introduction into the Tor network.
- \subsection{JAP}
- K\"opsell's WPES paper~\cite{koepsell:wpes2004} is probably the closest
- related work, and is
- the starting point for the design in this paper.
- \subsection{steganography}
- infranet
- \subsection{break your sensitive strings into multiple tcp packets;
- ignore RSTs}
- \subsection{Internal caching networks}
- Freenet is deployed inside China and caches outside content.
- \subsection{Skype}
- Skype uses port-hopping and encryption, and voice communications are not
- as susceptible to keystroke loggers (even graphical ones).
- \subsection{Tor itself}
- And last, we include Tor itself in the list of current solutions
- to firewalls. Tens of thousands of people use Tor from countries that
- routinely filter their Internet. Tor's website has been blocked in most
- of them. But why hasn't the Tor network been blocked yet?
- We have several theories. The first is the most straightforward: tens of
- thousands of people are simply too few to matter. It may help that Tor is
- perceived to be for experts only, and thus not worth attention yet. The
- more subtle variant on this theory is that we've positioned Tor in the
- public eye as a tool for retaining civil liberties in more free countries,
- so perhaps blocking authorities don't view it as a threat. (We revisit
- this idea when we consider whether and how to publicize a Tor variant
- that improves blocking-resistance---see Section~\ref{subsec:publicity}
- for more discussion.)
- The broader explanation is that the maintenance of most government-level
- filters is aimed at stopping widespread information flow and appearing to be
- in control, not at the impossible goal of blocking all possible ways to bypass
- censorship. Censors realize that there will always
- be ways for a few people to get around the firewall, and as long as Tor
- has not publicly threatened their control, they see no urgent need to
- block it yet.
- We should recognize that we're \emph{already} in the arms race. These
- constraints can give us insight into the priorities and capabilities of
- our various attackers.
- \section{The relay component of our blocking-resistant design}
- \label{sec:bridges}
- Section~\ref{sec:current-tor} describes many reasons why Tor is
- well-suited as a building block in our context, but several changes will
- allow the design to resist blocking better. The most critical changes are
- to get more relay addresses, and to distribute them to users differently.
- %We need to address three problems:
- %- adapting the relay component of Tor so it resists blocking better.
- %- Discovery.
- %- Tor's network signature.
- %Here we describe the new pieces we need to add to the current Tor design.
- \subsection{Bridge relays}
- Today, Tor servers operate on fewer than a thousand distinct IP addresses;
- an adversary
- could enumerate and block them all with little trouble. To provide a
- means of ingress to the network, we need a larger set of entry points, most
- of which an adversary won't be able to enumerate easily. Fortunately, we
- have such a set: the Tor users.
- Hundreds of thousands of people around the world use Tor. We can leverage
- our already self-selected user base to produce a list of thousands of
- often-changing IP addresses. Specifically, we can give them a little
- button in the GUI that says ``Tor for Freedom'', and users who click
- the button will turn into \emph{bridge relays} (or just \emph{bridges}
- for short). They can rate-limit relayed connections to 10 KB/s (almost
- nothing for a broadband user in a free country, but plenty for a user
- who otherwise has no access at all), and since they are just relaying
- bytes back and forth between blocked users and the main Tor network, they
- won't need to make any external connections to Internet sites. Because
- of this separation of roles, and because we're making use of software
- that the volunteers have already installed for their own use, we expect
- our scheme to attract and maintain more volunteers than previous schemes.
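- One common way to enforce such a cap is a token bucket. The sketch below
- is an assumption about how a bridge might do it, not a description of
- Tor's own bandwidth limiting; the class name \texttt{TokenBucket} is
- hypothetical, and we read 10 KB/s as 10240 bytes per second.

```python
class TokenBucket:
    """Hypothetical byte-rate limiter; the caller supplies the clock."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float,
                 now: float = 0.0):
        self.rate = rate_bytes_per_s   # refill speed
        self.burst = burst_bytes       # maximum tokens the bucket can hold
        self.tokens = burst_bytes      # start with a full bucket
        self.last = now                # time of the last refill

    def allow(self, nbytes: int, now: float) -> bool:
        """Refill for the elapsed time, then spend tokens if enough remain."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


# Assumed cap: 10 KB/s with a burst of one second's worth of traffic.
bucket = TokenBucket(rate_bytes_per_s=10 * 1024, burst_bytes=10 * 1024)
first = bucket.allow(10 * 1024, now=0.0)   # spends the full initial burst
second = bucket.allow(1, now=0.0)          # nothing left until time passes
third = bucket.allow(5 * 1024, now=0.5)    # half a second refills 5 KB
```

- Passing the time in explicitly keeps the sketch deterministic; a real
- relay would read a monotonic clock instead.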
- As usual, there are new anonymity and security implications from running a
- bridge relay, particularly from letting people relay traffic through your
- Tor client; but we leave this discussion for Section~\ref{sec:security}.
- %...need to outline instructions for a Tor config that will publish
- %to an alternate directory authority, and for controller commands
- %that will do this cleanly.
- \subsection{The bridge directory authority}
- How do the bridge relays advertise their existence to the world? We
- introduce a second new component of the design: a specialized directory
- authority that aggregates and tracks bridges. Bridge relays periodically
- publish server descriptors (summaries of their keys, locations, etc.,
- signed by their long-term identity key), just like the relays in the
- ``main'' Tor network, but in this case they publish them only to the
- bridge directory authorities.
- The main difference between bridge authorities and the directory
- authorities for the main Tor network is that the main authorities provide
- a list of every known relay, but the bridge authorities only give
- out a server descriptor if you already know its identity key. That is,
- you can keep up-to-date on a bridge's location and other information
- once you know about it, but you can't just grab a list of all the bridges.
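- To make this lookup-only interface concrete, here is a minimal Python
- sketch. The class \texttt{BridgeAuthority}, its method names, and the use
- of a SHA-1 digest as the fingerprint are illustrative assumptions, not
- part of the Tor codebase; signature checking is omitted.

```python
import hashlib


class BridgeAuthority:
    """Toy descriptor store: lookups need the fingerprint; no listing call."""

    def __init__(self):
        self._descriptors = {}  # fingerprint (hex) -> latest descriptor

    @staticmethod
    def fingerprint(identity_key: bytes) -> str:
        # Assumption: fingerprint = SHA-1 hex digest of the identity key bytes.
        return hashlib.sha1(identity_key).hexdigest()

    def publish(self, identity_key: bytes, descriptor: dict) -> None:
        # A real authority would verify the descriptor's signature first.
        self._descriptors[self.fingerprint(identity_key)] = descriptor

    def lookup(self, fingerprint: str):
        # Returns the descriptor only for an exact fingerprint match.
        return self._descriptors.get(fingerprint)


authority = BridgeAuthority()
key = b"example bridge identity key"  # placeholder key material
authority.publish(key, {"address": "192.0.2.10", "orport": 443})
known = BridgeAuthority.fingerprint(key)
```

- A client that already holds \texttt{known} can refresh the descriptor
- with \texttt{authority.lookup(known)}, while a requestor without the
- fingerprint has no call that enumerates the stored bridges.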
- The identity key, IP address, and directory port for each bridge
- authority ship by default with the Tor software, so the bridge relays
- can be confident they're publishing to the right location, and the
- blocked users can establish an encrypted authenticated channel. See
- Section~\ref{subsec:trust-chain} for more discussion of the public key
- infrastructure and trust chain.
- Bridges use Tor to publish their descriptors privately and securely,
- so even an attacker monitoring the bridge directory authority's network
- can't make a list of all the addresses contacting the authority.
- Bridges may publish to only a subset of the
- authorities, to limit the potential impact of an authority compromise.
- %\subsection{A simple matter of engineering}
- %
- %Although we've described bridges and bridge authorities in simple terms
- %above, some design modifications and features are needed in the Tor
- %codebase to add them. We describe the four main changes here.
- %
- %Firstly, we need to get smarter about rate limiting:
- %Bandwidth classes
- %
- %Secondly, while users can in fact configure which directory authorities
- %they use, we need to add a new type of directory authority and teach
- %bridges to fetch directory information from the main authorities while
- %publishing server descriptors to the bridge authorities. We're most of
- %the way there, since we can already specify attributes for directory
- %authorities:
- %add a separate flag named ``blocking''.
- %
- %Thirdly, need to build paths using bridges as the first
- %hop. One more hole in the non-clique assumption.
- %
- %Lastly, since bridge authorities don't answer full network statuses,
- %we need to add a new way for users to learn the current status for a
- %single relay or a small set of relays---to answer such questions as
- %``is it running?'' or ``is it behaving correctly?'' We describe in
- %Section~\ref{subsec:enclave-dirs} a way for the bridge authority to
- %publish this information without resorting to signing each answer
- %individually.
- \subsection{Putting them together}
- \label{subsec:relay-together}
- If a blocked user knows the identity keys of a set of bridge relays, and
- he has correct address information for at least one of them, he can use
- that one to make a secure connection to the bridge authority and update
- his knowledge about the other bridge relays. He can also use it to make
- secure connections to the main Tor network and directory servers, so he
- can build circuits and connect to the rest of the Internet. All of these
- updates happen in the background: from the blocked user's perspective,
- he just accesses the Internet via his Tor client like always.
- So now we've reduced the problem from how to circumvent the firewall
- for all transactions (and how to know that the pages you get have not
- been modified by the local attacker) to how to learn about a working
- bridge relay.
- There's another catch though. We need to make sure that the network
- traffic we generate by simply connecting to a bridge relay doesn't stand
- out too much.
- %The following section describes ways to bootstrap knowledge of your first
- %bridge relay, and ways to maintain connectivity once you know a few
- %bridge relays.
- % (See Section~\ref{subsec:first-bridge} for a discussion
- %of exactly what information is sufficient to characterize a bridge relay.)
- \section{Hiding Tor's network signatures}
- \label{sec:network-signature}
- \label{subsec:enclave-dirs}
- Currently, Tor uses two protocols for its network communications. The
- main protocol uses TLS for encrypted and authenticated communication
- between Tor instances. The second protocol is standard HTTP, used for
- fetching directory information. All Tor servers listen on their ``ORPort''
- for TLS connections, and some of them opt to listen on their ``DirPort''
- as well, to serve directory information. Tor servers choose whatever port
- numbers they like; the server descriptor they publish to the directory
- tells users where to connect.
- One format for communicating address information about a bridge relay is
- its IP address and DirPort. From there, the user can ask the bridge's
- directory cache for an up-to-date copy of its server descriptor, and
- learn its current circuit keys, its ORPort, and so on.
- However, connecting directly to the directory cache involves a plaintext
- HTTP request. A censor could create a network signature for the request
- and/or its response, thus preventing these connections. To resolve this
- vulnerability, we've modified the Tor protocol so that users can connect
- to the directory cache via the main Tor port---they establish a TLS
- connection with the bridge as normal, and then send a special ``begindir''
- relay command to establish an internal connection to its directory cache.
- Therefore a better way to summarize a bridge's address is by its IP
- address and ORPort, so all communications between the client and the
- bridge will use ordinary TLS. But there are other details that need
- more investigation.
- What port should bridges pick for their ORPort? We currently recommend
- that they listen on port 443 (the default HTTPS port) if they want to
- be most useful, because clients behind standard firewalls will have
- the best chance to reach them. Is this the best choice in all cases,
- or should we encourage some fraction of them to pick random ports, or other
- ports commonly permitted through firewalls like 53 (DNS) or 110
- (POP)? Or perhaps we should use other ports where TLS traffic is
- expected, like 993 (IMAPS) or 995 (POP3S). We need more research on our
- potential users, and their current and anticipated firewall restrictions.
- Furthermore, we need to look at the specifics of Tor's TLS handshake.
- Right now Tor uses some predictable strings in its TLS handshakes. For
- example, it sets the X.509 organizationName field to ``Tor'', and it puts
- the Tor server's nickname in the certificate's commonName field. We
- should tweak the handshake protocol so it doesn't rely on any unusual details
- in the certificate, yet it remains secure; the certificate itself
- should be made to resemble an ordinary HTTPS certificate. We should also try
- to make our advertised cipher-suites closer to what an ordinary web server
- would support.
- Tor's TLS handshake uses two-certificate chains: one certificate
- contains the self-signed identity key for
- the router, and the second contains a current TLS key, signed by the
- identity key. We use these to authenticate that we're talking to the right
- router, and to limit the impact of TLS-key exposure. Most (though far from
- all) consumer-oriented HTTPS services provide only a single certificate.
- These extra certificates may help identify Tor's TLS handshake; instead,
- bridges should consider using only a single TLS key certificate signed by
- their identity key, and providing the full value of the identity key in an
- early handshake cell. More significantly, Tor currently has all clients
- present certificates, so that clients are harder to distinguish from servers.
- But in a blocking-resistance environment, clients should not present
- certificates at all.
- Last, what if the adversary starts observing the network traffic even
- more closely? Even if our TLS handshake looks innocent, our traffic timing
- and volume still look different from those of a user making a secure web connection
- to his bank. The same techniques used in the growing trend to build tools
- to recognize encrypted Bittorrent traffic~\cite{bt-traffic-shaping}
- could be used to identify Tor communication and recognize bridge
- relays. Rather than trying to look like encrypted web traffic, we may be
- better off trying to blend with some other encrypted network protocol. The
- first step is to compare typical network behavior for a Tor client to
- typical network behavior for various other protocols. This statistical
- cat-and-mouse game is made more complex by the fact that Tor transports a
- variety of protocols, and we'll want to automatically handle web browsing
- differently from, say, instant messaging.
- % Tor cells are 512 bytes each. So TLS records will be roughly
- % multiples of this size? How bad is this? -RD
- % Look at ``Inferring the Source of Encrypted HTTP Connections''
- % by Marc Liberatore and Brian Neil Levine (CCS 2006)
- % They substantially flesh out the numbers for the web fingerprinting
- % attack. -PS
- % Yes, but I meant detecting the signature of Tor traffic itself, not
- % learning what websites we're going to. I wouldn't be surprised to
- % learn that these are related problems, but it's not obvious to me. -RD
- \subsection{Identity keys as part of addressing information}
- We have described a way for the blocked user to bootstrap into the
- network once he knows the IP address and ORPort of a bridge. What about
- local spoofing attacks? That is, since we never learned an identity
- key fingerprint for the bridge, a local attacker could intercept our
- connection and pretend to be the bridge we had in mind. It turns out
- that giving false information isn't that bad---since the Tor client
- ships with trusted keys for the bridge directory authority and the Tor
- network directory authorities, the user can learn whether he's being
- given a real connection to the bridge authorities or not. (After all,
- if the adversary intercepts every connection the user makes and gives
- him a bad connection each time, there's nothing we can do.)
- What about anonymity-breaking attacks from observing traffic, if the
- blocked user doesn't start out knowing the identity key of his intended
- bridge? The vulnerabilities aren't so bad in this case either---the
- adversary could do similar attacks just by monitoring the network
- traffic.
- % cue paper by steven and george
- Once the Tor client has fetched the bridge's server descriptor, it should
- remember the identity key fingerprint for that bridge relay. Thus if
- the bridge relay moves to a new IP address, the client can query the
- bridge directory authority to look up a fresh server descriptor using
- this fingerprint.
- So we've shown that it's \emph{possible} to bootstrap into the network
- just by learning the IP address and ORPort of a bridge, but are there
- situations where it's more convenient or more secure to learn the bridge's
- identity fingerprint as well, or instead, while bootstrapping? We keep
- that question in mind as we next investigate bootstrapping and discovery.
- \section{Discovering and maintaining working bridge relays}
- \label{sec:discovery}
- Tor's modular design means that we can develop a better relay component
- independently of developing the discovery component. This modularity's
- great promise is that we can pick any discovery approach we like; but the
- unfortunate fact is that we have no magic bullet for discovery. We're
- in the same arms race as all the other designs we described in
- Section~\ref{sec:related}.
- In this section we describe three approaches to adding discovery
- components for our design. Note that we should deploy all the schemes
- at once---bridges and blocked users can then use the discovery approach
- that is most appropriate for their situation.
- \subsection{Independent bridges, no central discovery}
- The first design is simply to have no centralized discovery component at
- all. Volunteers run bridges, and we assume they have some blocked users
- in mind and communicate their address information to them out-of-band
- (for example, through Gmail). This design allows for small personal
- bridges that have only one or a handful of users in mind, but it can
- also support an entire community of users. For example, Citizen Lab's
- upcoming Psiphon single-hop proxy tool~\cite{psiphon} plans to use this
- \emph{social network} approach as its discovery component.
- There are several ways to do bootstrapping in this design. In the simple
- case, the operator of the bridge informs each chosen user about his
- bridge's address information and/or keys. A different approach involves
- blocked users introducing new blocked users to the bridges they know.
- That is, somebody in the blocked area can pass along a bridge's address to
- somebody else they trust. This scheme brings in appealing but complex game
- theoretic properties: the blocked user making the decision has an incentive
- only to delegate to trustworthy people, since an adversary who learns
- the bridge's address and filters it makes it unavailable for both of them.
- Note that a central set of bridge directory authorities can still be
- compatible with a decentralized discovery process. That is, how users
- first learn about bridges is entirely up to the bridges, but the process
- of fetching up-to-date descriptors for them can still proceed as described
- in Section~\ref{sec:bridges}. Of course, creating a central place that
- knows about all the bridges may not be smart, especially if every other
- piece of the system is decentralized. Further, if a user only knows
- about one bridge and he loses track of it, it may be quite a hassle to
- reach the bridge authority. We address these concerns next.
- \subsection{Families of bridges, no central discovery}
- Because the blocked users are running our software too, we have many
- opportunities to improve usability or robustness. Our second design builds
- on the first by encouraging volunteers to run several bridges at once
- (or coordinate with other bridge volunteers), such that some fraction
- of the bridges are likely to be available at any given time.
- The blocked user's Tor client would periodically fetch an updated set of
- recommended bridges from any of the working bridges. Now the client can
- learn new additions to the bridge pool, and can expire abandoned bridges
- or bridges that the adversary has blocked, without the user ever needing
- to care. To simplify maintenance of the community's bridge pool, each
- community could run its own bridge directory authority---reachable via
- the available bridges, and also mirrored at each bridge.
- \subsection{Public bridges with central discovery}
- What about people who want to volunteer as bridges but don't know any
- suitable blocked users? What about people who are blocked but don't
- know anybody on the outside? Here we describe a way to make use of these
- \emph{public bridges} in a way that still makes it hard for the attacker
- to learn all of them.
- The basic idea is to divide public bridges into a set of buckets based on
- identity key, where each bucket has a different policy for distributing
- its bridge addresses to users. Each of these \emph{distribution policies}
- is designed to exercise a different scarce resource or property of
- the user.
- How do we divide bridges into buckets such that they're evenly distributed
- and the allocation is hard to influence or predict, but also in a way
- that's amenable to creating more buckets later on without reshuffling
- all the bridges? We compute the bucket for a given bridge by hashing the
- bridge's identity key along with a secret that only the bridge authority
- knows: the first $n$ bits of this hash dictate the bucket number,
- where $n$ is a parameter that describes how many buckets we want at this
- point. We choose $n=3$ to start, so we have 8 buckets available; but as
- we later invent new distribution policies, we can increment $n$ to split
- the 8 into 16 buckets. Since a bridge can't predict the next bit in its
- hash, it can't anticipate which identity key will correspond to a certain
- bucket when the buckets are split. Further, since the bridge authority
- doesn't provide any feedback to the bridge about which bucket it's in,
- an adversary signing up bridges to fill a certain bucket will be slowed.
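- The bucket assignment above can be sketched in a few lines of Python.
- This is a sketch under stated assumptions: the text only says the identity
- key is hashed ``along with a secret'', so the choice of HMAC-SHA256, the
- function name \texttt{bucket\_for}, and the placeholder secret are ours.

```python
import hashlib
import hmac


def bucket_for(identity_key: bytes, secret: bytes, n_bits: int) -> int:
    """Return the bucket number: the first n_bits of HMAC-SHA256(secret, key)."""
    digest = hmac.new(secret, identity_key, hashlib.sha256).digest()
    as_int = int.from_bytes(digest, "big")
    # Keep only the leading n_bits of the 256-bit digest.
    return as_int >> (len(digest) * 8 - n_bits)


secret = b"known only to the bridge authority"  # placeholder value
key = b"example bridge identity key"            # placeholder key material
before = bucket_for(key, secret, 3)  # one of 8 buckets
after = bucket_for(key, secret, 4)   # one of 16 buckets after a split
```

- Because the bucket number is a prefix of the hash, incrementing $n$ sends
- each bridge to one of the two halves of its old bucket
- (\texttt{after >> 1 == before}), so no bridge is reshuffled across buckets.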
- % This algorithm is not ideal. When we split buckets, each existing
- % bucket is cut in half, where half the bridges remain with the
- % old distribution policy, and half will be under what the new one
- % is. So the new distribution policy inherits a bunch of blocked
- % bridges if the old policy was too loose, or a bunch of unblocked
- % bridges if its policy was still secure. -RD
- %
- %
- % Having talked to Roger on the phone, I realized that the following
- % paragraph was based on completely misunderstanding ``bucket'' as
- % used here. But as per his request, I'm leaving it in in case it
- % guides rewording so that equally careless readers are less likely
- % to go astray. -PFS
- %
- % I don't understand this adversary. Why do we care if an adversary
- % fills a particular bucket if bridge requests are returned from
- % random buckets? Put another way, bridge requests _should_ be returned
- % from unpredictable buckets because we want to be resilient against
- % whatever optimal distribution of adversary bridges an adversary manages
- % to arrange. (Cf. casc-rep) I think it should be more chordlike.
- % Bridges are allocated to wherever on the ring which is divided
- % into arcs (buckets).
- % If a bucket gets too full, you can just split it.
- % More on this below. -PFS
- The first distribution policy (used for the first bucket) publishes bridge
- addresses in a time-release fashion. The bridge authority divides the
- available bridges into partitions which are deterministically available
- only in certain time windows. That is, over the course of a given time
- slot (say, an hour), each requestor is given a random bridge from within
- that partition. When the next time slot arrives, a new set of bridges
- are available for discovery. Thus a bridge is always available when a new
- user arrives, but to learn about all bridges the attacker needs to fetch
- the new addresses at every new time slot. By varying the length of the
- time slots, we can make it harder for the attacker to guess when to check
- back. We expect these bridges will be the first to be blocked, but they'll
- help the system bootstrap until they \emph{do} get blocked. Further,
- remember that we're dealing with different blocking regimes around the
- world that will progress at different rates---so this bucket will still
- be useful to some users even as the arms race progresses.
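- A minimal Python sketch of the time-release policy follows. The
- round-robin mapping from time slots to partitions, the assignment of
- bridge $i$ to partition $i \bmod k$, and all function names are
- assumptions made for illustration; varying the slot length is not modeled.

```python
import random


def partition_for_slot(now: float, slot_seconds: int, n_partitions: int) -> int:
    """Map a Unix timestamp to the partition released during that time slot."""
    return int(now // slot_seconds) % n_partitions


def bridge_for_request(bridges, now, slot_seconds=3600, n_partitions=4, rng=random):
    """Pick a random bridge from the partition that is live at time `now`."""
    live = partition_for_slot(now, slot_seconds, n_partitions)
    # Assumption: bridge i belongs to partition i % n_partitions.
    candidates = [b for i, b in enumerate(bridges) if i % n_partitions == live]
    return rng.choice(candidates) if candidates else None


bridges = [f"bridge-{i}" for i in range(8)]  # placeholder bridge names
```

- Within one slot every requestor draws from the same partition, so an
- attacker who wants the full list has to come back in each slot.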
- The second distribution policy publishes bridge addresses based on the IP
- address of the requesting user. Specifically, the bridge authority will
- divide the available bridges in the bucket into a bunch of partitions
- (as in the first distribution scheme), hash the requestor's IP address
- with a secret of its own (as in the above allocation scheme for creating
- buckets), and give the requestor a random bridge from the appropriate
- partition. To raise the bar, we should discard the last octet of the
- IP address before inputting it to the hash function, so an attacker
- who controls only a single ``/24'' network counts as only one user. A large
- attacker like China will still be able to control many addresses, but
- the hassle of needing to establish connections from each network (or
- spoof TCP connections) may still slow them down. (We could also imagine
- a policy that combines the time-based and location-based policies to
- further constrain and rate-limit the available bridge addresses.)
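- The address-based policy can be sketched as follows in Python. We assume
- IPv4 requestors, HMAC-SHA256 as the keyed hash, and a placeholder secret;
- the function name \texttt{partition\_for\_ip} is hypothetical.

```python
import hashlib
import hmac
import ipaddress


def partition_for_ip(ip: str, secret: bytes, n_partitions: int) -> int:
    """Hash the requestor's /24 network with a secret to choose a partition."""
    # strict=False masks off the host bits, i.e. discards the last octet.
    network = ipaddress.ip_network(f"{ip}/24", strict=False)
    digest = hmac.new(secret, network.network_address.packed, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % n_partitions


secret = b"second authority-only secret"  # placeholder value
```

- Two requestors in the same ``/24'' network, such as 203.0.113.5 and
- 203.0.113.250, always land in the same partition, so extra addresses
- inside one network do not reveal extra partitions.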
- The third policy is based on Circumventor's discovery strategy. Realizing
- that its adoption will remain limited without some central coordination
- mechanism, the Circumventor project has started a mailing list to
- distribute new proxy addresses every few days. From experimentation it
- seems they have concluded that sending updates every three or four days
- is sufficient to stay ahead of the current attackers. We could give out
- bridge addresses from the third bucket in a similar fashion.
- The fourth policy provides an alternative approach to a mailing list:
- users provide an email address, and receive an automated response
- listing an available bridge address. We could limit one response per
- email address. To further rate limit queries, we could require a CAPTCHA
- solution~\cite{captcha} in each case too. In fact, we wouldn't need to
- implement the CAPTCHA on our side: if we only deliver bridge addresses
- to Yahoo or Gmail addresses, we can leverage the rate-limiting schemes
- that other parties already impose for account creation.
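- As a rough illustration of the email-based policy, the following Python
- sketch answers each accepted address with a single bridge. The class
- \texttt{EmailDistributor}, the example domain list, and the choice to
- repeat the same bridge on a second request (rather than stay silent) are
- our assumptions; CAPTCHA handling and mail delivery are left out.

```python
ALLOWED_DOMAINS = {"gmail.com", "yahoo.com"}  # assumption: example domain list


class EmailDistributor:
    """Hand out at most one bridge address per accepted email address."""

    def __init__(self, bridges):
        self._bridges = list(bridges)
        self._answered = {}  # normalized email -> bridge already sent

    def request(self, email: str):
        address = email.strip().lower()
        domain = address.rpartition("@")[2]
        if domain not in ALLOWED_DOMAINS:
            return None  # rely on the provider's account rate limits
        if address not in self._answered:
            if not self._bridges:
                return None  # nothing left to hand out
            self._answered[address] = self._bridges.pop(0)
        # A repeat request gets the same bridge, never a new one.
        return self._answered[address]


distributor = EmailDistributor(["192.0.2.1:443", "192.0.2.2:443"])
```

- Under these assumptions, learning $k$ bridges costs the attacker $k$
- accounts at a provider that already rate-limits account creation.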
- The fifth policy ties in
- ...
- reputation system
- Pick some seeds---trusted people in the blocked area---and give
- them each a few hundred bridge addresses. Run a website next to the
- bridge authority, where they can log in (they only need persistent
- pseudonyms). Give them tokens slowly over time. They can use these
- tokens to delegate trust to other people they know. The tokens can
- be exchanged for new accounts on the website.
- Accounts in ``good standing'' accrue new bridge addresses and new
- tokens.
- This is great, except how do we decide that an account is in good
- standing? One answer is to measure based on whether the bridge addresses
- we give it end up blocked. But how do we decide if they get blocked?
- We take up this and other questions in Section~\ref{sec:accounts}.
- Buckets six through eight are held in reserve, in case our currently
- deployed tricks all fail at once---so we can adapt and move to
- new approaches quickly, and have some bridges available for the new
- schemes. (Bridges that sign up and don't get used yet may be unhappy that
- they're not being used; but this is a transient problem: if bridges are
- on by default, nobody will mind not being used yet.)
- \subsubsection{Public Bridges with Coordinated Discovery}
- ****Pretty much this whole subsubsection will probably need to be
- deferred until ``later'' and moved to after end document, but I'm leaving
- it here for now in case useful.******
- Rather than be entirely centralized, we can have a coordinated
- collection of bridge authorities, analogous to how Tor network
- directory authorities now work.
- Key components
- ``Authorities'' will distribute caches of what they know to overlapping
- collections of nodes so that no one node is owned by one authority.
- Also so that it is impossible to DoS info maintained by one authority
- simply by making requests to it.
- Where a bridge gets assigned is not predictable by the bridge?
- If authorities don't know the IP addresses of the bridges they
- are responsible for, they can't abuse that info (or be attacked for
- having it). But, they also can't, e.g., control being sent massive
- lists of nodes that were never good. This raises another question.
- We generally decry use of IP address for location, etc. but we
- need to do that to limit the introduction of functional but useless
- IP addresses because, e.g., they are in China and the adversary
- owns massive chunks of the IP space there.
- We don't want an arbitrary someone to be able to contact the
- authorities and say an IP address is bad because it would be easy
- for an adversary to take down all the suspicious bridges
- even if they provide good cover websites, etc. Only the bridge
- itself and/or the directory authority can declare a bridge blocked
- from somewhere.
- 9. Bridge directories must not simply be a handful of nodes that
- provide the list of bridges. They must flood or otherwise distribute
- information out to other Tor nodes as mirrors. That way it becomes
- difficult for censors to flood the bridge directory servers with
- requests, effectively denying access for others. But, there's lots of
- churn and a much larger size than Tor directories. We are forced to
- handle the directory scaling problem here much sooner than for the
- network in general. Authorities can pass their bridge directories
- (and policy info) to some moderate number of unidentified Tor nodes.
- Anyone contacting one of those nodes can get bridge info. The nodes
- must remain somewhat synchronized to prevent the adversary from abusing,
- e.g., a timed-release policy; otherwise the distribution to those nodes
- must be resilient even if they are not coordinating.
- I think some kind of DHT like scheme would work here. A Tor node is
- assigned a chunk of the directory. Lookups in the directory should be
- via hashes of keys (fingerprints) and that should determine the Tor
- nodes responsible. Ordinary directories can publish lists of Tor nodes
- responsible for fingerprint ranges. Clients looking to update info on
- some bridge will make a Tor connection to one of the nodes responsible
- for that address. Instead of shutting down a circuit after getting
- info on one address, extend it to another that is responsible for that
- address (the node from which you are extending knows you are doing so
- anyway). Keep going. This way you can amortize the Tor connection.
- 10. We need some way to give new identity keys out to those who need
- them without letting those get immediately blocked by authorities. One
- way is to give a fingerprint that gets you more fingerprints, as
- already described. These are meted out/updated periodically but allow
- us to keep track of which sources are compromised: if a distribution
- fingerprint repeatedly leads to quickly blocked bridges, it should be
- suspect, dropped, etc. Since we're using hashes, there shouldn't be a
- correlation with bridge directory mirrors, bridges, portions of the
- network observed, etc. It should just be that the authorities know
- about that key that leads to new addresses.
- This last point is very much like the issues in the valet nodes paper,
- which is essentially about blocking resistance with respect to exiting the
- Tor network, while this paper is concerned with blocking of entry to the
- Tor network.
- In fact the tickets used to connect to the IPo (Introduction Point),
- could serve as an example, except that instead of authorizing
- a connection to the Hidden Service, it's authorizing the downloading
- of more fingerprints.
- Also, the fingerprints can follow the hash(q + '1' + cookie) scheme of
- that paper (where q = hash(PK + salt) gave the q.onion address). This
- allows us to control and track which fingerprint was causing problems.
- Note that, unlike many settings, the reputation problem should not be
- hard here. If a bridge says it is blocked, then it might as well be.
- If an adversary can say that the bridge is blocked wrt
- $\mathit{censor}_i$, then it might as well be, since
- $\mathit{censor}_i$ can presumably then block that bridge if it so
- chooses.
- 11. How much damage can the adversary do by running nodes in the Tor
- network and watching for bridge nodes connecting to it? (This is
- analogous to an Introduction Point watching for Valet Nodes connecting
- to it.) What percentage of the network do you need to own to do how
- much damage? Here the entry-guard design helps. So we
- need to have bridges use entry-guards, but (cf. 3 above) not use
- bridges as entry-guards. Here's a serious tradeoff (again akin to the
- ratio of valets to IPos): the more bridges per client, the worse the
- anonymity of that client; the fewer bridges per client, the worse the
- blocking resistance of that client.
- \subsubsection{Bootstrapping: finding your first bridge.}
- \label{subsec:first-bridge}
- How do users find their first public bridge, so they can reach the
- bridge authority to learn more?
- Most government firewalls are not perfect. That is, they allow connections to
- Google cache or some open proxy servers, or they let file-sharing traffic or
- Skype or World-of-Warcraft connections through. We assume that the
- users have some mechanism for bypassing the firewall initially.
- For users who can't use any of these techniques, hopefully they know
- a friend who can---for example, perhaps the friend already knows some
- bridge relay addresses.
- (If they can't get around it at all, then we can't help them---they
- should go meet more people.)
- Is it useful to load balance which bridges are handed out? The above
- bucket concept makes some bridges wildly popular and others less so.
- But I guess that's the point.
- Families of bridges: give out 4 or 8 at once, bound together.
- \subsection{Advantages of deploying all solutions at once}
- For once we're not in the position of the defender: we don't have to
- defend against every possible filtering scheme, we just have to defend
- against at least one.
- \subsection{Remaining unsorted notes}
- In the first subsection we describe how to find a first bridge.
- Thus they can reach the BDA. From here we either assume a social
- network or other mechanism for learning IP:dirport or key fingerprints
- as above, or we assume an account server that allows us to limit the
- number of new bridge relays an external attacker can discover.
- Going to be an arms race. Need a bag of tricks. Hard to say
- which ones will work. Don't spend them all at once.
- Some techniques are sufficient to get us an IP address and a port,
- and others can get us IP:port:key. Lay out some plausible options
- for how users can bootstrap into learning their first bridge.
- attack: adversary can reconstruct your social network by learning who
- knows which bridges.
- %\section{The account / reputation system}
- \section{Social networks with directory-side support}
- \label{sec:accounts}
- Perhaps each bridge should be known by a single bridge directory
- authority. This makes it easier to trace which users have learned about
- it, so easier to blame or reward. It also makes things more brittle,
- since loss of that authority means its bridges aren't advertised until
- they switch, and means its bridge users are sad too.
- (Need a slick hash algorithm that will map our identity key to a
- bridge authority, in a way that's sticky even when we add bridge
- directory authorities, but isn't sticky when our authority goes
- away. Does this exist?)
- \subsection{Discovery based on social networks}
- A token that can be exchanged at the bridge authority (assuming you
- can reach it) for a new bridge address.
- The account server runs as a Tor controller for the bridge authority.
- Users can establish reputations, perhaps based on social network
- connectivity, perhaps based on not getting their bridge relays blocked.
- Probably the most critical lesson learned in past work on reputation
- systems in privacy-oriented environments~\cite{rep-anon} is the need for
- verifiable transactions. That is, the entity computing and advertising
- reputations for participants needs to actually learn in a convincing
- way that a given transaction was successful or unsuccessful.
- (Lesson from designing reputation systems~\cite{rep-anon}: easy to
- reward good behavior, hard to punish bad behavior.)
- \subsection{How do we know if a bridge relay has been blocked?}
- We need some mechanism for testing reachability from inside the
- blocked area.
- The easiest answer is for certain users inside the area to sign up as
- testing relays, and then we can route through them and see if it works.
- First problem is that different network areas block different net masks,
- and it will likely be hard to know which users are in which areas. So
- if a bridge relay isn't reachable, is that because of a network block
- somewhere, because of a problem at the bridge relay, or just a temporary
- outage?
- Second problem is that if we pick random users to test random relays, the
- adversary can sign up users on the inside and enumerate the relays
- we test. But it seems dangerous to just let people come forward and
- declare that things are blocked for them, since they could be tricking
- us. (This matters even more so if our reputation system above relies on
- whether things get blocked to punish or reward.)
- Another answer is not to measure directly, but rather let the bridges
- report whether they're being used. If they periodically report to their
- bridge directory authority how much use they're seeing, the authority
- can make smart decisions from there.
- If they install a geoip database, they can periodically report to their
- bridge directory authority which countries they're seeing use from. This
- might help us to track which countries are making use of Ramp, and can
- also let us learn about new steps the adversary has taken in the arms
- race. (If the bridges don't want to install a whole geoip subsystem, they
- can report samples of the /24 network for their users, and the authorities
- can do the geoip work. This tradeoff has clear downsides though.)
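The /24 fallback mentioned above could look roughly like this on the bridge side. This is only a sketch under assumptions: it handles IPv4 only, and `to_slash24` and `summarize` are hypothetical names rather than anything in Tor.

```python
import ipaddress
from collections import Counter


def to_slash24(addr: str) -> str:
    """Truncate an IPv4 address to its enclosing /24 network."""
    # strict=False lets us pass a host address and get its network back.
    return str(ipaddress.ip_network(f"{addr}/24", strict=False))


def summarize(addrs: list[str]) -> Counter:
    """Count how many observed user addresses fall in each /24."""
    return Counter(to_slash24(a) for a in addrs)
```

The authority would then map each reported /24 to a country itself, at the cost of learning more about bridge users than a per-country count would reveal.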
- Worry: adversary signs up a bunch of already-blocked bridges. If we're
- stingy giving out bridges, users in that country won't get useful ones.
- (Worse, we'll blame the users when the bridges report they're not
- being used?)
- Worry: the adversary could choose not to block bridges but just record
- connections to them. So be it, I guess.
- \subsection{How to learn how well the whole idea is working}
- We need some feedback mechanism to learn how much use the bridge network
- as a whole is actually seeing. Part of the reason for this is so we can
- respond and adapt the design; part is because the funders expect to see
- progress reports.
- The above geoip-based approach to detecting blocked bridges gives us a
- solution though.
- \section{Security considerations}
- \label{sec:security}
- \subsection{Possession of Tor in oppressed areas}
- Many people speculate that installing and using a Tor client in areas with
- particularly extreme firewalls is a high risk---and the risk increases
- as the firewall gets more restrictive. This is probably true, but there's
- a counter pressure as well: as the firewall gets more restrictive, more
- ordinary people use Tor for more mainstream activities, such as learning
- about Wall Street prices or looking at pictures of women's ankles. So
- if the restrictive firewall pushes up the number of Tor users, then the
- ``typical'' Tor user becomes more mainstream.
- Hard to say which of these pressures will ultimately win out.
- ...
- % Nick can rewrite/elaborate on this section?
- \subsection{Observers can tell who is publishing and who is reading}
- \label{subsec:upload-padding}
- Should bridge users sometimes send bursts of long-range drop cells?
- \subsection{Anonymity effects from acting as a bridge relay}
- Against some attacks, relaying traffic for others can improve anonymity. The
- simplest example is an attacker who owns a small number of Tor servers. He
- will see a connection from the bridge, but he won't be able to know
- whether the connection originated there or was relayed from somebody else.
- There are some cases where it doesn't seem to help: if an attacker can
- watch all of the bridge's incoming and outgoing traffic, then it's easy
- to learn which connections were relayed and which started there. (In this
- case he still doesn't know the final destinations unless he is watching
- them too, but the bridge is no better off than if it were
- an ordinary client.)
- There are also some potential downsides to running a bridge. First, while
- we try to make it hard to enumerate all bridges, it's still possible to
- learn about some of them, and for some people just the fact that they're
- running one might signal to an attacker that they place a high value
- on their anonymity. Second, there are some more esoteric attacks on Tor
- relays that are not as well-understood or well-tested---for example, an
- attacker may be able to ``observe'' whether the bridge is sending traffic
- even if he can't actually watch its network, by relaying traffic through
- it and noticing changes in traffic timing~\cite{attack-tor-oak05}. On
- the other hand, it may be that limiting the bandwidth the bridge is
- willing to relay will allow this sort of attacker to determine if it's
- being used as a bridge but not whether it is adding traffic of its own.
- It is an open research question whether the benefits outweigh the risks. A
- lot of the decision rests on which attacks the users are most worried
- about. For most users, we don't think running a bridge relay will be
- that damaging.
- Need to examine how entry guards fit in. If the blocked user doesn't use
- the bridge's entry guards, then the bridge doesn't gain as much cover
- benefit. If he does, first how does that actually work, and second is
- it turtles all the way down (need to use the guard's guards, ...)?
- \subsection{Trusting local hardware: Internet cafes and LiveCDs}
- \label{subsec:cafes-and-livecds}
- Assuming that users have their own trusted hardware is not
- always reasonable.
- For Internet cafe Windows computers that let you attach your own USB key,
- a USB-based Tor image would be smart. There's Torpark, and hopefully
- there will be more thoroughly analyzed options down the road. Users
- still need to worry about hardware or software keyloggers and other
- spyware, as well as physical surveillance.
- If the system lets you boot from a CD or from a USB key, you can gain
- a bit more security by bringing a privacy LiveCD with you. Hardware
- keyloggers and physical surveillance are still a worry. LiveCDs are also
- useful if it's your own hardware, since it's easier to avoid leaving
- breadcrumbs everywhere.
- \subsection{Forward compatibility and retiring bridge authorities}
- Eventually we'll want to change the identity key and/or location
- of a bridge authority. How do we do this mostly cleanly?
- \subsection{The trust chain}
- \label{subsec:trust-chain}
- Tor's ``public key infrastructure'' provides a chain of trust to
- let users verify that they're actually talking to the right servers.
- There are four pieces to this trust chain.
- First, when Tor clients are establishing circuits, at each step
- they demand that the next Tor server in the path prove knowledge of
- its private key~\cite{tor-design}. This step prevents the first node
- in the path from just spoofing the rest of the path. Second, the
- Tor directory authorities provide a signed list of servers along with
- their public keys---so unless the adversary can control a threshold
- of directory authorities, he can't trick the Tor client into using other
- Tor servers. Third, the location and keys of the directory authorities,
- in turn, are hard-coded in the Tor source code---so as long as the user
- got a genuine version of Tor, he can know that he is using the genuine
- Tor network. And last, the source code and other packages are signed
- with the GPG keys of the Tor developers, so users can confirm that they
- did in fact download a genuine version of Tor.
- But how can a user in an oppressed country know that he has the correct
- key fingerprints for the developers? As with other security systems, it
- ultimately comes down to human interaction. The keys are signed by dozens
- of people around the world, and we have to hope that our users have met
- enough people in the PGP web of trust~\cite{pgp-wot} that they can learn
- the correct keys. For users that aren't connected to the global security
- community, though, this question remains a critical weakness.
- % XXX make clearer the trust chain step for bridge directory authorities
- \subsection{Security through obscurity: publishing our design}
- Many other schemes like dynaweb use the typical arms race strategy of
- not publishing their plans. Our goal here is to produce a design---a
- framework---that can be public and still secure. Where's the tradeoff?
- \section{Performance improvements}
- \label{sec:performance}
- \subsection{Fetch server descriptors just-in-time}
- I guess we should encourage most places to do this, so blocked
- users don't stand out.
- network-status and directory optimizations. caching better. partitioning
- issues?
- \section{Maintaining reachability}
- \subsection{How many bridge relays should you know about?}
- If bridge relays are run by ordinary Tor users on cable modem or DSL, many
- of them will disappear and/or move periodically. How many bridge relays
- should a blockee know about before he's likely to have at least one
- reachable at any given point? How do we factor in a parameter for ``speed
- that his bridges get discovered and blocked''?
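As a rough first cut at the question above, suppose each known bridge is independently reachable with probability p at any given moment (an assumption; churn and blocking are likely correlated in practice). Then the chance that at least one of n bridges is reachable is 1 - (1 - p)^n. A sketch, with `bridges_needed` as a hypothetical helper:

```python
import math


def bridges_needed(p_reachable: float, target: float) -> int:
    """Smallest n with 1 - (1 - p_reachable)**n >= target, assuming each
    bridge is reachable independently with probability p_reachable."""
    if not (0.0 < p_reachable < 1.0 and 0.0 < target < 1.0):
        raise ValueError("probabilities must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_reachable))
```

The rate at which bridges get discovered and blocked could be folded in by lowering p_reachable over time, though we have no measured values for either parameter.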
- The related question is: if the bridge relays change IP addresses
- periodically, how often does the bridge user need to ``check in'' in order
- to keep from being cut out of the loop?
- \subsection{Cablemodem users don't provide important websites}
- \label{subsec:block-cable}
- ...so our adversary could just block all DSL and cablemodem networks,
- and for the most part only our bridge relays would be affected.
- The first answer is to aim to get volunteers both from traditionally
- ``consumer'' networks and also from traditionally ``producer'' networks.
- The second answer (not so good) would be to encourage more use of consumer
- networks for popular and useful websites. (But P2P exists; minor websites
- exist; gaming exists; IM exists; ...)
- Other attack: China pressures Verizon to discourage its users from
- running bridges.
- \subsection{Scanning-resistance}
- If it's trivial to verify that we're a bridge, and we run on a predictable
- port, then it's conceivable our attacker would scan the whole Internet
- looking for bridges. (In fact, he can just scan likely networks like
- cablemodem and DSL services---see Section~\ref{subsec:block-cable} for a related
- attack.) It would be nice to slow down this attack. It would
- be even nicer to make it hard to learn whether we're a bridge without
- first knowing some secret.
- Password protecting the bridges.
- Could provide a password to the bridge user. He provides a nonced hash of
- it or something when he connects. We'd need to give him an ID key for the
- bridge too, and wait to present the password until we've TLSed, else the
- adversary can pretend to be the bridge and MITM him to learn the password.
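The ``nonced hash of it or something'' could be an HMAC challenge-response run inside the already-established TLS session. This is a sketch under assumptions, not a specified protocol: the bridge sends a fresh nonce, the user answers with an HMAC of it keyed by the password, and all function names here are hypothetical.

```python
import hashlib
import hmac
import secrets


def make_nonce() -> bytes:
    """Bridge side: fresh random challenge for each connection."""
    return secrets.token_bytes(16)


def client_response(password: str, nonce: bytes) -> bytes:
    """User side: prove knowledge of the password without sending it."""
    return hmac.new(password.encode(), nonce, hashlib.sha256).digest()


def bridge_verify(password: str, nonce: bytes, response: bytes) -> bool:
    """Bridge side: recompute the HMAC and compare in constant time."""
    expected = hmac.new(password.encode(), nonce, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)
```

Because the nonce is fresh per connection, a recorded response is not useful for a later connection; the bridge's ID key is still needed so the user knows he is answering the real bridge.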
- We could use some kind of ID-based knocking protocol, or we could act like an
- unconfigured HTTPS server if treated like one.
- We can assume that the attacker can easily recognize https connections
- to unknown servers. It can then attempt to connect to them and block
- connections to servers that seem suspicious. It may be that password
- protected web sites will not be suspicious in general, in which case
- that may be the easiest way to give controlled access to the bridge.
- If such sites that have no other overt features are automatically
- blocked when detected, then we may need to be more subtle.
- Possibilities include serving an innocuous web page if a TLS-encrypted
- request is received without the authorization needed to access the Tor
- network, and only responding to a request for access to the Tor network
- if proper authentication is given. If an unauthenticated request to
- access the Tor network is sent, the bridge should respond as if
- it has received a message it does not understand (as would be the
- case were it not a bridge).
- \subsection{How to motivate people to run bridge relays}
- One of the traditional ways to get people to run software that benefits
- others is to give them motivation to install it themselves. An often
- suggested approach is to install it as a stunning screensaver so everybody
- will be pleased to run it. We take a similar approach here, by leveraging
- the fact that these users are already interested in protecting their
- own Internet traffic, so they will install and run the software.
- Make all Tor users become bridges if they're reachable---needs more work
- on usability first, but we're making progress.
- Also, we can make a snazzy network graph with Vidalia that emphasizes
- the connections the bridge user is currently relaying. (Minor anonymity
- implications, but hey.) (In many cases there won't be much activity,
- so this may backfire. Or it may be better suited to full-fledged Tor
- servers.)
- \subsection{What if the clients can't install software?}
- [this section should probably move to the related work section,
- or just disappear entirely.]
- Bridge users without Tor software
- Bridge relays could always open their socks proxy. This is bad though,
- first
- because bridges learn the bridge users' destinations, and second because
- we've learned that open socks proxies tend to attract abusive users who
- have no idea they're using Tor.
- Bridges could require passwords in the socks handshake (not supported
- by most software including Firefox). Or they could run web proxies
- that require authentication and then pass the requests into Tor. This
- approach is probably a good way to help bootstrap the Psiphon network,
- if one of its barriers to deployment is a lack of volunteers willing
- to exit directly to websites. But it clearly drops some of the nice
- anonymity and security features Tor provides.
- A hybrid approach is for the user to get his anonymity from Tor but his
- software-less use from a web proxy running on a trusted machine on the
- free side.
- \subsection{Publicity attracts attention}
- \label{subsec:publicity}
- Many people working in this field want to publicize the existence
- and extent of censorship concurrently with the deployment of their
- circumvention software. The easy reason for this two-pronged push is
- to attract volunteers for running proxies in their systems; but in many
- cases their main goal is not to build the software, but rather to educate
- the world about the censorship. The media also tries to do its part by
- broadcasting the existence of each new circumvention system.
- But at the same time, this publicity attracts the attention of the
- censors. We can slow down the arms race by not attracting as much
- attention, and just spreading by word of mouth. If our goal is to
- establish a solid social network of bridges and bridge users before
- the adversary gets involved, does this attention tradeoff work to our
- advantage?
- \subsection{The Tor website: how to get the software}
- \section{Future designs}
- \subsection{Bridges inside the blocked network too}
- Assuming actually crossing the firewall is the risky part of the
- operation, can we have some bridge relays inside the blocked area too,
- and more established users can use them as relays so they don't need to
- communicate over the firewall directly at all? A simple example here is
- to make new blocked users into internal bridges also---so they sign up
- on the BDA as part of doing their query, and we give out their addresses
- rather than (or along with) the external bridge addresses. This design
- is a lot trickier because it brings in the complexity of whether the
- internal bridges will remain available, can maintain reachability with
- the outside world, etc.
- Hidden services as bridges. Hidden services as bridge directory authorities.
- \section{Conclusion}
- A technical solution won't solve the whole problem. After all, China's
- firewall is \emph{socially} very successful, even if technologies exist to
- get around it.
- But having a strong technical solution is still useful as a piece of the
- puzzle.
- \bibliographystyle{plain} \bibliography{tor-design}
- \appendix
- \section{Counting Tor users by country}
- \label{app:geoip}
- \end{document}
- ship geoip db to bridges. they look up users who tls to them in the db,
- and upload a signed list of countries and number-of-users each day. the
- bridge authority aggregates them and publishes stats.
- bridge relays have buddies
- they ask a user to test the reachability of their buddy.
- leaks O(1) bridges, but not O(n).
- we should not be blockable by ordinary cisco censorship features.
- that is, if they want to block our new design, they will need to
- add a feature to block exactly this.
- strategically speaking, this may come in handy.
- Bridges come in clumps of 4 or 8 or whatever. If you know one bridge
- in a clump, the authority will tell you the rest. Now bridges can
- ask users to test reachability of their buddies.
- Giving out clumps helps with dynamic IP addresses too. Whether it
- should be 4 or 8 depends on our churn.
- the account server. let's call it a database, it doesn't have to
- be a thing that a human interacts with.
- rate limiting mechanisms:
- energy spent. captchas. relaying traffic for others?
- send us \$10, we'll give you an account
- so how do we reward people for being good?
|