123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157 |
- Filename: 127-dirport-mirrors-downloads.txt
- Title: Relaying dirport requests to Tor download site / website
- Version: $Revision$
- Last-Modified: $Date$
- Author: Roger Dingledine
- Created: 2007-12-02
- Status: Draft
- 1. Overview
- Some countries and networks block connections to the Tor website. As
- time goes by, this will remain a problem and it may even become worse.
- We have a big pile of mirrors (google for "Tor mirrors"), but few of
- our users think to try a search like that. Also, many of these mirrors
- might be automatically blocked since their pages contain words that
- might cause them to get banned. And lastly, we can imagine a future
- where the blockers are aware of the mirror list too.
- Here we describe a new set of URLs for Tor's DirPort that will relay
- connections from users to the official Tor download site. Rather than
- trying to cache a bunch of new Tor packages (which is a hassle in terms
- of keeping them up to date, and a hassle in terms of drive space used),
- we instead just proxy the requests directly to Tor's /dist page.
- Specifically, we should support
- GET /tor/dist/$1
- and
- GET /tor/website/$1
- 2. Direct connections, one-hop circuits, or three-hop circuits?
- We could relay the connections directly to the download site -- but
- this produces recognizable outgoing traffic on the bridge or cache's
- network, which will probably surprise our nice volunteers. (Is this
- a good enough reason to discard the direct connection idea?)
- Even if we don't do direct connections, should we do a one-hop
- begindir-style connection to the mirror site (make a one-hop circuit
- to it, then send a 'begindir' cell down the circuit), or should we do
- a normal three-hop anonymized connection?
- If these mirrors are mainly bridges, doing either a direct or a one-hop
- connection creates another way to enumerate bridges. That would argue
- for three-hop. On the other hand, downloading a 10+ megabyte installer
- through a normal Tor circuit can't be fun. But if you're already getting
- throttled a lot because you're in the "relayed traffic" bucket, you're
- going to have to accept a slow transfer anyway. So three-hop it is.
- Speaking of which, we would want to label this connection
- as "relay" traffic for the purposes of rate limiting; see
- connection_counts_as_relayed_traffic() and or_conn->client_used. This
- will be a bit tricky though, because these connections will use the
- bridge's guards.
- 3. Scanning resistance
- One other goal we'd like to achieve, or at least not hinder, is making
- it hard to scan large swaths of the Internet to look for responses
- that indicate a bridge.
- In general this is a really hard problem, so we shouldn't demand to
- solve it here. But we can note that some bridges should open their
- DirPort (and offer this functionality), and others shouldn't. Then
- some bridges provide a download mirror while others can remain
- scanning-resistant.
- 4. Integrity checking
- If we serve this stuff in plaintext from the bridge, anybody in between
- the user and the bridge can intercept and modify it. The bridge can too.
- If we do an anonymized three-hop connection, the exit node can also
- intercept and modify the exe it sends back.
- Are we setting ourselves up for rogue exit relays, or rogue bridges,
- that trojan our users?
- Answer #1: Users need to do pgp signature checking. Not a very good
- answer, a) because it's complex, and b) because they don't know the
- right signing keys in the first place.
- Answer #2: The mirrors could exit from a specific Tor relay, using the
- '.exit' notation. This would make connections a bit more brittle, but
- would resolve the rogue exit relay issue. We could even round-robin
- among several, and the list could be dynamic -- for example, all the
- relays with an Authority flag that allow exits to the Tor website.
- Answer #3: The mirrors should connect to the main distribution site
- via SSL. That way the exit relay can't influence anything.
- Answer #4: We could suggest that users only use trusted bridges for
- fetching a copy of Tor. Hopefully they heard about the bridge from a
- trusted source rather than from the adversary.
- Answer #5: What if the adversary is trawling for Tor downloads by
- network signature -- either by looking for known bytes in the binary,
- or by looking for "GET /tor/dist/"? It would be nice to encrypt the
- connection from the bridge user to the bridge. And we can! The bridge
- already supports TLS. Rather than initiating a TLS renegotiation after
- connecting to the ORPort, the user should actually request a URL. Then
- the ORPort can either pass the connection off as a linked conn to the
- dirport, or renegotiate and become a Tor connection, depending on how
- the client behaves.
- 5. Linked connections: at what level should we proxy?
- Check out the connection_ap_make_link() function, as called from
- directory.c. Tor clients use this to create a "fake" socks connection
- back to themselves, and then they attach a directory request to it,
- so they can launch directory fetches via Tor. We can piggyback on
- this feature.
- We need to decide if we're going to be passing the bytes back and
- forth between the web browser and the main distribution site, or if
- we're going to be actually acting like a proxy (parsing out the file
- they want, fetching that file, and serving it back).
- Advantages of proxying without looking inside:
- - We don't need to build any sort of http support (including
- continues, partial fetches, etc etc).
- Disadvantages:
- - If the browser thinks it's speaking http, are there easy ways
- to pass the bytes to an https server and have everything work
- correctly? At the least, it would seem that the browser would
- complain about the cert. More generally, ssl wants to be negotiated
- before the URL and headers are sent, yet we need to read the URL
- and headers to know that this is a mirror request; so we have an
- ordering problem here.
- - Makes it harder to do caching later on, if we don't look at what
- we're relaying. (It might be useful down the road to cache the
- answers to popular requests, so we don't have to keep getting
- them again.)
- 6. Outstanding problems
- 1) HTTP proxies already exist. Why waste our time cloning one
- badly? When we clone existing stuff, we usually regret it.
- 2) It's overbroad. We only seem to need a secure get-a-tor feature,
- and instead we're contemplating building a locked-down HTTP proxy.
- 3) It's going to add a fair bit of complexity to our code. We do
- not currently implement HTTPS. We'd need to refactor lots of the
- low-level connection stuff so that "SSL" and "Cell-based" were no
- longer synonymous.
- 4) It's still unclear how effective this proposal would be in
- practice. You need to know that this feature exists, which means
- somebody needs to tell you about a bridge (mirror) address and tell
- you how to use it. And if they're doing that, they could (e.g.) tell
- you about a gmail autoresponder address just as easily, and then you'd
- get better authentication of the Tor program to boot.
|