123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350 |
- Filename: 171-separate-streams.txt
- Title: Separate streams across circuits by connection metadata
- Author: Robert Hogan, Jacob Appelbaum, Damon McCoy, Nick Mathewson
- Created: 21-Oct-2008
- Modified: 7-Dec-2010
- Status: Open
- Summary:
- We propose a new set of options to isolate unrelated streams from one
- another, putting them on separate circuits so that semantically
- unrelated traffic is not inadvertently made linkable.
- Motivation:
- Currently, Tor attaches regular streams (that is, ones not carrying
- rendezvous or directory traffic) to circuits based only on whether Tor
- circuit's current exit node supports the destination, and whether the
- circuit has been dirty (that is, in use) for too long.
- This means that traffic that would otherwise be unrelated sometimes
- gets sent over the same circuit, allowing the exit node to link such
- streams with certainty, and allowing other parties to link such
- streams probabilistically.
- Older versions of onion routing tried to address this problem by
- sending every stream over a separate circuit; performance issues made
- this unfeasible. Moreover, in the presence of a localized adversary,
- separating streams by circuits increases the odds that, for any given
- linked set of streams, at least one will go over a compromised
- circuit.
- Therefore we ought to look for ways to allow streams that ought to be
- linked to travel over a single circuit, while keeping streams that
- ought not be linked isolated to separate circuits.
- Discussion:
- Let's call a series of inherently-linked streams (like a set of
- streams downloading objects from the same webpage, or a browsing
- session where the user requests several related webpages) a "Session".
- "Sessions" are a necessarily a fuzzy concept. While users typically
- consider some activities as wholly unrelated to each other ("My IM
- session has nothing to do with my web browsing!"), the boundaries
- between activities are sometimes hard to determine. If I'm reading
- lolcats in one browser tab and reading about treatments for an
- embarrassing disease in another, those are probably separate sessions.
- If I search for a forum, log in, read it for a while, and post a few
- messages on unrelated topics, that's probably all the same session.
- So with the proviso that no automated process can identify sessions
- 100% accurately, let's see which options we have available.
- Generally, all the streams on a session come from a single
- application. Unfortunately, isolating streams by application
- automatically isn't feasible, given the lack of any nice
- cross-platform way to tell which local process originated a given
- connection. (Yes, lsof works. But a quick review of the lsof code
- should be sufficient to scare you away from thinking there is a
- portable option, much less a portable O(1) option.) So instead, we'll
- have to use some other aspect of a Tor request as a proxy for the
- application.
- Generally, traffic from separate applications is not in the same
- session.
- With some applications (IRC, for example), each stream is a session.
- Some applications (most notably web browsing) can't be meaningfully
- split into sessions without inspecting the traffic itself and
- maintaining a lot of state.
- How well do ports correspond to sessions? Early versions of this
- proposal focused on using destination ports as a proxy for
- application, since a connection to port 22 for SSH is probably not in
- the same session as one to port 80. This only works with some
- applications better than others, though: while SSH users typically
- know when they're on port 22 and when they aren't, a web browser can
- be coaxed (though img urls or any number of releated tricks) into
- connecting to any port at all. Moreover, when Tor gets a DNS lookup
- request, it doesn't know in advance which port the resulting address
- will be used to connect to.
- So in summary, each kind of traffic wants to follow different rules,
- and assuming the existence of a web browser and a hostile web page or
- exit node, we can't tell one kind of traffic from another by simply
- looking at the destination:port of the traffic.
- Fortunately, we're not doomed.
- Design:
- When a stream arrives at Tor, we have the following data to examine:
- 1) The destination address
- 2) The destination port (unless this a DNS lookup)
- 3) The protocol used by the application to send the stream to Tor:
- SOCKS4, SOCKS4A, SOCKS5, or whatever local "transparent proxy"
- mechanism the kernel gives us.
- 4) The port used by the application to send the stream to Tor --
- that is, the SOCKSListenAddress or TransListenAddress that the
- application used, if we have more than one.
- 5) The SOCKS username and password, if any.
- 6) The source address and port for the application.
- We propose to use 3, 4, and 5 as a backchannel for applications to
- tell Tor about different sessions. Rather than running only one
- SOCKSPort, a Tor user who would prefer better session isolation should
- run multiple SOCKSPorts/TransPorts, and configure different
- applications to use separate ports. Applications that support SOCKS
- authentication can further be separated on a single port by their
- choice of username/password. Streams sent to separate ports or using
- different authentication information should never be sent over the
- same circuit. We allow each port to have its own settings for
- isolation based on destination port, destination address, or both.
- Handling DNS can be a challenge. We can get hostnames by one of three
- means:
- A) A SOCKS4a request, or a SOCKS5 request with a hostname. This
- case is handled trivially using the rules above.
- B) A RESOLVE request on a SOCKSPort. This case is handled using the
- rules above, except that port isolation can't work to isolate
- RESOLVE requests into a proper session, since we don't know which
- port will eventually be used when we connect to the returned
- address.
- C) A request on a DNSPort. We have no way of knowing which
- address/port will be used to connect to the requested address.
- When B or C is required but problematic, we could favor the use of
- AutomapHostsOnResolve.
- Interface:
- We propose that {SOCKS,Natd,Trans,DNS}ListenAddr be deprecated in
- favor of an expanded {SOCKS,Natd,Trans,DNS}Port syntax:
- ClientPortLine = OptionName SP (Addr ":")? Port (SP Options?)
- OptionName = "SOCKSPort" / "NatdPort" / "TransPort" / "DNSPort"
- Addr = An IPv4 address / an IPv6 address surrounded by brackets.
- If optional, we default to 127.0.0.1
- Port = An integer from 1 through 65535 inclusive
- Options = Option
- Options = Options SP Option
- Option = IsolateOption / GroupOption
- GroupOption = "SessionGroup=" UINT
- IsolateOption = OptNo ("IsolateDestPort" / "IsolateDestAddr" /
- "IsolateSOCKSUser"/ "IsolateClientProtocol" /
- "IsolateClientAddr") OptPlural
- OptNo = "No" ?
- OptPlural = "s" ?
- SP = " "
- UINT = An unsigned integer
- All options are case-insensitive.
- The "IsolateSOCKSUser" and "IsolateClientAddr" options are on by
- default; "NoIsolateSOCKSUser" and "NoIsolateClientAddr" respectively
- turn them off. The IsolateDestPort and IsolateDestAddr and
- IsolateClientProtocol options are off by default. NoIsolateDestPort and
- NoIsolateDestAddr and NoIsolateClientProtocol have no effect.
- Given a set of ClientPortLines, streams must NOT be placed on the same
- circuit if ANY of the following hold:
- * They were sent to two different client ports, unless the two
- client ports both specify a "SessionGroup" option with the same
- integer value.
- * At least one was sent to a client port with the IsolateDestPort
- active, and they have different destination ports.
- * At least one was sent to a client port with IsolateDestAddr
- active, and they have different destination addresses.
- * At least one was sent to a client port with IsolateClientProtocol
- active, and they use different protocols (where SOCKS4, SOCKS4a,
- SOCKS5, TransPort, NatdPort, and DNS are the protocols in question)
- * At least one was sent to a client port with IsolateSOCKSUser
- active, and they have different SOCKS username/password values
- configurations. (For the purposes of this option, the
- username/password pair of ""/"" is distinct from SOCKS without
- authentication, and both are distinct from any non-SOCKS client's
- non-authentication.)
- * At least one was sent to a client port with IsolateClientAddr
- active, and they came from different client addresses. (For the
- purpose of this option, any local interface counts as the same
- address. So if the host is configured with addresses 10.0.0.1,
- 192.0.32.10, and 127.0.0.1, then traffic from those addresses can
- leave on the same circuit, but traffic to from 10.0.0.2 (for
- example) could not share a circuit with any of them.)
- These rules apply regardless of whether the streams are active at the
- same time. In other words, if the rules say that streams A and B must
- not be on the same circuit, and stream A is attached to circuit X,
- then stream B must never be attached to stream X, even if stream A is
- closed first.
- Alternative Interface:
- We're cramming a lot onto one line in the design above. Perhaps
- instead it would be a better idea to have grouped lines of the form:
- StreamGroup 1
- SOCKSPort 9050
- TransPort 9051
- IsolateDestPort 1
- IsolateClientProtocol 0
- EndStreamGroup
- StreamGroup 2
- SOCKSPort 9052
- DNSPort 9053
- IsolateDestAddr 1
- EndStreamGroup
- This would be equivalent to:
- SOCKSPort 9050 SessionGroup=1 IsolateDestPort NoIsolateClientProtocol
- TransPort 9051 SessionGroup=1 IsolateDestPort NoIsolateClientProtocol
- SOCKSPort 9052 SessionGroup=2 IsolateDestAddr
- DNSPort 9053 SessionGroup=2 IsolateDestAddr
- But it would let us extend range of allowed options later without
- having client port lines group without bound. For example, we might
- give different circuit building parameters to different session
- groups.
- Example of use:
- Suppose that we want to use a web browser, an IRC client, and a SSH
- client all at the same time. Let's assume that we want web traffic to
- be isolated from all other traffic, even if the browser makes
- connections to ports usually used for IRC or SSH. Let's also assume
- that IRC and SSH are both used for relatively long-lived connections,
- and we want to keep all IRC/SSH sessions separate from one another.
- In this case, we could say:
- SOCKSPort 9050
- SOCKSPort 9051 IsolateDestAddr IsolateDestPort
- We would then configure our browser to use 9050 and our IRC/SSH
- clients to use 9051.
- Advanced example of use, #2:
- Suppose that we have a bunch of applications, and we launch them all
- using torsocks, and we want to keep each applications isolated from
- one another. We just create a shell script, "torlaunch":
- #!/bin/bash
- export TORSOCKS_USERNAME="$1"
- exec torsocks $@
- And we configure our SOCKSPort with IsolateSOCKSUser.
- Or if we're on Linux and we want to isolate by application invocation,
- we would change the TORSOCKS_USERNAME line to:
- export TORSOCKS_USERNAME="`cat /proc/sys/kernel/random/uuid`"
- Advanced example of use, #2:
- Now suppose that we want to achieve the benefits of the first example
- of use, but we are stuck using transparent proxies. Let's suppose
- this is Linux.
- TransPort 9090
- TransPort 9091 IsolateDestAddr IsolateDestPort
- DNSPort 5353
- AutomapHostsOnResolve 1
- Here we use the iptables --cmd-owner filter to distinguish which
- command is originating the packets, directing traffic from our irc
- client and our SSH client to port 9091, and directing other traffic to
- 9090. Using AutomapHostsOnResolve will confuse ssh in its default
- configuration; we'll need to find a way around that.
- Security Risks:
- Disabling IsolateClientAddr is a pretty bad idea.
- Setting up a set of applications to use this system effectively is a
- big problem. It's likely that lots of people who try to do this will
- mess it up. We should try to see which setups are sensible, and see
- if we can provide good feedback to explain which streams are isolated
- how.
- Performance Risks:
- This proposal will result in clients building many more circuits than
- they do today. To avoid accidentally hammering the network, we should
- have in-process limits on the maximum circuit creation rate and the
- total maximum client circuits.
- Specification:
- The Tor client circuit selection process is not entirely specified.
- Any client circuit specification must take these changes into account.
- Implementation notes:
- The more obvious ways to implement the "find a good circuit to attach
- to" part of this proposal involve doing an O(n_circuits) operation
- every time we have a stream to attach. We already do such an
- operation, so it's not as if we need to hunt for fancy ways to make it
- O(1). What will be harder is implementing the "launch circuits as
- needed" part of the proposal. Still, it should come down to "a simple
- matter of programming."
- The SOCKS4 spec has the client provide authentication info when it
- connects; accepting such info is no problem. But the SOCKS5 spec has
- the client send a list of known auth methods, then has the server send
- back the authentication method it chooses. We'll need to update the
- SOCKS5 implementation so it can accept user/password authentication if
- it's offered.
- If we use the second syntax for describing these options, we'll want
- to add a new "section-based" entry type for the configuration parser.
- Not a huge deal; we already have kludged up something similar for
- hidden service configurations.
- Opening circuits for predicted ports has the potential to get a little
- more complicated; we can probably get away with the existing
- algorithm, though, to see where its weak points are and look for
- better ones.
- Perhaps we can get our next-gen HTTP proxy to communicate browser tab
- or session into to tor via authentication, or have torbutton do it
- directly. More design is needed here, though.
- Alternative designs:
- The implementation of this option may want to consider cases where the
- same exit node is shared by two or more circuits and
- IsolateStreamsByPort is in force. Since one possible use of the option
- is to reduce the opportunity of Exit Nodes to attack traffic from the
- same source on multiple ports, the implementation may need to ensure
- that circuits reserved for the exclusive use of given ports do not
- share the same exit node. On the other hand, if our goal is only that
- streams should be unlinkable, deliberately shunting them to different
- exit nodes is unnecessary and slightly counterproductive.
- Earlier versions of this design included a mechanism to isolate
- _particular_ destination ports and addresses, so that traffic sent to,
- say, port 22 would never share a port with any traffic *not* sent to
- port 22. You can achieve this here by having all applications that
- send traffic to one of these ports use a separate SOCKSPort, and
- then setting IsolateDestPorts on that SOCKSPort.
- Lingering questions:
- I suspect there are issues remaining with DNS and TransPort users, and
- that my "just use AutomapHostsOnResolve" suggestion may be
- insufficient.
|