  1. Filename: 171-separate-streams.txt
  2. Title: Separate streams across circuits by connection metadata
  3. Author: Robert Hogan, Jacob Appelbaum, Damon McCoy, Nick Mathewson
  4. Created: 21-Oct-2008
  5. Modified: 7-Dec-2010
  6. Status: Open
  7. Summary:
  8. We propose a new set of options to isolate unrelated streams from one
  9. another, putting them on separate circuits so that semantically
  10. unrelated traffic is not inadvertently made linkable.
  11. Motivation:
  12. Currently, Tor attaches regular streams (that is, ones not carrying
  13. rendezvous or directory traffic) to circuits based only on whether the
  14. circuit's current exit node supports the destination, and whether the
  15. circuit has been dirty (that is, in use) for too long.
  16. This means that traffic that would otherwise be unrelated sometimes
  17. gets sent over the same circuit, allowing the exit node to link such
  18. streams with certainty, and allowing other parties to link such
  19. streams probabilistically.
  20. Older versions of onion routing tried to address this problem by
  21. sending every stream over a separate circuit; performance issues made
  22. this unfeasible. Moreover, in the presence of a localized adversary,
  23. separating streams by circuits increases the odds that, for any given
  24. linked set of streams, at least one will go over a compromised
  25. circuit.
  26. Therefore we ought to look for ways to allow streams that ought to be
  27. linked to travel over a single circuit, while keeping streams that
  28. ought not be linked isolated to separate circuits.
  29. Discussion:
  30. Let's call a series of inherently-linked streams (like a set of
  31. streams downloading objects from the same webpage, or a browsing
  32. session where the user requests several related webpages) a "Session".
  33. "Sessions" are necessarily a fuzzy concept. While users typically
  34. consider some activities as wholly unrelated to each other ("My IM
  35. session has nothing to do with my web browsing!"), the boundaries
  36. between activities are sometimes hard to determine. If I'm reading
  37. lolcats in one browser tab and reading about treatments for an
  38. embarrassing disease in another, those are probably separate sessions.
  39. If I search for a forum, log in, read it for a while, and post a few
  40. messages on unrelated topics, that's probably all the same session.
  41. So with the proviso that no automated process can identify sessions
  42. 100% accurately, let's see which options we have available.
  43. Generally, all the streams on a session come from a single
  44. application. Unfortunately, isolating streams by application
  45. automatically isn't feasible, given the lack of any nice
  46. cross-platform way to tell which local process originated a given
  47. connection. (Yes, lsof works. But a quick review of the lsof code
  48. should be sufficient to scare you away from thinking there is a
  49. portable option, much less a portable O(1) option.) So instead, we'll
  50. have to use some other aspect of a Tor request as a proxy for the
  51. application.
  52. Generally, traffic from separate applications is not in the same
  53. session.
  54. With some applications (IRC, for example), each stream is a session.
  55. Some applications (most notably web browsing) can't be meaningfully
  56. split into sessions without inspecting the traffic itself and
  57. maintaining a lot of state.
  58. How well do ports correspond to sessions? Early versions of this
  59. proposal focused on using destination ports as a proxy for
  60. application, since a connection to port 22 for SSH is probably not in
  61. the same session as one to port 80. This works better with some
  62. applications than with others, though: while SSH users typically
  63. know when they're on port 22 and when they aren't, a web browser can
  64. be coaxed (through img URLs or any number of related tricks) into
  65. connecting to any port at all. Moreover, when Tor gets a DNS lookup
  66. request, it doesn't know in advance which port the resulting address
  67. will be used to connect to.
  68. So in summary, each kind of traffic wants to follow different rules,
  69. and assuming the existence of a web browser and a hostile web page or
  70. exit node, we can't tell one kind of traffic from another by simply
  71. looking at the destination:port of the traffic.
  72. Fortunately, we're not doomed.
  73. Design:
  74. When a stream arrives at Tor, we have the following data to examine:
  75. 1) The destination address
  76. 2) The destination port (unless this is a DNS lookup)
  77. 3) The protocol used by the application to send the stream to Tor:
  78. SOCKS4, SOCKS4A, SOCKS5, or whatever local "transparent proxy"
  79. mechanism the kernel gives us.
  80. 4) The port used by the application to send the stream to Tor --
  81. that is, the SOCKSListenAddress or TransListenAddress that the
  82. application used, if we have more than one.
  83. 5) The SOCKS username and password, if any.
  84. 6) The source address and port for the application.
  85. We propose to use 3, 4, and 5 as a backchannel for applications to
  86. tell Tor about different sessions. Rather than running only one
  87. SOCKSPort, a Tor user who would prefer better session isolation should
  88. run multiple SOCKSPorts/TransPorts, and configure different
  89. applications to use separate ports. Applications that support SOCKS
  90. authentication can further be separated on a single port by their
  91. choice of username/password. Streams sent to separate ports or using
  92. different authentication information should never be sent over the
  93. same circuit. We allow each port to have its own settings for
  94. isolation based on destination port, destination address, or both.
  95. Handling DNS can be a challenge. We can get hostnames by one of three
  96. means:
  97. A) A SOCKS4a request, or a SOCKS5 request with a hostname. This
  98. case is handled trivially using the rules above.
  99. B) A RESOLVE request on a SOCKSPort. This case is handled using the
  100. rules above, except that port isolation can't work to isolate
  101. RESOLVE requests into a proper session, since we don't know which
  102. port will eventually be used when we connect to the returned
  103. address.
  104. C) A request on a DNSPort. We have no way of knowing which
  105. address/port will be used to connect to the requested address.
  106. When B or C is required but problematic, we could favor the use of
  107. AutomapHostsOnResolve.
  108. Interface:
  109. We propose that {SOCKS,Natd,Trans,DNS}ListenAddress be deprecated in
  110. favor of an expanded {SOCKS,Natd,Trans,DNS}Port syntax:
  111. ClientPortLine = OptionName SP (Addr ":")? Port (SP Options)?
  112. OptionName = "SOCKSPort" / "NatdPort" / "TransPort" / "DNSPort"
  113. Addr = An IPv4 address / an IPv6 address surrounded by brackets.
  114. If omitted, we default to 127.0.0.1
  115. Port = An integer from 1 through 65535 inclusive
  116. Options = Option
  117. Options = Options SP Option
  118. Option = IsolateOption / GroupOption
  119. GroupOption = "SessionGroup=" UINT
  120. IsolateOption = OptNo ("IsolateDestPort" / "IsolateDestAddr" /
  121. "IsolateSOCKSUser" / "IsolateClientProtocol" /
  122. "IsolateClientAddr") OptPlural
  123. OptNo = "No" ?
  124. OptPlural = "s" ?
  125. SP = " "
  126. UINT = An unsigned integer
  127. All options are case-insensitive.
  128. The "IsolateSOCKSUser" and "IsolateClientAddr" options are on by
  129. default; "NoIsolateSOCKSUser" and "NoIsolateClientAddr" respectively
  130. turn them off. The IsolateDestPort and IsolateDestAddr and
  131. IsolateClientProtocol options are off by default. NoIsolateDestPort and
  132. NoIsolateDestAddr and NoIsolateClientProtocol have no effect.
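
As a rough illustration of the grammar and defaults above, here is a minimal Python sketch of a parser for a single ClientPortLine. The names `ClientPort` and `parse_client_port_line` are hypothetical and do not come from Tor. The sketch assumes that tokens are separated by whitespace and that UINT is a decimal integer, and it does not validate the address beyond its general shape.

```python
import re
from dataclasses import dataclass, field
from typing import Optional, Set

PORT_OPTIONS = ("SOCKSPort", "NatdPort", "TransPort", "DNSPort")
ISOLATE_FLAGS = ("IsolateDestPort", "IsolateDestAddr", "IsolateSOCKSUser",
                 "IsolateClientProtocol", "IsolateClientAddr")
# Per the text: these two are on by default, the other three are off.
DEFAULT_ON = {"IsolateSOCKSUser", "IsolateClientAddr"}


@dataclass
class ClientPort:
    kind: str
    addr: str
    port: int
    session_group: Optional[int] = None
    isolate: Set[str] = field(default_factory=lambda: set(DEFAULT_ON))


def parse_client_port_line(line: str) -> ClientPort:
    tokens = line.split()
    if len(tokens) < 2:
        raise ValueError("expected: OptionName [Addr:]Port [Options]")
    # All options are case-insensitive, so match on lowercased names.
    kinds = {k.lower(): k for k in PORT_OPTIONS}
    kind = kinds.get(tokens[0].lower())
    if kind is None:
        raise ValueError("unknown option name: " + tokens[0])
    # Addr is an IPv4 address or a bracketed IPv6 address (shape check only).
    m = re.fullmatch(r"(?:(\[[0-9A-Fa-f:.]+\]|[0-9.]+):)?([0-9]+)", tokens[1])
    if m is None:
        raise ValueError("bad address/port: " + tokens[1])
    port = int(m.group(2))
    if not 1 <= port <= 65535:
        raise ValueError("port out of range: %d" % port)
    cp = ClientPort(kind, m.group(1) or "127.0.0.1", port)
    flags = {f.lower(): f for f in ISOLATE_FLAGS}
    for tok in tokens[2:]:
        low = tok.lower()
        if low.startswith("sessiongroup="):
            value = low[len("sessiongroup="):]
            if not re.fullmatch(r"[0-9]+", value):
                raise ValueError("bad SessionGroup value: " + tok)
            cp.session_group = int(value)
            continue
        negated = low.startswith("no")           # OptNo
        name = low[2:] if negated else low
        if name not in flags and name.endswith("s"):
            name = name[:-1]                     # OptPlural
        if name not in flags:
            raise ValueError("unknown option: " + tok)
        if negated:
            cp.isolate.discard(flags[name])
        else:
            cp.isolate.add(flags[name])
    return cp
```

Under these assumptions, "SOCKSPort 9051 IsolateDestAddr IsolateDestPorts" yields a port on 127.0.0.1 with four active flags: the two defaults plus the two named on the line.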
  133. Given a set of ClientPortLines, streams must NOT be placed on the same
  134. circuit if ANY of the following hold:
  135. * They were sent to two different client ports, unless the two
  136. client ports both specify a "SessionGroup" option with the same
  137. integer value.
  138. * At least one was sent to a client port with IsolateDestPort
  139. active, and they have different destination ports.
  140. * At least one was sent to a client port with IsolateDestAddr
  141. active, and they have different destination addresses.
  142. * At least one was sent to a client port with IsolateClientProtocol
  143. active, and they use different protocols (where SOCKS4, SOCKS4a,
  144. SOCKS5, TransPort, NatdPort, and DNS are the protocols in question)
  145. * At least one was sent to a client port with IsolateSOCKSUser
  146. active, and they have different SOCKS username/password
  147. configurations. (For the purposes of this option, the
  148. username/password pair of ""/"" is distinct from SOCKS without
  149. authentication, and both are distinct from any non-SOCKS client's
  150. non-authentication.)
  151. * At least one was sent to a client port with IsolateClientAddr
  152. active, and they came from different client addresses. (For the
  153. purpose of this option, any local interface counts as the same
  154. address. So if the host is configured with addresses 10.0.0.1,
  155. 192.0.32.10, and 127.0.0.1, then traffic from those addresses can
  156. leave on the same circuit, but traffic from 10.0.0.2 (for
  157. example) could not share a circuit with any of them.)
  158. These rules apply regardless of whether the streams are active at the
  159. same time. In other words, if the rules say that streams A and B must
  160. not be on the same circuit, and stream A is attached to circuit X,
  161. then stream B must never be attached to circuit X, even if stream A is
  162. closed first.
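
The rules above can be read as a pairwise predicate over streams. The following Python sketch is one possible reading, not Tor's implementation. The `Stream` record and its field names are hypothetical. It assumes that `client_addr` has already been normalised so that every local interface maps to the same value, and that a DNS lookup carries a destination port of None.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Stream:
    listener: str                          # e.g. "SOCKSPort 9050"
    session_group: Optional[int]           # SessionGroup of that client port
    isolate: FrozenSet[str]                # Isolate* flags active on that port
    dest_addr: str
    dest_port: Optional[int]               # None for a DNS lookup (assumption)
    protocol: str                          # "SOCKS4", "SOCKS5", "TransPort", ...
    socks_auth: Optional[Tuple[str, str]]  # None means SOCKS without auth
    client_addr: str                       # assumed normalised for local interfaces


def _auth_key(s: Stream):
    # ("", "") differs from SOCKS without authentication, and both differ
    # from a non-SOCKS client's non-authentication.
    if not s.protocol.upper().startswith("SOCKS"):
        return ("non-socks",)
    return ("socks", s.socks_auth)


def must_isolate(a: Stream, b: Stream) -> bool:
    """Return True if a and b must NOT be placed on the same circuit."""
    if a.listener != b.listener:
        same_group = (a.session_group is not None
                      and a.session_group == b.session_group)
        if not same_group:
            return True
    # "At least one was sent to a client port with X active".
    active = a.isolate | b.isolate
    if "IsolateDestPort" in active and a.dest_port != b.dest_port:
        return True
    if "IsolateDestAddr" in active and a.dest_addr != b.dest_addr:
        return True
    if "IsolateClientProtocol" in active and a.protocol != b.protocol:
        return True
    if "IsolateSOCKSUser" in active and _auth_key(a) != _auth_key(b):
        return True
    if "IsolateClientAddr" in active and a.client_addr != b.client_addr:
        return True
    return False
```

Because the predicate depends only on stream metadata and not on whether a stream is still open, it can be applied to streams that have already closed, which is what the last paragraph above requires.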
  163. Alternative Interface:
  164. We're cramming a lot onto one line in the design above. Perhaps
  165. instead it would be a better idea to have grouped lines of the form:
  166. StreamGroup 1
  167. SOCKSPort 9050
  168. TransPort 9051
  169. IsolateDestPort 1
  170. IsolateClientProtocol 0
  171. EndStreamGroup
  172. StreamGroup 2
  173. SOCKSPort 9052
  174. DNSPort 9053
  175. IsolateDestAddr 1
  176. EndStreamGroup
  177. This would be equivalent to:
  178. SOCKSPort 9050 SessionGroup=1 IsolateDestPort NoIsolateClientProtocol
  179. TransPort 9051 SessionGroup=1 IsolateDestPort NoIsolateClientProtocol
  180. SOCKSPort 9052 SessionGroup=2 IsolateDestAddr
  181. DNSPort 9053 SessionGroup=2 IsolateDestAddr
  182. But it would let us extend the range of allowed options later without
  183. having client port lines grow without bound. For example, we might
  184. give different circuit building parameters to different session
  185. groups.
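
To make the equivalence between the two syntaxes concrete, here is a small Python sketch that rewrites StreamGroup blocks into single-line form. The function name `expand_stream_groups` is hypothetical. The sketch assumes that every option inside a group takes "1" or "0" as its value, and for brevity it matches keywords case-sensitively.

```python
from typing import List

PORT_OPTIONS = {"SOCKSPort", "NatdPort", "TransPort", "DNSPort"}


def expand_stream_groups(text: str) -> List[str]:
    """Rewrite StreamGroup ... EndStreamGroup blocks as client port lines."""
    out: List[str] = []
    group_id = None
    ports: List[str] = []
    flags: List[str] = []
    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens:
            continue
        key = tokens[0]
        if key == "StreamGroup":
            if group_id is not None or len(tokens) != 2:
                raise ValueError("bad StreamGroup line: " + raw)
            group_id, ports, flags = int(tokens[1]), [], []
        elif key == "EndStreamGroup":
            if group_id is None:
                raise ValueError("EndStreamGroup without StreamGroup")
            suffix = " ".join(["SessionGroup=%d" % group_id] + flags)
            out.extend("%s %s" % (p, suffix) for p in ports)
            group_id = None
        elif group_id is None:
            raise ValueError("line outside any StreamGroup: " + raw)
        elif key in PORT_OPTIONS:
            ports.append(" ".join(tokens))
        elif key.startswith("Isolate") and len(tokens) == 2:
            # "IsolateX 1" becomes IsolateX; "IsolateX 0" becomes NoIsolateX.
            flags.append(key if tokens[1] == "1" else "No" + key)
        else:
            raise ValueError("unrecognised line: " + raw)
    if group_id is not None:
        raise ValueError("missing EndStreamGroup")
    return out
```

Applied to the two groups shown above, this produces the same four client port lines that the text lists as equivalent.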
  186. Example of use:
  187. Suppose that we want to use a web browser, an IRC client, and an SSH
  188. client all at the same time. Let's assume that we want web traffic to
  189. be isolated from all other traffic, even if the browser makes
  190. connections to ports usually used for IRC or SSH. Let's also assume
  191. that IRC and SSH are both used for relatively long-lived connections,
  192. and we want to keep all IRC/SSH sessions separate from one another.
  193. In this case, we could say:
  194. SOCKSPort 9050
  195. SOCKSPort 9051 IsolateDestAddr IsolateDestPort
  196. We would then configure our browser to use 9050 and our IRC/SSH
  197. clients to use 9051.
  198. Advanced example of use, #1:
  199. Suppose that we have a bunch of applications, and we launch them all
  200. using torsocks, and we want to keep each application isolated from
  201. the others. We just create a shell script, "torlaunch":
  202. #!/bin/bash
  203. export TORSOCKS_USERNAME="$1"
  204. exec torsocks "$@"
  205. And we configure our SOCKSPort with IsolateSOCKSUser.
  206. Or if we're on Linux and we want to isolate by application invocation,
  207. we would change the TORSOCKS_USERNAME line to:
  208. export TORSOCKS_USERNAME="`cat /proc/sys/kernel/random/uuid`"
  209. Advanced example of use, #2:
  210. Now suppose that we want to achieve the benefits of the first example
  211. of use, but we are stuck using transparent proxies. Let's suppose
  212. this is Linux.
  213. TransPort 9090
  214. TransPort 9091 IsolateDestAddr IsolateDestPort
  215. DNSPort 5353
  216. AutomapHostsOnResolve 1
  217. Here we use the iptables --cmd-owner filter to distinguish which
  218. command is originating the packets, directing traffic from our IRC
  219. client and our SSH client to port 9091, and directing other traffic to
  220. 9090. Using AutomapHostsOnResolve will confuse ssh in its default
  221. configuration; we'll need to find a way around that.
  222. Security Risks:
  223. Disabling IsolateClientAddr is a pretty bad idea.
  224. Setting up a set of applications to use this system effectively is a
  225. big problem. It's likely that lots of people who try to do this will
  226. mess it up. We should try to see which setups are sensible, and see
  227. if we can provide good feedback to explain which streams are isolated
  228. how.
  229. Performance Risks:
  230. This proposal will result in clients building many more circuits than
  231. they do today. To avoid accidentally hammering the network, we should
  232. have in-process limits on the maximum circuit creation rate and the
  233. total maximum client circuits.
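
One way to picture such in-process limits is a token bucket combined with a cap on open circuits. The Python sketch below is illustrative only: the class name, parameter names, and any particular values are assumptions, not figures proposed here.

```python
import time


class CircuitLaunchLimiter:
    """Token-bucket limit on launch rate plus a cap on open client circuits."""

    def __init__(self, max_circuits, launches_per_sec, burst,
                 clock=time.monotonic):
        self._max = max_circuits
        self._rate = float(launches_per_sec)
        self._burst = float(burst)
        self._clock = clock
        self._tokens = float(burst)    # start with a full bucket
        self._last = clock()
        self.open_circuits = 0

    def try_launch(self) -> bool:
        """Return True and account for a new circuit if a launch is allowed."""
        now = self._clock()
        # Refill tokens for the elapsed time, up to the burst size.
        self._tokens = min(self._burst,
                           self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self.open_circuits >= self._max or self._tokens < 1.0:
            return False
        self._tokens -= 1.0
        self.open_circuits += 1
        return True

    def circuit_closed(self) -> None:
        """Record that one client circuit has been torn down."""
        self.open_circuits = max(0, self.open_circuits - 1)
```

A stream whose launch is refused would have to wait or fail. Which of the two is appropriate is a policy question that this sketch does not address.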
  234. Specification:
  235. The Tor client circuit selection process is not entirely specified.
  236. Any client circuit specification must take these changes into account.
  237. Implementation notes:
  238. The more obvious ways to implement the "find a good circuit to attach
  239. to" part of this proposal involve doing an O(n_circuits) operation
  240. every time we have a stream to attach. We already do such an
  241. operation, so it's not as if we need to hunt for fancy ways to make it
  242. O(1). What will be harder is implementing the "launch circuits as
  243. needed" part of the proposal. Still, it should come down to "a simple
  244. matter of programming."
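
The O(n_circuits) scan described above might look roughly like the following Python sketch. All names are hypothetical. The isolation test is passed in as a `must_isolate(a, b)` callable, and each circuit remembers every stream it has ever carried, so that isolation still holds after a stream closes. Checks on exit policy and circuit dirtiness are omitted.

```python
from typing import Any, Callable, List


class Circuit:
    def __init__(self, name: str):
        self.name = name
        # Every stream ever attached, open or closed. A real implementation
        # would more likely keep a fixed-size summary of isolation fields.
        self.history: List[Any] = []


def attach(stream: Any,
           circuits: List[Circuit],
           must_isolate: Callable[[Any, Any], bool],
           launch: Callable[[], Circuit]) -> Circuit:
    """Attach stream to the first compatible circuit, launching one if needed."""
    for circ in circuits:                     # one pass over existing circuits
        if not any(must_isolate(stream, old) for old in circ.history):
            circ.history.append(stream)
            return circ
    circ = launch()                           # "launch circuits as needed"
    circuits.append(circ)
    circ.history.append(stream)
    return circ
```

Keeping the full history makes each compatibility check proportional to the number of streams a circuit has carried. Summarising the history into per-circuit isolation fields would bring the check back to constant time per circuit.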
  245. The SOCKS4 spec has the client provide authentication info when it
  246. connects; accepting such info is no problem. But the SOCKS5 spec has
  247. the client send a list of known auth methods, then has the server send
  248. back the authentication method it chooses. We'll need to update the
  249. SOCKS5 implementation so it can accept user/password authentication if
  250. it's offered.
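
For reference, the SOCKS5 method negotiation (RFC 1928) and the username/password sub-negotiation (RFC 1929) can be sketched in Python as below. The function names are hypothetical, and this is not Tor's SOCKS code. The sketch assumes that we prefer username/password whenever the client offers it, and it accepts empty usernames and passwords because the isolation rules above treat ""/"" as a distinct value.

```python
from typing import Tuple

SOCKS5_VERSION = 0x05
AUTH_SUBNEG_VERSION = 0x01
METHOD_NO_AUTH = 0x00
METHOD_USERPASS = 0x02
METHOD_NONE_ACCEPTABLE = 0xFF


def choose_method(greeting: bytes) -> bytes:
    """Parse VER NMETHODS METHODS and return the two-byte server reply."""
    if len(greeting) < 2 or greeting[0] != SOCKS5_VERSION:
        raise ValueError("not a SOCKS5 greeting")
    n = greeting[1]
    methods = greeting[2:2 + n]
    if len(methods) != n:
        raise ValueError("truncated method list")
    # Prefer username/password so the client can signal its session.
    for preferred in (METHOD_USERPASS, METHOD_NO_AUTH):
        if preferred in methods:
            return bytes([SOCKS5_VERSION, preferred])
    return bytes([SOCKS5_VERSION, METHOD_NONE_ACCEPTABLE])


def parse_userpass(msg: bytes) -> Tuple[bytes, bytes]:
    """Parse VER ULEN UNAME PLEN PASSWD and return (username, password)."""
    if len(msg) < 2 or msg[0] != AUTH_SUBNEG_VERSION:
        raise ValueError("not a username/password request")
    ulen = msg[1]
    if len(msg) < 3 + ulen:
        raise ValueError("truncated username")
    uname = msg[2:2 + ulen]
    plen = msg[2 + ulen]
    passwd = msg[3 + ulen:3 + ulen + plen]
    if len(passwd) != plen:
        raise ValueError("truncated password")
    return uname, passwd
```

Since the credentials serve only as an isolation key here, a server following this proposal would presumably answer every well-formed request with the RFC 1929 success reply (bytes 0x01 0x00) rather than checking them against anything.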
  251. If we use the second syntax for describing these options, we'll want
  252. to add a new "section-based" entry type for the configuration parser.
  253. Not a huge deal; we already have kludged up something similar for
  254. hidden service configurations.
  255. Opening circuits for predicted ports has the potential to get a little
  256. more complicated; we can probably get away with the existing
  257. algorithm at first, though, while we see where its weak points are and
  258. look for better ones.
  259. Perhaps we can get our next-gen HTTP proxy to communicate browser tab
  260. or session info to Tor via authentication, or have Torbutton do it
  261. directly. More design is needed here, though.
  262. Alternative designs:
  263. The implementation of this option may want to consider cases where the
  264. same exit node is shared by two or more circuits and
  265. IsolateDestPort is in force. Since one possible use of the option
  266. is to reduce the opportunity of Exit Nodes to attack traffic from the
  267. same source on multiple ports, the implementation may need to ensure
  268. that circuits reserved for the exclusive use of given ports do not
  269. share the same exit node. On the other hand, if our goal is only that
  270. streams should be unlinkable, deliberately shunting them to different
  271. exit nodes is unnecessary and slightly counterproductive.
  272. Earlier versions of this design included a mechanism to isolate
  273. _particular_ destination ports and addresses, so that traffic sent to,
  274. say, port 22 would never share a circuit with any traffic *not* sent to
  275. port 22. You can achieve this here by having all applications that
  276. send traffic to one of these ports use a separate SOCKSPort, and
  277. then setting IsolateDestPorts on that SOCKSPort.
  278. Lingering questions:
  279. I suspect there are issues remaining with DNS and TransPort users, and
  280. that my "just use AutomapHostsOnResolve" suggestion may be
  281. insufficient.