Guide to Hacking Tor

(As of 8 October 2003, this was all accurate. If you're reading this in
the distant future, stuff may have changed.)

0. Intro and required reading

Onion Routing is still very much in development stages. This document
aims to get you started in the right direction if you want to understand
the code, add features, fix bugs, etc.

Read the README file first, so you can get familiar with the basics of
installing and running an onion router.

Then, skim some of the introductory materials in tor-spec.txt,
tor-design.tex, and the Tor FAQ to learn more about how the Tor protocol
is supposed to work. This document will assume you know about Cells,
Circuits, Streams, Connections, Onion Routers, and Onion Proxies.

1. Code organization

1.1. The modules

The code is divided into two directories: ./src/common and ./src/or.
The "common" directory contains general-purpose utility functions not
specific to onion routing. The "or" directory implements all
onion-routing and onion-proxy specific functionality.

Files in ./src/common:

  aes.[ch] -- Implements the AES cipher (with 128-bit keys and blocks),
      and a counter-mode stream cipher on top of AES. This code is
      taken from the main Rijndael distribution. (We include this
      because many people are running older versions of OpenSSL without
      AES support.)

  crypto.[ch] -- Wrapper functions to present a consistent interface to
      public-key and symmetric cryptography operations from OpenSSL.

  fakepoll.[ch] -- Used on systems that don't have a poll() system call;
      reimplements poll() using the select() system call.

  log.[ch] -- Tor's logging subsystem.

  test.h -- Macros used by unit tests.

  torint.h -- Provides missing [u]int*_t types for environments that
      don't have stdint.h.

  tortls.[ch] -- Wrapper functions to present a consistent interface to
      TLS, SSL, and X.509 functions from OpenSSL.

  util.[ch] -- Miscellaneous portability and convenience functions.

Files in ./src/or:

[General-purpose modules]

  or.h -- Common header file: includes everything, defines everything.

  buffers.c -- Implements a generic buffer interface. Buffers are
      fairly opaque string holders that can read to or flush from
      memory, file descriptors, or TLS connections. Also implements
      parsing functions to read HTTP and SOCKS commands from buffers.

  tree.h -- A splay tree implementation by Niels Provos. Used only by
      dns.c.

  config.c -- Code to parse and validate the configuration file.

[Background processing modules]

  cpuworker.c -- Implements a separate 'CPU worker' process to perform
      CPU-intensive tasks in the background, so as not to interrupt the
      onion router. (OR only)

  dns.c -- Implements a farm of 'DNS worker' processes to perform DNS
      lookups for onion routers and cache the results. [This needs to
      be done in the background because of the lack of a good,
      ubiquitous asynchronous DNS implementation.] (OR only)

[Directory-related functionality]

  directory.c -- Code to send and fetch directories and router
      descriptors via HTTP. Directory servers use dirserv.c to generate
      the results; clients use routers.c to parse them.

  dirserv.c -- Code to manage directory contents and generate
      directories. [Directory only]

  routers.c -- Code to parse directories and router descriptors; and to
      generate a router descriptor corresponding to this OR's
      capabilities. Also presents some high-level interfaces for
      managing an OR or OP's view of the directory.

[Circuit-related modules]

  circuit.c -- Code to create circuits, manage circuits, and route
      relay cells along circuits.

  onion.c -- Code to generate and respond to "onion skins".

[Core protocol implementation]

  connection.c -- Code used in common by all connection types. See
      section 1.2 below for more general information about connections.

  connection_edge.c -- Code used only by edge connections.

  command.c -- Code to handle specific cell types. [OR only]

  connection_or.c -- Code to implement cell-speaking connections.

[Toplevel modules]

  main.c -- Toplevel module. Initializes keys, handles signals,
      multiplexes between connections, implements the main loop, and
      drives scheduled events.

  tor_main.c -- Stub module containing a main() function. Allows the
      unit test binary to link against main.c.

[Unit tests]

  test.c -- Contains unit tests for many pieces of the lower-level Tor
      modules.

1.2. All about connections

All sockets in Tor are handled as different types of nonblocking
'connections'. (What the Tor spec calls a "Connection", the code refers
to as a "Cell-speaking" or "OR" connection.)

Connections are implemented by the connection_t struct, defined in or.h.
Not every kind of connection uses all the fields in connection_t; see
the comments in or.h and the assertions in assert_connection_ok() for
more information.

Every connection has a type and a state. Connections never change their
type, but can go through many state changes in their lifetime.

The connection types break down as follows:

[Cell-speaking connections]
  CONN_TYPE_OR -- A bidirectional TLS connection transmitting a
      sequence of cells. May be from an OR to an OR, or from an OP to
      an OR.

[Edge connections]
  CONN_TYPE_EXIT -- A TCP connection from an onion router to a
      Stream's destination. [OR only]
  CONN_TYPE_AP -- A SOCKS proxy connection from the end user to the
      onion proxy. [OP only]

[Listeners]
  CONN_TYPE_OR_LISTENER [OR only]
  CONN_TYPE_AP_LISTENER [OP only]
  CONN_TYPE_DIR_LISTENER [Directory only]
      -- Bound network sockets, waiting for incoming connections.

[Internal]
  CONN_TYPE_DNSWORKER -- Connection from the main process to a DNS
      worker. [OR only]
  CONN_TYPE_CPUWORKER -- Connection from the main process to a CPU
      worker. [OR only]

Connection states are documented in or.h.

Every connection has two associated buffers: an input buffer and an
output buffer. Listeners don't use them. With other connections,
incoming data is appended to conn->inbuf, and outgoing data is taken
from the front of conn->outbuf. Connections differ primarily in the
functions called to fill and drain these buffers.
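
For intuition, here is a simplified sketch of how a non-listener
connection's buffers are filled and drained. This is not the real code;
the helper names and signatures are guesses based on the description
above:

    /* Sketch only: the real logic lives in buffers.c and
     * connection.c; signatures here are illustrative assumptions. */
    int handle_read_sketch(connection_t *conn) {
      /* Append whatever the socket has ready onto conn->inbuf. */
      if (read_to_buf(conn->s, conn->inbuf) < 0)
        return -1;                       /* error: caller closes conn */
      /* Let the type-specific handler consume complete commands/cells. */
      return connection_process_inbuf(conn);
    }

    int handle_write_sketch(connection_t *conn) {
      /* Push bytes from the front of conn->outbuf to the network. */
      if (flush_buf(conn->s, conn->outbuf) < 0)
        return -1;
      if (buf_datalen(conn->outbuf) == 0)  /* assumed helper */
        connection_stop_writing(conn);     /* nothing left to send */
      return 0;
    }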

1.3. All about circuits

A circuit_t structure fills two roles. First, a circuit_t links two
connections together: either an edge connection and an OR connection,
or two OR connections. (When joined to an OR connection, a circuit_t
affects only cells sent to a particular ACI on that connection. When
joined to an edge connection, a circuit_t affects all data.)

Second, a circuit_t holds the cipher keys and state for sending data
along a given circuit. At the OP, it has a sequence of ciphers, each
of which is shared with a single OR along the circuit. Separate
ciphers are used for data going "forward" (away from the OP) and
"backward" (towards the OP). At the OR, a circuit has only two stream
ciphers: one for data going forward, and one for data going backward.
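
To illustrate the layering, here is a sketch with assumed names and
fields (not the real circuit.c): the OP applies one forward cipher per
hop, while an OR applies exactly one.

    /* Sketch: an OP layering its forward ciphers onto a relay cell's
     * payload.  With a counter-mode stream cipher, encrypting and
     * decrypting are the same XOR operation. */
    void op_relay_encrypt_sketch(circuit_t *circ, cell_t *cell) {
      /* Encrypt for the far end first, so that the first hop strips
       * the outermost layer, the second hop the next, and so on. */
      for (int i = circ->n_hops - 1; i >= 0; --i)
        crypto_cipher_encrypt(circ->hop_forward_cipher[i],
                              cell->payload, sizeof(cell->payload));
    }

    /* Sketch: an OR strips (or adds) exactly one layer per cell. */
    void or_relay_crypt_sketch(circuit_t *circ, cell_t *cell, int fwd) {
      crypto_cipher_encrypt(fwd ? circ->f_cipher : circ->b_cipher,
                            cell->payload, sizeof(cell->payload));
    }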

1.4. Asynchronous IO and the main loop

Tor uses the poll(2) system call [or a substitute based on select(2)]
to handle nonblocking (asynchronous) IO. If you're not familiar with
nonblocking IO, check out the links at the end of this document.

All asynchronous logic is handled in main.c. The functions
'connection_add', 'connection_set_poll_socket', and 'connection_remove'
manage an array of connection_t*, and keep it in sync with the array of
struct pollfd required by poll(2). (This array of connection_t* is
accessible via get_connection_array, but users should generally call
one of the 'connection_get_by_*' functions in connection.c to look up
individual connections.)

To trap read and write events, connections call the functions
'connection_{is|stop|start}_{reading|writing}'.

When connections get events, main.c calls conn_read and conn_write.
These functions dispatch events to connection_handle_read and
connection_handle_write as appropriate.
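
In outline, the dispatch looks something like this (a simplified
sketch; the real loop in main.c handles more cases):

    /* Sketch of the poll()-based dispatch in the main loop.  The
     * variables poll_array, n_conns, and timeout_ms are schematic. */
    for (;;) {
      int n = poll(poll_array, n_conns, timeout_ms);
      if (n < 0 && errno != EINTR)
        break;                            /* fatal poll error */
      for (int i = 0; i < n_conns; ++i) {
        if (poll_array[i].revents & POLLIN)
          conn_read(i);                   /* -> connection_handle_read */
        if (poll_array[i].revents & POLLOUT)
          conn_write(i);                  /* -> connection_handle_write */
      }
      /* ...then close any connections marked for close... */
    }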

When connections need to be closed, they can respond in two ways. Most
simply, they can make connection_handle_* return an error (-1),
which will make conn_{read|write} close them. But if the connection
needs to stay around [XXXX explain why] until the end of the current
iteration of the main loop, it marks itself for closing by setting
conn->connection_marked_for_close.

The main loop handles several other operations: First, it checks
whether any signals have been received that require a response (HUP,
KILL, USR1, CHLD). Second, it calls prepare_for_poll to handle recurring
tasks and compute the necessary poll timeout. These recurring tasks
include periodically fetching the directory, timing out unused
circuits, incrementing flow control windows and re-enabling connections
that were blocked waiting for more bandwidth, and maintaining statistics.

A word about TLS: Using TLS on OR connections complicates matters in
two ways. First, a TLS stream has its own read buffer independent of
the connection's read buffer. (TLS needs to read an entire frame from
the network before it can decrypt any data. Thus, trying to read 1
byte from TLS can require that several KB be read from the network and
decrypted. The extra data is stored in TLS's decrypt buffer.) Second,
the TLS stream's events do not correspond directly to network events:
sometimes, before a TLS stream can read, the network must be ready to
write -- or vice versa.

[XXXX describe the consequences of this for OR connections.]
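
To make the second point concrete, here is a hedged sketch (standard
OpenSSL calls, but not code from tortls.c) of how a read on a TLS
stream can report that it is actually blocked on a network write:

    /* Sketch: reading from a TLS stream with OpenSSL.  During a
     * renegotiation, SSL_read may need to send data first, so it can
     * fail with SSL_ERROR_WANT_WRITE even though we asked to read. */
    int tls_read_sketch(SSL *ssl, char *buf, int len) {
      int r = SSL_read(ssl, buf, len);
      if (r > 0)
        return r;                /* got decrypted bytes */
      switch (SSL_get_error(ssl, r)) {
        case SSL_ERROR_WANT_READ:
          return 0;  /* wait until the socket is readable, then retry */
        case SSL_ERROR_WANT_WRITE:
          return 0;  /* wait until the socket is WRITABLE, then retry
                      * the read */
        default:
          return -1; /* real error: close the connection */
      }
    }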

1.5. How data flows (an illustration)

Suppose an OR receives 50 bytes along an OR connection. These 50 bytes
complete a data relay cell, which gets decrypted and delivered to an
edge connection. Here we give a possible call sequence for the
delivery of this data.

(This may become outdated quickly.)

do_main_loop -- Calls poll(2), receives a POLLIN event on a struct
    pollfd, then calls:
  conn_read -- Looks up the corresponding connection_t, and calls:
    connection_handle_read -- Calls:
      connection_read_to_buf -- Notices that it has an OR connection so:
        read_to_buf_tls -- Pulls data from the TLS stream onto
            conn->inbuf.
      connection_process_inbuf -- Notices that it has an OR connection
          so:
        connection_or_process_inbuf -- Checks whether conn is open, and
            calls:
          connection_process_cell_from_inbuf -- Notices it has enough
              data for a cell, then calls:
            connection_fetch_from_buf -- Pulls the cell from the buffer.
            cell_unpack -- Decodes the raw cell into a cell_t.
            command_process_cell -- Notices it is a relay cell, so
                calls:
              command_process_relay_cell -- Looks up the circuit for
                  the cell, makes sure the circuit is live, then passes
                  the cell to:
                circuit_deliver_relay_cell -- Passes the cell to each
                    of:
                  relay_crypt -- Strips a layer of encryption from the
                      cell and notices that the cell is for local
                      delivery.
                  connection_edge_process_relay_cell -- Extracts the
                      cell's relay command, and makes sure the edge
                      connection is open. Since it has a DATA cell and
                      an open connection, calls:
                    circuit_consider_sending_sendme -- [XXX]
                    connection_write_to_buf -- To place the data on the
                        outgoing buffer of the correct edge connection,
                        by calling:
                      connection_start_writing -- To tell the main poll
                          loop about the pending data.
                      write_to_buf -- To actually place the outgoing
                          data on the edge connection.
                    connection_consider_sending_sendme -- [XXX]

[In a subsequent iteration, main notices that the edge connection is
ready for writing.]

do_main_loop -- Calls poll(2), receives a POLLOUT event on a struct
    pollfd, then calls:
  conn_write -- Looks up the corresponding connection_t, and calls:
    connection_handle_write -- This isn't a TLS connection, so calls:
      flush_buf -- Delivers data from the edge connection's outbuf to
          the network.
      connection_wants_to_flush -- Reports that all data has been
          flushed.
      connection_finished_flushing -- Notices the connection is an
          exit, and calls:
        connection_edge_finished_flushing -- The connection is open, so
            it calls:
          connection_stop_writing -- Tells the main poll loop that this
              connection has no more data to write.
          connection_consider_sending_sendme -- [XXX]

1.6. Routers, descriptors, and directories

All Tor processes need to keep track of a list of onion routers, for
several reasons:
  - OPs need to establish connections and circuits to ORs.
  - ORs need to establish connections to other ORs.
  - OPs and ORs need to fetch directories from directory servers.
  - ORs need to upload their descriptors to directory servers.
  - Directory servers need to know which ORs are allowed onto the
    network, what the descriptors are for those ORs, and which of
    those ORs are currently live.

Thus, every Tor process keeps track of a list of all the ORs it knows
in a static variable 'directory' in the routers.c module. This
variable contains a routerinfo_t object for each known OR. On startup,
the directory is initialized to a list of known directory servers (via
router_get_list_from_file()). Later, the directory is updated via
router_get_dir_from_string(). (OPs and ORs retrieve fresh directories
from directory servers; directory servers generate their own.)

Every OR must periodically regenerate a router descriptor for itself.
The descriptor and the corresponding routerinfo_t are stored in the
'desc_routerinfo' and 'descriptor' static variables in routers.c.

Additionally, a directory server keeps track of a list of the
router descriptors it knows in a separate list in dirserv.c. It
uses this list, plus the open connections in main.c, to build
directories.
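
Code that needs a particular OR might scan this list like so (a sketch
only; the lookup helpers in routers.c have similar intent, but the
names and fields below are assumptions):

    /* Sketch: finding a routerinfo_t in the global directory list. */
    routerinfo_t *router_get_by_addr_port_sketch(uint32_t addr,
                                                 uint16_t port) {
      for (int i = 0; i < directory->n_routers; ++i) {
        routerinfo_t *r = directory->routers[i];
        if (r->addr == addr && r->or_port == port)
          return r;
      }
      return NULL;  /* unknown router */
    }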

1.7. Data model

[XXX]

1.8. Flow control

[XXX]

2. Coding conventions

2.1. Details

Use tor_malloc, tor_strdup, and tor_gettimeofday instead of their
generic equivalents. (They always succeed or exit.)

Use INLINE instead of 'inline', so that we work properly on Windows.
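
For example, here is a minimal sketch of the idea behind tor_malloc
(the real one is in util.c; this version is illustrative only):

    #include <stdlib.h>

    /* Sketch: like malloc(), but never returns NULL -- on failure it
     * logs and exits, so callers never need to check the result. */
    void *tor_malloc_sketch(size_t size) {
      void *result = malloc(size);
      if (!result) {
        log(LOG_ERR, "Out of memory. Dying.");  /* assumed log call */
        exit(1);
      }
      return result;
    }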

2.2. Calling and naming conventions

Whenever possible, functions should return -1 on error and 0 on
success.

For multi-word identifiers, use lowercase words combined with
underscores (e.g., "multi_word_identifier"). Use ALL_CAPS for macros and
constants.

Typenames should end with "_t".

Function names should be prefixed with a module name or object name. (In
general, code to manipulate an object should live in a module with the
same name as the object, so it's hard to tell which convention is used.)

Functions that do things should have imperative-verb names
(e.g. buffer_clear, buffer_resize); functions that return booleans should
have predicate names (e.g. buffer_is_empty, buffer_needs_resizing).
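
Putting those rules together, a new module might look like this (a
made-up example, not code from the tree):

    /* widget.c -- hypothetical module following the conventions. */
    typedef struct widget_t {
      int size;
    } widget_t;

    #define WIDGET_MAX_SIZE 100            /* ALL_CAPS for constants */

    /* Imperative-verb name; returns -1 on error, 0 on success. */
    int widget_resize(widget_t *w, int new_size) {
      if (new_size < 0 || new_size > WIDGET_MAX_SIZE)
        return -1;
      w->size = new_size;
      return 0;
    }

    /* Predicate name for a function returning a boolean. */
    int widget_is_empty(widget_t *w) {
      return w->size == 0;
    }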

2.3. What to optimize

Don't optimize anything if it's not in the critical path. Right now,
the critical path seems to be AES, logging, and the network itself.
Feel free to do your own profiling to determine otherwise.

2.4. Log conventions

Log convention: use only these four log severities.

  ERR means something fatal just happened.
  WARNING means something bad happened, but we're still running. The
      bad thing is either a bug in the code, or an attack or buggy
      protocol/implementation on the part of a remote peer, etc. The
      operator should examine the bad thing and try to correct it.
      (No error or warning messages should be expected during normal OR
      or OP operation. I expect most people to run on -l warning
      eventually. If a library function is currently called such that
      failure always means ERR, then the library function should log
      WARNING and let the caller log ERR.)
  INFO means something happened (maybe bad, maybe ok), but there's
      nothing you need to (or can) do about it.
  DEBUG is for everything louder than INFO.

[XXX Proposed convention: every message of severity INFO or higher should
either (A) be intelligible to end-users who don't know the Tor source; or
(B) somehow inform the end-users that they aren't expected to understand
the message (perhaps with a string like "internal error"). Option (A) is
to be preferred to option (B). -NM]
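
Under these conventions, code at a remote-protocol boundary might log
like this (a hedged sketch; the exact log() signature is assumed):

    /* Sketch: a peer violated the protocol -- bad, but survivable,
     * so this is WARNING rather than ERR. */
    if (cell->length > CELL_PAYLOAD_SIZE) {
      log(LOG_WARNING,
          "Peer sent a cell with an impossible payload length %d; "
          "closing the connection.", cell->length);
      return -1;  /* the caller closes the connection; only a fatal
                   * condition (e.g., out of memory) would log ERR */
    }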

3. References

About Tor:
  See http://freehaven.net/tor/
      http://freehaven.net/tor/cvs/doc/tor-spec.txt
      http://freehaven.net/tor/cvs/doc/tor-design.tex
      http://freehaven.net/tor/cvs/doc/FAQ

About anonymity:
  See http://freehaven.net/anonbib/

About nonblocking IO:
  [XXX insert references]

# ======================================================================
# Old HACKING document; merge into the above, move into tor-design.tex,
# or delete.
# ======================================================================

The pieces.

Routers. Onion routers, as far as the 'tor' program is concerned,
are a bunch of data items that are loaded into the router_array when
the program starts. Periodically it downloads a new set of routers
from a directory server, and updates the router_array. When a new OR
connection is started (see below), the relevant information is copied
from the router struct to the connection struct.

Connections. A connection is a long-standing TCP socket between
nodes. A connection is named based on what it's connected to -- an "OR
connection" has an onion router on the other end, an "OP connection" has
an onion proxy on the other end, an "exit connection" has a website or
other server on the other end, and an "AP connection" has an application
proxy (and thus a user) on the other end.

Circuits. A circuit is a path over the onion routing
network. Applications can connect to one end of the circuit, and can
create exit connections at the other end of the circuit. AP and exit
connections have only one circuit associated with them (and thus these
connection types are closed when the circuit is closed), whereas OP and
OR connections multiplex many circuits at once, and stay standing even
when there are no circuits running over them.

Streams. Streams are specific conversations between an AP and an exit.
Streams are multiplexed over circuits.

Cells. Some connections, specifically OR and OP connections, speak
"cells". This means that data over that connection is bundled into
256-byte packets (8 bytes of header and 248 bytes of payload). Each cell
has a type, or "command", which indicates what it's for.
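
In C terms, a cell of this era can be pictured roughly like this. The
header field breakdown below is an assumption for illustration; see
tor-spec.txt for the authoritative layout:

    #include <stdint.h>

    #define CELL_PAYLOAD_SIZE 248

    /* Sketch: a 256-byte cell -- 8 header bytes plus 248 payload
     * bytes.  Field names and widths are illustrative guesses. */
    typedef struct cell_t {
      uint16_t aci;      /* anonymous circuit identifier */
      uint8_t  command;  /* what this cell is for */
      uint8_t  length;   /* how many payload bytes are meaningful */
      uint32_t seq;      /* sequence number */
      uint8_t  payload[CELL_PAYLOAD_SIZE];
    } cell_t;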

Robustness features.

[XXX no longer up to date]

Bandwidth throttling. Each cell-speaking connection has a maximum
bandwidth it can use, as specified in the routers.or file. Bandwidth
throttling can occur on both the sender side and the receiving side. If
the LinkPadding option is on, the sending side sends cells at regularly
spaced intervals (e.g., a connection with a bandwidth of 25600B/s would
queue a cell every 10ms). The receiving side protects against misbehaving
servers that send cells more frequently, by using a simple token bucket:

Each connection has a token bucket with a specified capacity. Tokens are
added to the bucket each second (when the bucket is full, new tokens
are discarded). Each token represents permission to receive one byte
from the network -- to receive a byte, the connection must remove a
token from the bucket. Thus if the bucket is empty, that connection must
wait until more tokens arrive. The number of tokens we add enforces a
long-term average rate of incoming bytes, yet we still permit short-term
bursts above the allowed bandwidth. Currently bucket sizes are set to
ten seconds' worth of traffic.

The bandwidth throttling uses TCP to push back when we stop reading.
We extend it with token buckets to allow more flexibility for traffic
bursts.
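
A minimal sketch of the receive-side token bucket (illustrative only;
the names here are made up):

    /* Sketch: token-bucket rate limiting for one connection. */
    typedef struct bucket_t {
      int tokens;    /* bytes we may still receive right now */
      int capacity;  /* burst size: ~10 seconds of traffic */
      int rate;      /* tokens (bytes) added per second */
    } bucket_t;

    /* Called once per second by the main loop. */
    void bucket_refill(bucket_t *b) {
      b->tokens += b->rate;
      if (b->tokens > b->capacity)
        b->tokens = b->capacity;    /* overflow tokens are discarded */
    }

    /* How many bytes may we read from this connection right now? */
    int bucket_read_limit(bucket_t *b) {
      return b->tokens > 0 ? b->tokens : 0;
    }

    /* After reading n bytes, spend n tokens.  Once the bucket is
     * empty we stop reading, and TCP push-back throttles the peer. */
    void bucket_spend(bucket_t *b, int n) {
      b->tokens -= n;
    }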

Data congestion control. Even with the above bandwidth throttling,
we still need to worry about congestion, either accidental or intentional.
If a lot of people make circuits into the same node, and they all come out
through the same connection, then that connection may become saturated
(be unable to send out data cells as quickly as it wants to). An adversary
can make a 'put' request through the onion routing network to a webserver
he owns, and then refuse to read any of the bytes at the webserver end
of the circuit. These bottlenecks can propagate back through the entire
network, mucking up everything.

(See the tor-spec.txt document for details of how congestion control
works.)

In practice, all the nodes in the circuit maintain a receive window
close to maximum except the exit node, which stays around 0, periodically
receiving a sendme and reading more data cells from the webserver.
In this way we can use pretty much all of the available bandwidth for
data, but gracefully back off when faced with multiple circuits (a new
sendme arrives only after some cells have traversed the entire network),
stalled network connections, or attacks.

We don't need to reimplement full TCP windows, with sequence numbers,
the ability to drop cells when we're full, etc., because the TCP streams
already guarantee in-order delivery of each cell. Rather than trying
to build some sort of TCP-on-TCP scheme, we implement this minimal data
congestion control; so far it's enough.
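
The core of the window/sendme dance can be sketched like so (made-up
names; the constants are assumptions, and the real logic is spread
across circuit.c and connection_edge.c):

    #define WINDOW_START 1000     /* assumed initial window, in cells */
    #define WINDOW_INCREMENT 100  /* assumed cells granted per sendme */

    /* Sketch: sender side -- decrement the window per data cell, and
     * stop sending when it reaches zero. */
    int circuit_can_send_data_sketch(circuit_t *circ) {
      return circ->package_window > 0;
    }
    void circuit_note_data_sent_sketch(circuit_t *circ) {
      --circ->package_window;     /* at 0, wait for a sendme */
    }

    /* Sketch: receiver side -- once enough cells have been consumed,
     * send a sendme cell back to reopen the sender's window. */
    void circuit_consider_sending_sendme_sketch(circuit_t *circ) {
      while (circ->deliver_window <= WINDOW_START - WINDOW_INCREMENT) {
        circ->deliver_window += WINDOW_INCREMENT;
        circuit_send_sendme(circ);   /* assumed helper */
      }
    }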

Router twins. In many cases when we ask for a router with a given
address and port, we really mean a router who knows a given key. Router
twins are two or more routers that share the same private key. We thus
give routers extra flexibility in choosing the next hop in the circuit:
if some of the twins are down or slow, a router can choose the more
available ones. Currently the code tries for the primary router first,
and if it's down, chooses the first available twin.