@@ -1,11 +1,418 @@
+                         Guide to Hacking Tor
 
-0. Intro.
-Onion Routing is still very much in development stages. This document
-aims to get you started in the right direction if you want to understand
-the code, add features, fix bugs, etc.
+(As of 8 October 2003, this was all accurate.  If you're reading this in
+the distant future, stuff may have changed.)
 
-Read the README file first, so you can get familiar with the basics.
+0. Intro and required reading
 
+  Onion Routing is still very much in development stages. This document
+  aims to get you started in the right direction if you want to understand
+  the code, add features, fix bugs, etc.
+
+  Read the README file first, so you can get familiar with the basics of
+  installing and running an onion router.
+
+  Then, skim some of the introductory materials in tor-spec.txt,
+  tor-design.tex, and the Tor FAQ to learn more about how the Tor protocol
+  is supposed to work.  This document will assume you know about Cells,
+  Circuits, Streams, Connections, Onion Routers, and Onion Proxies.
+
+1. Code organization
+
+1.1. The modules
+
+  The code is divided into two directories: ./src/common and ./src/or.
+  The "common" directory contains general-purpose utility functions not
+  specific to onion routing.  The "or" directory implements all
+  onion-routing and onion-proxy specific functionality.
+
+  Files in ./src/common:
+
+     aes.[ch] -- Implements the AES cipher (with 128-bit keys and blocks),
+        and a counter-mode stream cipher on top of AES.  This code is
+        taken from the main Rijndael distribution.  (We include this
+        because many people are running older versions of OpenSSL without
+        AES support.)
+
+     crypto.[ch] -- Wrapper functions to present a consistent interface to
+        public-key and symmetric cryptography operations from OpenSSL.
+
+     fakepoll.[ch] -- Used on systems that don't have a poll() system call;
+        reimplements poll() using the select() system call.
+
+     log.[ch] -- Tor's logging subsystem.
+
+     test.h -- Macros used by unit tests.
+
+     torint.h -- Provides missing [u]int*_t types for environments that
+        don't have stdint.h.
+
+     tortls.[ch] -- Wrapper functions to present a consistent interface to
+        TLS, SSL, and X.509 functions from OpenSSL.
+
+     util.[ch] -- Miscellaneous portability and convenience functions.
+
+  Files in ./src/or:
+
+   [General-purpose modules]
+
+     or.h -- Common header file: includes everything, defines everything.
+
+     buffers.c -- Implements a generic buffer interface.  Buffers are
+        fairly opaque string holders that can be filled from or flushed
+        to memory, file descriptors, or TLS connections.
+
+        Also implements parsing functions to read HTTP and SOCKS commands
+        from buffers.
+
+     tree.h -- A splay tree implementation by Niels Provos.  Used only by
+        dns.c.
+
+     config.c -- Code to parse and validate the configuration file.
+
+   [Background processing modules]
+
+     cpuworker.c -- Implements a separate 'CPU worker' process to perform
+        CPU-intensive tasks in the background, so as not to interrupt the
+        onion router.  (OR only)
+
+     dns.c -- Implements a farm of 'DNS worker' processes to perform DNS
+        lookups for onion routers and cache the results.  [This needs to
+        be done in the background because of the lack of a good,
+        ubiquitous asynchronous DNS implementation.] (OR only)
+
+   [Directory-related functionality]
+
+     directory.c -- Code to send and fetch directories and router
+        descriptors via HTTP.  Directories use dirserv.c to generate the
+        results; clients use routers.c to parse them.
+
+     dirserv.c -- Code to manage directory contents and generate
+        directories. [Directory only]
+
+     routers.c -- Code to parse directories and router descriptors, and to
+        generate a router descriptor corresponding to this OR's
+        capabilities.  Also presents some high-level interfaces for
+        managing an OR or OP's view of the directory.
+
+   [Circuit-related modules]
+
+     circuit.c -- Code to create circuits, manage circuits, and route
+        relay cells along circuits.
+
+     onion.c -- Code to generate and respond to "onion skins".
+
+   [Core protocol implementation]
+
+     connection.c -- Code used in common by all connection types.  See
+        1.2 below for more general information about connections.
+
+     connection_edge.c -- Code used only by edge connections.
+
+     command.c -- Code to handle specific cell types. [OR only]
+
+     connection_or.c -- Code to implement cell-speaking connections.
+
+   [Toplevel modules]
+
+     main.c -- Toplevel module.  Initializes keys, handles signals,
+        multiplexes between connections, implements main loop, and drives
+        scheduled events.
+
+     tor_main.c -- Stub module containing a main() function.  Allows unit
+        test binary to link against main.c.
+
+   [Unit tests]
+
+     test.c -- Contains unit tests for many pieces of the lower-level Tor
+        modules.
+
+1.2. All about connections
+
+  All sockets in Tor are handled as different types of nonblocking
+  'connections'.  (What the Tor spec calls a "Connection", the code refers
+  to as a "Cell-speaking" or "OR" connection.)
+
+  Connections are implemented by the connection_t struct, defined in or.h.
+  Not every kind of connection uses all the fields in connection_t; see
+  the comments in or.h and the assertions in assert_connection_ok() for
+  more information.
+
+  Every connection has a type and a state.  Connections never change their
+  type, but can go through many state changes in their lifetime.
+
+  The connection types break down as follows:
+
+     [Cell-speaking connections]
+       CONN_TYPE_OR -- A bidirectional TLS connection transmitting a
+          sequence of cells.  May be from an OR to an OR, or from an OP to
+          an OR.
+
+     [Edge connections]
+       CONN_TYPE_EXIT -- A TCP connection from an onion router to a
+          Stream's destination. [OR only]
+       CONN_TYPE_AP -- A SOCKS proxy connection from the end user to the
+          onion proxy.  [OP only]
+
+     [Listeners]
+       CONN_TYPE_OR_LISTENER [OR only]
+       CONN_TYPE_AP_LISTENER [OP only]
+       CONN_TYPE_DIR_LISTENER [Directory only]
+          -- Bound network sockets, waiting for incoming connections.
+
+     [Internal]
+       CONN_TYPE_DNSWORKER -- Connection from the main process to a DNS
+          worker. [OR only]
+
+       CONN_TYPE_CPUWORKER -- Connection from the main process to a CPU
+          worker. [OR only]
+
+   Connection states are documented in or.h.
+
+   Every connection has two associated buffers: one for input and one for
+   output.  Listeners don't use them.  With other connections, incoming
+   data is appended to conn->inbuf, and outgoing data is taken from the
+   front of conn->outbuf.  Connections differ primarily in the functions
+   called to fill and drain these buffers.
+
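To make the description of types, states, and buffers in section 1.2 more concrete, here is a minimal sketch in C of a connection that has a fixed type, a changing state, and an input and an output buffer. Every name in it (toy_conn_t, toy_buf_t, toy_buf_append, toy_buf_take_front, and the TOY_* constants) is hypothetical and far simpler than connection_t in or.h; it illustrates the idea only and is not Tor's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BUF_CAPACITY 256

/* Hypothetical stand-in for a Tor buffer: a fixed-size byte queue. */
typedef struct toy_buf_t {
  char data[TOY_BUF_CAPACITY];
  size_t len;                /* number of valid bytes at the front of data */
} toy_buf_t;

/* Hypothetical connection types; the real CONN_TYPE_* values are in or.h. */
typedef enum toy_conn_type_t {
  TOY_CONN_TYPE_OR,
  TOY_CONN_TYPE_EXIT,
  TOY_CONN_TYPE_AP
} toy_conn_type_t;

typedef struct toy_conn_t {
  toy_conn_type_t type;      /* never changes after creation */
  int state;                 /* changes over the connection's lifetime */
  toy_buf_t inbuf;           /* incoming data is appended here */
  toy_buf_t outbuf;          /* outgoing data is taken from the front */
} toy_conn_t;

/* Append len bytes to the end of buf.  Return 0 on success, -1 if the
 * data would not fit. */
static int toy_buf_append(toy_buf_t *buf, const char *src, size_t len)
{
  assert(buf && src);
  if (len > TOY_BUF_CAPACITY - buf->len)
    return -1;
  memcpy(buf->data + buf->len, src, len);
  buf->len += len;
  return 0;
}

/* Move up to max bytes from the front of buf into dst, shifting the rest
 * of the buffer down.  Return the number of bytes moved. */
static size_t toy_buf_take_front(toy_buf_t *buf, char *dst, size_t max)
{
  size_t n;
  assert(buf && dst);
  n = buf->len < max ? buf->len : max;
  memcpy(dst, buf->data, n);
  memmove(buf->data, buf->data + n, buf->len - n);
  buf->len -= n;
  return n;
}
```

In this sketch, a read handler would call toy_buf_append on conn->inbuf and a write handler would call toy_buf_take_front on conn->outbuf; which handlers run is what would differ between connection types.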
+1.3. All about circuits
+
+   A circuit_t structure fills two roles.  First, a circuit_t links two
+   connections together: either an edge connection and an OR connection,
+   or two OR connections.  (When joined to an OR connection, a circuit_t
+   affects only cells sent to a particular ACI on that connection.  When
+   joined to an edge connection, a circuit_t affects all data.)
+
+   Second, a circuit_t holds the cipher keys and state for sending data
+   along a given circuit.  At the OP, it has a sequence of ciphers, each
+   of which is shared with a single OR along the circuit.  Separate
+   ciphers are used for data going "forward" (away from the OP) and
+   "backward" (towards the OP).  At the OR, a circuit has only two stream
+   ciphers: one for data going forward, and one for data going backward.
+
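The difference between the OP's sequence of ciphers and the OR's single pair can be sketched as follows. This is a toy under loud assumptions: toy_cipher_t is a trivial XOR keystream and NOT real cryptography (Tor uses AES in counter mode), and all names (toy_cipher_t, toy_op_circuit_t, toy_or_circuit_t, and the toy_* functions) are hypothetical. Only the forward direction is shown; the backward direction would use the separate backward ciphers in the same way.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_HOPS 3

/* Toy keystream cipher state.  NOT real cryptography. */
typedef struct toy_cipher_t {
  unsigned char key;
  unsigned char counter;
} toy_cipher_t;

/* XOR len bytes in place with the toy keystream, advancing the state.
 * Because it is an XOR stream, the same call encrypts and decrypts. */
static void toy_cipher_crypt(toy_cipher_t *c, unsigned char *data, size_t len)
{
  size_t i;
  assert(c && data);
  for (i = 0; i < len; ++i)
    data[i] ^= (unsigned char)(c->key + c->counter++);
}

/* OP-side view: one forward cipher per OR along the circuit. */
typedef struct toy_op_circuit_t {
  int n_hops;
  toy_cipher_t forward[TOY_MAX_HOPS];
} toy_op_circuit_t;

/* OR-side view: exactly two ciphers, one per direction. */
typedef struct toy_or_circuit_t {
  toy_cipher_t forward;
  toy_cipher_t backward;
} toy_or_circuit_t;

/* At the OP: add one layer per hop, starting with the last hop so that
 * the first hop's layer ends up outermost. */
static void toy_op_encrypt_forward(toy_op_circuit_t *circ,
                                   unsigned char *data, size_t len)
{
  int i;
  assert(circ && circ->n_hops <= TOY_MAX_HOPS);
  for (i = circ->n_hops - 1; i >= 0; --i)
    toy_cipher_crypt(&circ->forward[i], data, len);
}

/* At an OR: strip this hop's single forward layer. */
static void toy_or_strip_forward_layer(toy_or_circuit_t *circ,
                                       unsigned char *data, size_t len)
{
  assert(circ);
  toy_cipher_crypt(&circ->forward, data, len);
}
```

If the OP and each OR start from matching cipher state, passing a cell through toy_op_encrypt_forward and then through toy_or_strip_forward_layer at each hop in order recovers the original bytes at the last hop.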
+1.4. Asynchronous IO and the main loop
+
+   Tor uses the poll(2) system call [or a substitute based on select(2)]
+   to handle nonblocking (asynchronous) IO.  If you're not familiar with
+   nonblocking IO, check out the links at the end of this document.
+
+   All asynchronous logic is handled in main.c.  The functions
+   'connection_add', 'connection_set_poll_socket', and 'connection_remove'
+   manage an array of connection_t*, and keep it in sync with the array of
+   struct pollfd required by poll(2).  (This array of connection_t* is
+   accessible via get_connection_array, but users should generally call
+   one of the 'connection_get_by_*' functions in connection.c to look up
+   individual connections.)
+
+   To trap read and write events, connections call the functions
+   'connection_{is|stop|start}_{reading|writing}'.
+
+   When connections get events, main.c calls conn_read and conn_write.
+   These functions dispatch events to connection_handle_read and
+   connection_handle_write as appropriate.
+
+   When connections need to be closed, they can do so in two ways.  Most
+   simply, they can make connection_handle_* return an error (-1),
+   which will make conn_{read|write} close them.  But if the connection
+   needs to stay around [XXXX explain why] until the end of the current
+   iteration of the main loop, it marks itself for closing by setting
+   conn->connection_marked_for_close.
+
+   The main loop handles several other operations: First, it checks
+   whether any signals have been received that require a response (HUP,
+   KILL, USR1, CHLD).  Second, it calls prepare_for_poll to handle recurring
+   tasks and compute the necessary poll timeout.  These recurring tasks
+   include periodically fetching the directory, timing out unused
+   circuits, incrementing flow control windows and re-enabling connections
+   that were blocking for more bandwidth, and maintaining statistics.
+
+   A word about TLS: Using TLS on OR connections complicates matters in
+   two ways.  First, a TLS stream has its own read buffer independent of
+   the connection's read buffer.  (TLS needs to read an entire frame from
+   the network before it can decrypt any data.  Thus, trying to read 1
+   byte from TLS can require that several KB be read from the network and
+   decrypted.  The extra data is stored in TLS's decrypt buffer.)  Second,
+   the TLS stream's events do not correspond directly to network events:
+   sometimes, before a TLS stream can read, the network must be ready to
+   write -- or vice versa.
+
+   [XXXX describe the consequences of this for OR connections.]
+
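The bookkeeping described in section 1.4 (an array of connections kept in step with the array of struct pollfd, plus start/stop/is-reading helpers) can be sketched roughly as below. This is a simplified illustration, not the code in main.c: the sketch_* names and the sketch_conn_t type are hypothetical, only read events are handled, and nothing is ever removed or closed.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define SKETCH_MAX_CONNS 16

/* Hypothetical, heavily simplified stand-in for connection_t. */
typedef struct sketch_conn_t {
  int fd;               /* the descriptor being polled */
  int poll_index;       /* this connection's slot in both arrays below */
  size_t n_bytes_read;  /* total bytes consumed by sketch_conn_read() */
} sketch_conn_t;

/* Two parallel arrays: slot i of one always describes slot i of the other. */
static sketch_conn_t *sketch_conn_array[SKETCH_MAX_CONNS];
static struct pollfd sketch_poll_array[SKETCH_MAX_CONNS];
static int sketch_n_conns = 0;

/* Register conn in both arrays.  Return 0 on success, -1 if full. */
static int sketch_connection_add(sketch_conn_t *conn)
{
  assert(conn);
  if (sketch_n_conns >= SKETCH_MAX_CONNS)
    return -1;
  conn->poll_index = sketch_n_conns;
  sketch_conn_array[sketch_n_conns] = conn;
  sketch_poll_array[sketch_n_conns].fd = conn->fd;
  sketch_poll_array[sketch_n_conns].events = 0;
  sketch_poll_array[sketch_n_conns].revents = 0;
  ++sketch_n_conns;
  return 0;
}

/* Start or stop asking poll() for read events on conn. */
static void sketch_connection_start_reading(sketch_conn_t *conn)
{
  sketch_poll_array[conn->poll_index].events |= POLLIN;
}

static void sketch_connection_stop_reading(sketch_conn_t *conn)
{
  sketch_poll_array[conn->poll_index].events &= ~POLLIN;
}

static int sketch_connection_is_reading(const sketch_conn_t *conn)
{
  return (sketch_poll_array[conn->poll_index].events & POLLIN) != 0;
}

/* Handle one read event.  Return 0 on success, -1 on error or EOF. */
static int sketch_conn_read(sketch_conn_t *conn)
{
  char tmp[128];
  ssize_t n = read(conn->fd, tmp, sizeof(tmp));
  if (n <= 0)
    return -1;
  conn->n_bytes_read += (size_t)n;
  return 0;
}

/* One loop iteration: wait up to timeout_ms, then dispatch read events.
 * Return the number of connections that had a read event, or -1. */
static int sketch_poll_once(int timeout_ms)
{
  int i, n_dispatched = 0;
  if (poll(sketch_poll_array, (nfds_t)sketch_n_conns, timeout_ms) < 0)
    return -1;
  for (i = 0; i < sketch_n_conns; ++i) {
    if (sketch_poll_array[i].revents & POLLIN) {
      /* A fuller loop would close the connection if this returned -1. */
      (void)sketch_conn_read(sketch_conn_array[i]);
      ++n_dispatched;
    }
  }
  return n_dispatched;
}
```

Keeping the index in the connection itself is one simple way to make the start/stop helpers constant-time; a removal function would have to fix up that index whenever it compacts the arrays.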
+1.5. How data flows (An illustration.)
+
+   Suppose an OR receives 50 bytes along an OR connection.  These 50 bytes
+   complete a data relay cell, which gets decrypted and delivered to an
+   edge connection.  Here we give a possible call sequence for the
+   delivery of this data.
+
+   (This may become outdated quickly.)
+
+   do_main_loop -- Calls poll(2), receives a POLLIN event on a struct
+                 pollfd, then calls:
+    conn_read -- Looks up the corresponding connection_t, and calls:
+     connection_handle_read -- Calls:
+      connection_read_to_buf -- Notices that it has an OR connection so:
+       read_to_buf_tls -- Pulls data from the TLS stream onto conn->inbuf.
+      connection_process_inbuf -- Notices that it has an OR connection so:
+       connection_or_process_inbuf -- Checks whether conn is open, and calls:
+        connection_process_cell_from_inbuf -- Notices it has enough data for
+                 a cell, then calls:
+         connection_fetch_from_buf -- Pulls the cell from the buffer.
+         cell_unpack -- Decodes the raw cell into a cell_t.
+         command_process_cell -- Notices it is a relay cell, so calls:
+          command_process_relay_cell -- Looks up the circuit for the cell,
+                 makes sure the circuit is live, then passes the cell to:
+           circuit_deliver_relay_cell -- Passes the cell to each of:
+            relay_crypt -- Strips a layer of encryption from the cell and
+                 notices that the cell is for local delivery.
+            connection_edge_process_relay_cell -- Extracts the cell's
+                 relay command, and makes sure the edge connection is
+                 open.  Since it has a DATA cell and an open connection,
+                 calls:
+             circuit_consider_sending_sendme -- [XXX]
+             connection_write_to_buf -- To place the data on the outgoing
+                 buffer of the correct edge connection, by calling:
+              connection_start_writing -- To tell the main poll loop about
+                 the pending data.
+              write_to_buf -- To actually place the outgoing data on the
+                 edge connection's outbuf.
+             connection_consider_sending_sendme -- [XXX]
+
+   [In a subsequent iteration, main notices that the edge connection is
+    ready for writing.]
+
+   do_main_loop -- Calls poll(2), receives a POLLOUT event on a struct
+                 pollfd, then calls:
+    conn_write -- Looks up the corresponding connection_t, and calls:
+     connection_handle_write -- This isn't a TLS connection, so calls:
+      flush_buf -- Delivers data from the edge connection's outbuf to the
+                 network.
+      connection_wants_to_flush -- Reports that all data has been flushed.
+      connection_finished_flushing -- Notices the connection is an exit,
+                 and calls:
+       connection_edge_finished_flushing -- The connection is open, so it
+                 calls:
+        connection_stop_writing -- Tells the main poll loop that this
+                 connection has no more data to write.
+        connection_consider_sending_sendme -- [XXX]
+
+1.6. Routers, descriptors, and directories
+
+   All Tor processes need to keep track of a list of onion routers, for
+   several reasons:
+       - OPs need to establish connections and circuits to ORs.
+       - ORs need to establish connections to other ORs.
+       - OPs and ORs need to fetch directories from directory servers.
+       - ORs need to upload their descriptors to directory servers.
+       - Directory servers need to know which ORs are allowed onto the
+         network, what the descriptors are for those ORs, and which of
+         those ORs are currently live.
+
+   Thus, every Tor process keeps track of a list of all the ORs it knows
+   in a static variable 'directory' in the routers.c module.  This
+   variable contains a routerinfo_t object for each known OR. On startup,
+   the directory is initialized to a list of known directory servers (via
+   router_get_list_from_file()).  Later, the directory is updated via
+   router_get_dir_from_string().  (OPs and ORs retrieve fresh directories
+   from directory servers; directory servers generate their own.)
+
+   Every OR must periodically regenerate a router descriptor for itself.
+   The descriptor and the corresponding routerinfo_t are stored in the
+   'descriptor' and 'desc_routerinfo' static variables in routers.c.
+
+   Additionally, a directory server keeps track of a list of the
+   router descriptors it knows in a separate list in dirserv.c.  It
+   uses this list, plus the open connections in main.c, to build
+   directories.
+
+1.7. Data model
+
+  [XXX]
+
+1.8. Flow control
+
+  [XXX]
+
+2. Coding conventions
+
+2.1. Details
+
+  Use tor_malloc, tor_strdup, and tor_gettimeofday instead of their
+  generic equivalents.  (They always succeed or exit.)
+
+  Use INLINE instead of 'inline', so that we work properly on Windows.
+
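The "always succeed or exit" contract mentioned in section 2.1 can be illustrated with a small sketch. The functions below are hypothetical look-alikes (toy_malloc, toy_strdup), written only to show the pattern; they are not the definitions of tor_malloc or tor_strdup in util.c, and the error message is made up.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Allocate size bytes or terminate the process.  Callers never need to
 * check the result for NULL. */
static void *toy_malloc(size_t size)
{
  void *result;
  if (size == 0)
    size = 1;  /* malloc(0) may legally return NULL; avoid that case */
  result = malloc(size);
  if (!result) {
    fprintf(stderr, "Out of memory; exiting.\n");
    exit(1);
  }
  return result;
}

/* Duplicate the NUL-terminated string s or terminate the process. */
static char *toy_strdup(const char *s)
{
  size_t len;
  char *dup;
  assert(s);
  len = strlen(s) + 1;  /* include the terminating NUL */
  dup = toy_malloc(len);
  memcpy(dup, s, len);
  return dup;
}
```

The benefit of this design is that call sites stay free of allocation-failure branches; the cost is that the process cannot recover from running out of memory.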
+2.2. Calling and naming conventions
+
+  Whenever possible, functions should return -1 on error and 0 on
+  success.
+
+  For multi-word identifiers, use lowercase words combined with
+  underscores (e.g., "multi_word_identifier").  Use ALL_CAPS for macros and
+  constants.
+
+  Typenames should end with "_t".
+
+  Function names should be prefixed with a module name or object name.  (In
+  general, code to manipulate an object should be in a module with the same
+  name as the object, so it's hard to tell which convention is used.)
+
+  Functions that do things should have imperative-verb names
+  (e.g. buffer_clear, buffer_resize); functions that return booleans should
+  have predicate names (e.g. buffer_is_empty, buffer_needs_resizing).
+
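As a quick illustration of the conventions in section 2.2 applied together, here is a small made-up module. The example_buffer_t type and its functions are hypothetical and exist only to show the naming and return-value rules; they are unrelated to the real buffers.c.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_BUFFER_CAPACITY 64      /* ALL_CAPS for constants */

typedef struct example_buffer_t {       /* type names end with "_t" */
  char data[EXAMPLE_BUFFER_CAPACITY];
  size_t len;
} example_buffer_t;

/* Imperative-verb name: this function does something. */
static void example_buffer_clear(example_buffer_t *buf)
{
  assert(buf);
  buf->len = 0;
}

/* Predicate name: this function returns a boolean. */
static int example_buffer_is_empty(const example_buffer_t *buf)
{
  assert(buf);
  return buf->len == 0;
}

/* Return 0 on success and -1 on error (here: the buffer is full). */
static int example_buffer_add_byte(example_buffer_t *buf, char c)
{
  assert(buf);
  if (buf->len >= EXAMPLE_BUFFER_CAPACITY)
    return -1;
  buf->data[buf->len++] = c;
  return 0;
}
```

Every function name starts with the object name (example_buffer_), so a reader can tell from a call site which module to look in.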
+2.3. What To Optimize
+
+  Don't optimize anything if it's not in the critical path.  Right now,
+  the critical path seems to be AES, logging, and the network itself.
+  Feel free to do your own profiling to determine otherwise.
+
+2.4. Log conventions
+
+  Log convention: use only these four log severities.
+
+    ERR means something fatal just happened.
+    WARNING means something bad happened, but we're still running. The
+      bad thing is either a bug in the code, an attack or buggy
+      protocol/implementation of the remote peer, etc. The operator should
+      examine the bad thing and try to correct it.
+    (No error or warning messages should be expected during normal OR or OP
+      operation. I expect most people to run on -l warning eventually. If a
+      library function is currently called such that failure always means
+      ERR, then the library function should log WARNING and let the caller
+      log ERR.)
+    INFO means something happened (maybe bad, maybe ok), but there's nothing
+      you need to (or can) do about it.
+    DEBUG is for everything louder than INFO.
+
+  [XXX Proposed convention: every message of severity INFO or higher should
+  either (A) be intelligible to end-users who don't know the Tor source; or
+  (B) somehow inform the end-users that they aren't expected to understand
+  the message (perhaps with a string like "internal error").  Option (A) is
+  to be preferred to option (B). -NM]
+
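The four severities and the idea of running at a threshold such as "warning" can be sketched as follows. This is an assumption-laden toy: the toy_severity_t values, toy_log_threshold, toy_log_is_enabled, and toy_log are hypothetical and are not the interface of log.[ch]; the output format is invented.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical severities, ordered from most verbose to most severe. */
typedef enum toy_severity_t {
  TOY_LOG_DEBUG = 0,
  TOY_LOG_INFO,
  TOY_LOG_WARNING,
  TOY_LOG_ERR
} toy_severity_t;

/* Messages below this severity are dropped. */
static toy_severity_t toy_log_threshold = TOY_LOG_WARNING;

/* Return true if a message of this severity would be written. */
static int toy_log_is_enabled(toy_severity_t severity)
{
  return severity >= toy_log_threshold;
}

/* Write a printf-style message to stderr if severity passes the
 * threshold.  Return 1 if the message was written, 0 if it was dropped. */
static int toy_log(toy_severity_t severity, const char *format, ...)
{
  static const char *names[] = { "debug", "info", "warning", "err" };
  va_list ap;
  assert(format);
  if (!toy_log_is_enabled(severity))
    return 0;
  fprintf(stderr, "[%s] ", names[severity]);
  va_start(ap, format);
  vfprintf(stderr, format, ap);
  va_end(ap);
  fputc('\n', stderr);
  return 1;
}
```

Under the convention above, a library helper in this sketch would call toy_log with TOY_LOG_WARNING and return -1, leaving it to the caller to decide whether the failure is fatal and deserves TOY_LOG_ERR.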
+3. References
+
+  About Tor
+
+     See http://freehaven.net/tor/
+         http://freehaven.net/tor/cvs/doc/tor-spec.txt
+         http://freehaven.net/tor/cvs/doc/tor-design.tex
+         http://freehaven.net/tor/cvs/doc/FAQ
+
+  About anonymity
+
+     See http://freehaven.net/anonbib/
+
+  About nonblocking IO
+
+     [XXX insert references]
+
+
+# ======================================================================
+# Old HACKING document; merge into the above, move into tor-design.tex,
+# or delete.
+# ======================================================================
 The pieces.
 
   Routers. Onion routers, as far as the 'tor' program is concerned,
@@ -99,20 +506,6 @@ Robustness features.
   Currently the code tries for the primary router first, and if it's down,
   chooses the first available twin.
 
-Coding conventions:
-
- Log convention: use only these four log severities.
-
-  ERR is if something fatal just happened.
-  WARNING is something bad happened, but we're still running. The
-    bad thing is either a bug in the code, an attack or buggy
-    protocol/implementation of the remote peer, etc. The operator should
-    examine the bad thing and try to correct it.
-  (No error or warning messages should be expected. I expect most people
-    to run on -l warning eventually. If a library function is currently
-    called such that failure always means ERR, then the library function
-    should log WARNING and let the caller log ERR.)
-  INFO means something happened (maybe bad, maybe ok), but there's nothing
-    you need to (or can) do about it.
-  DEBUG is for everything louder than INFO.
+
+
 