| 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117 | 
0. Intro.Onion Routing is still very much in development stages. This documentaims to get you started in the right direction if you want to understandthe code, add features, fix bugs, etc.Read the README file first, so you can get familiar with the basics.1. The programs.1.1. "or". This is the main program here. It functions as both a serverand a client, depending on which config file you give it. ...2. The pieces.2.1. Routers. Onion routers, as far as the 'or' program is concerned,are a bunch of data items that are loaded into the router_array whenthe program starts. After it's loaded, the router information is neverchanged. When a new OR connection is started (see below), the relevantinformation is copied from the router struct to the connection struct.2.2. Connections. A connection is a long-standing tcp socket betweennodes. A connection is named based on what it's connected to -- an "ORconnection" has an onion router on the other end, an "OP connection" hasan onion proxy on the other end, an "exit connection" has a website orother server on the other end, and an "AP connection" has an applicationproxy (and thus a user) on the other end.2.3. Circuits. A circuit is a single conversation between twoparticipants over the onion routing network. One end of the circuit hasan AP connection, and the other end has an exit connection. AP and exitconnections have only one circuit associated with them (and thus theseconnection types are closed when the circuit is closed), whereas OP andOR connections multiplex many circuits at once, and stay standing evenwhen there are no circuits running over them.2.4. Cells. Some connections, specifically OR and OP connections, speak"cells". This means that data over that connection is bundled into 128byte packets (8 bytes of header and 120 bytes of payload). Each cell hasa type, or "command", which indicates what it's for.3. Important parameters in the code.3.1. Role.4. Robustness features.4.1. Bandwidth throttling. Each cell-speaking connection has a maximumbandwidth it can use, as specified in the routers.or file. Bandwidththrottling occurs on both the sender side and the receiving side. Thesending side sends cells at regularly spaced intervals (e.g., a connectionwith a bandwidth of 12800B/s would queue a cell every 10ms). The receivingside protects against misbehaving servers that send cells more frequently,by using a simple token bucket:Each connection has a token bucket with a specified capacity. Tokens areadded to the bucket each second (when the bucket is full, new tokensare discarded.) Each token represents permission to receive one bytefrom the network --- to receive a byte, the connection must remove atoken from the bucket. Thus if the bucket is empty, that connection mustwait until more tokens arrive. The number of tokens we add enforces alongterm average rate of incoming bytes, yet we still permit short-termbursts above the allowed bandwidth. Currently bucket sizes are set toten seconds worth of traffic.The bandwidth throttling uses TCP to push back when we stop reading.We extend it with token buckets to allow more flexibility for trafficbursts.4.2. Data congestion control. Even with the above bandwidth throttling,we still need to worry about congestion, either accidental or intentional.If a lot of people make circuits into same node, and they all come outthrough the same connection, then that connection may become saturated(be unable to send out data cells as quickly as it wants to). An adversarycan make a 'put' request through the onion routing network to a webserverhe owns, and then refuse to read any of the bytes at the webserver endof the circuit. These bottlenecks can propagate back through the entirenetwork, mucking up everything.To handle this congestion, each circuit starts out with a receivewindow at each node of 100 cells -- it is willing to receive at most 100cells on that circuit. (It handles each direction separately; so that'sreally 100 cells forward and 100 cells back.) The edge of the circuitis willing to create at most 100 cells from data coming from outside theonion routing network. Nodes in the middle of the circuit will tear downthe circuit if a data cell arrives when the receive window is 0. Whendata has traversed the network, the edge node buffers it on its outbuf,and evaluates whether to respond with a 'sendme' acknowledgement: if itsoutbuf is not too full, and its receive window is less than 90, then itqueues a 'sendme' cell backwards in the circuit. Each node that receivesthe sendme increments its window by 10 and passes the cell onward.In practice, all the nodes in the circuit maintain a receive windowclose to 100 except the exit node, which stays around 0, periodicallyreceiving a sendme and reading 10 more data cells from the webserver.In this way we can use pretty much all of the available bandwidth fordata, but gracefully back off when faced with multiple circuits (a newsendme arrives only after some cells have traversed the entire network),stalled network connections, or attacks.We don't need to reimplement full tcp windows, with sequence numbers,the ability to drop cells when we're full etc, because the tcp streamsalready guarantee in-order delivery of each cell. Rather than tryingto build some sort of tcp-on-tcp scheme, we implement this minimal datacongestion control; so far it's enough.4.3. Router twins. In many cases when we ask for a router with a givenaddress and port, we really mean a router who knows a given key. Routertwins are two or more routers that all share the same private key. We thusgive routers extra flexibility in choosing the next hop in the circuit: ifsome of the twins are down or slow, it can choose the more available ones.Currently the code tries for the primary router first, and if it's down,chooses the first available twin.
 |