							
- 0. Intro.
 
- Onion Routing is still very much in the development stage. This document
 
- aims to get you started in the right direction if you want to understand
 
- the code, add features, fix bugs, etc.
 
- Read the README file first, so you can get familiar with the basics.
 
- 1. The programs.
 
- 1.1. "or". This is the main program here. It functions as either a server
 
- or a client, depending on which config file you give it.
 
- 1.2. "orkeygen". Use "orkeygen file-for-privkey file-for-pubkey" to
 
- generate key files for an onion router.
 
- 2. The pieces.
 
- 2.1. Routers. Onion routers, as far as the 'or' program is concerned,
 
- are a bunch of data items that are loaded into the router_array when
 
- the program starts. Periodically it downloads a new set of routers
 
- from a directory server, and updates the router_array. When a new OR
 
- connection is started (see below), the relevant information is copied
 
- from the router struct to the connection struct.
 
- 2.2. Connections. A connection is a long-standing tcp socket between
 
- nodes. A connection is named based on what it's connected to -- an "OR
 
- connection" has an onion router on the other end, an "OP connection" has
 
- an onion proxy on the other end, an "exit connection" has a website or
 
- other server on the other end, and an "AP connection" has an application
 
- proxy (and thus a user) on the other end.
 
- 2.3. Circuits. A circuit is a path over the onion routing
 
- network. Applications can connect to one end of the circuit, and can
 
- create exit connections at the other end of the circuit. AP and exit
 
- connections have only one circuit associated with them (and thus these
 
- connection types are closed when the circuit is closed), whereas OP and
 
- OR connections multiplex many circuits at once, and stay standing even
 
- when there are no circuits running over them.
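
To make the relationship between connection types and circuits concrete, here is a minimal sketch. It is not the actual 'or' code: the names ConnType, Connection, attach and close_circuit are made up for illustration, and only the four connection types and the one-circuit versus many-circuits rule come from the text above.

```python
from dataclasses import dataclass, field
from enum import Enum


class ConnType(Enum):
    OR = "or"      # onion router on the other end
    OP = "op"      # onion proxy on the other end
    EXIT = "exit"  # website or other server on the other end
    AP = "ap"      # application proxy (and thus a user) on the other end


# Connection types that multiplex many circuits and outlive them.
MULTIPLEXING = {ConnType.OR, ConnType.OP}


@dataclass
class Connection:
    kind: ConnType
    circuits: set = field(default_factory=set)  # hypothetical circuit ids
    open: bool = True


def attach(conn, circ_id):
    """Associate a circuit with a connection, enforcing the one-circuit rule."""
    if conn.kind not in MULTIPLEXING and conn.circuits:
        raise ValueError("AP and exit connections carry only one circuit")
    conn.circuits.add(circ_id)


def close_circuit(circ_id, conns):
    """Detach a circuit; AP and exit connections close along with it."""
    for conn in conns:
        if circ_id in conn.circuits:
            conn.circuits.discard(circ_id)
            if conn.kind not in MULTIPLEXING:
                conn.open = False
```

Closing a circuit this way closes its AP or exit connection, while an OR or OP connection stays open even with no circuits left on it.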
 
- 2.4. Topics. Topics are specific conversations between an AP and an exit.
 
- Topics are multiplexed over circuits.
 
- 2.5. Cells. Some connections, specifically OR and OP connections, speak
 
- "cells". This means that data over that connection is bundled into 256-byte
 
- packets (8 bytes of header and 248 bytes of payload). Each cell has
 
- a type, or "command", which indicates what it's for.
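
As a rough illustration of the fixed cell size, the sketch below packs and unpacks a 256-byte cell. The 8/248 split is from the text above, but the header field layout (2-byte circuit id, 1-byte command, 1-byte length, 4 reserved bytes) is an assumption made for this example; check tor-spec.txt for the real layout. The names pack_cell and unpack_cell are hypothetical.

```python
import struct

CELL_SIZE = 256
HEADER_SIZE = 8
PAYLOAD_SIZE = CELL_SIZE - HEADER_SIZE  # 248 bytes

# ASSUMED header layout, network byte order:
# circuit id (2 bytes), command (1), payload length (1), reserved (4).
HEADER = struct.Struct("!HBBI")


def pack_cell(circ_id, command, data):
    """Build one fixed-size cell, zero-padding the payload."""
    if len(data) > PAYLOAD_SIZE:
        raise ValueError("payload too long for one cell")
    header = HEADER.pack(circ_id, command, len(data), 0)
    return header + data.ljust(PAYLOAD_SIZE, b"\x00")


def unpack_cell(cell):
    """Split a cell into (circ_id, command, payload without padding)."""
    if len(cell) != CELL_SIZE:
        raise ValueError("cells are exactly 256 bytes")
    circ_id, command, length, _reserved = HEADER.unpack_from(cell)
    if length > PAYLOAD_SIZE:
        raise ValueError("length field exceeds payload size")
    return circ_id, command, cell[HEADER_SIZE:HEADER_SIZE + length]
```

Because every cell has the same size, a reader can always pull exactly 256 bytes off the socket before parsing, whatever the command is.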
 
- 3. Important parameters in the code.
 
- 4. Robustness features.
 
- 4.1. Bandwidth throttling. Each cell-speaking connection has a maximum
 
- bandwidth it can use, as specified in the routers.or file. Bandwidth
 
- throttling can occur on both the sender side and the receiving side. If
 
- the LinkPadding option is on, the sending side sends cells at regularly
 
- spaced intervals (e.g., a connection with a bandwidth of 25600B/s would
 
- queue a cell every 10ms). The receiving side protects against misbehaving
 
- servers that send cells more frequently, by using a simple token bucket:
 
- Each connection has a token bucket with a specified capacity. Tokens are
 
- added to the bucket each second (when the bucket is full, new tokens
 
- are discarded). Each token represents permission to receive one byte
 
- from the network --- to receive a byte, the connection must remove a
 
- token from the bucket. Thus if the bucket is empty, that connection must
 
- wait until more tokens arrive. The number of tokens we add enforces a
 
- long-term average rate of incoming bytes, yet we still permit short-term
 
- bursts above the allowed bandwidth. Currently bucket sizes are set to
 
- ten seconds' worth of traffic.
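
The arithmetic and the token bucket described above can be sketched as follows. This is a simplified model, not the real implementation: TokenBucket, refill, allowance and cell_interval_ms are hypothetical names, and starting with a full bucket is an assumption. The 256-byte cell size, the once-per-second refill, and the ten-second bucket size come from this document.

```python
CELL_SIZE = 256  # bytes per cell


def cell_interval_ms(bandwidth_bytes_per_sec):
    """Spacing between cells when link padding sends at a steady rate."""
    return 1000.0 * CELL_SIZE / bandwidth_bytes_per_sec


class TokenBucket:
    def __init__(self, rate, burst_seconds=10):
        self.rate = rate                      # tokens (bytes) added per second
        self.capacity = rate * burst_seconds  # ten seconds' worth of traffic
        self.tokens = self.capacity           # ASSUMPTION: bucket starts full

    def refill(self):
        """Call once per second; tokens beyond capacity are discarded."""
        self.tokens = min(self.capacity, self.tokens + self.rate)

    def allowance(self, wanted):
        """Return how many bytes may be read now, spending one token each."""
        granted = min(wanted, self.tokens)
        self.tokens -= granted
        return granted
```

For a 25600B/s connection, cell_interval_ms(25600) gives 10.0, matching the 10ms figure above. When allowance() returns 0, the connection simply stops reading until the next refill.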
 
- The bandwidth throttling uses TCP to push back when we stop reading.
 
- We extend it with token buckets to allow more flexibility for traffic
 
- bursts.
 
- 4.2. Data congestion control. Even with the above bandwidth throttling,
 
- we still need to worry about congestion, either accidental or intentional.
 
- If a lot of people make circuits into the same node, and they all come out
 
- through the same connection, then that connection may become saturated
 
- (be unable to send out data cells as quickly as it wants to). An adversary
 
- can make a 'put' request through the onion routing network to a webserver
 
- he owns, and then refuse to read any of the bytes at the webserver end
 
- of the circuit. These bottlenecks can propagate back through the entire
 
- network, mucking up everything.
 
- (See the tor-spec.txt document for details of how congestion control
 
- works.)
 
- In practice, all the nodes in the circuit maintain a receive window
 
- close to maximum except the exit node, which stays around 0, periodically
 
- receiving a sendme and reading more data cells from the webserver.
 
- In this way we can use pretty much all of the available bandwidth for
 
- data, but gracefully back off when faced with multiple circuits (a new
 
- sendme arrives only after some cells have traversed the entire network),
 
- stalled network connections, or attacks.
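
A very rough model of the window idea, for orientation only. The real rules and numbers live in tor-spec.txt; the class names, method names, and the start and increment values here are assumptions chosen for illustration.

```python
class SendWindow:
    """Tracks how many more data cells one end may send on a circuit."""

    def __init__(self, start=1000, increment=100):  # ASSUMED values
        self.window = start
        self.increment = increment

    def can_send(self):
        return self.window > 0

    def note_cell_sent(self):
        if self.window <= 0:
            raise RuntimeError("window is empty; wait for a sendme")
        self.window -= 1

    def note_sendme(self):
        # A sendme from the far end opens the window again.
        self.window += self.increment


class DeliveryCounter:
    """Far-end bookkeeping: decide when to send a sendme back."""

    def __init__(self, increment=100):  # ASSUMED value
        self.increment = increment
        self.delivered = 0

    def note_cell_delivered(self):
        """Return True when enough cells have arrived to warrant a sendme."""
        self.delivered += 1
        if self.delivered >= self.increment:
            self.delivered = 0
            return True
        return False
```

In this model a sender whose window has dropped to 0 stops pulling data from its source (for an exit node, the webserver) until the next sendme arrives, which is the graceful back-off described above.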
 
- We don't need to reimplement full tcp windows, with sequence numbers,
 
- the ability to drop cells when we're full, etc., because the tcp streams
 
- already guarantee in-order delivery of each cell. Rather than trying
 
- to build some sort of tcp-on-tcp scheme, we implement this minimal data
 
- congestion control; so far it's enough.
 
- 4.3. Router twins. In many cases when we ask for a router with a given
 
- address and port, we really mean a router who knows a given key. Router
 
- twins are two or more routers that share the same private key. We thus
 
- give routers extra flexibility in choosing the next hop in the circuit: if
 
- some of the twins are down or slow, they can choose the more available ones.
 
- Currently the code tries for the primary router first, and if it's down,
 
- chooses the first available twin.
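
The current selection rule (primary first, then the first available twin) might look roughly like this. The Router fields, the key_id stand-in for the shared key, and the function name choose_next_hop are all hypothetical; only the selection order is taken from the text above.

```python
from dataclasses import dataclass


@dataclass
class Router:
    address: str
    port: int
    key_id: str        # hypothetical stand-in for the router's key
    is_up: bool = True


def choose_next_hop(primary, router_array):
    """Return the primary if it is up, else the first available twin."""
    if primary.is_up:
        return primary
    for router in router_array:
        is_twin = router is not primary and router.key_id == primary.key_id
        if is_twin and router.is_up:
            return router
    return None  # neither the primary nor any twin is reachable
```

A smarter policy could weigh load or latency among the twins; this sketch only mirrors the simple first-available behavior.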
 
 