
add log convention to hacking file

this thing needs to get revamped into a 'guide to tor' document


svn:r534
Roger Dingledine 22 years ago
parent
commit
22526c62a5
1 changed file with 109 additions and 104 deletions

doc/HACKING

@@ -6,108 +6,113 @@ the code, add features, fix bugs, etc.
 
 
 Read the README file first, so you can get familiar with the basics.
 
 
-1. The programs.
-
-1.1. "or". This is the main program here. It functions as either a server
-or a client, depending on which config file you give it.
-
-1.2. "orkeygen". Use "orkeygen file-for-privkey file-for-pubkey" to
-generate key files for an onion router.
-
-2. The pieces.
-
-2.1. Routers. Onion routers, as far as the 'or' program is concerned,
-are a bunch of data items that are loaded into the router_array when
-the program starts. Periodically it downloads a new set of routers
-from a directory server, and updates the router_array. When a new OR
-connection is started (see below), the relevant information is copied
-from the router struct to the connection struct.
-
-2.2. Connections. A connection is a long-standing tcp socket between
-nodes. A connection is named based on what it's connected to -- an "OR
-connection" has an onion router on the other end, an "OP connection" has
-an onion proxy on the other end, an "exit connection" has a website or
-other server on the other end, and an "AP connection" has an application
-proxy (and thus a user) on the other end.
-
-2.3. Circuits. A circuit is a path over the onion routing
-network. Applications can connect to one end of the circuit, and can
-create exit connections at the other end of the circuit. AP and exit
-connections have only one circuit associated with them (and thus these
-connection types are closed when the circuit is closed), whereas OP and
-OR connections multiplex many circuits at once, and stay standing even
-when there are no circuits running over them.
-
-2.4. Topics. Topics are specific conversations between an AP and an exit.
-Topics are multiplexed over circuits.
-
-2.4. Cells. Some connections, specifically OR and OP connections, speak
-"cells". This means that data over that connection is bundled into 256
-byte packets (8 bytes of header and 248 bytes of payload). Each cell has
-a type, or "command", which indicates what it's for.
-
-
-3. Important parameters in the code.
-
-
-
-4. Robustness features.
-
-4.1. Bandwidth throttling. Each cell-speaking connection has a maximum
-bandwidth it can use, as specified in the routers.or file. Bandwidth
-throttling can occur on both the sender side and the receiving side. If
-the LinkPadding option is on, the sending side sends cells at regularly
-spaced intervals (e.g., a connection with a bandwidth of 25600B/s would
-queue a cell every 10ms). The receiving side protects against misbehaving
-servers that send cells more frequently, by using a simple token bucket:
-
-Each connection has a token bucket with a specified capacity. Tokens are
-added to the bucket each second (when the bucket is full, new tokens
-are discarded.) Each token represents permission to receive one byte
-from the network --- to receive a byte, the connection must remove a
-token from the bucket. Thus if the bucket is empty, that connection must
-wait until more tokens arrive. The number of tokens we add enforces a
-longterm average rate of incoming bytes, yet we still permit short-term
-bursts above the allowed bandwidth. Currently bucket sizes are set to
-ten seconds worth of traffic.
-
-The bandwidth throttling uses TCP to push back when we stop reading.
-We extend it with token buckets to allow more flexibility for traffic
-bursts.
-
-4.2. Data congestion control. Even with the above bandwidth throttling,
-we still need to worry about congestion, either accidental or intentional.
-If a lot of people make circuits into same node, and they all come out
-through the same connection, then that connection may become saturated
-(be unable to send out data cells as quickly as it wants to). An adversary
-can make a 'put' request through the onion routing network to a webserver
-he owns, and then refuse to read any of the bytes at the webserver end
-of the circuit. These bottlenecks can propagate back through the entire
-network, mucking up everything.
-
-(See the tor-spec.txt document for details of how congestion control
-works.)
-
-In practice, all the nodes in the circuit maintain a receive window
-close to maximum except the exit node, which stays around 0, periodically
-receiving a sendme and reading more data cells from the webserver.
-In this way we can use pretty much all of the available bandwidth for
-data, but gracefully back off when faced with multiple circuits (a new
-sendme arrives only after some cells have traversed the entire network),
-stalled network connections, or attacks.
-
-We don't need to reimplement full tcp windows, with sequence numbers,
-the ability to drop cells when we're full etc, because the tcp streams
-already guarantee in-order delivery of each cell. Rather than trying
-to build some sort of tcp-on-tcp scheme, we implement this minimal data
-congestion control; so far it's enough.
-
-4.3. Router twins. In many cases when we ask for a router with a given
-address and port, we really mean a router who knows a given key. Router
-twins are two or more routers that share the same private key. We thus
-give routers extra flexibility in choosing the next hop in the circuit: if
-some of the twins are down or slow, it can choose the more available ones.
-
-Currently the code tries for the primary router first, and if it's down,
-chooses the first available twin.
+The pieces.
+
+  Routers. Onion routers, as far as the 'tor' program is concerned,
+  are a bunch of data items that are loaded into the router_array when
+  the program starts. Periodically it downloads a new set of routers
+  from a directory server, and updates the router_array. When a new OR
+  connection is started (see below), the relevant information is copied
+  from the router struct to the connection struct.
+
+  Connections. A connection is a long-standing tcp socket between
+  nodes. A connection is named based on what it's connected to -- an "OR
+  connection" has an onion router on the other end, an "OP connection" has
+  an onion proxy on the other end, an "exit connection" has a website or
+  other server on the other end, and an "AP connection" has an application
+  proxy (and thus a user) on the other end.
+
+  Circuits. A circuit is a path over the onion routing
+  network. Applications can connect to one end of the circuit, and can
+  create exit connections at the other end of the circuit. AP and exit
+  connections have only one circuit associated with them (and thus these
+  connection types are closed when the circuit is closed), whereas OP and
+  OR connections multiplex many circuits at once, and stay standing even
+  when there are no circuits running over them.
+
+  Streams. Streams are specific conversations between an AP and an exit.
+  Streams are multiplexed over circuits.
+
+  Cells. Some connections, specifically OR and OP connections, speak
+  "cells". This means that data over that connection is bundled into 256
+  byte packets (8 bytes of header and 248 bytes of payload). Each cell has
+  a type, or "command", which indicates what it's for.
+
+Robustness features.
+
+[XXX no longer up to date]
+ Bandwidth throttling. Each cell-speaking connection has a maximum
+  bandwidth it can use, as specified in the routers.or file. Bandwidth
+  throttling can occur on both the sender side and the receiving side. If
+  the LinkPadding option is on, the sending side sends cells at regularly
+  spaced intervals (e.g., a connection with a bandwidth of 25600B/s would
+  queue a cell every 10ms). The receiving side protects against misbehaving
+  servers that send cells more frequently, by using a simple token bucket:
+
+  Each connection has a token bucket with a specified capacity. Tokens are
+  added to the bucket each second (when the bucket is full, new tokens
+  are discarded.) Each token represents permission to receive one byte
+  from the network --- to receive a byte, the connection must remove a
+  token from the bucket. Thus if the bucket is empty, that connection must
+  wait until more tokens arrive. The number of tokens we add enforces a
+  longterm average rate of incoming bytes, yet we still permit short-term
+  bursts above the allowed bandwidth. Currently bucket sizes are set to
+  ten seconds worth of traffic.
+
+  The bandwidth throttling uses TCP to push back when we stop reading.
+  We extend it with token buckets to allow more flexibility for traffic
+  bursts.
+
+ Data congestion control. Even with the above bandwidth throttling,
+  we still need to worry about congestion, either accidental or intentional.
+  If a lot of people make circuits into same node, and they all come out
+  through the same connection, then that connection may become saturated
+  (be unable to send out data cells as quickly as it wants to). An adversary
+  can make a 'put' request through the onion routing network to a webserver
+  he owns, and then refuse to read any of the bytes at the webserver end
+  of the circuit. These bottlenecks can propagate back through the entire
+  network, mucking up everything.
+
+  (See the tor-spec.txt document for details of how congestion control
+  works.)
+
+  In practice, all the nodes in the circuit maintain a receive window
+  close to maximum except the exit node, which stays around 0, periodically
+  receiving a sendme and reading more data cells from the webserver.
+  In this way we can use pretty much all of the available bandwidth for
+  data, but gracefully back off when faced with multiple circuits (a new
+  sendme arrives only after some cells have traversed the entire network),
+  stalled network connections, or attacks.
+
+  We don't need to reimplement full tcp windows, with sequence numbers,
+  the ability to drop cells when we're full etc, because the tcp streams
+  already guarantee in-order delivery of each cell. Rather than trying
+  to build some sort of tcp-on-tcp scheme, we implement this minimal data
+  congestion control; so far it's enough.
+
+ Router twins. In many cases when we ask for a router with a given
+  address and port, we really mean a router who knows a given key. Router
+  twins are two or more routers that share the same private key. We thus
+  give routers extra flexibility in choosing the next hop in the circuit: if
+  some of the twins are down or slow, it can choose the more available ones.
+
+  Currently the code tries for the primary router first, and if it's down,
+  chooses the first available twin.
+
+Coding conventions:
+
+ Log convention: use only these four log severities.
+
+  ERR is if something fatal just happened.
+  WARNING is something bad happened, but we're still running. The
+    bad thing is either a bug in the code, an attack or buggy
+    protocol/implementation of the remote peer, etc. The operator should
+    examine the bad thing and try to correct it.
+  (No error or warning messages should be expected. I expect most people
+    to run on -l warning eventually. If a library function is currently
+    called such that failure always means ERR, then the library function
+    should log WARNING and let the caller log ERR.)
+  INFO means something happened (maybe bad, maybe ok), but there's nothing
+    you need to (or can) do about it.
+  DEBUG is for everything louder than INFO.
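
The "Bandwidth throttling" paragraphs in the diff above describe two mechanisms: fixed-interval sending when LinkPadding is on, and a receive-side token bucket. The sketch below illustrates both in Python. It is not taken from the tor source (the file itself marks the section "[XXX no longer up to date]"). The names cell_interval_seconds, TokenBucket, refill and take are made up for this example, and starting with a full bucket is an assumption the text does not state.

```python
CELL_SIZE = 256  # bytes per cell, from the text (8 header + 248 payload)


def cell_interval_seconds(bandwidth_bytes_per_sec):
    """Spacing between cells when sending at a fixed rate (hypothetical helper)."""
    return CELL_SIZE / bandwidth_bytes_per_sec


class TokenBucket:
    """Receive-side limiter: one token is permission to read one byte."""

    def __init__(self, rate, burst_seconds=10):
        self.rate = rate                      # tokens added per second
        self.capacity = rate * burst_seconds  # "ten seconds worth of traffic"
        self.tokens = self.capacity           # assumption: the bucket starts full

    def refill(self):
        """Call once per second; tokens beyond the capacity are discarded."""
        self.tokens = min(self.capacity, self.tokens + self.rate)

    def take(self, wanted):
        """Return how many bytes may be read now, removing that many tokens."""
        granted = min(wanted, self.tokens)
        self.tokens -= granted
        return granted


bucket = TokenBucket(rate=25600)
burst = bucket.take(300000)      # a burst is capped by the bucket capacity
stalled = bucket.take(256)       # bucket is empty, so nothing may be read yet
bucket.refill()                  # one second passes
after_refill = bucket.take(256)  # one cell's worth of bytes is allowed again
```

With the 25600B/s figure from the text, the interval works out to 256 / 25600 = 0.01 seconds, matching the "every 10ms" example. The bucket lets a short burst read up to ten seconds of traffic at once, after which reads are held to the refill rate.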