HACKING

0. Intro.
Onion Routing is still very much in development. This document aims to
get you started in the right direction if you want to understand the
code, add features, fix bugs, etc.

Read the README file first, so you can get familiar with the basics.

1. The programs.

1.1. "or". This is the main program. It functions as either a server
or a client, depending on which config file you give it.

1.2. "orkeygen". Use "orkeygen file-for-privkey file-for-pubkey" to
generate key files for an onion router.

2. The pieces.

2.1. Routers. Onion routers, as far as the 'or' program is concerned,
are a bunch of data items that are loaded into the router_array when
the program starts. Periodically the program downloads a new set of
routers from a directory server and updates the router_array. When a
new OR connection is started (see below), the relevant information is
copied from the router struct to the connection struct.
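
As a rough illustration of the copy step in 2.1, here is a minimal
sketch. The struct layouts, field names, and the function name are
hypothetical and much simpler than whatever the 'or' source actually
defines; only the idea (copy by value from the router entry into the
connection) comes from the text above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified router entry (one element of router_array). */
typedef struct {
  char address[64];    /* hostname or dotted-quad of the router */
  uint16_t port;       /* port the router accepts OR connections on */
  uint32_t bandwidth;  /* maximum bandwidth, in bytes per second */
} router_sketch_t;

/* Hypothetical, simplified connection; only the copied fields are shown. */
typedef struct {
  int fd;              /* socket descriptor, -1 while not yet connected */
  char address[64];
  uint16_t port;
  uint32_t bandwidth;
} connection_sketch_t;

/* Copy the relevant fields by value. Because nothing in the connection
 * points back into the router entry, the connection does not depend on
 * that entry staying around when router_array is updated. */
void connection_init_from_router(connection_sketch_t *conn,
                                 const router_sketch_t *router) {
  memset(conn, 0, sizeof(*conn));
  conn->fd = -1;
  memcpy(conn->address, router->address, sizeof(conn->address));
  conn->address[sizeof(conn->address) - 1] = '\0';
  conn->port = router->port;
  conn->bandwidth = router->bandwidth;
}
```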

2.2. Connections. A connection is a long-standing TCP socket between
nodes. A connection is named based on what it's connected to: an "OR
connection" has an onion router on the other end, an "OP connection" has
an onion proxy on the other end, an "exit connection" has a website or
other server on the other end, and an "AP connection" has an application
proxy (and thus a user) on the other end.

2.3. Circuits. A circuit is a path over the onion routing
network. Applications can connect to one end of the circuit, and can
create exit connections at the other end of the circuit. AP and exit
connections have only one circuit associated with them (and thus these
connection types are closed when the circuit is closed), whereas OP and
OR connections multiplex many circuits at once, and stay standing even
when there are no circuits running over them.
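
The four connection kinds from 2.2 and the circuit rules from 2.3 can
be summarized in a few lines of C. This is only a sketch: the enum and
both function names are made up for illustration and are not taken from
the code.

```c
#include <stdbool.h>

/* Connection kinds, named after what sits at the other end.
 * Illustrative names only. */
typedef enum {
  CONN_SKETCH_OR,    /* onion router */
  CONN_SKETCH_OP,    /* onion proxy */
  CONN_SKETCH_EXIT,  /* website or other server */
  CONN_SKETCH_AP     /* application proxy (and thus a user) */
} conn_kind_sketch_t;

/* OR and OP connections carry many circuits at once and stay up even
 * when no circuit is running over them. */
bool conn_multiplexes_circuits(conn_kind_sketch_t kind) {
  return kind == CONN_SKETCH_OR || kind == CONN_SKETCH_OP;
}

/* AP and exit connections belong to exactly one circuit, so they are
 * closed when that circuit is closed. */
bool conn_closes_with_circuit(conn_kind_sketch_t kind) {
  return kind == CONN_SKETCH_AP || kind == CONN_SKETCH_EXIT;
}
```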

2.4. Topics. Topics are specific conversations between an AP and an exit.
Topics are multiplexed over circuits.

2.5. Cells. Some connections, specifically OR and OP connections, speak
"cells". This means that data over that connection is bundled into
256-byte packets (8 bytes of header and 248 bytes of payload). Each cell
has a type, or "command", which indicates what it's for.
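
To make the cell sizes concrete, here is a small sketch. The header is
kept as an opaque 8-byte array because its field layout is not described
in this document. The type and function names are hypothetical, and two
things are assumptions on our part: that the whole 248-byte payload is
available for data, and that a short payload is zero-padded.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CELL_SKETCH_SIZE 256
#define CELL_SKETCH_HEADER_SIZE 8
#define CELL_SKETCH_PAYLOAD_SIZE (CELL_SKETCH_SIZE - CELL_SKETCH_HEADER_SIZE)

/* One fixed-size cell: 8 bytes of header plus 248 bytes of payload. */
typedef struct {
  uint8_t header[CELL_SKETCH_HEADER_SIZE];    /* layout left opaque here */
  uint8_t payload[CELL_SKETCH_PAYLOAD_SIZE];
} cell_sketch_t;

/* Number of cells needed to carry len bytes of data (rounds up). */
size_t cells_needed(size_t len) {
  return (len + CELL_SKETCH_PAYLOAD_SIZE - 1) / CELL_SKETCH_PAYLOAD_SIZE;
}

/* Copy up to one payload's worth of data into the cell and zero the
 * unused tail (assumption). Returns the number of bytes consumed. */
size_t cell_fill_payload(cell_sketch_t *cell, const uint8_t *data,
                         size_t len) {
  size_t n = len < CELL_SKETCH_PAYLOAD_SIZE ? len : CELL_SKETCH_PAYLOAD_SIZE;
  memset(cell->payload, 0, sizeof(cell->payload));
  memcpy(cell->payload, data, n);
  return n;
}
```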

3. Important parameters in the code.

4. Robustness features.

4.1. Bandwidth throttling. Each cell-speaking connection has a maximum
bandwidth it can use, as specified in the routers.or file. Bandwidth
throttling can occur on both the sending side and the receiving side. If
the LinkPadding option is on, the sending side sends cells at regularly
spaced intervals (e.g., a connection with a bandwidth of 25600 B/s would
queue one 256-byte cell every 10 ms, i.e. 100 cells per second). The
receiving side protects against misbehaving servers that send cells more
frequently by using a simple token bucket.

Each connection has a token bucket with a specified capacity. Tokens are
added to the bucket each second (when the bucket is full, new tokens
are discarded). Each token represents permission to receive one byte
from the network: to receive a byte, the connection must remove a
token from the bucket. Thus if the bucket is empty, that connection must
wait until more tokens arrive. The number of tokens we add enforces a
long-term average rate of incoming bytes, yet we still permit short-term
bursts above the allowed bandwidth. Currently bucket sizes are set to
ten seconds' worth of traffic.
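
The token bucket described above fits in a few lines. The sketch below
is not the code from the 'or' program; the names are hypothetical, and
starting with a full bucket is an assumption. The once-per-second
refill, the cap at capacity, the one-token-per-byte rule, and the
ten-second bucket size follow the description.

```c
#include <stddef.h>

typedef struct {
  size_t tokens;    /* bytes we may receive right now */
  size_t capacity;  /* most tokens the bucket can hold */
  size_t rate;      /* tokens added per second (long-term bytes/second) */
} bucket_sketch_t;

/* Size the bucket to ten seconds' worth of traffic. Starting full is an
 * assumption of this sketch. */
void bucket_init(bucket_sketch_t *b, size_t bytes_per_second) {
  b->rate = bytes_per_second;
  b->capacity = 10 * bytes_per_second;
  b->tokens = b->capacity;
}

/* Call once per second: add one second's worth of tokens, discarding
 * whatever does not fit. */
void bucket_refill(bucket_sketch_t *b) {
  size_t room = b->capacity - b->tokens;
  b->tokens += (b->rate < room) ? b->rate : room;
}

/* Ask to receive `want` bytes. Returns how many bytes may be read now
 * and removes that many tokens; 0 means wait for the next refill. */
size_t bucket_take(bucket_sketch_t *b, size_t want) {
  size_t n = want < b->tokens ? want : b->tokens;
  b->tokens -= n;
  return n;
}
```

A connection would call bucket_take() before each read and simply not
read from the socket when it returns 0.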

Bandwidth throttling relies on TCP to push back on the sender when we
stop reading. We extend it with token buckets to allow more flexibility
for traffic bursts.
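
Going back to the LinkPadding example in 4.1, the spacing between cells
is just the cell size divided by the bandwidth. A minimal sketch of that
arithmetic follows; the function name is hypothetical, and returning 0
for a zero bandwidth is a choice made for this sketch.

```c
#include <stdint.h>

#define CELL_BYTES_SKETCH 256u  /* size of one cell, in bytes */

/* Microseconds between consecutive cells on a connection limited to
 * `bandwidth` bytes per second. Returns 0 if bandwidth is 0, meaning
 * "no valid interval". */
uint64_t cell_interval_usec(uint32_t bandwidth) {
  if (bandwidth == 0)
    return 0;
  return (uint64_t)CELL_BYTES_SKETCH * 1000000u / bandwidth;
}
```

With a bandwidth of 25600 B/s this gives 10000 microseconds, i.e. the
10 ms from the example.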

4.2. Data congestion control. Even with the above bandwidth throttling,
we still need to worry about congestion, either accidental or intentional.
If a lot of people make circuits into the same node, and they all come out
through the same connection, then that connection may become saturated
(be unable to send out data cells as quickly as it wants to). An adversary
can also make a 'put' request through the onion routing network to a
webserver he owns, and then refuse to read any of the bytes at the
webserver end of the circuit. These bottlenecks can propagate back
through the entire network, mucking up everything.

(See the tor-spec.txt document for details of how congestion control
works.)

In practice, all the nodes in the circuit maintain a receive window
close to maximum except the exit node, which stays around 0, periodically
receiving a sendme and reading more data cells from the webserver.
In this way we can use pretty much all of the available bandwidth for
data, but gracefully back off when faced with multiple circuits (a new
sendme arrives only after some cells have traversed the entire network),
stalled network connections, or attacks.
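
The window behaviour described above can be pictured as a simple credit
counter. The sketch below is a simplification and not the scheme from
tor-spec.txt: the names are hypothetical, and the starting window and
sendme increment are placeholder values chosen for illustration.

```c
#include <stdbool.h>

/* Placeholder values for this sketch; see tor-spec.txt for real ones. */
#define WINDOW_SKETCH_START 1000
#define WINDOW_SKETCH_INCREMENT 100

typedef struct {
  int window;  /* data cells we may still send before we must wait */
} circ_window_sketch_t;

void window_init(circ_window_sketch_t *w) {
  w->window = WINDOW_SKETCH_START;
}

/* Try to send one data cell. Returns false when the window is used up;
 * the caller should then stop reading from its data source (e.g. the
 * webserver) until a sendme arrives. */
bool window_try_send_cell(circ_window_sketch_t *w) {
  if (w->window <= 0)
    return false;
  w->window--;
  return true;
}

/* A sendme from the other end grants room for another batch of cells. */
void window_on_sendme(circ_window_sketch_t *w) {
  w->window += WINDOW_SKETCH_INCREMENT;
}
```

A node that hovers around 0, like the exit node in the description,
sends a batch each time a sendme arrives and then waits again.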

We don't need to reimplement full TCP windows, with sequence numbers,
the ability to drop cells when we're full, etc., because the TCP streams
already guarantee in-order delivery of each cell. Rather than trying
to build some sort of TCP-on-TCP scheme, we implement this minimal data
congestion control; so far it's enough.

4.3. Router twins. In many cases when we ask for a router with a given
address and port, we really mean a router that knows a given key. Router
twins are two or more routers that share the same private key. We thus
give routers extra flexibility in choosing the next hop in the circuit: if
some of the twins are down or slow, they can choose the more available ones.
Currently the code tries for the primary router first, and if it's down,
chooses the first available twin.
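
The current selection rule (primary first, then the first available
twin) might look roughly like this. The struct, the fixed-size key
identifier standing in for the shared key, and the function name are all
hypothetical; the real code may identify twins differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define KEY_SKETCH_LEN 16

/* Hypothetical router entry: key_id stands in for the router's key, so
 * twins are the entries whose key_id bytes match. */
typedef struct {
  unsigned char key_id[KEY_SKETCH_LEN];
  bool is_up;
} twin_router_sketch_t;

/* Returns the index of the router to use: the primary if it is up,
 * otherwise the first available router sharing the primary's key, or
 * -1 if neither the primary nor any twin is available. */
int choose_router_or_twin(const twin_router_sketch_t *routers, size_t n,
                          size_t primary) {
  if (primary >= n)
    return -1;
  if (routers[primary].is_up)
    return (int)primary;
  for (size_t i = 0; i < n; i++) {
    if (i != primary && routers[i].is_up &&
        memcmp(routers[i].key_id, routers[primary].key_id,
               KEY_SKETCH_LEN) == 0)
      return (int)i;
  }
  return -1;
}
```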