## Data flow in the Tor process ##

We read bytes from the network; we write bytes to the network. For
the most part, the bytes we write correspond roughly to bytes we have
read, with bits of cryptography added in.

The rest is a matter of details.

### Connections and buffers: reading, writing, and interpreting. ###

At a low level, Tor's networking code is based on "connections". Each
connection represents an object that can send or receive network-like
events. For the most part, each connection has a single underlying
TCP stream (I'll discuss counterexamples below).

A connection that behaves like a TCP stream has an input buffer and an
output buffer. Incoming data is written into the input buffer
("inbuf"); data to be written to the network is queued on an output
buffer ("outbuf").

Buffers are implemented in buffers.c. Each of these buffers is
implemented as a linked queue of memory extents, in the style of
classic BSD mbufs or Linux skbufs.
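
To make this concrete, a chunked buffer of this kind might be declared
roughly as follows. This is a simplified sketch, not Tor's actual
declarations; see `buffers.c` for the real thing.

```c
/* A minimal sketch of a chunk-based buffer, in the spirit of
 * buffers.c.  Type and field names are illustrative only. */

#include <stddef.h>

typedef struct chunk_t {
  struct chunk_t *next;  /* Next chunk in the queue. */
  size_t datalen;        /* Number of bytes actually used in mem[]. */
  size_t memlen;         /* Total capacity of mem[]. */
  char *data;            /* Points into mem[] at the first used byte. */
  char mem[1];           /* Allocated with extra space past the struct. */
} chunk_t;

typedef struct buf_t {
  size_t datalen;  /* Total bytes queued across all chunks. */
  chunk_t *head;   /* Oldest chunk: reads consume from here. */
  chunk_t *tail;   /* Newest chunk: writes append here. */
} buf_t;
```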

A connection's reading and writing can be enabled or disabled. Under
the hood, this functionality is implemented using libevent events: one
for reading, one for writing. These events are turned on/off in
main.c, in the functions connection_{start,stop}_{reading,writing}.
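
As a rough sketch of the mechanism -- the real functions in main.c
also handle linked connections and several special cases -- turning
reading on or off amounts to adding or removing the libevent read
event:

```c
/* A simplified sketch of connection_{start,stop}_reading; the
 * connection type here is cut down to the fields we need. */

#include <event2/event.h>

typedef struct connection_t {
  struct event *read_event;   /* Fires when the socket is readable. */
  struct event *write_event;  /* Fires when the socket is writable. */
} connection_t;

static void
connection_start_reading(connection_t *conn)
{
  /* Ask libevent to watch the socket for readability. */
  event_add(conn->read_event, NULL);
}

static void
connection_stop_reading(connection_t *conn)
{
  /* Stop watching; the read callback will no longer fire. */
  event_del(conn->read_event);
}
```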

When a read or write event is turned on, the main libevent loop polls
the kernel, asking which sockets are ready to read or write. (This
polling happens in the event_base_loop() call in run_main_loop_once()
in main.c.) When libevent finds a socket that's ready to read or
write, it invokes conn_{read,write}_callback(), also in main.c.

These callback functions delegate to connection_handle_read() and
connection_handle_write() in connection.c, which read or write on the
network as appropriate, possibly delegating to OpenSSL.

After data is read or written, or another event occurs, these
connection_handle_*() functions call logic functions whose job is to
respond to the information. Some examples include:

 * connection_flushed_some() -- called after a connection writes any
   amount of data from its outbuf.
 * connection_finished_flushing() -- called when a connection has
   emptied its outbuf.
 * connection_finished_connecting() -- called when an in-process
   connection finishes making a remote connection.
 * connection_reached_eof() -- called after receiving a FIN from the
   remote server.
 * connection_process_inbuf() -- called when more data arrives on
   the inbuf.

These functions then call into specific implementations depending on
the type of the connection. For example, if the connection is an
edge_connection_t, connection_reached_eof() will call
connection_edge_reached_eof().
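
A self-contained sketch of that dispatch pattern, with simplified type
tags and stub handlers standing in for the real ones:

```c
/* The per-type dispatch pattern, reduced to its skeleton.  The real
 * connection.c uses many more connection types and subtype casts. */

typedef enum { CONN_TYPE_OR, CONN_TYPE_EDGE, CONN_TYPE_DIR } conn_type_t;

typedef struct connection_t {
  conn_type_t type;
  /* ... buffers, events, type-specific state ... */
} connection_t;

/* Stub handlers; the real ones live in their type's own module. */
static int connection_or_reached_eof(connection_t *c)   { (void)c; return 0; }
static int connection_edge_reached_eof(connection_t *c) { (void)c; return 0; }
static int connection_dir_reached_eof(connection_t *c)  { (void)c; return 0; }

/* Generic handler: fan out to the implementation for this type. */
static int
connection_reached_eof(connection_t *conn)
{
  switch (conn->type) {
    case CONN_TYPE_OR:   return connection_or_reached_eof(conn);
    case CONN_TYPE_EDGE: return connection_edge_reached_eof(conn);
    case CONN_TYPE_DIR:  return connection_dir_reached_eof(conn);
    default:             return 0;
  }
}
```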

> **Note:** "Also there are bufferevents!" We have vestigial code for
> an alternative low-level networking implementation, based on
> Libevent's evbuffer and bufferevent code. These two object types
> take on (most of) the roles of buffers and connections respectively.
> It isn't working in today's Tor, due to code rot and possible
> lingering libevent bugs. More work is needed; it would be good to
> get this working efficiently again, to have IOCP support on Windows.

#### Controlling connections ####

A connection can have reading or writing enabled or disabled for a
wide variety of reasons, including:

 * Writing is disabled when there is no more data to write.
 * For some connection types, reading is disabled when the inbuf is
   too full.
 * Reading/writing is temporarily disabled on connections that have
   recently read/written enough data to reach their bandwidth limits.
 * Reading is disabled on connections when reading more data from them
   would require that data to be buffered somewhere else that is
   already full.

Currently, these conditions are checked in a diffuse set of
increasingly complex conditional expressions. In the future, it could
be helpful to transition to a unified model for handling temporary
read/write suspensions.
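
As one hypothetical shape for such a model -- nothing like this exists
in Tor today -- each direction could carry a bitmask of suspension
reasons, with I/O re-enabled only once every reason has cleared:

```c
/* A sketch of a unified suspension model: purely illustrative. */

#include <stdint.h>

typedef enum {
  SUSPEND_BANDWIDTH   = 1u << 0,  /* Hit a rate limit. */
  SUSPEND_BUFFER_FULL = 1u << 1,  /* Downstream buffer is full. */
  SUSPEND_NO_DATA     = 1u << 2,  /* Nothing to write (writing only). */
} suspend_reason_t;

typedef struct connection_t connection_t;  /* Opaque here. */

/* Assumed to exist, as described above. */
void connection_start_reading(connection_t *conn);
void connection_stop_reading(connection_t *conn);

typedef struct read_gate_t {
  uint32_t suspend_reasons;  /* Zero means reading may proceed. */
} read_gate_t;

/* Set or clear one reason; re-enable reading only when none remain. */
static void
read_gate_update(read_gate_t *gate, connection_t *conn,
                 suspend_reason_t reason, int suspended)
{
  if (suspended)
    gate->suspend_reasons |= (uint32_t)reason;
  else
    gate->suspend_reasons &= ~(uint32_t)reason;

  if (gate->suspend_reasons)
    connection_stop_reading(conn);
  else
    connection_start_reading(conn);
}
```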

#### Kinds of connections ####

Today Tor has the following connection and pseudoconnection types.
For the most part, each type of connection has an associated C module
that implements its underlying logic.

**Edge connections** receive data from and deliver data to points
outside the onion routing network. See `connection_edge.c`. They
fall into two types:

**Entry connections** are a type of edge connection. They receive
data from the user running a Tor client, and deliver data to that
user. They are used to implement SOCKSPort, TransPort, NATDPort, and
so on. Sometimes they are called "AP" connections for historical
reasons (it used to stand for "Application Proxy").

**Exit connections** are a type of edge connection. They exist at an
exit node, and transmit traffic to and from the network.

(Entry connections and exit connections are also used as placeholders
when performing a remote DNS request; they are not decoupled from the
notion of "stream" in the Tor protocol. This is implemented partially
in `connection_edge.c`, and partially in `dnsserv.c` and `dns.c`.)

**OR connections** send and receive Tor cells over TLS, using some
version of the Tor link protocol. Their implementation is spread
across `connection_or.c`, with a bit of logic in `command.c`,
`relay.c`, and `channeltls.c`.

**Extended OR connections** are a type of OR connection for use on
bridges using pluggable transports, so that the PT can tell the bridge
some information about the incoming connection before passing on its
data. They are implemented in `ext_orport.c`.

**Directory connections** are server-side or client-side connections
that implement Tor's HTTP-based directory protocol. These are
instantiated using a socket when Tor is making an unencrypted HTTP
connection. When Tor is tunneling a directory request over a Tor
circuit, directory connections are implemented using a linked
connection pair (see below). Directory connections are implemented in
`directory.c`; some of the server-side logic is implemented in
`dirserver.c`.

**Controller connections** are local connections to a controller
process implementing the controller protocol from control-spec.txt.
These are implemented in `control.c`.

**Listener connections** are not stream-oriented! Rather, they wrap a
listening socket in order to detect new incoming connections. They
bypass most of the stream logic, and they don't have associated
buffers. They are implemented in `connection.c`.

> **Note:** "History Time!" You might occasionally find references to
> a couple of connection types that no longer exist in modern Tor. A
> *CPUWorker connection* connected the main Tor process to a thread or
> process used for computation. (Nowadays we use in-process
> communication instead.) Even more anciently, a *DNSWorker
> connection* connected the main Tor process to a separate thread or
> process used for running `gethostbyname()` or `getaddrinfo()`.
> (Nowadays we use Libevent's evdns facility to perform DNS requests
> asynchronously.)

#### Linked connections ####

Sometimes two connections are joined together, such that data which
the Tor process sends on one should immediately be received by the
same Tor process on the other. (For example, when Tor makes a
tunneled directory connection, this is implemented on the client side
as a directory connection whose output goes, not to the network, but
to a local entry connection. And when a directory server receives a
tunneled directory connection, this is implemented as an exit
connection whose output goes, not to the network, but to a local
directory connection.)

The earliest versions of Tor to support linked connections used
socketpairs for the purpose. But using socketpairs forced us to copy
data through kernelspace, and wasted limited file descriptors. So
instead, a pair of connections can be linked in-process. Each linked
connection has a pointer to the other, such that data written on one
is immediately readable on the other, and vice versa.
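
In rough outline, flushing a linked connection can look like the
sketch below; the names are simplified, and the real bookkeeping in
`connection.c` is more involved:

```c
/* In-process linking: "writing" to a linked connection appends
 * straight to its partner's inbuf instead of going to a socket. */

typedef struct buf_t buf_t;  /* Chunked buffer, as sketched earlier. */

/* Assumed helper: move all bytes from src onto the end of dest. */
void buf_move_all(buf_t *dest, buf_t *src);

typedef struct connection_t {
  buf_t *inbuf;
  buf_t *outbuf;
  struct connection_t *linked_conn;  /* NULL unless linked. */
} connection_t;

/* Flush a linked connection: its "network" is just its partner. */
static void
connection_flush_linked(connection_t *conn)
{
  connection_t *partner = conn->linked_conn;
  if (!partner)
    return;
  /* Bytes on our outbuf become readable on the partner's inbuf at
   * once, with no trip through kernelspace. */
  buf_move_all(partner->inbuf, conn->outbuf);
  /* The partner must then be notified that its inbuf has grown,
   * e.g. by scheduling its read handling from the main loop. */
}
```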

### From connections to channels ###

There's an abstraction layer above OR connections (the ones that
handle cells) and below cells called **Channels**. A channel's
purpose is to transmit authenticated cells from one Tor instance
(relay or client) to another.

Currently, only one implementation exists: Channel_tls, which sends
and receives cells over a TLS-based OR connection.
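
Conceptually, the channel layer acts as a vtable over its backends;
the sketch below shows the shape of the abstraction, not Tor's actual
`channel_t` layout:

```c
/* A minimal vtable-style sketch of the channel abstraction. */

typedef struct cell_t cell_t;

typedef struct channel_t {
  /* Backend hook: how this channel actually transmits a cell.
   * For a TLS channel, this writes the cell to an OR connection. */
  int (*write_cell)(struct channel_t *chan, cell_t *cell);
  void *backend_state;  /* Backend-private data. */
} channel_t;

/* Generic entry point used by the rest of Tor. */
static int
channel_write_cell(channel_t *chan, cell_t *cell)
{
  return chan->write_cell(chan, cell);
}
```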

Cells are sent on a channel using
`channel_write_{,packed_,var_}cell()`. Incoming cells arrive on a
channel from its backend using `channel_queue*_cell()`, and are
immediately processed using `channel_process_cells()`.

Some cell types are handled below the channel layer, such as those
that affect handshaking only. And some others are passed up to the
generic cross-channel code in `command.c`: cells like `DESTROY` and
`CREATED` are all trivial to handle. But relay cells require special
handling...

### From channels through circuits ###

When a relay cell arrives on an existing circuit, it is handled in
`circuit_receive_relay_cell()` -- one of the innermost functions in
Tor. This function encrypts or decrypts the relay cell as
appropriate, and decides whether the cell is intended for the current
hop of the circuit.

If the cell *is* intended for the current hop, we pass it to
`connection_edge_process_relay_cell()` in `relay.c`, which acts on it
based on its relay command, and (possibly) queues its data on an
`edge_connection_t`.

If the cell *is not* intended for the current hop, we queue it for the
next channel in sequence with `append_cell_to_circuit_queue()`. This
places the cell on a per-circuit queue for cells headed out on that
particular channel.
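
Putting the two cases together, the control flow is roughly as
sketched below; the helper names and signatures are stand-ins, not the
exact ones in `relay.c`:

```c
/* The core decision for an arriving relay cell: peel one layer of
 * encryption, then either act on the cell here or pass it along. */

typedef struct cell_t cell_t;
typedef struct circuit_t circuit_t;
typedef enum { CELL_DIRECTION_IN, CELL_DIRECTION_OUT } cell_direction_t;

/* Assumed helpers, named after the functions in the text. */
int relay_crypt_cell(circuit_t *circ, cell_t *cell,
                     cell_direction_t dir, int *recognized_out);
int connection_edge_process_relay_cell(cell_t *cell, circuit_t *circ);
void append_cell_to_circuit_queue(circuit_t *circ, cell_t *cell,
                                  cell_direction_t dir);

static int
circuit_receive_relay_cell(cell_t *cell, circuit_t *circ,
                           cell_direction_t dir)
{
  int recognized = 0;
  /* Add or remove one layer of relay encryption. */
  if (relay_crypt_cell(circ, cell, dir, &recognized) < 0)
    return -1;  /* Corrupted cell: tear down the circuit. */

  if (recognized) {
    /* The cell is intended for this hop: act on its relay command. */
    return connection_edge_process_relay_cell(cell, circ);
  }
  /* Not for this hop: queue it toward the next channel in sequence. */
  append_cell_to_circuit_queue(circ, cell, dir);
  return 0;
}
```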

### Sending cells on circuits: the complicated bit. ###

Relay cells are queued onto circuits from one of two (main) sources:
reading data from edge connections, and receiving a cell to be relayed
on a circuit. Both of these sources place their cells on a cell
queue: each circuit has one cell queue for each direction that it
travels.

A naive implementation would skip using cell queues, and instead write
each outgoing relay cell directly to its channel as soon as it was
generated. (Tor did this in its earlier versions.) But such an
approach tends to give poor performance, because it allows high-volume
circuits to clog channels, and it forces the Tor server to send data
queued on a circuit even after that circuit has been closed.

So by using queues on each circuit, we can add cells to each channel
on a just-in-time basis, choosing the cell at each moment based on a
performance-aware algorithm.

This logic is implemented in two main modules: `scheduler.c` and
`circuitmux*.c`. The scheduler code is responsible for determining
globally, across all channels that could write cells, which one should
next receive queued cells. The circuitmux code determines, for all of
the circuits with queued cells on a channel, which circuit should
supply the next cell.
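
To make the division of labor concrete, here is a toy circuitmux that
picks circuits round-robin. Tor's real `circuitmux*.c` supports
pluggable, performance-aware policies (such as the EWMA policy), so
treat this purely as an illustration of the interface:

```c
/* A toy round-robin circuitmux: among circuits that have cells queued
 * for a channel, decide which circuit supplies the next cell. */

#include <stddef.h>

typedef struct cell_t cell_t;

typedef struct circuit_t {
  cell_t **queue;   /* Queued cells, oldest first. */
  size_t n_queued;  /* How many cells are waiting. */
} circuit_t;

typedef struct circuitmux_t {
  circuit_t **circuits;  /* Circuits attached to this channel. */
  size_t n_circuits;
  size_t next_index;     /* Round-robin cursor. */
} circuitmux_t;

/* Return the next cell to write on the channel, or NULL if every
 * attached circuit's queue is empty. */
static cell_t *
circuitmux_get_next_cell(circuitmux_t *mux)
{
  for (size_t tried = 0; tried < mux->n_circuits; ++tried) {
    circuit_t *circ = mux->circuits[mux->next_index];
    mux->next_index = (mux->next_index + 1) % mux->n_circuits;
    if (circ->n_queued > 0) {
      cell_t *cell = circ->queue[0];
      /* Shift the queue down; a real implementation would use a
       * linked list of packed cells instead. */
      for (size_t i = 1; i < circ->n_queued; ++i)
        circ->queue[i - 1] = circ->queue[i];
      circ->n_queued--;
      return cell;
    }
  }
  return NULL;
}
```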

(This logic applies to outgoing relay cells only; incoming relay cells
are processed as they arrive.)