# Simulator for Testing Walking Onions Performance

This repository contains the simulator used in the paper "[Walking Onions:
Scaling Anonymity Networks while Protecting Users](https://crysp.uwaterloo.ca/software/walkingonions/)".

The simulator was tested with Python 3.6.9, using the Python packages pynacl (version 1.3.0) and merklelib (version 1.0).

This is open-source software, under the [MIT License](LICENSE).

## What is included

In this repository, you will find:

  * **README.md**: this file
  * **client.py**, **dirauth.py**, **network.py**, **relay.py**, **simulator.py**: the source code for the simulator
  * **build-docker**, **download-docker**, **run-docker**, **attach-docker**: scripts to create and run the docker containing the simulator (see below)
  * **Dockerfile.in**, **run\_small.in**, **wo\_docker\_start.in**: templates used by build-docker and run-docker
  * **analysis**: a directory containing scripts to analyze the log files produced by the simulator and generate graphs in PDF form.  See [Analyzing the results](#analyzing-the-results) below for more information.
  * **logs**: a directory containing the logs output by the simulator when _we_ ran it.  These are the very logfiles that were processed by the [parselogs.py](analysis/parselogs.py) and [plotdats.py](analysis/plotdats.py) scripts to produce the graphs in the paper.  (When you run the simulator yourself, your log files will end up in a directory called **logdir** that will be created by **run-docker**.)

## tl;dr

  * `./build-docker` _or_ `./download-docker`
  * `./run-docker`
  * Edit the **logdir/run_sims** file to uncomment the simulations you want to run in parallel, keeping in mind the memory requirements listed for each simulation in that file.
  * `./attach-docker`
  * Inside the docker container:
    * `logdir/run_sims 1`
    * Wait for the simulations to finish
    * `cd logdir`
    * `../analysis/parselogs.py *.log`
    * `../analysis/plotdats.py`
    * `exit`

## Building the simulator

The simulator is written in Python, so you don't strictly have to build it per se.  However, for convenience, compatibility, and reproducibility, we provide a docker environment that is all set up so that you can run the simulator.

### A note about user ids

The simulator running in the docker container will write its log files into a directory **logdir** on the host machine via a [bind mount](https://docs.docker.com/storage/bind-mounts/).  In order that you (the person running the simulator) can read and analyze those log files outside of the docker, the log files should be owned by your user id (on the host machine).

To accomplish this, when the docker image is run, the **wo\_docker\_start** docker init script will check the user and group ids that own the **logdir** directory, and create the "walkingo" user in the docker with those same user and group ids.  That way, when the walkingo user in the docker runs the simulator, it will write files to the **logdir** directory owned by you, and you will be able to easily read them.
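
The ownership lookup described above can be illustrated in a few lines of Python.  This is only a sketch of the idea, not the contents of **wo\_docker\_start**; the helper name `owner_ids` is hypothetical, and the example inspects the current directory so that it runs anywhere.

```python
import os

def owner_ids(directory):
    """Return the (uid, gid) pair that owns the given directory."""
    info = os.stat(directory)
    return info.st_uid, info.st_gid

# Hypothetical usage: a user created with these ids would write files
# into the directory as if the directory's owner had written them.
uid, gid = owner_ids(".")
print(f"directory is owned by uid={uid}, gid={gid}")
```

Inside the docker, the same lookup would be applied to the bind-mounted **logdir** directory rather than to `"."`.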

### Building the docker image

Run `./build-docker` to create a docker image called `walkingonions`.  This image is meant to be run from this same directory.

### Downloading the docker image

In the event that you're trying to run this software far enough in the future that the packages downloaded by the **Dockerfile** used by the `./build-docker` script are no longer available or compatible, we have put a copy of the built docker image online.  The `./download-docker` script will download it (note: the image is 261 MB), verify its sha256 checksum, and use `docker load` to install the image.  You will need the `wget` and `sha256sum` utilities, in addition to a version of `docker` that hopefully still accepts the docker image format our version (18.03.1-ce) generated.
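
If `sha256sum` is not available, the same kind of check can be done with Python's standard `hashlib` module.  This is a minimal sketch, not what `./download-docker` itself does; the function names are hypothetical, and the expected checksum must come from the `download-docker` script (it is not reproduced here).

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks
    so that a large image file does not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected_hex):
    """Return True if the file's digest matches the expected hex string."""
    return sha256_of_file(path) == expected_hex.lower()
```

Calling `verify(image_path, expected)` with the downloaded image's path and the published checksum returns `True` only when the two match.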

## Running the simulator

To start the docker container, use the `./run-docker` command.  This will do several things:

  * Create the **logdir** directory, if it does not already exist.  This directory will be visible both inside and outside the docker container.
  * Create the **run_sims** script inside the **logdir** directory (if it does not already exist); this is a script you can edit and run to start the simulations.
  * Start a docker container named `walkingo_exp`, using the docker image `walkingonions` created above.

The docker container will start **in the background**.

On the host machine (_not_ in the docker container), edit the **logdir/run_sims** script.  This script specifies which simulations you want to run.  The simulator has three different circuit creation modes (see the paper for details):

  * `vanilla`: Vanilla Onion Routing (equivalent to regular Tor)
  * `telescoping`: Telescoping Walking Onions
  * `singlepass`: Single-Pass Walking Onions

In addition, the two Walking Onions modes each have two possible _SNIP authentication_ modes:

  * `threshsig`: Threshold signatures
  * `merkle`: Merkle trees

(Vanilla Onion Routing only has `none` for the SNIP authentication mode, as it has no SNIPs.)
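
The pairing rule above can be written out as a short enumeration.  This is an illustrative sketch only; the list and function names are hypothetical and are not taken from the simulator source.

```python
# Circuit creation modes and SNIP authentication modes named in this README.
CIRCUIT_MODES = ["vanilla", "telescoping", "singlepass"]
SNIP_AUTH_MODES = ["none", "threshsig", "merkle"]

def valid_combinations():
    """Return the (circuit mode, SNIP auth mode) pairs described above."""
    combos = []
    for circuit in CIRCUIT_MODES:
        for auth in SNIP_AUTH_MODES:
            # Vanilla has no SNIPs, so it pairs only with "none";
            # the Walking Onions modes always need a real SNIP auth mode.
            if (circuit == "vanilla") == (auth == "none"):
                combos.append((circuit, auth))
    return combos

for circuit, auth in valid_combinations():
    print(circuit, auth)
```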

For any of the five valid combinations of circuit creation mode and SNIP authentication mode, the simulator can run at a specified _scale_.  This is a decimal fraction of a network around the size of today's Tor network: 6,500 relays and 2,500,000 clients.
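
To get a feel for what a given scale means, the following sketch multiplies the full-size figures above by the scale.  The rounding is an assumption made for this illustration; the simulator may round differently, and churn changes the actual counts from epoch to epoch.

```python
# Full-scale network size stated in this README.
FULL_RELAYS = 6500
FULL_CLIENTS = 2500000

def scaled_network(scale):
    """Approximate (relays, clients) counts at the given scale."""
    return round(FULL_RELAYS * scale), round(FULL_CLIENTS * scale)

for scale in (0.05, 0.10, 0.20, 0.30):
    relays, clients = scaled_network(scale)
    print(f"scale {scale:.2f}: ~{relays} relays, ~{clients} clients")
```

For example, a scale of 0.05 corresponds to roughly 325 relays and 125,000 clients.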

The **logdir/run_sims** file has (initially commented-out) entries for all five mode combinations and a range of scales from 0.05 to 0.30.  Edit that file to uncomment the lines for the simulations you want to run.

The simulations can be configured to run for a certain number of _epochs_.  An epoch represents one hour of real time, but the simulator can be much slower than real time, as we will see below.  The first three epochs proceed as follows:

  * Epoch 1: the directory authorities start up, and the relays start up and register with the directory authorities.
  * Epoch 2: the relays bootstrap, and the clients start up.
  * Epoch 3: the clients bootstrap and start building circuits.

The number of epochs specified in the **logdir/run_sims** file is the number of epochs in which circuits are built (default 10).  However, the first such epoch (epoch 3) is the one in which all clients are bootstrapping, and so it is not part of the "steady state" behaviour.  The scripts in the **analysis** directory thus separate out epoch 3 when computing the steady state, and so each simulation run with the default setting will contribute 9 epochs' worth of data points.

After epoch 3, some number of relays and clients will disconnect from the network each epoch, and some number will connect and bootstrap.  The distributions of these numbers were selected to be reflective of the current Tor network (see the paper for details).
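
The epoch numbering described above can be laid out as follows.  This is a sketch under the assumption that circuit-building epochs are numbered consecutively starting at epoch 3; the function name `epoch_plan` is hypothetical and does not appear in the simulator.

```python
def epoch_plan(circuit_epochs=10):
    """Lay out epoch numbers for a run with the given number of
    circuit-building epochs (the value set in logdir/run_sims)."""
    first_circuit_epoch = 3  # clients bootstrap and start building circuits
    last_epoch = first_circuit_epoch + circuit_epochs - 1
    return {
        # Epochs 1 and 2: dirauths and relays start, then relays bootstrap.
        "setup": [1, 2],
        # Epoch 3 builds circuits but is excluded from the steady state.
        "client_bootstrap": first_circuit_epoch,
        # Every later circuit-building epoch counts as steady state.
        "steady_state": list(range(first_circuit_epoch + 1, last_epoch + 1)),
        "last_epoch": last_epoch,
    }

plan = epoch_plan(10)
print(plan["steady_state"])       # [4, 5, 6, 7, 8, 9, 10, 11, 12]
print(len(plan["steady_state"]))  # 9
```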

**Note**: these simulations can take a lot of time and memory.  They only use a single core each, so if you have multiple cores, you can uncomment multiple lines of the **logdir/run_sims** file, but you'll need to keep an eye on your RAM usage.  The estimated RAM usage for each simulation is documented in the **logdir/run_sims** file; it ranges (on our machines) from 12 GiB for the smallest 0.05-scale simulations up to 76 GiB for the largest 0.30-scale simulations.  Our machines took about 15 hours for each of the smallest simulations, and about 11 days for each of the largest.

Once you have uncommented the simulations you want to run, attach to the docker container with the `./attach-docker` command.  The docker container is running `screen`, so you can detach from the docker (_without_ terminating any running simulations) using the `screen` _Ctrl-a d_ command.  If you exit the shell in the docker with `exit` or just _Ctrl-d_, and no simulations are running, the `screen` process will exit, and the docker container will terminate.

Once attached to the docker, start the simulations by running (from the walkingo user's home directory) `logdir/run_sims` _`seed`_, where _`seed`_ is a small integer (e.g., 1, 2, 8, 10, something like that) that seeds the random number generator.  The intent is that if you run the same simulation with the same seed, you should get identical results out.  (It turns out if you use Python 3.5.2 on Ubuntu 16.04, you do _not_ get identical results out, but you do on Python 3.6.9 on Ubuntu 18.04, which is what is installed in the docker image.)  For our experiments, we used seeds of 8, 10, 20, 21, 22, and 23.  The name of the logfile (e.g., `TELESCOPING_MERKLE_0.200000_10_21.log`) records the mode, the scale, the number of (circuit-building) epochs, and the seed.
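
If you want to group log files by mode, scale, or seed in your own scripts, the file name can be split back into its fields.  This is a sketch and not part of the provided analysis scripts; it assumes that neither mode name contains an underscore, and the helper name `parse_log_name` is hypothetical.

```python
from pathlib import Path

def parse_log_name(path):
    """Split a simulator log file name into the fields it records."""
    # Path.stem drops the ".log" suffix, leaving five "_"-separated fields.
    circuit_mode, auth_mode, scale, epochs, seed = Path(path).stem.split("_")
    return {
        "circuit_mode": circuit_mode.lower(),
        "auth_mode": auth_mode.lower(),
        "scale": float(scale),
        "epochs": int(epochs),  # number of circuit-building epochs
        "seed": int(seed),
    }

print(parse_log_name("TELESCOPING_MERKLE_0.200000_10_21.log"))
```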

When you run the `logdir/run_sims` _`seed`_ command, `screen` will switch to showing you the output of your simulation (or one of them if you started more than one).  The output is not important to save (the simulator will save the important information in the log files), but it can help you keep an eye on the simulation's progress.  To get back to your command line, use the _Ctrl-a 0_ command to `screen` (that's a zero, not a letter o).  From there, as above, use _Ctrl-a d_ to detach from the docker container while leaving the simulations running.  You can re-attach to the running container at any time using the `./attach-docker` command.

Once your simulations are complete, you can terminate the docker container by attaching to it, and exiting the shell.

### Analyzing the results

The analysis scripts have two steps:

  1. Parse the log files to produce dat files containing statistical data from the log files.
  2. Plot the dat files as PDFs.

You can run the analysis scripts in whatever directory you like, but they will put the output dat files and PDFs in the current directory.  You're likely to want that directory to be the bind-mounted **logdir** directory, so that you can access the results from the host machine.  So run:

```
$ cd logdir
$ ../analysis/parselogs.py *.log
$ ../analysis/plotdats.py
```

The `parselogs.py` command will parse the log files you give it, and write the dat files to the current directory.  The `plotdats.py` command will turn those dat files into PDF graphs using `gnuplot` (which is installed in the docker image).

Note that if you did not run simulations for all five mode combinations, you will be missing the corresponding dat files.  `gnuplot` will output warnings that it can't find them when you run `plotdats.py`, but it will graph the data you _do_ have anyway.

Some of the graphs also plot _analytical formulas_.  These are computations of what the results _should_ be mathematically, and hopefully your simulation results (taking the error bars into account) do in fact follow the analytical formulas.  The formulas themselves can be found in the [analytical.py](analysis/analytical.py) file.

The `plotdats.py` script will produce a number of PDF graphs in the current directory (**logdir** in the above example):

  * **relay_ss.pdf**: The average number of bytes per epoch each relay sends or receives (total relay bytes divided by the number of relays). The error bars are on a per-epoch basis.  Only data from steady-state epochs (ss) is used.
  * **relay_ss_wide.pdf**: A zoomed-out view of the above plot, on a log-log scale, showing the asymptotic behaviour of the analytical formulas.
  * **client_ss.pdf**, **client_ss_wide.pdf**: as above, but for client bytes instead of for relay bytes.

The above four PDFs are the important ones, and are the ones presented in the paper.  There are a number of others, however:

  * **relay_perclient_ss.pdf**, **relay_perclient_ss_wide.pdf**: The total number of bytes sent or received by relays, divided by the number of _clients_ (not relays).  The reasoning here is that the number of clients is the largest determinant of the total amount of traffic in the network (since the number of circuits built is proportional to the number of clients).  Due to churn, the number of clients and the number of relays each change from epoch to epoch.  Because the total number of bytes is largely determined by the number of clients, the **relay_ss.pdf** plot shows a value whose numerator is a random variable in the number of clients, and whose denominator is a random variable in the number of relays.  On _average_, the ratio of the number of clients to the number of relays is fixed, but since both the numerator and denominator vary, the error bars are larger.  This plot has the number of clients in both the numerator and denominator, so the error bars are much smaller, and show the variance due to relay churn, but not also due to client churn.
  * **dirauth_ss.pdf**: The number of bytes sent and received by directory authorities, only counting steady-state epochs.
  * **dirauth.pdf**: As above, but for all epochs (there is pretty much no difference for directory authorities, so this graph and the above are very similar).
  * **relay.pdf**, **client.pdf**: The total number of bytes sent or received per epoch per relay or client, not only in steady state.  The data points are on a per-relay (or per-client) basis, rather than the per-epoch basis used in the plots above.  The error bars are not plotted on this graph because they are not meaningful: different relays are _expected_ to have vastly different results, because they have different roles (fallback vs not), properties (bootstrapping vs not), and bandwidths (higher-bandwidth relays are used by clients with higher probability).  Similarly, clients can be bootstrapping or not, and bootstrapping clients use much more bandwidth in Vanilla Onion Routing than non-bootstrapping clients.  We therefore break up the data into different roles and properties in the graphs below:
  * **relay_bf.pdf**, **relay_f.pdf**, **relay_b.pdf**, **relay_n.pdf**: Separate plots for bootstrapping fallback relays, non-bootstrapping fallback relays, bootstrapping non-fallback relays, and non-bootstrapping non-fallback relays.  Each plot shows the total number of bytes sent and received per epoch, _divided by_ the relay's bandwidth.
  * **client_b.pdf**, **client_n.pdf**: Separate plots for bootstrapping and non-bootstrapping clients.  These plots are total bytes sent and received per epoch by clients.  (Clients do not have bandwidth weights, so these plots are not normalized to bandwidth like the ones above.)
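
The reasoning given for **relay_perclient_ss.pdf** above (dividing by the number of clients rather than the number of relays shrinks the error bars) can be seen in a toy model.  Everything in this sketch is made up for illustration: the churn distributions, the bytes-per-client constant, and the noise level are assumptions, not values from the simulator or the paper.

```python
import random
import statistics

def relative_spread(values):
    """Standard deviation divided by the mean (coefficient of variation)."""
    return statistics.pstdev(values) / statistics.mean(values)

def toy_epochs(n_epochs=1000, seed=1):
    """Generate per-epoch relay bytes normalized two different ways."""
    rng = random.Random(seed)
    per_relay, per_client = [], []
    for _ in range(n_epochs):
        clients = rng.gauss(125000, 2500)  # made-up client churn
        relays = rng.gauss(325, 10)        # made-up relay churn
        # Total traffic is assumed proportional to clients, plus small noise.
        total_bytes = 1000.0 * clients * rng.gauss(1.0, 0.005)
        per_relay.append(total_bytes / relays)    # varies with both counts
        per_client.append(total_bytes / clients)  # client churn cancels out
    return per_relay, per_client

per_relay, per_client = toy_epochs()
print("spread when dividing by relays: ", relative_spread(per_relay))
print("spread when dividing by clients:", relative_spread(per_client))
```

In this toy model the per-relay value has a noticeably larger relative spread than the per-client value, which is the qualitative effect described for the two plots.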

## Hacking the simulator

This section contains some brief notes about the structure of the simulator code itself, and where you might find things you may want to modify.

The code is in five modules:

  * **network.py**: The code for the network simulation.  This code implements network addresses, network connections, network messages, and epoch transitions.
  * **dirauth.py**: The code for directory authorities.
  * **relay.py**: The code for relays.
  * **client.py**: The code for clients.
  * **simulator.py**: The orchestrator that runs the whole simulation.  This is the main program to run the simulator.

As seen in the paper, there are a number of _empirical parameters_ that were (safely) measured from the live Tor network, such as client and relay churn distributions, the average number of circuits created per client per epoch, and the average size of a consensus diff as compared to the full consensus.  If you want to change those parameters, the last one above is called `P_Delta` and is found in **network.py**.  The others are found near the top of **simulator.py**.

Thanks for your interest in Walking Onions, and we're happy to answer any questions you may have.