A simulator for the Walking Onions protocol for allowing Tor to scale to large numbers of clients and relays

Ian Goldberg 7d60eef938 Update the size of the docker image download in the README		4 years ago

Simulator for Testing Walking Onions Performance

This repository contains the simulator used in the paper "Walking Onions: Scaling Anonymity Networks while Protecting Users".

The simulator was tested with Python 3.6.9, using the Python packages pynacl (version 1.3.0) and merklelib (version 1.0).

This is open-source software, under the MIT License.

What is included

In this repository, you will find:

README.md: this file
client.py, dirauth.py, network.py, relay.py, simulator.py: the source code for the simulator
build-docker, download-docker, run-docker, attach-docker: scripts to create and run the docker containing the simulator (see below)
Dockerfile.in, run_sims.in, wo_docker_start.in: templates used by build-docker and run-docker
analysis: a directory containing scripts to analyze the log files produced by the simulator and generate graphs in PDF form. See Analyzing the results below for more information.
logs: a directory containing the logs output by the simulator when we ran it. These are the very logfiles that were processed by the parselogs.py and plotdats.py scripts to produce the graphs in the paper. (When you run the simulator yourself, your log files will end up in a directory called logdir that will be created by run-docker.)

tl;dr

./build-docker or ./download-docker
./run-docker
Edit the logdir/run_sims file to uncomment the simulations you want to run in parallel, keeping in mind the memory requirements listed for each simulation in that file.
./attach-docker
Inside the docker container:
- logdir/run_sims 1
- Wait for the simulations to finish
- cd logdir
- ../analysis/parselogs.py *.log
- ../analysis/plotdats.py
- exit

Building the simulator

The simulator is written in Python, so you don't strictly have to build it per se. However, for convenience, compatibility, and reproducibility, we provide a docker environment that is all set up so that you can run the simulator.

A note about user ids

The simulator running in the docker container will write its log files into a directory logdir on the host machine via a bind mount. In order that you (the person running the simulator) can read and analyze those log files outside of the docker, the log files should be owned by your user id (on the host machine).

To accomplish this, when the docker image is run, the wo_docker_start docker init script will check the user and group ids that own the logdir directory, and create the "walkingo" user in the docker with those same user and group ids. That way, when the walkingo user in the docker runs the simulator, it will write files to the logdir directory owned by you, and you will be able to easily read them.
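The ownership lookup described above can be illustrated with a few lines of Python. This is only a sketch of the idea, not the contents of wo_docker_start; the function name `owner_ids` is hypothetical.

```python
import os

def owner_ids(path):
    """Return the (uid, gid) pair that owns `path`."""
    st = os.stat(path)
    return st.st_uid, st.st_gid

if __name__ == "__main__":
    # "logdir" is the bind-mounted directory named in this README.
    if os.path.isdir("logdir"):
        uid, gid = owner_ids("logdir")
        print(f"logdir is owned by uid={uid} gid={gid}")
```

Running this on the host shows the ids that the walkingo user inside the docker is expected to mirror.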

Building the docker image

Run ./build-docker to create a docker image called walkingonions. This image is meant to be run from this same directory.

Downloading the docker image

In the event that you're trying to run this software far enough in the future that the packages downloaded by the Dockerfile used by the ./build-docker script are no longer available or compatible, we have put a copy of the built docker image online. The ./download-docker script will download it (note: the image is 261 MB), verify its sha256 checksum, and use docker load to install the image. You will need the wget and sha256sum utilities, in addition to a version of docker that hopefully still accepts the docker image format our version (18.03.1-ce) generated.
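If sha256sum is not available, the same kind of check can be approximated in Python. The sketch below is not what ./download-docker does internally; it assumes the checksum file uses the usual sha256sum layout (hex digest first, then the file name), and both function names are hypothetical.

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_checksum_file(path, checksum_path):
    """Compare a file's digest with the first field of a sha256sum-style file.

    Assumption: the checksum file starts with the hex digest, followed by
    whitespace and the file name.
    """
    with open(checksum_path) as f:
        expected = f.read().split()[0].lower()
    return sha256_of_file(path) == expected
```

Reading in chunks keeps memory use small even for an image of a few hundred megabytes.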

Running the simulator

To start the docker container, use the ./run-docker command. This will do several things:

Create the logdir directory, if it does not already exist. This directory will be visible both inside and outside the docker container.
Create the run_sims script inside the logdir directory (if it does not already exist); this is a script you can edit and run to start the simulations.
Start a docker container named walkingo_exp, using the docker image walkingonions created above.

The docker container will start in the background.

On the host machine (not in the docker container), edit the logdir/run_sims script. This script specifies which simulations you want to run. The simulator has three different circuit creation modes (see the paper for details):

vanilla: Vanilla Onion Routing (equivalent to regular Tor)
telescoping: Telescoping Walking Onions
singlepass: Single-Pass Walking Onions

In addition, the two Walking Onions modes each have two possible SNIP authentication modes:

threshsig: Threshold signatures
merkle: Merkle trees

(Vanilla Onion Routing only has none for the SNIP authentication mode, as it has no SNIPs.)
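The pairing rule above can be written down as a small enumeration. This is an illustrative sketch, not code from the simulator; the constant and function names are hypothetical, and only the mode strings come from the lists above.

```python
CIRCUIT_MODES = ("vanilla", "telescoping", "singlepass")
SNIP_AUTH_MODES = ("none", "threshsig", "merkle")

def is_valid(circuit_mode, snip_auth):
    """Vanilla pairs only with "none"; Walking Onions modes need a real SNIP auth."""
    if circuit_mode == "vanilla":
        return snip_auth == "none"
    return snip_auth in ("threshsig", "merkle")

# Keep only the pairs allowed by the rule above.
VALID_COMBINATIONS = [
    (c, s) for c in CIRCUIT_MODES for s in SNIP_AUTH_MODES if is_valid(c, s)
]

if __name__ == "__main__":
    for combo in VALID_COMBINATIONS:
        print(combo)
```

The filter leaves one vanilla pair plus two pairs for each Walking Onions mode.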

For any of the five valid combinations of circuit creation mode and SNIP authentication mode, the simulator can run at a specified scale. This is a decimal fraction of a network around the size of today's Tor network: 6,500 relays and 2,500,000 clients.
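To get a feel for what a given scale means, the full-size figures above can simply be multiplied out. The sketch below assumes plain rounding to the nearest integer, which may differ from how the simulator itself sizes the network; the names are hypothetical.

```python
FULL_RELAYS = 6_500        # full-scale relay count quoted above
FULL_CLIENTS = 2_500_000   # full-scale client count quoted above

def scaled_network(scale):
    """Approximate (relays, clients) at a given scale.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    return round(FULL_RELAYS * scale), round(FULL_CLIENTS * scale)

if __name__ == "__main__":
    for scale in (0.05, 0.30):
        relays, clients = scaled_network(scale)
        print(f"scale {scale:.2f}: about {relays} relays, {clients} clients")
```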

The logdir/run_sims file has (initially commented-out) entries for all five mode combinations and a range of scales from 0.05 to 0.30. Edit that file to uncomment the lines for the simulations you want to run.

The simulations can be configured to run for a certain number of epochs. An epoch represents one hour of real time, but the simulator can be much slower than real time, as we will see below. In epoch number 1, the directory authorities start up, and the relays start up and register with the directory authorities. In epoch number 2, the relays bootstrap, and the clients start up. In epoch number 3, the clients bootstrap and start building circuits. The number of epochs specified in the logdir/run_sims file is the number of epochs in which circuits are built (default 10). However, the first such epoch (epoch 3) is the one in which all clients are bootstrapping, and so it is not part of the "steady state" behaviour. The scripts in the analysis directory thus separate out epoch 3 when computing the steady state, and so each simulation run will contribute 9 epochs' worth of data points. After epoch 3, some number of relays and clients will disconnect from the network each epoch, and some number will connect and bootstrap. The distributions of these numbers were selected to be reflective of the current Tor network (see the paper for details).
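The epoch bookkeeping above can be summarized in a short sketch. It assumes epochs are numbered consecutively from 1, as the description suggests; the function and constant names are hypothetical and are not taken from the simulator or the analysis scripts.

```python
FIRST_CIRCUIT_EPOCH = 3  # epochs 1 and 2 are start-up epochs, per the text above

def epoch_plan(circuit_epochs=10):
    """Return (circuit-building epochs, steady-state epochs) as lists."""
    building = list(range(FIRST_CIRCUIT_EPOCH,
                          FIRST_CIRCUIT_EPOCH + circuit_epochs))
    steady = building[1:]  # drop epoch 3, when every client is bootstrapping
    return building, steady

if __name__ == "__main__":
    building, steady = epoch_plan()
    print("circuit-building epochs:", building)
    print("steady-state epochs:", steady)
```

With the default of 10 circuit-building epochs, the steady-state list has 9 entries, matching the 9 epochs' worth of data points mentioned above.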

Note: these simulations can take a lot of time and memory. They only use a single core each, so if you have multiple cores, you can uncomment multiple lines of the logdir/run_sims file, but you'll need to keep an eye on your RAM usage. The estimated RAM usage for each simulation is documented in the logdir/run_sims file; it ranges (on our machines) from 12 GiB for the smallest 0.05-scale simulations up to 76 GiB for the largest 0.30-scale simulations. Our machines took about 15 hours for each of the smallest simulations, and about 11 days for each of the largest.

Once you have uncommented the simulations you want to run, attach to the docker container with the ./attach-docker command. The docker container is running screen, so you can detach from the docker (without terminating any running simulations) using the screen Ctrl-a d command. If you exit the shell in the docker with exit or just Ctrl-d, and no simulations are running, the screen process will exit, and the docker container will terminate.

Once attached to the docker, start the simulations by running (from the walkingo user's home directory) logdir/run_sims seed, where seed is a small integer (e.g., 1, 2, 8, 10, something like that) that seeds the random number generator. The intent is that if you run the same simulation with the same seed, you should get identical results out. (It turns out if you use Python 3.5.2 on Ubuntu 16.04, you do not get identical results out, but you do on Python 3.6.9 on Ubuntu 18.04, which is what is installed in the docker image.) For our experiments, we used seeds of 8, 10, 20, 21, 22, and 23. The name of the logfile (e.g., TELESCOPING_MERKLE_0.200000_10_21.log) records the mode, the scale, the number of (circuit-building) epochs, and the seed.
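If you want to process log files with your own tools, the fields can be recovered from the file name. The sketch below is based only on the single example name above and assumes every log name has the same five underscore-separated parts (circuit mode, SNIP authentication mode, scale, epochs, seed); `LogName` and `parse_log_name` are hypothetical names, not part of the analysis scripts.

```python
from typing import NamedTuple

class LogName(NamedTuple):
    circuit_mode: str
    snip_auth: str
    scale: float
    epochs: int
    seed: int

def parse_log_name(filename):
    """Split a name like TELESCOPING_MERKLE_0.200000_10_21.log into fields."""
    if not filename.endswith(".log"):
        raise ValueError(f"not a log file name: {filename!r}")
    stem = filename[: -len(".log")]
    parts = stem.split("_")
    if len(parts) != 5:  # assumed layout: MODE_AUTH_SCALE_EPOCHS_SEED
        raise ValueError(f"unexpected log file name layout: {filename!r}")
    circuit_mode, snip_auth, scale, epochs, seed = parts
    return LogName(circuit_mode.lower(), snip_auth.lower(),
                   float(scale), int(epochs), int(seed))

if __name__ == "__main__":
    print(parse_log_name("TELESCOPING_MERKLE_0.200000_10_21.log"))
```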

When you run the logdir/run_sims seed command, screen will switch to showing you the output of your simulation (or one of them if you started more than one). The output is not important to save (the simulator will save the important information in the log files), but it can help you keep an eye on the simulation's progress. To get back to your command line, use the Ctrl-a 0 command to screen (that's a zero, not a letter o). From there, as above, use Ctrl-a d to detach from the docker container while leaving the simulations running. You can re-attach to the running container at any time using the ./attach-docker command.

Once your simulations are complete, you can terminate the docker container by attaching to it, and exiting the shell.

Analyzing the results

The analysis scripts have two steps:

Parse the log files to produce dat files containing statistical data from the log files.
Plot the dat files as PDFs.

You can run the analysis scripts in whatever directory you like, but they will put the output dat files and PDFs in the current directory. You're likely to want that directory to be the bind-mounted logdir directory, so that you can access the results from the host machine. So run:

$ cd logdir
$ ../analysis/parselogs.py *.log
$ ../analysis/plotdats.py

The parselogs.py command will parse the log files you give it, and write the dat files to the current directory. The plotdats.py command will turn those dat files into PDF graphs using gnuplot (which is installed in the docker image).

Note that if you did not run simulations for all five mode combinations, you will be missing the corresponding dat files. gnuplot will output warnings that it can't find them when you run plotdats.py, but it will graph the data you do have anyway.

Some of the graphs also plot analytical formulas. These are computations of what the results should be mathematically, and hopefully your simulation results (taking the error bars into account) do in fact follow the analytical formulas. The formulas themselves can be found in the analytical.py file.

The plotdats.py script will produce a number of PDF graphs in the current directory (logdir in the above example):

relay_ss.pdf: The average number of bytes per epoch each relay sends or receives (total relay bytes divided by the number of relays). The error bars are on a per-epoch basis. Only data from steady-state epochs (ss) is used.
relay_ss_wide.pdf: A zoomed-out view of the above plot, on a log-log scale, showing the asymptotic behaviour of the analytical formulas.
client_ss.pdf, client_ss_wide.pdf: as above, but for client bytes instead of for relay bytes.
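The quantity plotted in the steady-state graphs can be illustrated with a small calculation. This is a sketch of the idea only, not the code in parselogs.py: the input layout is hypothetical, and using a sample standard deviation for the error bar is an assumption, since the kind of error bar is not stated here.

```python
from statistics import mean, stdev

def steady_state_summary(per_epoch, first_circuit_epoch=3):
    """Summarize bytes per relay over steady-state epochs.

    per_epoch maps epoch number -> (total relay bytes, number of relays).
    Returns (mean, sample standard deviation) of the per-epoch averages;
    at least two steady-state epochs are needed for the deviation.
    """
    averages = [
        total_bytes / num_relays
        for epoch, (total_bytes, num_relays) in sorted(per_epoch.items())
        if epoch > first_circuit_epoch  # skip the all-clients-bootstrapping epoch
    ]
    return mean(averages), stdev(averages)

if __name__ == "__main__":
    # Made-up numbers, purely to show the shape of the computation.
    toy = {3: (900, 3), 4: (200, 2), 5: (600, 3), 6: (1200, 4)}
    print(steady_state_summary(toy))
```

The same shape of calculation applies to the client plots, with client bytes and client counts in place of the relay figures.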

The above four PDFs are the important ones, and are the ones presented in the paper. There are a number of others, however:

relay_perclient_ss.pdf, relay_perclient_ss_wide.pdf: The total number of bytes sent or received by relays, divided by the number of clients (not relays). The reasoning here is that the number of clients is the main determinant of the total amount of traffic in the network (since the number of circuits built is proportional to the number of clients). Due to churn, the number of clients and the number of relays each change from epoch to epoch. Because the total number of bytes is largely determined by the number of clients, the relay_ss.pdf plot shows a value whose numerator is a random variable in the number of clients and whose denominator is a random variable in the number of relays. On average, the ratio of the number of clients to the number of relays is fixed, but since both the numerator and denominator vary, the error bars are larger. This plot has the number of clients in both the numerator and denominator, so the error bars are much smaller, and show the variance due to relay churn, but not also due to client churn.
dirauth_ss.pdf: The number of bytes sent and received by directory authorities, only counting steady-state epochs.
dirauth.pdf: As above, but for all epochs (there is pretty much no difference for directory authorities, so this graph and the above are very similar).
relay.pdf, client.pdf: The total number of bytes sent or received per epoch per relay or client, not only in steady state. The data points are on a per-relay (or per-client) basis, rather than on a per-epoch basis as in the plots above. Error bars are not plotted on these graphs because they are not meaningful: different relays are expected to have vastly different results, because they have different roles (fallback vs not), properties (bootstrapping vs not), and bandwidths (higher-bandwidth relays are used by clients with higher probability). Similarly, clients can be bootstrapping or not, and bootstrapping clients use much more bandwidth in Vanilla Onion Routing than non-bootstrapping clients. We therefore break up the data into different roles and properties in the graphs below:
relay_bf.pdf, relay_f.pdf, relay_b.pdf, relay_n.pdf: Separate plots for bootstrapping fallback relays, non-bootstrapping fallback relays, bootstrapping non-fallback relays, and non-bootstrapping non-fallback relays. Each plot shows the total number of bytes sent and received per epoch, divided by the relay's bandwidth.
client_b.pdf, client_n.pdf: Separate plots for bootstrapping and non-bootstrapping clients. These plots are total bytes sent and received per epoch by clients. (Clients do not have bandwidth weights, so these plots are not normalized to bandwidth like the ones above.)

Hacking the simulator

This section contains some brief notes about the structure of the simulator code itself, and where you might find things you may want to modify.

The code is in five modules:

network.py: The code for the network simulation. This code implements network addresses, network connections, network messages, and epoch transitions.
dirauth.py: The code for directory authorities.
relay.py: The code for relays.
client.py: The code for clients.
simulator.py: The orchestrator that runs the whole simulation. This is the main program to run the simulator.

As seen in the paper, there are a number of empirical parameters that were (safely) measured from the live Tor network, such as client and relay churn distributions, the average number of circuits created per client per epoch, and the average size of a consensus diff as compared to the full consensus. If you want to change those parameters, the last one above is called P_Delta and is found in network.py. The others are found near the top of simulator.py.

Thanks for your interest in Walking Onions, and we're happy to answer any questions you may have.
