This repository contains the simulator used in the paper "Walking Onions: Scaling Anonymity Networks while Protecting Users".
The simulator was tested with Python 3.6.9, using the Python packages pynacl (version 1.3.0) and merklelib (version 1.0).
This is open-source software, under the MIT License.
In this repository, you will find:
The simulator is written in Python, so you don't strictly have to build it. However, for convenience, compatibility, and reproducibility, we provide a docker environment that is already set up to run the simulator.
The simulator running in the docker container will write its log files into a directory logdir on the host machine via a bind mount. In order that you (the person running the simulator) can read and analyze those log files outside of the docker, the log files should be owned by your user id (on the host machine).
To accomplish this, when the docker image is run, the wo_docker_start docker init script will check the user and group ids that own the logdir directory, and create the "walkingo" user in the docker with those same user and group ids. That way, when the walkingo user in the docker runs the simulator, it will write files to the logdir directory owned by you, and you will be able to easily read them.
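The ownership lookup described above can be sketched in Python. This is only an illustration of the idea, not the wo_docker_start script itself, whose implementation is not shown in this README; the function name owner_ids is hypothetical.

```python
import os

def owner_ids(path):
    """Return the (uid, gid) pair that owns `path`."""
    st = os.stat(path)
    return st.st_uid, st.st_gid

# "logdir" is the bind-mounted log directory named in this README.
# The check is guarded so the sketch also runs where logdir does not exist.
if os.path.isdir("logdir"):
    uid, gid = owner_ids("logdir")
    print(f"logdir is owned by uid={uid}, gid={gid}")
```

A user created inside the container with these two ids will write files that appear on the host as owned by the owner of logdir.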
Run ./build-docker to create a docker image called walkingonions. This image is meant to be run from this same directory.
In the event that you're trying to run this software far enough in the future that the packages downloaded by the Dockerfile used by the ./build-docker script are no longer available or compatible, we have put a copy of the built docker image online. The ./download-docker script will download it (note: the image is 261 MB), verify its sha256 checksum, and use docker load to install the image. You will need the sha256sum utilities, in addition to a version of docker that hopefully still accepts the docker image format our version (18.03.1-ce) generated.
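If you want to check a downloaded file by hand, the checksum step can be sketched in Python as follows. This is a generic SHA-256 comparison, not the ./download-docker script itself; the function names are hypothetical, and the expected digest must come from the repository (none is assumed here).

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so a large image is never loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_checksum(path, expected_hex):
    """Return True if the file's SHA-256 digest matches `expected_hex`."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Reading in chunks keeps memory use small even for an image of a few hundred megabytes.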
To start the docker container, use the ./run-docker command. This will do several things:
walkingo_exp, using the docker image
The docker container will start in the background.
On the host machine (not in the docker container), edit the logdir/run_sims script. This script specifies which simulations you want to run. The simulator has three different circuit creation modes (see the paper for details):
vanilla: Vanilla Onion Routing (equivalent to regular Tor)
telescoping: Telescoping Walking Onions
singlepass: Single-Pass Walking Onions
In addition, the two Walking Onions modes each have two possible SNIP authentication modes:
threshsig: Threshold signatures
merkle: Merkle trees
(Vanilla Onion Routing only has none for the SNIP authentication mode, as it has no SNIPs.)
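The mode names above combine as in the following short sketch. The mode strings are taken from this README; the function name valid_combinations is hypothetical and is not part of the simulator.

```python
# Circuit creation modes and SNIP authentication modes named in this README.
WALKING_ONIONS_MODES = ("telescoping", "singlepass")
SNIP_AUTH_MODES = ("threshsig", "merkle")

def valid_combinations():
    """List the (circuit mode, SNIP auth mode) pairs described above."""
    combos = [("vanilla", "none")]  # Vanilla Onion Routing has no SNIPs
    combos += [(m, a) for m in WALKING_ONIONS_MODES for a in SNIP_AUTH_MODES]
    return combos

for mode, auth in valid_combinations():
    print(mode, auth)
```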
For any of the five valid combinations of circuit creation mode and SNIP authentication mode, the simulator can run at a specified scale. This is a decimal fraction of a network around the size of today's Tor network: 6,500 relays and 2,500,000 clients.
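As a quick worked example of what a scale means, the sketch below multiplies the full-size figures quoted above by the scale. Rounding to the nearest integer is an assumption of this sketch; the simulator may compute its population sizes differently.

```python
FULL_RELAYS = 6500       # full-scale relay count from this README
FULL_CLIENTS = 2500000   # full-scale client count from this README

def network_size(scale):
    """Approximate relay and client counts at a given scale.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    return round(FULL_RELAYS * scale), round(FULL_CLIENTS * scale)

for scale in (0.05, 0.30):
    relays, clients = network_size(scale)
    print(f"scale {scale:.2f}: about {relays} relays, {clients} clients")
```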
The logdir/run_sims file has (initially commented-out) entries for all five mode combinations and a range of scales from 0.05 to 0.30. Edit that file to uncomment the lines for the simulations you want to run.
The simulations can be configured to run for a certain number of epochs. An epoch represents one hour of real time, but the simulator can be much slower than real time, as we will see below. In epoch number 1, the directory authorities start up, and the relays start up and register with the directory authorities. In epoch number 2, the relays bootstrap, and the clients start up. In epoch number 3, the clients bootstrap and start building circuits.
The number of epochs specified in the logdir/run_sims file is the number of epochs in which circuits are built (default 10). However, the first such epoch (epoch 3) is the one in which all clients are bootstrapping, and so it is not part of the "steady state" behaviour. The scripts in the analysis directory thus separate out epoch 3 when computing the steady state, and so each simulation run will contribute 9 epochs' worth of data points.
After epoch 3, some number of relays and clients will disconnect from the network each epoch, and some number will connect and bootstrap. The distributions of these numbers were selected to be reflective of the current Tor network (see the paper for details).
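The epoch arithmetic above can be checked with a small sketch. The function name epoch_breakdown and the returned dictionary keys are hypothetical; only the epoch numbering comes from this README.

```python
def epoch_breakdown(circuit_epochs=10):
    """Break a run into startup, bootstrap and steady-state epochs.

    `circuit_epochs` is the number of epochs in which circuits are built
    (the value given in logdir/run_sims; default 10).
    """
    first_circuit_epoch = 3  # epochs 1 and 2 are startup epochs
    last_epoch = first_circuit_epoch + circuit_epochs - 1
    # Epoch 3 is the client bootstrap epoch, so steady state starts at 4.
    steady = list(range(first_circuit_epoch + 1, last_epoch + 1))
    return {
        "total_epochs": last_epoch,
        "client_bootstrap_epoch": first_circuit_epoch,
        "steady_state_epochs": steady,
    }

info = epoch_breakdown(10)
print(info["total_epochs"], len(info["steady_state_epochs"]))
```

With the default of 10 circuit-building epochs, the run spans epochs 1 through 12, of which epochs 4 through 12 (nine epochs) count toward the steady state.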
Note: these simulations can take a lot of time and memory. They only use a single core each, so if you have multiple cores, you can uncomment multiple lines of the logdir/run_sims file, but you'll need to keep an eye on your RAM usage. The estimated RAM usage for each simulation is documented in the logdir/run_sims file; it ranges (on our machines) from 12 GiB for the smallest 0.05-scale simulations up to 76 GiB for the largest 0.30-scale simulations. Our machines took about 15 hours for each of the smallest simulations, and about 11 days for each of the largest.
Once you have uncommented the simulations you want to run, attach to the docker container with the ./attach-docker command. The docker container is running screen, so you can detach from the docker (without terminating any running simulations) using the screen Ctrl-a d command. If you exit the shell in the docker with exit or just Ctrl-d, and no simulations are running, the screen process will exit, and the docker container will terminate.
Once attached to the docker, start the simulations by running (from the walkingo user's home directory)
Here, seed is a small integer (e.g., 1, 2, 8, or 10) that seeds the random number generator. The intent is that if you run the same simulation with the same seed, you should get identical results. (It turns out that with Python 3.5.2 on Ubuntu 16.04 you do not get identical results, but you do with Python 3.6.9 on Ubuntu 18.04, which is what is installed in the docker image.) For our experiments, we used seeds of 8, 10, 20, 21, 22, and 23. The name of the logfile (e.g., TELESCOPING_MERKLE_0.200000_10_21.log) records the mode, the scale, the number of (circuit-building) epochs, and the seed.
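If you want to group log files by their parameters, the name can be split as in the sketch below. It assumes, based only on the single example name above, that every log name has five underscore-separated fields and that the mode is recorded as the circuit creation mode followed by the SNIP authentication mode. The names LogName and parse_log_name are hypothetical and are not part of the analysis scripts.

```python
from collections import namedtuple

LogName = namedtuple("LogName", "mode auth scale epochs seed")

def parse_log_name(filename):
    """Split a name like MODE_AUTH_SCALE_EPOCHS_SEED.log into its fields.

    The five-field layout is an assumption inferred from the example name.
    """
    stem = filename[:-len(".log")] if filename.endswith(".log") else filename
    mode, auth, scale, epochs, seed = stem.split("_")
    return LogName(mode.lower(), auth.lower(), float(scale), int(epochs), int(seed))

print(parse_log_name("TELESCOPING_MERKLE_0.200000_10_21.log"))
```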
When you start the simulations, screen will switch to showing you the output of your simulation (or one of them if you started more than one). The output is not important to save (the simulator will save the important information in the log files), but it can help you keep an eye on the simulation's progress. To get back to your command line, use the Ctrl-a 0 command to screen (that's a zero, not a letter o). From there, as above, use Ctrl-a d to detach from the docker container while leaving the simulations running. You can re-attach to the running container at any time using the ./attach-docker command.
Once your simulations are complete, you can terminate the docker container by attaching to it, and exiting the shell.
The analysis scripts have two steps:
You can run the analysis scripts in whatever directory you like, but they will put the output dat files and PDFs in the current directory. You're likely to want that directory to be the bind-mounted logdir directory, so that you can access the results from the host machine. So run:
    $ cd logdir
    $ ../analysis/parselogs.py *.log
    $ ../analysis/plotdats.py
The parselogs.py command will parse the log files you give it, and write the dat files to the current directory. The plotdats.py command will turn those dat files into PDF graphs using gnuplot (which is installed in the docker image).
Note that if you did not run simulations for all five mode combinations, you will be missing the corresponding dat files. gnuplot will output warnings that it can't find them when you run plotdats.py, but it will graph the data you do have anyway.
Some of the graphs also plot analytical formulas. These are computations of what the results should be mathematically, and hopefully your simulation results (taking the error bars into account) do in fact follow the analytical formulas. The formulas themselves can be found in the analytical.py file.
The plotdats.py script will produce a number of PDF graphs in the current directory (logdir in the above example):
The above four PDFs are the important ones, and are the ones presented in the paper. There are a number of others, however:
This section contains some brief notes about the structure of the simulator code itself, and where you might find things you may want to modify.
The code is in five modules:
As seen in the paper, there are a number of empirical parameters that were (safely) measured from the live Tor network, such as client and relay churn distributions, the average number of circuits created per client per epoch, and the average size of a consensus diff as compared to the full consensus. If you want to change those parameters, the last one above is called P_Delta and is found in network.py. The others are found near the top of simulator.py.
Thanks for your interest in Walking Onions, and we're happy to answer any questions you may have.