A simulator for the Walking Onions protocol for allowing Tor to scale to large numbers of clients and relays
This repository contains the simulator used in the paper "Walking Onions: Scaling Anonymity Networks while Protecting Users".
The simulator was tested with Python 3.6.9, using the Python packages pynacl (version 1.3.0) and merklelib (version 1.0).
In this repository, you will find:
The simulator is written in Python, so you don't strictly have to build it per se. However, for convenience, compatibility, and reproducibility, we provide a docker environment that is all set up so that you can run the simulator.
The simulator running in the docker container will write its log files into a directory logdir on the host machine via a bind mount. In order that you (the person running the simulator) can read and analyze those log files outside of the docker, the log files should be owned by your user id (on the host machine).
To accomplish this, when the docker image is built, the build-docker script will check what your user and group ids are on the host machine, and create the "walkingo" user in the docker with those same user and group ids. That way, when the walkingo user in the docker runs the simulator, it will write files to the logdir directory owned by you, and you will be able to easily read them.
The downside is that each person running the simulator needs to build their own docker image, since the image is customized to their user and group ids. Luckily this is easy.
Run ./build-docker to create a docker image called walkingonions. As above, this image is meant to be run only by the person that built it, and run from this same directory.
To start the docker container, use the ./run-docker command. This will do several things, including starting a docker container named walkingo_exp, using the docker image walkingonions created above. The docker container will start in the background.
On the host machine (not in the docker container), edit the logdir/run_sims script. This script specifies which simulations you want to run. The simulator has three different circuit creation modes (see the paper for details):
- vanilla: Vanilla Onion Routing (equivalent to regular Tor)
- telescoping: Telescoping Walking Onions
- singlepass: Single-Pass Walking Onions

In addition, the two Walking Onions modes each have two possible SNIP authentication modes:

- threshsig: Threshold signatures
- merkle: Merkle trees

(Vanilla Onion Routing only has none for the SNIP authentication mode, as it has no SNIPs.)
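As a quick illustration of which pairings are valid, the following minimal sketch enumerates them. The constant and function names are hypothetical and are not taken from the simulator code; only the mode names themselves come from the text above.

```python
from itertools import product

# Mode names as listed above; the variable names are illustrative only.
CIRCUIT_MODES = ("vanilla", "telescoping", "singlepass")
SNIP_AUTH_MODES = ("none", "threshsig", "merkle")


def is_valid(circuit_mode, snip_auth):
    """Vanilla has no SNIPs, so it pairs only with 'none';
    the two Walking Onions modes need a real SNIP authentication mode."""
    if circuit_mode == "vanilla":
        return snip_auth == "none"
    return snip_auth != "none"


VALID_COMBINATIONS = [
    combo for combo in product(CIRCUIT_MODES, SNIP_AUTH_MODES)
    if is_valid(*combo)
]

for circuit_mode, snip_auth in VALID_COMBINATIONS:
    print(circuit_mode, snip_auth)
```

This yields one vanilla combination plus two for each Walking Onions mode.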
For any of the five valid combinations of circuit creation mode and SNIP authentication mode, the simulator can run at a specified scale. This is a decimal fraction of a network around the size of today's Tor network: 6,500 relays and 2,500,000 clients.
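To get a feel for what a given scale means, here is a rough sketch of the arithmetic. The function name is hypothetical, and rounding to the nearest integer is an assumption; the simulator may compute the counts differently.

```python
# Full-scale figures quoted above.
FULL_SCALE_RELAYS = 6500
FULL_SCALE_CLIENTS = 2500000


def network_size(scale):
    """Return (relays, clients) for a decimal scale fraction.

    Rounding to the nearest integer is an assumption made for this
    sketch, not necessarily what the simulator does.
    """
    return (round(FULL_SCALE_RELAYS * scale),
            round(FULL_SCALE_CLIENTS * scale))


for scale in (0.05, 0.30):
    relays, clients = network_size(scale)
    print(f"scale {scale:.2f}: {relays} relays, {clients} clients")
```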
The logdir/run_sims file has (initially commented-out) entries for all five mode combinations and a range of scales from 0.05 to 0.30. Edit that file to uncomment the lines for the simulations you want to run.
The simulations can be configured to run for a certain number of epochs. An epoch represents one hour of real time, but the simulator can be much slower than real time, as we will see below. In epoch number 1, the directory authorities start up, and the relays start up and register with the directory authorities. In epoch number 2, the relays bootstrap, and the clients start up. In epoch number 3, the clients bootstrap and start building circuits. The number of epochs specified in the logdir/run_sims file is the number of epochs in which circuits are built (default 10). However, the first such epoch (epoch 3) is the one in which all clients are bootstrapping, and so it is not part of the "steady state" behaviour. The scripts in the analysis directory thus separate out epoch 3 when computing the steady state, and so each simulation run will contribute 9 epochs' worth of data points. After epoch 3, some number of relays and clients will disconnect from the network each epoch, and some number will connect and bootstrap. The distributions of these numbers were selected to be reflective of the current Tor network (see the paper for details).
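The epoch accounting described above can be sketched as follows. This is only an illustration of the arithmetic; the names are hypothetical and do not come from the simulator or the analysis scripts.

```python
# Epoch 1: directory authorities and relays start up.
# Epoch 2: relays bootstrap and clients start up.
WARMUP_EPOCHS = 2
# Epoch 3 is the first epoch in which circuits are built.
FIRST_CIRCUIT_EPOCH = WARMUP_EPOCHS + 1


def epoch_plan(circuit_epochs=10):
    """Split a run into warm-up, client-bootstrap, and steady-state epochs."""
    if circuit_epochs < 1:
        raise ValueError("need at least one circuit-building epoch")
    last_epoch = WARMUP_EPOCHS + circuit_epochs
    return {
        "total_epochs": last_epoch,
        "bootstrap_epoch": FIRST_CIRCUIT_EPOCH,
        # Everything after the client-bootstrap epoch counts as steady state.
        "steady_state_epochs": list(range(FIRST_CIRCUIT_EPOCH + 1,
                                          last_epoch + 1)),
    }


plan = epoch_plan(10)
print(plan["total_epochs"], len(plan["steady_state_epochs"]))
```

With the default of 10 circuit-building epochs, the run spans epochs 1 through 12, and epochs 4 through 12 supply the 9 steady-state data points.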
Note: these simulations can take a lot of time and memory. They only use a single core each, so if you have multiple cores, you can uncomment multiple lines of the logdir/run_sims file, but you'll need to keep an eye on your RAM usage. The estimated RAM usage for each simulation is documented in the logdir/run_sims file; it ranges (on our machines) from 12 GiB for the smallest 0.05-scale simulations up to 76 GiB for the largest 0.30-scale simulations. Our machines took about 15 hours for each of the smallest simulations, and about 11 days for each of the largest.
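Before uncommenting several lines at once, it can help to add up the estimated RAM figures. The helper below is a hypothetical sketch, not part of the repository; the headroom value is an arbitrary assumption, and the per-simulation estimates should be taken from the comments in logdir/run_sims.

```python
def fits_in_ram(planned_gib, available_gib, headroom_gib=2.0):
    """Return True if the planned simulations' combined estimated RAM,
    plus some headroom for the rest of the system, fits in memory.

    planned_gib: estimated RAM (GiB) of each simulation to run in parallel.
    headroom_gib: arbitrary safety margin assumed for this sketch.
    """
    return sum(planned_gib) + headroom_gib <= available_gib


# Two of the smallest simulations plus one of the largest, using the
# 12 GiB and 76 GiB figures quoted above.
print(fits_in_ram([12, 12, 76], available_gib=128))
# One of the largest simulations on a 64 GiB machine.
print(fits_in_ram([76], available_gib=64))
```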
Once you have uncommented the simulations you want to run, attach to the docker container with the ./attach-docker command. The docker container is running screen, so you can detach from the docker (without terminating any running simulations) using the screen Ctrl-a d command. If you exit the shell in the docker with exit or just Ctrl-d, and no simulations are running, the screen process will exit, and the docker container will terminate.
Once attached to the docker, start the simulations by running (from the walkingo user's home directory) logdir/run_sims seed, where seed is a small integer (e.g., 1, 2, 8, or 10) that seeds the random number generator. The intent is that if you run the same simulation with the same seed, you should get identical results. (It turns out that if you use Python 3.5.2 on Ubuntu 16.04, you do not get identical results, but you do with Python 3.6.9 on Ubuntu 18.04, which is what is installed in the docker image.) For our experiments, we used seeds of 8, 10, 20, 21, 22, and 23. The name of the logfile (e.g., TELESCOPING_MERKLE_0.200000_10_21.log) records the mode, the scale, the number of (circuit-building) epochs, and the seed.
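If you want to sort or filter log files by their parameters, the name can be split apart as in the sketch below. The pattern is inferred from the single example name above, so treating the mode as two uppercase tokens (circuit creation mode, then SNIP authentication mode) is an assumption; the names LogName and parse_log_name are hypothetical and not part of the analysis scripts.

```python
import re
from typing import NamedTuple


class LogName(NamedTuple):
    circuit_mode: str
    snip_auth: str
    scale: float
    epochs: int
    seed: int


# Inferred layout: CIRCUITMODE_SNIPAUTH_SCALE_EPOCHS_SEED.log
_LOG_NAME = re.compile(
    r"^(?P<circuit>[A-Z]+)_(?P<auth>[A-Z]+)_"
    r"(?P<scale>\d+\.\d+)_(?P<epochs>\d+)_(?P<seed>\d+)\.log$"
)


def parse_log_name(filename):
    """Split a simulator log file name into its recorded parameters."""
    match = _LOG_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognized log file name: {filename!r}")
    return LogName(
        circuit_mode=match.group("circuit").lower(),
        snip_auth=match.group("auth").lower(),
        scale=float(match.group("scale")),
        epochs=int(match.group("epochs")),
        seed=int(match.group("seed")),
    )


print(parse_log_name("TELESCOPING_MERKLE_0.200000_10_21.log"))
```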
When you run the logdir/run_sims seed command, screen will switch to showing you the output of your simulation (or one of them if you started more than one). The output is not important to save (the simulator will save the important information in the log files), but it can help you keep an eye on the simulation's progress. To get back to your command line, use the Ctrl-a 0 command to screen (that's a zero, not a letter o). From there, as above, use Ctrl-a d to detach from the docker container while leaving the simulations running. You can re-attach to the running container at any time using the ./attach-docker command.
Once your simulations are complete, you can terminate the docker container by attaching to it, and exiting the shell.
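The analysis scripts have two steps: parselogs.py parses the simulation log files into dat files, and plotdats.py turns those dat files into PDF graphs.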
You can run the analysis scripts in whatever directory you like, but they will put the output dat files and PDFs in the current directory. You're likely to want that directory to be the bind-mounted logdir directory, so that you can access the results from the host machine. So run:
$ cd logdir
$ ../analysis/parselogs.py *.log
$ ../analysis/plotdats.py
The parselogs.py command will parse the log files you give it, and write the dat files to the current directory. The plotdats.py command will turn those dat files into PDF graphs using gnuplot (which is installed in the docker image).
Note that if you did not run simulations for all five mode combinations, you will be missing the corresponding dat files. gnuplot will output warnings that it can't find them when you run plotdats.py, but it will graph the data you do have anyway.
Some of the graphs also plot analytical formulas. These are computations of what the results should be mathematically, and hopefully your simulation results (taking the error bars into account) do in fact follow the analytical formulas. The formulas themselves can be found in the analytical.py file.
The plotdats.py script will produce a number of PDF graphs in the current directory (logdir in the above example):
The above four PDFs are the important ones, and are the ones presented in the paper. There are a number of others, however:
This section contains some brief notes about the structure of the simulator code itself, and where you might find things you may want to modify.
The code is in five modules: client.py, dirauth.py, network.py, relay.py, and simulator.py.
As seen in the paper, there are a number of empirical parameters that were (safely) measured from the live Tor network, such as client and relay churn distributions, the average number of circuits created per client per epoch, and the average size of a consensus diff as compared to the full consensus. If you want to change those parameters, the last one above is called P_Delta and is found in network.py. The others are found near the top of simulator.py.
Thanks for your interest in Walking Onions, and we're happy to answer any questions you may have.