Implementation and results for DHTPIR simulations

Ian Goldberg 64ebbe5d68 Update to the latest cuckoo simulation code		3 years ago
cuckoo_simulation	64ebbe5d68 Update to the latest cuckoo simulation code	3 years ago
dhtpir_simulation	44de1f2746 Preparing artifacts for external review	3 years ago
outputs	44de1f2746 Preparing artifacts for external review	3 years ago
plots	44de1f2746 Preparing artifacts for external review	3 years ago
README.md	44de1f2746 Preparing artifacts for external review	3 years ago

DHTPIR

Simulation code for research performed by Miti Mazmudar, Stan Gurtler, and Ian Goldberg at the University of Waterloo

Directories

./dhtpir_simulation/: The code used to simulate DHTPIR and other related works, in order to compare their respective computational and bandwidth usage. (Details on running this code in "Usage" below)
./cuckoo_simulation/: The code used to simulate behaviour of the cuckoo and commensal cuckoo rules referred to in the work, in order to obtain specific values for the relevant parameters. (Details on running this code in "Usage" below)
./outputs/: The raw outputs of the code in ./dhtpir_simulation/, as obtained when we ran it on our machines. Subdirectories are organized by type of system, then by the number of quorums in the simulation, then by the number of nodes per quorum, then by the number of documents in the simulation, then by seed. At the leaf level, a variety of output files (depending on the system) are present describing different measured values for that given simulation run.
./plots/: The graphs we generated from these outputs, as well as the code to generate these graphs. Many of these graphs are in the work itself, but a few additional graphs are also present in this directory; further detail on graphs in "Graphs" below.

Usage

DHTPIR Simulation

To run an individual run of our simulator, you will need to be in the ./dhtpir_simulation/ directory. From there, run ./test_harness.py <type> <numDocuments> <sizeOfDocuments> <numGroups> <numNodes> --seed [seed], with appropriate values for each of these variables.

<type> takes one of the following forms, depending on which type of system you intend to simulate:

-b: "BaseNode", or a basic DHT system with no additional privacy-preserving or secure behaviour of any kind simulated
-r: "RCPNode", which simulates the RCP procedures described by Young et al. on top of a basic DHT system
-q: "QPNode", which simulates the QP procedures described by Backes et al. on top of the RCP system (but does not implement any form of oblivious transfer at the final quorum, which returns back the sought document)
-l: "QPNode + LastHop", which simulates the same QP procedures as -q, but additionally implements oblivious transfer at the final quorum (which returns back the sought document)
-d: "DHTPIRNode", which simulates DHTPIR procedures (that is, QP procedures until the final quorum, and then IT-PIR among the final quorum instead of oblivious transfer)

<numDocuments> describes the number of documents that will be in the system during simulation (as used in the paper, these are intended to simulate erasure-coded chunks of documents, rather than full documents themselves).

<sizeOfDocuments> describes the size of documents in bytes that will be in the system during simulation (again, as used in the paper, these actually set the size of the erasure-coded chunks of documents: we used 512 bytes consistently throughout our simulation). All documents in the system will be randomly generated with exactly the specified number of bytes.

<numGroups> describes the number of quorums in the system as a whole. For BaseNode, this describes the number of total nodes in the DHT, as BaseNode does not use quorums at all.

<numNodes> describes the number of nodes per quorum in the system. For BaseNode, this value is ignored, as BaseNode does not use quorums.

[seed] is an optional parameter, which seeds the randomness used to generate documents. Because document IDs within the system are their hashes (in our case, the SHA256 hash of the file), this impacts the distribution of documents throughout the system; in our simulations, we ran several instances of each type of node with each set of relevant parameters, varying the seed, in a Monte Carlo simulation.

In addition, ./test_harness.py -h displays a help message similar in content to what is discussed in this README.

There are two other programs in the directory, options_setup.c and run_tests.c. options_setup.c generates a config file used by run_tests.c to run several simulation runs simultaneously/in succession. Both can be compiled in the normal ways (e.g. gcc -o options_setup options_setup.c), without any additional installation necessary beyond gcc or clang (though run_tests.c will need the -pthread flag during compilation/linking). options_setup.c does not take any arguments, and run_tests.c takes one argument, the maximum number of processes it may run simulations inside of simultaneously.

Cuckoo Simulation

In order to build this code, you will need Cargo installed. From the ./dhtpir_simulation/ directory, run cargo build in order to build the simulation code. With the code built, run cargo run --bin cuckoo_simulation <honest> <malicious> <log2_regions> <log2_regionsPerQuorum> <k> <nodesPerRegion> <iterations> <seed>, with appropriate values for each of these variables.

<honest> describes the number of honest nodes in the system.

<malicious> describes the number of malicious nodes in the system.

<log2_regions> describes log2(the number of regions in the system).

<log2_regionsPerQuorum> describes log2(the number of regions per quorum in the system).

<k> describes k-1, the number of secondary joins in the commensal cuckoo rule before new primary joins are accepted.

<nodesPerRegion> describes the desired number of nodes in a given region.

<seed> is a random seed.

There is an additional file in this directory, cuckoo_vars.py, which was used to do arithmetic to calculate certain variables associated with the cuckoo and commensal cuckoo rules as used in the work.

Graphs

The graphs present in the ./plots/ directory are generated by make_graphs.py, from output in the ./outputs/ directory, using plots.json as a config file for determining text content in the graphs. Simply running ./make_graphs.py from the ./plots/ directory, so long as plots.json has not been altered (or has been altered appropriately to generate new graphs), will generate these graphs.

The graphs present are organized as follows: <y-axis>-<x-axis>-<which>.pdf.

The <y-axis> value can be one of the following:

"latencies": latency per query request (generally in seconds)
"num_bytes_recv": number of bytes sent by the client (received by the system) per query request
"num_bytes_sent": number of bytes received by the client (sent by the system) per query request
"num_bytes_total": sum of "num_bytes_sent" and "num_bytes_recv"
"num_pub_bytes_total": number of bytes required for a client to publish a document (note that, as DHTPIR does not do anything novel around publishing a document, merely around retrieving it, this graph does not show much of interest)
"throughputs": throughput of the system, in maximum number of lookups possible per second

The <x-axis> value can be one of the following:

"num_chunks_per_quorum": Number of chunks per quorum. The <which> associated with these graphs identifies what the number of quorums and nodes per quorum were in the simulation runs that make the selected graph.
"num_quorums": Number of quorums. The <which> associated with these graphs identifies what the number of chunks per quorum were in the simulation runs that make the selected graph.

README.md