# Artifact Appendix

Paper title: TEEMS: A Trusted Execution Environment based Metadata-protected Messaging System

Artifacts HotCRP Id: 8

Requested Badge: Reproduced

## Description

This repository contains the source code for TEEMS, as well as the dockerization and experimentation scripts we used to generate the data files that are plotted in Figure 7 (varying the number of clients) and Figure 8 (varying the number of server cores) in our paper.

This repository only contains the TEEMS code; the data for the comparator scheme Sparta-SB (also plotted in those figures) was obtained by running [our dockerization](https://git-crysp.uwaterloo.ca/iang/sparta-experiments-docker) of [the original Sparta code](https://github.com/ucsc-anonymity/sparta-experiments/).

### Security/Privacy Issues and Ethical Concerns (All badges)
N/A

## Basic Requirements

### Hardware Requirements

TEEMS is based on Intel's SGX trusted execution environment, version 2 (SGX2), so you will need a server with one or more Intel Xeon CPUs that support SGX2.  In addition to CPU support, you may have to enable SGX in the machine's BIOS, if it is not already enabled.  The things to look for and turn on in the BIOS are:

  - SGX support
  - Flexible Launch Control (FLC)
  - Total Memory Encryption (TME)
  - Enclave Page Cache (EPC)

If your BIOS gives you a choice of how much EPC to allocate, choose a value of **at least 5 GiB** if you want to run the largest experiments we report in the figures in the paper.

### Hardware requirements to run all of the experiments

In order to run all of the experiments to replicate Figures 7 and 8 in the paper, your server will need at least:

  - 80 CPU cores (not hyperthreads)
  - 20 GiB of available RAM
  - 5 GiB of available EPC

We used a machine with two Intel Xeon 8380 CPUs (40 cores each), running at 2.3 GHz.

To see how much EPC you have available, you can use either of the following (or the lower-level sketch just after this list):

  - Install the `cpuid` program (`apt install cpuid`), and run `./epc_probe.py`
  - Build the TEEMS docker (see [below](#set-up-the-environment)), and then run `docker/epc-probe`
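
If you would rather inspect the raw values yourself, the following is a rough sketch of what those probes look at (the exact `cpuid` raw-output format and kernel log wording vary by version, so treat this as illustrative, not authoritative):

```bash
# Rough sketch: EPC sections are enumerated in CPUID leaf 0x12,
# subleaves 2 and up.  For a valid section (EAX bits 3:0 == 1), the
# section size in bytes is (ECX & 0xfffff000) | ((EDX & 0xfffff) << 32).
cpuid -1 -l 0x12 -s 2 -r

# The Linux SGX driver also logs the EPC sections at boot:
sudo dmesg | grep -i 'EPC section'
```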

### Minimum hardware requirements to run TEEMS at all

If your server does not meet the above specs, you may still be able to run TEEMS to get a feel for it, but the results will of course not be directly comparable to those in the figures.

If you have fewer than 80 cores, you can set the environment variable `OVERLOAD_CORES=1` to tell the experimentation scripts to run multiple server and simulated client threads on the same core.

If you have less than 20 GiB of available RAM and/or less than 5 GiB of available EPC, you can set the environment variable `SHRINK_TO_MEM=1` to tell the experimentation scripts to run smaller experiments than those reported in the paper, but ones that will fit in your machine's configuration.  The sizes of the experiments to run will be autodetected based on your available RAM and EPC.  Even with that setting, however, you will _minimally_ require more than 0.9 GiB of EPC and 4.2 GiB of RAM.

### Software Requirements

The recommended OS is Ubuntu 22.04, but any Linux with a kernel version of at least 5.11 should work.  If your BIOS is configured properly, and you have the required SGX2 support, the device files `/dev/sgx_enclave` and `/dev/sgx_provision` should both exist.
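
A quick sanity check for both device files (just a shell sketch, not part of the artifact's scripts):

```bash
# Sketch: confirm the SGX2 device nodes exist before proceeding.
for dev in /dev/sgx_enclave /dev/sgx_provision; do
  if [ -e "$dev" ]; then
    echo "found $dev"
  else
    echo "MISSING $dev -- check your BIOS settings and kernel version"
  fi
done
```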

You will need docker installed; on Ubuntu 22.04, for example: `apt install docker.io`

The TEEMS client simulator will open a socket to each client's ingestion server and storage server.  It will therefore require a number of open sockets that is twice the number of simulated users, plus a handful of additional file descriptors for logfiles, etc.  Our largest experiments simulate 1048576 (2<sup>20</sup>) users, and so the client simulator will use 2097162 file descriptors.  You will need to configure your OS to allow a process to have that many file descriptors.  You can see the maximum number currently configured with:

`sysctl fs.nr_open`

and you can set the value to 2097162 with:

`sysctl fs.nr_open=2097162`

In addition, **if** you attempt to run TEEMS on your bare machine instead of in a docker (running in a docker is strongly recommended), you will also need to similarly set:

`sysctl net.nf_conntrack_max=2097162`

Again, setting `net.nf_conntrack_max` is not necessary if you run TEEMS in a docker, as outlined in the instructions below.
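
Settings made with `sysctl` on the command line do not persist across reboots.  If you want the file-descriptor limit to stick, one way (a sketch; the file name under `/etc/sysctl.d` is arbitrary) is:

```bash
# Sketch: persist the fs.nr_open setting across reboots.
echo 'fs.nr_open=2097162' | sudo tee /etc/sysctl.d/90-teems.conf
sudo sysctl --system    # re-read all sysctl configuration files now
```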

### Estimated Time and Storage Consumption

On our system, building the docker image took about 30 minutes and 4 GB of disk space.  Running the experiments took about 3 hours and 20 MB of disk space.

## Environment

### Accessibility

This artifact is hosted at the [`popets-artifact`
tag](https://git-crysp.uwaterloo.ca/iang/teems/src/popets-artifact) of the
[https://git-crysp.uwaterloo.ca/iang/teems](https://git-crysp.uwaterloo.ca/iang/teems)
git repository.

### Set up the environment

Clone the repo and check out the `popets-artifact` tag:

```bash
git clone https://git-crysp.uwaterloo.ca/iang/teems
cd teems
git checkout popets-artifact
```

To build the TEEMS docker, just run:

`docker/build-docker`

This should take about half an hour, depending on your hardware and network speeds.

### Testing the Environment

To ensure everything is set up properly (hardware and software), run the short "kick the tires" test:

`docker/short-test`

This should take 1 to 2 minutes.  Set the `OVERLOAD_CORES` environment variable, as described [above](#minimum-hardware-requirements-to-run-teems-at-all), if your server has fewer than 5 physical cores; for example:

`OVERLOAD_CORES=1 docker/short-test`

The output should end with something like the following:

```
=== Short test output ===

N,M,T,B,E,epoch_mean,epoch_stddev,epoch_max,wn_mean,wn_stddev,wn_max,bytes_mean,bytes_stddev,bytes_max
128,4,1,256,2,0.181,0.034,0.219,0.005,0.001,0.005,30287.250,940.700,31063.000
```

The output fields are:

  - `N`: the number of simulated TEEMS clients
  - `M`: the number of TEEMS server processes
  - `T`: the number of threads per server process
  - `B`: the number of bytes per TEEMS message
  - `E`: the number of completed epochs
  - `epoch_{mean,stddev,max}`: the mean, stddev, and max of the epoch times (in seconds)
  - `wn_{mean,stddev,max}`: the mean, stddev, and max of the time to precompute the Waksman networks needed for the next epoch (in seconds)
  - `bytes_{mean,stddev,max}`: the mean, stddev, and max of the total number of bytes transmitted from a given server process to all other server processes
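
If you want to turn these numbers into a throughput figure, messages per second is simply `N` divided by the mean epoch time (see the discussion of Figure 7 below).  A minimal sketch, assuming you have saved the CSV lines above (header included) to a hypothetical file `short-test.csv`:

```bash
# Sketch: throughput = N / epoch_mean for the short-test CSV format
# (columns: N,M,T,B,E,epoch_mean,...).  NR > 1 skips the header row.
awk -F, 'NR > 1 { printf "N=%d: %.0f msgs/s\n", $1, $1 / $6 }' short-test.csv
```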

## Artifact Evaluation

### Main Results and Claims

#### Main Result 1: Latency and throughput with fixed servers (client scaling experiments, Figure 7)

Figure 7 demonstrates the epoch times (minimum latency) achievable by TEEMS (and the comparator system, Sparta) for a fixed server configuration of 16 cores, as the number of clients (each sending one message per epoch) scales from 2<sup>15</sup> to 2<sup>20</sup>.  Maximum throughput in this setting is simply the number of messages sent per epoch divided by the minimum epoch time, so throughput is inversely proportional to latency for a fixed number of clients.  (For example, 2<sup>20</sup> clients completing an ID-channel epoch in 3.823 seconds, as in the repro output below, corresponds to a throughput of 1048576 / 3.823 ≈ 274,000 messages per second.)

This figure shows that (for both TEEMS' ID and token channels):

  - TEEMS' latency scales approximately linearly with the number of clients (save for small overheads, which are more visible at low client counts, where epoch times are well under a second).
  - For 2<sup>20</sup> clients, TEEMS' latency is more than one order of magnitude lower than that of Sparta (total of send and fetch), resulting in a throughput more than one order of magnitude higher.

Note that TEEMS' latencies _do_ include network communication latencies in addition to compute latencies, while Sparta's _do not_.  This gives an advantage to Sparta in the comparison.  In addition, TEEMS is transmitting 256-byte messages, while Sparta is transmitting only 128-byte messages, again giving an advantage to Sparta.

#### Main Result 2: Horizontal scalability (core scaling experiments, Figure 8)

Figure 8 demonstrates the epoch times (minimum latency) achievable by TEEMS and Sparta for a fixed number of clients (2<sup>20</sup>), each sending one message per epoch, while the number of server cores scales from 4 to 72.  (The experiment reserves 8 cores out of 80 for executing the clients.)  Here again, throughput is inversely proportional to latency for a fixed number of clients.

This figure shows that (for both TEEMS' ID and token channels):

  - TEEMS' latency decreases consistently as the number of server cores increases, until the small overheads become dominant for sub-second epoch times.  (Note that if you have fewer than 80 cores on your experimental machine, you will see a "bottoming out" of the minimum epoch times when you exhaust your available cores.)
  - Sparta's latency (total of send and fetch) starts out (for 4 cores) more than one order of magnitude higher than that of TEEMS, so its throughput is more than one order of magnitude lower.  As the number of cores grows, Sparta's latency decreases more slowly than TEEMS', yielding even larger ratios for increasing core counts.

This experiment gives the same communication latency and message size advantages to Sparta as in the experiment above.

### Experiments

#### TEEMS experiments

To run all the experiments needed to generate the TEEMS data plotted in Figures 7 and 8, just run:

`docker/repro`

As [above](#minimum-hardware-requirements-to-run-teems-at-all), if you need to set the environment variables `OVERLOAD_CORES` (if you have fewer than 80 physical cores) and/or `SHRINK_TO_MEM` (if you have less than 20 GiB of available RAM and/or less than 5 GiB of EPC), do it here.  For example:

`OVERLOAD_CORES=1 SHRINK_TO_MEM=1 docker/repro`

The `docker/repro` script should take about 2 to 3 hours to run (plus a bit if `OVERLOAD_CORES` is set to 1, because the CPU cores will be overloaded, and minus a bit if `SHRINK_TO_MEM` is set to 1, because the experiments being run will be smaller).

The end of the output should look like the following.  (This is in fact the exact output we obtained, and what is plotted in the figures.)

```
*** Adding latency corresponding to 13 Gbps bandwidth between servers ***

=== Figure 7 ID channel ===

N,M,T,B,epoch_mean,epoch_stddev
32768,4,4,256,0.455,0.029
65536,4,4,256,0.612,0.034
131072,4,4,256,0.842,0.036
262144,4,4,256,1.220,0.033
524288,4,4,256,2.120,0.030
1048576,4,4,256,3.823,0.032

=== Figure 7 Token channel ===

N,M,T,B,epoch_mean,epoch_stddev
32768,4,4,256,0.260,0.018
65536,4,4,256,0.360,0.020
131072,4,4,256,0.526,0.026
262144,4,4,256,0.841,0.030
524288,4,4,256,1.473,0.029
1048576,4,4,256,2.622,0.039

=== Figure 8 ID channel ===

N,M,T,B,epoch_mean,epoch_stddev
1048576,4,1,256,5.078,0.167
1048576,6,1,256,3.829,0.236
1048576,8,1,256,3.070,0.102
1048576,16,1,256,1.761,0.051
1048576,20,1,256,1.500,0.049
1048576,24,1,256,1.315,0.045
1048576,32,1,256,1.131,0.031
1048576,36,1,256,1.056,0.036
1048576,40,1,256,0.981,0.027
1048576,44,1,256,0.958,0.026
1048576,48,1,256,0.864,0.028
1048576,64,1,256,0.800,0.038
1048576,72,1,256,0.804,0.032

=== Figure 8 Token channel ===

N,M,T,B,epoch_mean,epoch_stddev
1048576,4,1,256,4.375,0.348
1048576,6,1,256,3.091,0.198
1048576,8,1,256,2.389,0.085
1048576,16,1,256,1.336,0.035
1048576,20,1,256,1.100,0.036
1048576,24,1,256,0.957,0.031
1048576,32,1,256,0.798,0.034
1048576,36,1,256,0.724,0.026
1048576,40,1,256,0.661,0.026
1048576,44,1,256,0.650,0.026
1048576,48,1,256,0.574,0.022
1048576,64,1,256,0.537,0.035
1048576,72,1,256,0.529,0.026
```

In this output, the fields have the same meanings as for the short test above, but the cost of transmitting `bytes_max` bytes at a network bandwidth of 13 Gbps (as reported in the paper) has been added to the epoch times.

In addition, the files `id-channel.csv` and `token-channel.csv` will be left in the `docker` directory, containing all of the `{epoch,wn,bytes}_{mean,stddev,max}` statistics.
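
If you would like a quick look at which statistics those files contain, a small sketch (assuming, as with the console output above, that each file begins with a header row):

```bash
# Sketch: print the column names of the generated statistics files.
for f in docker/id-channel.csv docker/token-channel.csv; do
  echo "== $f =="
  head -1 "$f" | tr ',' '\n'
done
```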

**Note**: if you just want to regenerate the above output (with the network cost added to the epoch times, and sorted into figures and data lines) from existing `id-channel.csv` and `token-channel.csv` files, **without** re-running all of the experiments, you can pass the `-n` flag to `docker/repro`:

`docker/repro -n`

#### Sparta experiments

As mentioned above, to get the data for the comparator scheme Sparta-SB, follow [the README for our dockerization](https://git-crysp.uwaterloo.ca/iang/sparta-experiments-docker/src/main/docker/README.md) of [the original Sparta code](https://github.com/ucsc-anonymity/sparta-experiments/).

Building the Sparta docker should take about 3.5 GB of disk space and 4 minutes (on our machine).  Running the Sparta experiments should take about 20 minutes and negligible disk space.

The tl;dr:

  - Ensure you have the `aesmd` service running on your machine.  You should have a file called `/var/run/aesmd/aesm.socket` on your system.  See [the README](https://git-crysp.uwaterloo.ca/iang/sparta-experiments-docker/src/main/docker/README.md#software-requirements) for instructions if you don't.

  - Run:

```bash
git clone https://git-crysp.uwaterloo.ca/iang/sparta-experiments-docker
cd sparta-experiments-docker/docker
./build-docker
./run-clientscale-experiments | tee /dev/stderr | ./parse-clientscale-logs
./run-corescale-experiments | tee /dev/stderr | ./parse-corescale-logs
```

## Limitations
N/A

## Notes on Reusability
N/A