# sparta-experiments dockerization

This repository is a dockerization of the
[ucsc-anonymity/sparta-experiments](https://github.com/ucsc-anonymity/sparta-experiments)
repository. This dockerization can be used to reproduce the Sparta-SB
datapoints in Figures 7 and 8 of the TEEMS paper:

Sajin Sasy, Aaron Johnson, and Ian Goldberg. [TEEMS: A Trusted Execution
Environment based Metadata-protected Messaging
System](https://cs.uwaterloo.ca/~iang/pubs/teems-popets25.pdf).
Proceedings on Privacy Enhancing Technologies, Vol. 2025, No. 4, July 2025.

This dockerization by Ian Goldberg, iang@uwaterloo.ca.

## Hardware requirements

You will need a server with an Intel Xeon CPU that supports SGX2, and SGX2
must be enabled in the BIOS. To fully reproduce the graphs in the paper,
you will need 72 CPU cores (not hyperthreads); if you have fewer, your
experiments will just run more slowly (Figure 7) or only partially
(Figure 8). If you aim to compare your results to ours: we used a machine
with two 40-core Intel Xeon 8380 CPUs running at 2.3 GHz to generate the
figures.

## Software requirements

The server should run Linux with a kernel of at least version 5.11. We
used Ubuntu 22.04.

SGX2 must be enabled on your machine, and so you should see the device
files `/dev/sgx/enclave` and `/dev/sgx/provision`.

You will need `docker`. On Ubuntu, for example: `apt install docker.io`.
Be sure to run all the experiments as a user with docker permissions (in
the `docker` group).

You will need the `aesmd` service. If the file
`/var/run/aesmd/aesm.socket` exists, then all is well: the `aesmd` service
is already running on your machine. If not, run:

```bash
sudo mkdir -p -m 0755 /var/run/aesmd
docker run -d --rm --device /dev/sgx/enclave --device /dev/sgx/provision \
  -v /var/run/aesmd:/var/run/aesmd --name aesmd fortanix/aesmd
```

That will start the `aesmd` service in a Docker container, and you should
then see the `/var/run/aesmd/aesm.socket` file.
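As a quick sanity check before building anything, you can verify that the
SGX device files and the `aesmd` socket mentioned above are all present.
This loop is just a convenience sketch using the paths from this section:

```bash
# Check for the SGX2 device files and the aesmd socket described above.
# These are the paths this README expects; newer upstream kernels may
# instead expose /dev/sgx_enclave and /dev/sgx_provision.
for f in /dev/sgx/enclave /dev/sgx/provision /var/run/aesmd/aesm.socket; do
    if [ -e "$f" ]; then
        echo "OK:      $f"
    else
        echo "MISSING: $f"
    fi
done
```

If any of the three paths is reported `MISSING`, revisit the corresponding
requirement above before running the experiments.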
If you started `aesmd` with this docker method, then when you're done with
the experiments:

```bash
docker stop aesmd
```

You will need `python3` with `numpy` (on the _host_ machine) to run the
log parser scripts.

## Quickstart

Once you have SGX2 and `aesmd` set up, the following will build sparta and
its dependencies, and run the experiments:

```bash
git clone https://git-crysp.uwaterloo.ca/iang/sparta-experiments-docker
cd sparta-experiments-docker/docker
./build-docker
./run-clientscale-experiments | tee /dev/stderr | ./parse-clientscale-logs
./run-corescale-experiments | tee /dev/stderr | ./parse-corescale-logs
```

Running the clientscale experiments (Figure 7) should take about 4
minutes; running the corescale experiments (Figure 8) should take about 13
minutes. (These runtimes are on a 2.3 GHz CPU, so your runtimes may vary.)
See below for optional arguments you can pass to the experiment scripts.

## Details

### Changes to the upstream repository

Other than the dockerization, this repository makes the following changes
to [the original ucsc-anonymity/sparta-experiments
repository](https://github.com/ucsc-anonymity/sparta-experiments):

- Added functions for sparta to return the current size of its message
  store
  ([commit b8d48d7](https://git-crysp.uwaterloo.ca/iang/sparta-experiments-docker/commit/b8d48d7a77abd746d581a7b429699c2b965e34fc)).
  As we note in Section A.3 of the TEEMS paper, Sparta's message store
  grows whenever any user sends a message, but _does not shrink_ when
  messages are retrieved. These functions enable demonstration of this
  behaviour.

- Increased the maximum number of threads and heap size for the SGX
  enclave
  ([commit 0b32698](https://git-crysp.uwaterloo.ca/iang/sparta-experiments-docker/commit/0b32698ca2885dc5dd5d0c3e05daac248e18ec4c)
  and
  [commit afc33a7](https://git-crysp.uwaterloo.ca/iang/sparta-experiments-docker/commit/afc33a780823ab4e785db77322661135902c36b9)).
  When we experimented with multiple rounds of message sending, the
  message store grew larger than the configured maximum enclave heap size.
  Our machine also had more cores than the maximum number of configured
  threads.

- The original code did a single batch send, followed by a configurable
  number of batch fetches, and timed only the fetches. We slightly
  rearrange the code to do a configurable number of (batch send + batch
  fetch) rounds
  ([commit d11e4fc](https://git-crysp.uwaterloo.ca/iang/sparta-experiments-docker/commit/d11e4fc5a617829a3bf1b125f7b1caa7ae105cf8)).
  For each round, we report:

  - The number of messages sent in the batch (the number of users)
  - How many batches have been sent so far (the round number)
  - The size of the message store after this round
  - The time to send the batch
  - The time to fetch the batch
  - The total time to send and fetch the batch (the sum of the previous two)

  All times are in seconds.

### The clientscale experiments

The `./run-clientscale-experiments` script will generate the data for
Figure 7 in our paper, which holds the number of cores fixed at 16, and
varies the number of messages per batch from 32768 to 1048576. (If you
have fewer than 16 cores, the script will just use however many you have,
but will of course run more slowly.) This script can take two optional
parameters:

- `niters` (defaults to 1): the number of times to run the experiment.
  The variance is quite small, so this does not need to be large. Even 1
  is probably fine if you're just checking that your results are similar
  to ours. (Your CPU speed will of course be a factor if you compare your
  raw timings to ours.)

- `nrounds` (defaults to 1): the number of (send + fetch) rounds to do per
  batch size per experiment. To give maximal benefit to the Sparta-SB data
  points in our Figure 7, we only used 1 round (just like the original
  code).
  However, by setting this higher, you can see the effect described above,
  where larger numbers of sending rounds cause the message store to get
  larger, and the send and fetch operations to get slower.

The output of the `./run-clientscale-experiments` script can be parsed by
the `./parse-clientscale-logs` script, which will output a CSV with
columns:

- Number of users
- Number of batches sent (sending round)
- Time to send that batch (mean and stddev over the `niters` experiment runs)
- Time to fetch that batch (mean and stddev over the `niters` experiment runs)
- Total time to send and fetch that batch (mean and stddev over the
  `niters` experiment runs)

These are the values (with sending round equal to 1) plotted in Figure 7
in the TEEMS paper. Our exact results (and what is plotted in the figure)
were:

```
users,batches,send_mean,send_stddev,fetch_mean,fetch_stddev,tot_mean,tot_stddev
32768,1,0.676,0.020,1.182,0.016,1.858,0.028
65536,1,1.301,0.014,2.190,0.013,3.491,0.017
131072,1,2.488,0.033,4.049,0.052,6.537,0.055
262144,1,4.959,0.042,7.856,0.071,12.816,0.085
524288,1,9.742,0.102,15.284,0.144,25.027,0.167
1048576,1,18.448,0.340,29.074,0.367,47.522,0.595
```

### The corescale experiments

The `./run-corescale-experiments` script will generate the data for
Figure 8 in our paper, which holds the batch size fixed at 1048576, while
varying the number of cores from 4 to 72. If you have fewer than 72 cores,
the experiment will only gather data for the core counts you have
available. This script can take two optional parameters:

- `niters` (defaults to 1): the number of times to run the experiment.
  The variance is quite small, so this does not need to be large. Even 1
  is probably fine if you're just checking that your results are similar
  to ours. (Your CPU speed will of course be a factor if you compare your
  raw timings to ours.)
- `sends` (defaults to 1048576): the number of messages in each batch

The output of the `./run-corescale-experiments` script can be parsed by
the `./parse-corescale-logs` script, which will output a CSV with the same
columns as `./parse-clientscale-logs` above, plus one additional column
just before the timings:

- The number of cores

These are the values plotted in Figure 8 in the TEEMS paper. Our exact
results (and what is plotted in the figure) were:

```
users,batches,ncores,send_mean,send_stddev,fetch_mean,fetch_stddev,tot_mean,tot_stddev
1048576,1,4,25.666,0.367,51.238,0.808,76.904,0.772
1048576,1,6,25.803,0.396,51.315,0.767,77.117,0.927
1048576,1,8,26.084,0.383,52.118,0.415,78.202,0.364
1048576,1,16,18.273,0.190,28.953,0.219,47.226,0.235
1048576,1,24,13.194,0.221,18.997,0.377,32.191,0.384
1048576,1,32,13.434,0.404,18.312,0.189,31.747,0.518
1048576,1,36,13.375,0.259,18.619,0.410,31.995,0.473
1048576,1,40,13.435,0.275,18.482,0.414,31.917,0.540
1048576,1,44,12.984,0.290,18.399,0.374,31.383,0.569
1048576,1,48,10.382,0.401,14.264,0.322,24.646,0.471
1048576,1,64,10.432,0.382,14.579,0.263,25.011,0.590
1048576,1,72,10.820,0.286,15.066,0.296,25.886,0.466
```
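To read the core-count scaling out of a CSV in this format, one convenient
sketch is the following `awk` one-liner. It computes the total-time
speedup of each core count relative to the first (4-core) row; the field
positions match the CSV header (`ncores` is field 3, `tot_mean` is field
8), and the here-document below is just a subset of the rows above:

```bash
# Speedup of tot_mean relative to the first data row (the 4-core baseline).
# Fields follow the corescale CSV header: ncores is $3, tot_mean is $8.
awk -F, 'NR == 2 { base = $8 }
         NR > 1  { printf "%2d cores: %.2fx\n", $3, base / $8 }' <<'EOF'
users,batches,ncores,send_mean,send_stddev,fetch_mean,fetch_stddev,tot_mean,tot_stddev
1048576,1,4,25.666,0.367,51.238,0.808,76.904,0.772
1048576,1,16,18.273,0.190,28.953,0.219,47.226,0.235
1048576,1,48,10.382,0.401,14.264,0.322,24.646,0.471
1048576,1,72,10.820,0.286,15.066,0.296,25.886,0.466
EOF
```

On the sample rows shown, this prints 1.00x for the 4-core baseline and
roughly 3x at 48 and 72 cores. To process your own results, pipe the
output of `./parse-corescale-logs` into the same `awk` program instead of
using the here-document.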