Add PoPETs artifact appendix

Vecna 3 weeks ago
parent
commit
7b93980ef8
1 changed file with 239 additions and 111 deletions
+ 239 - 111
README.md

@@ -1,121 +1,249 @@
-# Troll Patrol Artifact
+# Artifact Appendix
 
-This repo contains scripts for reproducing the results in our paper.
+Paper title: **Troll Patrol: Anonymous User Reporting of Bridge Censorship**
 
-## Reproducing our results
+Requested Badge(s):
+  - [x] **Available**
+  - [x] **Functional**
+  - [x] **Reproduced**
+
+## Description
+
+This artifact accompanies the paper:
+    Vecna, Ian Goldberg. 2026. Troll Patrol: Anonymous User Reporting of Bridge Censorship. *Proceedings on Privacy Enhancing Technologies* 2026, 4 (2026).
+
+This repository contains scripts for reproducing the results from Section 6, as well as additional results in Appendices A and C.
+
+### Security/Privacy Issues and Ethical Concerns
+
+N/A
+
+## Basic Requirements
+
+### Hardware Requirements
+
+Can run on a laptop (no special hardware requirements).
+
+The results in the paper were computed using one [Intel Xeon Platinum 8380](https://www.intel.com/content/www/us/en/products/sku/212287/intel-xeon-platinum-8380-processor-60m-cache-2-30-ghz/specifications.html) CPU (80 threads; 40 cores @ 2.30 GHz, up to 3.40 GHz).
+
+### Software Requirements
 
 Dependencies:
 - bash
-- curl
 - docker
 - git
 
-### Quick Start
+We used Ubuntu 22.04 for our host OS, but any Linux distribution should work.
+
+Docker must be installed. (On Ubuntu 22.04, for example, run `sudo apt install docker.io`.) We used Docker version 29.1.3.
+
+We use `rust:1.93.0` as the base Docker image for our experiments. Other dependencies are installed within the Docker images.
+
+### Estimated Time and Storage Consumption
+
+Running the artifact should take around 1–2 hours and requires around 20 GB of disk space.
+
+## Environment
+
+### Accessibility
+
+This artifact is available at https://git-crysp.uwaterloo.ca/vvecna/troll-patrol-artifact.
+
+### Set Up the Environment
+
+Time: 2 human-minutes + 6–11 compute-minutes
+
+```bash
+git clone https://git-crysp.uwaterloo.ca/vvecna/troll-patrol-artifact.git
+cd troll-patrol-artifact
+./scripts/setup.sh
+```
+
+This will clone related repositories and build the Docker images used for the artifact.
+
+### Testing the Environment
+
+Time: 1 human-minute + 1 compute-minute
+
+After building the Docker images, you can run:
+
+```bash
+./scripts/test.sh
+```
+
+This will check that the necessary dependencies are present in the Docker images and that LaTeX compiles within a Docker container, and it will run unit tests to ensure the three versions of the Lox code work as expected.
+The script should print an error if a dependency is missing or if the .tex file fails to compile.
+Otherwise, it should produce a file called "test-table.pdf" and print the name of each unit test followed by "ok", assuming that the unit tests pass.
+Here is an example of what the end of the script output should look like:
+
+    b5acaef2bfa8da1df47d1cbbceba01f6a5f025f8b06dc24db61957f8318352df
+    test tests::test_open_invite ... ok
+    test tests::test_trust_promotion ... ok
+    test tests::test_level0_migration ... ok
+    test tests::test_level_up ... ok
+    test tests::test_issue_invite ... ok
+    test tests::test_redeem_invite ... ok
+    test tests::test_mark_unreachable ... ok
+    test tests::test_blockage_migration ... ok
+    lox-old-test
+    a123ae8faf5ef0ef5c32da13aa97d3a737623fcac6465e8797d39764bf1eaf5f
+    test proto::check_blockage::tests::test_check_blockage ... ok
+    test proto::issue_invite::tests::test_issue_invite ... ok
+    test proto::redeem_invite::tests::test_redeem_invite ... ok
+    test proto::migration::tests::test_trust_migration ... ok
+    test proto::update_cred::tests::test_update_cred ... ok
+    test proto::level_up::tests::test_level_up ... ok
+    test proto::blockage_migration::tests::test_blockage_migration ... ok
+    test proto::open_invite::tests::test_open_invitation ... ok
+    test proto::update_invite::tests::test_update_invite ... ok
+    test proto::trust_promotion::tests::test_trust_promotion ... ok
+    lox-new-test
+    125c9737726072113f4dfbcdc61fddea17be335014aad4eabaf7b4c41a4a9e8e
+    test proto::check_blockage::tests::test_check_blockage ... ok
+    test proto::issue_invite::tests::test_issue_invite ... ok
+    test proto::redeem_invite::tests::test_redeem_invite ... ok
+    test proto::migration::tests::test_trust_migration ... ok
+    test proto::update_cred::tests::test_update_cred ... ok
+    test proto::level_up::tests::test_level_up ... ok
+    test proto::blockage_migration::tests::test_blockage_migration ... ok
+    test proto::open_invite::tests::test_open_invitation ... ok
+    test proto::update_invite::tests::test_update_invite ... ok
+    test proto::trust_promotion::tests::test_trust_promotion ... ok
+    test proto::report_submit::tests::test_report_protocols ... ok
+    troll-patrol-test
+    Everything seems to be set up correctly!
+
+(The hexadecimal strings will be different on each run.)
+
+## Artifact Evaluation
+
+### Main Results and Claims
+
+The artifact produces Tables 2 and 3 (from Section 6) and Table 5 (from Appendix C), showing benchmarks for the current development branch of Lox (Table 2), our fork of this development branch that includes reporting protocols (Table 3), and the original Lox implementation (Table 5 from Appendix C).
+
+Tables 2 and 3 relate to Main Results 1 and 2 (listed below).
+
+The artifact also reproduces the results in Appendix A of our paper, which are described below as Main Results 3 and 4.
+
+#### Main Result 1: New Protocols are Comparable to Existing Protocols
+
+Our paper claims that our new protocols for reporting are comparable to existing Lox protocols in terms of communication and computation.
+This is found by looking at Table 3 (benchmarks for our modified code, which adds reporting protocols).
+Our new protocols (Report Submit, Report Status, and Report Resolve) have times and sizes comparable to those of the other protocols listed in the table.
+This claim is reproduced by Experiment 1, which produces Tables 2 and 3.
+
+#### Main Result 2: Our Modifications Have a Small Impact on Existing Protocols
+
+Our paper claims that our modifications to Lox result in only a small increase in communication and computation costs for the existing Lox protocols, with request sizes increasing by around 100–200 bytes.
+This is found by comparing the results for protocols listed in both Table 2 (benchmarks for the development branch of Lox from which we forked) and Table 3 (benchmarks for our modified code, which adds reporting protocols).
+The protocols listed in both tables should have only slightly higher values for computation times in Table 3 than in Table 2.
+The request and response sizes produced by running this artifact should be identical to those provided in the paper.
+Request sizes in Table 3 should be at most 192 bytes greater than the sizes in Table 2 for the same protocols.
+(The specific differences are 0, 96, 128, and 192.)
+Response sizes in Tables 2 and 3 should be identical.
+This claim is reproduced by Experiment 1, which produces Tables 2 and 3.
+
+#### Main Result 3: Algorithms 1–3 are Poor Classifiers of Censorship in the Belarus Case Study
+
+Appendix A of our paper specifies three algorithms (one of which comes from prior work by Loesing) for detecting bridge censorship based on daily connection counts.
+These algorithms all failed spectacularly when trying to detect censorship of email-distributed obfs4 bridges in Belarus starting in late February 2021.
+This is shown in Table 4, in which all but one configuration had 0 true positives.
+The one instance with any true positives was Algorithm 3, with d=8.
+In this case, we observed 5 true positives and 995 false positives.
+This claim is reproduced by Experiment 2, which produces Table 4 from Appendix A.
+
+#### Main Result 4: Email-Distributed obfs4 Bridges Were Not Popular in Belarus
+
+Appendix A of our paper finds that email-distributed obfs4 bridges had very low connection counts in Belarus, even prior to the censorship event.
+Specifically, we find the following results.
+Of the 93 email-distributed obfs4 bridges...
+
+- 43 received more than 0 connections on a single day.
+- 13 received more than 8 connections on a single day.
+- 2 received more than 16 connections on a single day.
+- 0 received more than 24 connections on a single day.
+
+Only 5 of the 93 email-distributed obfs4 bridges had a mean connection count more than 1 standard deviation away from 0.
+Of these 5 bridges, the greatest distance between 0 and the mean was only about 1.6 standard deviations.
+
+### Experiments
+
+Our entire artifact can be run with the `./run.sh` script, which accepts the following (optional) arguments:
+
+    -n NUM_PERFORMANCE_CORES    (Experiment 1) Use only NUM_PERFORMANCE_CORES threads.
+    -N PERFORMANCE_CORE_RANGE   (Experiment 1) Use the threads specified in PERFORMANCE_CORE_RANGE. -n must also be used.
+    -s                          (Experiment 2) Run ./scripts/belarus.sh sequentially, instead of in parallel.
+    --fast                      (Experiment 2) Start with pre-processed data.
+
+This process should take around 1–2 hours and requires 20 GB of free disk space.
+(`-s` or `--fast` will change the time and disk space requirements. See Experiment 2 for details.)
+
+#### Experiment 1: Lox Benchmarking
+
+- Time: 1 human-minute + 5–20 compute-minutes
+
+```bash
+./scripts/generate-lox-results.sh [-n NUM_PERFORMANCE_CORES] [-N PERFORMANCE_CORE_RANGE] && \
+./scripts/process-lox-results.sh
+```
+
+By default, the script will use all available threads (up to the number of processes to be run, which is 16), but you can use `-n NUM_PERFORMANCE_CORES` to restrict it to use only `NUM_PERFORMANCE_CORES` threads.
+
+By default, if the `-n` option is used, the script will use the first `NUM_PERFORMANCE_CORES` threads.
+If these are not the ones you want to use, you can also specify `-N PERFORMANCE_CORE_RANGE` to indicate the specific threads to use.
+**If you use `-N` to indicate a range, please also specify `-n` to indicate the number of values in that range.**
+The script does not compute this for you.
+
+These scripts perform benchmarks for:
+1. the development branch of Lox from which we forked (Table 2, in Section 6)
+2. our modified code that adds reporting protocols (Table 3, in Section 6)
+3. the original Lox code (Table 5, in Appendix C)
+
+These scripts create .tex files for these tables and compile them to PDFs.
+These .pdf files (as well as the corresponding .tex files) are copied to the root directory of the artifact.
+The results can be found as table-2-results.pdf (Table 2), table-3-results.pdf (Table 3), and appendix-c-results.pdf (Table 5).
+
+table-2-results.pdf and table-3-results.pdf can be used to verify Main Results 1 and 2.
+
+#### Experiment 2: Belarus Case Study
+
+- Time: 1 human-minute + 1 compute-hour
+- Storage: 20 GB
+
+```bash
+./scripts/belarus.sh [-s|--fast]
+```
+
+The recommended way to run this script is without any arguments.
+It will download 8 bridge extra-info archives from the Tor Project for the period of July 2020–April 2021, extract and process this data to learn daily connection counts from Belarus for each bridge, then use Algorithms 1–3 to try to detect which bridges were blocked starting in late February 2021 and which were not.
+This takes about 1 hour and requires 20 GB of storage.
+
+If the `-s` option is used, the extra-info archives are extracted and processed one at a time (instead of in parallel), and each is deleted after it has been processed.
+This will take much longer (around 8–10 hours instead of 1 hour), but requires only a few GB of storage.
+
+If the `--fast` option is used, then the extra-info archives will not be downloaded from the Tor Project.
+Instead, the script starts with a 6.7 MB archive (87 MB uncompressed) containing only the information we need.
+It yields identical results and takes only 1–2 minutes; however, it requires trusting that the pre-processed data we provide was extracted correctly.
+
+When processing the data downloaded from the Tor Project (unless the `--fast` option is used), the script will output many lines like this:
+    Error for 
+        fingerprint: 
+        date:        2461184
+        count:       0
+**Do not be alarmed by these errors**, or by a long period with no output after they appear.
+The errors appear because some records are malformed (missing bridge fingerprints); these records cannot be parsed and are simply ignored.
+
+This experiment outputs appendix-a-results.pdf (and a corresponding .tex file), containing Table 4 (from Appendix A of our paper).
+Because this process starts by downloading existing data and the algorithms used are deterministic, the contents of the output Table 4 should be identical to those in our paper.
+This table can be used to verify Main Result 3.
+
+This experiment also outputs a text file called "appendix-a-results", which contains various statistics about connection counts to the bridges we evaluate, including those described in Appendix A.2.
+These can be used to verify Main Result 4.
+
+## Limitations
+
+N/A
 
-To reproduce our results:
+## Notes on Reusability
 
-1. Install the dependencies listed above.
-2. `git clone -b artifact https://git-crysp.uwaterloo.ca/vvecna/lox-troll-patrol-extension`
-3. `cd lox-troll-patrol-extension`
-4. `./run.sh`
-
-See below for additional options. In particular, this script is expected
-to take about 1-2 hours and requires about 20 GB of free disk space. If
-either of these requirements is too high, see the `-s` and `--fast`
-options below. If your CPU has some cores that are more performant than
-others, see the `-n` and `-N` options below.
-
-### Options
-
-Full usage: `./run.sh [-s|--fast] [-n NUM_PERFORMANCE_CORES] [-N PERFORMANCE_CORE_RANGE]`
-
-`./run.sh` actually runs four different scripts:
-
-1. `./scripts/setup.sh` sets up the docker images.
-2. `./scripts/belarus.sh` produces the results from the Belarus 2020-2021 case study (Section 3 and Appendix A of our paper).
-3. `./scripts/generate-lox-results.sh` runs benchmarks for the original version of Lox, a current development branch of Lox, and our fork of Lox containing additional report-related protocols.
-4. `./scripts/process-lox-results.sh` processes the results from the previous step and outputs the tables from Section 6 and Appendix B of our paper.
-
-If the `-s` or `--fast` option (these are mutually exclusive) is specified, it is passed to `./scripts/belarus.sh`.
-
-`./scripts/belarus.sh` downloads and processes all extra-info records
-for all bridges from 2020-07 through 2021-04 from the Tor Project's
-CollecTor service. This is about 4 million files (around 20 GB
-uncompressed). By default, data from all 10 months is processed in
-parallel, which takes under an hour but requires extracting and storing
-all 20 GB of data at once.
-
-If you cannot afford to fill up 20 GB of disk space at once, you can use
-the `-s` option, which performs these 10 steps sequentially. This will
-take about 10 times as long but will only use a few GB of space at a
-time.
-
-If you cannot afford the time and space requirements, you can use the
-`--fast` option, which does not download the extra-info archives from
-the Tor Project. Instead, it starts with a 6.7 MB archive (87 MB
-uncompressed) containing only the information we need. It yields
-identical results and takes only 1-2 minutes; however, it requires
-trusting that the pre-processed data we provide was extracted correctly.
-
-If the `-n` and/or `-N` options are specified, they are passed to `./scripts/generate-lox-results.sh`.
-
-`./scripts/generate-lox-results.sh` runs benchmarks, with each process
-isolated to a single thread. Ideally, the threads used should all be
-equally performant, to ensure that all results computed can be
-reasonably compared to each other.
-
-By default, the script will use all available threads (up to the number
-of processes to be run, which is 16), but you can use
-`-n NUM_PERFORMANCE_CORES` to restrict it to only use
-`NUM_PERFORMANCE_CORES` threads.
-
-By default, if the `-n` option is used, the script will use the first
-`NUM_PERFORMANCE_CORES` threads. If these are not the ones you want to
-use, you can also specify `-N PERFORMANCE_CORE_RANGE` to indicate the
-specific threads to use. **If you use `-N` to indicate a range, please
-also specify `-n` to indicate the number of values in that range.** The
-script does not compute this for you.
-
-Examples:
-
-- `./run.sh --fast -n 4`: Use pre-processed data for the `./scripts/belarus.sh` step, and use only the first 4 threads (i.e., 0-3) for `./scripts/generate-lox-results.sh`.
-- `./run.sh -n 5 -N 1-2,4-6`: Use only threads 1, 2, 4, 5, and 6 for `./scripts/generate-lox-results.sh`.
-
-## Results
-
-The results are all output to the top-level directory of the project.
-After running `./run.sh`:
-- The results from Section 3 can be found in the **section-3-results** file.
-- Table 4 (found in Appendix A of the paper) can be found in **appendix-a-results.pdf**
-- Tables 2 and 3 (from Section 6) can be found in **table-2-results.pdf** and **table-3-results.pdf**.
-- Table 5 (in Appendix B) can be found in **appendix-b-results.pdf**.
-
-## Recovering from Errors
-
-Hopefully the script will just run everything without issue. If there is
-an issue at a step, try deleting the files/directories related to that
-step, stopping and removing any docker containers related to that step,
-and running the above script again. The script will not attempt to
-re-run steps that have files. If a step worked, just leave its files
-there, and you shouldn't have to redo that step.
-
-If you want to run the commands for individual steps, you can do that.
-See the Options section above.
-
-## Time to Run
-
-On my laptop ([13th Gen Intel Core i7-1360P](https://www.intel.com/content/www/us/en/products/sku/232155/intel-core-i71360p-processor-18m-cache-up-to-5-00-ghz/specifications.html); 16 threads; 4 performance cores, up to 5 GHz):
-- `./scripts/setup.sh`: 10m 35s
-- `./scripts/belarus.sh`: 51m 2s
-- `./scripts/belarus.sh -s`: 9h 15m 16s
-- `./scripts/belarus.sh --fast`: 1m 4s
-- `./scripts/generate-lox-results.sh -n 4`: 18m 39s
-- `./scripts/process-lox-results.sh`: 18s
-
-On the device used for the paper ([Intel Xeon Platinum 8380](https://www.intel.com/content/www/us/en/products/sku/212287/intel-xeon-platinum-8380-processor-60m-cache-2-30-ghz/specifications.html); 80 threads; 40 cores @ 2.30 GHz, up to 3.40 GHz):
-- `./scripts/setup.sh`: 6m 47s
-- `./scripts/belarus.sh`: 42m 32s
-- `./scripts/belarus.sh -s`: 7h 57m 10s
-- `./scripts/belarus.sh --fast`: 1m 0s
-- `./scripts/generate-lox-results.sh`: 5m 56s
-- `./scripts/process-lox-results.sh`: 17s
+N/A
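
Note on the `-n`/`-N` options: the Experiment 1 text in this change states that `-N PERFORMANCE_CORE_RANGE` must be paired with `-n`, and that the script does not compute the count itself. A minimal sketch of a helper that derives the count from a range string follows. It assumes the range uses the comma-and-hyphen syntax seen in the removed example `-n 5 -N 1-2,4-6`. The function name `count_threads` is hypothetical and is not part of the artifact's scripts.

```bash
# Hypothetical helper (not part of the artifact): count the threads named by
# a core-range string such as "1-2,4-6". The body runs in a subshell, so the
# working variables do not leak into the caller's environment.
count_threads() (
    spec=$1
    total=0
    if [ -z "$spec" ]; then
        echo "empty range" >&2
        return 1
    fi
    rest=$spec
    while [ -n "$rest" ]; do
        # Peel off the next comma-separated entry.
        part=${rest%%,*}
        if [ "$part" = "$rest" ]; then
            rest=""
        else
            rest=${rest#*,}
        fi
        case $part in
            '' | *[!0-9-]* | *-*-* | -* | *-)
                echo "invalid entry: $part" >&2
                return 1
                ;;
            *-*)
                # A "lo-hi" entry contributes hi - lo + 1 threads.
                lo=${part%-*}
                hi=${part#*-}
                if [ "$hi" -lt "$lo" ]; then
                    echo "invalid range: $part" >&2
                    return 1
                fi
                # expr treats its operands as decimal, even with leading zeros.
                n=$(expr "$hi" - "$lo" + 1)
                total=$((total + n))
                ;;
            *)
                # A single thread number contributes one thread.
                total=$((total + 1))
                ;;
        esac
    done
    echo "$total"
)

count_threads "1-2,4-6"   # prints 5
```

With this helper defined, an invocation could look like `range=1-2,4-6; ./run.sh -n "$(count_threads "$range")" -N "$range"`, which should be equivalent to the `./run.sh -n 5 -N 1-2,4-6` example from the previous README.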