
Artifact Appendix

Paper title: Troll Patrol: Anonymous User Reporting of Bridge Censorship

Requested Badge(s):

  • Available
  • Functional
  • Reproduced

Description

This artifact accompanies the paper:

  • Vecna, Ian Goldberg. 2026. Troll Patrol: Anonymous User Reporting of Bridge Censorship. Proceedings on Privacy Enhancing Technologies 2026, 4 (2026).

This repository contains scripts for reproducing the results from Section 6, as well as additional results in Appendices A and C.

Security/Privacy Issues and Ethical Concerns

N/A

Basic Requirements

Hardware Requirements

Can run on a laptop (no special hardware requirements).

The results in the paper were computed using one Intel Xeon Platinum 8380 CPU (80 threads; 40 cores @ 2.30 GHz, up to 3.40 GHz).

Software Requirements

Dependencies:

  • bash
  • docker
  • git

We used Ubuntu 22.04 for our host OS, but any Linux distribution should work.

Docker must be installed. (On Ubuntu 22.04, for example, run sudo apt install docker.io.) We used Docker version 29.1.3.

We use rust:1.93.0 as the base Docker image for our experiments. All other dependencies are installed within the Docker images.
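Before running the setup script, you can confirm that the host dependencies listed above are on your PATH with a small check like the one below. This is a minimal sketch. check_deps is a hypothetical helper and is not part of the artifact.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the artifact): report every named command
# that cannot be found on PATH. Returns 1 if any are missing, 0 otherwise.
check_deps() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing dependency: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_deps bash docker git && docker --version prints the installed Docker version only if all three commands are found.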

Estimated Time and Storage Consumption

Running the artifact should take around 1–2 hours and requires around 20 GB of disk space.
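You can check free disk space in advance with a rough test like the one below. It is a sketch under stated assumptions. has_enough_space and avail_kb_for are hypothetical helpers, and the sketch treats 1 GB as 1024*1024 KiB.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the artifact): succeed if AVAIL_KB
# (kibibytes, as reported by `df -Pk`) is at least REQUIRED_GB gibibytes.
has_enough_space() {
    local avail_kb=$1 required_gb=${2:-20}
    (( avail_kb >= required_gb * 1024 * 1024 ))
}

# Print the available KiB on the filesystem that holds DIR (default: .).
# With -P, df prints one POSIX-format line per filesystem; field 4 is "Available".
avail_kb_for() {
    df -Pk "${1:-.}" | awk 'NR == 2 { print $4 }'
}
```

For example, run has_enough_space "$(avail_kb_for .)" 20 || echo "less than 20 GB free here". By default, Docker stores images under /var/lib/docker. That directory may be on a different filesystem than your checkout, so you may want to check it as well.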

Environment

Accessibility

This artifact is available at https://git-crysp.uwaterloo.ca/vvecna/troll-patrol-artifact.

Set Up the Environment

Time: 2 human-minutes + 6–11 compute-minutes

git clone https://git-crysp.uwaterloo.ca/vvecna/troll-patrol-artifact.git
cd troll-patrol-artifact
./scripts/setup.sh

This will clone related repositories and build the Docker images used for the artifact.

Testing the Environment

Time: 1 human-minute + 1 compute-minute

After building the Docker images, you can run:

./scripts/test.sh

This will check that the necessary dependencies are present in the Docker images and that LaTeX compiles within Docker. It will also run unit tests to ensure that the three versions of the Lox code work as expected. The script should print an error if a dependency is missing or if the .tex file fails to compile. Otherwise, it should produce a file called "test-table.pdf" and print the names of the unit tests, each followed by "ok" if that test passes. Here is an example of what the end of the script output should look like:

b5acaef2bfa8da1df47d1cbbceba01f6a5f025f8b06dc24db61957f8318352df
test tests::test_open_invite ... ok
test tests::test_trust_promotion ... ok
test tests::test_level0_migration ... ok
test tests::test_level_up ... ok
test tests::test_issue_invite ... ok
test tests::test_redeem_invite ... ok
test tests::test_mark_unreachable ... ok
test tests::test_blockage_migration ... ok
lox-old-test
a123ae8faf5ef0ef5c32da13aa97d3a737623fcac6465e8797d39764bf1eaf5f
test proto::check_blockage::tests::test_check_blockage ... ok
test proto::issue_invite::tests::test_issue_invite ... ok
test proto::redeem_invite::tests::test_redeem_invite ... ok
test proto::migration::tests::test_trust_migration ... ok
test proto::update_cred::tests::test_update_cred ... ok
test proto::level_up::tests::test_level_up ... ok
test proto::blockage_migration::tests::test_blockage_migration ... ok
test proto::open_invite::tests::test_open_invitation ... ok
test proto::update_invite::tests::test_update_invite ... ok
test proto::trust_promotion::tests::test_trust_promotion ... ok
lox-new-test
125c9737726072113f4dfbcdc61fddea17be335014aad4eabaf7b4c41a4a9e8e
test proto::check_blockage::tests::test_check_blockage ... ok
test proto::issue_invite::tests::test_issue_invite ... ok
test proto::redeem_invite::tests::test_redeem_invite ... ok
test proto::migration::tests::test_trust_migration ... ok
test proto::update_cred::tests::test_update_cred ... ok
test proto::level_up::tests::test_level_up ... ok
test proto::blockage_migration::tests::test_blockage_migration ... ok
test proto::open_invite::tests::test_open_invitation ... ok
test proto::update_invite::tests::test_update_invite ... ok
test proto::trust_promotion::tests::test_trust_promotion ... ok
test proto::report_submit::tests::test_report_protocols ... ok
troll-patrol-test
Everything seems to be set up correctly!

(The hexadecimal strings will be different on each run.)
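If you save the output to a file (for example, with ./scripts/test.sh 2>&1 | tee test.log), you can extract a quick summary as sketched below. summarize_test_output is a hypothetical helper that assumes the line formats shown in the example output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the artifact): read test.sh output on stdin.
# Count "test NAME ... ok" lines, count any other "test NAME ... RESULT" lines
# (e.g. FAILED or ignored), and check whether the final success message appears.
summarize_test_output() {
    awk '
        /^test .* \.\.\. ok$/                        { ok++; next }
        /^test .* \.\.\. /                           { other++ }
        /^Everything seems to be set up correctly!$/ { success = 1 }
        END {
            printf "ok=%d other=%d success=%s\n", ok, other, (success ? "yes" : "no")
        }'
}
```

Running summarize_test_output < test.log prints the number of passing tests, the number of other test results, and whether the success message was seen. A nonzero "other" count or "success=no" indicates a problem.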

Artifact Evaluation

Main Results and Claims

The artifact produces Tables 2 and 3 (from Section 6) and Table 5 (from Appendix C). These tables show benchmarks for the current development branch of Lox (Table 2), our fork of this development branch, which adds reporting protocols (Table 3), and the original Lox implementation (Table 5).

Tables 2 and 3 relate to Main Results 1 and 2 (listed below).

The artifact also reproduces the results in Appendix A of our paper, which are described below as Main Results 3 and 4.

Main Result 1: New Protocols are Comparable to Existing Protocols

Our paper claims that our new reporting protocols are comparable to the existing Lox protocols in terms of communication and computation costs. This can be seen in Table 3 (benchmarks for our modified code, which adds reporting protocols): our new protocols (Report Submit, Report Status, and Report Resolve) have times and sizes comparable to those of the other protocols listed in the table. This claim is reproduced by Experiment 1, which produces Tables 2 and 3.

Main Result 2: Our Modifications Have a Small Impact on Existing Protocols

Our paper claims that our modifications to Lox cause only a small increase in communication and computation costs for the existing Lox protocols, with request sizes increasing by around 100–200 bytes. This can be verified by comparing the results for protocols listed in both Table 2 (benchmarks for the development branch of Lox from which we forked) and Table 3 (benchmarks for our modified code, which adds reporting protocols). For protocols that appear in both tables, computation times in Table 3 should be only slightly higher than those in Table 2. The request and response sizes produced by running this artifact should be identical to those reported in the paper. Request sizes in Table 3 should be at most 192 bytes greater than those in Table 2 for the same protocols (the specific differences are 0, 96, 128, and 192 bytes), and response sizes in Tables 2 and 3 should be identical. This claim is reproduced by Experiment 1, which produces Tables 2 and 3.
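The size expectations above can be checked protocol by protocol with a small helper like the one below. This is a hypothetical sketch that is not part of the artifact. It takes sizes in bytes that you read by hand from table-2-results.pdf and table-3-results.pdf.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the artifact).
# Usage: check_sizes REQ_TABLE2 REQ_TABLE3 RESP_TABLE2 RESP_TABLE3  (bytes)
# Checks that the request grew by 0, 96, 128, or 192 bytes and that the
# response size is unchanged.
check_sizes() {
    local req2=$1 req3=$2 resp2=$3 resp3=$4
    local diff=$(( req3 - req2 ))
    case "$diff" in
        0|96|128|192) ;;
        *) echo "unexpected request-size difference: $diff" >&2; return 1 ;;
    esac
    if (( resp2 != resp3 )); then
        echo "response sizes differ: $resp2 vs $resp3" >&2
        return 1
    fi
    echo "ok (request +$diff bytes)"
}
```

Run it once for each protocol listed in both tables. The function prints the observed request growth on success and an explanation on failure.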

Main Result 3: Algorithms 1–3 are Poor Classifiers of Censorship in the Belarus Case Study

Appendix A of our paper specifies three algorithms (one of which comes from prior work by Loesing) for detecting bridge censorship based on daily connection counts. These algorithms all failed spectacularly when trying to detect censorship of email-distributed obfs4 bridges in Belarus starting in late February 2021. This is shown in Table 4, in which all but one configuration had 0 true positives. The one instance with any true positives was Algorithm 3, with d=8. In this case, we observed 5 true positives and 995 false positives. This claim is reproduced by Experiment 2, which produces Table 4 from Appendix A.
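To put the one non-zero result in perspective, its precision (true positives divided by all positive classifications) is 5 / (5 + 995) = 0.5%. The helper below is a hypothetical illustration of that arithmetic. Precision is used here only as a convenient summary and is not necessarily a metric reported in the paper.

```shell
#!/usr/bin/env bash
# Precision = TP / (TP + FP): the fraction of positive classifications that
# were correct. Prints it as a percentage with two decimal places.
precision_pct() {
    awk -v tp="$1" -v fp="$2" 'BEGIN {
        if (tp + fp == 0) { print "undefined"; exit }
        printf "%.2f%%\n", 100 * tp / (tp + fp)
    }'
}

precision_pct 5 995   # Algorithm 3 with d=8, numbers from the text: prints 0.50%
```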

Main Result 4: Email-Distributed obfs4 Bridges Were Not Popular in Belarus

Appendix A of our paper finds that email-distributed obfs4 bridges had very low connection counts in Belarus, even before the censorship event. Specifically, of the 93 email-distributed obfs4 bridges:

  • 43 received more than 0 connections on at least one day.
  • 13 received more than 8 connections on at least one day.
  • 2 received more than 16 connections on at least one day.
  • 0 received more than 24 connections on any day.
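The statistics above amount to taking each bridge's maximum daily connection count and comparing it to a threshold. The sketch below illustrates this using an assumed input format: one line per bridge-day, containing FINGERPRINT DATE COUNT. The artifact's own intermediate files may be structured differently, and count_bridges_over is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the artifact's code). Assumed input, one line per
# bridge-day:   FINGERPRINT DATE COUNT
# For each threshold argument, print how many distinct bridges had COUNT
# greater than that threshold on at least one day.
count_bridges_over() {
    awk -v thresholds="$*" '
        {
            v = $3 + 0
            if (!($1 in max) || v > max[$1]) max[$1] = v   # per-bridge maximum
        }
        END {
            n = split(thresholds, t, " ")
            for (i = 1; i <= n; i++) {
                c = 0
                for (fp in max) if (max[fp] > t[i] + 0) c++
                printf "max daily count > %d: %d\n", t[i], c
            }
        }'
}
```

For example, count_bridges_over 0 8 16 24 < counts.txt, where counts.txt is a hypothetical file in the assumed format that is restricted to the email-distributed obfs4 bridges.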

Only 5 of the 93 email-distributed obfs4 bridges had a mean connection count more than 1 standard deviation away from 0. Of these 5 bridges, the greatest distance between 0 and the mean was only about 1.6 standard deviations.
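The distance between 0 and a bridge's mean daily connection count, measured in standard deviations, is mean / sd. The sketch below assumes a hypothetical input format of one FINGERPRINT DATE COUNT line per bridge-day. It uses the population standard deviation, and it is not stated here whether the paper uses the population or the sample standard deviation. mean_distance_in_sd is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the artifact's code). Assumed input, one line per
# bridge-day:   FINGERPRINT DATE COUNT
# For each bridge, print the distance from 0 to its mean daily count in units
# of the (population) standard deviation, i.e. mean / sd.
mean_distance_in_sd() {
    awk '
        { n[$1]++; s[$1] += $3; ss[$1] += $3 * $3 }
        END {
            for (fp in n) {
                mean = s[fp] / n[fp]
                var = ss[fp] / n[fp] - mean * mean
                if (var <= 0) { printf "%s undefined\n", fp; continue }   # constant counts
                printf "%s %.2f\n", fp, mean / sqrt(var)
            }
        }'
}
```

Because awk visits array keys in an unspecified order, pipe the output through sort. Bridges with a value greater than 1.00 are those whose mean is more than 1 standard deviation away from 0.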

Experiments

Our entire artifact can be run with the ./run.sh script, which accepts the following (optional) arguments:

-n NUM_PERFORMANCE_CORES    (Experiment 1) Use only NUM_PERFORMANCE_CORES threads.
-N PERFORMANCE_CORE_RANGE   (Experiment 1) Use the threads specified in PERFORMANCE_CORE_RANGE.
-s                          (Experiment 2) Run ./scripts/belarus.sh sequentially, instead of in parallel.
--fast                      (Experiment 2) Start with pre-processed data.

See Experiment 1 for examples of the -n and -N options (only one of them is needed). Running the full artifact should take around 1–2 hours and requires about 20 GB of free disk space. (The -s and --fast options change the time and disk-space requirements; see Experiment 2 for details.)

Experiment 1: Lox Benchmarking

  • Time: 1 human-minute + 5–20 compute-minutes
./scripts/generate-lox-results.sh [-n NUM_PERFORMANCE_CORES|-N PERFORMANCE_CORE_RANGE] && \
./scripts/process-lox-results.sh

By default, the script uses all available threads, up to the number of processes to be run (16). You can use -n NUM_PERFORMANCE_CORES to restrict it to the first NUM_PERFORMANCE_CORES threads. If those are not the threads you want, specify -N PERFORMANCE_CORE_RANGE instead to select specific threads. The script does not determine which threads to use; you must choose them yourself.
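The default described above, and the thread range that -n N corresponds to, can be expressed as follows. This is a hypothetical sketch and not the code in the artifact's scripts. The function names are illustrative, and the available thread count defaults to nproc.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the artifact's code) of the default thread selection.
MAX_PROCESSES=16   # number of processes to be run, per the text above

# Use all available threads (default: nproc), capped at MAX_PROCESSES.
default_thread_count() {
    local avail=${1:-$(nproc)}
    if (( avail < MAX_PROCESSES )); then
        echo "$avail"
    else
        echo "$MAX_PROCESSES"
    fi
}

# "-n N" selects the first N threads, i.e. the range 0 to N-1.
first_n_threads() {
    echo "0-$(( $1 - 1 ))"
}
```

For example, on the 80-thread machine described under Hardware Requirements, default_thread_count 80 gives 16, and first_n_threads 16 gives 0-15, which matches the first example below.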

Examples:

-n 16           Use threads 0-15
-N 2-8          Use threads 2-8
-N 0-2,6,9      Use threads 0,1,2,6,9
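The range syntax in these examples consists of comma-separated items, each either a single thread or an inclusive lo-hi range. It can be expanded as shown below. expand_core_range is a hypothetical helper written only to illustrate the syntax. It does not validate its input.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the artifact): expand a
# PERFORMANCE_CORE_RANGE string such as "0-2,6,9" into the thread numbers
# it denotes, printed comma-separated (e.g. "0,1,2,6,9").
expand_core_range() {
    local spec=$1 part lo hi i
    local out=()
    local IFS=,                  # split items on commas; also joins ${out[*]}
    for part in $spec; do
        if [[ $part == *-* ]]; then
            lo=${part%-*}        # text before the dash
            hi=${part#*-}        # text after the dash
            for (( i = lo; i <= hi; i++ )); do out+=("$i"); done
        else
            out+=("$part")
        fi
    done
    echo "${out[*]}"
}
```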

These scripts perform benchmarks for:

  1. the development branch of Lox from which we forked (Table 2, in Section 6)
  2. our modified code that adds reporting protocols (Table 3, in Section 6)
  3. the original Lox code (Table 5, in Appendix C)

These scripts create .tex files for these tables and compile them to PDFs. The .pdf files (and the corresponding .tex files) are copied to the root directory of the artifact as table-2-results.pdf (Table 2), table-3-results.pdf (Table 3), and appendix-c-results.pdf (Table 5).
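After Experiment 1 finishes, you can confirm that all expected outputs are present with a check like the one below. This sketch assumes that each .tex file has the same base name as its .pdf. check_lox_outputs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the artifact): check that each table from
# Experiment 1 exists and is non-empty in DIR (default: .) as .pdf and .tex.
check_lox_outputs() {
    local dir=${1:-.} name ext status=0
    for name in table-2-results table-3-results appendix-c-results; do
        for ext in pdf tex; do
            if [[ ! -s "$dir/$name.$ext" ]]; then
                echo "missing or empty: $dir/$name.$ext" >&2
                status=1
            fi
        done
    done
    return "$status"
}
```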

table-2-results.pdf and table-3-results.pdf can be used to verify Main Results 1 and 2.

Experiment 2: Belarus Case Study

  • Time: 1 human-minute + 1 compute-hour
  • Storage: 20 GB
./scripts/belarus.sh [-s|--fast]

The recommended way to run this script is without any arguments. It will download 8 bridge extra-info archives from the Tor Project for the period of July 2020–April 2021, extract and process this data to learn daily connection counts from Belarus for each bridge, then use Algorithms 1–3 to try to detect which bridges were blocked starting in late February 2021 and which were not. This takes about 1 hour and requires 20 GB of storage.

If the -s option is used, then only one of the extra-info archives will be extracted at a time (instead of them being processed in parallel) and deleted after processing. This will take much longer (around 8–10 hours instead of 1 hour), but requires only a few GB of storage.
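The two modes trade time for disk space, roughly as sketched below. This is a hypothetical sketch and not the code in belarus.sh. extract_archive, process_archive, and delete_extracted are placeholder names for steps that the caller must define.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the artifact's code) of the two processing modes.
# The caller must define three placeholder steps:
#   extract_archive ARCHIVE, process_archive ARCHIVE, delete_extracted ARCHIVE

# Default mode: all archives are extracted and processed concurrently, so the
# extracted data for every archive may be on disk at the same time.
run_parallel() {
    local a
    for a in "$@"; do
        { extract_archive "$a" && process_archive "$a"; } &
    done
    wait
}

# -s mode: one archive at a time, deleting its extracted data before moving on,
# so only one archive's extracted data is on disk at any point.
run_sequential() {
    local a
    for a in "$@"; do
        extract_archive "$a" && process_archive "$a"
        delete_extracted "$a"
    done
}
```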

If the --fast option is used, then the extra-info archives will not be downloaded from the Tor Project. Instead, the script starts with a 6.7 MB archive (87 MB uncompressed) containing only the information we need. It yields identical results and takes only 1–2 minutes; however, it requires trusting that the pre-processed data we provide was extracted correctly.

When processing the data downloaded from the Tor Project (unless the --fast option is used), the script will output many lines like this:

Error for 
    fingerprint: 
    date:        2461184
    count:       0

Do not be alarmed by these errors, or by a long period with no output after they appear. They are caused by malformed records (missing bridge fingerprints), which cannot be parsed and are simply ignored.
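If these messages make the output hard to follow, you can filter them out as sketched below. hide_parse_errors is a hypothetical helper that assumes each error is exactly the four-line record shown above: an "Error for" line followed by fingerprint, date, and count lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the artifact): drop each four-line
# "Error for" record from stdin and pass every other line through unchanged.
hide_parse_errors() {
    awk '
        skip > 0        { skip--; next }   # inside an error record
        /^Error for *$/ { skip = 3; next } # start of a record; skip 3 more lines
                        { print }
    '
}
```

For example, run ./scripts/belarus.sh 2>&1 | hide_parse_errors. Unless you enable set -o pipefail, the exit status of that pipeline is awk's rather than the script's.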

This experiment outputs appendix-a-results.pdf (and a corresponding .tex file), containing Table 4 (from Appendix A of our paper). As this process begins with downloading existing data, and the algorithms used are deterministic, the contents of the output Table 4 should be identical to those in our paper. This table can be used to verify Main Result 3.

This experiment also outputs a text file called "appendix-a-results", which contains various statistics about connection counts to the bridges we evaluate, including those described in Appendix A.2. These can be used to verify Main Result 4.

Limitations

N/A

Notes on Reusability

N/A