Artifact Appendix

Paper title: Troll Patrol: Anonymous User Reporting of Bridge Censorship

Requested Badge(s):

Available
Functional
Reproduced

Description

This artifact accompanies the paper:

Vecna, Ian Goldberg. 2026. Troll Patrol: Anonymous User Reporting of Bridge Censorship. Proceedings on Privacy Enhancing Technologies 2026, 4 (2026).

This repository contains scripts for reproducing the results from Section 6, as well as additional results in Appendices A and C.

Security/Privacy Issues and Ethical Concerns

N/A

Basic Requirements

Hardware Requirements

Can run on a laptop (No special hardware requirements)

The results in the paper were computed using one Intel Xeon Platinum 8380 CPU (80 threads; 40 cores @ 2.30 GHz, up to 3.40 GHz).

We also tested the artifact on a laptop with a 13th Gen Intel Core i7-1360P (16 threads; 4 performance cores, up to 5 GHz) and in a PETS artifact evaluation Standard VM allocated 4 threads from an Intel(R) Xeon(R) Gold 6226 CPU @ 2.70 GHz (up to 3.70 GHz).

Software Requirements

Dependencies:

curl
docker (provided by the docker.io package on Debian/Ubuntu)
git

We used Ubuntu 22.04 for our host OS, but any Linux distribution should work.

In addition to installing the dependencies, ensure that your user is in the docker group. For example, on Ubuntu 22.04:

sudo apt update
sudo apt install curl docker.io git
sudo usermod -aG docker $(whoami)
# Log out and log back in for the group change to take effect
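Before moving on, it can help to confirm that the three dependencies are on the PATH and that the current user is in the docker group. The following is a minimal sketch; the helper names missing_deps and in_group are hypothetical and are not part of the artifact.

```shell
# Hypothetical helper (not part of the artifact): print each command
# from the argument list that cannot be found on the PATH.
missing_deps() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Hypothetical helper: succeed if group $1 appears in the
# space-separated group list $2.
in_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

missing=$(missing_deps curl docker git)
if [ -n "$missing" ]; then
    echo "Missing dependencies: $missing"
fi

if ! in_group docker "$(id -nG 2>/dev/null)"; then
    echo "Current session is not in the docker group (log out and back in?)"
fi
```

If both checks stay silent, the docker run hello-world test is the next step.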

Run docker run hello-world to test that Docker is working properly. The output should be something like this:

Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
4f55086f7dd0: Pull complete 
Digest: sha256:0e760fdfbc48ba8041e7c6db999bb40bfca508b4be580ac75d32c4e29d202ce1
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

We used Docker version 29.1.3 with rust:1.93.0 as the base Docker image for our experiments. Other dependencies are installed within Docker images.

Estimated Time and Storage Consumption

Running the artifact should take around 2–5 hours and requires around 15 GB of disk space.
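To check the storage estimate against the machine at hand, one option is to compare the free space reported by df with the 15 GB figure above. This is only a sketch: free_kb and has_free_gb are hypothetical helpers, and the check assumes the artifact and Docker's storage live on the filesystem being queried, which may not hold on every setup.

```shell
# Hypothetical helper: available space, in 1 KiB blocks, on the
# filesystem that holds directory $1 (POSIX "df -P" output format).
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Hypothetical helper: succeed if directory $1 has at least $2 GiB free.
has_free_gb() {
    [ "$(free_kb "$1")" -ge $(( $2 * 1024 * 1024 )) ]
}

if has_free_gb . 15; then
    echo "At least 15 GB free here"
else
    echo "Less than 15 GB free here"
fi
```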

Environment

Accessibility

This artifact is available at https://git-crysp.uwaterloo.ca/vvecna/troll-patrol-artifact.

Set Up the Environment

Time: 2 human-minutes + 6–16 compute-minutes

git clone https://git-crysp.uwaterloo.ca/vvecna/troll-patrol-artifact.git
cd troll-patrol-artifact
./scripts/setup.sh

This will clone related repositories and build the Docker images used for the artifact. The Docker images use around 10 GB of storage.

Testing the Environment

Time: 1 human-minute + 1 compute-minute

After building the Docker images, you can run:

./scripts/test.sh

This will test that the necessary dependencies are present in the Docker images and that LaTeX compiles inside a Docker container, and it will run unit tests to ensure the three versions of the Lox code are working as expected. The script should print an error if a dependency is missing or if the .tex file fails to compile. Otherwise, it should output a file called "test-table.pdf" and, assuming the unit tests pass, a list of unit test names, each followed by "ok". Here is an example of what the end of the script output should look like:

b5acaef2bfa8da1df47d1cbbceba01f6a5f025f8b06dc24db61957f8318352df
test tests::test_open_invite ... ok
test tests::test_trust_promotion ... ok
test tests::test_level0_migration ... ok
test tests::test_level_up ... ok
test tests::test_issue_invite ... ok
test tests::test_redeem_invite ... ok
test tests::test_mark_unreachable ... ok
test tests::test_blockage_migration ... ok
lox-old-test
a123ae8faf5ef0ef5c32da13aa97d3a737623fcac6465e8797d39764bf1eaf5f
test proto::check_blockage::tests::test_check_blockage ... ok
test proto::issue_invite::tests::test_issue_invite ... ok
test proto::redeem_invite::tests::test_redeem_invite ... ok
test proto::migration::tests::test_trust_migration ... ok
test proto::update_cred::tests::test_update_cred ... ok
test proto::level_up::tests::test_level_up ... ok
test proto::blockage_migration::tests::test_blockage_migration ... ok
test proto::open_invite::tests::test_open_invitation ... ok
test proto::update_invite::tests::test_update_invite ... ok
test proto::trust_promotion::tests::test_trust_promotion ... ok
lox-new-test
125c9737726072113f4dfbcdc61fddea17be335014aad4eabaf7b4c41a4a9e8e
test proto::check_blockage::tests::test_check_blockage ... ok
test proto::issue_invite::tests::test_issue_invite ... ok
test proto::redeem_invite::tests::test_redeem_invite ... ok
test proto::migration::tests::test_trust_migration ... ok
test proto::update_cred::tests::test_update_cred ... ok
test proto::level_up::tests::test_level_up ... ok
test proto::blockage_migration::tests::test_blockage_migration ... ok
test proto::open_invite::tests::test_open_invitation ... ok
test proto::update_invite::tests::test_update_invite ... ok
test proto::trust_promotion::tests::test_trust_promotion ... ok
test proto::report_submit::tests::test_report_protocols ... ok
troll-patrol-test
Everything seems to be set up correctly!

(The hexadecimal strings will be different on each run.)
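Rather than scanning the output by eye, the test lines can be tallied. The sketch below assumes the output was saved to a file (the name test.log is hypothetical) and relies on the "test NAME ... ok" line format shown in the example; summarize_tests is an illustrative helper, not part of the artifact.

```shell
# Hypothetical helper: read test output on stdin and count the lines
# that report a passing ("ok") or failing ("FAILED") unit test.
summarize_tests() {
    awk '
        /^test .* \.\.\. ok$/     { ok++ }
        /^test .* \.\.\. FAILED$/ { failed++ }
        END { printf "%d ok, %d failed\n", ok, failed }
    '
}

# Example usage (assumes the output was captured beforehand):
#   ./scripts/test.sh 2>&1 | tee test.log
#   summarize_tests < test.log
```

A nonzero "failed" count, or a missing final "Everything seems to be set up correctly!" line, suggests the environment needs another look.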

Artifact Evaluation

Main Results and Claims

The artifact produces Tables 2 and 3 (from Section 6) and Table 5 (from Appendix C), showing benchmarks for the current development branch of Lox (Table 2), our fork of this development branch that includes reporting protocols (Table 3), and the original Lox implementation (Table 5 from Appendix C).

Tables 2 and 3 relate to Main Results 1 and 2 (listed below).

The artifact also reproduces the results in Appendix A of our paper, which are described below as Main Results 3 and 4.

Main Result 1: New Protocols are Comparable to Existing Protocols

Our paper claims that our new protocols for reporting are comparable to existing Lox protocols in terms of communication and computation. This can be seen in Table 3 (benchmarks for our modified code, which adds reporting protocols). Our new protocols (Report Submit, Report Status, and Report Resolve) have times and sizes comparable to those of the other protocols listed in the table. This claim is reproduced by Experiment 1, which produces Tables 2 and 3.

Main Result 2: Our Modifications Have a Small Impact on Existing Protocols

Our paper claims that our modifications to Lox result in only a small increase in communication and computation costs for the existing Lox protocols, with request sizes increasing by around 100–200 bytes. This can be verified by comparing the results for the protocols that appear in both Table 2 (benchmarks for the development branch of Lox from which we forked) and Table 3 (benchmarks for our modified code, which adds reporting protocols). For these protocols, computation times in Table 3 should be only slightly higher than in Table 2. The request and response sizes produced by running this artifact should be identical to those provided in the paper. Request sizes in Table 3 should be at most 192 bytes greater than the sizes in Table 2 for the same protocols. (The specific differences are 0, 96, 128, and 192 bytes.) Response sizes in Tables 2 and 3 should be identical. This claim is reproduced by Experiment 1, which produces Tables 2 and 3.
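The request-size comparison can be done mechanically once the sizes have been read off the two tables. The sketch below encodes only the rule stated above (a difference of 0, 96, 128, or 192 bytes); check_request_delta is a hypothetical helper, and the sizes in the usage line are placeholders, not values from the paper.

```shell
# Hypothetical helper: $1 is a protocol's request size in Table 2 and
# $2 is its request size in Table 3 (both in bytes). Succeeds only if
# the difference is one of the values listed in the text.
check_request_delta() {
    delta=$(( $2 - $1 ))
    case "$delta" in
        0|96|128|192)
            echo "ok (delta $delta bytes)"
            ;;
        *)
            echo "unexpected (delta $delta bytes)"
            return 1
            ;;
    esac
}

# Placeholder sizes for illustration only.
check_request_delta 1000 1096
```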

Main Result 3: Algorithms 1–3 are Poor Classifiers of Censorship in the Belarus Case Study

Appendix A of our paper specifies three algorithms (one of which comes from prior work by Loesing) for detecting bridge censorship based on daily connection counts. These algorithms all failed spectacularly when trying to detect censorship of email-distributed obfs4 bridges in Belarus starting in late February 2021. This is shown in Table 4, in which all but one configuration had 0 true positives. The one instance with any true positives was Algorithm 3, with d=8. In this case, we observed 5 true positives and 995 false positives. This claim is reproduced by Experiment 2, which produces Table 4 from Appendix A.
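To put the one non-zero configuration in perspective, the share of positive classifications that were correct can be computed from the two counts above. Precision is a standard metric, but whether the paper reports it this way is an assumption; precision_pct is a hypothetical helper.

```shell
# Hypothetical helper: precision, as a percentage, from a count of
# true positives ($1) and false positives ($2).
precision_pct() {
    awk -v tp="$1" -v fp="$2" 'BEGIN {
        if (tp + fp == 0) { print "n/a"; exit }
        printf "%.1f\n", 100 * tp / (tp + fp)
    }'
}

# Algorithm 3 with d=8: 5 true positives, 995 false positives.
precision_pct 5 995   # prints 0.5
```

That is, only about 0.5% of the bridges flagged in that configuration were actually blocked.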

Main Result 4: Email-Distributed obfs4 Bridges Were Not Popular in Belarus

Appendix A of our paper finds that email-distributed obfs4 bridges had very low connection counts in Belarus, even prior to the censorship event. Specifically, we find the following results. Of the 93 email-distributed obfs4 bridges...

43 received more than 0 connections on a single day.
13 received more than 8 connections on a single day.
2 received more than 16 connections on a single day.
0 received more than 24 connections on a single day.
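The counts in this list are "bridges whose busiest day exceeded a threshold". As an illustration of that computation, the sketch below assumes a hypothetical input format of one "BRIDGE_ID DAILY_COUNT" pair per line; the artifact's real intermediate format may differ, and bridges_above is not one of its scripts.

```shell
# Hypothetical helper: read "BRIDGE_ID DAILY_COUNT" lines on stdin and
# print how many distinct bridges exceeded threshold $1 on any one day.
bridges_above() {
    awk -v t="$1" '
        $2 + 0 > t + 0 { seen[$1] = 1 }
        END {
            n = 0
            for (b in seen) n++
            print n
        }
    '
}

# Made-up sample data: bridges A and C exceed 8 on some day.
printf 'A 0\nA 9\nB 3\nC 17\nC 1\n' | bridges_above 8   # prints 2
```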

Only 5 of the 93 email-distributed obfs4 bridges had a mean connection count more than 1 standard deviation away from 0. Of these 5 bridges, the greatest distance between 0 and the mean was only about 1.6 standard deviations.
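"More than 1 standard deviation away from 0" amounts to the ratio mean / standard deviation exceeding 1 for a bridge's daily counts. The sketch below computes that ratio for one bridge, reading one daily count per line. It uses the population standard deviation, which is an assumption (the paper may use the sample form), and mean_over_std is a hypothetical helper.

```shell
# Hypothetical helper: read one daily connection count per line and
# print mean / (population standard deviation), to two decimal places.
mean_over_std() {
    awk '
        { n++; sum += $1; sumsq += $1 * $1 }
        END {
            if (n == 0) { print "undefined"; exit }
            mean = sum / n
            var = sumsq / n - mean * mean
            if (var <= 0) { print "undefined"; exit }
            printf "%.2f\n", mean / sqrt(var)
        }
    '
}

# Made-up counts with mean 5 and standard deviation 2.
printf '2\n4\n4\n4\n5\n5\n7\n9\n' | mean_over_std   # prints 2.50
```

A bridge counts toward the "5 of 93" figure when this ratio is greater than 1.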

These details are reproduced by Experiment 2.

Experiments

Our entire artifact can be run with the ./run.sh script, which accepts the following (optional) arguments:

-n NUM_PERFORMANCE_CORES    (Experiment 1) Use only NUM_PERFORMANCE_CORES threads.
-N PERFORMANCE_CORE_RANGE   (Experiment 1) Use the threads specified in PERFORMANCE_CORE_RANGE.
--fast                      (Experiment 2) Start with pre-processed data.

This will run the setup script (if it has not already been run) and the scripts for both Experiment 1 and Experiment 2, passing the arguments to those scripts as appropriate. See Experiment 1 for examples of using the -n and -N options. This process should take around 2–5 hours. (If the --fast option is used, the process should take around half an hour. See Experiment 2 for details.)

Alternatively, the scripts for the experiments can be run individually, as documented in the experiment descriptions below.

Experiment 1: Lox Benchmarking

Time: 1 human-minute + 5–20 compute-minutes

./scripts/generate-lox-results.sh [-n NUM_PERFORMANCE_CORES] [-N PERFORMANCE_CORE_RANGE] && \
./scripts/process-lox-results.sh

(This will run only Experiment 1. Alternatively, you can run both Experiment 1 and Experiment 2 with the ./run.sh script, as documented above.)

By default, the script will use all available threads (up to the number of processes to be run, which is 16), but you can use -n NUM_PERFORMANCE_CORES to restrict it to use only the first NUM_PERFORMANCE_CORES threads. If these are not the ones you want to use, you can instead specify -N PERFORMANCE_CORE_RANGE to indicate the specific threads to use.

Examples:

-n 16           Use threads 0-15
-N 2-8          Use threads 2-8
-N 0-2,6,9      Use threads 0,1,2,6,9
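To make the range syntax concrete, the sketch below expands a PERFORMANCE_CORE_RANGE string into the individual thread numbers it names. It illustrates the notation only: expand_core_range is a hypothetical helper, and how the artifact's scripts parse the option or pin threads internally is not shown here.

```shell
# Hypothetical helper: expand a comma-separated list of thread numbers
# and inclusive ranges (e.g. "0-2,6,9") into a space-separated list.
expand_core_range() {
    out=""
    for part in $(printf '%s' "$1" | tr ',' ' '); do
        case "$part" in
            *-*)
                i=${part%-*}        # start of the range
                last=${part#*-}     # end of the range (inclusive)
                while [ "$i" -le "$last" ]; do
                    out="$out $i"
                    i=$((i + 1))
                done
                ;;
            *)
                out="$out $part"
                ;;
        esac
    done
    printf '%s\n' "${out# }"
}

expand_core_range 0-2,6,9   # prints: 0 1 2 6 9
```

By the same reading, -n 16 corresponds to expand_core_range "0-15".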

These scripts perform benchmarks for:

the development branch of Lox from which we forked (Table 2, in Section 6)
our modified code that adds reporting protocols (Table 3, in Section 6)
the original Lox code (Table 5, in Appendix C)

These scripts create .tex files for these tables and compile them to PDFs. These .pdf files (as well as the corresponding .tex files) are copied to the root directory of the artifact. The results can be found as table-2-results.pdf (Table 2), table-3-results.pdf (Table 3), and appendix-c-results.pdf (Table 5).

table-2-results.pdf and table-3-results.pdf can be used to verify Main Results 1 and 2.

Experiment 2: Belarus Case Study

Time: 1 human-minute + 1.5–4.5 compute-hours

./scripts/belarus.sh [--fast]

(This will run only Experiment 2. Alternatively, you can run both Experiment 1 and Experiment 2 with the ./run.sh script, as documented above.)

The recommended way to run this script is without the --fast argument: ./scripts/belarus.sh. This will download 10 bridge extra-info archives from the Tor Project for the period of July 2020–April 2021, parse this data to learn daily connection counts from Belarus for each bridge, and then use Algorithms 1–3 to try to detect which bridges were blocked starting in late February 2021 and which were not. This takes about 1.5–4.5 hours.

If the --fast option is used, then the extra-info archives will not be downloaded from the Tor Project. Instead, the script starts with a 6.7 MB archive (87 MB uncompressed) containing only the information we need. It yields identical results and takes only 1–2 minutes; however, it requires trusting that the pre-processed data we provide was extracted correctly.

This experiment outputs appendix-a-results.pdf (and a corresponding .tex file), containing Table 4 (from Appendix A of our paper). As this process begins with downloading existing data, and the algorithms used are deterministic, the contents of the output Table 4 should be identical to those in our paper. This table can be used to verify Main Result 3.

This experiment also outputs a text file called "appendix-a-results", which contains various statistics about connection counts to the bridges we evaluate, including those described in Appendix A.2. These can be used to verify Main Result 4.
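After both experiments have finished, a quick check that every result file named in this document exists and is non-empty can save a rerun. This is a sketch: missing_outputs is a hypothetical helper, and it checks only the file names given above (the .tex files are omitted because their exact names are not listed).

```shell
# Hypothetical helper: print the expected result files that are missing
# or empty in directory $1 (default: the current directory).
missing_outputs() {
    dir=${1:-.}
    for f in table-2-results.pdf table-3-results.pdf appendix-c-results.pdf \
             appendix-a-results.pdf appendix-a-results; do
        [ -s "$dir/$f" ] || printf '%s\n' "$f"
    done
}

# Run from the root directory of the artifact; no output means all
# five files are present.
missing_outputs .
```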

Limitations

N/A

Notes on Reusability

N/A
