Data analysis related to the Tor blocking events in Belarus from 2020-2021

Belarus 2020--2021

This repo contains data analysis related to the Tor blocking events in Belarus in 2020-2021. In particular, in late February 2021, the censor apparently enumerated the obfs4 bridges that were distributed via email and blocked them. We started from the set of 1890 bridges that were distributed in February 2021 prior to the 22nd, of which 93 were email-distributed obfs4 bridges. Our goal was to detect, based on low connection counts from Belarus, that those 93 bridges were blocked, while avoiding false detections for the other bridges.

Our main finding is that the 93 email-distributed obfs4 bridges that were blocked were not commonly used in Belarus prior to the censorship event. Very low connection counts (e.g., 0) were common, so we were unable to infer censorship of these bridges based on this signal.
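To illustrate why this signal falls short, here is a minimal sketch of a threshold-based detector. It is not the code in this repo: the function names, the threshold, and all counts below are invented for illustration. When zero counts are common among unblocked bridges too, the rule flags both groups at similar rates.

```python
def flag_low_count(counts, threshold=0):
    """Flag each bridge whose connection count is at or below the threshold."""
    return [c <= threshold for c in counts]


def rate(flags):
    """Fraction of bridges that were flagged."""
    return sum(flags) / len(flags)


# Invented counts for illustration only; these are NOT taken from the data set.
blocked_counts = [0, 0, 8, 0]          # stand-in for blocked bridges
other_counts = [0, 16, 0, 0, 24, 0]    # stand-in for bridges that were not blocked

detection_rate = rate(flag_low_count(blocked_counts))
false_alarm_rate = rate(flag_low_count(other_counts))

print(f"flagged among blocked bridges: {detection_rate:.2f}")
print(f"flagged among other bridges:   {false_alarm_rate:.2f}")
```

With these made-up numbers, most of the unblocked bridges are flagged as well, so a low count alone does not separate the two groups.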

Reproducing our results

Dependencies:

  • bash
  • curl
  • python3
  • numpy

To reproduce our results, just run ./run.sh. This should take about an hour and requires around 20 GB of free space, which is used to download and extract all extra-info records for all bridges from 2020-07 through 2021-04 from the Tor Project's CollecTor service. (The download size for the compressed archives is only about 734 MB.)
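As a rough sketch of what that date range covers, the snippet below enumerates the months from 2020-07 through 2021-04. The helper `month_range` is hypothetical, and the archive file name pattern is an assumption about how CollecTor names its monthly extra-info archives; check run.sh for what is actually downloaded.

```python
def month_range(start, end):
    """Yield 'YYYY-MM' strings from start to end, inclusive."""
    year, month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    while (year, month) <= (end_year, end_month):
        yield f"{year:04d}-{month:02d}"
        month += 1
        if month > 12:
            year, month = year + 1, 1


months = list(month_range("2020-07", "2021-04"))

# Assumed naming pattern, not confirmed by this README.
archive_names = [f"bridge-extra-infos-{m}.tar.xz" for m in months]

print(len(months), "monthly archives:", months[0], "through", months[-1])
```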

To reduce the space needed (at the cost of significantly increasing the time required), you can use ./run.sh -s, which extracts and processes a single extra-info archive at a time, then removes the uncompressed files to free up space before moving on to the next one. This will take roughly 10 times as long, but it only requires a few GB of free disk space at any time.
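The extract, process, delete cycle described above can be sketched as follows. This is an illustration of the idea rather than the logic in run.sh: `process_sequentially` and `handle_file` are hypothetical names, and the demo builds a tiny made-up archive so the sketch runs without downloading anything.

```python
import io
import shutil
import tarfile
import tempfile
from pathlib import Path


def process_sequentially(archives, handle_file):
    """Extract, process, and delete one archive at a time."""
    for archive in archives:
        workdir = Path(tempfile.mkdtemp(prefix="extra-info-"))
        try:
            with tarfile.open(archive) as tar:
                tar.extractall(workdir)
            for path in sorted(workdir.rglob("*")):
                if path.is_file():
                    handle_file(path)
        finally:
            # Free the space before touching the next archive.
            shutil.rmtree(workdir)


# Demo with a made-up archive containing a single placeholder file.
demo_dir = Path(tempfile.mkdtemp(prefix="demo-"))
demo_archive = demo_dir / "demo.tar"
payload = b"extra-info placeholder\n"
with tarfile.open(demo_archive, "w") as tar:
    info = tarfile.TarInfo(name="records/one.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

seen = []
process_sequentially([demo_archive], lambda p: seen.append(p.name))
shutil.rmtree(demo_dir)
print(seen)
```

Peak disk usage is then bounded by the largest single uncompressed archive instead of the sum of all of them, at the cost of giving up any parallelism across archives.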

Alternatively, you can run ./run.sh --fast, which starts with a 6.7 MB archive (87 MB uncompressed) containing only the information we need. This is much faster (1-2 minutes on my device) and requires much less disk space (around 250 MB).

Full results will be available in the output file, and the relevant summary will be printed to the console.

Julian dates

Note: The Julian date conversion in get-bridge-data.sh and get-stats.py yields a value 1 day earlier than the one reported by, e.g., https://aa.usno.navy.mil/data/JulianDate. What matters is that these two files use the same representation of a date, which they do.
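A short sketch of why a constant one-day offset is harmless as long as it is applied consistently. The formulas here are not taken from get-bridge-data.sh or get-stats.py; `julian_day_number` uses the standard relation between Python's date ordinal and the Julian Day Number, and `shifted_julian_day` is a hypothetical stand-in for a convention that is one day earlier.

```python
from datetime import date

# Offset between Python's proleptic Gregorian ordinal and the Julian Day
# Number (the Julian date at noon UTC): 2000-01-01 has ordinal 730120
# and Julian Day Number 2451545.
ORDINAL_TO_JDN = 1721425


def julian_day_number(d):
    """Standard Julian Day Number for a calendar date."""
    return d.toordinal() + ORDINAL_TO_JDN


def shifted_julian_day(d):
    """Hypothetical convention that is one day earlier than the standard one."""
    return julian_day_number(d) - 1


start = date(2021, 2, 1)
end = date(2021, 2, 22)

# The offset cancels out in comparisons and differences between dates.
standard_span = julian_day_number(end) - julian_day_number(start)
shifted_span = shifted_julian_day(end) - shifted_julian_day(start)
print(standard_span, shifted_span)
```

Problems would only arise if one script used the shifted value and another used the standard one, since dates would then be matched against the wrong day.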

License

This project as a whole is licensed under the MIT (Expat) License. It uses data from the Tor Project that is provided under the CC0 Deed.