Data analysis related to the Tor blocking events in Belarus from 2020-2021
This repo contains data analysis related to the Tor blocking events in Belarus from 2020-2021. In particular, in late February 2021, the censor apparently enumerated the obfs4 bridges that were distributed via email and blocked them. Given the set of 1890 bridges that were distributed in February 2021 prior to the 22nd, and the subset of 93 email-distributed obfs4 bridges among them, our goal was to detect, based on low connection counts from Belarus, that the 93 email-distributed obfs4 bridges were blocked, while avoiding false detections that the other bridges were blocked.
Our main finding is that the 93 email-distributed obfs4 bridges that were blocked were not commonly used in Belarus prior to the censorship event. Very low connection counts (e.g., 0) were already common before the blocking, so a low count afterwards does not distinguish a blocked bridge from an unused one, and we were unable to infer censorship of these bridges based on this signal.
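To illustrate why the low-count signal is ambiguous, here is a minimal sketch. It is not the analysis in `scripts`; the function names, the threshold, and the per-day counts are hypothetical and chosen only to show the idea.

```python
def all_low(counts, threshold=0):
    """True if every daily count is at or below the threshold."""
    return all(c <= threshold for c in counts)


def classify(before, after, threshold=0):
    """Classify one bridge from its daily Belarus counts.

    before/after: lists of daily counts on either side of the event.
    Returns "active", "dropped", or "ambiguous".
    """
    if not all_low(after, threshold):
        return "active"      # still seeing connections after the event
    if all_low(before, threshold):
        return "ambiguous"   # already at the floor, so no drop is visible
    return "dropped"         # had connections before, none after


# Hypothetical bridges: (counts before the event, counts after the event).
bridges = {
    "bridge-A": ([8, 16, 8], [0, 0, 0]),
    "bridge-B": ([0, 0, 0], [0, 0, 0]),
    "bridge-C": ([8, 8, 0], [8, 0, 8]),
}

labels = {name: classify(b, a) for name, (b, a) in bridges.items()}
```

In this sketch only `bridge-A` shows a visible drop. A bridge like `bridge-B`, which had no Belarus connections before the event either, cannot be separated from a blocked bridge by this signal, which is the situation we found for most of the 93 bridges.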
Dependencies:
To reproduce our results, just run `./run.sh`. This will take a long
time (about 12.5 hours on my device) and require a few GB of free
space. The reason is that it needs to download all extra-info records
for all bridges from 2020-07 to 2021-04 from the Tor Project's CollecTor
service (734 MB compressed, about 20 GB uncompressed), extract that
data, and obtain the needed information (bridge fingerprint, record
date, and connection count from Belarus) from each record. There are A
LOT of these records.
If you want to skip the download and extraction, you can run
`./run.sh --fast`, which starts with a 6.7 MB archive (87 MB
uncompressed) containing only the information we need.
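The per-record extraction mentioned above (bridge fingerprint, record date, and connection count from Belarus) can be sketched as follows. This is not the code in `get-bridge-data.sh`. It assumes the count is taken from the `by=` entry of the record's `bridge-ips` line and the date from `bridge-stats-end`; the actual scripts may use different fields. The function name and the sample record, including its fingerprint, are made up for illustration.

```python
def parse_extra_info(record, country="by"):
    """Return (fingerprint, date, count) for one extra-info record.

    Assumption: the per-country count comes from the "bridge-ips" line
    and the record date from "bridge-stats-end".
    """
    fingerprint = None
    date = None
    count = 0  # a country missing from bridge-ips is treated as zero
    for line in record.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "extra-info" and len(parts) >= 3:
            fingerprint = parts[2]          # "extra-info <nickname> <fingerprint>"
        elif parts[0] == "bridge-stats-end" and len(parts) >= 2:
            date = parts[1]                 # keep only YYYY-MM-DD
        elif parts[0] == "bridge-ips" and len(parts) >= 2:
            for item in parts[1].split(","):
                cc, _, n = item.partition("=")
                if cc == country:
                    count = int(n)
    return fingerprint, date, count


# Hypothetical record for illustration only.
SAMPLE = """\
extra-info Unnamed 0123456789ABCDEF0123456789ABCDEF01234567
published 2021-02-20 12:00:00
bridge-stats-end 2021-02-20 06:00:00 (86400 s)
bridge-ips by=8,ru=16,us=24
"""

result = parse_extra_info(SAMPLE)
```

Keeping only these three values per record is what lets the `--fast` archive be so much smaller than the full CollecTor download.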
Note: The Julian date conversion in `get-bridge-data.sh` and `get-stats.py` is 1 day earlier than reported by, e.g., https://aa.usno.navy.mil/data/JulianDate. What matters is that these two files use the same representation of a date, which they do.
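A small sketch of why the one-day offset is harmless. This is not the conversion used in the two files. `julian_day_number` is a standard conversion built on Python's `date.toordinal()`, and `shifted` is a hypothetical conversion that is one day earlier, standing in for ours.

```python
from datetime import date


def julian_day_number(d):
    """Julian Day Number of the Julian day that starts at noon UT on d."""
    return d.toordinal() + 1721425


def shifted(d):
    """Hypothetical conversion that is one day earlier than the standard one."""
    return julian_day_number(d) - 1


event = date(2021, 2, 22)
start = date(2020, 7, 1)

# The absolute values differ by one day...
offset = julian_day_number(event) - shifted(event)

# ...but differences and comparisons between dates are unchanged,
# as long as both sides use the same conversion.
same_span = (julian_day_number(event) - julian_day_number(start)
             == shifted(event) - shifted(start))
same_order = (julian_day_number(start) < julian_day_number(event)) == (
    shifted(start) < shifted(event))
```

Because both files apply the same conversion, matching records by date and counting days between dates gives the same result as it would with the standard values.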