@@ -24,17 +24,18 @@ Dependencies:
- python3
- numpy

-To reproduce our results, just run `./run.sh`. **This will take a long
-time (about 12.5 hours on my device) and require a few GB of free
-space.** The reason is that it downloads all extra-info records for all
-bridges from 2020-07 to 2021-04 from the Tor Project's CollecTor service
-(734 MB compressed, about 20 GB uncompressed), extracts that data, and
-obtains the needed information (bridge fingerprint, record date, and
-connection count from Belarus) from each record. We reduce the disk
-space required by extracting only one archive at a time, then deleting
-the resulting files after processing. Nevertheless, there are A LOT of
-these records (about 4 million), and it takes a long time to process
-them all.
+To reproduce our results, just run `./run.sh`. This should take about an
+hour and requires around 20 GB of free space, which is used to download
+and extract all extra-info records for all bridges from 2020-07 through
+2021-04 from the Tor Project's CollecTor service. (The download size for
+the compressed archives is only about 734 MB.)
+
+To reduce the space needed (at the cost of significantly increasing the
+time required), you can use `./run.sh -s`, which extracts and processes
+a single extra-info archive at a time, then removes the uncompressed
+files to free up space before moving on to the next one. This will take
+roughly 10 times as long, but it only requires a few GB of free disk
+space at any time.

Alternatively, you can run `./run.sh --fast`, which starts with a 6.7 MB
archive (87 MB uncompressed) containing only the information we need.