Commit a1fb07dfcb — Document parallelization, -s option
Vecna, 1 week ago
1 changed file, 12 additions and 11 deletions: README.md
@@ -24,17 +24,18 @@ Dependencies:
 - python3
 - numpy
 
-To reproduce our results, just run `./run.sh`. **This will take a long
-time (about 12.5 hours on my device) and require a few GB of free
-space.** The reason is that it downloads all extra-info records for all
-bridges from 2020-07 to 2021-04 from the Tor Project's CollecTor service
-(734 MB compressed, about 20 GB uncompressed), extracts that data, and
-obtains the needed information (bridge fingerprint, record date, and
-connection count from Belarus) from each record. We reduce the disk
-space required by extracting only one archive at a time, then deleting
-the resulting files after processing. Nevertheless, there are A LOT of
-these records (about 4 million), and it takes a long time to process
-them all.
+To reproduce our results, just run `./run.sh`. This should take about an
+hour and requires around 20 GB of free space, which is used to download
+and extract all extra-info records for all bridges from 2020-07 through
+2021-04 from the Tor Project's CollecTor service. (The download size for
+the compressed archives is only about 734 MB.)
+
+To reduce the space needed (at the cost of significantly increasing the
+time required), you can use `./run.sh -s`, which extracts and processes
+a single extra-info archive at a time, then removes the uncompressed
+files to free up space before moving on to the next one. This will take
+roughly 10 times as long, but it only requires a few GB of free disk
+space at any time.
 
 Alternatively, you can run `./run.sh --fast`, which starts with a 6.7 MB
 archive (87 MB uncompressed) containing only the information we need.
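
The one-archive-at-a-time strategy that `-s` documents — extract a single archive, process it, delete the uncompressed files, then move on — can be sketched as a shell loop. The archive glob and the commented-out processing command are illustrative assumptions, not taken from the repository:

```shell
#!/bin/sh
# Sketch of a space-saving mode: handle one CollecTor extra-info
# archive at a time so only one month's uncompressed data is ever
# on disk. Names below are hypothetical, not the repo's actual layout.

process_archives_one_at_a_time() {
    for archive in "$@"; do
        [ -e "$archive" ] || continue    # skip an unmatched glob
        mkdir -p scratch
        tar -C scratch -xf "$archive"    # uncompress this archive only
        # python3 process.py scratch/ >> results.csv  # hypothetical step
        rm -rf scratch                   # reclaim disk space immediately
        echo "processed $archive"
    done
}

# Example invocation:
# process_archives_one_at_a_time extra-infos-*.tar.xz
```

Trading time for space this way means each archive is downloaded, extracted, and deleted serially, which is consistent with the roughly 10x slowdown the README describes.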