@@ -3,6 +3,19 @@ This repo contains tools to extract empirical distributions from the ["Share and
More thorough documentation is coming soon, but the gist is:
- Download the `json_files.zip` file they provide, and extract it somewhere.
- Run the `extract` tool to pare and serialize the SaM data.
+ (Using `chat*.json` in any of the following commands means using all available chats; you can use a subset for faster processing, as long as you use the same subset in every step.)
+ ``cargo run --bin extract stats/ json_files/chat*.json``
- Use the tools in `hmm` to label messages as "active" or "idle".
+ - install the dependencies via `pip install -r requirements.txt`
+ - from within `hmm/`, run `parallel_run.sh`, which invokes the Python script in parallel
+ ``./parallel_run.sh ../stats/ stats2/``
- Run the `process` tool to generate all empirical distributions other than message sizes.
+ ``cargo run --bin process dists/ hmm/stats2/ json_files/chat*.json``
- Run the `message-lens` tool to generate distributions for message sizes.
+ This takes an optional `-s` argument giving a source of file sizes (it must come first if provided; sorry for the jank).
+ If you have a source for file sizes, you can provide it here.
+ If you don't want to simulate sending files, you can omit it.
+ If you don't have a source, you can use the one we provide, based on public WhatsApp groups from 2023.
+ ``cargo run --bin message-lens -- -s data/file_sizes.dat dists/ json_files/chat*.json``
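
Because `-s` is optional and has to come first, a small wrapper can add it only when a file-size source is configured. This is a hypothetical sketch, not part of the repo (`run_message_lens` is an illustrative name): pass the path to a file-size source, or an empty string to skip simulating files, followed by the usual output directory and chat files.

```shell
#!/bin/sh
# Hypothetical wrapper around the message-lens tool (not shipped with the repo).
# $1: path to a file-size source, or "" to skip simulating sent files.
# Remaining arguments: output directory and chat files, as usual.
run_message_lens() {
    file_sizes=$1
    shift
    if [ -n "$file_sizes" ]; then
        # -s must be the first argument after `--`, so prepend it.
        set -- -s "$file_sizes" "$@"
    fi
    cargo run --bin message-lens -- "$@"
}
```

For example, `run_message_lens data/file_sizes.dat dists/ json_files/chat*.json` is equivalent to the command above, while `run_message_lens "" dists/ json_files/chat*.json` omits file sizes entirely.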
+
+At this point, `dists/` will contain distributions ready for use in MGen, organized by the user being simulated.
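
The steps above can be chained into a single script. What follows is a minimal sketch rather than a script provided by this repo: it assumes you run it from the repository root with `json_files/` extracted there and the `hmm` Python dependencies already installed, and `select_chats`, `run_pipeline`, and `chats.txt` are hypothetical names. Writing the chosen chats to a file once is one way to guarantee every step uses the same subset.

```shell
#!/bin/sh
# Sketch (hypothetical helper names): run the whole pipeline on one fixed
# subset of chats, so every step sees exactly the same files.

# Record the first N chat files (in glob order) in chats.txt.
select_chats() {
    printf '%s\n' json_files/chat*.json | head -n "$1" > chats.txt
}

# Run every step against the subset recorded in chats.txt,
# stopping at the first step that fails.
run_pipeline() {
    # Unquoted $chats is split on whitespace; chat file names contain none.
    chats=$(cat chats.txt)
    cargo run --bin extract stats/ $chats &&
    (cd hmm && ./parallel_run.sh ../stats/ stats2/) &&
    cargo run --bin process dists/ hmm/stats2/ $chats &&
    cargo run --bin message-lens -- -s data/file_sizes.dat dists/ $chats
}
```

For example, `select_chats 100 && run_pipeline` runs every step on the same 100 chats; calling `run_pipeline` again later reuses that subset instead of re-globbing.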