Extract data from the "Share and Multiply" dataset for use with MGen.
Latest commit: 4b84f3685a by Justin Tracey, "add some (too) minimal docs and markov model script" (6 months ago).
This repo contains tools to extract empirical distributions from the "Share and Multiply" (SaM) dataset of WhatsApp chat metadata.
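To illustrate what an "empirical distribution" means here, the Rust sketch below builds a discrete CDF from observed samples (for example, hypothetical inter-message delays in seconds). It then maps a number u in [0, 1) back to a value by inverse-CDF lookup, which is one common way a generator can sample from such a distribution. The `EmpiricalDist` type, its methods, and the sample data are hypothetical and are not this repo's actual API or output format.

```rust
use std::collections::BTreeMap;

/// Hypothetical discrete empirical distribution (not this repo's data structure).
/// Stores (value, cumulative probability) pairs sorted by value.
#[derive(Debug)]
struct EmpiricalDist {
    cdf: Vec<(u64, f64)>,
}

impl EmpiricalDist {
    /// Count each distinct observed value, then turn the counts into a CDF.
    fn from_samples(samples: &[u64]) -> Self {
        assert!(!samples.is_empty(), "need at least one sample");
        let mut counts: BTreeMap<u64, usize> = BTreeMap::new();
        for &s in samples {
            *counts.entry(s).or_insert(0) += 1;
        }
        let n = samples.len() as f64;
        let mut acc = 0usize;
        // BTreeMap iterates in ascending key order, so the CDF is sorted by value.
        let cdf = counts
            .into_iter()
            .map(|(v, c)| {
                acc += c;
                (v, acc as f64 / n)
            })
            .collect();
        EmpiricalDist { cdf }
    }

    /// Inverse-CDF lookup: return the smallest value whose CDF exceeds u.
    fn quantile(&self, u: f64) -> u64 {
        for &(v, p) in &self.cdf {
            if u < p {
                return v;
            }
        }
        // u >= 1.0 (or rounding at the top end): fall back to the largest value.
        self.cdf.last().unwrap().0
    }
}

fn main() {
    // Hypothetical inter-message delays, in seconds.
    let delays = [1, 2, 2, 5, 5, 5, 30, 120];
    let dist = EmpiricalDist::from_samples(&delays);
    println!("{:?}", dist.cdf);
    println!("{}", dist.quantile(0.5)); // prints 5
}
```

Feeding u from a uniform random source turns `quantile` into a sampler that reproduces the observed frequencies.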
More thorough documentation is coming soon, but the gist is:
1. Get the `json_files.zip` file the SaM dataset authors provide, and extract it somewhere.
2. Use the `extract` tool to pare and serialize the SaM data.
3. Use the Markov model script in `hmm` to label messages as "active" or "idle".
4. Use the `process` tool to generate all empirical distributions other than message sizes.
5. Use the `message-lens` tool to generate distributions for message sizes.
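The active/idle labeling step can be pictured as decoding a two-state hidden Markov model. The Rust sketch below is a generic Viterbi decoder, assuming the gaps between messages have already been discretized into "short" (0) and "long" (1). The states, probabilities, and function names are hypothetical placeholders. This is not the repo's `hmm` script, which may use a different model, different observations, or a different language.

```rust
/// Hypothetical hidden states for the sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Active,
    Idle,
}

/// Viterbi decoding for a 2-state HMM with discrete observations.
/// obs[t] is 0 for a "short" gap before message t, 1 for a "long" gap.
/// start[s], trans[from][to], and emit[state][obs] are probabilities.
fn viterbi(obs: &[usize], start: [f64; 2], trans: [[f64; 2]; 2], emit: [[f64; 2]; 2]) -> Vec<State> {
    if obs.is_empty() {
        return Vec::new();
    }
    let n = obs.len();
    // score[t][s]: best log-probability of any state path ending in s at time t.
    let mut score = vec![[f64::NEG_INFINITY; 2]; n];
    // back[t][s]: the previous state on that best path.
    let mut back = vec![[0usize; 2]; n];
    for s in 0..2 {
        score[0][s] = start[s].ln() + emit[s][obs[0]].ln();
    }
    for t in 1..n {
        for s in 0..2 {
            for prev in 0..2 {
                let cand = score[t - 1][prev] + trans[prev][s].ln() + emit[s][obs[t]].ln();
                if cand > score[t][s] {
                    score[t][s] = cand;
                    back[t][s] = prev;
                }
            }
        }
    }
    // Backtrack from the most likely final state.
    let mut best = if score[n - 1][0] >= score[n - 1][1] { 0 } else { 1 };
    let mut path = vec![0usize; n];
    path[n - 1] = best;
    for t in (1..n).rev() {
        best = back[t][best];
        path[t - 1] = best;
    }
    path.into_iter()
        .map(|s| if s == 0 { State::Active } else { State::Idle })
        .collect()
}

fn main() {
    let start = [0.5, 0.5];
    // Sticky transitions: states tend to persist (placeholder values).
    let trans = [[0.9, 0.1], [0.1, 0.9]];
    // Active mostly emits short gaps; Idle mostly emits long gaps (placeholder values).
    let emit = [[0.9, 0.1], [0.2, 0.8]];
    let gaps = [0, 0, 0, 1, 1, 1, 0, 0];
    let labels = viterbi(&gaps, start, trans, emit);
    // prints [Active, Active, Active, Idle, Idle, Idle, Active, Active]
    println!("{:?}", labels);
}
```

With these placeholder parameters, the run of three long gaps is labeled idle and the short gaps around it active; because the transitions are sticky, a single isolated long gap would not be enough to flip the label.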