This repo contains tools to extract empirical distributions from the ["Share and Multiply" (SaM) dataset](https://figshare.com/articles/dataset/WhatsApp_Data_Set/19785193) of WhatsApp chat metadata.

More thorough documentation is coming soon, but the gist is:

- Download the `json_files.zip` file they provide, and extract it somewhere.
- Run the `extract` tool to pare and serialize the SaM data.
- Use the tools in `hmm` to label messages as "active" or "idle".
- Run the `process` tool to generate all empirical distributions other than message sizes.
- Run the `message-lens` tool to generate distributions for message sizes.
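
If you would rather script the unpacking step than use an archive manager, the following is a minimal Python sketch. It is not part of this repo: the function name `extract_archive` and the destination directory `sam-data` are hypothetical, and it assumes (from the archive's name) that the extracted files end in `.json`.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Unpack a zip archive into dest_dir and return the extracted .json paths.

    Hypothetical helper for illustration; not one of this repo's tools.
    """
    dest = Path(dest_dir)
    # Create the destination directory (and any parents) if it does not exist.
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest)
    # Assumption: the dataset files use a .json extension.
    return sorted(dest.rglob("*.json"))
```

Calling `extract_archive("json_files.zip", "sam-data")` from the directory holding the download would unpack it into `sam-data/`, which can then be handed to the `extract` tool.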