Tools for generating realistic messenger network traffic.

Justin Tracey d440c0d88f progress so far on p2p client (completely broken still, won't build) 1 year ago

MGen

MGen is a client, server, and library for generating simulated messenger traffic. It is designed for use analogous to (and likely in conjunction with) TGen, but for simulating traffic generated from communications in messenger apps, such as Signal or WhatsApp, rather than web traffic or file downloads. Notably, this allows for studying network traffic properties of messenger apps in Shadow.

Like TGen, MGen can create message flows built around Markov models. Unlike TGen, these models are expressly designed with user activity in messenger clients in mind. These messages can be relayed through a central server, which can handle group messages (i.e., traffic that originates from one sender, but gets forwarded to multiple recipients). Clients also generate received receipts (small messages used to indicate to someone who sent a message that the recipient device has received it). These receipts can make up to half of all traffic. (Read receipts, however, are not supported.)
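The relay-and-receipt flow described above can be sketched in Rust as follows. This is a minimal illustration under stated assumptions, not MGen's implementation. The Message struct and the relay and receipt_for functions are hypothetical names. The sketch also assumes that a receipt is addressed only to the original sender and that receipts are never acknowledged themselves.

```rust
/// Hypothetical message type; MGen's real wire format is not shown here.
#[derive(Clone, Debug, PartialEq)]
pub struct Message {
    pub sender: String,
    pub group: String,
    pub recipients: Vec<String>,
    /// Body size in padding blocks (receipts are assumed to carry no body).
    pub size_blocks: u32,
    pub is_receipt: bool,
}

/// Server side (sketch): fan one incoming message out into one copy per
/// recipient, paired with the name of the client it should be delivered to.
pub fn relay(msg: &Message) -> Vec<(String, Message)> {
    msg.recipients
        .iter()
        .map(|r| (r.clone(), msg.clone()))
        .collect()
}

/// Client side (sketch): answer a normal message with a receipt addressed
/// back to its sender. Receipts are not acknowledged, avoiding an endless loop.
pub fn receipt_for(me: &str, msg: &Message) -> Option<Message> {
    if msg.is_receipt {
        return None;
    }
    Some(Message {
        sender: me.to_string(),
        group: msg.group.clone(),
        recipients: vec![msg.sender.clone()],
        size_blocks: 0,
        is_receipt: true,
    })
}
```

For a group message from Alice to Bob, Carol, and Dave, relay returns three copies, and calling receipt_for on each delivered copy yields one receipt addressed to Alice.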

Usage

MGen is written entirely in Rust and is built like most pure Rust projects. If you have a working Rust install with Cargo, you can build the client and server with cargo build. Normal Cargo features apply; e.g., use the --release flag to enable a larger set of compiler optimizations. The server can be built and executed with cargo run --bin server, and the client with cargo run --bin client [config.toml]..., where each argument is a conversation configuration file as described below. Alternatively, you can run the executables directly from the respective target directory (e.g., ./target/release/server after a release build).

Client Configuration

Clients are designed to simulate one conversation per configuration file. Each configuration also specifies the user sending messages in that conversation, so, similar to techniques used in TGen, a single client instance can simulate the traffic of many individual users. The following example configuration, with explanatory comments, should be enough to understand almost everything you need:
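One way a single client process could host many users is to run each configuration file's conversation independently. The sketch below assumes one OS thread per conversation (the real client may well use an async runtime instead). It also derives a hypothetical isolation_key from the sender and group names, reflecting the comments in the example configuration that those names are used to create unique circuits. How MGen actually separates circuits is not stated here. Sending distinct SOCKS5 credentials, which Tor places on separate circuits by default (IsolateSOCKSAuth), is one plausible mechanism.

```rust
use std::thread;

/// Hypothetical subset of one conversation.toml file.
#[derive(Clone, Debug)]
pub struct Conversation {
    pub sender: String,
    pub group: String,
}

/// Hypothetical key distinguishing circuits per (user, conversation) pair.
/// It could, for example, be sent as SOCKS5 credentials so that Tor's
/// stream isolation puts each pair on its own circuit.
pub fn isolation_key(c: &Conversation) -> String {
    format!("{}/{}", c.sender, c.group)
}

/// Run every conversation concurrently within one client process, one OS
/// thread each, and return the isolation key each thread used (in input order).
pub fn run_all(convs: Vec<Conversation>) -> Vec<String> {
    let handles: Vec<_> = convs
        .into_iter()
        .map(|c| {
            thread::spawn(move || {
                // A real client would connect through the SOCKS proxy here
                // and drive this conversation's Markov model.
                isolation_key(&c)
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("conversation thread panicked"))
        .collect()
}
```

Two chats with the same participants but different group names get different keys, which matches the purpose given for the group setting.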

# conversation.toml

# A name used for logs and to create unique circuits for each user on a client.
sender = "Alice"

# A name used for logs and to create unique circuits for each conversation,
# even when two chats share the same participants.
group = "group1"

# The list of participants, except the sender.
recipients = ["Bob", "Carol", "Dave"]

# The <ip>:<port> of the socks5 proxy to connect through.
socks = "127.0.0.1:9050"

# The <address>:<port> of the message server, where <address> is an IP or onion address.
server = "insert.ip.or.onion:6397"

# The number of seconds to wait until the client starts sending messages.
# This should be long enough that all clients have had time to start (sending
# messages to a client that isn't registered on the server is a fatal error),
# but short enough that all conversations will have started by the experiment start.
bootstrap = 5.0

# The number of seconds to wait after a network failure before retrying.
retry = 5.0


# Parameters for distributions used by the Markov model.
[distributions]

# Probabilities of Idle to Active transition after sending/receiving messages.
s = 0.5
r = 0.1

# The distribution of message sizes, as measured in padding blocks.
m = {distribution = "Poisson", lambda = 1.0}

# Distribution I, the amount of time Idle before sending a message.
i = {distribution = "Normal", mean = 30.0, std_dev = 100.0}

# Distribution W, the amount of time Active without sending or receiving
# messages to transition to Idle.
w = {distribution = "Uniform", low = 0.0, high = 90.0}

# Distribution A_{s/r}, the amount of time Active since last sent/received
# message until the client sends a message.
a_s = {distribution = "Exp", lambda = 2.0}
a_r = {distribution = "Pareto", scale = 1.0, shape = 3.0}
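The comments in the [distributions] table imply a two-state (Idle/Active) model. The following sketch is one reading of those comments, not MGen's code. State, Event, Params, next_state, and send_timer are hypothetical names. The caller is assumed to supply a uniform random number u in [0, 1) and to restart the W timer on every send or receive while Active.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum State {
    Idle,
    Active,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Event {
    Sent,
    Received,
    /// Distribution W elapsed with no message sent or received.
    WTimeout,
}

/// Hypothetical container for the s and r probabilities from the config.
pub struct Params {
    pub s: f64,
    pub r: f64,
}

/// One transition of the (assumed) model; `u` is a uniform draw in [0, 1).
pub fn next_state(state: State, event: Event, p: &Params, u: f64) -> State {
    match (state, event) {
        // Idle -> Active with probability s after sending...
        (State::Idle, Event::Sent) => if u < p.s { State::Active } else { State::Idle },
        // ...or with probability r after receiving.
        (State::Idle, Event::Received) => if u < p.r { State::Active } else { State::Idle },
        // Active with no traffic for a W-sampled duration -> Idle.
        (State::Active, Event::WTimeout) => State::Idle,
        // Sending or receiving while Active keeps the client Active.
        (s, _) => s,
    }
}

/// Which configured distribution times the next outgoing message.
pub fn send_timer(state: State, last: Event) -> &'static str {
    match (state, last) {
        (State::Idle, _) => "i",
        (State::Active, Event::Received) => "a_r",
        (State::Active, _) => "a_s",
    }
}
```

Passing the random draw in keeps the transition function pure, so the model can be tested deterministically. A driver loop would sample the named distribution to schedule the next message.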

The client currently supports five probability distributions for message timings: Normal, LogNormal, Uniform, Exp(onential), and Pareto. The parameter names can be found in the example above. Each distribution is sampled to produce a double-precision floating-point number of seconds. The particular distributions and parameters used in the example are for demonstration purposes only; they have no relationship to empirical conversation behaviors. When sampling, values below zero are clamped to 0; e.g., the i distribution above will have an outsized probability of yielding 0.0 seconds, rather than having that weight redistributed over positive values. Support for any distribution in the rand_distr crate would be simple to add. Distributions not in that crate can also be supported, but would have to be implemented.
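The sampling-and-clamping behavior can be sketched without external crates. The text above implies MGen draws from rand_distr. This self-contained sketch substitutes a hypothetical xorshift64 generator, Box-Muller for the normal, and inverse-CDF sampling for Exp and Pareto. The Timing enum and XorShift type are illustrative names. The LogNormal field names (mu, sigma) are a guess, since the example does not show them.

```rust
/// Timing distributions, using the parameter names from the example config.
/// (LogNormal's field names are assumed; the example does not show them.)
#[derive(Clone, Copy, Debug)]
pub enum Timing {
    Normal { mean: f64, std_dev: f64 },
    LogNormal { mu: f64, sigma: f64 },
    Uniform { low: f64, high: f64 },
    Exp { lambda: f64 },
    Pareto { scale: f64, shape: f64 },
}

/// Minimal xorshift64 generator so the sketch needs no crates.
/// The seed must be nonzero, or every output will be 0.
pub struct XorShift(pub u64);

impl XorShift {
    /// Uniform sample in [0, 1) built from the top 53 bits of the state.
    pub fn next_f64(&mut self) -> f64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        (x >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Standard normal via the Box-Muller transform.
fn std_normal(rng: &mut XorShift) -> f64 {
    let u1 = 1.0 - rng.next_f64(); // in (0, 1], so ln(u1) is finite
    let u2 = rng.next_f64();
    (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
}

impl Timing {
    /// Sample a delay in seconds, clamping negative draws to 0.0 instead of
    /// redistributing their probability mass.
    pub fn sample_secs(&self, rng: &mut XorShift) -> f64 {
        let x = match *self {
            Timing::Normal { mean, std_dev } => mean + std_dev * std_normal(rng),
            Timing::LogNormal { mu, sigma } => (mu + sigma * std_normal(rng)).exp(),
            Timing::Uniform { low, high } => low + (high - low) * rng.next_f64(),
            // Inverse CDF: -ln(U) / lambda with U in (0, 1].
            Timing::Exp { lambda } => -(1.0 - rng.next_f64()).ln() / lambda,
            // Inverse CDF: scale / U^(1/shape) with U in (0, 1].
            Timing::Pareto { scale, shape } => {
                scale / (1.0 - rng.next_f64()).powf(1.0 / shape)
            }
        };
        x.max(0.0)
    }
}
```

With the example's i distribution (mean 30, std_dev 100), a draw falls below zero whenever the standard normal is below -0.3, which happens about 38% of the time. Every such draw is returned as exactly 0.0.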