1 year ago · 7c3f2d01d1
--- a/README.md
+++ b/README.md
@@ -1,30 +1,49 @@
 
				 ## MGen
			
 
				 
			
 
				-MGen is a client, server, and library for generating simulated messenger traffic.
			
 
				+MGen is a set of tools for generating simulated messenger traffic.
			
 
				 It is designed for use analogous to (and likely in conjunction with) [TGen](https://github.com/shadow/tgen), but for simulating traffic generated from communications in messenger apps, such as Signal or WhatsApp, rather than web traffic or file downloads.
			
 
				 Notably, this allows for studying network traffic properties of messenger apps in [Shadow](https://github.com/shadow/shadow).
			
 
				 
			
 
				 Like TGen, MGen can create message flows built around Markov models.
			
 
				 Unlike TGen, these models are expressly designed with user activity in messenger clients in mind.
			
 
				 These messages can be relayed through a central server, which can handle group messages (i.e., traffic that originates from one sender, but gets forwarded to multiple recipients).
			
 
				-Alternatively, a peer-to-peer client can be used.
			
 
				+Alternatively, a peer-to-peer client, what we call the "peer", can be used.
			
 
				 
			
 
				-Clients also generate received receipts (small messages used to indicate to someone who sent a message that the recipient device has received it).
			
 
				+Clients and peers also generate delivery receipts (small messages used to indicate to someone who sent a message that the recipient device has received it).
			
 
				 These receipts can make up to half of all traffic.
			
 
				 (Read receipts, however, are not supported.)
			
 
				 
			
 
				 ## Usage
			
 
				 
			
 
				 MGen is written entirely in Rust, and is built like most pure Rust projects.
			
 
				-If you have a working Rust install with Cargo, you can build the client and server with `cargo build`.
			
 
				-Normal cargo features apply---e.g., use the `--release` flag to enable a larger set of compiler optimizations.
			
 
				-The server can be built and executed with `cargo run --bin server`, the client (for use with client-server mode) with `cargo run --bin client [config.toml]...`, and the peer (for use with peer-to-peer mode) with `cargo run --bin peer [config.toml]...`.
			
 
				-Alternatively, you can run the executables directly from the respective target directory (e.g., `./target/release/server`).
			
 
				+If you have a working [Rust install](https://rustup.rs/) with Cargo, you can build and install the peer, client, and server by running the following from the project root:
			
 
				+
			
 
				+`cargo install --path .`
			
 
				+
			
 
				+This will generally place them somewhere in your environment's PATH.
			
 
				+Normal cargo features also apply—e.g., use `cargo build` to build debug builds, or use the `--release` flag to enable release optimizations without installing.
			
 
				+The server can be built and executed in debug mode with `cargo run --bin mgen-server`, and likewise for the client (`mgen-client`) and peer (`mgen-peer`).
			
 
				+Alternatively, you can run the executables directly from the respective target directory after building (e.g., `./target/release/mgen-server`).
			
 
				+
			
 
				+### Invocation
			
 
				+
			
 
				+#### Server
			
 
				+`mgen-server [addr:port]`
			
 
				+
			
 
				+The server will listen for connections on the given address and port.
			
 
				+If none is given, it will listen on `127.0.0.1:6397`.
			
 
				+
			
 
				+#### Client/Peer
			
 
				+`mgen-client [config.toml]...`
			
 
				+
			
 
				+`mgen-peer [config.toml]...`
			
 
				+
			
 
				+The client and peer configuration files are detailed below.
			
 
				 
			
 
				 ### Client Configuration
			
 
				 
			
 
				-Clients are designed to simulate one conversation per configuration file.
			
 
				-Part of this configuration is the user sending messages in this conversation---similar to techniques used in TGen, a single client instance can simulate traffic of many individual users.
			
 
				+Clients are designed to simulate one user per configuration file, with multiple conversations.
			
 
				+The client can take multiple configuration files and also accepts globs, so, similar to techniques used in TGen, a single client can simulate the traffic of many individual users.
			
 
				 The following example configuration with explanatory comments should be enough to understand almost everything you need:
			
 
				 
			
 
				 ```TOML
			
@@ -33,15 +52,18 @@ The following example configuration with explanatory comments should be enough t
 
				 # A name used for logs and to create unique circuits for each user on a client.
			
 
				 user = "Alice"
			
 
				 
			
 
				-# A name used for logs and to create unique circuits for each conversation,
			
 
				-# even when two chats share the same participants.
			
 
				-group = "group1"
			
 
				+# The <ip>:<port> of a socks5 proxy to connect through.
			
 
				+# Optional.
			
 
				+socks = "127.0.0.1:9050"
			
 
				 
			
 
				-# The list of participants, except the sender.
			
 
				-recipients = ["Bob", "Carol", "Dave"]
			
 
				 
			
 
				-# The <ip>:<port> of the socks5 proxy to connect through.
			
 
				-socks = "127.0.0.1:9050"
			
 
				+# The list of conversations associated with the user.
			
 
				+[[conversations]]
			
 
				+
			
 
				+# A conversation name used for logs, server-side lookups,
			
 
				+# and unique circuits for each conversation,
			
 
				+# even when two chats share the same participants.
			
 
				+group = "group1"
			
 
				 
			
 
				 # The <address>:<port> of the message server, where <address> is an IP or onion address.
			
 
				 server = "insert.ip.or.onion:6397"
			
@@ -57,7 +79,7 @@ retry = 5.0
 
				 
			
 
				 
			
 
				 # Parameters for distributions used by the Markov model.
			
 
				-[distributions]
			
 
				+[conversations.distributions]
			
 
				 
			
 
				 # Probabilities of Idle to Active transition after sending/receiving messages.
			
 
				 s = 0.5
			
@@ -80,31 +102,40 @@ a_r = {distribution = "Pareto", scale = 1.0, shape = 3.0}
 
				 
			
 
				 ```
			
 
				 
			
 
				-The client currently supports five probability distributions for message timings: Normal and LogNormal, Uniform, Exp(onential), and Pareto.
			
 
				-The parameter names can be found in the example above.
			
 
				+Additional examples can be found in the [client shadow test configurations](/shadow/client/shadow.data.template/hosts).
			
 
				+
			
 
				+The client currently supports six probability distributions for message timings: Normal and LogNormal, Uniform, Exp(onential), Pareto, and Weighted.
			
 
				+The parameter names can be found in the example above, except for Weighted (see the [Weighted section of this README](#weighted)).
			
 
				 The distributions are sampled to return a double-precision floating point number of seconds.
			
 
				-The particular distributions and parameters used in the example are for demonstration purposes only, they have no relationship to empirical conversation behaviors.
			
 
				-When sampling, values below zero are clamped to 0---e.g., the `i` distribution above will have an outsize probability of yielding 0.0 seconds, instead of redistributing weight.
			
 
				+
			
 
				+The client currently supports five probability distributions for message sizes.
			
 
				+With their parameters, they are: Poisson (`lambda`: float), Binomial (`n`: integer, `p`: float), Geometric (`p`: float), Hypergeometric (`total_population_size`, `population_with_feature`, and `sample_size`: integers), and Weighted (see the [Weighted section of this README](#weighted)).
			
 
				+Float and integer parameters are both 64-bit (i.e., double-precision floats and unsigned 64-bit ints, respectively).
			
 
				+
			
 
				+The particular distributions and parameters used in the example are for demonstration purposes only; they have no relationship to empirical conversation behaviors.
			
 
				+When sampling, values below zero are clamped to 0—e.g., the `i` distribution above will have an outsize probability of yielding 0.0 seconds, instead of redistributing weight.
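
To illustrate the clamping rule, here is a minimal Rust sketch. `clamp_delay` is a hypothetical helper, not a function from MGen; it only assumes, as stated above, that negative samples become 0.0 instead of being resampled.

```rust
/// Hypothetical helper (not part of MGen): clamp a sampled delay, in seconds,
/// so that it is never negative.
fn clamp_delay(sample_secs: f64) -> f64 {
    sample_secs.max(0.0)
}

fn main() {
    // Made-up samples, e.g. from a Normal distribution with a small mean.
    let samples = [-3.2, -0.1, 0.0, 1.5, 7.0];
    let clamped: Vec<f64> = samples.iter().map(|&s| clamp_delay(s)).collect();

    // Every negative sample collapses onto exactly 0.0, which is why 0.0 ends
    // up with more probability mass than the unclamped distribution gives it.
    let zeros = clamped.iter().filter(|&&s| s == 0.0).count();
    println!("{clamped:?} ({zeros} of {} samples are 0.0)", clamped.len());
}
```

If the extra mass at 0.0 is not wanted, choosing a distribution with non-negative support (such as LogNormal or Pareto) avoids the clamp altogether.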
			
 
				 Any distribution in the [rand_distr](https://docs.rs/rand_distr/latest/rand_distr/index.html) crate would be simple to add support for.
			
 
				 Distributions not in that crate can also be supported, but would require implementing.
			
 
				 
			
 
				 ### Peer configuration
			
 
				 
			
 
				 Running in peer-to-peer mode is very similar to running a client.
			
 
				-The only differences are that users and recipients consist of a name and address each, and there is no server.
			
 
				+The only differences are that recipients in a group must be listed, users and recipients consist of a name and address each, and there is no server.
			
 
				 Here is an example peer conversation configuration (again, all values are for demonstration purposes only):
			
 
				 
			
 
				 ```TOML
			
 
				 # peer-conversation.toml
			
 
				 
			
 
				 user = {name = "Alice", address = "127.0.0.1:6397"}
			
 
				+socks = "127.0.0.1:9050"
			
 
				+
			
 
				+[[conversations]]
			
 
				 group = "group1"
			
 
				 recipients = [{name = "Bob", address = "insert.ip.or.onion:6397"}]
			
 
				-socks = "127.0.0.1:9050"
			
 
				 bootstrap = 5.0
			
 
				 retry = 5.0
			
 
				 
			
 
				-[distributions]
			
 
				+[conversations.distributions]
			
 
				 s = 0.5
			
 
				 r = 0.1
			
 
				 m = {distribution = "Poisson", lambda = 1.0}
			
@@ -114,5 +145,25 @@ a_s = {distribution = "Normal", mean = 10.0, std_dev = 5.0}
 
				 a_r = {distribution = "Normal", mean = 10.0, std_dev = 5.0}
			
 
				 ```
			
 
				 
			
 
				+Additional examples can be found in the [peer shadow test configurations](/shadow/peer/shadow.data.template/hosts).
			
 
				+
			
 
				 In the likely case that these peers are connecting via onion addresses, you must configure each torrc file to match with each peer configuration (in the above example, Alice's HiddenService lines in the torrc must have a `HiddenServicePort` line that forwards to `127.0.0.1:6397`, and Bob's torrc must have a `HiddenServicePort` line that listens on `6397`).
			
 
				-Multiple users can share an onion address by using different ports (different cirtuits will be used).
			
 
				+Multiple users can share an onion address by using different ports (different circuits will be used), though doing so will of course not simulate, e.g., additional load in Tor's distributed hash table.
			
 
				+
			
 
				+### Weighted
			
 
				+Weighted distributions, which can be very large and are commonly shared across many clients or peers, are handled slightly differently from other distribution types.
			
 
				+`"Weighted"` can be used as the distribution type for both timing and message size distributions, with a single parameter of `weights_file`, which should be set to the path of the file storing the weights and values to be sampled from.
			
 
				+This file is a text file (technically, a CSV file), with two rows.
			
 
				+The first row is a series of non-negative integer weights, internally represented as 32-bit values that are then normalized into a probability distribution.
			
 
				+The second row is the corresponding values (32-bit integers for message sizes, double-precision floats for times).
			
 
				+When the distribution is sampled, the n'th value in the second row has a probability of being sampled equal to the n'th value of the first row divided by the sum of the first row.
			
 
				+The client/peer will abort if the two rows do not have the same number of elements.
			
 
				+Larger files will lead to slower initialization and higher memory overhead, but should still sample in constant time; see the docs for rand_distr's [WeightedAliasIndex](https://docs.rs/rand_distr/0.4.3/rand_distr/weighted_alias/struct.WeightedAliasIndex.html) for details.
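
As an illustration of the file format, the following Rust sketch parses a two-row timing weights file and computes the probability of each value. The function name `parse_weighted`, the sample contents, and the plain comma splitting are assumptions made for this example, not MGen's actual parser; for real sampling, see the `WeightedAliasIndex` docs linked above.

```rust
/// Hypothetical parser (not MGen's implementation) for the two-row format:
/// row 1 holds non-negative integer weights, row 2 the corresponding values.
/// Returns (value, probability) pairs.
fn parse_weighted(contents: &str) -> Result<Vec<(f64, f64)>, String> {
    let mut rows = contents.lines().filter(|l| !l.trim().is_empty());
    let weights_row = rows.next().ok_or("missing weights row")?;
    let values_row = rows.next().ok_or("missing values row")?;

    let weights: Vec<u32> = weights_row
        .split(',')
        .map(|w| w.trim().parse::<u32>().map_err(|e| e.to_string()))
        .collect::<Result<_, _>>()?;
    let values: Vec<f64> = values_row
        .split(',')
        .map(|v| v.trim().parse::<f64>().map_err(|e| e.to_string()))
        .collect::<Result<_, _>>()?;

    // Mirrors the documented behavior: mismatched rows are an error.
    if weights.len() != values.len() {
        return Err(format!(
            "{} weights but {} values",
            weights.len(),
            values.len()
        ));
    }

    let total: u64 = weights.iter().map(|&w| u64::from(w)).sum();
    if total == 0 {
        return Err("weights sum to zero".to_string());
    }

    // P(n'th value) = n'th weight / sum of all weights.
    Ok(values
        .into_iter()
        .zip(weights)
        .map(|(v, w)| (v, f64::from(w) / total as f64))
        .collect())
}

fn main() {
    // Made-up file contents: three delays (in seconds) with weights 1, 2, 1.
    let file = "1,2,1\n0.5,2.0,30.0\n";
    for (value, p) in parse_weighted(file).expect("valid weights file") {
        println!("{value} s with probability {p}");
    }
}
```

With these made-up contents, the values 0.5, 2.0, and 30.0 are sampled with probabilities 0.25, 0.5, and 0.25 respectively.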
			
 
				+
			
 
				+## Testing
			
 
				+
			
 
				+Unit tests can be run using the `cargo test` command.
			
 
				+Integration tests are performed using Shadow, so Shadow must be installed, and should probably be in your environment's PATH.
			
 
				+You will also need the mgen executables to be in your environment's PATH for Shadow to find them (see the [installation instructions](#usage) above).
			
 
				+Because the tests are invoked in Shadow, there is no simple way to run them with cargo, so they are instead run using standalone shell scripts.
			
 
				+You can find them in this project's [shadow directory](/shadow).