#28 Add initial idle time distribution

Closed
opened 8 months ago by j3tracey · 0 comments

[N.b.: For the purposes of this bug, mostly ignore the active state and transitions to it. The concepts apply even when such transitions exist, but it's easier to explain without them.]

Currently, mgen starts in the idle state, and samples the time it idles until it sends a message from the provided I distribution, where I represents the amount of time spent idle before sending the next message. This seems right at first glance, but it is effectively saying that the user sent a message immediately before the messenger started. This does not reflect the intended effect of simulating starting from a random time: if all messengers start at the same time, there is essentially an absurdly large message storm immediately before that time. It won't trigger active transitions, but it does mean we're sampling the wrong expected amount of time left until sending a message. The accurate approach would be to sample according to the I distribution, but with each delay's probability further scaled by the size of that delay (and then to sample uniformly at random within the chosen delay, or just divide it by 2 to get the expected value).

E.g., suppose a user waits a week between messages, but occasionally sends one message every 10 minutes for a day (again, all ignoring the active periods interspersed, where they would maybe send messages once a minute or whatever). Each time the user sends a message, we can mostly get away with sampling 10 minutes weighted by the number of 10 minute waits and 1 week weighted by the number of 1 week waits. The initial time, however, will obviously come from the 1 week waits with far higher probability than from the 10 minute waits, because we are starting at a random time, not after a random message.
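A minimal sketch of that length-biased sampling, in Python for illustration only (this is not mgen's code). The function names and the wait counts (144 ten-minute waits, i.e. one day's worth, and 10 one-week waits) are made up for the example:

```python
import random

def length_biased(values, weights):
    """Rescale each weight by its wait length and normalize to sum to 1."""
    scaled = [v * w for v, w in zip(values, weights)]
    total = sum(scaled)
    return [s / total for s in scaled]

def sample_initial_idle(values, weights, rng=random):
    """Sample the idle time remaining when starting at a uniformly random moment.

    First pick which wait we landed in (length-biased), then pick a
    uniformly random point within that wait.
    """
    biased = length_biased(values, weights)
    wait = rng.choices(values, weights=biased, k=1)[0]
    return rng.uniform(0.0, wait)

TEN_MIN = 10 * 60.0            # seconds
WEEK = 7 * 24 * 3600.0         # seconds

values = [TEN_MIN, WEEK]
counts = [144, 10]             # hypothetical counts of each kind of wait

# Per message, a one-week wait is rare (10 of 154 waits, about 6.5%),
# but a random moment in time falls inside one almost always.
biased = length_biased(values, counts)
print(f"P(initial wait is a week) = {biased[1]:.4f}")

rng = random.Random(0)
print(f"one sampled initial idle time: {sample_initial_idle(values, counts, rng):.0f} s")
```

With these made-up counts, the one-week waits account for over 98% of the probability mass for the initial wait, even though they are under 7% of the waits sampled per message.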

This is the main reason we're seeing unusually high message counts. On a per-conversation basis, the message counts are reasonable for an hour. But, the number of conversations where messages are actively being sent per user is far too high, because we sample the number of conversations from the empirical number of conversations users generally are part of. This leads to an unusually high number of conversation participants that, while not "active", aren't fairly sampling from their initial idle states, causing the phantom message flood to turn into an actual message flood, in larger groups especially.

There are three ways I see of addressing this:

  • Sample the initial idle time properly in mgen. We have everything we need in the current I distribution; we just need to create a one-time-use distribution based on it that multiplies each scaling factor by its associated value in the distribution. The problem with this is that we can then expect most of these values to be extremely large, probably close to an hour (artificially inflated by how we represent long idle periods in conversations) or more (from times conversations had messages received but the user did not reply). This means that the simulation will have a bunch of conversations with no messages sent for the entire duration of the conversation. This isn't wrong, per se, but it will quickly use up resources on something that isn't giving us much useful data. We can't just free those resources early either, because we can't easily tell whether anyone else in the chat will eventually send a message, which could transition the state into active.
  • Decide that the number of conversations a user is in only represents the number of conversations where the user in question just sent an idle message. Mgen code would remain unchanged, but we would down-sample the number of conversations. The problems with this are that we have no data for the number of conversations a user just sent a message in, and that idle users are still part of conversations they're in, even if they send no idle messages, because they still have to download any messages sent, reply with delivery receipts, and possibly transition to active.
  • What I think is the right approach is a sort of combination of the two: most of the work is done via configuration, but instead of redefining conversation count, we correctly sample from I to determine the initial wait time (this is already in the config; right now it's just constant for everyone). We can then go through the conversations, see if any consist entirely of initial waits greater than the simulation runtime, and remove those (though we'd still want to register onion addresses for peers to test that aspect).
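The pruning step in the last option could look roughly like the following Python sketch. The data layout (a dict mapping a conversation name to the list of its participants' sampled initial waits, in seconds) and the function name are assumptions for illustration, not how the actual configuration tooling is structured:

```python
def prune_silent_conversations(conversations, runtime):
    """Split conversations into those worth simulating and those to drop.

    A conversation is dropped only if every participant's initial wait
    exceeds the simulation runtime, so nobody would send an idle message.
    """
    kept, removed = {}, {}
    for name, initial_waits in conversations.items():
        if any(wait <= runtime for wait in initial_waits):
            kept[name] = initial_waits
        else:
            removed[name] = initial_waits
    return kept, removed

RUNTIME = 3600.0  # hypothetical one-hour simulation, in seconds

# Hypothetical sampled initial waits per participant, in seconds.
conversations = {
    "group_a": [5000.0, 120.0, 90000.0],  # one participant sends in time
    "group_b": [7200.0, 604800.0],        # everyone waits past the runtime
}

kept, removed = prune_silent_conversations(conversations, RUNTIME)
print("simulate:", sorted(kept))
print("skip:", sorted(removed))
```

Returning the removed conversations instead of discarding them leaves room to still register onion addresses for their peers.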