- Filename: 108-mtbf-based-stability.txt
- Title: Base "Stable" Flag on Mean Time Between Failures
- Version: $Revision: 12105 $
- Last-Modified: $Date: 2007-01-30T07:50:01.643717Z $
- Author: Nick Mathewson
- Created: 10-Mar-2007
- Status: Open
- Overview:
- This document proposes that directory authorities set the Stable flag
- based on their own perceived mean time between failures for a router,
- rather than on inspection of the router's declared Uptime.
- Motivation:
- Clients prefer nodes that the authorities call Stable. This flag is (as
- of 0.2.0.0-alpha-dev) set entirely based on the node's declared value for
- uptime. This creates an opportunity for malicious nodes to declare
- falsely high uptimes in order to get more traffic.
- Spec changes:
- Replace the current rule for setting the Stable flag with:
- "Stable" -- A router is 'Stable' if it is active and its observed Stability
- for the past month is at or above the median Stability for active routers.
- Routers are never called stable if they are running a version of Tor
- known to drop circuits stupidly. (0.1.1.10-alpha through 0.1.1.16-rc
- are stupid this way.)
- Stability shall be defined as the mean length of the runs observed by a
- given directory authority. A run begins when an authority decides
- that the server is Running, and ends when the authority decides that
- the server is not Running. In-progress runs are counted when
- measuring Stability.
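- As a rough illustration of the two rules above, here is a minimal Python
- sketch. The names Run, stability, and stable_routers are hypothetical and
- do not come from any Tor implementation. It assumes times are plain
- numbers, that a router with no observed runs has Stability 0, and that
- the caller already knows which routers run a circuit-dropping version.
- It does not restrict runs to the past month.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List, Optional, Set


@dataclass
class Run:
    """One run as seen by a single authority (hypothetical structure)."""
    start: float          # when the authority decided the server was Running
    end: Optional[float]  # when it decided not Running; None if in progress


def stability(runs: List[Run], now: float) -> float:
    """Mean run length; an in-progress run is counted up to `now`."""
    if not runs:
        return 0.0  # assumption: the proposal does not define this case
    lengths = [(r.end if r.end is not None else now) - r.start for r in runs]
    return sum(lengths) / len(lengths)


def stable_routers(stability_by_router: Dict[str, float],
                   drops_circuits: Set[str]) -> Set[str]:
    """Active routers at or above the median Stability, minus bad versions.

    The median is taken over all active routers, including those later
    excluded for running a circuit-dropping version (an assumption).
    """
    if not stability_by_router:
        return set()
    threshold = median(stability_by_router.values())
    return {name for name, value in stability_by_router.items()
            if value >= threshold and name not in drops_circuits}
```

- For example, a router with one finished 10-unit run and one in-progress
- run that started 30 units ago would have a Stability of 20 under this
- sketch.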
- Issues:
- How do you define a clipped MTBF? If the current month begins with one
- day at the end of a one-year uptime, and then has 29 days of uptime, do we
- average one day and 29 days? Or do we average one year and 29 days? Or
- take 29 days on its own and discard the year?
- Surely somebody has done this kind of thing before.
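- To make the three options concrete, the following Python sketch computes
- each one for the example above. The function name and the option labels
- are made up for illustration. It assumes runs are (start, end) pairs
- measured in days, a 30-day window starting at day 0, and a restart that
- takes no time.

```python
from typing import Dict, List, Tuple


def clipped_candidates(runs: List[Tuple[float, float]],
                       window_start: float) -> Dict[str, float]:
    """Return the three candidate means discussed under Issues.

    "clip":    count only the part of each run inside the window.
    "full":    count the full length of any run that touches the window.
    "discard": drop runs that began before the window.
    """
    def mean(values: List[float]) -> float:
        return sum(values) / len(values) if values else 0.0

    touching = [(s, e) for s, e in runs if e > window_start]
    return {
        "clip": mean([e - max(s, window_start) for s, e in touching]),
        "full": mean([e - s for s, e in touching]),
        "discard": mean([e - s for s, e in touching if s >= window_start]),
    }


# A one-year run whose last day falls in the window, then a 29-day run.
example = clipped_candidates([(-364.0, 1.0), (1.0, 30.0)], window_start=0.0)
```

- Under these assumptions the three answers are 15 days, 197 days, and 29
- days respectively, which shows how much the choice matters.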
- Alternative:
- "A router's Stability shall be defined as the sum of $alpha ^ d$ for every
- $d$ such that the router was not observed to be unavailable $d$ days ago."
- This allows a simpler implementation: every day, we multiply yesterday's
- Stability by alpha, and if the router was running for all of today, we add
- 1.
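- The daily update can be sketched in Python as follows. The function names
- and the value of alpha are assumptions; this proposal does not fix alpha.
- The second function evaluates the sum in the definition directly, with
- d = 0 meaning today, so the two can be checked against each other.

```python
from typing import List


def update_stability(yesterday: float, up_all_day: bool, alpha: float) -> float:
    """One daily step: decay yesterday's value, add 1 if up all of today."""
    return alpha * yesterday + (1.0 if up_all_day else 0.0)


def stability_from_history(up_by_days_ago: List[bool], alpha: float) -> float:
    """Direct sum of alpha**d over days d on which the router was up.

    up_by_days_ago[d] is True if the router was not observed to be
    unavailable d days ago (d = 0 is today).
    """
    return sum(alpha ** d for d, up in enumerate(up_by_days_ago) if up)


# Three days of observations, oldest first: up, down, up.
alpha = 0.5  # illustrative value only
value = 0.0
for up in [True, False, True]:
    value = update_stability(value, up, alpha)
```

- With this update, a router that is never seen down approaches
- 1 / (1 - alpha), so alpha controls how much history effectively counts.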
- Limitations:
- Authorities can have false positives and false negatives when trying to
- tell whether a router is up or down. So long as these aren't terribly
- wrong, and so long as they aren't significantly biased, we should be able
- to use them to estimate stability pretty well.