Filename: 108-mtbf-based-stability.txt
Title: Base "Stable" Flag on Mean Time Between Failures
Version: $Revision: 12105 $
Last-Modified: $Date: 2007-01-30T07:50:01.643717Z $
Author: Nick Mathewson
Created: 10-Mar-2007
Status: Open

Overview:

This document proposes that directory authorities set the Stable flag based
on their own observed mean time between failures (MTBF) for a router, rather
than on the router's declared Uptime.

Motivation:

Clients prefer nodes that the authorities call Stable. This flag is (as
of 0.2.0.0-alpha-dev) set entirely based on the node's declared value for
uptime. This creates an opportunity for malicious nodes to declare
falsely high uptimes in order to get more traffic.

Spec changes:

Replace the current rule for setting the Stable flag with:

"Stable" -- A router is 'Stable' if it is active and its observed Stability
for the past month is at or above the median Stability for active routers.
Routers are never called stable if they are running a version of Tor
known to drop circuits stupidly. (0.1.1.10-alpha through 0.1.1.16-rc
are stupid this way.)
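
The rule above could be sketched roughly as follows. This is only an illustration, not the authorities' actual code: the Router record, its field names, and the stable_routers helper are all hypothetical, and the version check is reduced to a precomputed boolean. The sketch assumes that routers with bad versions still count toward the median, since the rule takes the median over all active routers.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Router:
    # All fields are hypothetical names chosen for this sketch.
    nickname: str
    active: bool        # the authority currently considers the router active
    stability: float    # observed Stability for the past month
    bad_version: bool   # runs a Tor version known to drop circuits


def stable_routers(routers):
    """Return the nicknames of routers that would be called Stable."""
    active = [r for r in routers if r.active]
    if not active:
        return set()
    # The cutoff is the median over all active routers, including those
    # that are later excluded for running a bad version (an assumption).
    cutoff = median(r.stability for r in active)
    return {
        r.nickname
        for r in active
        if r.stability >= cutoff and not r.bad_version
    }


example = [
    Router("a", True, 10.0, False),
    Router("b", True, 20.0, False),
    Router("c", True, 30.0, True),     # above the median, but bad version
    Router("d", True, 40.0, False),
    Router("e", False, 1000.0, False), # inactive, so ignored entirely
]
flagged = stable_routers(example)      # {"d"}; the median of a..d is 25.0
```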

Stability shall be defined as the mean length of the runs observed by a
given directory authority. A run begins when an authority decides
that the server is Running, and ends when the authority decides that
the server is not Running. In-progress runs are counted when
measuring Stability.
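
As a minimal sketch of this definition, assume each authority keeps a time-ordered list of (timestamp, is_running) decisions per router. That input format and the mean_run_length name are assumptions made for illustration. The sketch does not restrict the measurement to the past month, so it sidesteps the clipping question entirely.

```python
def mean_run_length(decisions, now):
    """Mean length of the runs seen in a time-ordered list of decisions.

    decisions: (timestamp, is_running) pairs, oldest first (hypothetical
    input format). now: the current time, used to close an in-progress run.
    Returns 0.0 if no run was ever observed.
    """
    runs = []
    run_start = None
    for when, is_running in decisions:
        if is_running and run_start is None:
            run_start = when                 # a new run begins
        elif not is_running and run_start is not None:
            runs.append(when - run_start)    # the current run ends
            run_start = None
    if run_start is not None:
        runs.append(now - run_start)         # count the in-progress run
    return sum(runs) / len(runs) if runs else 0.0


# Three runs of length 100, 300 and 100 (the last one still in progress).
observed = [(0, True), (100, False), (150, True), (200, True),
            (450, False), (500, True)]
stability = mean_run_length(observed, now=600)   # 500 / 3
```

Repeated "Running" decisions inside a run do not start a new run; only a transition from not Running to Running does.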

Issues:

How do you define a clipped MTBF? If the current month begins with one
day at the end of a one-year uptime, and then has 29 days of uptime, do we
average one day and 29 days? Or do we average one year and 29 days? Or
take 29 days on its own and discard the year?

Surely somebody has done this kind of thing before.
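
To make the three readings of the clipped-MTBF question concrete, here is a small sketch that computes all of them for the example above. The (start, end) run representation in days, the function name, and the labels "clip", "full" and "discard" are assumptions made for illustration. The sketch does not argue for any one answer.

```python
def clipped_mtbf_options(runs, window_start):
    """Compare three ways to average runs that overlap a measurement window.

    runs: (start, end) pairs in days, each ending inside the window
    (hypothetical representation). window_start: start of the window.
    """
    def mean(lengths):
        return sum(lengths) / len(lengths) if lengths else 0.0

    # 1. Clip each run at the window boundary before averaging.
    clip = [end - max(start, window_start) for start, end in runs]
    # 2. Use each run's full length, even the part before the window.
    full = [end - start for start, end in runs]
    # 3. Discard any run that began before the window.
    inside = [end - start for start, end in runs if start >= window_start]
    return {"clip": mean(clip), "full": mean(full), "discard": mean(inside)}


# A 30-day month starting at day 0: a 365-day run ends after day 1,
# followed by a 29-day run.
options = clipped_mtbf_options([(-364, 1), (1, 30)], window_start=0)
# clip: (1 + 29) / 2 = 15 days; full: (365 + 29) / 2 = 197 days;
# discard: 29 days
```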

Alternative:

"A router's Stability shall be defined as the sum of $alpha ^ d$ for every
$d$ such that the router was not observed to be unavailable $d$ days ago."

This allows a simpler implementation: every day, we multiply yesterday's
Stability by alpha, and if the router was running for all of today, we add
1.
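
The daily update and the sum it is meant to equal can be sketched side by side. The function names and the one-boolean-per-day history format are assumptions for illustration. The proposal does not give a value for alpha; the sketch only assumes 0 < alpha < 1, in which case a router that is never down approaches a Stability of 1 / (1 - alpha).

```python
def update_stability(yesterday, ran_all_day, alpha):
    """One daily step: decay yesterday's value, add 1 for a full day up."""
    return yesterday * alpha + (1.0 if ran_all_day else 0.0)


def stability_incremental(history, alpha):
    """Apply the daily update over a history of booleans, oldest first."""
    value = 0.0
    for ran_all_day in history:
        value = update_stability(value, ran_all_day, alpha)
    return value


def stability_direct_sum(history, alpha):
    """Sum alpha**d over every day d (0 = today) the router stayed up."""
    newest = len(history) - 1
    return sum(alpha ** (newest - i)
               for i, ran_all_day in enumerate(history) if ran_all_day)


# Up three days ago, two days ago and today; down yesterday.
history = [True, True, False, True]
incremental = stability_incremental(history, alpha=0.5)   # 1.375
direct = stability_direct_sum(history, alpha=0.5)         # 0.125 + 0.25 + 1
```

Both functions give the same result, so an authority would only need to store one number per router rather than a full history.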

Limitations:

Authorities can have false positives and false negatives when trying to
tell whether a router is up or down. So long as these errors are small
and not significantly biased, we should still be able to use the
authorities' observations to estimate stability reasonably well.