  1. Filename: 166-statistics-extra-info-docs.txt
  2. Title: Including Network Statistics in Extra-Info Documents
  3. Author: Karsten Loesing
  4. Created: 21-Jul-2009
  5. Target: 0.2.2
  6. Status: Accepted
  7. Change history:
  8. 21-Jul-2009 Initial proposal for or-dev
  9. Overview:
  10. The Tor network has grown to almost two thousand relays and millions
  11. of casual users over the past few years. With growth has come
  12. increasing performance problems and attempts by some countries to
  13. block access to the Tor network. In order to address these problems,
  14. we need to learn more about the Tor network. This proposal suggests
  15. measuring additional statistics and including them in extra-info documents
  16. to help us understand the Tor network better.
  17. Introduction:
  18. As of May 2009, relays, bridges, and directories gather the following
  19. data for statistical purposes:
  20. - Relays and bridges count the number of bytes that they have pushed
  21. in 15-minute intervals over the past 24 hours. Relays and bridges
  22. include these data in extra-info documents that they send to the
  23. directory authorities whenever they publish their server descriptor.
  24. - Bridges further include a rough number of clients per country that
  25. they have seen in the past 48 hours in their extra-info documents.
  26. - Directories can be configured to count the number of clients they
  27. see per country in the past 24 hours and to write them to a local
  28. file.
  29. Since then, we have extended the network statistics in Tor. These statistics
  30. include:
  31. - Directories now gather more precise statistics about connecting
  32. clients. Fixes include measuring in intervals of exactly 24 hours,
  33. counting unsuccessful requests, measuring download times, etc. The
  34. directories append their statistics to a local file every 24 hours.
  35. - Entry guards count the number of clients per country per day like
  36. bridges do and write them to a local file every 24 hours.
  37. - Relays measure statistics of the number of cells in their circuit
  38. queues and how much time these cells spend waiting there. Relays
  39. write these statistics to a local file every 24 hours.
  40. - Exit nodes count the number of read and written bytes on exit
  41. connections per port as well as the number of opened exit streams
  42. per port in 24-hour intervals. Exit nodes write their statistics to
  43. a local file.
  44. The following four sections contain descriptions for adding these
  45. statistics to the relays' extra-info documents.
  46. Directory request statistics:
  47. The first type of statistics aims at measuring directory requests sent
  48. by clients to a directory mirror or directory authority. More
  49. precisely, these statistics aim at requests for v2 and v3 network
  50. statuses only. These directory requests are sent non-anonymously,
  51. either via HTTP-like requests to a directory's Dir port or tunneled
  52. over a 1-hop circuit.
  53. Measuring directory request statistics is useful for several reasons:
  54. First, the number of locally seen directory requests can be used to
  55. estimate the total number of clients in the Tor network. Second, the
  56. country-wise classification of requests using a GeoIP database can
  57. help count the relative and absolute number of users per country.
  58. Third, the download times can give hints on the available bandwidth
  59. capacity at clients.
  60. Directory requests do not give any hints on the contents that clients
  61. send or receive over the Tor network. Every client requests network
  62. statuses from the directories, so there are no anonymity-related
  63. concerns about gathering these statistics. It might be, though, that clients
  64. wish to hide the fact that they are connecting to the Tor network.
  65. Therefore, IP addresses are resolved to country codes in memory,
  66. events are accumulated over 24 hours, and numbers are rounded up to
  67. multiples of 4 or 8.
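
The rounding step can be sketched in Python as follows. The helper name
round_up_to_multiple is made up for this illustration and is not taken from
the Tor source; which fields use 4 and which use 8 is stated in the field
descriptions of this proposal.

```python
def round_up_to_multiple(count: int, multiple: int) -> int:
    """Round a non-negative count up to the next multiple of `multiple`.

    Hypothetical helper for illustration; not the name used in Tor.
    """
    if count < 0 or multiple <= 0:
        raise ValueError("count must be >= 0 and multiple must be > 0")
    # -(-a // b) is integer ceiling division.
    return -(-count // multiple) * multiple


# A count that already is a multiple stays unchanged; any other count is
# raised to the next multiple, so exact small counts are never published.
for observed in (1, 8, 9):
    print(observed, round_up_to_multiple(observed, 8))
```

Rounding up rather than to the nearest multiple means that a reported value
never understates the observed count.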
  68. "dirreq-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
  69. [At most once.]
  70. YYYY-MM-DD HH:MM:SS defines the end of the included measurement
  71. interval of length NSEC seconds (86400 seconds by default).
  72. A "dirreq-stats-end" line, as well as any other "dirreq-*" line,
  73. is only added when the relay has opened its Dir port and after 24
  74. hours of measuring directory requests.
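
A consumer of extra-info documents might read a "*-stats-end" line roughly
as sketched below. The function name and the regular expression are
assumptions of this example, and the timestamp is kept as a naive datetime
because this proposal does not state a time zone.

```python
import re
from datetime import datetime, timedelta

# Matches e.g. "dirreq-stats-end 2009-07-21 12:00:00 (86400 s)".
STATS_END_RE = re.compile(
    r"^(?P<keyword>[a-z]+-stats-end) "
    r"(?P<end>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\((?P<nsec>\d+) s\)$"
)


def parse_stats_end(line):
    """Return (keyword, interval_start, interval_end) for a stats-end line."""
    match = STATS_END_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a stats-end line: {line!r}")
    end = datetime.strptime(match.group("end"), "%Y-%m-%d %H:%M:%S")
    length = timedelta(seconds=int(match.group("nsec")))
    # The line states the END of the interval; the start is derived from NSEC.
    return match.group("keyword"), end - length, end
```

With the default NSEC of 86400, the derived start lies exactly one day
before the stated end.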
  75. "dirreq-v2-ips" CC=N,CC=N,... NL
  76. [At most once.]
  77. "dirreq-v3-ips" CC=N,CC=N,... NL
  78. [At most once.]
  79. List of mappings from two-letter country codes to the number of
  80. unique IP addresses that have connected from that country to
  81. request a v2/v3 network status, rounded up to the nearest multiple
  82. of 8. Only those IP addresses are counted that the directory can
  83. answer with a 200 OK status code.
  84. "dirreq-v2-reqs" CC=N,CC=N,... NL
  85. [At most once.]
  86. "dirreq-v3-reqs" CC=N,CC=N,... NL
  87. [At most once.]
  88. List of mappings from two-letter country codes to the number of
  89. requests for v2/v3 network statuses from that country, rounded up
  90. to the nearest multiple of 8. Only those requests are counted that
  91. the directory can answer with a 200 OK status code.
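
The CC=N,CC=N,... value format shared by the "dirreq-*-ips" and
"dirreq-*-reqs" lines could be parsed along these lines. This is a minimal
sketch; the function name and the sample values in the usage note are
invented for illustration.

```python
def parse_country_counts(line):
    """Split a 'keyword CC=N,CC=N,...' line into (keyword, {CC: N})."""
    keyword, _, value = line.strip().partition(" ")
    counts = {}
    if value:
        for item in value.split(","):
            country, sep, number = item.partition("=")
            # Each entry must be a two-letter country code and a number.
            if not sep or len(country) != 2 or not number.isdigit():
                raise ValueError(f"malformed entry: {item!r}")
            counts[country] = int(number)
    return keyword, counts
```

For example, parse_country_counts("dirreq-v3-ips us=1544,de=1056") yields
the keyword "dirreq-v3-ips" and a mapping with the two rounded counts.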
  92. "dirreq-v2-share" num% NL
  93. [At most once.]
  94. "dirreq-v3-share" num% NL
  95. [At most once.]
  96. The share of v2/v3 network status requests that the directory
  97. expects to receive from clients based on its advertised bandwidth
  98. compared to the overall network bandwidth capacity. Shares are
  99. formatted in percent with two decimal places. Shares are
  100. calculated as means over the whole 24-hour interval.
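
The formatting of a share value can be illustrated as follows. The sketch
assumes that the share at a given moment is simply the relay's advertised
bandwidth divided by the total network bandwidth; the weighting that a relay
actually applies is not specified in this proposal, and the function name is
hypothetical.

```python
def format_share(samples):
    """Format the mean share over an interval as a percentage string.

    samples: iterable of (advertised_bandwidth, total_network_bandwidth)
    pairs taken during the interval. The plain ratio used here is an
    assumption of this sketch.
    """
    shares = [own / total for own, total in samples if total > 0]
    if not shares:
        return None
    mean_share = sum(shares) / len(shares)
    # Percent with two decimal places, as required for "dirreq-*-share".
    return f"{mean_share * 100:.2f}%"
```

Two samples with shares of 1% and 3% produce the value "2.00%".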
  101. "dirreq-v2-resp" status=num,... NL
  102. [At most once.]
  103. "dirreq-v3-resp" status=num,... NL
  104. [At most once.]
  105. List of mappings from response statuses to the number of requests
  106. for v2/v3 network statuses that were answered with that response
  107. status, rounded up to the nearest multiple of 4. Only response
  108. statuses with at least 1 response are reported. New response
  109. statuses can be added at any time. The current list of response
  110. statuses is as follows:
  111. "ok": a network status request is answered; this number
  112. corresponds to the sum of all requests as reported in
  113. "dirreq-v2-reqs" or "dirreq-v3-reqs", respectively, before
  114. rounding up.
  115. "not-enough-sigs": a version 3 network status is not signed by a
  116. sufficient number of requested authorities.
  117. "unavailable": a requested network status object is unavailable.
  118. "not-found": a requested network status is not found.
  119. "not-modified": a network status has not been modified since the
  120. If-Modified-Since time that is included in the request.
  121. "busy": the directory is busy.
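
Building a "dirreq-*-resp" line from raw counters might look like the
following sketch. The function name is invented, and the statuses are
emitted in the order in which the input mapping lists them, which is a
choice of this example rather than something this proposal prescribes.

```python
def format_resp_line(keyword, responses):
    """Format response-status counters as 'keyword status=num,...'."""
    parts = []
    for status, count in responses.items():
        # Statuses without any response are left out entirely.
        if count < 1:
            continue
        rounded = -(-count // 4) * 4  # round up to the next multiple of 4
        parts.append(f"{status}={rounded}")
    return f"{keyword} " + ",".join(parts)


print(format_resp_line("dirreq-v3-resp",
                       {"ok": 1001, "busy": 0, "not-modified": 3}))
```

In this example the "busy" status is omitted because no request was answered
with it.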
  122. "dirreq-v2-direct-dl" key=val,... NL
  123. [At most once.]
  124. "dirreq-v3-direct-dl" key=val,... NL
  125. [At most once.]
  126. "dirreq-v2-tunneled-dl" key=val,... NL
  127. [At most once.]
  128. "dirreq-v3-tunneled-dl" key=val,... NL
  129. [At most once.]
  130. List of statistics about possible failures in the download process
  131. of v2/v3 network statuses. Requests are either "direct"
  132. HTTP-encoded requests over the relay's directory port, or
  133. "tunneled" requests using a BEGIN_DIR cell over the relay's OR
  134. port. The list of possible statistics can change, and statistics
  135. can be left out from reporting. The current list of statistics is
  136. as follows:
  137. Successful downloads and failures:
  138. "complete": a client has finished the download successfully.
  139. "timeout": a download did not finish within 10 minutes after
  140. starting to send the response.
  141. "running": a download is still running at the end of the
  142. measurement period and started sending the response less than 10
  143. minutes earlier.
  144. Download times:
  145. "min", "max": smallest and largest measured bandwidth in B/s.
  146. "d[1-4,6-9]": 1st to 4th and 6th to 9th decile of measured
  147. bandwidth in B/s. For a given decile i, i/10 of all downloads
  148. had a smaller bandwidth than di, and (10-i)/10 of all downloads
  149. had a larger bandwidth than di.
  150. "q[1,3]": 1st and 3rd quartile of measured bandwidth in B/s. One
  151. fourth of all downloads had a smaller bandwidth than q1, one
  152. fourth of all downloads had a larger bandwidth than q3, and the
  153. remaining half of all downloads had a bandwidth between q1 and
  154. q3.
  155. "md": median of measured bandwidth in B/s. Half of the downloads
  156. had a smaller bandwidth than md, the other half had a larger
  157. bandwidth than md.
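
The deciles, quartiles, and median can be derived from a sorted list of
measured bandwidths. The sketch below uses the nearest-rank method, which is
one common way to pick quantiles and is an assumption of this example; this
proposal does not prescribe a particular method. All names are hypothetical.

```python
def nearest_rank(sorted_values, numerator, denominator):
    """Return the value at quantile numerator/denominator (nearest rank)."""
    n = len(sorted_values)
    # Integer ceiling of numerator * n / denominator, at least rank 1.
    rank = max(1, -(-numerator * n // denominator))
    return sorted_values[rank - 1]


def summarize_bandwidths(bandwidths):
    """Map the keys min, d1..d4, q1, md, d6..d9, q3, max to B/s values."""
    values = sorted(bandwidths)
    if not values:
        return {}
    quantiles = [
        ("d1", 1, 10), ("d2", 2, 10), ("q1", 1, 4), ("d3", 3, 10),
        ("d4", 4, 10), ("md", 1, 2), ("d6", 6, 10), ("d7", 7, 10),
        ("q3", 3, 4), ("d8", 8, 10), ("d9", 9, 10),
    ]
    stats = {"min": values[0]}
    for key, numerator, denominator in quantiles:
        stats[key] = nearest_rank(values, numerator, denominator)
    stats["max"] = values[-1]
    return stats
```

There is no "d5" key because the fifth decile coincides with the median
"md".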
  158. Entry guard statistics:
  159. Entry guard statistics include the number of clients per country and
  160. per day that are connecting directly to an entry guard.
  161. Entry guard statistics are important for learning more about the
  162. distribution of clients across countries. In the future, this knowledge
  163. can be useful for detecting existing or newly introduced restrictions
  164. on clients connecting from specific countries.
  165. Information about which client connects to a given entry guard is very
  166. sensitive. This information must not be combined with information about
  167. what contents are leaving the network at the exit nodes. Therefore,
  168. entry guard statistics need to be aggregated to prevent them from
  169. becoming useful for de-anonymization. Aggregation includes resolving
  170. IP addresses to country codes, counting events over 24-hour intervals,
  171. and rounding up numbers to the next multiple of 8.
  172. "entry-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
  173. [At most once.]
  174. YYYY-MM-DD HH:MM:SS defines the end of the included measurement
  175. interval of length NSEC seconds (86400 seconds by default).
  176. An "entry-stats-end" line, as well as any other "entry-*"
  177. line, is first added after the relay has been running for at least
  178. 24 hours.
  179. "entry-ips" CC=N,CC=N,... NL
  180. [At most once.]
  181. List of mappings from two-letter country codes to the number of
  182. unique IP addresses that have connected from that country to the
  183. relay and that are not known to be other relays, rounded up to the
  184. nearest multiple of 8.
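
The aggregation behind "entry-ips" can be sketched as below. The example
assumes that each connection has already been resolved to a country code in
memory and that the set of known relay addresses is available; both inputs,
the function name, and the alphabetical ordering of country codes are
assumptions of this illustration.

```python
from collections import defaultdict


def format_entry_ips(connections, known_relay_ips):
    """Build an 'entry-ips' line from (ip_address, country_code) pairs."""
    unique_ips = defaultdict(set)
    for ip, country in connections:
        # Connections from other relays are not client connections.
        if ip in known_relay_ips:
            continue
        unique_ips[country].add(ip)
    parts = []
    for country in sorted(unique_ips):
        count = len(unique_ips[country])
        # Round up to the next multiple of 8 before publishing.
        parts.append(f"{country}={-(-count // 8) * 8}")
    return "entry-ips " + ",".join(parts)
```

Only the rounded per-country totals leave the function; the individual
addresses are used for de-duplication and then discarded.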
  185. Cell statistics:
  186. The third type of statistics has to do with the time that cells spend
  187. in circuit queues. In order to gather these statistics, the relay
  188. records when it puts a given cell in a circuit queue and when this
  189. cell is flushed. The relay further notes the lifetime of the circuit.
  190. These data are sufficient to determine the mean number of cells in a
  191. queue over time and the mean time that cells spend in a queue.
  192. Cell statistics are necessary to learn more about possible reasons for
  193. the poor performance of the Tor network, especially high
  194. latencies. The same statistics are also useful to determine the
  195. effects of design changes by comparing today's data with future data.
  196. There are basically no privacy concerns from measuring cell
  197. statistics, regardless of a node being an entry, middle, or exit node.
  198. "cell-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
  199. [At most once.]
  200. YYYY-MM-DD HH:MM:SS defines the end of the included measurement
  201. interval of length NSEC seconds (86400 seconds by default).
  202. A "cell-stats-end" line, as well as any other "cell-*" line,
  203. is first added after the relay has been running for at least 24
  204. hours.
  205. "cell-processed-cells" num,...,num NL
  206. [At most once.]
  207. Mean number of processed cells per circuit, subdivided into
  208. deciles of circuits by the number of cells they have processed in
  209. descending order from loudest to quietest circuits.
  210. "cell-queued-cells" num,...,num NL
  211. [At most once.]
  212. Mean number of cells contained in queues by circuit decile. These
  213. means are calculated by 1) determining the mean number of cells in
  214. a single circuit between its creation and its termination and 2)
  215. calculating the mean for all circuits in a given decile as
  216. determined in "cell-processed-cells". Numbers have a precision of
  217. two decimal places.
  218. "cell-time-in-queue" num,...,num NL
  219. [At most once.]
  220. Mean time cells spend in circuit queues in milliseconds. Times are
  221. calculated by 1) determining the mean time cells spend in the
  222. queue of a single circuit and 2) calculating the mean for all
  223. circuits in a given decile as determined in
  224. "cell-processed-cells".
  225. "cell-circuits-per-decile" num NL
  226. [At most once.]
  227. Mean number of circuits that are included in any of the deciles,
  228. rounded up to the next integer.
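
The decile computation for the "cell-*" lines can be sketched as follows.
The input layout (one tuple per circuit), the function name, the whole-number
formatting of processed cells and queue times, and the handling of circuit
counts that are not divisible by ten are all assumptions of this example.

```python
import math


def cell_stats_lines(circuits):
    """Build the four "cell-*" lines from per-circuit measurements.

    circuits: list of (processed_cells, mean_queued_cells, mean_queue_ms)
    tuples, one per circuit.
    """
    # Loudest circuits first, as required for "cell-processed-cells".
    ordered = sorted(circuits, key=lambda c: c[0], reverse=True)
    per_decile = math.ceil(len(ordered) / 10)

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    processed, queued, waiting = [], [], []
    for i in range(10):
        chunk = ordered[i * per_decile:(i + 1) * per_decile]
        processed.append(f"{mean([c[0] for c in chunk]):.0f}")
        queued.append(f"{mean([c[1] for c in chunk]):.2f}")
        waiting.append(f"{mean([c[2] for c in chunk]):.0f}")
    return [
        "cell-processed-cells " + ",".join(processed),
        "cell-queued-cells " + ",".join(queued),
        "cell-time-in-queue " + ",".join(waiting),
        f"cell-circuits-per-decile {per_decile}",
    ]
```

Because the deciles are formed once from the processed-cell ordering, the
i-th value on each of the first three lines refers to the same group of
circuits.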
  229. Exit statistics:
  230. The last type of statistics concerns exit nodes, which count the number of
  231. bytes written and read and the number of streams opened per port in
  232. 24-hour intervals. Exit port statistics can be measured by looking at
  233. headers of BEGIN and DATA cells. A BEGIN cell contains the exit port
  234. that is required for the exit node to open a new exit stream.
  235. Subsequent DATA cells coming from the client or being sent back to the
  236. client contain a length field stating how many bytes of application
  237. data are contained in the cell.
  238. Exit port statistics are important to measure in order to identify
  239. possible load-balancing problems with respect to exit policies. Exit
  240. nodes that permit more ports than others are very likely overloaded
  241. with traffic for those ports plus traffic for other ports. Improving
  242. load balancing in the Tor network improves the overall utilization of
  243. bandwidth capacity.
  244. Exit traffic is one of the most sensitive parts of network data in the
  245. Tor network. Even though these statistics do not require looking at
  246. traffic contents, statistics are aggregated so that they are not
  247. useful for de-anonymizing users. Only those ports are reported that
  248. have seen at least 0.1% of exiting or incoming bytes, numbers of bytes
  249. are rounded up to full kibibytes (KiB), and stream numbers are rounded
  250. up to the next multiple of 4.
  251. "exit-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
  252. [At most once.]
  253. YYYY-MM-DD HH:MM:SS defines the end of the included measurement
  254. interval of length NSEC seconds (86400 seconds by default).
  255. An "exit-stats-end" line, as well as any other "exit-*" line, is
  256. first added after the relay has been running for at least 24 hours
  257. and only if the relay permits exiting (where exiting to a single
  258. port and IP address is sufficient).
  259. "exit-kibibytes-written" port=N,port=N,... NL
  260. [At most once.]
  261. "exit-kibibytes-read" port=N,port=N,... NL
  262. [At most once.]
  263. List of mappings from ports to the number of kibibytes that the
  264. relay has written to or read from exit connections to that port,
  265. rounded up to the next full kibibyte.
  266. "exit-streams-opened" port=N,port=N,... NL
  267. [At most once.]
  268. List of mappings from ports to the number of opened exit streams
  269. to that port, rounded up to the nearest multiple of 4.
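
The three aggregation rules for exit statistics can be combined as in the
sketch below. It assumes that a port is reported if it accounts for at least
0.1% of either the written or the read bytes; that reading of the threshold,
the function name, and the ascending port order are assumptions of this
example.

```python
def format_exit_stats(written, read, streams):
    """Build the three "exit-*" lines from per-port counters.

    written, read: port -> bytes; streams: port -> opened streams.
    """
    total_written = sum(written.values())
    total_read = sum(read.values())
    # count * 1000 >= total is the integer form of count >= 0.1% of total.
    reported = [
        port for port in sorted(set(written) | set(read))
        if written.get(port, 0) * 1000 >= total_written > 0
        or read.get(port, 0) * 1000 >= total_read > 0
    ]

    def kib(num_bytes):
        return -(-num_bytes // 1024)  # round up to full kibibytes

    def mult4(count):
        return -(-count // 4) * 4  # round up to the next multiple of 4

    return [
        "exit-kibibytes-written "
        + ",".join(f"{p}={kib(written.get(p, 0))}" for p in reported),
        "exit-kibibytes-read "
        + ",".join(f"{p}={kib(read.get(p, 0))}" for p in reported),
        "exit-streams-opened "
        + ",".join(f"{p}={mult4(streams.get(p, 0))}" for p in reported),
    ]
```

A port that carries only a tiny fraction of the traffic never appears in any
of the three lines, which keeps rarely used ports from standing out.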
  270. Implementation notes:
  271. Right now, relays that are configured accordingly write similar
  272. statistics to those described in this proposal to disk every 24 hours.
  273. Once this proposal is implemented, relays will include the contents of
  274. these files in extra-info documents.
  275. The following steps are necessary to implement this proposal:
  276. 1. The current format of [dirreq|entry|buffer|exit]-stats files needs
  277. to be adapted to the description in this proposal. This step
  278. basically means renaming keywords.
  279. 2. The timing of writing the four *-stats files should be unified, so
  280. that they are written exactly 24 hours after starting the
  281. relay. Right now, the measurement intervals for dirreq, entry, and
  282. exit stats start with the first observed request, and files are
  283. written when observing the first request that occurs more than 24
  284. hours after the beginning of the measurement interval. With this
  285. proposal, the measurement intervals should all start at the same
  286. time, and files should be written exactly 24 hours later.
  287. 3. It is advantageous to cache statistics in local files in the data
  288. directory until they are included in extra-info documents. The
  289. reason is that the 24-hour measurement interval can be very
  290. different from the 18-hour publication interval of extra-info
  291. documents. When a relay crashes after finishing a measurement
  292. interval, but before publishing the next extra-info document,
  293. statistics would get lost. Therefore, statistics are written to
  294. disk when finishing a measurement interval and read from disk when
  295. generating an extra-info document. Only the statistics that were
  296. appended to the *-stats files within the past 24 hours are included
  297. in extra-info documents. Further, the contents of the *-stats files
  298. need to be checked in the process of generating extra-info documents.
  299. 4. Once the statistics patches have been tested, the ./configure options
  300. should be removed and the statistics code should be compiled by default.
  301. It is still required for relay operators to add configuration
  302. options (DirReqStatistics, ExitPortStatistics, etc.) to enable
  303. gathering statistics. However, in the near future, statistics shall
  304. be gathered by all relays by default, where requiring a
  305. ./configure option would be a barrier for many relay operators.
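
Step 3 above, reading cached statistics back from disk and including only
recent ones, can be sketched as follows. The sketch assumes that a *-stats
file is a sequence of blocks that each begin with a "*-stats-end" line; that
layout and the function name are assumptions of this example, not a
description of the actual file format.

```python
from datetime import datetime, timedelta


def latest_fresh_block(text, now, max_age=timedelta(hours=24)):
    """Return the lines of the newest block that ended within max_age."""
    blocks, current = [], None
    for line in text.splitlines():
        keyword, _, rest = line.partition(" ")
        if keyword.endswith("-stats-end"):
            try:
                end = datetime.strptime(rest[:19], "%Y-%m-%d %H:%M:%S")
            except ValueError:
                # Malformed timestamp: skip this block instead of publishing.
                current = None
                continue
            current = (end, [line])
            blocks.append(current)
        elif current is not None and line.strip():
            current[1].append(line)
    # Keep only blocks whose interval ended in the past max_age.
    fresh = [b for b in blocks if timedelta(0) <= now - b[0] <= max_age]
    if not fresh:
        return []
    return max(fresh, key=lambda b: b[0])[1]
```

Older blocks remain in the file but are ignored, so a relay that restarts
after a crash can still publish the last completed interval as long as it is
less than 24 hours old.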