| Filename: 166-statistics-extra-info-docs.txt
Title: Including Network Statistics in Extra-Info Documents
Author: Karsten Loesing
Created: 21-Jul-2009
Target: 0.2.2
Status: Accepted

Change history:

  21-Jul-2009  Initial proposal for or-dev

Overview:

  The Tor network has grown to almost two thousand relays and millions
  of casual users over the past few years. With growth has come
  increasing performance problems and attempts by some countries to
  block access to the Tor network. In order to address these problems,
  we need to learn more about the Tor network.
  This proposal suggests measuring additional statistics and including
  them in extra-info documents to help us understand the Tor network
  better.

Introduction:

  As of May 2009, relays, bridges, and directories gather the following
  data for statistical purposes:

  - Relays and bridges count the number of bytes that they have pushed
    in 15-minute intervals over the past 24 hours. Relays and bridges
    include these data in extra-info documents that they send to the
    directory authorities whenever they publish their server descriptor.

  - Bridges further include a rough number of clients per country that
    they have seen in the past 48 hours in their extra-info documents.

  - Directories can be configured to count the number of clients they
    see per country in the past 24 hours and to write them to a local
    file.

  Since then, we have extended the network statistics in Tor. These
  statistics include:

  - Directories now gather more precise statistics about connecting
    clients. Fixes include measuring in intervals of exactly 24 hours,
    counting unsuccessful requests, measuring download times, etc. The
    directories append their statistics to a local file every 24 hours.

  - Entry guards count the number of clients per country per day like
    bridges do and write them to a local file every 24 hours.

  - Relays measure statistics of the number of cells in their circuit
    queues and how much time these cells spend waiting there. Relays
    write these statistics to a local file every 24 hours.

  - Exit nodes count the number of read and written bytes on exit
    connections per port as well as the number of opened exit streams
    per port in 24-hour intervals. Exit nodes write their statistics to
    a local file.

  The following four sections contain descriptions for adding these
  statistics to the relays' extra-info documents.

Directory request statistics:

  The first type of statistics aims at measuring directory requests sent
  by clients to a directory mirror or directory authority. More
  precisely, these statistics cover requests for v2 and v3 network
  statuses only. These directory requests are sent non-anonymously,
  either via HTTP-like requests to a directory's Dir port or tunneled
  over a 1-hop circuit.

  Measuring directory request statistics is useful for several reasons:
  First, the number of locally seen directory requests can be used to
  estimate the total number of clients in the Tor network. Second, the
  country-wise classification of requests using a GeoIP database can
  help count the relative and absolute number of users per country.
  Third, the download times can give hints about the available bandwidth
  capacity at clients.

  Directory requests do not give any hints about the contents that
  clients send or receive over the Tor network. Every client requests
  network statuses from the directories, so there are no
  anonymity-related concerns about gathering these statistics. It might
  be, though, that clients wish to hide the fact that they are
  connecting to the Tor network. Therefore, IP addresses are resolved to
  country codes in memory, events are accumulated over 24 hours, and
  numbers are rounded up to multiples of 4 or 8.

   "dirreq-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
      [At most once.]

      YYYY-MM-DD HH:MM:SS defines the end of the included measurement
      interval of length NSEC seconds (86400 seconds by default).

      A "dirreq-stats-end" line, as well as any other "dirreq-*" line,
      is only added when the relay has opened its Dir port and after 24
      hours of measuring directory requests.

   "dirreq-v2-ips" CC=N,CC=N,... NL
      [At most once.]
   "dirreq-v3-ips" CC=N,CC=N,... NL
      [At most once.]

      List of mappings from two-letter country codes to the number of
      unique IP addresses that have connected from that country to
      request a v2/v3 network status, rounded up to the nearest multiple
      of 8. Only those IP addresses are counted that the directory can
      answer with a 200 OK status code.

   "dirreq-v2-reqs" CC=N,CC=N,... NL
      [At most once.]
   "dirreq-v3-reqs" CC=N,CC=N,... NL
      [At most once.]

      List of mappings from two-letter country codes to the number of
      requests for v2/v3 network statuses from that country, rounded up
      to the nearest multiple of 8. Only those requests are counted that
      the directory can answer with a 200 OK status code.

   "dirreq-v2-share" num% NL
      [At most once.]
   "dirreq-v3-share" num% NL
      [At most once.]

      The share of v2/v3 network status requests that the directory
      expects to receive from clients based on its advertised bandwidth
      compared to the overall network bandwidth capacity. Shares are
      formatted in percent with two decimal places. Shares are
      calculated as means over the whole 24-hour interval.

   "dirreq-v2-resp" status=num,... NL
      [At most once.]
   "dirreq-v3-resp" status=num,... NL
      [At most once.]

      List of mappings from response statuses to the number of requests
      for v2/v3 network statuses that were answered with that response
      status, rounded up to the nearest multiple of 4. Only response
      statuses with at least 1 response are reported. New response
      statuses can be added at any time. The current list of response
      statuses is as follows:

      "ok": a network status request is answered; this number
         corresponds to the sum of all requests as reported in
         "dirreq-v2-reqs" or "dirreq-v3-reqs", respectively, before
         rounding up.
      "not-enough-sigs": a version 3 network status is not signed by a
         sufficient number of requested authorities.
      "unavailable": a requested network status object is unavailable.
      "not-found": a requested network status is not found.
      "not-modified": a network status has not been modified since the
         If-Modified-Since time that is included in the request.
      "busy": the directory is busy.

   "dirreq-v2-direct-dl" key=val,... NL
      [At most once.]
   "dirreq-v3-direct-dl" key=val,... NL
      [At most once.]
   "dirreq-v2-tunneled-dl" key=val,... NL
      [At most once.]
   "dirreq-v3-tunneled-dl" key=val,... NL
      [At most once.]

      List of statistics about possible failures in the download process
      of v2/v3 network statuses. Requests are either "direct"
      HTTP-encoded requests over the relay's directory port, or
      "tunneled" requests using a BEGIN_DIR cell over the relay's OR
      port. The list of possible statistics can change, and statistics
      can be left out from reporting. The current list of statistics is
      as follows:

      Successful downloads and failures:

      "complete": a client has finished the download successfully.
      "timeout": a download did not finish within 10 minutes after
         starting to send the response.
      "running": a download is still running at the end of the
         measurement period and has been running for less than 10
         minutes since the directory started to send the response.

      Download times:

      "min", "max": smallest and largest measured bandwidth in B/s.
      "d[1-4,6-9]": 1st to 4th and 6th to 9th decile of measured
         bandwidth in B/s. For a given decile i, i/10 of all downloads
         had a smaller bandwidth than di, and (10-i)/10 of all downloads
         had a larger bandwidth than di.
      "q[1,3]": 1st and 3rd quartile of measured bandwidth in B/s. One
         fourth of all downloads had a smaller bandwidth than q1, one
         fourth of all downloads had a larger bandwidth than q3, and the
         remaining half of all downloads had a bandwidth between q1 and
         q3.
      "md": median of measured bandwidth in B/s. Half of the downloads
         had a smaller bandwidth than md, the other half had a larger
         bandwidth than md.

Entry guard statistics:

  Entry guard statistics include the number of clients per country and
  per day that are connecting directly to an entry guard.

  Entry guard statistics are important to learn more about the
  distribution of clients to countries. In the future, this knowledge
  can be useful to detect whether there are, or start to be, any
  restrictions for clients connecting from specific countries.

  Information about which clients connect to a given entry guard is
  very sensitive. It must not be combined with information about what
  contents are leaving the network at the exit nodes. Therefore, entry
  guard statistics need to be aggregated to prevent them from becoming
  useful for de-anonymization. Aggregation includes resolving IP
  addresses to country codes, counting events over 24-hour intervals,
  and rounding up numbers to the next multiple of 8.

   "entry-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
      [At most once.]

      YYYY-MM-DD HH:MM:SS defines the end of the included measurement
      interval of length NSEC seconds (86400 seconds by default).

      An "entry-stats-end" line, as well as any other "entry-*"
      line, is first added after the relay has been running for at least
      24 hours.

   "entry-ips" CC=N,CC=N,... NL
      [At most once.]

      List of mappings from two-letter country codes to the number of
      unique IP addresses that have connected from that country to the
      relay and which are not known to be other relays, rounded up to
      the nearest multiple of 8.

Cell statistics:

  The third type of statistics has to do with the time that cells spend
  in circuit queues. In order to gather these statistics, the relay
  memorizes when it puts a given cell in a circuit queue and when this
  cell is flushed. The relay further notes the lifetime of the circuit.
  These data are sufficient to determine the mean number of cells in a
  queue over time and the mean time that cells spend in a queue.

  Cell statistics are necessary to learn more about possible reasons for
  the poor network performance of the Tor network, especially high
  latencies. The same statistics are also useful to determine the
  effects of design changes by comparing today's data with future data.

  There are basically no privacy concerns from measuring cell
  statistics, regardless of whether a node is an entry, middle, or exit
  node.

   "cell-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
      [At most once.]

      YYYY-MM-DD HH:MM:SS defines the end of the included measurement
      interval of length NSEC seconds (86400 seconds by default).

      A "cell-stats-end" line, as well as any other "cell-*" line,
      is first added after the relay has been running for at least 24
      hours.

   "cell-processed-cells" num,...,num NL
      [At most once.]

      Mean number of processed cells per circuit, subdivided into
      deciles of circuits by the number of cells they have processed in
      descending order from loudest to quietest circuits.

   "cell-queued-cells" num,...,num NL
      [At most once.]

      Mean number of cells contained in queues by circuit decile. These
      means are calculated by 1) determining the mean number of cells in
      a single circuit between its creation and its termination and 2)
      calculating the mean for all circuits in a given decile as
      determined in "cell-processed-cells". Numbers have a precision of
      two decimal places.

   "cell-time-in-queue" num,...,num NL
      [At most once.]

      Mean time cells spend in circuit queues in milliseconds. Times are
      calculated by 1) determining the mean time cells spend in the
      queue of a single circuit and 2) calculating the mean for all
      circuits in a given decile as determined in
      "cell-processed-cells".

   "cell-circuits-per-decile" num NL
      [At most once.]

      Mean number of circuits that are included in any of the deciles,
      rounded up to the next integer.

Exit statistics:

  The last type of statistics concerns exit nodes, which count the
  number of bytes written and read and the number of streams opened per
  port and per 24 hours. Exit port statistics can be measured by looking
  at headers of BEGIN and DATA cells. A BEGIN cell contains the exit
  port that is required for the exit node to open a new exit stream.
  Subsequent DATA cells coming from the client or being sent back to the
  client contain a length field stating how many bytes of application
  data are contained in the cell.

  Exit port statistics are important to measure in order to identify
  possible load-balancing problems with respect to exit policies. Exit
  nodes that permit more ports than others are very likely overloaded
  with traffic for those ports plus traffic for other ports. Improving
  load balancing in the Tor network improves the overall utilization of
  bandwidth capacity.

  Exit traffic is one of the most sensitive parts of network data in the
  Tor network. Even though these statistics do not require looking at
  traffic contents, statistics are aggregated so that they are not
  useful for de-anonymizing users. Only those ports are reported that
  have seen at least 0.1% of exiting or incoming bytes, numbers of bytes
  are rounded up to full kibibytes (KiB), and stream numbers are rounded
  up to the next multiple of 4.

   "exit-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
      [At most once.]

      YYYY-MM-DD HH:MM:SS defines the end of the included measurement
      interval of length NSEC seconds (86400 seconds by default).

      An "exit-stats-end" line, as well as any other "exit-*" line, is
      first added after the relay has been running for at least 24 hours
      and only if the relay permits exiting (where exiting to a single
      port and IP address is sufficient).

   "exit-kibibytes-written" port=N,port=N,... NL
      [At most once.]
   "exit-kibibytes-read" port=N,port=N,... NL
      [At most once.]

      List of mappings from ports to the number of kibibytes that the
      relay has written to or read from exit connections to that port,
      rounded up to the next full kibibyte.

   "exit-streams-opened" port=N,port=N,... NL
      [At most once.]

      List of mappings from ports to the number of opened exit streams
      to that port, rounded up to the nearest multiple of 4.

Implementation notes:

  Right now, relays that are configured accordingly write similar
  statistics to those described in this proposal to disk every 24 hours.
  Once this proposal is implemented, relays include the contents of
  these files in extra-info documents.

  The following steps are necessary to implement this proposal:

  1. The current format of [dirreq|entry|buffer|exit]-stats files needs
     to be adapted to the description in this proposal. This step
     basically means renaming keywords.

  2. The timing of writing the four *-stats files should be unified, so
     that they are written exactly 24 hours after starting the
     relay. Right now, the measurement intervals for dirreq, entry, and
     exit stats start with the first observed request, and files are
     written when observing the first request that occurs more than 24
     hours after the beginning of the measurement interval. With this
     proposal, the measurement intervals should all start at the same
     time, and files should be written exactly 24 hours later.

  3. It is advantageous to cache statistics in local files in the data
     directory until they are included in extra-info documents. The
     reason is that the 24-hour measurement interval can be very
     different from the 18-hour publication interval of extra-info
     documents. If a relay crashed after finishing a measurement
     interval, but before publishing the next extra-info document,
     statistics would be lost.
     Therefore, statistics are written to disk when finishing a
     measurement interval and read from disk when generating an
     extra-info document. Only the statistics that were appended to the
     *-stats files within the past 24 hours are included in extra-info
     documents. Further, the contents of the *-stats files need to be
     checked in the process of generating extra-info documents.

  4. With the statistics patches being tested, the ./configure options
     should be removed and the statistics code be compiled by default.
     Relay operators still need to add configuration options
     (DirReqStatistics, ExitPortStatistics, etc.) to enable gathering
     statistics. However, in the near future, statistics shall be
     gathered by all relays by default, where requiring a ./configure
     option would be a barrier for many relay operators.
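
As a rough illustration of step 3, the following Python sketch shows how a "*-stats-end" line read back from a *-stats file might be checked before its statistics are copied into an extra-info document. The names parse_stats_end and is_fresh are hypothetical, timestamps are assumed to be UTC, and nothing here is taken from the Tor source code.

```python
from datetime import datetime, timedelta, timezone

STATS_END_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_stats_end(line):
    """Parse '<keyword> YYYY-MM-DD HH:MM:SS (NSEC s)'.

    Returns (keyword, end_time, interval_seconds). The end time is
    interpreted as UTC, which is an assumption of this sketch.
    """
    parts = line.split()
    if len(parts) != 5 or not parts[3].startswith("(") or parts[4] != "s)":
        raise ValueError("malformed stats-end line: %r" % line)
    keyword, date, time, nsec, _ = parts
    end = datetime.strptime(date + " " + time, STATS_END_FORMAT)
    return keyword, end.replace(tzinfo=timezone.utc), int(nsec[1:])


def is_fresh(line, now, max_age=timedelta(hours=24)):
    """Return True if the interval ended within the past max_age."""
    _, end, _ = parse_stats_end(line)
    return timedelta(0) <= now - end <= max_age
```

A caller would skip any block of statistics whose stats-end line is malformed or not fresh, which covers both the 24-hour rule and the requirement to check file contents before publishing.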
 |
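
The rounding rules used throughout this proposal (multiples of 8 for per-country counts, multiples of 4 for response and stream counts, full kibibytes for exit bytes) can be sketched in Python as follows. This is a minimal illustration only: the helper names are hypothetical, and sorting entries by key is an assumption made for reproducible output, since the proposal does not specify an order.

```python
def round_up(count, multiple):
    """Round a non-negative integer up to the next multiple of `multiple`."""
    if count < 0 or multiple <= 0:
        raise ValueError("count must be >= 0 and multiple must be > 0")
    # Ceiling division with integers only, then scale back up.
    return -(-count // multiple) * multiple


def bytes_to_kibibytes(num_bytes):
    """Convert a byte count to full kibibytes, rounding up."""
    return round_up(num_bytes, 1024) // 1024


def format_mapping_line(keyword, counts, multiple):
    """Build a 'keyword key=N,key=N,...' line with rounded-up values.

    `counts` maps country codes or ports to raw integer counts.
    """
    pairs = ",".join(
        "%s=%d" % (key, round_up(value, multiple))
        for key, value in sorted(counts.items())
    )
    return "%s %s" % (keyword, pairs)
```

For example, format_mapping_line("dirreq-v3-ips", {"us": 13, "de": 8, "cn": 1}, 8) yields "dirreq-v3-ips cn=8,de=8,us=16". Exit byte counts would first be passed through bytes_to_kibibytes and then formatted with a multiple of 1.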