- Filename: 166-statistics-extra-info-docs.txt
- Title: Including Network Statistics in Extra-Info Documents
- Author: Karsten Loesing
- Created: 21-Jul-2009
- Target: 0.2.2
- Status: Accepted
- Change history:
- 21-Jul-2009 Initial proposal for or-dev
- Overview:
- The Tor network has grown to almost two thousand relays and millions
- of casual users over the past few years. With growth has come
- increasing performance problems and attempts by some countries to
- block access to the Tor network. In order to address these problems,
- we need to learn more about the Tor network. This proposal suggests
- measuring additional statistics and including them in extra-info
- documents to help us understand the Tor network better.
- Introduction:
- As of May 2009, relays, bridges, and directories gather the following
- data for statistical purposes:
- - Relays and bridges count the number of bytes that they have pushed
- in 15-minute intervals over the past 24 hours. Relays and bridges
- include these data in extra-info documents that they send to the
- directory authorities whenever they publish their server descriptor.
- - Bridges further include a rough number of clients per country that
- they have seen in the past 48 hours in their extra-info documents.
- - Directories can be configured to count the number of clients they
- see per country in the past 24 hours and to write them to a local
- file.
- Since then, we have extended the network statistics in Tor. These
- statistics include:
- - Directories now gather more precise statistics about connecting
- clients. Fixes include measuring in intervals of exactly 24 hours,
- counting unsuccessful requests, measuring download times, etc. The
- directories append their statistics to a local file every 24 hours.
- - Entry guards count the number of clients per country per day like
- bridges do and write them to a local file every 24 hours.
- - Relays measure statistics of the number of cells in their circuit
- queues and how much time these cells spend waiting there. Relays
- write these statistics to a local file every 24 hours.
- - Exit nodes count the number of read and written bytes on exit
- connections per port as well as the number of opened exit streams
- per port in 24-hour intervals. Exit nodes write their statistics to
- a local file.
- The following four sections contain descriptions for adding these
- statistics to the relays' extra-info documents.
- Directory request statistics:
- The first type of statistics aims at measuring directory requests sent
- by clients to a directory mirror or directory authority. More
- precisely, these statistics aim at requests for v2 and v3 network
- statuses only. These directory requests are sent non-anonymously,
- either via HTTP-like requests to a directory's Dir port or tunneled
- over a 1-hop circuit.
- Measuring directory request statistics is useful for several reasons:
- First, the number of locally seen directory requests can be used to
- estimate the total number of clients in the Tor network. Second, the
- country-wise classification of requests using a GeoIP database can
- help count the relative and absolute number of users per country.
- Third, the download times can give hints about the available
- bandwidth capacity at clients.
- Directory requests do not reveal anything about the contents that
- clients send or receive over the Tor network. Every client requests
- network statuses from the directories, so gathering these statistics
- raises no anonymity-related concerns. It might be, though, that clients
- wish to hide the fact that they are connecting to the Tor network.
- Therefore, IP addresses are resolved to country codes in memory,
- events are accumulated over 24 hours, and numbers are rounded up to
- multiples of 4 or 8.
- "dirreq-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
- [At most once.]
- YYYY-MM-DD HH:MM:SS defines the end of the included measurement
- interval of length NSEC seconds (86400 seconds by default).
- A "dirreq-stats-end" line, as well as any other "dirreq-*" line,
- is only added when the relay has opened its Dir port and after 24
- hours of measuring directory requests.
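- The "*-stats-end" lines in this proposal all share the same layout, so
- a consumer of extra-info documents could recover the measurement
- interval with a small parser. The following is a minimal sketch, not
- part of the proposal; the function name and the sample line in the
- usage note are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Matches the layout: KEYWORD YYYY-MM-DD HH:MM:SS (NSEC s)
STATS_END_RE = re.compile(
    r"^(?P<keyword>[a-z]+-stats-end) "
    r"(?P<end>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\((?P<nsec>\d+) s\)$"
)


def parse_stats_end(line):
    """Return (keyword, interval_start, interval_end) for a *-stats-end line.

    Hypothetical helper: the interval start is derived by subtracting
    NSEC seconds from the reported end time.
    """
    match = STATS_END_RE.match(line.strip())
    if match is None:
        raise ValueError("not a stats-end line: %r" % line)
    end = datetime.strptime(match.group("end"), "%Y-%m-%d %H:%M:%S")
    start = end - timedelta(seconds=int(match.group("nsec")))
    return match.group("keyword"), start, end
```

- For example, parsing "dirreq-stats-end 2009-07-21 12:00:00 (86400 s)"
- would yield an interval that starts at 2009-07-20 12:00:00.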
- "dirreq-v2-ips" CC=N,CC=N,... NL
- [At most once.]
- "dirreq-v3-ips" CC=N,CC=N,... NL
- [At most once.]
- List of mappings from two-letter country codes to the number of
- unique IP addresses that have connected from that country to
- request a v2/v3 network status, rounded up to the nearest multiple
- of 8. Only those IP addresses are counted that the directory can
- answer with a 200 OK status code.
- "dirreq-v2-reqs" CC=N,CC=N,... NL
- [At most once.]
- "dirreq-v3-reqs" CC=N,CC=N,... NL
- [At most once.]
- List of mappings from two-letter country codes to the number of
- requests for v2/v3 network statuses from that country, rounded up
- to the nearest multiple of 8. Only those requests are counted that
- the directory can answer with a 200 OK status code.
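- The rounding and the CC=N list format described above can be sketched
- as follows. This is an illustration only: the function names are
- hypothetical, and sorting by country code and omitting countries with
- a zero count are assumptions that the proposal does not specify.

```python
def round_up_to_multiple(count, multiple):
    """Round a non-negative integer up to the nearest multiple."""
    return -(-count // multiple) * multiple


def format_country_counts(keyword, counts, multiple=8):
    """Build a line such as 'dirreq-v3-ips de=16,us=16'.

    counts maps two-letter country codes to raw (unrounded) numbers.
    Assumption: countries are sorted alphabetically and countries
    without any observation are left out.
    """
    parts = [
        "%s=%d" % (cc, round_up_to_multiple(n, multiple))
        for cc, n in sorted(counts.items())
        if n > 0
    ]
    return "%s %s" % (keyword, ",".join(parts))
```

- With raw counts de=9, fr=1, and us=16, the resulting line would be
- "dirreq-v3-ips de=16,fr=8,us=16".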
- "dirreq-v2-share" num% NL
- [At most once.]
- "dirreq-v3-share" num% NL
- [At most once.]
- The share of v2/v3 network status requests that the directory
- expects to receive from clients based on its advertised bandwidth
- compared to the overall network bandwidth capacity. Shares are
- formatted in percent with two decimal places. Shares are
- calculated as means over the whole 24-hour interval.
- "dirreq-v2-resp" status=num,... NL
- [At most once.]
- "dirreq-v3-resp" status=num,... NL
- [At most once.]
- List of mappings from response statuses to the number of requests
- for v2/v3 network statuses that were answered with that response
- status, rounded up to the nearest multiple of 4. Only response
- statuses with at least 1 response are reported. New response
- statuses can be added at any time. The current list of response
- statuses is as follows:
- "ok": a network status request is answered; this number
- corresponds to the sum of all requests as reported in
- "dirreq-v2-reqs" or "dirreq-v3-reqs", respectively, before
- rounding up.
- "not-enough-sigs": a version 3 network status is not signed by a
- sufficient number of requested authorities.
- "unavailable": a requested network status object is unavailable.
- "not-found": a requested network status is not found.
- "not-modified": a network status has not been modified since the
- If-Modified-Since time that is included in the request.
- "busy": the directory is busy.
- "dirreq-v2-direct-dl" key=val,... NL
- [At most once.]
- "dirreq-v3-direct-dl" key=val,... NL
- [At most once.]
- "dirreq-v2-tunneled-dl" key=val,... NL
- [At most once.]
- "dirreq-v3-tunneled-dl" key=val,... NL
- [At most once.]
- List of statistics about possible failures in the download process
- of v2/v3 network statuses. Requests are either "direct"
- HTTP-encoded requests over the relay's directory port, or
- "tunneled" requests using a BEGIN_DIR cell over the relay's OR
- port. The list of possible statistics can change, and statistics
- can be left out from reporting. The current list of statistics is
- as follows:
- Successful downloads and failures:
- "complete": a client has finished the download successfully.
- "timeout": a download did not finish within 10 minutes after
- starting to send the response.
- "running": a download is still running at the end of the
- measurement period and started sending its response less than 10
- minutes before that.
- Download times:
- "min", "max": smallest and largest measured bandwidth in B/s.
- "d[1-4,6-9]": 1st to 4th and 6th to 9th decile of measured
- bandwidth in B/s. For a given decile i, i/10 of all downloads
- had a smaller bandwidth than di, and (10-i)/10 of all downloads
- had a larger bandwidth than di.
- "q[1,3]": 1st and 3rd quartile of measured bandwidth in B/s. One
- fourth of all downloads had a smaller bandwidth than q1, one
- fourth of all downloads had a larger bandwidth than q3, and the
- remaining half of all downloads had a bandwidth between q1 and
- q3.
- "md": median of measured bandwidth in B/s. Half of the downloads
- had a smaller bandwidth than md, the other half had a larger
- bandwidth than md.
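- The order statistics listed above could be derived from a list of
- per-download bandwidth measurements as in the following sketch. The
- proposal does not say how a decile or quartile is picked when it falls
- between two samples; the nearest-rank method used here is an
- assumption, as are the function names.

```python
def nearest_rank(sorted_values, numerator, denominator):
    """Return the value at fraction numerator/denominator (nearest rank).

    Assumption: nearest-rank selection, i.e. the element at 1-based
    position ceil(fraction * n) of the sorted sample.
    """
    n = len(sorted_values)
    rank = -(-numerator * n // denominator)  # integer ceiling
    return sorted_values[max(rank, 1) - 1]


def bandwidth_stats(bandwidths):
    """Map the keys min, max, d1-d4, d6-d9, q1, q3, md to values in B/s."""
    values = sorted(bandwidths)
    if not values:
        return {}
    stats = {"min": values[0], "max": values[-1]}
    for i in (1, 2, 3, 4, 6, 7, 8, 9):  # d5 is reported as "md" instead
        stats["d%d" % i] = nearest_rank(values, i, 10)
    stats["q1"] = nearest_rank(values, 1, 4)
    stats["q3"] = nearest_rank(values, 3, 4)
    stats["md"] = nearest_rank(values, 1, 2)
    return stats
```

- Other percentile definitions, such as linear interpolation between
- neighboring samples, would give slightly different values for small
- samples.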
- Entry guard statistics:
- Entry guard statistics include the number of clients per country and
- per day that are connecting directly to an entry guard.
- Entry guard statistics are important to learn more about the
- distribution of clients across countries. In the future, this
- knowledge can be useful to detect whether clients connecting from
- specific countries are restricted or are starting to be restricted.
- Information about which client connects to a given entry guard is very
- sensitive. This information must not be combined with information
- about what contents are leaving the network at the exit nodes. Therefore,
- entry guard statistics need to be aggregated to prevent them from
- becoming useful for de-anonymization. Aggregation includes resolving
- IP addresses to country codes, counting events over 24-hour intervals,
- and rounding up numbers to the next multiple of 8.
- "entry-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
- [At most once.]
- YYYY-MM-DD HH:MM:SS defines the end of the included measurement
- interval of length NSEC seconds (86400 seconds by default).
- An "entry-stats-end" line, as well as any other "entry-*"
- line, is first added after the relay has been running for at least
- 24 hours.
- "entry-ips" CC=N,CC=N,... NL
- [At most once.]
- List of mappings from two-letter country codes to the number of
- unique IP addresses that have connected from that country to the
- relay and that are not known to be other relays, rounded up to the
- nearest multiple of 8.
- Cell statistics:
- The third type of statistics has to do with the time that cells spend
- in circuit queues. In order to gather these statistics, the relay
- memorizes when it puts a given cell in a circuit queue and when this
- cell is flushed. The relay further notes the lifetime of the circuit.
- These data are sufficient to determine the mean number of cells in a
- queue over time and the mean time that cells spend in a queue.
- Cell statistics are necessary to learn more about possible reasons for
- the poor network performance of the Tor network, especially high
- latencies. The same statistics are also useful to determine the
- effects of design changes by comparing today's data with future data.
- There are basically no privacy concerns from measuring cell
- statistics, regardless of a node being an entry, middle, or exit node.
- "cell-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
- [At most once.]
- YYYY-MM-DD HH:MM:SS defines the end of the included measurement
- interval of length NSEC seconds (86400 seconds by default).
- A "cell-stats-end" line, as well as any other "cell-*" line,
- is first added after the relay has been running for at least 24
- hours.
- "cell-processed-cells" num,...,num NL
- [At most once.]
- Mean number of processed cells per circuit, subdivided into
- deciles of circuits by the number of cells they have processed in
- descending order from loudest to quietest circuits.
- "cell-queued-cells" num,...,num NL
- [At most once.]
- Mean number of cells contained in queues by circuit decile. These
- means are calculated by 1) determining the mean number of cells in
- a single circuit between its creation and its termination and 2)
- calculating the mean for all circuits in a given decile as
- determined in "cell-processed-cells". Numbers have a precision of
- two decimal places.
- "cell-time-in-queue" num,...,num NL
- [At most once.]
- Mean time cells spend in circuit queues in milliseconds. Times are
- calculated by 1) determining the mean time cells spend in the
- queue of a single circuit and 2) calculating the mean for all
- circuits in a given decile as determined in
- "cell-processed-cells".
- "cell-circuits-per-decile" num NL
- [At most once.]
- Mean number of circuits that are included in any of the deciles,
- rounded up to the next integer.
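- The per-decile aggregation of the "cell-*" lines can be illustrated
- with the following sketch. It assumes that each circuit is summarized
- as a tuple of (processed cells, mean queued cells, mean time in queue
- in milliseconds); this tuple layout, the function names, and the way
- circuits are split when their number is not divisible by 10 are
- assumptions, not part of the proposal.

```python
import math


def mean(values):
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def cell_stats(circuits):
    """Aggregate per-circuit numbers into ten deciles.

    circuits is a list of hypothetical tuples
    (processed_cells, mean_queued_cells, mean_time_in_queue_ms).
    """
    # Loudest circuits first, as required for "cell-processed-cells".
    ordered = sorted(circuits, key=lambda c: c[0], reverse=True)
    n = len(ordered)
    # Assumption: decile i covers the index range [i*n/10, (i+1)*n/10).
    deciles = [ordered[i * n // 10:(i + 1) * n // 10] for i in range(10)]
    return {
        "cell-processed-cells": [mean([c[0] for c in d]) for d in deciles],
        "cell-queued-cells": [round(mean([c[1] for c in d]), 2) for d in deciles],
        "cell-time-in-queue": [mean([c[2] for c in d]) for d in deciles],
        "cell-circuits-per-decile": math.ceil(n / 10),
    }
```

- Because all three lists use the same decile boundaries, the i-th entry
- of each list refers to the same group of circuits.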
- Exit statistics:
- The last type of statistics concerns exit nodes, which count the
- number of bytes written and read and the number of streams opened per
- port and per 24 hours. Exit port statistics can be measured by looking at
- headers of BEGIN and DATA cells. A BEGIN cell contains the exit port
- that is required for the exit node to open a new exit stream.
- Subsequent DATA cells coming from the client or being sent back to the
- client contain a length field stating how many bytes of application
- data are contained in the cell.
- Exit port statistics are important to measure in order to identify
- possible load-balancing problems with respect to exit policies. Exit
- nodes that permit more ports than others are very likely overloaded
- with traffic for those ports plus traffic for other ports. Improving
- load balancing in the Tor network improves the overall utilization of
- bandwidth capacity.
- Exit traffic is one of the most sensitive parts of network data in the
- Tor network. Even though these statistics do not require looking at
- traffic contents, statistics are aggregated so that they are not
- useful for de-anonymizing users. Only those ports are reported that
- have seen at least 0.1% of exiting or incoming bytes, numbers of bytes
- are rounded up to full kibibytes (KiB), and stream numbers are rounded
- up to the next multiple of 4.
- "exit-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
- [At most once.]
- YYYY-MM-DD HH:MM:SS defines the end of the included measurement
- interval of length NSEC seconds (86400 seconds by default).
- An "exit-stats-end" line, as well as any other "exit-*" line, is
- first added after the relay has been running for at least 24 hours
- and only if the relay permits exiting (where exiting to a single
- port and IP address is sufficient).
- "exit-kibibytes-written" port=N,port=N,... NL
- [At most once.]
- "exit-kibibytes-read" port=N,port=N,... NL
- [At most once.]
- List of mappings from ports to the number of kibibytes that the
- relay has written to or read from exit connections to that port,
- rounded up to the next full kibibyte.
- "exit-streams-opened" port=N,port=N,... NL
- [At most once.]
- List of mappings from ports to the number of opened exit streams
- to that port, rounded up to the nearest multiple of 4.
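- The aggregation rules for the "exit-*" lines can be sketched as
- follows. The proposal leaves open how the 0.1% threshold is applied;
- this sketch assumes that a port is reported when it accounts for at
- least 0.1% of all written bytes or at least 0.1% of all read bytes.
- The function names and the ascending port order are also assumptions.

```python
def round_up_to_multiple(count, multiple):
    """Round a non-negative integer up to the nearest multiple."""
    return -(-count // multiple) * multiple


def is_significant(count, total):
    """True if count is at least 0.1% of total (integer arithmetic)."""
    return total > 0 and count * 1000 >= total


def exit_stats_lines(written, read, streams):
    """Build the three exit-* lines from raw per-port counters.

    written and read map ports to bytes, streams maps ports to the
    number of opened streams.
    """
    total_written = sum(written.values())
    total_read = sum(read.values())
    ports = sorted(
        port for port in set(written) | set(read)
        if is_significant(written.get(port, 0), total_written)
        or is_significant(read.get(port, 0), total_read)
    )

    def to_kib(num_bytes):
        # Round up to the next full kibibyte.
        return round_up_to_multiple(num_bytes, 1024) // 1024

    def fmt(keyword, values):
        return "%s %s" % (keyword, ",".join("%d=%d" % pv for pv in values))

    return [
        fmt("exit-kibibytes-written", [(p, to_kib(written.get(p, 0))) for p in ports]),
        fmt("exit-kibibytes-read", [(p, to_kib(read.get(p, 0))) for p in ports]),
        fmt("exit-streams-opened",
            [(p, round_up_to_multiple(streams.get(p, 0), 4)) for p in ports]),
    ]
```

- A port that carried only a few hundred bytes on an otherwise busy
- exit node falls below the threshold and is left out of all three
- lines.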
- Implementation notes:
- Right now, relays that are configured accordingly write similar
- statistics to those described in this proposal to disk every 24 hours.
- With this proposal being implemented, relays include the contents of
- these files in extra-info documents.
- The following steps are necessary to implement this proposal:
- 1. The current format of [dirreq|entry|buffer|exit]-stats files needs
- to be adapted to the description in this proposal. This step
- basically means renaming keywords.
- 2. The timing of writing the four *-stats files should be unified, so
- that they are written exactly 24 hours after starting the
- relay. Right now, the measurement intervals for dirreq, entry, and
- exit stats start with the first observed request, and files are
- written when observing the first request that occurs more than 24
- hours after the beginning of the measurement interval. With this
- proposal, the measurement intervals should all start at the same
- time, and files should be written exactly 24 hours later.
- 3. It is advantageous to cache statistics in local files in the data
- directory until they are included in extra-info documents. The
- reason is that the 24-hour measurement interval can be very
- different from the 18-hour publication interval of extra-info
- documents. When a relay crashes after finishing a measurement
- interval, but before publishing the next extra-info document,
- statistics would get lost. Therefore, statistics are written to
- disk when finishing a measurement interval and read from disk when
- generating an extra-info document. Only the statistics that were
- appended to the *-stats files within the past 24 hours are included
- in extra-info documents. Further, the contents of the *-stats files
- need to be checked in the process of generating extra-info documents.
- 4. With the statistics patches being tested, the ./configure options
- should be removed and the statistics code be compiled by default.
- It is still required for relay operators to add configuration
- options (DirReqStatistics, ExitPortStatistics, etc.) to enable
- gathering statistics. However, in the near future, statistics shall
- be gathered by all relays by default, and requiring a
- ./configure option would be a barrier for many relay operators.
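- Step 3 above, reading cached statistics from disk and including only
- those that were appended within the past 24 hours, could look like
- the following sketch. It assumes that every block in a *-stats file
- starts with its "*-stats-end" line and that only the most recent block
- is of interest; both points and the function name are assumptions.

```python
from datetime import datetime, timedelta


def latest_recent_block(lines, now, max_age=timedelta(hours=24)):
    """Return the lines of the newest block if it is at most max_age old.

    Raises ValueError if a *-stats-end line carries a malformed
    timestamp, which serves as a basic check of the file contents.
    """
    blocks = []
    for line in lines:
        parts = line.split()
        if parts and parts[0].endswith("-stats-end"):
            end = datetime.strptime(" ".join(parts[1:3]), "%Y-%m-%d %H:%M:%S")
            blocks.append((end, [line]))
        elif blocks:
            # Any other line belongs to the block opened most recently.
            blocks[-1][1].append(line)
    if not blocks:
        return []
    end, block = blocks[-1]
    if end > now or now - end > max_age:
        return []
    return block
```

- Passing the current time as an argument keeps the function easy to
- test; a relay would pass its own clock when generating an extra-info
- document.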