| 
					
				 | 
			
			
				@@ -0,0 +1,391 @@ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+Filename: 166-statistics-extra-info-docs.txt 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+Title: Including Network Statistics in Extra-Info Documents 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+Author: Karsten Loesing 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+Created: 21-Jul-2009 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+Target: 0.2.2 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+Status: Open 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+Change history: 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  21-Jul-2009  Initial proposal for or-dev 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+Overview: 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  The Tor network has grown to almost two thousand relays and millions 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  of casual users over the past few years. With growth has come 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  increasing performance problems and attempts by some countries to 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  block access to the Tor network. In order to address these problems, 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  we need to learn more about the Tor network. This proposal suggests to 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  measure additional statistics and include them in extra-info documents 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  to help us understand the Tor network better. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+Introduction: 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  As of May 2009, relays, bridges, and directories gather the following 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  data for statistical purposes: 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  - Relays and bridges count the number of bytes that they have pushed 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+    in 15-minute intervals over the past 24 hours. Relays and bridges 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+    include these data in extra-info documents that they send to the 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+    directory authorities whenever they publish their server descriptor. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  - Bridges further include a rough number of clients per country that 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+    they have seen in the past 48 hours in their extra-info documents. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  - Directories can be configured to count the number of clients they 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+    see per country in the past 24 hours and to write them to a local 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+    file. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  Since then we extended the network statistics in Tor. These statistics 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  include: 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  - Directories now gather more precise statistics about connecting 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+    clients. Fixes include measuring in intervals of exactly 24 hours, 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+    counting unsuccessful requests, measuring download times, etc. The 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+    directories append their statistics to a local file every 24 hours. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  - Entry guards count the number of clients per country per day like 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+    bridges do and write them to a local file every 24 hours. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  - Relays measure statistics of the number of cells in their circuit 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+    queues and how much time these cells spend waiting there. Relays 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+    write these statistics to a local file every 24 hours. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  - Exit nodes count the number of read and written bytes on exit 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+    connections per port as well as the number of opened exit streams 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+    per port in 24-hour intervals. Exit nodes write their statistics to 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+    a local file. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  The following four sections contain descriptions for adding these 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  statistics to the relays' extra-info documents. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+Directory request statistics: 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  The first type of statistics aims at measuring directory requests sent 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  by clients to a directory mirror or directory authority. More 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  precisely, these statistics aim at requests for v2 and v3 network 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  statuses only. These directory requests are sent non-anonymously, 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  either via HTTP-like requests to a directory's Dir port or tunneled 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  over a 1-hop circuit. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  Measuring directory request statistics is useful for several reasons: 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  First, the number of locally seen directory requests can be used to 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  estimate the total number of clients in the Tor network. Second, the 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  country-wise classification of requests using a GeoIP database can 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  help counting the relative and absolute number of users per country. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  Third, the download times can give hints on the available bandwidth 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  capacity at clients. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  Directory requests do not give any hints on the contents that clients 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  send or receive over the Tor network. Every client requests network 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  statuses from the directories, so that there are no anonymity-related 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  concerns to gather these statistics. It might be, though, that clients 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  wish to hide the fact that they are connecting to the Tor network. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  Therefore, IP addresses are resolved to country codes in memory, 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  events are accumulated over 24 hours, and numbers are rounded up to 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  multiples of 4 or 8. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+   "dirreq-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      [At most once.] 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      YYYY-MM-DD HH:MM:SS defines the end of the included measurement 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      interval of length NSEC seconds (86400 seconds by default). 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      A "dirreq-stats-end" line, as well as any other "dirreq-*" line, 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      is only added when the relay has opened its Dir port and after 24 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      hours of measuring directory requests. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+   "dirreq-v2-ips" CC=N,CC=N,... NL 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      [At most once.] 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+   "dirreq-v3-ips" CC=N,CC=N,... NL 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      [At most once.] 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      List of mappings from two-letter country codes to the number of 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      unique IP addresses that have connected from that country to 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      request a v2/v3 network status, rounded up to the nearest multiple 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      of 8. Only those IP addresses are counted that the directory can 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      answer with a 200 OK status code. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+   "dirreq-v2-reqs" CC=N,CC=N,... NL 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      [At most once.] 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+   "dirreq-v3-reqs" CC=N,CC=N,... NL 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      [At most once.] 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      List of mappings from two-letter country codes to the number of 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      requests for v2/v3 network statuses from that country, rounded up 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      to the nearest multiple of 8. Only those requests are counted that 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      the directory can answer with a 200 OK status code. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+   "dirreq-v2-share" num% NL 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      [At most once.] 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+   "dirreq-v3-share" num% NL 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      [At most once.] 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      The share of v2/v3 network status requests that the directory 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      expects to receive from clients based on its advertised bandwidth 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      compared to the overall network bandwidth capacity. Shares are 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      formatted in percent with two decimal places. Shares are 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      calculated as means over the whole 24-hour interval. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+   "dirreq-v2-resp" status=num,... NL 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      [At most once.] 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+   "dirreq-v3-resp" status=nul,... NL 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      [At most once.] 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      List of mappings from response statuses to the number of requests 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      for v2/v3 network statuses that were answered with that response 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      status, rounded up to the nearest multiple of 4. Only response 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      statuses with at least 1 response are reported. New response 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      statuses can be added at any time. The current list of response 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      statuses is as follows: 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      "ok": a network status request is answered; this number 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+         corresponds to the sum of all requests as reported in 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+         "dirreq-v2-reqs" or "dirreq-v3-reqs", respectively, before 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+         rounding up. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      "not-enough-sigs: a version 3 network status is not signed by a 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+         sufficient number of requested authorities. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      "unavailable": a requested network status object is unavailable. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      "not-found": a requested network status is not found. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      "not-modified": a network status has not been modified since the 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+         If-Modified-Since time that is included in the request. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      "busy": the directory is busy. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+   "dirreq-v2-direct-dl" key=val,... NL 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      [At most once.] 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+   "dirreq-v3-direct-dl" key=val,... NL 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      [At most once.] 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+   "dirreq-v2-tunneled-dl" key=val,... NL 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      [At most once.] 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+   "dirreq-v3-tunneled-dl" key=val,... NL 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      [At most once.] 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      List of statistics about possible failures in the download process 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      of v2/v3 network statuses. Requests are either "direct" 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      HTTP-encoded requests over the relay's directory port, or 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      "tunneled" requests using a BEGIN_DIR cell over the relay's OR 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      port. The list of possible statistics can change, and statistics 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      can be left out from reporting. The current list of statistics is 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      as follows: 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      Successful downloads and failures: 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      "complete": a client has finished the download successfully. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      "timeout": a download did not finish within 10 minutes after 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+         starting to send the response. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      "running": a download is still running at the end of the 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+         measurement period for less than 10 minutes after starting to 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+         send the response. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      Download times: 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      "min", "max": smallest and largest measured bandwidth in B/s. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      "d[1-4,6-9]": 1st to 4th and 6th to 9th decile of measured 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+         bandwidth in B/s. For a given decile i, i/10 of all downloads 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+         had a smaller bandwidth than di, and (10-i)/10 of all downloads 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+         had a larger bandwidth than di. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      "q[1,3]": 1st and 3rd quartile of measured bandwidth in B/s. One 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+         fourth of all downloads had a smaller bandwidth than q1, one 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+         fourth of all downloads had a larger bandwidth than q3, and the 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+         remaining half of all downloads had a bandwidth between q1 and 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+         q3. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      "md": median of measured bandwidth in B/s. Half of the downloads 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+         had a smaller bandwidth than md, the other half had a larger 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+         bandwidth than md. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+Entry guard statistics: 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  Entry guard statistics include the number of clients per country and 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  per day that are connecting directly to an entry guard. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  Entry guard statistics are important to learn more about the 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  distribution of clients to countries. In the future, this knowledge 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  can be useful to detect if there are or start to be any restrictions 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  for clients connecting from specific countries. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  The information which client connects to a given entry guard is very 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  sensitive. This information must not be combined with the information 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  what contents are leaving the network at the exit nodes. Therefore, 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  entry guard statistics need to be aggregated to prevent them from 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  becoming useful for de-anonymization. Aggregation includes resolving 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  IP addresses to country codes, counting events over 24-hour intervals, 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  and rounding up numbers to the next multiple of 8. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+   "entry-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      [At most once.] 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      YYYY-MM-DD HH:MM:SS defines the end of the included measurement 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      interval of length NSEC seconds (86400 seconds by default). 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      An "entry-stats-end" line, as well as any other "entry-*" 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      line, is first added after the relay has been running for at least 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      24 hours. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+   "entry-ips" CC=N,CC=N,... NL 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      [At most once.] 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      List of mappings from two-letter country codes to the number of 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      unique IP addresses that have connected from that country to the 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      relay and which are no known other relays, rounded up to the 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      nearest multiple of 8. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+Cell statistics: 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  The third type of statistics have to do with the time that cells spend 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  in circuit queues. In order to gather these statistics, the relay 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  memorizes when it puts a given cell in a circuit queue and when this 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  cell is flushed. The relay further notes the life time of the circuit. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  These data are sufficient to determine the mean number of cells in a 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  queue over time and the mean time that cells spend in a queue. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  Cell statistics are necessary to learn more about possible reasons for 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  the poor network performance of the Tor network, especially high 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  latencies. The same statistics are also useful to determine the 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  effects of design changes by comparing today's data with future data. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  There are basically no privacy concerns from measuring cell 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  statistics, regardless of a node being an entry, middle, or exit node. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+   "cell-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      [At most once.] 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      YYYY-MM-DD HH:MM:SS defines the end of the included measurement 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      interval of length NSEC seconds (86400 seconds by default). 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      A "cell-stats-end" line, as well as any other "cell-*" line, 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      is first added after the relay has been running for at least 24 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      hours. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+   "cell-processed-cells" num,...,num NL 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      [At most once.] 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      Mean number of processed cells per circuit, subdivided into 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      deciles of circuits by the number of cells they have processed in 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      descending order from loudest to quietest circuits. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+   "cell-queued-cells" num,...,num NL 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      [At most once.] 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      Mean number of cells contained in queues by circuit decile. These 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      means are calculated by 1) determining the mean number of cells in 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      a single circuit between its creation and its termination and 2) 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      calculating the mean for all circuits in a given decile as 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      determined in "cell-processed-cells". Numbers have a precision of 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      two decimal places. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+   "cell-time-in-queue" num,...,num NL 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      [At most once.] 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      Mean time cells spend in circuit queues in milliseconds. Times are 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      calculated by 1) determining the mean time cells spend in the 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      queue of a single circuit and 2) calculating the mean for all 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      circuits in a given decile as determined in 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      "cell-processed-cells". 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+   "cell-circuits-per-decile" num NL 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      [At most once.] 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      Mean number of circuits that are included in any of the deciles, 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      rounded up to the next integer. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+Exit statistics: 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  The last type of statistics affects exit nodes counting the number of 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  bytes written and read and the number of streams opened per port and 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  per 24 hours. Exit port statistics can be measured from looking of 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  headers of BEGIN and DATA cells. A BEGIN cell contains the exit port 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  that is required for the exit node to open a new exit stream. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  Subsequent DATA cells coming from the client or being sent back to the 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  client contain a length field stating how many bytes of application 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  data are contained in the cell. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  Exit port statistics are important to measure in order to identify 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  possible load-balancing problems with respect to exit policies. Exit 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  nodes that permit more ports than others are very likely overloaded 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  with traffic for those ports plus traffic for other ports. Improving 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  load balancing in the Tor network improves the overall utilization of 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  bandwidth capacity. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  Exit traffic is one of the most sensitive parts of network data in the 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  Tor network. Even though these statistics do not require looking at 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  traffic contents, statistics are aggregated so that they are not 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  useful for de-anonymizing users. Only those ports are reported that 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  have seen at least 0.1% of exiting or incoming bytes, numbers of bytes 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  are rounded up to full kibibytes (KiB), and stream numbers are rounded 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  up to the next multiple of 4. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+   "exit-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      [At most once.] 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      YYYY-MM-DD HH:MM:SS defines the end of the included measurement 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      interval of length NSEC seconds (86400 seconds by default). 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      An "exit-stats-end" line, as well as any other "exit-*" line, is 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      first added after the relay has been running for at least 24 hours 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      and only if the relay permits exiting (where exiting to a single 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      port and IP address is sufficient). 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+   "exit-kibibytes-written" port=N,port=N,... NL 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      [At most once.] 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+   "exit-kibibytes-read" port=N,port=N,... NL 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      [At most once.] 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      List of mappings from ports to the number of kibibytes that the 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      relay has written to or read from exit connections to that port, 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      rounded up to the next full kibibyte. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+   "exit-streams-opened" port=N,port=N,... NL 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      [At most once.] 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      List of mappings from ports to the number of opened exit streams 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+      to that port, rounded up to the nearest multiple of 4. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+Implementation notes: 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  Right now, relays that are configured accordingly write similar 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  statistics to those described in this proposal to disk every 24 hours. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  With this proposal being implemented, relays include the contents of 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  these files in extra-info documents. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  The following steps are necessary to implement this proposal: 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  1. The current format of [dirreq|entry|buffer|exit]-stats files needs 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+     to be adapted to the description in this proposal. This step 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+     basically means renaming keywords. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  2. The timing of writing the four *-stats files should be unified, so 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+     that they are written exactly after 24 hours after starting the 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+     relay. Right now, the measurement intervals for dirreq, entry, and 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+     exit stats starts with the first observed request, and files are 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+     written when observing the first request that occurs more than 24 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+     hours after the beginning of the measurement interval. With this 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+     proposal, the measurement intervals should all start at the same 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+     time, and files should be written exactly 24 hours later. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  3. It is advantageous to cache statistics in local files in the data 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+     directory until they are included in extra-info documents. The 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+     reason is that the 24-hour measurement interval can be very 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+     different from the 18-hour publication interval of extra-info 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+     documents. When a relay crashed after finishing a measurement 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+     interval, but before publishing the next extra-info document, 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+     statistics would get lost. Therefore, statistics are written to 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+     disk when finishing a measurement interval and read from disk when 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+     generating an extra-info document. As a result, the *-stats files 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+     need to be overwritten after 24 hours, rather than appending new 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+     statistics to them. Further, the contents of the *-stats files need 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+     to be checked in the process of generating extra-info documents. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+ 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+  4. With the statistics patches being tested, the ./configure options 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+     should be removed and the statistics code be compiled by default. 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+     It is still required for relay operators to add configuration 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+     options (DirReqStatistics, ExitPortStatistics, etc.) to enable 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+     gathering statistics. However, in the near future, statistics shall 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+     be enabled gathered by all relays by default, where requiring a 
			 | 
		
	
		
			
				 | 
				 | 
			
			
				+     ./configure option would be a barrier for many relay operators. 
			 |