126-geoip-reporting.txt 5.6 KB

Filename: 126-geoip-fetching.txt
Title: Fetching GeoIP databases for clients, relays, and bridges
Version: $Revision: 11988 $
Last-Modified: $Date: 2007-10-16 12:59:42 -0400 (Tue, 16 Oct 2007) $
Author: Roger Dingledine
Created: 2007-11-24
Status: Open

1. Background and motivation

  Right now we can keep a rough count of Tor users, both total and by
  country, by watching connections to a single directory mirror. Being
  able to get usage estimates is useful both for our funders (to
  demonstrate progress) and for our own development (so we know how
  quickly we're scaling and can design accordingly, and so we know which
  countries and communities to focus on more). This need for information
  is the only reason we haven't deployed "directory guards" (think of
  them like entry guards but for directory information; in practice,
  it would seem that Tor clients should simply use their entry guards
  as their directory guards).

  With the move toward bridges, we will no longer be able to track Tor
  clients that use bridges, since they use their bridges as directory
  guards. Further, we need to be able to learn which bridges stop seeing
  use from certain countries (and are thus likely blocked), so we can
  avoid giving them out to other users in those countries.

  Right now we support GeoIP lookups through Vidalia: Vidalia draws relays
  and circuits on its 'network map', and it performs anonymized GeoIP
  lookups to its central servers to know where to put the dots. Vidalia
  caches answers it gets -- to reduce delay, to reduce overhead on
  the network, and to reduce anonymity issues where users reveal their
  behavior through which IP addresses they ask about.

  But with the advent of bridges, Tor clients are asking about IP
  addresses that aren't in the main directory. In particular, bridge
  users tell the central Vidalia servers about each bridge as they
  discover it and their Vidalia tries to map it.

  Also, we wouldn't mind letting Vidalia do a GeoIP lookup on the client's
  own IP address, so it can provide a more useful map.

  Also, Vidalia's central servers leave users open to partitioning
  attacks, even if they can't target specific users. Further, as we
  start using GeoIP results for more operational or security-relevant
  goals, such as avoiding or including particular countries in circuits,
  it becomes more important that users can't be singled out in terms of
  their IP-to-country mapping beliefs.

  This proposal describes a way for Tor relays, bridges, and clients to
  download a local copy of a GeoIP database, so they can do local private
  queries. Thus we can avoid sending detailed queries to central servers.

2. Publishing and caching the GeoIP database

  We assume that we use a free GeoIP db, like ip2country. We will need
  to standardize on its format; see Section 5.

  Each v3 directory authority should put a copy of the "geoip" file in
  its DataDirectory. Then its votes should include a hash of this file,
  and the resulting consensus directory should specify the consensus hash.

  There should be a new URL for fetching this geoip db (by "current.z"
  for testing purposes, and by hash.z for typical downloads). Authorities
  should fetch and serve the one listed in the consensus, even when they
  vote for their own. This would argue for storing the cached version
  in a better filename than "geoip".

  Directory mirrors should keep a copy of this file available via the
  same URLs.
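
  The vote-then-consensus step above could be sketched as follows. This
  is only an illustration: the proposal does not fix a hash function, a
  rule for picking the consensus hash, or a URL layout, so the use of
  SHA-1, the strict-majority rule, and the "/tor/geoip/" path prefix are
  all assumptions, and the function names are hypothetical.

```python
import hashlib
from collections import Counter


def geoip_digest(data: bytes) -> str:
    """Hex digest of a geoip file's contents.

    SHA-1 is an assumption; the proposal only says "a hash of this file".
    """
    return hashlib.sha1(data).hexdigest()


def consensus_digest(vote_digests):
    """Pick the digest listed by a strict majority of votes, else None.

    The majority rule is an assumption; the proposal does not say how
    the consensus hash is chosen.
    """
    if not vote_digests:
        return None
    digest, count = Counter(vote_digests).most_common(1)[0]
    return digest if count > len(vote_digests) / 2 else None


def geoip_url_path(digest: str) -> str:
    """Path for fetching the file by hash (hypothetical layout)."""
    return f"/tor/geoip/{digest}.z"


# Three authorities vote; two of them hold the newer file.
old = geoip_digest(b"old geoip contents")
new = geoip_digest(b"new geoip contents")
winner = consensus_digest([new, old, new])
print(geoip_url_path(winner))
```

  With a rule like this, an authority whose own file lost the vote would
  fetch and serve the winning file, as the text above requires.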

  We assume that the file would change at most a few times a month. Should
  Tor ship with a bootstrap geoip file?

3. Clients use it for Vidalia

  Tor fetches the geoip file as above, and puts it in Tor's DataDirectory.
  Then we could have a status event that tells controllers that a new
  geoip file has arrived.

  Then Vidalia would either read the file directly, or we would add
  a control protocol interface for querying. Since Tor probably needs
  to parse the file itself (see Section 4 below), offering the control
  interface is probably cleanest.

  There should be a config option to disable updating the geoip file,
  in case users want to use their own file (e.g. they have a proprietary
  GeoIP file they prefer to use). In that case we leave it up to the
  user to update his geoip file out-of-band.

4. Bridges use it for usage summaries

  Once bridges have a GeoIP database locally, they can start to publish
  sanitized summaries of client usage -- how many users they see and from
  what countries. This might also be a more useful way for ordinary Tor
  relays to convey the level of usage they see.

  But how to safely summarize this information without opening too many
  anonymity leaks seems hard, so I'm going to leave it for a different
  proposal.

5. Which db to use?

  A recent ip-to-country.csv is 3421362 bytes. Compressed, it is 564252
  bytes. This isn't so bad. But we can easily cut it down further; some
  sample lines are:

    "205500992","208605279","US","USA","UNITED STATES"
    "208605280","208605311","CA","CAN","CANADA"
    "208605312","210784255","US","USA","UNITED STATES"

  My guess is the compression will solve most of the redundancy, so we
  can stick with the default format.
  http://ip-to-country.webhosting.info/node/view/5
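
  For illustration, here is a minimal sketch of a local lookup against
  lines in this format. It assumes the first two columns are the
  inclusive start and end of an IPv4 range written as decimal integers
  and the third is the two-letter country code, which is how the sample
  lines above read. The class and method names are hypothetical.

```python
import bisect
import csv
import io
import ipaddress

# The three sample lines quoted above.
SAMPLE = '''"205500992","208605279","US","USA","UNITED STATES"
"208605280","208605311","CA","CAN","CANADA"
"208605312","210784255","US","USA","UNITED STATES"
'''


class GeoIPTable:
    """Sorted, non-overlapping IPv4 ranges mapped to country codes."""

    def __init__(self, text):
        rows = []
        for row in csv.reader(io.StringIO(text)):
            if len(row) < 3:
                continue  # skip blank or malformed lines
            rows.append((int(row[0]), int(row[1]), row[2]))
        rows.sort()
        self.starts = [r[0] for r in rows]
        self.rows = rows

    def lookup(self, address):
        """Return the country code for a dotted-quad address, or None."""
        ip = int(ipaddress.IPv4Address(address))
        # Find the last range whose start is <= ip, then check its end.
        i = bisect.bisect_right(self.starts, ip) - 1
        if i >= 0 and self.rows[i][0] <= ip <= self.rows[i][1]:
            return self.rows[i][2]
        return None


table = GeoIPTable(SAMPLE)
print(table.lookup("12.111.16.100"))
```

  Only the first three columns are used here, which is the redundancy
  the paragraph above refers to: the three-letter code and the full
  country name repeat information the two-letter code already carries.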

  The MaxMind GeoLite Country database is also about 500KB compressed.
  http://www.maxmind.com/app/geolitecountry

  The MaxMind GeoLite City database gives more fine-grained detail, such
  as geo coordinates and city name. Vidalia currently makes use of this
  information. On the other hand it's 16MB compressed, which would seem
  to be out of our reach.
  http://www.maxmind.com/app/geolitecity

  What other options are there?