  1. Filename: 114-distributed-storage.txt
  2. Title: Distributed Storage for Tor Hidden Service Descriptors
  3. Version: $Revision$
  4. Last-Modified: $Date$
  5. Author: Karsten Loesing
  6. Created: 13-May-2007
  7. Status: Closed
  8. Implemented-In: 0.2.0.x
  9. Change history:
  10. 13-May-2007 Initial proposal
  11. 14-May-2007 Added changes suggested by Lasse Øverlier
  12. 30-May-2007 Changed descriptor format, key length discussion, typos
  13. 09-Jul-2007 Incorporated suggestions by Roger, added status of specification
  14. and implementation for upcoming GSoC mid-term evaluation
  15. 11-Aug-2007 Updated implementation statuses, included non-consecutive
  16. replication to descriptor format
  17. 20-Aug-2007 Renamed config option HSDir as HidServDirectoryV2
  18. 02-Dec-2007 Closed proposal
  19. Overview:
  20. The basic idea of this proposal is to distribute the tasks of storing and
  21. serving hidden service descriptors from currently three authoritative
  22. directory nodes among a large subset of all onion routers. The three
  23. reasons to do this are better robustness (availability), better
  24. scalability, and improved security properties. Further,
  25. this proposal suggests changes to the hidden service descriptor format to
  26. prevent new security threats coming from decentralization and to gain even
  27. better security properties.
  28. Status:
  29. As of December 2007, the new hidden service descriptor format is implemented
  30. and usable. However, servers and clients do not yet make use of descriptor
  31. cookies, because there are open usability issues of this feature that might
  32. be resolved in proposal 121. Further, hidden service directories do not
  33. perform replication by themselves, because (unauthorized) replica fetch
  34. requests would allow any attacker to fetch all hidden service descriptors in
  35. the system. As neither issue is critical to the functioning of v2
  36. descriptors and their distribution, this proposal is considered as Closed.
  37. Motivation:
  38. The current design of hidden services exhibits the following performance and
  39. security problems:
  40. First, the three hidden service authoritative directories constitute a
  41. performance bottleneck in the system. The directory nodes are responsible for
  42. storing and serving all hidden service descriptors. As of May 2007 there are
  43. about 1000 descriptors at a time, but this number is assumed to increase in
  44. the future. Further, there is no replication protocol for descriptors between
  45. the three directory nodes, so that hidden services must ensure the
  46. availability of their descriptors by manually publishing them on all
  47. directory nodes. Whenever a fourth or fifth hidden service authoritative
  48. directory is added, hidden services will need to maintain an equally
  49. increasing number of replicas. These scalability issues have an impact on the
  50. current usage of hidden services and put an even higher burden on the
  51. development of new kinds of applications for hidden services that might
  52. require storing even more descriptors.
  53. Second, besides posing a limitation to scalability, storing all hidden
  54. service descriptors on three directory nodes also constitutes a security
  55. risk. The directory node operators could easily analyze the publish and fetch
  56. requests to derive information on service activity and usage and read the
  57. descriptor contents to determine which onion routers work as introduction
  58. points for a given hidden service and need to be attacked or threatened to
  59. shut it down. Furthermore, the contents of a hidden service descriptor offer
  60. only minimal security properties to the hidden service. Whoever becomes aware of
  61. the service ID can easily find out whether the service is active at the
  62. moment and which introduction points it has. This applies to (former)
  63. clients, (former) introduction points, and of course to the directory nodes.
  64. This only requires requesting the descriptor for the given service ID, which
  65. can be performed by anyone anonymously.
  66. This proposal suggests two major changes to approach the described
  67. performance and security problems:
  68. The first change affects the storage location for hidden service descriptors.
  69. Descriptors are distributed among a large subset of all onion routers instead
  70. of three fixed directory nodes. Each storing node is responsible for a subset
  71. of descriptors for a limited time only. It is not able to choose which
  72. descriptors it stores at a certain time, because this is determined by its
  73. onion ID which is hard to change frequently and in time (only routers which
  74. are stable for a given time are accepted as storing nodes). In order to
  75. resist single node failures and untrustworthy nodes, descriptors are
  76. replicated among a certain number of storing nodes. A first replication
  77. protocol makes sure that descriptors don't get lost when the node population
  78. changes; therefore, a storing node periodically requests the descriptors from
  79. its siblings. A second replication protocol distributes descriptors among
  80. non-consecutive nodes of the ID ring to prevent a group of adversaries from
  81. generating new onion keys until they have consecutive IDs to create a 'black
  82. hole' in the ring and make random services unavailable. Connections to
  83. storing nodes are established by extending existing circuits by one hop to
  84. the storing node. This also ensures that contents are encrypted. The effect
  85. of this first change is that the probability that a single node operator
  86. learns about a certain hidden service is very small and that it is very hard
  87. to track a service over time, even when it collaborates with other node
  88. operators.
  89. The second change concerns the content of hidden service descriptors.
  90. Obviously, security problems cannot be solved only by decentralizing storage;
  91. in fact, they could also get worse if done without caution. First, a
  92. descriptor ID needs to change periodically in order to be stored on changing
  93. nodes over time. Next, the descriptor ID needs to be computable only for the
  94. service's clients, but should be unpredictable for all other nodes. Further,
  95. the storing node needs to be able to verify that the hidden service is the
  96. true originator of the descriptor with the given ID even though it is not a
  97. client. Finally, a storing node should learn as little information as
  98. necessary by storing a descriptor, because it might not be as trustworthy as
  99. a directory node; for example it does not need to know the list of
  100. introduction points. Therefore, a second key is applied that is only known to
  101. the hidden service provider and its clients and that is not included in the
  102. descriptor. It is used to calculate descriptor IDs and to encrypt the
  103. introduction points. This second key can either be given to all clients
  104. together with the hidden service ID, or to a group or a single client as
  105. an authentication token. In the future this second key could be the result of
  106. some key agreement protocol between the hidden service and one or more
  107. clients. A new text-based format is proposed for descriptors instead of an
  108. extension of the existing binary format for reasons of future extensibility.
  109. Design:
  110. The proposed design is described by the required changes to the current
  111. design. These requirements are grouped by content, rather than by affected
  112. specification documents or code files, and numbered for reference below.
  113. Hidden service clients, servers, and directories:
  114. /1/ Create routing list
  115. All participants can filter the consensus status document received from the
  116. directory authorities to one routing list containing only those servers
  117. that store and serve hidden service descriptors and which have been running for
  118. at least 24 hours. A participant only trusts its own routing list and never
  119. learns about routing information from other parties.
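
A minimal sketch of the filtering step in /1/, in Python. The RouterStatus
record and its field names are hypothetical stand-ins for a parsed consensus
entry; only the two criteria (the router stores and serves hidden service
descriptors, and it has been running for at least 24 hours) come from the
requirement above.

```python
from dataclasses import dataclass

MIN_UPTIME_SECONDS = 24 * 60 * 60  # "running for at least 24 hours"


@dataclass(frozen=True)
class RouterStatus:
    # Hypothetical view of one consensus entry; field names are assumptions.
    identity: bytes      # router ID, used as the position on the ID ring
    is_hs_dir: bool      # router stores and serves hidden service descriptors
    uptime_seconds: int  # how long the router has been running


def build_routing_list(consensus):
    """Return the qualifying routers sorted by ID, i.e. in ring order."""
    eligible = [
        r for r in consensus
        if r.is_hs_dir and r.uptime_seconds >= MIN_UPTIME_SECONDS
    ]
    return sorted(eligible, key=lambda r: r.identity)
```

Each participant would run this on its own copy of the consensus, so the
routing list never depends on information supplied by another party.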
  120. /2/ Determine responsible hidden service directory
  121. All participants can determine the hidden service directory that is
  122. responsible for storing and serving a given ID, as well as the hidden
  123. service directories that replicate its content. Every hidden service
  124. directory is responsible for the descriptor IDs in the interval from
  125. its predecessor, exclusive, to its own ID, inclusive. Further, a hidden
  126. service directory holds replicas for its n predecessors, where n denotes
  127. the number of consecutive replicas. (requires /1/)
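
The lookup in /2/ can be sketched as a search on the sorted routing list. The
function name and the representation of IDs as byte strings are assumptions
made for illustration. Because a directory holds replicas for its n
predecessors, a descriptor ends up on the responsible directory and on its n
successors in the ring.

```python
from bisect import bisect_left


def responsible_directories(ring_ids, descriptor_id, n_replicas=2):
    """Return the responsible directory ID followed by its replica holders.

    ring_ids must be sorted. A directory is responsible for the interval
    (predecessor ID, own ID], so the responsible directory is the first one
    whose ID is greater than or equal to descriptor_id, wrapping around at
    the end of the ring.
    """
    if not ring_ids:
        return []
    start = bisect_left(ring_ids, descriptor_id) % len(ring_ids)
    count = min(n_replicas + 1, len(ring_ids))
    return [ring_ids[(start + i) % len(ring_ids)] for i in range(count)]
```

With n_replicas=2 this yields three consecutive nodes, which matches the
"3 consecutive nodes" per non-consecutive replica used in /12/.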
  128. [/3/ and /4/ were requirements to use BEGIN_DIR cells for directory
  129. requests which have not been fulfilled in the course of the implementation
  130. of this proposal, but elsewhere.]
  131. Hidden service directory nodes:
  132. /5/ Advertise hidden service directory functionality
  133. Every onion router that has its directory port open can decide whether it
  134. wants to store and serve hidden service descriptors by setting a new config
  135. option "HidServDirectoryV2" 0|1 to 1. An onion router with this config
  136. option being set includes the flag "hidden-service-dir" in its router
  137. descriptors that it sends to directory authorities.
  138. /6/ Accept v2 publish requests, parse and store v2 descriptors
  139. Hidden service directory nodes accept publish requests for hidden service
  140. descriptors and store them to their local memory. (It is not necessary to
  141. make descriptors persistent, because after disconnecting, the onion router
  142. would not be accepted as storing node anyway, because it has not been
  143. running for at least 24 hours.) All requests and replies are formatted as
  144. HTTP messages. Requests are directed to the router's directory port and are
  145. contained within BEGIN_DIR cells. A hidden service directory node stores a
  146. descriptor only when it thinks that it is responsible for storing that
  147. descriptor based on its own routing table. Every hidden service directory
  148. node is responsible for the descriptor IDs in the interval of its n-th
  149. predecessor in the ID circle up to its own ID (n denotes the number of
  150. consecutive replicas). (requires /1/)
  151. /7/ Accept v2 fetch requests
  152. Same as /6/, but with fetch requests for hidden service descriptors.
  153. (requires /2/)
  154. /8/ Replicate descriptors with neighbors
  155. A hidden service directory node replicates descriptors from its two
  156. predecessors by downloading them once an hour. Further, it checks its
  157. routing table periodically for changes. Whenever it realizes that a
  158. predecessor has left the network, it establishes a connection to the new
  159. n-th predecessor and requests its stored descriptors in the interval of its
  160. (n+1)-th predecessor and the requested n-th predecessor. Whenever it
  161. realizes that a new onion router has joined with an ID higher than its
  162. former n-th predecessor, it adds it to its predecessors and discards all
  163. descriptors in the interval of its (n+1)-th and its n-th predecessor.
  164. (requires /1/)
  165. [Dec 02: This function has not been implemented, because arbitrary nodes
  166. would have been able to download the entire set of v2 descriptors. An
  167. authorized replication request would be necessary. For the moment, the
  168. system runs without any directory-side replication. -KL]
  169. Authoritative directory nodes:
  170. /9/ Confirm a router's hidden service directory functionality
  171. Directory nodes include a new flag "HSDir" for routers that decided to
  172. provide storage for hidden service descriptors and that have been running for at
  173. least 24 hours. The last requirement prevents a node from frequently
  174. changing its onion key to become responsible for an identifier it wants to
  175. target.
  176. Hidden service provider:
  177. /10/ Configure v2 hidden service
  178. Each hidden service provider that has set the config option
  179. "PublishV2HidServDescriptors" 0|1 to 1 is configured to publish v2
  180. descriptors and conform to the v2 connection establishment protocol. When
  181. configuring a hidden service, a hidden service provider checks if it has
  182. already created a random secret_cookie and a hostname2 file; if not, it
  183. creates both of them. (requires /2/)
  184. /11/ Establish introduction points with fresh key
  185. If configured to publish only v2 descriptors and no v0/v1 descriptors any
  186. more, a hidden service provider that is setting up the hidden service at
  187. introduction points does not pass its own public key, but the public key
  188. of a freshly generated key pair. It also includes these fresh public keys
  189. in the hidden service descriptor together with the other introduction point
  190. information. The reason is that the introduction point does not need to and
  191. therefore should not know for which hidden service it works, so as to
  192. prevent it from tracking the hidden service's activity. (If a hidden
  193. service provider supports both v0/v1 and v2 descriptors, v0/v1 clients
  194. rely on the fact that all introduction points accept the same public key,
  195. so that this new feature cannot be used.)
  196. /12/ Encode v2 descriptors and send v2 publish requests
  197. If configured to publish v2 descriptors, a hidden service provider
  198. publishes a new descriptor whenever its content changes or a new
  199. publication period starts for this descriptor. If the current publication
  200. period would only last for less than 60 minutes (= 2 x 30 minutes to allow
  201. the server to be 30 minutes behind and the client 30 minutes ahead), the
  202. hidden service provider publishes both a current descriptor and one for
  203. the next period. Publication is performed by sending the descriptor to all
  204. hidden service directories that are responsible for keeping replicas for
  205. the descriptor ID. This includes two non-consecutive replicas that are
  206. stored at 3 consecutive nodes each. (requires /1/ and /2/)
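
The 60-minute rule in /12/ can be sketched as follows. The function and
parameter names are hypothetical. The sketch also assumes that the
service-specific shift of the period boundary is given as a number of seconds
that is added to the current time; the text only states that the shift is
derived from h(public-key).

```python
PERIOD_SECONDS = 24 * 60 * 60  # descriptor IDs change every 24 hours
OVERLAP_SECONDS = 60 * 60      # 2 x 30 minutes of tolerated clock skew


def periods_to_publish(now, offset_seconds):
    """Return the time-period numbers to publish descriptors for at `now`.

    now:            current time in seconds since the epoch
    offset_seconds: assumed per-service shift in [0, 86400)
    """
    shifted = now + offset_seconds
    current = shifted // PERIOD_SECONDS
    remaining = PERIOD_SECONDS - (shifted % PERIOD_SECONDS)
    if remaining < OVERLAP_SECONDS:
        # The current period ends soon: also publish for the next one.
        return [current, current + 1]
    return [current]
```

Each returned period would then be published once per non-consecutive
replica, to all directories responsible for the resulting descriptor ID.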
  207. Hidden service client:
  208. /13/ Send v2 fetch requests
  209. A hidden service client that has set the config option
  210. "FetchV2HidServDescriptors" 0|1 to 1 handles SOCKS requests for v2 onion
  211. addresses by requesting a v2 descriptor from a randomly chosen hidden
  212. service directory that is responsible for keeping a replica for the
  213. descriptor ID. In total there are six replicas of which the first and the
  214. last three are stored on consecutive nodes. The probability of picking one
  215. of the three consecutive replicas is 1/6, 2/6, and 3/6 to incorporate the
  216. fact that the availability will be the highest on the node with next higher
  217. ID. A hidden service client relies on the hidden service provider to store
  218. two sets of descriptors to compensate clock skew between service and
  219. client. (requires /1/ and /2/)
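
A rough sketch of the weighted choice in /13/. Two points are assumptions and
not taken from the text: the client first picks one of the two
non-consecutive replica sets uniformly at random, and the weights 1/6, 2/6,
and 3/6 apply to the three nodes of that set in the order in which the caller
lists them. The text does not pin down which node receives which weight, so
the caller decides the order.

```python
import random


def pick_directory(replica_sets, weights=(1, 2, 3), rng=random):
    """Pick one hidden service directory to send a fetch request to.

    replica_sets: one list of directories per non-consecutive replica,
                  each holding the three consecutive nodes of that replica
    weights:      relative weights within a set (1:2:3 gives 1/6, 2/6, 3/6)
    rng:          random module or a random.Random instance
    """
    candidates = rng.choice(replica_sets)
    return rng.choices(candidates, weights=weights, k=1)[0]
```

Passing a seeded random.Random instance as rng makes the choice reproducible
for testing.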
  220. /14/ Process v2 fetch reply and parse v2 descriptors
  221. A hidden service client that has sent a request for a v2 descriptor can
  222. parse it and store it to the local cache of rendezvous service descriptors.
  223. /15/ Establish connection to v2 hidden service
  224. A hidden service client can establish a connection to a hidden service
  225. using a v2 descriptor. This includes using the secret cookie for decrypting
  226. the introduction points contained in the descriptor. When contacting an
  227. introduction point, the client does not use the public key of the hidden
  228. service provider, but the freshly-generated public key that is included in
  229. the hidden service descriptor. Whether or not a fresh key is used instead
  230. of the key of the hidden service depends on the available protocol versions
  231. that are included in the descriptor; by this, connection establishment is
  232. to a certain extent decoupled from fetching the descriptor.
  233. Hidden service descriptor:
  234. (Requirements concerning the descriptor format are contained in /6/ and /7/.)
  235. The new v2 hidden service descriptor format looks like this:
  236. onion-address = h(public-key) + cookie
  237. descriptor-id = h(h(public-key) + h(time-period + cookie + replica))
  238. descriptor-content = {
  239. descriptor-id,
  240. version,
  241. public-key,
  242. h(time-period + cookie + replica),
  243. timestamp,
  244. protocol-versions,
  245. { introduction-points } encrypted with cookie
  246. } signed with private-key
  247. The "descriptor-id" needs to change periodically in order for the
  248. descriptor to be stored on changing nodes over time. It must be
  249. computable only by a hidden service provider and all of its clients to prevent
  250. unauthorized nodes from tracking the service activity by periodically
  251. checking whether there is a descriptor for this service. Finally, the
  252. hidden service directory needs to be able to verify that the hidden service
  253. provider is the true originator of the descriptor with the given ID.
  254. Therefore, "descriptor-id" is derived from the "public-key" of the hidden
  255. service provider, the current "time-period" which changes every 24 hours,
  256. a secret "cookie" shared between hidden service provider and clients, and
  257. a "replica" denoting the number of this non-consecutive replica. (The
  258. "time-period" is constructed in a way that time periods do not change at
  259. the same moment for all descriptors by deriving a value between 0:00 and
  260. 23:59 hours from h(public-key) and making the descriptors of this hidden
  261. service provider expire at that time of the day.) The "descriptor-id" is
  262. defined to be 160 bits long. [extending the "descriptor-id" length
  263. suggested by LØ]
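
The derivation of "descriptor-id" can be sketched as below. Several details
are assumptions: SHA-1 is used for h because the text only fixes the output
length at 160 bits, "+" is read as byte concatenation, h(public-key) is
truncated to the 80 bits that appear in the onion address (so that a client
holding only the onion address can compute the ID), and the byte encodings of
"time-period" and "replica" are chosen for illustration.

```python
import hashlib
import struct


def h(data: bytes) -> bytes:
    """Assumed hash function: SHA-1, which yields 160 bits."""
    return hashlib.sha1(data).digest()


def secret_id_part(time_period: int, cookie: bytes, replica: int) -> bytes:
    # h(time-period + cookie + replica); the 4-byte big-endian period and
    # the 1-byte replica number are assumed encodings.
    return h(struct.pack(">I", time_period) + cookie + bytes([replica]))


def descriptor_id(public_key: bytes, time_period: int,
                  cookie: bytes, replica: int) -> bytes:
    # h(h(public-key) + h(time-period + cookie + replica)), using the
    # 80-bit truncated key hash (assumption, see lead-in).
    key_hash = h(public_key)[:10]
    return h(key_hash + secret_id_part(time_period, cookie, replica))
```

Changing any one of the time period, the cookie, or the replica number gives
an unrelated ID, which is what moves the descriptor to different nodes over
time and across replicas.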
  264. Only the hidden service provider and the clients are able to generate
  265. future "descriptor-id"s. Hence, the "onion-address", which is currently
  266. only the hash value of "public-key", is extended by the secret "cookie". The
  267. "public-key" hash is 80 bits long, whereas the "cookie" is 120 bits long.
  268. This makes a total of 200 bits or 40 base32 chars, which is
  269. quite a lot to handle for a human, but necessary to provide sufficient
  270. protection against an adversary generating a key pair with the same
  271. "public-key" hash or guessing the "cookie".
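
The length arithmetic can be checked with a short sketch: 80 + 120 = 200 bits
are 25 bytes, and base32 encodes 5 bits per character, giving 40 characters
without padding. SHA-1 as the hash function and the lower-case output are
assumptions for illustration.

```python
import base64
import hashlib

KEY_HASH_BYTES = 10  # 80 bits of h(public-key)
COOKIE_BYTES = 15    # 120 bits of secret cookie


def onion_address(public_key: bytes, cookie: bytes) -> str:
    """Build onion-address = h(public-key) + cookie, base32-encoded."""
    if len(cookie) != COOKIE_BYTES:
        raise ValueError("cookie must be 120 bits (15 bytes) long")
    key_hash = hashlib.sha1(public_key).digest()[:KEY_HASH_BYTES]
    return base64.b32encode(key_hash + cookie).decode("ascii").lower()
```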
  272. A hidden service directory can verify that a descriptor was created by the
  273. hidden service provider by checking if the "descriptor-id" corresponds to
  274. the "public-key" and if the signature can be verified with the
  275. "public-key".
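
A sketch of the directory-side check. The directory does not know the cookie;
it only sees the value h(time-period + cookie + replica) carried in the
descriptor. The Descriptor record, its field names, and the verify_signature
callback are hypothetical, and SHA-1 with an 80-bit truncated key hash is an
assumption as well.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Callable


@dataclass
class Descriptor:
    # Hypothetical parsed form of a v2 descriptor.
    descriptor_id: bytes
    public_key: bytes
    secret_id_part: bytes  # h(time-period + cookie + replica), as published
    signed_body: bytes     # the bytes covered by the signature
    signature: bytes


def directory_accepts(
    desc: Descriptor,
    verify_signature: Callable[[bytes, bytes, bytes], bool],
) -> bool:
    """Check that the ID matches the public key and the signature is valid."""
    key_hash = hashlib.sha1(desc.public_key).digest()[:10]
    expected_id = hashlib.sha1(key_hash + desc.secret_id_part).digest()
    if not hmac.compare_digest(expected_id, desc.descriptor_id):
        return False
    # verify_signature(public_key, message, signature) is supplied by the
    # caller, since signature verification is outside this sketch.
    return verify_signature(desc.public_key, desc.signed_body, desc.signature)
```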
  276. The "introduction-points" that are included in the descriptor are encrypted
  277. using the same "cookie" that is shared between hidden service provider and
  278. clients. [correction to use another key than h(time-period + cookie) as
  279. encryption key for introduction points made by LØ]
  280. A new text-based format is proposed for descriptors instead of an extension
  281. of the existing binary format for reasons of future extensibility.
  282. Security implications:
  283. The security implications of the proposed changes are grouped by the roles of
  284. nodes that could perform attacks or on which attacks could be performed.
  285. Attacks by authoritative directory nodes
  286. Authoritative directory nodes are no longer the single places in the
  287. network that know about a hidden service's activity and introduction
  288. points. Thus, they cannot perform attacks using this information, e.g.
  289. track a hidden service's activity or usage pattern or attack its
  290. introduction points. Formerly, it would only require a single corrupted
  291. authoritative directory operator to perform such an attack.
  292. Attacks by hidden service directory nodes
  293. A hidden service directory node could misuse a stored descriptor to track a
  294. hidden service's activity and usage pattern by clients. Though there is no
  295. countermeasure against this kind of attack, it is very expensive to track a
  296. certain hidden service over time. An attacker would need to run a large
  297. number of stable onion routers that work as hidden service directory nodes
  298. to have a good probability to become responsible for its changing
  299. descriptor IDs. For each period, the probability is:
  300. 1-(N-c choose r)/(N choose r) for N-c>=r and 1 otherwise,
  301. with N as the total number of hidden service directories,
  302. c as the number of compromised nodes, and
  303. r as the number of replicas
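
The expression above can be evaluated directly. The function name is
hypothetical, and the sketch simply restates the formula; it does not model
how replicas are actually placed on the ring.

```python
from math import comb


def p_tracking(N: int, c: int, r: int) -> float:
    """Probability that at least one of r replicas is on a compromised node.

    N: total number of hidden service directories
    c: number of compromised nodes
    r: number of replicas
    """
    if N - c < r:
        return 1.0
    return 1 - comb(N - c, r) / comb(N, r)
```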
  304. The hidden service directory nodes could try to make a certain hidden
  305. service unavailable to its clients. To do so, they could discard all
  306. stored descriptors for that hidden service and reply to clients that there
  307. is no descriptor for the given ID or return an old or false descriptor
  308. content. The client would detect a false descriptor, because it could not
  309. contain a correct signature. But an old content or an empty reply could
  310. confuse the client. Therefore, the countermeasure is to replicate
  311. descriptors among a small number of hidden service directories, e.g. 5.
  312. The probability of a group of collaborating nodes to make a hidden service
  313. completely unavailable is in each period:
  314. (c choose r)/(N choose r) for c>=r and N>=r, and 0 otherwise,
  315. with N as the total number of hidden service directories,
  316. c as the number of compromised nodes, and
  317. r as the number of replicas
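
The same can be done for the unavailability expression. Again the function
name is hypothetical and the code only restates the formula.

```python
from math import comb


def p_blackout(N: int, c: int, r: int) -> float:
    """Probability that all r replicas are on compromised nodes.

    N: total number of hidden service directories
    c: number of compromised nodes
    r: number of replicas
    """
    if c < r or N < r:
        return 0.0
    return comb(c, r) / comb(N, r)
```

Because every replica has to land on a compromised node, this value falls
quickly as r grows, which is the argument for replication made above.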
  318. A hidden service directory could try to find out which introduction points
  319. are working on behalf of a hidden service. In contrast to the previous
  320. design, this is not possible anymore, because this information is encrypted
  321. to the clients of a hidden service.
  322. Attacks on hidden service directory nodes
  323. An anonymous attacker could try to swamp a hidden service directory with
  324. false descriptors for a given descriptor ID. This is prevented by requiring
  325. that descriptors are signed.
  326. Anonymous attackers could swamp a hidden service directory with correct
  327. descriptors for non-existing hidden services. There is no countermeasure
  328. against this attack. However, the creation of valid descriptors is more
  329. expensive than verification and storage in local memory. This should make
  330. this kind of attack unattractive.
  331. Attacks by introduction points
  332. Current or former introduction points could try to gain information on the
  333. hidden service they serve. But due to the fresh key pair that is used by
  334. the hidden service, this attack is not possible anymore.
  335. Attacks by clients
  336. Current or former clients could track a hidden service's activity, attack
  337. its introduction points, or determine the responsible hidden service
  338. directory nodes and attack them. There is nothing that could prevent them
  339. from doing so, because honest clients need the full descriptor content to
  340. establish a connection to the hidden service. At the moment, the only
  341. countermeasure against dishonest clients is to change the secret cookie and
  342. pass it only to the honest clients.
  343. Compatibility:
  344. The proposed design is meant to replace the current design for hidden service
  345. descriptors and their storage in the long run.
  346. There should be a first transition phase in which both the current design
  347. and the proposed design are served in parallel. Onion routers should start
  348. serving as hidden service directories, and hidden service providers and
  349. clients should make use of the new design if both sides support it. Hidden
  350. service providers should be allowed to publish descriptors of the current
  351. format in parallel, and authoritative directories should continue storing and
  352. serving these descriptors.
  353. After the first transition phase, hidden service providers should stop
  354. publishing descriptors on authoritative directories, and hidden service
  355. clients should not try to fetch descriptors from the authoritative
  356. directories. However, the authoritative directories should continue serving
  357. hidden service descriptors for a second transition phase. As of this point,
  358. all v2 config options should be set to a default value of 1.
  359. After the second transition phase, the authoritative directories should stop
  360. serving hidden service descriptors.