Filename: 114-distributed-storage.txt
Title: Distributed Storage for Tor Hidden Service Descriptors
Version: $Revision$
Last-Modified: $Date$
Author: Karsten Loesing
Created: 13-May-2007
Status: Open

Change history:

  13-May-2007  Initial proposal
  14-May-2007  Added changes suggested by Lasse Overlier

Overview:

The basic idea of this proposal is to distribute the tasks of storing and
serving hidden service descriptors, currently handled by three authoritative
directory nodes, among a large subset of all onion routers. The two reasons
for doing this are better scalability and improved security properties.
Further, this proposal suggests changes to the hidden service descriptor
format that prevent new security threats arising from decentralization and
that yield even better security properties.

Motivation:

The current design of hidden services exhibits the following performance and
security problems:

First, the three hidden service authoritative directories constitute a
performance bottleneck in the system. The directory nodes are responsible
for storing and serving all hidden service descriptors. At the moment there
are about 1000 descriptors at a time, but this number is expected to
increase in the future. Further, there is no replication protocol for
descriptors between the three directory nodes, so hidden services must
ensure the availability of their descriptors by manually publishing them on
all directory nodes. If a fourth or fifth hidden service authoritative
directory were added, hidden services would need to maintain a
correspondingly larger number of replicas. These scalability issues affect
the current usage of hidden services and put an even higher burden on the
development of new kinds of hidden service applications that might require
storing even larger numbers of descriptors.

Second, besides limiting scalability, storing all hidden service descriptors
on three directory nodes also constitutes a security risk. The directory
node operators could easily analyze the publish and fetch requests to derive
information on service activity and usage. They could also read the
descriptor contents to determine which onion routers work as introduction
points for a given hidden service, and would therefore need to be attacked
or threatened to shut it down. Furthermore, the contents of a hidden service
descriptor offer only minimal security properties to the hidden service.
Anyone who learns the service ID can easily find out whether the service is
currently active and which introduction points it has. This applies to
(former) clients, (former) introduction points, and of course to the
directory nodes. It only requires requesting the descriptor for the given
service ID, which anyone can do anonymously.

This proposal suggests two major changes to address the described
performance and security problems:

The first change affects the storage location for hidden service
descriptors. Descriptors are distributed among a large subset of all onion
routers instead of three fixed directory nodes. Each storing node is
responsible for a subset of descriptors for a limited time only. It cannot
choose which descriptors it stores at a given time, because this is
determined by its onion ID, which is hard to change frequently and quickly
(only routers that have been stable for a given time are accepted as
storing nodes). To resist single node failures and untrustworthy nodes,
descriptors are replicated among a certain number of storing nodes. A simple
replication protocol ensures that descriptors do not get lost when the node
population changes: a storing node periodically requests the descriptors
from its siblings. Connections to storing nodes are established by
extending existing circuits by one hop to the storing node, which also
ensures that contents are encrypted. The effect of this first change is
that the probability of a single node operator learning about a certain
hidden service is very small. It is also very hard to track a service over
time, even for an operator who collaborates with other node operators.

The second change concerns the content of hidden service descriptors.
Obviously, security problems cannot be solved only by decentralizing
storage; in fact, they could even get worse if this is done without
caution. First, a descriptor ID needs to change periodically so that the
descriptor is stored on different nodes over time. Next, the descriptor ID
needs to be computable only by the service's clients and unpredictable for
all other nodes. Further, the storing node needs to be able to verify that
the hidden service is the true originator of the descriptor with the given
ID, even though it is not a client. Finally, a storing node should learn as
little information as necessary by storing a descriptor, because it might
not be as trustworthy as a directory node; for example, it does not need to
know the list of introduction points. Therefore, a second key is used that
is known only to the hidden service provider and its clients and that is not
included in the descriptor. It is used to calculate descriptor IDs and to
encrypt the introduction points. This second key can either be given to all
clients together with the hidden service ID, or to a group of clients or a
single client as an authentication token. In the future, this second key
could be the result of a key agreement protocol between the hidden service
and one or more clients. A new text-based format is proposed for
descriptors instead of an extension of the existing binary format, for
reasons of future extensibility.

Design:

The proposed design is described by the changes that are necessary to the
current design. Changes are grouped by content rather than by affected
specification document.

All nodes:

All nodes can combine the network lists received from all directory nodes
into one routing list. This list contains only those nodes that store and
serve hidden service descriptors and that appear in a majority of the
network lists. A node only trusts its own routing list and never learns
routing information from other nodes. This list should only be created on
demand by the nodes involved in the new hidden service protocol, i.e. hidden
service directory nodes, hidden service providers, and hidden service
clients.

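The majority rule can be sketched as follows. This is a minimal illustration, not the proposal's implementation. It assumes that each network list has already been reduced to the IDs of routers flagged as hidden service directories, and it reads "majority" as a strict majority. The function name build_routing_list is hypothetical.

```python
from collections import Counter


def build_routing_list(network_lists):
    """Combine per-authority network lists into one routing list.

    network_lists: one list per directory authority, each holding the IDs of
    routers that authority lists as hidden service directories (assumed input
    shape). Returns the sorted IDs contained in a strict majority of the lists.
    """
    counts = Counter()
    for entries in network_lists:
        counts.update(set(entries))  # count each router at most once per list
    majority = len(network_lists) // 2 + 1
    return sorted(rid for rid, seen in counts.items() if seen >= majority)


if __name__ == "__main__":
    lists = [["A", "B", "C"], ["A", "C", "D"], ["A", "B", "C"]]
    print(build_routing_list(lists))  # ['A', 'B', 'C']
```

Router "D" is dropped because only one of three authorities lists it. This matches the intent that a single authority cannot inject nodes into the routing list on its own.
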
All nodes involved in the new hidden service protocol calculate the clock
skew between their local time and the times of the directory authorities.
If the clock skew exceeds 1 minute (as opposed to 30 minutes in the current
implementation), the user is warned when performing the first operation
related to hidden services. However, the local time is not adjusted
automatically, to prevent attacks based on false times from directory
authorities.

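A small sketch of the skew check follows. The proposal does not say how the times of several authorities are combined, so this sketch assumes the median of the per-authority differences and Unix timestamps in seconds. The function name is hypothetical. The function only produces a warning and never touches the local clock.

```python
import statistics

MAX_SKEW_SECONDS = 60  # 1 minute, as proposed above


def clock_skew_warning(local_time, authority_times):
    """Return a warning message if the clock skew is too large, else None.

    Combining authorities via the median is an assumption of this sketch.
    """
    skew = statistics.median([local_time - t for t in authority_times])
    if abs(skew) > MAX_SKEW_SECONDS:
        return f"local clock differs from directory authorities by {skew:+.0f} s"
    return None
```
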
Hidden service directory nodes:

Every onion router can decide whether it wants to store and serve hidden
service descriptors by setting a new config option, HiddenServiceDirectory
0|1, to 1. This option should default to 1 for onion routers that have
their directory port open, because the smaller the group of storing nodes,
the weaker the security properties.

HS directory nodes announce that they store and serve hidden service
descriptors in the router descriptors that they send to the directory
authorities.

HS directory nodes accept publish and fetch requests for hidden service
descriptors and store/retrieve them to/from their local memory. (It is not
necessary to make descriptors persistent: after disconnecting, the onion
router would not be accepted as a storing node anyway, because it would no
longer be stable.) All requests and replies are formatted as HTTP messages.
Requests are directed to the router's directory port and are contained
within BEGIN_DIR cells. An HS directory node stores a descriptor only when
it considers itself responsible for that descriptor based on its own
routing table. Every HS directory node is responsible for the descriptor
IDs in the interval from its n-th predecessor on the ID circle up to its
own ID (n denotes the number of replicas).

An HS directory node replicates descriptors for which it is responsible by
downloading them from other HS directory nodes. To this end, it checks its
routing table for changes every 10 minutes. Whenever it notices that a
predecessor has left the network, it establishes a connection to the new
n-th predecessor. It then requests the descriptors stored there in the
interval between its (n+1)-th predecessor and the requested n-th
predecessor. Whenever it notices that a new onion router has joined with an
ID higher than its former n-th predecessor, it adds the new router to its
predecessors and discards all descriptors in the interval between its
(n+1)-th and its n-th predecessor.

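The ring arithmetic from the two paragraphs above can be sketched as follows. This sketch assumes that descriptor and router IDs are plain integers on a circle and that the routing list is just a list of router IDs. All function names are hypothetical.

The replication part is deliberately simplified. Instead of reacting to individual join and leave events, it recomputes the node's interval and reports two things: which stored descriptors fall outside the interval, and which newly acquired interval must be fetched. It does not decide which directory to fetch from.

```python
def in_interval(lo, hi, x):
    """True if x lies in the circular half-open interval (lo, hi]."""
    if lo < hi:
        return lo < x <= hi
    # The interval wraps past the top of the ID space (lo == hi: whole circle).
    return x > lo or x <= hi


def nth_predecessor(ring, node, n):
    """ID of the n-th predecessor of node on the sorted ID circle."""
    ids = sorted(ring)
    n = min(n, len(ids))  # with at most n nodes, everyone stores everything
    return ids[(ids.index(node) - n) % len(ids)]


def is_responsible(ring, node, desc_id, n):
    """A node stores desc_id if it falls in (n-th predecessor, node]."""
    return in_interval(nth_predecessor(ring, node, n), node, desc_id)


def responsible_nodes(ring, desc_id, n):
    """The replica holders: the first n node IDs at or after desc_id."""
    ids = sorted(ring)
    start = next((i for i, x in enumerate(ids) if x >= desc_id), 0)
    return [ids[(start + k) % len(ids)] for k in range(min(n, len(ids)))]


def replication_check(old_ring, new_ring, node, stored, n):
    """Simplified periodic check, e.g. run every 10 minutes.

    Returns (keep, discard, gained): descriptors still in the node's
    interval, descriptors to drop, and the newly acquired interval
    (new_lo, old_lo] to fetch from other directories, or None.
    """
    old_lo = nth_predecessor(old_ring, node, n)
    new_lo = nth_predecessor(new_ring, node, n)
    keep = {d for d in stored if in_interval(new_lo, node, d)}
    discard = set(stored) - keep
    gained = None
    # An interval that already covered the whole circle cannot grow.
    if old_lo != node and new_lo != old_lo and in_interval(new_lo, node, old_lo):
        gained = (new_lo, old_lo)
    return keep, discard, gained
```

For example, with routers [10, 20, 30, 40, 50] and n = 2, descriptor 35 is held by nodes 40 and 50. If router 40 leaves, node 50 must fetch (20, 30]. If a router 45 joins instead, node 50 discards descriptors in (30, 40].
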
Authoritative directory nodes:

Directory nodes assign a new flag to routers that have decided to provide
storage for hidden service descriptors and that have been stable for a
given time. The stability requirement prevents a node from frequently
changing its onion key to become responsible for a freely chosen
identifier.

Hidden service provider:

When setting up the hidden service at its introduction points, a hidden
service provider does not pass its own public key, but the public key of a
freshly generated key pair. It also includes this public key in the hidden
service descriptor together with the other introduction point information.
The reason is that the introduction point does not need to know which
hidden service it works for. It also should not know this, so that it
cannot track the hidden service's activity.

A hidden service provider publishes a new descriptor whenever its content
changes or a new publication period starts for this descriptor. If the
current publication period has less than 60 minutes left, the hidden
service provider publishes both a descriptor for the current period and one
for the next period. Publication is performed by sending the descriptor to
all hidden service directories that are responsible for keeping replicas
of the descriptor ID.

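The "publish ahead" rule can be sketched as follows. The proposal only requires that the time-period include the permanent-id, so that not all descriptors change period at the same moment. Several details here are assumptions, and all names are hypothetical:

- the one-day period length,
- the offset derived from the first byte of the permanent-id,
- the use of Unix timestamps.

```python
PERIOD_SECONDS = 24 * 60 * 60    # assumed period length: one day
PUBLISH_AHEAD_SECONDS = 60 * 60  # from the text: less than 60 minutes left


def period_offset(permanent_id: bytes) -> int:
    """Hypothetical per-service offset derived from the first permanent-id byte."""
    return permanent_id[0] * PERIOD_SECONDS // 256


def time_period(now: int, permanent_id: bytes) -> int:
    """Number of the publication period that contains Unix time `now`."""
    return (now + period_offset(permanent_id)) // PERIOD_SECONDS


def periods_to_publish(now: int, permanent_id: bytes) -> list:
    """Periods for which the provider should have a descriptor published."""
    shifted = now + period_offset(permanent_id)
    current = shifted // PERIOD_SECONDS
    seconds_left = PERIOD_SECONDS - shifted % PERIOD_SECONDS
    if seconds_left < PUBLISH_AHEAD_SECONDS:
        return [current, current + 1]
    return [current]
```

Two services whose permanent-ids start with different bytes switch periods at different times of day. This avoids a network-wide burst of republications at one instant.
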
Hidden service client:

Instead of downloading descriptors from a hidden service authoritative
directory, a hidden service client downloads a descriptor from a randomly
chosen hidden service directory that is responsible for keeping a replica
of the descriptor ID.

When contacting an introduction point, the client does not use the public
key of the hidden service provider, but the freshly generated public key
that is included in the hidden service descriptor.

Hidden service descriptor:

The descriptor ID needs to change periodically so that the descriptor is
stored on different nodes over time. Furthermore, it may only be computable
by the hidden service provider and its clients. This prevents unauthorized
nodes from tracking the service's activity by periodically checking whether
there is a descriptor for this service. Finally, the hidden service
directory needs to be able to verify that the hidden service provider is
the true originator of the descriptor with the given ID. Therefore, the ID
is derived from the public key of the hidden service provider, the current
time period, and a secret shared between the hidden service provider and
its clients. Only the hidden service provider and its clients are able to
generate future IDs. Together with the descriptor content, however, the
hidden service directory is able to verify a descriptor's origin. The
formula for calculating a descriptor ID is as follows:

    descriptor-id = h(permanent-id + h(time-period + cookie))

Here "permanent-id" is the hash of the public key of the hidden service
provider, "time-period" is a periodically changing value, e.g. the current
date, and "cookie" is a secret shared between the hidden service provider
and its clients. (The "time-period" should be constructed so that periods
do not change at the same moment for all descriptors, by including the
"permanent-id" in its construction.) Among other things, the descriptor
contains the public key of the hidden service provider, the value of
h(time-period + cookie), and the signature of the descriptor content made
with the private key of the hidden service provider.

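A minimal sketch of the descriptor-id formula follows. It rests on several assumptions, and the helper names are hypothetical:

- "+" denotes byte concatenation.
- h is SHA-1 (the Implementation section discusses SHA-256 instead).
- The time-period is encoded as 4 big-endian bytes.

```python
import hashlib


def h(data: bytes) -> bytes:
    """The hash function h; SHA-1 (160 bits) is assumed here."""
    return hashlib.sha1(data).digest()


def encode_period(time_period: int) -> bytes:
    """Hypothetical encoding of the time-period as 4 big-endian bytes."""
    return time_period.to_bytes(4, "big")


def time_cookie_hash(time_period: int, cookie: bytes) -> bytes:
    """h(time-period + cookie); this value is also stored in the descriptor."""
    return h(encode_period(time_period) + cookie)


def descriptor_id(permanent_public_key: bytes, time_period: int, cookie: bytes) -> bytes:
    """descriptor-id = h(permanent-id + h(time-period + cookie))."""
    permanent_id = h(permanent_public_key)
    return h(permanent_id + time_cookie_hash(time_period, cookie))
```

Without the cookie, a third party cannot compute h(time-period + cookie) for a future period. It therefore cannot predict where the service's next descriptor will be stored.
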
The introduction points included in the descriptor are encrypted using a
key that is derived from the same shared secret that is used to generate
the descriptor ID. [usage of a derived key as encryption key instead of the
shared key itself suggested by LO]

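The descriptor format defines this derived key as h(time-period + cookie + 'introduction'). A sketch of that derivation follows. The hash (SHA-1), the 4-byte period encoding, and the truncation to 16 bytes (e.g. for a 128-bit cipher key) are assumptions. The cipher itself is left open, as in the proposal.

```python
import hashlib


def intro_points_key(time_period: int, cookie: bytes, key_len: int = 16) -> bytes:
    """Derive the introduction-point encryption key.

    Follows h(time-period + cookie + 'introduction'); SHA-1, the period
    encoding and the truncation length are assumptions of this sketch.
    """
    material = time_period.to_bytes(4, "big") + cookie + b"introduction"
    return hashlib.sha1(material).digest()[:key_len]
```

The fixed label makes the encryption key differ from h(time-period + cookie), which is published in the clear inside the descriptor. Knowing the published value therefore does not reveal the key.
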
A new text-based format is proposed for descriptors instead of an extension
of the existing binary format, for reasons of future extensibility.

The complete hidden service descriptor format looks like this:

    {
      descriptor-id = h(permanent-id + h(time-period + cookie))
      permanent-public-key (with permanent-id = h(permanent-public-key))
      h(time-period + cookie)
      timestamp
      {
        list of (introduction point IP, port, public service key)
      } encrypted with h(time-period + cookie + 'introduction')
    } signed with permanent-private-key

A hidden service directory can verify that a descriptor was created by the
hidden service provider. It checks that the descriptor-id corresponds to
the permanent-public-key and the included h(time-period + cookie) value,
and that the signature can be verified with the permanent-public-key.

A client can download the descriptor by computing the same descriptor-id,
and verify its origin by performing the same checks as the hidden service
directory.

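The directory's descriptor-id check can be sketched as follows. The sketch assumes SHA-1 for h, byte concatenation for "+", and a hypothetical parsed-descriptor dictionary. It deliberately omits the signature check with the permanent-public-key, which a real directory must also perform.

```python
import hashlib


def h(data: bytes) -> bytes:
    """The hash function h; SHA-1 is assumed here."""
    return hashlib.sha1(data).digest()


def descriptor_id_matches(desc: dict) -> bool:
    """Check descriptor-id == h(h(permanent-public-key) + h(time-period + cookie)).

    `desc` is a hypothetical parsed descriptor with bytes values under the keys
    'descriptor-id', 'permanent-public-key' and 'time-cookie-hash'. The
    directory never needs the cookie itself: the descriptor carries its hash.
    Signature verification is NOT performed here.
    """
    permanent_id = h(desc["permanent-public-key"])
    return h(permanent_id + desc["time-cookie-hash"]) == desc["descriptor-id"]
```

A client can run the same check. Because it knows the permanent-id and the cookie, it can additionally compare h(permanent-public-key) and the included h(time-period + cookie) against its own values.
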
Security implications:

The security implications of the proposed changes are grouped by the roles
of nodes that could perform attacks or on which attacks could be performed.

Attacks by authoritative directory nodes

Authoritative directory nodes are no longer the only places in the network
that know about a hidden service's activity and introduction points. Thus,
they cannot use this information to perform attacks, e.g. to track a
hidden service's activity or usage pattern or to attack its introduction
points. Formerly, a single corrupted authoritative directory operator would
have sufficed to perform such an attack.

Attacks by hidden service directory nodes

A hidden service directory node could misuse a stored descriptor to track
a hidden service's activity and its usage pattern by clients. Although
there is no countermeasure against this kind of attack, tracking a certain
hidden service over time is very expensive. An attacker would need to run a
large number of stable onion routers acting as hidden service directory
nodes to have a good chance of becoming responsible for the service's
changing descriptor IDs. For each period, the probability that at least
one of the r replicas is stored on a compromised node is:

    1 - (N-c choose r) / (N choose r)   for N-c >= r, and 1 otherwise

where N is the total number of hidden service directories, c the number of
compromised nodes, and r the number of replicas.

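The tracking probability can be evaluated directly. The sketch below uses Python's math.comb, which returns 0 when N - c < r, so the "1 otherwise" case falls out of the same expression. The function name and the example numbers are illustrative only.

```python
from math import comb


def p_track(N: int, c: int, r: int) -> float:
    """Per-period probability that at least one of the r replicas of a
    descriptor lands on one of c compromised directories out of N.

    comb(N - c, r) is 0 when N - c < r, which yields the "1 otherwise" case.
    """
    if not 0 <= c <= N or r > N:
        raise ValueError("need 0 <= c <= N and r <= N")
    return 1 - comb(N - c, r) / comb(N, r)


if __name__ == "__main__":
    print(round(p_track(100, 10, 5), 3))  # 0.416
```
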
The hidden service directory nodes could try to make a certain hidden
service unavailable to its clients. To do so, they could discard all stored
descriptors for that hidden service and reply to clients that there is no
descriptor for the given ID, or return old or false descriptor content. The
client would detect a false descriptor, because it could not contain a
valid signature. But old content or an empty reply could mislead the
client. The countermeasure is therefore to replicate descriptors among a
small number of hidden service directories, e.g. 5. The following is the
probability that a group of collaborating nodes makes a hidden service
completely unavailable in a given period, i.e. that all r replicas are
stored on compromised nodes:

    (c choose r) / (N choose r)   for c >= r and N >= r, and 0 otherwise

where N is the total number of hidden service directories, c the number of
compromised nodes, and r the number of replicas.

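This probability can be computed the same way. Here math.comb(c, r) is 0 when c < r, which gives the "0 otherwise" case. The function name and the example numbers are illustrative only.

```python
from math import comb


def p_unavailable(N: int, c: int, r: int) -> float:
    """Per-period probability that all r replicas are held by the c
    compromised directories out of N. comb(c, r) is 0 when c < r.
    """
    if not 0 <= c <= N or r > N:
        raise ValueError("need 0 <= c <= N and r <= N")
    return comb(c, r) / comb(N, r)


if __name__ == "__main__":
    print(f"{p_unavailable(100, 10, 5):.2e}")  # 3.35e-06
```

With the same 100 directories, 10 compromised nodes and 5 replicas, tracking succeeds in about 42% of periods, while complete suppression is very unlikely. This contrast motivates replicating descriptors rather than storing them on a single node.
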
A hidden service directory could try to find out which introduction points
are working on behalf of a hidden service. In contrast to the previous
design, this is no longer possible, because this information is encrypted
for the clients of the hidden service.

Attacks on hidden service directory nodes

An anonymous attacker could try to swamp a hidden service directory with
false descriptors for a given descriptor ID. This is prevented by requiring
descriptors to be signed.

Anonymous attackers could swamp a hidden service directory with correct
descriptors for non-existing hidden services. There is no countermeasure
against this attack. However, creating valid descriptors is more expensive
than verifying them and storing them in local memory, which should make
this kind of attack unattractive.

Attacks by introduction points

Current or former introduction points could try to gain information on the
hidden service they serve. But because the hidden service uses a fresh key
pair for them, this attack is no longer possible.

Attacks by clients

Current or former clients could track a hidden service's activity, attack
its introduction points, or determine the responsible hidden service
directory nodes and attack them. Nothing can prevent them from doing so,
because honest clients need the full descriptor content to establish a
connection to the hidden service. At the moment, the only countermeasure
against dishonest clients is to change the secret cookie and pass the new
one only to the honest clients.

Specification:

The proposed changes affect multiple sections in several specification
documents, which are only listed below. The detailed specification will
follow as soon as the design decisions above are final.

dir-spec-v2.txt

  2.1    The router descriptor format needs to include an additional flag
         to denote that a router is a hidden service directory.
  3      The network status format needs to be extended by a new status
         flag to denote that a router is a hidden service directory.
  4      The sections on directory caches need to be extended by new
         sections for the operation of hidden service directories,
         including replication of descriptors.

rend-spec.txt

  1.2    The new descriptor format needs to be added.
  1.3    Instead of Bob's public key, the hidden service provider uses a
         freshly generated public key for every introduction point.
  1.4    Bob's OP does not upload his service descriptor to the
         authoritative directories, but to the hidden service directories.
  1.6    Alice's OP downloads the service descriptors in the same way as
         Bob published them in 1.4.
  1.8    Alice uses the public key that is included in the descriptor
         instead of Bob's permanent service key.

tor-spec.txt

  6.2.1  Directory streams need to be used for connections to hidden
         service directories.

Compatibility:

The proposed design is meant to replace the current design for hidden
service descriptors and their storage in the long run.

There should be a first transition phase in which both the current design
and the proposed design are served in parallel. Onion routers should start
serving as hidden service directories, and hidden service providers and
clients should make use of the new design if both sides support it. But
hidden service providers should continue publishing descriptors in the
current format, and authoritative directories should store and serve these
descriptors.

After the first transition phase, hidden service providers should stop
publishing descriptors on authoritative directories, and hidden service
clients should not try to fetch descriptors from the authoritative
directories. However, the authoritative directories should continue serving
hidden service descriptors for a second transition phase.

After the second transition phase, the authoritative directories should
stop serving hidden service descriptors.

Implementation:

There are three key lengths that might need some discussion:

1) descriptor-id, formerly known as onion address: It is generated by OPs
   internally and used for storing and looking up descriptors. A human
   never needs to remember a descriptor-id. To reduce the probability of
   collisions, it could be extended to 256 bits instead of 80 bits. This
   requires a secure hash function with a 256-bit instead of a 160-bit
   output, e.g. SHA-256. [extending the descriptor-id length from 80 to 256
   bits suggested by LO]

2) permanent-id: This is the first half of the onion address that a client
   passes to his OP. The onion address should be easy to memorize.
   Therefore, the overall length of an onion address should not be extended
   beyond the existing 80 bits, so 40 bits is the maximum length of the
   permanent-id. However, it remains an open question whether an onion
   address of 40+40=80 bits can generate a descriptor-id with enough
   entropy to justify 256 instead of 80 bits. Otherwise, the onion address
   would need to be extended to 128, 160, 224, or 256 bits, making it harder
   for human beings to memorize.

3) cookie: This is the second half of the onion address that is passed to
   an OP. It should have the same size as the permanent-id.

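The bit layout discussed above can be sketched as follows. The sketch assumes that the 80-bit onion address is written as 16 base32 characters, as current .onion names are, and that its first and last 40 bits are the permanent-id and the cookie. It then computes a 256-bit descriptor-id with SHA-256. The period encoding and all function names are hypothetical. The sketch only illustrates the sizes involved and does not settle the open entropy question.

```python
import base64
import hashlib


def split_onion_address(address: str):
    """Split a 16-character base32 onion address (80 bits) into
    (permanent_id, cookie) of 40 bits (5 bytes) each."""
    raw = base64.b32decode(address.upper())
    if len(raw) != 10:
        raise ValueError("expected an 80-bit address")
    return raw[:5], raw[5:]


def descriptor_id_256(permanent_id: bytes, time_period: int, cookie: bytes) -> bytes:
    """descriptor-id with SHA-256 as h; the period encoding is hypothetical."""
    inner = hashlib.sha256(time_period.to_bytes(4, "big") + cookie).digest()
    return hashlib.sha256(permanent_id + inner).digest()
```

The secret inputs of descriptor_id_256 are exactly the 80 bits of the address, because the time-period is public. The 256-bit output length does not change that input size.
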