| Filename: 114-distributed-storage.txt
Title: Distributed Storage for Tor Hidden Service Descriptors
Version: $Revision$
Last-Modified: $Date$
Author: Karsten Loesing
Created: 13-May-2007
Status: Closed
Implemented-In: 0.2.0.x

Change history:

  13-May-2007  Initial proposal
  14-May-2007  Added changes suggested by Lasse Øverlier
  30-May-2007  Changed descriptor format, key length discussion, typos
  09-Jul-2007  Incorporated suggestions by Roger, added status of specification
               and implementation for upcoming GSoC mid-term evaluation
  11-Aug-2007  Updated implementation statuses, included non-consecutive
               replication to descriptor format
  20-Aug-2007  Renamed config option HSDir as HidServDirectoryV2
  02-Dec-2007  Closed proposal

Overview:

  The basic idea
  of this proposal is to distribute the tasks of storing and
  serving hidden service descriptors from currently three authoritative
  directory nodes among a large subset of all onion routers. The three
  reasons to do this are better robustness (availability), better
  scalability, and improved security properties. Further,
  this proposal suggests changes to the hidden service descriptor format to
  prevent new security threats coming from decentralization and to gain even
  better security properties.

Status:

  As of December 2007, the new hidden service descriptor format is implemented
  and usable. However, servers and clients do not yet make use of descriptor
  cookies, because there are open usability issues of this feature that might
  be resolved in proposal 121. Further, hidden service directories do not
  perform replication by themselves, because (unauthorized) replica fetch
  requests would allow any attacker to fetch all hidden service descriptors in
  the system. As neither issue is critical to the functioning of v2
  descriptors and their distribution, this proposal is considered as Closed.

Motivation:

  The current design of hidden services exhibits the following performance and
  security problems:

  First, the three hidden service authoritative directories constitute a
  performance bottleneck in the system. The directory nodes are responsible for
  storing and serving all hidden service descriptors. As of May 2007 there are
  about 1000 descriptors at a time, but this number is assumed to increase in
  the future. Further, there is no replication protocol for descriptors between
  the three directory nodes, so that hidden services must ensure the
  availability of their descriptors by manually publishing them on all
  directory nodes. Whenever a fourth or fifth hidden service authoritative
  directory is added, hidden services will need to maintain an equally
  increasing number of replicas.
  These scalability issues have an impact on the
  current usage of hidden services and put an even higher burden on the
  development of new kinds of applications for hidden services that might
  require storing even more descriptors.

  Second, besides posing a limitation to scalability, storing all hidden
  service descriptors on three directory nodes also constitutes a security
  risk. The directory node operators could easily analyze the publish and fetch
  requests to derive information on service activity and usage and read the
  descriptor contents to determine which onion routers work as introduction
  points for a given hidden service and need to be attacked or threatened to
  shut it down. Furthermore, the contents of a hidden service descriptor offer
  only minimal security properties to the hidden service. Whoever becomes
  aware of the service ID can easily find out whether the service is active at
  the moment and which introduction points it has. This applies to (former)
  clients, (former) introduction points, and of course to the directory nodes.
  All it takes is a request for the descriptor of the given service ID, which
  can be performed by anyone anonymously.

  This proposal suggests two major changes to approach the described
  performance and security problems:

  The first change affects the storage location for hidden service descriptors.
  Descriptors are distributed among a large subset of all onion routers instead
  of three fixed directory nodes. Each storing node is responsible for a subset
  of descriptors for a limited time only. It is not able to choose which
  descriptors it stores at a certain time, because this is determined by its
  onion ID which is hard to change frequently and in time (only routers which
  are stable for a given time are accepted as storing nodes). In order to
  resist single node failures and untrustworthy nodes, descriptors are
  replicated among a certain number of storing nodes.
  A first replication
  protocol makes sure that descriptors don't get lost when the node population
  changes; therefore, a storing node periodically requests the descriptors from
  its siblings. A second replication protocol distributes descriptors among
  non-consecutive nodes of the ID ring to prevent a group of adversaries from
  generating new onion keys until they have consecutive IDs to create a 'black
  hole' in the ring and make random services unavailable. Connections to
  storing nodes are established by extending existing circuits by one hop to
  the storing node. This also ensures that contents are encrypted. The effect
  of this first change is that the probability that a single node operator
  learns about a certain hidden service is very small and that it is very hard
  to track a service over time, even when that operator collaborates with
  other node operators.

  The second change concerns the content of hidden service descriptors.
  Obviously, security problems cannot be solved only by decentralizing storage;
  in fact, they could also get worse if done without caution. First, a
  descriptor ID needs to change periodically in order to be stored on changing
  nodes over time. Next, the descriptor ID needs to be computable only for the
  service's clients, but should be unpredictable for all other nodes. Further,
  the storing node needs to be able to verify that the hidden service is the
  true originator of the descriptor with the given ID even though it is not a
  client. Finally, a storing node should learn as little information as
  necessary by storing a descriptor, because it might not be as trustworthy as
  a directory node; for example it does not need to know the list of
  introduction points. Therefore, a second key is applied that is only known to
  the hidden service provider and its clients and that is not included in the
  descriptor. It is used to calculate descriptor IDs and to encrypt the
  introduction points.
  This second key can either be given to all clients
  together with the hidden service ID, or to a group or a single client as
  an authentication token. In the future this second key could be the result of
  some key agreement protocol between the hidden service and one or more
  clients. A new text-based format is proposed for descriptors instead of an
  extension of the existing binary format for reasons of future extensibility.

Design:

  The proposed design is described by the required changes to the current
  design. These requirements are grouped by content, rather than by affected
  specification documents or code files, and numbered for reference below.

  Hidden service clients, servers, and directories:

  /1/ Create routing list

    All participants can filter the consensus status document received from the
    directory authorities to one routing list containing only those servers
    that store and serve hidden service descriptors and which are running for
    at least 24 hours. A participant only trusts its own routing list and never
    learns about routing information from other parties.

  /2/ Determine responsible hidden service directory

    All participants can determine the hidden service directory that is
    responsible for storing and serving a given ID, as well as the hidden
    service directories that replicate its content. Every hidden service
    directory is responsible for the descriptor IDs in the interval from
    its predecessor, exclusive, to its own ID, inclusive. Further, a hidden
    service directory holds replicas for its n predecessors, where n denotes
    the number of consecutive replicas. (requires /1/)

  [/3/ and /4/ were requirements to use BEGIN_DIR cells for directory
   requests which have not been fulfilled in the course of the implementation
   of this proposal, but elsewhere.]
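Requirements /1/ and /2/ can be illustrated with a short Python sketch. It is
not taken from the Tor implementation: the Router class, its field names, and
both function names are hypothetical, and the consensus is assumed to be
available as a simple list of such records. The second function returns the
responsible directory followed by its n successors, which is one reading of
"a hidden service directory holds replicas for its n predecessors".

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Router:
    """Hypothetical, simplified view of one consensus entry."""
    identity: bytes            # position on the ID ring
    stores_descriptors: bool   # router offers hidden service directory service
    uptime_hours: float        # how long the router has been running


def routing_list(consensus: Sequence[Router]) -> List[Router]:
    """/1/: keep only directories running for at least 24 hours, sorted by ID."""
    eligible = [r for r in consensus
                if r.stores_descriptors and r.uptime_hours >= 24]
    return sorted(eligible, key=lambda r: r.identity)


def responsible_directories(ring: Sequence[Router], descriptor_id: bytes,
                            n: int) -> List[Router]:
    """/2/: the responsible directory followed by its n successors.

    The responsible directory is the first one whose ID is >= descriptor_id
    (interval from predecessor, exclusive, to own ID, inclusive), wrapping
    around at the end of the ring. Its n successors hold the replicas.
    """
    if not ring:
        return []
    ids = [r.identity for r in ring]
    start = bisect_left(ids, descriptor_id) % len(ring)
    count = min(n + 1, len(ring))  # never return the same node twice
    return [ring[(start + i) % len(ring)] for i in range(count)]
```

With n = 2 this yields the three consecutive nodes mentioned in /12/. Each
participant would run both functions on its own copy of the consensus, in
line with the rule that a participant only trusts its own routing list.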
  Hidden service directory nodes:

  /5/ Advertise hidden service directory functionality

    Every onion router that has its directory port open can decide whether it
    wants to store and serve hidden service descriptors by setting a new config
    option "HidServDirectoryV2" 0|1 to 1. An onion router with this config
    option being set includes the flag "hidden-service-dir" in its router
    descriptors that it sends to directory authorities.

  /6/ Accept v2 publish requests, parse and store v2 descriptors

    Hidden service directory nodes accept publish requests for hidden service
    descriptors and store them to their local memory. (It is not necessary to
    make descriptors persistent, because after disconnecting, the onion router
    would not be accepted as storing node anyway, because it has not been
    running for at least 24 hours.) All requests and replies are formatted as
    HTTP messages. Requests are directed to the router's directory port and are
    contained within BEGIN_DIR cells. A hidden service directory node stores a
    descriptor only when it thinks that it is responsible for storing that
    descriptor based on its own routing table. Every hidden service directory
    node is responsible for the descriptor IDs in the interval of its n-th
    predecessor in the ID circle up to its own ID (n denotes the number of
    consecutive replicas). (requires /1/)

  /7/ Accept v2 fetch requests

    Same as /6/, but with fetch requests for hidden service descriptors.
    (requires /2/)

  /8/ Replicate descriptors with neighbors

    A hidden service directory node replicates descriptors from its two
    predecessors by downloading them once an hour. Further, it checks its
    routing table periodically for changes. Whenever it realizes that a
    predecessor has left the network, it establishes a connection to the new
    n-th predecessor and requests its stored descriptors in the interval of its
    (n+1)-th predecessor and the requested n-th predecessor.
    Whenever it
    realizes that a new onion router has joined with an ID higher than its
    former n-th predecessor, it adds it to its predecessors and discards all
    descriptors in the interval of its (n+1)-th and its n-th predecessor.
    (requires /1/)

    [Dec 02: This function has not been implemented, because arbitrary nodes
     would have been able to download the entire set of v2 descriptors. An
     authorized replication request would be necessary. For the moment, the
     system runs without any directory-side replication. -KL]

  Authoritative directory nodes:

  /9/ Confirm a router's hidden service directory functionality

    Directory nodes include a new flag "HSDir" for routers that decided to
    provide storage for hidden service descriptors and that are running for at
    least 24 hours. The last requirement prevents a node from frequently
    changing its onion key to become responsible for an identifier it wants to
    target.

  Hidden service provider:

  /10/ Configure v2 hidden service

    Each hidden service provider that has set the config option
    "PublishV2HidServDescriptors" 0|1 to 1 is configured to publish v2
    descriptors and conform to the v2 connection establishment protocol. When
    configuring a hidden service, a hidden service provider checks if it has
    already created a random secret_cookie and a hostname2 file; if not, it
    creates both of them. (requires /2/)

  /11/ Establish introduction points with fresh key

    If configured to publish only v2 descriptors and no v0/v1 descriptors any
    more, a hidden service provider that is setting up the hidden service at
    introduction points does not pass its own public key, but the public key
    of a freshly generated key pair. It also includes these fresh public keys
    in the hidden service descriptor together with the other introduction point
    information.
    The reason is that the introduction point does not need to and
    therefore should not know for which hidden service it works, so as to
    prevent it from tracking the hidden service's activity. (If a hidden
    service provider supports both v0/v1 and v2 descriptors, v0/v1 clients
    rely on the fact that all introduction points accept the same public key,
    so that this new feature cannot be used.)

  /12/ Encode v2 descriptors and send v2 publish requests

    If configured to publish v2 descriptors, a hidden service provider
    publishes a new descriptor whenever its content changes or a new
    publication period starts for this descriptor. If the current publication
    period would only last for less than 60 minutes (= 2 x 30 minutes to allow
    the server to be 30 minutes behind and the client 30 minutes ahead), the
    hidden service provider publishes both a current descriptor and one for
    the next period. Publication is performed by sending the descriptor to all
    hidden service directories that are responsible for keeping replicas for
    the descriptor ID. This includes two non-consecutive replicas that are
    stored at 3 consecutive nodes each. (requires /1/ and /2/)

  Hidden service client:

  /13/ Send v2 fetch requests

    A hidden service client that has set the config option
    "FetchV2HidServDescriptors" 0|1 to 1 handles SOCKS requests for v2 onion
    addresses by requesting a v2 descriptor from a randomly chosen hidden
    service directory that is responsible for keeping a replica for the
    descriptor ID. In total there are six replicas of which the first and the
    last three are stored on consecutive nodes. The probability of picking one
    of the three consecutive replicas is 1/6, 2/6, and 3/6 to incorporate the
    fact that the availability will be the highest on the node with next higher
    ID. A hidden service client relies on the hidden service provider to store
    two sets of descriptors to compensate clock skew between service and
    client.
    (requires /1/ and /2/)

  /14/ Process v2 fetch reply and parse v2 descriptors

    A hidden service client that has sent a request for a v2 descriptor can
    parse it and store it to the local cache of rendezvous service descriptors.

  /15/ Establish connection to v2 hidden service

    A hidden service client can establish a connection to a hidden service
    using a v2 descriptor. This includes using the secret cookie for decrypting
    the introduction points contained in the descriptor. When contacting an
    introduction point, the client does not use the public key of the hidden
    service provider, but the freshly-generated public key that is included in
    the hidden service descriptor. Whether or not a fresh key is used instead
    of the key of the hidden service depends on the available protocol versions
    that are included in the descriptor; by this, connection establishment is
    to a certain extent decoupled from fetching the descriptor.

  Hidden service descriptor:

  (Requirements concerning the descriptor format are contained in /6/ and /7/.)

    The new v2 hidden service descriptor format looks like this:

      onion-address = h(public-key) + cookie
      descriptor-id = h(h(public-key) + h(time-period + cookie + replica))
      descriptor-content = {
        descriptor-id,
        version,
        public-key,
        h(time-period + cookie + replica),
        timestamp,
        protocol-versions,
        { introduction-points } encrypted with cookie
      } signed with private-key

    The "descriptor-id" needs to change periodically in order for the
    descriptor to be stored on changing nodes over time. It may only be
    computable by a hidden service provider and all of its clients to prevent
    unauthorized nodes from tracking the service activity by periodically
    checking whether there is a descriptor for this service.
    Finally, the
    hidden service directory needs to be able to verify that the hidden service
    provider is the true originator of the descriptor with the given ID.

    Therefore, "descriptor-id" is derived from the "public-key" of the hidden
    service provider, the current "time-period" which changes every 24 hours,
    a secret "cookie" shared between hidden service provider and clients, and
    a "replica" denoting the number of this non-consecutive replica. (The
    "time-period" is constructed in a way that time periods do not change at
    the same moment for all descriptors by deriving a value between 0:00 and
    23:59 hours from h(public-key) and making the descriptors of this hidden
    service provider expire at that time of the day.) The "descriptor-id" is
    defined to be 160 bits long. [extending the "descriptor-id" length
    suggested by LØ]

    Only the hidden service provider and the clients are able to generate
    future "descriptor-id"s. Hence, the "onion-address", which is currently
    the hash value of "public-key", is extended by the secret "cookie". The
    hash of "public-key" is defined to be 80 bits long, whereas the "cookie"
    is dimensioned to be 120 bits long. This makes a total of 200 bits or 40
    base32 chars, which is quite a lot to handle for a human, but necessary to
    provide sufficient protection against an adversary generating a key pair
    with the same "public-key" hash or guessing the "cookie".

    A hidden service directory can verify that a descriptor was created by the
    hidden service provider by checking if the "descriptor-id" corresponds to
    the "public-key" and if the signature can be verified with the
    "public-key".

    The "introduction-points" that are included in the descriptor are encrypted
    using the same "cookie" that is shared between hidden service provider and
    clients.
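The derivation of "descriptor-id" described above can be sketched in Python.
Several details are assumptions, because the text does not fix them: SHA-1
stands in for h (it happens to produce the required 160 bits), "+" is read as
byte concatenation, the time period is encoded as a 4-byte big-endian integer,
the replica as a single byte, and the per-service offset within the day is
taken from the first byte of h(public-key). All function names are
hypothetical.

```python
import hashlib
import struct

SECONDS_PER_DAY = 86400


def h(data: bytes) -> bytes:
    """Hash function h; SHA-1 is an assumption, chosen for its 160-bit output."""
    return hashlib.sha1(data).digest()


def time_period(now: int, key_hash: bytes) -> int:
    """Number of the current 24-hour period for one service.

    Assumption: the period boundary is shifted by an offset derived from the
    first byte of h(public-key), so that periods of different services do not
    all change at the same moment.
    """
    offset = key_hash[0] * SECONDS_PER_DAY // 256
    return (now + offset) // SECONDS_PER_DAY


def descriptor_id(public_key: bytes, cookie: bytes, replica: int,
                  now: int) -> bytes:
    """descriptor-id = h(h(public-key) + h(time-period + cookie + replica))."""
    key_hash = h(public_key)
    period = time_period(now, key_hash)
    # Inner hash: only parties that know the cookie can compute this part.
    secret_part = h(struct.pack(">I", period) + cookie + bytes([replica]))
    return h(key_hash + secret_part)
```

A directory that receives a descriptor containing "public-key" and
h(time-period + cookie + replica) can recompute the outer hash and compare it
with "descriptor-id" without ever learning the cookie.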
    [correction to use another key than h(time-period + cookie) as
    encryption key for introduction points made by LØ]

    A new text-based format is proposed for descriptors instead of an extension
    of the existing binary format for reasons of future extensibility.

Security implications:

  The security implications of the proposed changes are grouped by the roles of
  nodes that could perform attacks or on which attacks could be performed.

  Attacks by authoritative directory nodes

    Authoritative directory nodes are no longer the single places in the
    network that know about a hidden service's activity and introduction
    points. Thus, they cannot perform attacks using this information, e.g.
    track a hidden service's activity or usage pattern or attack its
    introduction points. Formerly, it would only require a single corrupted
    authoritative directory operator to perform such an attack.

  Attacks by hidden service directory nodes

    A hidden service directory node could misuse a stored descriptor to track a
    hidden service's activity and usage pattern by clients. Though there is no
    countermeasure against this kind of attack, it is very expensive to track a
    certain hidden service over time. An attacker would need to run a large
    number of stable onion routers that work as hidden service directory nodes
    to have a good probability to become responsible for its changing
    descriptor IDs. For each period, the probability is:

      1-(N-c choose r)/(N choose r) for N-c>=r and 1 otherwise, with N as
      total number of hidden service directories, c as compromised nodes,
      and r as number of replicas

    The hidden service directory nodes could try to make a certain hidden
    service unavailable to its clients. To do so, they could discard all
    stored descriptors for that hidden service and reply to clients that there
    is no descriptor for the given ID or return an old or false descriptor
    content.
    The client would detect a false descriptor, because it could not
    contain a correct signature. But an old content or an empty reply could
    confuse the client. Therefore, the countermeasure is to replicate
    descriptors among a small number of hidden service directories, e.g. 5.
    The probability that a group of collaborating nodes makes a hidden service
    completely unavailable is in each period:

      (c choose r)/(N choose r) for c>=r and N>=r, and 0 otherwise, with N as
      total number of hidden service directories, c as compromised nodes,
      and r as number of replicas

    A hidden service directory could try to find out which introduction points
    are working on behalf of a hidden service. In contrast to the previous
    design, this is not possible anymore, because this information is encrypted
    to the clients of a hidden service.

  Attacks on hidden service directory nodes

    An anonymous attacker could try to swamp a hidden service directory with
    false descriptors for a given descriptor ID. This is prevented by requiring
    that descriptors are signed.

    Anonymous attackers could swamp a hidden service directory with correct
    descriptors for non-existing hidden services. There is no countermeasure
    against this attack. However, the creation of valid descriptors is more
    expensive than verification and storage in local memory. This should make
    this kind of attack unattractive.

  Attacks by introduction points

    Current or former introduction points could try to gain information on the
    hidden service they serve. But due to the fresh key pair that is used by
    the hidden service, this attack is not possible anymore.

  Attacks by clients

    Current or former clients could track a hidden service's activity, attack
    its introduction points, or determine the responsible hidden service
    directory nodes and attack them.
    There is nothing that could prevent them
    from doing so, because honest clients need the full descriptor content to
    establish a connection to the hidden service. At the moment, the only
    countermeasure against dishonest clients is to change the secret cookie and
    pass it only to the honest clients.

Compatibility:

  The proposed design is meant to replace the current design for hidden service
  descriptors and their storage in the long run.

  There should be a first transition phase in which both the current design
  and the proposed design are served in parallel. Onion routers should start
  serving as hidden service directories, and hidden service providers and
  clients should make use of the new design if both sides support it. Hidden
  service providers should be allowed to publish descriptors of the current
  format in parallel, and authoritative directories should continue storing and
  serving these descriptors.

  After the first transition phase, hidden service providers should stop
  publishing descriptors on authoritative directories, and hidden service
  clients should not try to fetch descriptors from the authoritative
  directories. However, the authoritative directories should continue serving
  hidden service descriptors for a second transition phase. As of this point,
  all v2 config options should be set to a default value of 1.

  After the second transition phase, the authoritative directories should stop
  serving hidden service descriptors.
 |
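The two per-period probabilities given under "Attacks by hidden service
directory nodes" can be evaluated numerically. The following Python sketch
only transcribes those two formulas; the function names are hypothetical, and
N, c, and r carry the meaning defined in that section (total number of hidden
service directories, compromised nodes, and replicas).

```python
from math import comb


def tracking_probability(N: int, c: int, r: int) -> float:
    """Probability that at least one of the r replicas is stored on one of
    the c compromised directories: 1 - (N-c choose r)/(N choose r)."""
    if N - c < r:
        return 1.0  # fewer than r honest directories remain
    return 1 - comb(N - c, r) / comb(N, r)


def unavailability_probability(N: int, c: int, r: int) -> float:
    """Probability that all r replicas are stored on compromised
    directories: (c choose r)/(N choose r)."""
    if c < r or N < r:
        return 0.0  # not enough compromised nodes to hold every replica
    return comb(c, r) / comb(N, r)
```

Evaluating both functions for a fixed N and c while varying r shows the
trade-off behind the replica count: more replicas make a service harder to
black out, but raise the chance that some compromised directory sees its
descriptor.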