| 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196 | 
							- Filename: 143-distributed-storage-improvements.txt
 
- Title: Improvements of Distributed Storage for Tor Hidden Service Descriptors
 
- Version: $Revision$
 
- Last-Modified: $Date$
 
- Author: Karsten Loesing
 
- Created: 28-Jun-2008
 
- Status: Open
 
- Target: 0.2.1.x
 
- Change history:
 
-   28-Jun-2008  Initial proposal for or-dev
 
- Overview:
 
-   An evaluation of the distributed storage for Tor hidden service
 
-   descriptors and subsequent discussions have brought up a few improvements
 
-   to proposal 114. All improvements are backwards compatible to the
 
-   implementation of proposal 114.
 
- Design:
 
-   1. Report Bad Directory Nodes
 
-   Bad hidden service directory nodes could deny existence of previously
 
-   stored descriptors. A bad directory node that does this with all stored
 
-   descriptors causes harm to the distributed storage in general, but
 
-   replication will cope with this problem in most cases. However, an
 
-   adversary that attempts to make a specific hidden service unavailable by
 
-   running relays that become responsible for all of a service's
 
-   descriptors poses a more serious threat. The distributed storage needs to
 
-   defend against this attack by detecting and removing bad directory nodes.
 
-   As a countermeasure hidden services try to download their descriptors
 
-   every hour at random times from the hidden service directories that are
 
-   responsible for storing it. If a directory node replies with 404 (Not
 
-   found), the hidden service reports the supposedly bad directory node to
 
-   a random selection of half of the directory authorities (with version
 
-   numbers equal to or higher than the first version that implements this
 
-   proposal). The hidden service posts a complaint message using HTTP 'POST'
 
-   to a URL "/tor/rendezvous/complain" with the following message format:
 
-     "hidden-service-directory-complaint" identifier NL
 
-       [At start, exactly once]
 
-       The identifier of the hidden service directory node to be
 
-       investigated.
 
-     "rendezvous-service-descriptor" descriptor NL
 
-       [At end, Excatly once]
 
-       The hidden service descriptor that the supposedly bad directory node
 
-       does not serve.
 
-   The directory authority checks if the descriptor is valid and the hidden
 
-   service directory responsible for storing it. It waits for a random time
 
-   of up to 30 minutes before posting the descriptor to the hidden service
 
-   directory. If the publication is acknowledged, the directory authority
 
-   waits another random time of up to 30 minutes before attempting to
 
-   request the descriptor that it has posted. If the directory node replies
 
-   with 404 (Not found), it will be blacklisted for being a hidden service
 
-   directory node for the next 48 hours.
 
-   A blacklisted hidden service directory is assigned the new flag BadHSDir
 
-   instead of the HSDir flag in the vote that a directory authority creates.
 
-   In a consensus a relay is only assigned a HSDir flag if the majority of
 
-   votes contains a HSDir flag and no more than one third of votes contains
 
-   a BadHSDir flag. As a result, clients do not have to learn about the
 
-   BadHSDir flag. A blacklisted directory node will simply not be assigned
 
-   the HSDir flag in the consensus.
 
-   In order to prevent an attacker from setting up new nodes as replacement
 
-   for blacklisted directory nodes, all directory nodes in the same /24
 
-   subnet are blacklisted, too. Furthermore, if two or more directory nodes
 
-   are blacklisted in the same /16 subnet concurrently, all other directory
 
-   nodes in that /16 subnet are blacklisted, too. Blacklisting holds for at
 
-   most 48 hours.
 
-   2. Publish Fewer Replicas
 
-   The evaluation has shown that the probability of a directory node to
 
-   serve a previously stored descriptor is 85.7% (more precisely, this is
 
-   the 0.001-quantile of the empirical distribution with the rationale that
 
-   it holds for 99.9% of all empirical cases). If descriptors are replicated
 
-   to x directory nodes, the probability of at least one of the replicas to
 
-   be available for clients is 1 - (1 - 85.7%) ^ x. In order to achieve an
 
-   overall availability of 99.9%, x = 3.55 replicas need to be stored. From
 
-   this follows that 4 replicas are sufficient, rather than the currently
 
-   stored 6 replicas.
 
-   Further, the current design stores 2 sets of descriptors on 3 directory
 
-   nodes with consecutive identities. Originally, this was meant to
 
-   facilitate replication between directory nodes, which has not been and
 
-   will not be implemented (the selection criterion of 24 hours uptime does
 
-   not make it necessary). As a result, storing descriptors on directory
 
-   nodes with consecutive identities is not required. In fact it should be
 
-   avoided to enable an attacker to create "black holes" in the identifier
 
-   ring.
 
-   Hidden services should store their descriptors on 4 non-consecutive
 
-   directory nodes, and clients should request descriptors from these
 
-   directory nodes only. For compatibility reasons, hidden services also
 
-   store their descriptors on 2 consecutive directory nodes. Hence, 0.2.0.x
 
-   clients will be able to retrieve 4 out of 6 descriptors, but will fail
 
-   for the remaining 2 descriptors, which is sufficient for reliability. As
 
-   soon as 0.2.0.x is deprecated, hidden services can stop publishing the
 
-   additional 2 replicas.
 
-   3. Change Default Value of Being Hidden Service Directory
 
-   The requirements for becoming a hidden service directory node are an open
 
-   directory port and an uptime of at least 24 hours. The evaluation has
 
-   shown that there are 300 hidden service directory candidates in the mean,
 
-   but only 6 of them are configured to act as hidden service directories.
 
-   This is bad, because those 6 nodes need to serve a large share of all
 
-   hidden service descriptors. Optimally, there should be hundreds of hidden
 
-   service directories. Having a large number of 0.2.1.x directory nodes
 
-   also has a positive effect on 0.2.0.x hidden services and clients.
 
-   Therefore, the new default of HidServDirectoryV2 should be 1, so that a
 
-   Tor relay that has an open directory port automatically accepts and
 
-   serves v2 hidden service descriptors. A relay operator can still opt-out
 
-   running a hidden service directory by changing HidServDirectoryV2 to 0.
 
-   The additional bandwidth requirements for running a hidden service
 
-   directory node in addition to being a directory cache are negligible.
 
-   4. Make Descriptors Persistent on Directory Nodes
 
-   Hidden service directories that are restarted by their operators or after
 
-   a failure will not be selected as hidden service directories within the
 
-   next 24 hours. However, some clients might still think that these nodes
 
-   are responsible for certain descriptors, because they work on the basis
 
-   of network consensuses that are up to three hours old. The directory
 
-   nodes should be able to serve the previously received descriptors to
 
-   these clients. Therefore, directory nodes make all received descriptors
 
-   persistent and load previously received descriptors on startup.
 
-   5. Store and Serve Descriptors Regardless of Responsibility
 
-   Currently, directory nodes only accept descriptors for which they think
 
-   they are responsible. This may lead to problems when a directory node
 
-   uses an older or newer network consensus than hidden service or client
 
-   or when a directory node has been restarted recently. In fact, there are
 
-   no security issues in storing or serving descriptors for which a
 
-   directory node thinks it is not responsible. To the contrary, doing so
 
-   may improve reliability in border cases. As a result, a directory node
 
-   does not pay attention to responsibilty when receiving a publication or
 
-   fetch request, but stores or serves the requested descriptor. Likewise,
 
-   the directory node does not remove descriptors when it thinks it is not
 
-   responsible for them any more.
 
-   6. Avoid Periodic Descriptor Re-Publication
 
-   In the current implementation a hidden service re-publishes its
 
-   descriptor either when its content changes or an hour elapses. However,
 
-   the evaluation has shown that failures of hidden service directory nodes,
 
-   i.e. of nodes that have not failed within the last 24 hours, are very
 
-   rare. Together with making descriptors persistent on directory nodes,
 
-   there is no necessity to re-publish descriptors hourly.
 
-   The only two events leading to descriptor re-publication should be a
 
-   change of the descriptor content and a new directory node becoming
 
-   responsible for the descriptor. Hidden services should therefore consider
 
-   re-publication every time they learn about a new network consensus
 
-   instead of hourly.
 
-   7. Discard Expired Descriptors
 
-   The current implementation lets directory nodes keep a descriptor for two
 
-   days before discarding it. However, with the v2 design, descriptors are
 
-   only valid for at most one day. Directory nodes should determine the
 
-   validity of stored descriptors and discard them one hour after they have
 
-   expired (to compensate wrong clocks on clients).
 
-   8. Shorten Client-Side Descriptor Fetch History
 
-   When clients try to download a hidden service descriptor, they memorize
 
-   fetch requests to directory nodes for up to 15 minutes. This allows them
 
-   to request all replicas of a descriptor to avoid bad or failing directory
 
-   nodes, but without querying the same directory node twice.
 
-   The downside is that a client that has requested a descriptor without
 
-   success, will not be able to find a hidden service that has been started
 
-   during the following 15 minutes after the client's last request.
 
-   This can be improved by shortening the fetch history to only 5 minutes.
 
-   This time should be sufficient to complete requests for all replicas of a
 
-   descriptor, but without ending in an infinite request loop.
 
- Compatibility:
 
-   All proposed improvements are compatible to the currently implemented
 
-   design as described in proposal 114.
 
 
  |