- Filename: 143-distributed-storage-improvements.txt
- Title: Improvements of Distributed Storage for Tor Hidden Service Descriptors
- Author: Karsten Loesing
- Created: 28-Jun-2008
- Status: Open
- Target: 0.2.1.x
- Change history:
- 28-Jun-2008 Initial proposal for or-dev
- Overview:
- An evaluation of the distributed storage for Tor hidden service
- descriptors and subsequent discussions have brought up a few improvements
- to proposal 114. All improvements are backwards compatible with the
- implementation of proposal 114.
- Design:
- 1. Report Bad Directory Nodes
- Bad hidden service directory nodes could deny existence of previously
- stored descriptors. A bad directory node that does this with all stored
- descriptors causes harm to the distributed storage in general, but
- replication will cope with this problem in most cases. However, an
- adversary that attempts to make a specific hidden service unavailable by
- running relays that become responsible for all of a service's
- descriptors poses a more serious threat. The distributed storage needs to
- defend against this attack by detecting and removing bad directory nodes.
- As a countermeasure, hidden services try to download their descriptors
- every hour at random times from the hidden service directories that are
- responsible for storing them. If a directory node replies with 404 (Not
- found), the hidden service reports the supposedly bad directory node to
- a random selection of half of the directory authorities (with version
- numbers equal to or higher than the first version that implements this
- proposal). The hidden service posts a complaint message using HTTP 'POST'
- to a URL "/tor/rendezvous/complain" with the following message format:
- "hidden-service-directory-complaint" identifier NL
- [At start, exactly once]
- The identifier of the hidden service directory node to be
- investigated.
- "rendezvous-service-descriptor" descriptor NL
- [At end, exactly once]
- The hidden service descriptor that the supposedly bad directory node
- does not serve.
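The message format above can be sketched as follows. This is only an illustration: the function name `build_complaint` is hypothetical, the identifier is treated as an opaque token because its encoding is not specified here, and the code assumes that the second item is the complete descriptor text, which already begins with its own "rendezvous-service-descriptor" keyword line.

```python
def build_complaint(identifier: str, descriptor: str) -> str:
    """Assemble a complaint body: one complaint line, then the descriptor."""
    # The identifier must be a single token so that it fits on the first line.
    if not identifier or any(c.isspace() for c in identifier):
        raise ValueError("identifier must be a single non-empty token")
    # Assumption: the descriptor is passed in full and starts with its keyword.
    if not descriptor.startswith("rendezvous-service-descriptor "):
        raise ValueError("descriptor must start with its keyword line")
    header = f"hidden-service-directory-complaint {identifier}\n"
    body = descriptor if descriptor.endswith("\n") else descriptor + "\n"
    return header + body
```

The resulting string would be sent as the body of the HTTP POST to "/tor/rendezvous/complain".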
- The directory authority checks whether the descriptor is valid and whether
- the reported hidden service directory is responsible for storing it. It
- waits for a random time of up to 30 minutes before posting the
- descriptor to the hidden service
- directory. If the publication is acknowledged, the directory authority
- waits another random time of up to 30 minutes before attempting to
- request the descriptor that it has posted. If the directory node replies
- with 404 (Not found), it will be blacklisted from acting as a hidden
- service directory node for the next 48 hours.
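The investigation carried out by a directory authority might look roughly like the sketch below. The callables `post_descriptor`, `fetch_descriptor` and `sleep` are hypothetical stand-ins for the network and timer code, and the validity and responsibility checks are assumed to have passed already. The proposal does not say what happens when the publication is not acknowledged; the sketch simply does not blacklist in that case, which is an assumption.

```python
import random

MAX_DELAY_SECONDS = 30 * 60  # "random time of up to 30 minutes"


def investigate(post_descriptor, fetch_descriptor, sleep, rng=random):
    """Return True if the directory node should be blacklisted."""
    # Wait a random time before re-posting the descriptor.
    sleep(rng.uniform(0, MAX_DELAY_SECONDS))
    if not post_descriptor():
        # Assumption: an unacknowledged publication is not treated as proof.
        return False
    # Wait again before requesting the descriptor that was just posted.
    sleep(rng.uniform(0, MAX_DELAY_SECONDS))
    # fetch_descriptor() is assumed to return the HTTP status code.
    return fetch_descriptor() == 404
```

Passing the timer and network operations in as parameters keeps the decision logic separate from I/O, so it can be exercised without real delays.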
- A blacklisted hidden service directory is assigned the new flag BadHSDir
- instead of the HSDir flag in the vote that a directory authority creates.
- In a consensus a relay is only assigned a HSDir flag if the majority of
- votes contains a HSDir flag and no more than one third of votes contains
- a BadHSDir flag. As a result, clients do not have to learn about the
- BadHSDir flag. A blacklisted directory node will simply not be assigned
- the HSDir flag in the consensus.
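The consensus rule for a single relay can be written down as a small predicate. This is a sketch only: the function name is hypothetical, each vote is modelled as the set of flags that one authority assigns to the relay, and "majority" is read as a strict majority of all votes, which is an assumption.

```python
def consensus_assigns_hsdir(votes):
    """votes: list of sets of flag names, one set per directory authority."""
    total = len(votes)
    if total == 0:
        return False
    hsdir_votes = sum(1 for flags in votes if "HSDir" in flags)
    bad_votes = sum(1 for flags in votes if "BadHSDir" in flags)
    # Strict majority for HSDir, and at most one third voting BadHSDir.
    return 2 * hsdir_votes > total and 3 * bad_votes <= total
```

With seven votes, for example, two BadHSDir votes are tolerated but three are enough to withhold the HSDir flag.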
- In order to prevent an attacker from setting up new nodes as replacement
- for blacklisted directory nodes, all directory nodes in the same /24
- subnet are blacklisted, too. Furthermore, if two or more directory nodes
- are blacklisted in the same /16 subnet concurrently, all other directory
- nodes in that /16 subnet are blacklisted, too. Blacklisting holds for at
- most 48 hours.
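The subnet rules can be illustrated with Python's standard `ipaddress` module. The function names are hypothetical, only IPv4 addresses are considered, and the sketch assumes that only nodes confirmed as bad (not nodes blacklisted merely because of their /24) count towards the "two or more" threshold for a /16, which the text leaves open.

```python
import ipaddress
from collections import Counter


def blacklisted_networks(confirmed_bad):
    """Return the set of networks covered by the blacklist."""
    addrs = {ipaddress.ip_address(a) for a in confirmed_bad}
    # Every confirmed bad node blacklists its whole /24.
    nets = {ipaddress.ip_network(f"{a}/24", strict=False) for a in addrs}
    # Two or more confirmed bad nodes in one /16 blacklist the whole /16.
    per16 = Counter(ipaddress.ip_network(f"{a}/16", strict=False) for a in addrs)
    nets |= {net for net, count in per16.items() if count >= 2}
    return nets


def is_blacklisted(addr, networks):
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in networks)
```

Expiry after 48 hours is not modelled here; a caller would drop entries from `confirmed_bad` once they are older than that.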
- 2. Publish Fewer Replicas
- The evaluation has shown that the probability that a directory node
- serves a previously stored descriptor is 85.7% (more precisely, this is
- the 0.001-quantile of the empirical distribution with the rationale that
- it holds for 99.9% of all empirical cases). If descriptors are replicated
- to x directory nodes, and directory nodes fail independently, the
- probability that at least one of the replicas is available to clients is
- 1 - (1 - 85.7%) ^ x. In order to achieve an overall availability of
- 99.9%, x = 3.55 replicas need to be stored. It follows that 4 replicas
- are sufficient, rather than the currently stored 6 replicas.
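The arithmetic behind the replica count can be checked with a few lines of Python. The function names are illustrative; the calculation solves 1 - (1 - p) ^ x = target for x, under the same independence assumption as the formula in the text.

```python
import math


def replicas_needed(p_node, target):
    """Return (exact x, x rounded up) for the required availability."""
    x = math.log(1 - target) / math.log(1 - p_node)
    return x, math.ceil(x)


def availability(p_node, replicas):
    """Probability that at least one of the replicas is available."""
    return 1 - (1 - p_node) ** replicas
```

For p = 0.857 and a target of 0.999 this gives x of about 3.55 and hence 4 replicas; 3 replicas would fall short of the target, while 4 exceed it.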
- Further, the current design stores 2 sets of descriptors on 3 directory
- nodes with consecutive identities. Originally, this was meant to
- facilitate replication between directory nodes, which has not been and
- will not be implemented (the selection criterion of 24 hours uptime does
- not make it necessary). As a result, storing descriptors on directory
- nodes with consecutive identities is not required. In fact, it should be
- avoided, because it enables an attacker to create "black holes" in the
- identifier ring.
- Hidden services should store their descriptors on 4 non-consecutive
- directory nodes, and clients should request descriptors from these
- directory nodes only. For compatibility reasons, hidden services also
- store their descriptors on 2 consecutive directory nodes. Hence, 0.2.0.x
- clients will be able to retrieve 4 out of 6 descriptors, but will fail
- for the remaining 2 descriptors, which is sufficient for reliability. As
- soon as 0.2.0.x is deprecated, hidden services can stop publishing the
- additional 2 replicas.
- 3. Change Default Value of Being Hidden Service Directory
- The requirements for becoming a hidden service directory node are an open
- directory port and an uptime of at least 24 hours. The evaluation has
- shown that there are 300 hidden service directory candidates on average,
- but only 6 of them are configured to act as hidden service directories.
- This is bad, because those 6 nodes need to serve a large share of all
- hidden service descriptors. Optimally, there should be hundreds of hidden
- service directories. Having a large number of 0.2.1.x directory nodes
- also has a positive effect on 0.2.0.x hidden services and clients.
- Therefore, the new default of HidServDirectoryV2 should be 1, so that a
- Tor relay that has an open directory port automatically accepts and
- serves v2 hidden service descriptors. A relay operator can still opt out
- of running a hidden service directory by setting HidServDirectoryV2 to 0.
- The additional bandwidth requirements for running a hidden service
- directory node in addition to being a directory cache are negligible.
- 4. Make Descriptors Persistent on Directory Nodes
- Hidden service directories that are restarted by their operators or after
- a failure will not be selected as hidden service directories within the
- next 24 hours. However, some clients might still think that these nodes
- are responsible for certain descriptors, because they work on the basis
- of network consensuses that are up to three hours old. The directory
- nodes should be able to serve the previously received descriptors to
- these clients. Therefore, directory nodes make all received descriptors
- persistent and load previously received descriptors on startup.
- 5. Store and Serve Descriptors Regardless of Responsibility
- Currently, directory nodes only accept descriptors for which they think
- they are responsible. This may lead to problems when a directory node
- uses an older or newer network consensus than the hidden service or client
- or when a directory node has been restarted recently. In fact, there are
- no security issues in storing or serving descriptors for which a
- directory node thinks it is not responsible. To the contrary, doing so
- may improve reliability in border cases. As a result, a directory node
- does not pay attention to responsibility when receiving a publication or
- fetch request, but stores or serves the requested descriptor. Likewise,
- the directory node does not remove descriptors when it thinks it is not
- responsible for them any more.
- 6. Avoid Periodic Descriptor Re-Publication
- In the current implementation a hidden service re-publishes its
- descriptor either when its content changes or an hour elapses. However,
- the evaluation has shown that failures of hidden service directory nodes,
- i.e. of nodes that have not failed within the last 24 hours, are very
- rare. Since descriptors are also made persistent on directory nodes,
- there is no need to re-publish descriptors hourly.
- The only two events leading to descriptor re-publication should be a
- change of the descriptor content and a new directory node becoming
- responsible for the descriptor. Hidden services should therefore consider
- re-publication every time they learn about a new network consensus
- instead of hourly.
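The re-publication decision on arrival of a new consensus could be sketched as below. The function name and the representation of directory nodes as identifier strings are hypothetical. The proposal only names the two triggering events; uploading to all responsible nodes on a content change and only to newly responsible nodes otherwise is an assumption of this sketch.

```python
def republication_targets(old_descriptor, new_descriptor,
                          old_responsible, new_responsible):
    """Return the set of directory nodes to upload the descriptor to."""
    if old_descriptor != new_descriptor:
        # Content changed: every responsible node needs the new descriptor.
        return set(new_responsible)
    # Content unchanged: only nodes that newly became responsible need it.
    return set(new_responsible) - set(old_responsible)
```

An empty result means that nothing has to be uploaded for this consensus.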
- 7. Discard Expired Descriptors
- The current implementation lets directory nodes keep a descriptor for two
- days before discarding it. However, with the v2 design, descriptors are
- only valid for at most one day. Directory nodes should determine the
- validity of stored descriptors and discard them one hour after they have
- expired (to compensate for wrong clocks on clients).
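A directory node's clean-up pass might be sketched as follows. The store layout (a mapping from descriptor ID to expiry time in seconds) and the function name are assumptions made for illustration; how the expiry time is derived from a descriptor is not covered here.

```python
GRACE_SECONDS = 60 * 60  # keep descriptors for one hour after expiry


def purge_expired(store, now):
    """Return a new store without descriptors expired for an hour or more."""
    return {
        desc_id: expires_at
        for desc_id, expires_at in store.items()
        if now < expires_at + GRACE_SECONDS
    }
```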
- 8. Shorten Client-Side Descriptor Fetch History
- When clients try to download a hidden service descriptor, they memorize
- fetch requests to directory nodes for up to 15 minutes. This allows them
- to request all replicas of a descriptor to avoid bad or failing directory
- nodes, but without querying the same directory node twice.
- The downside is that a client that has requested a descriptor without
- success will not be able to find a hidden service that has been started
- during the 15 minutes following the client's last request.
- This can be improved by shortening the fetch history to only 5 minutes.
- This time should be sufficient to complete requests for all replicas of a
- descriptor, but without ending in an infinite request loop.
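A client-side fetch history with a configurable lifetime can be sketched as follows. The class and method names are hypothetical, and timestamps are passed in explicitly (in seconds) so that the behaviour is easy to follow; this is not meant to mirror the actual implementation.

```python
class FetchHistory:
    """Remembers recent descriptor fetch attempts per directory node."""

    def __init__(self, ttl_seconds=5 * 60):
        self.ttl_seconds = ttl_seconds
        self._attempts = {}  # (directory, descriptor_id) -> time of request

    def record(self, directory, descriptor_id, now):
        self._attempts[(directory, descriptor_id)] = now

    def may_request(self, directory, descriptor_id, now):
        # A node may be asked again once the remembered attempt has aged out.
        last = self._attempts.get((directory, descriptor_id))
        return last is None or now - last >= self.ttl_seconds

    def expire(self, now):
        # Drop entries that are older than the history lifetime.
        self._attempts = {
            key: t for key, t in self._attempts.items()
            if now - t < self.ttl_seconds
        }
```

With the default of 5 minutes, a directory node that was asked at time 0 can be asked again at 300 seconds, whereas a 15-minute history would still block the request at that point.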
- Compatibility:
- All proposed improvements are compatible with the currently implemented
- design as described in proposal 114.