Filename: 104-short-descriptors.txt
Title: Long and Short Router Descriptors
Version: $Revision$
Last-Modified: $Date$
Author: Nick Mathewson
Created:
Status: Open

Overview:

  This document proposes moving unused-by-clients information from regular
  router descriptors into a new "extra info" router descriptor.

Proposal:

  Some of the costliest fields in the current directory protocol are ones
  that no client actually uses.  In particular, the "read-history" and
  "write-history" fields are used only by the authorities for monitoring the
  status of the network.  If we took them out, the size of a compressed list
  of all the routers would fall by about 60%.  (No other disposable field
  would save much more than 2%.)

  We propose to remove these fields from descriptors, and have them
  uploaded to the authorities as part of a separate signed "extra info"
  document.  A hash of this document will be included in the regular
  descriptors.

  (We considered another design, where routers would generate and upload a
  short-form and a long-form descriptor.  Only the short-form descriptor would
  ever be used by anybody for routing.  The long-form descriptor would be
  used only for analytics and other tools.   We decided against this because
  well-behaved tools would need to download short-form descriptors too (as
  these would be the only ones indexed), and hence get redundant info. Badly
  behaved tools would download only long-form descriptors, and expose
  themselves to partitioning attacks and the like.)

Other disposable fields:

  Clients don't need these fields, but removing them doesn't help bandwidth
  enough to be worthwhile.
    contact (save about 1%)
    fingerprint (save about 3%)

  We could represent these fields more succinctly, but removing them would
  only save 1%.  (!)
    reject
    accept
  (Apparently, exit policies are highly compressible.)

  [Does size-on-disk matter to anybody? Some clients and servers don't
   have much disk, or have really slow disk (e.g. USB). And we don't
   store caches compressed right now. -RD]

Specification:

  1. Extra Info Format.

    An "extra info" descriptor contains the following fields:

    "extra-info" Nickname IP FINGERPRINT
        Identifies what router this is an extra info descriptor for.
        FINGERPRINT is encoded in hex, with no spaces.

    "published"
        As currently documented in dir-spec.txt

    "read-history"
    "write-history"
        As currently documented in dir-spec.txt

    "router-signature" NL Signature NL

        A signature of the PKCS1-padded hash of the entire extra info
        document, taken from the beginning of the "extra-info" line, through
        the newline after the "router-signature" line.  An extra info
        document is not valid unless the signature is performed with the
        identity key whose digest matches FINGERPRINT.

    The "extra-info" field is required and MUST appear first.  The
    router-signature field is required and MUST appear last.  All others are
    optional.  As for other documents, unrecognized fields must be ignored.
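
    The signed span and the header line described above can be sketched
    as follows.  This is only an illustration, not a reference
    implementation: the function names are hypothetical, and the use of
    SHA-1 with uppercase hex output is an assumption, since this document
    does not name the hash function.

```python
import hashlib
import string


def parse_extra_info_line(doc: str):
    """Return (nickname, ip, fingerprint) from the first line of the document."""
    parts = doc.split("\n", 1)[0].split(" ")
    if len(parts) != 4 or parts[0] != "extra-info":
        raise ValueError('"extra-info" must appear first, with three arguments')
    nickname, ip, fingerprint = parts[1:]
    # FINGERPRINT is hex with no spaces.
    if not fingerprint or any(c not in string.hexdigits for c in fingerprint):
        raise ValueError("fingerprint must be hex-encoded")
    return nickname, ip, fingerprint


def signed_portion(doc: str) -> str:
    """Text from the start of the "extra-info" line through the newline
    that ends the "router-signature" line."""
    if not doc.startswith("extra-info "):
        raise ValueError('"extra-info" must appear first')
    marker = "\nrouter-signature\n"
    idx = doc.find(marker)
    if idx < 0:
        raise ValueError('missing "router-signature" line')
    return doc[: idx + len(marker)]


def extra_info_digest(doc: str) -> str:
    """Hex digest of the signed portion (assumption: SHA-1, uppercase hex)."""
    return hashlib.sha1(signed_portion(doc).encode("ascii")).hexdigest().upper()
```

    Under these assumptions, the value returned by extra_info_digest() is
    what a router would place in its descriptor to refer to the extra-info
    document.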

  2. Existing formats

     Implementations that use "read-history" and "write-history" SHOULD
     accept router descriptors that contain them.  (Prior to 0.2.0.x, this
     information was encoded in ordinary router descriptors.)

     Add these fields to router descriptors:
       "extra-info-digest" DIGEST
          DIGEST is a hex-encoded digest of the router's extra-info document,
          as signed in the router's extra-info.  (If this field is absent,
          no extra-info-digest exists.)

       "caches-extra-info"
          Present if this router is a directory cache that provides
          extra-info documents.

  3. New communications rules

     Servers SHOULD generate and upload one extra-info document after each
     descriptor they generate and upload; no more, no less.  Servers MUST
     upload the new descriptor before they upload the new extra-info.

     Authorities receiving an extra-info document SHOULD verify all of the
     following:
       * They have a router descriptor for some server with a matching
         nickname, IP, and identity fingerprint.
       * That server's identity key has been used to sign the extra-info
         document.
       * The extra-info-digest field in the router descriptor matches
         the digest of the extra-info document.
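
     The three checks above could be combined roughly as in the sketch
     below.  The record types and field names are hypothetical, and
     signature verification is assumed to have happened elsewhere: the
     sketch only sees "signer_fingerprint", the fingerprint of the key
     that the signature verified against.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class RouterDescriptor:
    nickname: str
    ip: str
    fingerprint: str
    extra_info_digest: Optional[str]  # None if the field is absent


@dataclass
class ExtraInfo:
    nickname: str
    ip: str
    fingerprint: str
    digest: str              # digest of the signed portion of the document
    signer_fingerprint: str  # key the signature verified against (assumed input)


def authority_accepts(extra: ExtraInfo, known: Iterable[RouterDescriptor]) -> bool:
    """Return True if some known descriptor satisfies all three checks."""
    for desc in known:
        # Check 1: matching nickname, IP, and identity fingerprint.
        if (desc.nickname, desc.ip, desc.fingerprint.upper()) != (
            extra.nickname, extra.ip, extra.fingerprint.upper()
        ):
            continue
        # Check 2: the document was signed with that server's identity key.
        if extra.signer_fingerprint.upper() != desc.fingerprint.upper():
            continue
        # Check 3: the descriptor's extra-info-digest matches the document.
        if desc.extra_info_digest is None:
            continue
        if desc.extra_info_digest.upper() == extra.digest.upper():
            return True
    return False
```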

     Authorities SHOULD try to fetch extra-info documents from one another if
     they do not have one matching the digest declared in a router
     descriptor.

     Caches that are running locally with a tool that needs to use extra-info
     documents MAY download and store extra-info documents.  They should do
     so when they notice that the recommended descriptor has an
     extra-info-digest not matching any extra-info document they currently
     have.  (Caches not running on a host that needs to use extra-info
     documents SHOULD NOT download or cache them.)
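
     As a rough sketch, the cache rule above reduces to a small
     predicate.  The function and parameter names are hypothetical;
     "tool_needs_extra_info" stands in for whatever configuration option
     a cache would use to say that a local tool wants these documents.

```python
from typing import Iterable, Optional


def should_fetch_extra_info(
    tool_needs_extra_info: bool,
    recommended_digest: Optional[str],
    stored_digests: Iterable[str],
) -> bool:
    """Decide whether a cache should download an extra-info document."""
    # Caches with no local consumer SHOULD NOT download or cache these.
    if not tool_needs_extra_info:
        return False
    # A descriptor without an extra-info-digest gives nothing to fetch.
    if recommended_digest is None:
        return False
    # Fetch only when no stored document matches the declared digest.
    have = {d.upper() for d in stored_digests}
    return recommended_digest.upper() not in have
```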

  4. New URLs

     http://<hostname>/tor/extra/d/...
     http://<hostname>/tor/extra/fp/...
     http://<hostname>/tor/extra/all.z
        (As for /tor/server/ URLs: supports fetching extra-info documents
        by their digest, by the fingerprint of their servers, or all at
        once.  Only directory authorities are guaranteed to support these
        URLs.)

     http://<hostname>/tor/extra/authority.z
        (The extra-info document for this router.)

     Extra-info documents are uploaded to the same URLs as regular
     router descriptors.


Migration:

  For extra info approach:
     * First:
       * Authorities should accept extra info, and support downloading it.
       * Routers should upload bandwidth info once authorities accept it.
       * Caches should support an option to download and cache it, once
         authorities serve it.
       * Tools should be updated to use locally cached information.
         These tools include:
           lefkada's exit.py script.
           tor26's noreply script and general directory cache.
           https://nighteffect.us/tns/ for its graphs
           and check with or-talk for the rest, once it's time.

     * Once tools that want bandwidth info support fetching it:
       * Have routers stop including bandwidth info in their router
         descriptors.