Filename: 104-short-descriptors.txt
Title: Long and Short Router Descriptors
Author: Nick Mathewson
Created: Jan 2007
Status: Closed
Implemented-In: 0.2.0.x

Overview:

  This document proposes moving unused-by-clients information from regular
  router descriptors into a new "extra info" router descriptor.

Proposal:

  Some of the costliest fields in the current directory protocol are ones
  that no client actually uses. In particular, the "read-history" and
  "write-history" fields are used only by the authorities for monitoring the
  status of the network. If we took them out, the size of a compressed list
  of all the routers would fall by about 60%. (No other disposable field
  would save much more than 2%.)

  We propose to remove these fields from descriptors, and have them
  uploaded as a part of a separate signed "extra info" to the authorities.
  This document will be signed. A hash of this document will be included in
  the regular descriptors.

  (We considered another design, where routers would generate and upload a
  short-form and a long-form descriptor. Only the short-form descriptor
  would ever be used by anybody for routing. The long-form descriptor would
  be used only for analytics and other tools. We decided against this
  because well-behaved tools would need to download short-form descriptors
  too (as these would be the only ones indexed), and hence get redundant
  info. Badly behaved tools would download only long-form descriptors, and
  expose themselves to partitioning attacks.)

Other disposable fields:

  Clients don't need these fields, but removing them doesn't help bandwidth
  enough to be worthwhile.
    contact (save about 1%)
    fingerprint (save about 3%)

  We could represent these fields more succinctly, but removing them would
  only save 1%. (!)
    reject
    accept
  (Apparently, exit policies are highly compressible.)

  [Does size-on-disk matter to anybody? Some clients and servers don't
  have much disk, or have really slow disk (e.g. USB). And we don't
  store caches compressed right now. -RD]

Specification:

  1. Extra Info Format.

    An "extra info" descriptor contains the following fields:

    "extra-info" Nickname Fingerprint
       Identifies what router this is an extra info descriptor for.
       Fingerprint is encoded in hex (using upper-case letters), with
       no spaces.

    "published"
       As currently documented in dir-spec.txt. It MUST match the
       "published" field of the descriptor published at the same time.

    "read-history"
    "write-history"
       As currently documented in dir-spec.txt. Optional.

    "router-signature" NL Signature NL
       A signature of the PKCS1-padded hash of the entire extra info
       document, taken from the beginning of the "extra-info" line, through
       the newline after the "router-signature" line. An extra info
       document is not valid unless the signature is performed with the
       identity key whose digest matches FINGERPRINT.

    The "extra-info" field is required and MUST appear first. The
    router-signature field is required and MUST appear last. All others are
    optional. As for other documents, unrecognized fields must be ignored.

  2. Existing formats

    Implementations that use "read-history" and "write-history" SHOULD
    continue accepting router descriptors that contain them. (Prior to
    0.2.0.x, this information was encoded in ordinary router descriptors;
    in any case they have always been listed as opt, so they should be
    accepted anyway.)

    Add these fields to router descriptors:

    "extra-info-digest" Digest
       "Digest" is a hex-encoded digest (using upper-case characters)
       of the router's extra-info document, as signed in the router's
       extra-info. (If this field is absent, no extra-info-digest
       exists.)

    "caches-extra-info"
       Present if this router is a directory cache that provides
       extra-info documents, or an authority that handles extra-info
       documents.

    (Since implementations before 0.1.2.5-alpha required that the "opt"
    keyword precede any unrecognized entry, these keys MUST be preceded
    with "opt" until 0.1.2.5-alpha is obsolete.)

  3. New communications rules

    Servers SHOULD generate and upload one extra-info document after each
    descriptor they generate and upload; no more, no less. Servers MUST
    upload the new descriptor before they upload the new extra-info.

    Authorities receiving an extra-info document SHOULD verify all of the
    following:
      * They have a router descriptor for some server with a matching
        nickname and identity fingerprint.
      * That server's identity key has been used to sign the extra-info
        document.
      * The extra-info-digest field in the router descriptor matches
        the digest of the extra-info document.
      * The published fields in the two documents match.

    Authorities SHOULD drop extra-info documents that do not meet these
    criteria.

    Extra-info documents MAY be uploaded as part of the same HTTP post as
    the router descriptor, or separately. Authorities MUST accept both
    methods.

    Authorities SHOULD try to fetch extra-info documents from one another
    if they do not have one matching the digest declared in a router
    descriptor.

    Caches that are running locally with a tool that needs to use
    extra-info documents MAY download and store extra-info documents. They
    should do so when they notice that the recommended descriptor has an
    extra-info-digest not matching any extra-info document they currently
    have. (Caches not running on a host that needs to use extra-info
    documents SHOULD NOT download or cache them.)

  4. New URLs

    http://<hostname>/tor/extra/d/...
    http://<hostname>/tor/extra/fp/...
    http://<hostname>/tor/extra/all[.z]
       (As for /tor/server/ URLs: supports fetching extra-info
       documents by their digest, by the fingerprint of their servers,
       or all at once. When serving by fingerprint, we serve the
       extra-info that corresponds to the descriptor we would serve by
       that fingerprint. Only directory authorities are guaranteed to
       support these URLs.)

    http://<hostname>/tor/extra/authority[.z]
       (The extra-info document for this router.)

    Extra-info documents are uploaded to the same URLs as regular
    router descriptors.

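
The four checks that section 3 asks authorities to perform on an uploaded extra-info document can be sketched roughly as below. This is an illustration under stated assumptions, not the behavior of any real implementation: the RouterDescriptor class, the function names, and the signature_ok callback are all hypothetical, and the choice of SHA-1 for the digest is an assumption (this proposal does not name the hash function). Signature verification is left to the caller-supplied callback.

```python
import hashlib
from dataclasses import dataclass

SIG_LINE = "\nrouter-signature\n"


@dataclass
class RouterDescriptor:
    """Hypothetical, already-parsed view of a router descriptor."""
    nickname: str
    fingerprint: str        # upper-case hex, no spaces
    published: str          # value of the "published" line
    extra_info_digest: str  # value of the "extra-info-digest" line


def extra_info_digest(doc: str) -> str:
    """Digest of the text from the start of the "extra-info" line through
    the newline after the "router-signature" line.
    Assumption: SHA-1, rendered as upper-case hex."""
    end = doc.index(SIG_LINE) + len(SIG_LINE)
    return hashlib.sha1(doc[:end].encode("ascii")).hexdigest().upper()


def parse_fields(doc: str) -> dict:
    """Pick out the lines this sketch needs; unrecognized lines are ignored."""
    fields = {}
    for line in doc.splitlines():
        keyword, _, rest = line.partition(" ")
        if keyword in ("extra-info", "published"):
            fields[keyword] = rest
    return fields


def should_accept(doc: str, desc: RouterDescriptor, signature_ok) -> bool:
    """Return True only if every check listed in section 3 passes.
    signature_ok(doc, fingerprint) is a hypothetical callback that verifies
    the signature against the identity key matching that fingerprint."""
    # "extra-info" must come first and a "router-signature" line must exist.
    if not doc.startswith("extra-info ") or SIG_LINE not in doc:
        return False
    fields = parse_fields(doc)
    nickname, _, fingerprint = fields["extra-info"].partition(" ")
    return (
        nickname == desc.nickname                              # matching nickname
        and fingerprint == desc.fingerprint                    # matching identity
        and signature_ok(doc, desc.fingerprint)                # signed by that key
        and extra_info_digest(doc) == desc.extra_info_digest   # digest matches
        and fields.get("published") == desc.published          # published matches
    )
```

A document that fails any one of the checks makes should_accept return False, which corresponds to the "SHOULD drop" rule above.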
Migration:

  For extra info approach:
    * First:
      * Authorities should accept extra info, and support serving it.
      * Routers should upload extra info once authorities accept it.
      * Caches should support an option to download and cache it, once
        authorities serve it.
      * Tools should be updated to use locally cached information.
        These tools include:
          lefkada's exit.py script.
          tor26's noreply script and general directory cache.
          https://nighteffect.us/tns/ for its graphs
          and check with or-talk for the rest, once it's time.

    * Set a cutoff time for including bandwidth in router descriptors, so
      that tools that use bandwidth info know that they will need to fetch
      extra info documents.

    * Once tools that want bandwidth info support fetching extra info:
      * Have routers stop including bandwidth info in their router
        descriptors.
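
For a tool being updated to fetch extra-info documents, the request URLs from section 4 might be built as in the sketch below. The function name and its parameters are hypothetical. Joining several digests or fingerprints with "+" is an assumption carried over from the /tor/server/ URL convention that section 4 refers to, and the sketch appends ".z" only to the two URLs where this proposal lists it.

```python
def extra_info_url(host, kind, ids=(), compressed=False):
    """Build a /tor/extra/ URL (hypothetical helper).

    kind is "d" (by digest), "fp" (by server fingerprint), "all",
    or "authority".  ids holds hex digests or fingerprints and is
    used only for "d" and "fp".
    """
    base = "http://" + host + "/tor/extra/"
    if kind in ("d", "fp"):
        if not ids:
            raise ValueError("need at least one digest or fingerprint")
        if compressed:
            # The proposal shows [.z] only on "all" and "authority".
            raise ValueError("this sketch adds .z only to all/authority")
        # Assumption: multiple identifiers are joined with "+".
        return base + kind + "/" + "+".join(ids)
    if kind in ("all", "authority"):
        if ids:
            raise ValueError(kind + " takes no identifiers")
        return base + kind + (".z" if compressed else "")
    raise ValueError("unknown kind: " + repr(kind))
```

Since only directory authorities are guaranteed to support these URLs, a tool would normally pass an authority's address as host.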