- Right now as I understand it, there are four big scaling problems heading
 
- our way:
 
- 1) Clients need to learn all the relay descriptors they could use. That's
 
- a lot of bytes through a potentially small pipe.
 
- 2) Relays need to hold open TCP connections to most other relays.
 
- 3) Clients need to learn the whole networkstatus. Even using v3, as
 
- the network grows that will become unwieldy.
 
- 4) Dir mirrors need to mirror all the relay descriptors; eventually this
 
- will get big too.
 
- Here's my plan.
 
- --------------------------------------------------------------------
 
- Piece one: download O(1) descriptors rather than O(n) descriptors.
 
- We need to change our circuit extend protocol so it fetches a relay
 
- descriptor at every 'extend' operation:
 
-   - Client fetches networkstatus, picks guards, connects to one.
 
-   - Client picks middle hop out of networkstatus, asks guard for
 
-     its descriptor, then extends to it.
 
-   - Client picks exit hop out of networkstatus, asks middle hop
 
-     for its descriptor, then extends to it. Done.
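
The three steps above can be sketched as a loop that performs one descriptor fetch per extend. This is only an illustration: `pick_hop`, `fetch_descriptor`, and `extend` are hypothetical callbacks standing in for networkstatus selection, the descriptor request sent to the current last hop, and the extend operation itself.

```python
def build_circuit(guard, pick_hop, fetch_descriptor, extend, hops=3):
    """Build a path of `hops` relays, fetching one descriptor per extend.

    guard: id of the guard the client is already connected to.
    pick_hop(exclude): hypothetical; picks the next relay id from the
        networkstatus, avoiding relays already in the path.
    fetch_descriptor(via, relay_id): hypothetical; asks the current last
        hop `via` for relay_id's descriptor.
    extend(path, descriptor): hypothetical; extends the circuit by one hop.
    """
    path = [guard]
    while len(path) < hops:
        next_hop = pick_hop(exclude=path)
        # One fetch per extend, always requested from the current last hop.
        descriptor = fetch_descriptor(via=path[-1], relay_id=next_hop)
        extend(path, descriptor)
        path.append(next_hop)
    return path
```

With three hops the client fetches exactly two descriptors, regardless of how many relays the networkstatus lists.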
 
- The client needs to ask for the descriptor even if it already has a
 
- copy, because otherwise we leak too much. Also, the descriptor needs to
 
- be padded to some large (but not too large) size to prevent the middle
 
- hops from guessing which descriptor it is based on its size.
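
A minimal sketch of fixed-size padding, so that every descriptor response has the same length on the wire. The 8192-byte target, the 4-byte length prefix, and the function names are all assumptions made for illustration; the proposal does not fix a size or an encoding.

```python
PADDED_SIZE = 8192  # assumed target size, not specified by the proposal


def pad_descriptor(raw: bytes, size: int = PADDED_SIZE) -> bytes:
    """Prefix the descriptor with its length, then zero-pad to `size` bytes."""
    if len(raw) + 4 > size:
        raise ValueError("descriptor too large for the padded size")
    prefix = len(raw).to_bytes(4, "big")
    return prefix + raw + b"\x00" * (size - 4 - len(raw))


def unpad_descriptor(padded: bytes) -> bytes:
    """Recover the original descriptor from a padded blob."""
    length = int.from_bytes(padded[:4], "big")
    return padded[4:4 + length]
```

Under this scheme a short descriptor and a long one produce blobs of identical length, so an observer sees only the padded size.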
 
- The first step towards this is to instrument the current code to see
 
- how much of a win this would actually be -- I am guessing it is already
 
- a win even with the current number of descriptors.
 
- We also would need to assign the 'Exit' flag more usefully, and make
 
- clients pay attention to it when picking their last hop, since they
 
- don't actually know the exit policies of the relays they're choosing from.
 
- We also need to think harder about other implications -- for example,
 
- a relay with a tiny exit policy won't get the Exit flag, and thus won't
 
- ever get picked as an exit relay. Plus, our "enclave exit" model is out
 
- the window unless we figure out a cool trick.
 
- More generally, we'll probably want to compress the descriptors that we
 
- send back; maybe 8k is a good upper bound? I wonder if we could ask for
 
- several descriptors, and bundle back all of the ones that fit in the 8k?
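
One way the bundling idea could work is a greedy prefix: keep adding requested descriptors while the compressed bundle stays within the limit. The sketch below assumes zlib compression, a 4-byte length prefix per descriptor, and a hard 8192-byte cap; none of these are decided in the text above, and the function names are hypothetical.

```python
import zlib

LIMIT = 8 * 1024  # the "8k" upper bound floated above, taken as a hard cap


def _frame(desc: bytes) -> bytes:
    """Length-prefix a descriptor so bundles can be split unambiguously."""
    return len(desc).to_bytes(4, "big") + desc


def bundle_descriptors(requested, limit=LIMIT):
    """Return (compressed_blob, count) for the longest prefix that fits."""
    framed = b""
    blob = zlib.compress(framed)
    count = 0
    for desc in requested:
        candidate = zlib.compress(framed + _frame(desc))
        if len(candidate) > limit:
            break  # this descriptor would push the bundle past the cap
        framed += _frame(desc)
        blob = candidate
        count += 1
    return blob, count


def unbundle_descriptors(blob: bytes):
    """Inverse of bundle_descriptors: decompress and split on the prefixes."""
    data = zlib.decompress(blob)
    out, i = [], 0
    while i < len(data):
        n = int.from_bytes(data[i:i + 4], "big")
        out.append(data[i + 4:i + 4 + n])
        i += 4 + n
    return out
```

Recompressing on every step is wasteful but keeps the size check exact; a real implementation would probably estimate instead.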
 
- We'd also want to put the load balancing weights into the networkstatus,
 
- so clients can choose fast nodes more often without needing to see the
 
- descriptors. This is a good opportunity for the authorities to be able
 
- to put "more accurate" weights in if they learn to detect attacks. It
 
- also means we should consider running automated audits to make sure the
 
- authorities aren't trying to snooker everybody.
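
With weights in the networkstatus, client-side selection reduces to a weighted random choice. A minimal sketch, assuming the weights arrive as a mapping from relay id to a non-negative number (the `weights` shape and the function name are hypothetical):

```python
import random


def pick_weighted(weights, rng=random):
    """Pick one relay id with probability proportional to its weight.

    weights: dict mapping relay id -> load balancing weight as listed in
        the networkstatus (hypothetical representation).
    rng: the random module or a random.Random instance.
    """
    relays = list(weights)
    return rng.choices(relays, weights=[weights[r] for r in relays], k=1)[0]
```

For example, with `{"fast": 9, "slow": 1}` the relay `fast` should be chosen roughly nine times as often as `slow`, and a relay with weight 0 is never chosen.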
 
- I'm aiming to get Peter Palfrader to tackle this problem in mid 2008,
 
- but I bet he could use some help.
 
- --------------------------------------------------------------------
 
- Piece two: inter-relay communication uses UDP
 
- If relays send packets to/from other relays via UDP, they don't need a
 
- new file descriptor (socket) for each such link. We'll still need to keep
 
- state for each link, but we won't max out on sockets.
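
The socket-versus-state distinction can be illustrated with a single UDP socket that serves every peer while a table keeps per-link state. This is a sketch only: `LinkTable`, `serve`, and the counters kept per link are hypothetical, and everything a real transport needs (reliability, congestion control, per-link crypto) is left out.

```python
import socket


class LinkTable:
    """Per-peer link state keyed by (host, port); no socket per peer."""

    def __init__(self):
        self.links = {}

    def on_datagram(self, addr, payload):
        # Create state for a new peer on first contact, then update it.
        state = self.links.setdefault(addr, {"packets": 0, "bytes": 0})
        state["packets"] += 1
        state["bytes"] += len(payload)
        return state


def serve(bind_addr, table, max_datagrams):
    """Receive datagrams from any number of peers on one UDP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(bind_addr)
        for _ in range(max_datagrams):
            payload, addr = sock.recvfrom(65535)
            table.on_datagram(addr, payload)
```

The table grows with the number of peers, but the process holds one file descriptor no matter how many relays it talks to.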
 
- Clearly a lot more work needs to be done here. Ian Goldberg has a student
 
- who has been working on it, and if all goes well we'll be chipping in
 
- some funding to continue that. Also, Camilo Viecco has been doing his
 
- PhD thesis on it.
 
- --------------------------------------------------------------------
 
- Piece three: networkstatus documents get partitioned
 
- While the authorities should be expected to be able to handle learning
 
- about all the relays, there's no reason the clients or the mirrors need
 
- to. Authorities should put a cap on the number of relays listed in a
 
- single networkstatus, and split them when they get too big.
 
- We'd need a good way to have each authority come to the same conclusion
 
- about which partition a given relay goes into.
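
One simple candidate is to hash each relay's identity fingerprint and reduce it modulo the number of partitions: any authority that agrees on the fingerprint and the partition count gets the same answer without extra coordination. The sketch below assumes SHA-256 and hex fingerprints, and the function names are hypothetical. It does not enforce the cap exactly (hashing only balances partitions on average), and it does not stop a relay from choosing its key to land in a particular partition.

```python
import hashlib


def partitions_needed(num_relays: int, cap: int) -> int:
    """Smallest partition count that keeps the average size within `cap`."""
    return max(1, -(-num_relays // cap))  # ceiling division


def partition_for(fingerprint: str, num_partitions: int) -> int:
    """Deterministically map a relay fingerprint to a partition index."""
    digest = hashlib.sha256(fingerprint.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Changing the partition count reshuffles most relays under this scheme, so a real design would want something more stable across splits.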
 
- Directory mirrors would then mirror all the relay descriptors in their
 
- partition. This is compatible with 'piece one' above, since clients in
 
- a given partition will only ask about descriptors in that partition.
 
- More complex versions of this design would involve overlapping partitions,
 
- but that would seem to start contradicting other parts of this proposal
 
- right quick.
 
- Nobody is working on this piece yet. It's hard to say when we'll need
 
- it, but it would be nice to have some more thought on it before the week
 
- that we need it.
 
- --------------------------------------------------------------------
 