- \documentclass{article}
- %\usepackage{palatcm}
- \usepackage{fancyhdr}
- \usepackage{color}
- \usepackage{graphicx}
- \usepackage{hyperref}
- \hypersetup{colorlinks, citecolor=MyDarkRed, filecolor=MyDarkBlue, linkcolor=MyDarkRed, urlcolor=MyDarkBlue}
- \definecolor{MyDarkBlue}{rgb}{0, 0.0, 0.45}
- \definecolor{MyDarkRed}{rgb}{0.45, 0.0, 0}
- \definecolor{MyDarkGreen}{rgb}{0, 0.45, 0}
- \definecolor{MyLightGray}{gray}{.90}
- \definecolor{MyLightGreen}{rgb}{0.5, 0.99, 0.5}
- \newcommand{\thetitle}{Performance Improvements on Tor}
- \title{\thetitle}
- %% Please add your name in here if you contribute
- \author{Steven J. Murdoch}
- \pagestyle{fancy}
- \fancyhf{}
- \fancyhead[C]{\thetitle}
- \fancyfoot[C]{\thepage}
- \begin{document}
- \thispagestyle{plain}
-
- \maketitle
- \section{Altering node selection algorithm}
- Currently Tor selects nodes with a probability proportional to their bandwidth contribution to the network; however, this may not be the optimal algorithm.
- Murdoch and Watson investigated the performance impact of different node selection algorithms, and derived a formula for estimating average latency $T$:
- \begin{equation}
- T = \sum_{i=1}^n q_i t_i = \sum_{i=1}^n \frac{q_i x_i (2 - q_i x_i \Lambda)}{2 (1 - q_i x_i \Lambda)}
- \label{eqn:waiting}
- \end{equation}
- where $q_i$ is the probability of the $i$th node (out of $n$ nodes) being selected, $t_i$ is the average latency at the $i$th node, $x_i$ is the reciprocal of the $i$th node's bandwidth, and $\Lambda$ is the total network load.
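As a minimal sketch of how Equation~\ref{eqn:waiting} can be evaluated numerically, the following Python function computes $T$ from a list of selection probabilities and node bandwidths. The function name, the three example bandwidths, and the choice of units (load and bandwidth measured in the same units) are assumptions made for illustration; they are not taken from the Tor directory.

```python
def average_latency(q, x, load):
    """Evaluate T = sum_i q_i x_i (2 - rho_i) / (2 (1 - rho_i)),
    where rho_i = q_i * x_i * load is the utilisation of node i."""
    total = 0.0
    for q_i, x_i in zip(q, x):
        rho = q_i * x_i * load
        if rho >= 1.0:
            # The node receives more traffic than it can serve, so the
            # infinite queue grows without bound under this model.
            return float("inf")
        total += q_i * x_i * (2.0 - rho) / (2.0 * (1.0 - rho))
    return total


# Hypothetical three-node network (illustrative bandwidths only).
bandwidths = [100.0, 50.0, 10.0]
x = [1.0 / b for b in bandwidths]            # x_i = 1 / bandwidth_i
capacity = sum(bandwidths)
q_tor = [b / capacity for b in bandwidths]   # bandwidth-proportional selection
load = 0.5 * capacity                        # network at 50% utilisation

latency = average_latency(q_tor, x, load)
```

With bandwidth-proportional selection every node has the same utilisation $q_i x_i \Lambda$, equal to the ratio of load to capacity, so each term of the sum is identical.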
- This calculation is subject to a number of assumptions.
- In particular, it assumes that Tor nodes have infinite-length queues and that input traffic is Poisson distributed.
- In practice, however, Tor nodes have finite-length queues (which limit network load), and the distribution of input cells is not known.
- Unfortunately, these assumptions are necessary to apply standard queueing theory results.
- Despite the simplifications made to the network model, results derived from it may still be useful.
- This is especially the case because it models the entire network, whereas experiments can feasibly change the behaviour of only a few clients.
- The formula is also amenable to mathematical analysis such as non-linear optimization.
- To find the optimum node selection probabilities, I used a hill-climbing algorithm to minimize network latency, with a Tor directory snapshot as input.
- The result depends on the network load relative to overall capacity.
- As load approaches capacity, the optimum selection probabilities converge to those currently used by Tor: each node is selected with probability equal to its bandwidth as a fraction of total network capacity.
- However, as load drops, the optimized selection algorithm favours slow nodes less and faster nodes more; many nodes are not used at all.
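The optimization described above can be sketched as a simple random hill climb. This is an illustrative reconstruction under stated assumptions, not the code used to produce the figures: the step size, iteration count, random seed, and the three-node example network are all hypothetical, and the latency function repeats Equation~\ref{eqn:waiting} so that the block is self-contained.

```python
import random


def average_latency(q, x, load):
    """Average latency T from the queueing formula; infinite if any node is overloaded."""
    total = 0.0
    for q_i, x_i in zip(q, x):
        rho = q_i * x_i * load
        if rho >= 1.0:
            return float("inf")
        total += q_i * x_i * (2.0 - rho) / (2.0 * (1.0 - rho))
    return total


def hill_climb(bandwidths, load, steps=20000, delta=0.01, seed=0):
    """Shift probability mass between random pairs of nodes, keeping only
    moves that reduce T. Starts from bandwidth-proportional selection."""
    rng = random.Random(seed)
    x = [1.0 / b for b in bandwidths]
    capacity = sum(bandwidths)
    q = [b / capacity for b in bandwidths]
    best = average_latency(q, x, load)
    for _ in range(steps):
        i, j = rng.sample(range(len(q)), 2)
        move = min(delta * rng.random(), q[i])  # never drive q[i] below zero
        if move <= 0.0:
            continue
        old_i, old_j = q[i], q[j]
        q[i], q[j] = old_i - move, old_j + move
        t = average_latency(q, x, load)
        if t < best:
            best = t                      # keep the improving move
        else:
            q[i], q[j] = old_i, old_j     # revert the move
    return q, best


# Hypothetical network at 20% load (32 out of a capacity of 160).
bandwidths = [100.0, 50.0, 10.0]
initial = average_latency([b / 160.0 for b in bandwidths],
                          [1.0 / b for b in bandwidths], 32.0)
q_opt, t_opt = hill_climb(bandwidths, load=32.0)
```

In this toy example the search moves probability away from the slowest node and towards the fastest, which matches the qualitative behaviour described above for low network load.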
- \begin{figure}
- \includegraphics[width=\textwidth]{node-selection/optimum-selection-probabilities}
- \caption{Optimum node selection probabilities for a variety of network loads. Tor is currently at around 50\% utilization. The node selection probabilities currently used by Tor are shown in black.}
- \end{figure}
- \begin{figure}
- \includegraphics[width=\textwidth]{node-selection/relative-selection-probabilities}
- \caption{Difference between Tor's current node selection probabilities and the optimum, for a variety of network loads. For Tor's current network load ($\approx 50$\%) shown in pink, the slowest nodes are not used at all, and the slower nodes are favoured less.}
- \end{figure}
- \section{TLS application record overhead reduction}
- OpenSSL will, by default, insert an empty TLS application record before any one which contains data.
- This is to prevent an attack in which someone who has partial control over the plaintext of a TLS stream can also confirm guesses about the plaintext which he does not control.
- By including an empty application record, which incorporates a MAC, the attacker is made unable to control the CBC initialization vector, and hence does not have control of the input to the encryption function\footnote{\url{http://www.openssl.org/~bodo/tls-cbc.txt}}.
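The guess-confirmation attack and the effect of the extra record can be illustrated with a toy model. The sketch below is not TLS and not OpenSSL code: the ``block cipher'' is a truncated HMAC used only in the encryption direction, each record is a single block, and the empty record is modelled as one block whose contents the attacker cannot predict. All class and function names are hypothetical.

```python
import hashlib
import hmac
import os

BLOCK = 16  # block size in bytes


def toy_encrypt_block(key, block):
    # Toy stand-in for a block cipher (assumption: truncated HMAC-SHA256).
    # It is NOT a real cipher and cannot decrypt; it only needs to be a
    # keyed function the attacker cannot compute.
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def xor(a, b):
    return bytes(p ^ q for p, q in zip(a, b))


class CbcSender:
    """CBC chaining where each record's IV is the previous ciphertext block."""

    def __init__(self, empty_record_first):
        self.key = os.urandom(16)
        self.last = os.urandom(BLOCK)  # last ciphertext block seen on the wire
        self.empty_record_first = empty_record_first

    def send(self, block):
        if self.empty_record_first:
            # Model of the empty record: a block the attacker cannot predict
            # (in TLS it would carry the MAC) is encrypted first, so the IV
            # changes after the attacker has chosen the plaintext.
            unpredictable = os.urandom(BLOCK)
            self.last = toy_encrypt_block(self.key, xor(unpredictable, self.last))
        iv = self.last
        self.last = toy_encrypt_block(self.key, xor(block, iv))
        return iv, self.last  # both values are visible on the wire


def confirm_guess(sender, iv_secret, c_secret, guess):
    # The attacker cancels the IV he expects to be used next, so that the
    # cipher input equals guess XOR iv_secret.
    chosen = xor(xor(guess, iv_secret), sender.last)
    _, ciphertext = sender.send(chosen)
    return ciphertext == c_secret


def attack_succeeds(empty_record_first, guess, secret=b"secret-cell-data"):
    sender = CbcSender(empty_record_first)
    iv_secret, c_secret = sender.send(secret)
    return confirm_guess(sender, iv_secret, c_secret, guess)
```

Without the extra record, a correct guess reproduces the observed ciphertext block and a wrong guess does not. With the extra record, even a correct guess is not confirmed, because the IV actually used differs from the one the attacker cancelled.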
- This application record does introduce an appreciable overhead.
- Most Tor cells are sent in application records of their own, giving application records of 512 bytes (cell) $+$ 20 bytes (MAC) $+$ 12 bytes (TLS padding) $+$ 5 bytes (TLS application record header) $=$ 549 bytes.
- The empty application records contain only 20 bytes (MAC) $+$ 12 bytes (TLS padding) $+$ 5 bytes (TLS application record header) $=$ 37 bytes.
- There is also a 20-byte IP header and a 32-byte TCP header.
- Thus the overhead saved by removing the empty TLS application record itself is $37 / (549 + 37 + 20 + 32) = 5.8\%$.
- This calculation assumes that the same number of IP packets will be sent, because Tor currently sends packets containing only one cell, which are far smaller than the path MTU.
- If Tor were to pack cells optimally into packets, then removing the empty application records would also reduce the number of packets, and hence TCP/IP headers, that needed to be sent.
- The reduction in TCP/IP header overhead would be $37/(549 + 37) = 6.3\%$.
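The arithmetic above can be checked with a few lines of Python. The constant names are chosen for illustration; the byte counts are those quoted in the text.

```python
# Sizes in bytes, as quoted in the text.
CELL = 512
MAC = 20
TLS_PADDING = 12
TLS_HEADER = 5
IP_HEADER = 20
TCP_HEADER = 32

data_record = CELL + MAC + TLS_PADDING + TLS_HEADER   # record carrying one cell
empty_record = MAC + TLS_PADDING + TLS_HEADER         # empty record

# Saving when the number of IP packets stays the same (one cell per packet).
saving_same_packets = empty_record / (
    data_record + empty_record + IP_HEADER + TCP_HEADER
)

# Ratio used for the case where cells are packed optimally into packets.
saving_packed = empty_record / (data_record + empty_record)

print(data_record, empty_record)        # 549 37
print(f"{saving_same_packets:.1%}")     # 5.8%
print(f"{saving_packed:.1%}")           # 6.3%
```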
- Of course, the empty application record was inserted for a reason -- to prevent an attack on the CBC mode of operation used by TLS, so before removing it we must be confident the attack does not apply to Tor.
- Ben Laurie (one of the OpenSSL developers) concluded that, in his opinion, Tor could safely remove the insertion of empty TLS application records\footnote{\url{http://archives.seul.org/or/dev/Dec-2008/msg00005.html}}.
- I was able to come up with only certificational weaknesses (discussed in the above analysis), which are expensive to exploit and give little information to the attacker.
- To be successful, the attacker must have full control of the plaintext application record before the one he wishes to guess.
- Tor makes this difficult because all cells where the payload is controlled by the attacker are prepended with a two byte circuit ID, unknown to the attacker.
- Also, because the majority of cells sent in Tor are encrypted by a key not known by the attacker, the probability that an attacker can guess what a cell might be is extremely small.
- The exception is a padding cell, which has no circuit ID and a zero-length payload; however, Tor does not currently send padding cells, other than as a periodic keep-alive.
- \end{document}