  1. \documentclass[11pt]{article}
  2. \hoffset=-.25in
  3. \voffset=-.25in
  4. \oddsidemargin=-.4in
  5. \evensidemargin=0in
  6. \topmargin=-.5in
  7. \textwidth=7.8in
  8. \textheight=8.5in
  9. %\usepackage[a4paper,landscape]{geometry}
  10. \usepackage{amsmath,amsthm,amsfonts,amssymb}
  11. \usepackage{bussproofs}
  12. \usepackage{pdflscape}
  13. \usepackage{hyperref}
  14. \usepackage{booktabs}
  15. \usepackage{subcaption}
  16. \usepackage[parfill]{parskip}
  17. \usepackage{mathtools}
  18. \DeclarePairedDelimiter{\ceil}{\lceil}{\rceil}
  19. \usepackage{algorithm2e}
  20. \usepackage{times}
  21. %%Tikz code%%
  22. \usepackage{tikz}
  23. \usetikzlibrary{arrows,snakes,shapes, decorations.markings}
  24. \tikzset{quartarr/.style={decoration={markings, mark=at position 0.2 with {\arrow[line width=1.5pt]{angle 90}}}, postaction={decorate}}}
  25. \tikzset{midarr/.style={decoration={markings, mark=at position 0.5 with {\arrow[line width=1.5pt]{angle 90}}}, postaction={decorate}}}
  26. \newtheorem{thm}{Theorem}
  27. \theoremstyle{definition}
  28. \newtheorem{defn}{Definition}
  29. \def\Z{\mathbb Z}
  30. \def\R{\mathbb R}
  31. \def\F{\mathbb F}
  32. \def\C{\mathbb C}
  33. \def\A{\mathbb A}
  34. \def\N{\mathbb N}
  35. \def\Q{\mathbb Q}
  36. %% HoTT shortcuts %%
  37. \def\ctx{\texttt{ctx}}
  38. \def\ap{\texttt{ap}}
  39. \def\pred{\texttt{pred}}
  40. \def\succ{\texttt{succ}}
  41. \def\map{\texttt{map}}
  42. \def\nil{\texttt{nil}}
  43. \def\cons{\texttt{cons}}
  44. \def\List{\texttt{List}}
  45. \def\type{\texttt{Type}}
  46. \def\bool{\texttt{Bool}}
  47. \def\leaves{\texttt{leaves}}
  48. \def\inl{\texttt{inl}}
  49. \def\inr{\texttt{inr}}
  50. \def\0{{\bf 0}}
  51. \def\1{{\bf 1}}
  52. \def\ind{\texttt{ind}}
  53. \def\rec{\texttt{rec}}
  54. \def\refl{\texttt{refl}}
  55. \def\id{\texttt{id}}
  56. \def\db{\texttt{db}}
  57. %\def\span{{\bf span}}
  58. \def\U{\mathcal{U}}
  59. \begin{document}
  60. \noindent
  61. {\sffamily CrySP Research Lab \hfill \bf\sffamily Cecylia Bocovich}
  62. \bigskip\medskip
  63. \centerline{{\large\bf\sffamily Slitheen Documentation} \hfill Last updated: \today}
  64. \section{High Level Overview}
  65. \subsection{Tagging Procedure}
  66. Slitheen uses a tagging procedure very similar to that of Telex~\cite{wustrow2011}, but with a small modification to detect a MiTM or RAD attack. This tagging procedure requires a slight modification to TLSv1.2, outlined as follows:
  67. \begin{figure}[h]
  68. \centering
  69. \includegraphics[width=.75\textwidth]{tlsmods}
  70. \caption{Modifications to TLSv1.2 handshake}
  71. \end{figure}
  72. To implement these changes, we have made slight modifications to the OpenSSL source code:
  73. \texttt{git clone -b slitheen git://git-crysp.uwaterloo.ca/openssl}.
  74. These modifications consist mostly of optional callbacks and were made by changing as little source code as possible, to ease code maintenance.
  75. \subsubsection{Tag generation}
  76. The tag is generated by computing a 21-byte ECDH private key $s$ for the client (along with the corresponding 21-byte public key, $g^s$). We used a custom curve to generate public keys that fit in 21 bytes of the ClientHello random nonce. First, the client randomly chooses between the curve and its twist, in order to maximize the sampling of points that fill the first 21 bytes of the tag. The relay station checks the tag against both the original curve and the twist when verifying it.
  77. The client's public key is concatenated with a 7-byte value, $H_1(g^{rs} || \chi)$ where $g^{rs}$ is computed by raising the relay station's public key $g^r$ to the client's private key $s$. The context string $\chi$ is just the 4-byte server IP address concatenated with the first 4 bytes of the ClientHello random nonce (in network byte order). The hash function, $H_1$, is the first 7 bytes of SHA256 output, while $H_2$ is the last 16 bytes. The shared secret key between the client and the relay station is computed as $k_{sh} = H_2(g^{rs}||\chi)$.
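The hash-and-concatenate step above can be sketched in Python as follows. This is a minimal illustration rather than the Slitheen implementation: the function name \texttt{slitheen\_tag} is hypothetical, the byte encoding of the shared secret $g^{rs}$ is assumed to be supplied by the caller (the custom curve arithmetic is out of scope), and the inputs at the bottom are random placeholders.

```python
import hashlib
import os
import socket


def slitheen_tag(client_pub: bytes, shared_secret: bytes,
                 server_ip: str, nonce_prefix: bytes):
    """Return (tag, k_sh) for a 21-byte public key and an encoded g^{rs}.

    Hypothetical helper: the encoding of shared_secret is an assumption.
    """
    if len(client_pub) != 21:
        raise ValueError("client public key must be 21 bytes")
    if len(nonce_prefix) != 4:
        raise ValueError("nonce prefix must be 4 bytes")
    # chi = 4-byte server IPv4 address || first 4 bytes of the ClientHello random
    chi = socket.inet_aton(server_ip) + nonce_prefix
    digest = hashlib.sha256(shared_secret + chi).digest()
    h1 = digest[:7]       # H_1: first 7 bytes of the SHA256 output
    k_sh = digest[-16:]   # H_2: last 16 bytes of the SHA256 output
    return client_pub + h1, k_sh


# Placeholder inputs, for illustration only.
example_pub = os.urandom(21)
example_secret = os.urandom(21)
tag, k_sh = slitheen_tag(example_pub, example_secret, "192.0.2.1", b"\x00\x00\x00\x01")
```

The resulting tag is 28 bytes (21-byte public key followed by the 7-byte verification hash), and $k_{sh}$ is 16 bytes.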
  78. \subsubsection{DH parameter generation}
  79. To derive the TLS client key exchange parameters, the client computes their secret key as the output of $$\texttt{PRF}_{k_{sh}}(\texttt{"SLITHEEN\_KEYGEN"}).$$
  80. The PRF here is as defined in the TLSv1.2 RFC\footnote{\url{https://tools.ietf.org/html/rfc5246}}, where the hash function used is SHA256 for all cipher suites.
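For reference, the TLSv1.2 PRF with SHA256 (the $P\_hash$ expansion from RFC 5246, Section 5) can be sketched as below. The PRF itself follows the RFC; how Slitheen supplies the seed and how many output bytes it requests are not stated here, so the empty seed and the 32-byte output length in the usage line are assumptions made for illustration.

```python
import hashlib
import hmac


def tls12_prf(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    """TLS 1.2 PRF: P_SHA256(secret, label + seed), truncated to `length` bytes."""
    full_seed = label + seed
    a = full_seed  # A(0) = label + seed
    out = b""
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()  # A(i) = HMAC(secret, A(i-1))
        out += hmac.new(secret, a + full_seed, hashlib.sha256).digest()
    return out[:length]


# Assumed usage: empty seed and 32 output bytes are placeholders, not documented values.
k_sh = bytes(16)
dh_private = tls12_prf(k_sh, b"SLITHEEN_KEYGEN", b"", 32)
```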
  81. \subsubsection{Modified Finished message}
  82. We change the contents of the downstream Finished message, sent from the server to the client. The relay station computes a new Finished message, replacing the previous hash with a MAC keyed with the client-relay shared secret $k_{sh}$:
  83. $$MAC_{k_{sh}}(\texttt{Finished\_hash} || \chi) $$
  84. \subsection{Data Replacement}
  85. Data from the client is modified in both the upstream and downstream direction. Upstream from the client, the client's Slitheen ID along with data to the covert site is included in an additional X-Slitheen header to be extracted and replaced with an X-Ignore header and garbage by the relay station.
  86. Downstream data from the covert site to the client is stored at the relay station until it can be inserted in place of resources that have a leaf content-type (the img supertype).
  87. \subsubsection{Upstream Data from client}
  88. At the start of a flow, the user sends their slitheen ID as the first information in an X-Slitheen header of outgoing HTTP requests. This is followed by a slitheen\_upstream\_header with the stream id of data and the data length. Each stream id indicates a different connection from the client to a covert server.
  89. If a stream is being opened for the first time, the data that follows the slitheen upstream header includes a SOCKS Connect request to the censored site and the first few upstream bytes from the client. Otherwise it includes upstream bytes from the client to be relayed to the censored site. All of the above information is base64 encoded and inserted into an X-Slitheen header.
  90. When the relay station receives the upstream data from a tagged flow, it searches for an X-Slitheen header. If found, it decodes the base64 strings, extracts the information, and attributes the flow to the client specified by their slitheen ID.
  91. It then finds the connection for the indicated stream, or creates a new one and opens a connection to the covert site. The station relays data between the client and the covert site by sending bytes through this connection, and saves the responses in a downstream queue for the client specified by the slitheen ID.
  92. When the covert site responds with information, this data is stored in blocks at the relay station as part of the client's downstream queue. Each block contains information about the stream ID, and the length of the block.
  93. \subsubsection{Downstream Data to client}
  94. When resources come back from the overt site, they are decrypted when allowed, and their content type and length are stored in the HTTP state of the flow. To fill the contents of a packet of length $n$, the relay station determines the HTTP state, the content type of the resource, and the garbage bytes needed to pad the AES-CBC encrypted data.
  95. The relay station then updates the HTTP state of the flow and processes the next packet.
  96. \section{Protocol Details}
  97. \subsection{TLS v1.2 Handshake Modifications}
  98. Slitheen uses a slightly modified TLS handshake, similar to Telex~\cite{wustrow2011}. In order to properly process handshake messages, the relay must reconstruct them before processing. In the case of all messages preceding the Finished message, this can be done without blocking. Since we are modifying the Finished message hash, this message must be held at the station until it is reconstructed, verified, and replaced.
  99. We fully support session resumption through session IDs and session tickets. The state machine for the relay station is very similar to the OpenSSL state machine, so we describe only the modifications we make.
  100. We assume that the relay station has generated an elliptic curve public-private key pair $(r, g^r)$, and has distributed its public key, $g^r$, to the client. %TODO: include details on curve
  101. Definitions for PRF and unaltered handshake messages are as defined in RFC 5246~\cite{rfc5246}.
  102. \subsubsection{Client}
  103. The states we care about for the client are:
  104. \begin{itemize}
  105. \item SEND\_CLNT\_HELLO
  106. \item SEND\_CLNT\_KEY\_EXCHANGE
  107. \item RECV\_SRVR\_FINISHED
  108. \item SEND\_CLNT\_FINISHED
  109. \end{itemize}
  110. \textbf{SEND\_CLNT\_HELLO}
  111. A client first generates an elliptic curve public-private key pair $(s, g^s)$.
  112. When the client sends the ClientHello message, they replace the 28 random bytes that follow the 4-byte timestamp in the 32-byte random nonce with their elliptic curve public key point and a verification hash. These values concatenated comprise a 28-byte \emph{tag}.
  113. \begin{figure}[h]
  114. \begin{subfigure}{\textwidth}
  115. \centering
  116. \includegraphics{old_client_hello}
  117. \caption{Original TLS ClientHello random nonce}
  118. \end{subfigure}
  119. \vspace{5mm}
  120. \begin{subfigure}{\textwidth}
  121. \centering
  122. \includegraphics{new_client_hello}
  123. \caption{Modified (tagged) TLS ClientHello random nonce}
  124. \end{subfigure}
  125. \caption{Modifications to the ClientHello message}
  126. \end{figure}
  127. The client then computes a shared secret with the relay station of the form $g^{rs}$. The 7-byte verification hash is the first 56 bits of SHA256$(g^{rs}||\chi)$ where $\chi = $ \texttt{server\_ip||UNIX\_timestamp||TLS\_session\_id}.
  128. \textbf{SEND\_CLNT\_KEY\_EXCHANGE}
  129. The client computes their key exchange parameters from the shared secret with the relay station. In particular, their private key is the output of a PRF on the client-relay shared secret $g^{rs}$ and a constant string.
  130. For example, in a ciphersuite using EDH or ECDH, the client generates their private key as follows:
  131. $$a = \texttt{PRF}(g^{rs} || \texttt{"SLITHEEN\_KEYGEN"})$$
  132. Afterwards, $h^a$ is sent to the server in the usual manner.
  133. \textbf{RECV\_SRVR\_FINISHED}
  134. The client should verify the Finished MAC of the server's Finished message after decrypting it. The MAC can be of one of two possible acceptable forms:
  135. \begin{enumerate}
  136. \item The usual PRF output of the hashed, previously seen handshake messages:
  137. $$\texttt{Finished\_MAC} = \texttt{PRF}(\texttt{FINISHED\_CONST}||\texttt{Hash(handshake\_messages}))$$
  138. \item A modified MAC with an extra input (based on the client-relay shared secret) hashed in with the handshake messages:
  139. $$\texttt{extra\_input} = \texttt{PRF}(g^{rs} || \texttt{"SLITHEEN\_FINISH"})$$
  140. $$\texttt{Finished\_MAC} = \texttt{PRF}(\texttt{FINISHED\_CONST}||\texttt{Hash(handshake\_messages} || \texttt{extra\_input}))$$
  141. \end{enumerate}
  142. If the client receives the first hash, it should proceed to load the page, but not use the session for decoy routing. The receipt of a normal Finished message indicates possible problems with the relay station, a MiTM attack, or a RAD attack.
  143. If the client receives the second hash, it knows that the relay station has intercepted the flow and it is safe to use for decoy routing.
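The client-side decision between the two MAC forms can be sketched as follows. This is an illustrative sketch under stated assumptions, not the OpenSSL callback itself: \texttt{classify\_server\_finished} is a hypothetical name, the \texttt{"server finished"} label and the 12-byte \texttt{verify\_data} length come from RFC 5246, and the empty seed and 32-byte length used for \texttt{extra\_input} are assumptions.

```python
import hashlib
import hmac


def tls12_prf(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    """TLS 1.2 PRF (RFC 5246, Section 5) with P_SHA256."""
    full_seed = label + seed
    a, out = full_seed, b""
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()
        out += hmac.new(secret, a + full_seed, hashlib.sha256).digest()
    return out[:length]


def classify_server_finished(master_secret: bytes, shared_secret: bytes,
                             handshake_messages: bytes, received: bytes) -> str:
    """Return 'slitheen', 'plain', or 'invalid' for the server's Finished MAC."""
    # Assumed: extra_input is 32 bytes, derived with an empty seed.
    extra_input = tls12_prf(shared_secret, b"SLITHEEN_FINISH", b"", 32)
    plain = tls12_prf(master_secret, b"server finished",
                      hashlib.sha256(handshake_messages).digest(), 12)
    tagged = tls12_prf(master_secret, b"server finished",
                       hashlib.sha256(handshake_messages + extra_input).digest(), 12)
    if hmac.compare_digest(received, tagged):
        return "slitheen"   # relay station intercepted the flow
    if hmac.compare_digest(received, plain):
        return "plain"      # load the page, but do not use decoy routing
    return "invalid"
```

A result of \texttt{plain} corresponds to the first case above, and \texttt{slitheen} to the second; the comparisons use a constant-time check to avoid leaking how many bytes matched.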
  144. \textbf{SEND\_CLNT\_FINISHED}
  145. The client should send a Finished message back to the server with the previously seen messages hashed into the Finished message's hash, with one exception. The client should compute the MAC expected from the server, before modification by the relay station (MAC 1 above). A Finished message with this MAC should be hashed into the client's Finished message MAC instead of the one the client received.
  146. \subsubsection{Relay Station}
  147. Upon receipt of any handshake message, the relay station hashes it into its Finished MAC computation. Additionally, if the ClientHello and ServerHello messages indicate a session resumption, the relay station computes the Finished message MACs from the new random nonces and reads the TLS master secret for the session from its saved session store.
  148. The states we care about for the relay are:
  149. \begin{itemize}
  150. \item RECV\_CLNT\_HELLO
  151. \item RECV\_SRVR\_KEY\_EXCHANGE
  152. \item RECV\_SRVR\_FINISHED
  153. \item RECV\_CLNT\_FINISHED
  154. \end{itemize}
  155. \textbf{RECV\_CLNT\_HELLO}
  156. When the relay station receives a ClientHello message from any client, it first checks to see if the random nonce is tagged. To do so, it extracts the client's 21-byte public key, $g^s$, and computes the shared secret $g^{sr} = g^{rs}$. It uses this value to compute the verification hash, the first 7 bytes of SHA256$(g^{rs}||\chi)$, and checks this against the last 7 bytes of the random nonce. If the two values are equal, the relay station saves the shared key and flow information (IP addresses and port numbers) to identify later handshake messages in the same flow.
  157. \textbf{RECV\_SRVR\_KEY\_EXCHANGE}
  158. Upon receipt of the server key exchange parameters, the relay station is able to compute the TLS master secret. Since the client's private key is computed from the client-relay shared secret, the relay station can compute the client's private key parameter:
  159. $$a = \texttt{PRF}(g^{rs} || \texttt{"SLITHEEN\_KEYGEN"})$$
  160. The server's public key, $h^b$, along with the client's private key $a$ allows the relay station to compute the TLS master secret:
  161. $$\texttt{master\_secret} = \texttt{PRF} (h^{ab} || \texttt{MASTER\_SECRET\_CONST} || \texttt{client\_random} || \texttt{server\_random})$$
  162. \textbf{RECV\_SRVR\_FINISHED}
  163. When the relay station receives an encrypted Finished message for a tagged flow, it should have already computed the TLS master secret for the flow.
  164. The relay station then attempts to decrypt the Finished message.
  165. If the decryption was unsuccessful, the relay station removes the flow from its memory and forwards the message to the client unchanged.
  166. If the decryption was successful, the relay station verifies the Finished message hash against the handshake messages seen so far. If the hash doesn't equal what the relay station expects, the flow is removed from its memory and the message is forwarded unchanged.
  167. %TODO: update with new method, when implemented
  168. If the Finished message verifies correctly, the relay station replaces it with a different, modified hash. It feeds the output of a PRF seeded with the client-relay shared secret and a constant into the Finished hash computation and computes the final MAC for the Finished message. The purpose of this step is to alert the client to the fact that the flow has been successfully tagged and intercepted by a relay station. The Finished MAC sent to the client is computed as follows:
  169. $$\texttt{extra\_input} = \texttt{PRF}(g^{rs} || \texttt{"SLITHEEN\_FINISH"})$$
  170. $$\texttt{Finished\_MAC} = \texttt{PRF}(\texttt{FINISHED\_CONST}||\texttt{Hash(handshake\_messages} || \texttt{extra\_input}))$$
  171. \textbf{RECV\_CLNT\_FINISHED}
  172. The relay station should verify that the client's Finished message MAC contains a hash of all handshake messages seen so far (without the addition of the extra input based on the client-relay shared secret).
  173. \subsection{Upstream Application Data (Client $\rightarrow$ Covert)}
  174. SOCKS data from the client to the covert site is sent in upstream slitheen blocks. These blocks contain a 4-byte header that indicates the 2-byte stream ID of the data (which SOCKS connection it belongs to), and the length.
  175. \subsubsection{Client}
  176. When the client first starts a new browsing session, they generate a 28-byte slitheen\_ID. This ID is sent to the Overt User Simulator (OUS) to be included in the X-Slitheen header of every outgoing HTTP request from the client to the overt server in a tagged flow.
  177. When the client's browser requests a new connection, the SOCKS frontend generates a new stream ID for that connection. It then passes the SOCKS connect request and upstream bytes from the client, destined for the relay station, to the OUS in base64-encoded upstream slitheen blocks. The OUS then sends these space-delimited, encoded blocks in the X-Slitheen header of outgoing overt HTTP requests, along with the client's slitheen\_ID.
  178. \subsubsection{Relay Station}
  179. Upon receiving a new HTTP request from the client, the relay station decrypts the request and searches for an X-Slitheen header. The contents of this header include a series of space-delimited base64 encoded strings. The relay station decodes the strings to retrieve the client's slitheen\_ID and the upstream slitheen blocks containing SOCKS data.
  180. At this point, the relay station can associate the flow with the client's slitheen\_ID and corresponding downstream data queue.
  181. If the blocks contain a SOCKS connect request for a new stream, the relay station spins off a new thread for that stream and makes a TCP connection to the covert server. It then sends the SOCKS data from the client through this connection and saves the covert site's responses in the downstream data queue indicated by the client's slitheen\_ID.
  182. The relay station should then replace the X-Slitheen header with an X-Ignore header and garbage bytes to protect the client's identity and browsing habits.
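The encoding and decoding of the X-Slitheen header value described in this subsection can be sketched as a round trip. This is a hedged illustration: the function names are hypothetical, and the exact layout of the 4-byte upstream header is assumed here to be a 2-byte stream ID followed by a 2-byte length, both in network byte order (the text fixes only the header size and the stream ID width).

```python
import base64
import struct

# Assumed layout: 2-byte stream ID, 2-byte data length, network byte order.
UPSTREAM_HDR = struct.Struct("!HH")


def encode_x_slitheen(slitheen_id: bytes, blocks) -> str:
    """Build the header value from the ID and (stream_id, data) pairs."""
    parts = [base64.b64encode(slitheen_id)]
    for stream_id, data in blocks:
        header = UPSTREAM_HDR.pack(stream_id, len(data))
        parts.append(base64.b64encode(header + data))
    return b" ".join(parts).decode("ascii")


def decode_x_slitheen(value: str):
    """Recover (slitheen_id, [(stream_id, data), ...]) at the relay station."""
    fields = value.split(" ")
    slitheen_id = base64.b64decode(fields[0])
    blocks = []
    for field in fields[1:]:
        raw = base64.b64decode(field)
        stream_id, length = UPSTREAM_HDR.unpack_from(raw)
        data = raw[UPSTREAM_HDR.size:]
        if len(data) != length:
            raise ValueError("upstream block length mismatch")
        blocks.append((stream_id, data))
    return slitheen_id, blocks
```

Because the standard base64 alphabet contains no spaces, splitting the header value on spaces is unambiguous.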
  183. \subsection{Downstream Application Data (Covert $\rightarrow$ Client)}
  184. SOCKS data from the covert site is delivered to the client (according to the slitheen\_ID of the flow) in downstream slitheen blocks. Each slitheen block contains a 16-byte header containing the stream ID of the data, a counter to indicate the order of blocks in the same stream, the length of SOCKS data, the length of garbage bytes, and a padding of zeros. This header is AES encrypted in ECB mode with a key generated from the slitheen\_ID. The body of the slitheen block is AES encrypted in CBC mode with a key generated as a part of the same key block:
  185. $$\texttt{slitheen\_key\_block} = \texttt{PRF}(\texttt{slitheen\_ID} || \texttt{"SLITHEEN\_SUPER\_ENCRYPT"})$$
  186. $$\texttt{slitheen\_header\_key} = \texttt{slitheen\_key\_block}[0:\texttt{key\_len}-1] $$
  187. $$\texttt{slitheen\_body\_key} = \texttt{slitheen\_key\_block}[\texttt{key\_len}:2\texttt{key\_len}-1] $$
  188. If there are any bytes in the packet left over after reducing the size to fit in 16-byte blocks, they are filled with randomly generated garbage bytes and the garbage length field in the downstream slitheen header is set to reflect the amount of garbage padding.
  189. A block that contains only garbage indicates that there was no downstream data for the client queued at the relay station.
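The plaintext layout of the 16-byte downstream header can be sketched with Python's \texttt{struct} module. The field order follows the description above, but the field widths are assumptions made for illustration (2 bytes each for the stream ID, counter, data length, and garbage length, followed by 8 zero bytes); the AES-ECB encryption of the header is omitted.

```python
import struct

# Assumed widths: four 2-byte fields in network byte order plus 8 zero pad bytes.
DOWNSTREAM_HDR = struct.Struct("!HHHH8x")
ZERO_PAD = bytes(8)


def pack_downstream_header(stream_id: int, counter: int,
                           data_len: int, garbage_len: int) -> bytes:
    """Serialize the header fields; the pad bytes are written as zeros."""
    return DOWNSTREAM_HDR.pack(stream_id, counter, data_len, garbage_len)


def unpack_downstream_header(raw: bytes):
    """Parse a decrypted header and verify its zero padding."""
    if len(raw) != DOWNSTREAM_HDR.size:
        raise ValueError("downstream header must be 16 bytes")
    if raw[8:] != ZERO_PAD:
        raise ValueError("zero padding check failed")
    return DOWNSTREAM_HDR.unpack(raw)
```

Checking the zero padding after decryption gives the receiver a cheap way to reject data that is not a valid downstream header.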
  190. \begin{figure}[h]
  191. \begin{subfigure}{\textwidth}
  192. \centering
  193. \includegraphics[width=.75\textwidth]{downstream_slitheen_header}
  194. \caption{Header for downstream slitheen blocks}
  195. \end{subfigure}
  196. \vspace{5mm}
  197. \begin{subfigure}{\textwidth}
  198. \centering
  199. \includegraphics[width=.75\textwidth]{downstream_slitheen_block}
  200. \caption{Full, encrypted downstream slitheen block}
  201. \end{subfigure}
  202. \caption{Format of SOCKS data from the covert site to the client, injected in the place of leaf resources}
  203. \end{figure}
  204. \subsubsection{Client}
  205. When the client's Overt User Simulator (OUS) receives a resource of Content-Type slitheen, it sends the length of the resource and the HTTP response body to the client's SOCKS proxy front-end to be parsed and sent to the client's browser.
  206. The SOCKS proxy frontend parses these resources to extract downstream slitheen blocks. After receiving the resource length, it parses downstream slitheen blocks one-by-one, first decrypting the header to determine the length, then decrypting the slitheen block body to retrieve the SOCKS data. The SOCKS proxy then writes this data to the browser, using the connection indicated by the stream ID in the header. The proxy skips garbage bytes, and moves on to decrypting the next downstream slitheen block.
  207. It is possible for downstream slitheen blocks to arrive in a resource out of order. If this happens, the SOCKS proxy should consult the counter field of the header and hold premature blocks until their predecessors have arrived before writing data to the browser.
  208. The SOCKS proxy front-end should also verify the zero padding at the end of the downstream header.
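The handling of out-of-order blocks can be sketched as a small per-stream reorder buffer. This is an illustration only: the class name is hypothetical, and it assumes the counter starts at 0 for each stream and increases by one per block, with no wraparound.

```python
class StreamReorderBuffer:
    """Hold premature downstream blocks until their predecessors arrive."""

    def __init__(self, first_counter: int = 0):
        self.next_counter = first_counter  # counter of the next block to deliver
        self.pending = {}                  # counter -> data for blocks held back

    def push(self, counter: int, data: bytes):
        """Store one block and return the data that can now be written, in order."""
        self.pending[counter] = data
        ready = []
        while self.next_counter in self.pending:
            ready.append(self.pending.pop(self.next_counter))
            self.next_counter += 1
        return ready


buf = StreamReorderBuffer()
early = buf.push(1, b"second")   # held: block 0 has not arrived yet
flushed = buf.push(0, b"first")  # releases both blocks in counter order
```

One buffer per stream ID keeps streams independent, so a missing block on one connection does not stall data for another.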
  209. \subsubsection{Relay Station}
  210. The relay station saves the TLS record and HTTP state of each flow to determine when the contents of packets from the server to the client can be replaced. The goal is to only replace the body of HTTP responses with a leaf content type, and to change the content type in the header to ``slitheen''. Replaced content is contained in downstream slitheen blocks.
  211. \texttt{Content-Type: slitheen}
  212. A flow can be in one of three TLS record states when a new packet is received:
  213. \begin{itemize}
  214. \item BEGIN\_RECORD
  215. \item MID\_RECORD
  216. \item END\_RECORD
  217. \end{itemize}
  218. If the packet contains an entire TLS record, this record can be decrypted and used to update the HTTP state of the flow.
  219. A flow can be in one of 7 HTTP states when a new packet is received:
  220. \begin{itemize}
  221. \item BEGIN\_HEADER
  222. \item MID\_HEADER
  223. \item BEGIN\_CHUNK
  224. \item MID\_CHUNK
  225. \item END\_CHUNK
  226. \item RESPONSE\_BODY
  227. \item UNKNOWN
  228. \end{itemize}
  229. \textbf{BEGIN\_HEADER and MID\_HEADER}
  230. The header of an HTTP response must be decrypted to determine the HTTP state of the flow in future packets. The header contains the content-type of the response and the length of the response, specified either in chunks as indicated by the transfer encoding header or in the content length header field.
  231. Upon receiving and decrypting an HTTP response header, the relay station should determine the value of the Content-Type header. If it is a leaf content type (e.g., ``image/*'' or ``text/plain''), the relay station should change the value of Content-Type to ``slitheen'' (padded with spaces to preserve length). The length/transfer encoding of the response is then saved in the flow's state.
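The length-preserving Content-Type rewrite can be sketched as below. The function name and the list of leaf type prefixes are illustrative assumptions taken from the examples in the text; a value shorter than the string ``slitheen'' is left unchanged because it cannot be padded to the same length.

```python
LEAF_TYPES = (b"image/", b"text/plain")  # assumed from the examples in the text
REPLACEMENT = b"slitheen"


def rewrite_content_type(header: bytes) -> bytes:
    """Replace a leaf Content-Type value with 'slitheen', padded with spaces."""
    out = []
    for line in header.split(b"\r\n"):
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"content-type":
            body = value.lstrip(b" ")
            lead = value[: len(value) - len(body)]  # keep the original spacing
            if body.lower().startswith(LEAF_TYPES) and len(body) >= len(REPLACEMENT):
                line = name + sep + lead + REPLACEMENT.ljust(len(body), b" ")
        out.append(line)
    return b"\r\n".join(out)


original = b"HTTP/1.1 200 OK\r\nContent-Type: image/png\r\nContent-Length: 10\r\n\r\n"
rewritten = rewrite_content_type(original)
```

Keeping the header the same length means the TLS record sizes seen on the wire are unchanged by the rewrite.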
  232. An HTTP header ends with the string ``CR LF CR LF''.
  233. \textbf{BEGIN\_CHUNK}
  234. If the transfer encoding is chunked, the length of the response is determined by the length of each chunk received. Each chunk begins with a string indicating the length and a newline ``[length] CR LF''. When a new chunk begins, the relay station must decrypt the length to determine the number of bytes in subsequent records that belong to the chunk, before the next chunk begins.
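Parsing the chunk length line can be sketched as follows. In HTTP/1.1 chunked transfer encoding the length is written in hexadecimal and may be followed by optional chunk extensions after a semicolon; the function name and the return convention are assumptions made for this sketch.

```python
def parse_chunk_size(buf: bytes):
    """Return (chunk_length, bytes_consumed), or None if the size line is incomplete."""
    end = buf.find(b"\r\n")
    if end == -1:
        return None  # need more decrypted data before the length is known
    # Drop any chunk extension, then read the hexadecimal length.
    size_field = buf[:end].split(b";", 1)[0].strip()
    return int(size_field.decode("ascii"), 16), end + 2


length, consumed = parse_chunk_size(b"1a\r\n...chunk data...")
```

The returned length tells the relay station how many subsequent bytes belong to the chunk; a length of zero marks the final chunk.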
  235. \textbf{MID\_CHUNK and RESPONSE\_BODY}
  236. In the middle of a chunk (or in the response body for resources whose length is specified by the Content-Length header), records can be replaced with downstream content without being decrypted. If a record is longer than the received packet, the relay station should calculate an entire record's worth of encrypted downstream content.
  237. If a newly received record is longer than the packet and the remaining chunk or response body length, the relay station will be unable to properly calculate the entire record. In this case, the relay station should forward the record unchanged, leaving the client's SOCKS5 frontend to parse only legitimate slitheen blocks.
  238. \textbf{END\_CHUNK}
  239. In a transfer-encoded resource, chunks are terminated with a ``CRLF'' before ending the stream or beginning a new chunk. The relay station should verify a chunk has ended by checking for this string. Failure to verify this could indicate that a problem occurred with the HTTP state calculation.
  240. \textbf{UNKNOWN}
  241. A flow's HTTP state can become unknown in the following cases:
  242. \begin{itemize}
  243. \item The relay station received a record too big to decrypt while in the BEGIN\_HEADER or MID\_HEADER states.
  244. \item The relay station received a record too big to decrypt that extended past the end of a resource while in the MID\_CHUNK or RESPONSE\_BODY states.
  245. \end{itemize}
  246. A flow can recover from this unknown state by saving the contents of seen packets and parsing them after-the-fact. Knowledge of the state will likely come too late to replace the resource, but may be useful if more than one resource is sent in a single flow.
  247. \begin{figure}[h]
  248. \includegraphics[width=\textwidth]{httpstate}
  249. \caption{State machine of HTTP states for a tagged flow. Packets received in a green state can be replaced without decrypting the TLS record. At a red state, an incoming record must be entirely contained in the packet contents and decrypted to determine future states of the flow and use the resource for downstream data. Failure to decrypt a record in a red state will result in an unknown HTTP state for the flow.}
  250. \end{figure}
  251. \nocite{*}
  252. \bibliographystyle{abbrv}
  253. \bibliography{documentation}
  254. \end{document}