  1. Filename: xxx-draft-spec-for-TLS-normalization.txt
  2. Title: Draft spec for TLS certificate and handshake normalization
  3. Author: Jacob Appelbaum
  4. Created: 16-Feb-2011
  5. Status: Draft
  6. Draft spec for TLS certificate and handshake normalization
  7. Overview
  8. Scope
  9. This document proposes fixes for problems with Tor's current TLS
  10. (Transport Layer Security) certificates and handshake. The fixes will
  11. reduce the distinguishability of Tor traffic from other encrypted traffic that
  12. uses TLS. The document also addresses some of the fingerprinting attacks
  13. possible against the current Tor TLS protocol setup process.
  14. Motivation and history
  15. Censorship is an arms race and this is a step forward in the defense
  16. of Tor. This proposal outlines ideas to make it more difficult to
  17. fingerprint and block Tor traffic.
  18. Goals
  19. This proposal intends to normalize or remove easy-to-predict or static
  20. values in the Tor TLS certificates and in the Tor TLS setup process.
  21. These values can be used as criteria for the automated classification of
  22. encrypted traffic as Tor traffic. Network observers should not be able
  23. to trivially detect Tor merely by receiving or observing the certificate
  24. used or advertised by a Tor relay. I also propose the creation of
  25. a hard-to-detect covert channel through which a server can signal that it
  26. supports the third version ("V3") of the Tor handshake protocol.
  27. Non-Goals
  28. This document is not intended to solve all of the possible active or passive
  29. Tor fingerprinting problems. This document focuses on removing distinctive
  30. and predictable features of TLS protocol negotiation; we do not attempt to
  31. make guarantees about resisting other kinds of fingerprinting of Tor
  32. traffic, such as fingerprinting techniques related to timing or volume of
  33. transmitted data.
  34. Implementation details
  35. Certificate Issues
  36. The CN or commonName ASN1 field
  37. Tor generates certificates with a predictable commonName field; the
  38. field is within a given range of values that is specific to Tor.
  39. Additionally, the generated host names have other undesirable properties.
  40. The host names typically do not resolve in the DNS because the domain
  41. names referred to are generated at random. Although they are syntactically
  42. valid, they usually refer to domains that have never been registered by
  43. any domain name registrar.
  44. An example of the current commonName field: CN=www.s4ku5skci.net
  45. An example of OpenSSL’s asn1parse over a typical Tor certificate:
  46. 0:d=0 hl=4 l= 438 cons: SEQUENCE
  47. 4:d=1 hl=4 l= 287 cons: SEQUENCE
  48. 8:d=2 hl=2 l= 3 cons: cont [ 0 ]
  49. 10:d=3 hl=2 l= 1 prim: INTEGER :02
  50. 13:d=2 hl=2 l= 4 prim: INTEGER :4D3C763A
  51. 19:d=2 hl=2 l= 13 cons: SEQUENCE
  52. 21:d=3 hl=2 l= 9 prim: OBJECT :sha1WithRSAEncryption
  53. 32:d=3 hl=2 l= 0 prim: NULL
  54. 34:d=2 hl=2 l= 35 cons: SEQUENCE
  55. 36:d=3 hl=2 l= 33 cons: SET
  56. 38:d=4 hl=2 l= 31 cons: SEQUENCE
  57. 40:d=5 hl=2 l= 3 prim: OBJECT :commonName
  58. 45:d=5 hl=2 l= 24 prim: PRINTABLESTRING :www.vsbsvwu5b4soh4wg.net
  59. 71:d=2 hl=2 l= 30 cons: SEQUENCE
  60. 73:d=3 hl=2 l= 13 prim: UTCTIME :110123184058Z
  61. 88:d=3 hl=2 l= 13 prim: UTCTIME :110123204058Z
  62. 103:d=2 hl=2 l= 28 cons: SEQUENCE
  63. 105:d=3 hl=2 l= 26 cons: SET
  64. 107:d=4 hl=2 l= 24 cons: SEQUENCE
  65. 109:d=5 hl=2 l= 3 prim: OBJECT :commonName
  66. 114:d=5 hl=2 l= 17 prim: PRINTABLESTRING :www.s4ku5skci.net
  67. 133:d=2 hl=3 l= 159 cons: SEQUENCE
  68. 136:d=3 hl=2 l= 13 cons: SEQUENCE
  69. 138:d=4 hl=2 l= 9 prim: OBJECT :rsaEncryption
  70. 149:d=4 hl=2 l= 0 prim: NULL
  71. 151:d=3 hl=3 l= 141 prim: BIT STRING
  72. 295:d=1 hl=2 l= 13 cons: SEQUENCE
  73. 297:d=2 hl=2 l= 9 prim: OBJECT :sha1WithRSAEncryption
  74. 308:d=2 hl=2 l= 0 prim: NULL
  75. 310:d=1 hl=3 l= 129 prim: BIT STRING
  76. I propose that the commonName field be generated to match a specific property
  77. of the server in question. It is reasonable to set the commonName element to
  78. match either the hostname of the relay, the detected IP address of the relay,
  79. or for the relay operator to override certificate generation entirely by
  80. loading a custom certificate. For custom certificates, see the Custom
  81. Certificates section.
  82. I propose that the value for the commonName field be populated with the
  83. fully qualified host name as detected by reverse and forward resolution of the
  84. IP address of the relay. If the host name is in the DNS, this host name should
  85. be set as the common name. When forward and reverse DNS is not available, I
  86. propose that the IP address alone be used.
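The selection rule above can be sketched as follows. This is only an
illustration, not Tor code: the function names (choose_common_name,
reverse_lookup, forward_lookup) are hypothetical, and the resolver functions
are passed in as parameters so the logic can be exercised without touching
the network. The sketch accepts any reverse-resolved host name that also
resolves forward; requiring the forward result to contain the relay's own
address would be a stricter variant.

```python
import socket
from typing import Callable, List, Optional


def reverse_lookup(ip: str) -> Optional[str]:
    """Return the host name found by reverse resolution, or None."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(host: str) -> List[str]:
    """Return the addresses that host resolves to (empty if none)."""
    try:
        return sorted({info[4][0] for info in socket.getaddrinfo(host, None)})
    except OSError:
        return []


def choose_common_name(
    ip: str,
    reverse: Callable[[str], Optional[str]] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> str:
    """Pick a commonName: the relay's host name if it is in the DNS,
    otherwise the bare IP address."""
    host = reverse(ip)
    if host and forward(host):
        return host.rstrip(".")
    return ip
```

For example, choose_common_name("192.0.2.7") falls back to "192.0.2.7"
whenever no reverse record exists or the reverse-resolved name does not
resolve forward.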
  87. The commonName field for the issuer should be set to known issuer names,
  88. random words or omitted entirely.
  89. Since some host names may themselves trigger censorship keyword filters,
  90. it may be reasonable to provide an option to override the defaults and
  91. force certain values in the commonName field.
  92. Considerations for commonName normalization
  93. Any host name supplied for the commonName field should resolve - even if it
  94. does not resolve to the IP address of the relay[0]. If the commonName field
  95. does include an IP address, it should be the current IP address of the relay as
  96. seen by other Internet hosts.
  97. Certificate serial numbers
  98. Currently our generated certificate serial number is set to the number of
  99. seconds since the epoch at the time of the certificate's creation. I propose
  100. that we should ensure that our serial numbers are unrelated to the epoch,
  101. since the generation methods are potentially recognizable as Tor-related.
  102. Instead, I propose that we use a randomly generated number that is
  103. subsequently hashed with SHA-512 and then truncated to a length chosen at
  104. random within a finite set of bounds. The length of the serial number should be
  105. chosen randomly at certificate generation time; it should be bound between the
  106. most commonly found bit lengths[1] in the wild. Random sixteen byte values
  107. appear to be the high bound for serial numbers as issued by Verisign and
  108. DigiCert. RapidSSL serial numbers appear to be three bytes in length. Other
  109. common byte lengths appear to be between one and four bytes. I propose that we choose a
  110. byte length that is either 3, 4, or 16 bytes at certificate generation time.
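A minimal sketch of this generation step, assuming Python's secrets and
hashlib modules stand in for Tor's own random number generator and digest
code. The name generate_serial is hypothetical. Adjusting the leading byte
is an assumption of this sketch, not part of the proposal: it keeps the
value positive and makes the DER INTEGER encoding exactly the chosen number
of bytes.

```python
import hashlib
import secrets

# Byte lengths proposed in the text above.
SERIAL_LENGTHS = (3, 4, 16)


def generate_serial(lengths=SERIAL_LENGTHS) -> int:
    """Return a random serial number whose DER encoding has one of the
    proposed byte lengths."""
    seed = secrets.token_bytes(64)            # fresh random input
    digest = hashlib.sha512(seed).digest()    # hash it with SHA-512
    n = secrets.choice(lengths)               # pick the length at random
    raw = bytearray(digest[:n])               # truncate to that length
    # Assumption of this sketch: clear the top bit so no 0x00 pad byte is
    # needed in DER, and avoid a zero leading byte that would shorten it.
    raw[0] &= 0x7F
    if raw[0] == 0:
        raw[0] = 0x01
    return int.from_bytes(raw, "big")
```

The price of fixing the encoded length is that the first byte carries
slightly less than eight bits of randomness.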
  111. This randomly generated field may now serve as a covert channel that signals to
  112. the client that the OR will not support TLS renegotiation; this means that the
  113. client can expect to perform a V3 TLS handshake setup. Otherwise, if the serial
  114. number is a reasonable time since the epoch, we should assume the OR is
  115. using an earlier protocol version and hence that it expects renegotiation.
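On the client side, the check might look like the sketch below. The
function names and the window parameters (max_age, max_skew) are
hypothetical; the proposal does not define what a "reasonable time since
the epoch" is. Note that a random 4-byte serial can land inside the window
by chance, so this test alone cannot be fully reliable.

```python
from datetime import datetime, timezone


def serial_as_utctime(serial: int) -> str:
    """Format a serial number as if it were seconds since the epoch,
    using the UTCTIME layout that asn1parse prints."""
    moment = datetime.fromtimestamp(serial, tz=timezone.utc)
    return moment.strftime("%y%m%d%H%M%SZ")


def expects_v3_handshake(
    serial: int,
    now: int,
    max_age: int = 24 * 60 * 60,   # hypothetical: oldest plausible timestamp
    max_skew: int = 60 * 60,       # hypothetical: tolerated clock skew
) -> bool:
    """Return True if the serial does not look like a recent timestamp,
    which the text above treats as a signal for the V3 handshake."""
    looks_like_time = now - max_age <= serial <= now + max_skew
    return not looks_like_time
```

Applied to the asn1parse dump earlier in this document,
serial_as_utctime(0x4D3C763A) gives 110123184058Z, which is exactly the
first UTCTIME value in that certificate.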
  116. As a security note, care must be taken to ensure that supporting this
  117. covert channel will not lead to an attacker having a method to downgrade client
  118. behavior.
  119. Certificate fingerprinting issues expressed as base64 encoding
  120. It appears that all deployed Tor certificates have the following strings in
  121. common:
  122. MIIB
  123. CCA
  124. gAwIBAgIETU
  125. ANBgkqhkiG9w0BAQUFADA
  126. YDVQQDEx
  127. 3d3cu
  128. As expected these values correspond to specific ASN.1 OBJECT IDENTIFIER (OID)
  129. properties (sha1WithRSAEncryption, commonName, etc) of how we generate our
  130. certificates.
  131. As an illustrated example of the common bytes of all certificates used within
  132. the Tor network within a single one hour window, I have replaced the actual
  133. value with a wild card ('.') character here:
  134. -----BEGIN CERTIFICATE-----
  135. MIIB..CCA..gAwIBAgIETU....ANBgkqhkiG9w0BAQUFADA.M..w..YDVQQDEx.3
  136. d3cu............................................................
  137. ................................................................
  138. ................................................................
  139. ................................................................
  140. ................................................................
  141. ................................................................
  142. ................................................................
  143. ................................................................
  144. ........................... <--- Variable length and padding
  145. -----END CERTIFICATE-----
  146. This fine ASCII art only illustrates the bytes that absolutely match in all
  147. cases. In many other positions, a given byte is likely to take one of only a
  148. small subset of values.
  149. Using the above strings, the EFF's certificate observatory may trivially
  150. discover all known relays, known bridges and unknown bridges in a single SQL
  151. query. I propose that we test our certificates to ensure that they do not
  152. have these kinds of statistical similarities unless the similarities are shared
  153. with a very large cross section of the internet's certificates.
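One way to run such a test is to compute the wildcard mask shown above
directly from a set of base64-encoded certificates. The sketch below is an
illustration under simple assumptions: the certificates are given as
strings with the BEGIN/END lines and line breaks removed, and only the
common prefix length is compared. The names common_mask and
invariant_fraction are hypothetical.

```python
from typing import Sequence


def common_mask(certs: Sequence[str], wildcard: str = ".") -> str:
    """Keep the characters that are identical at the same position in
    every certificate and replace all others with the wildcard."""
    if not certs:
        return ""
    length = min(len(cert) for cert in certs)
    mask = []
    for i in range(length):
        ch = certs[0][i]
        same = all(cert[i] == ch for cert in certs)
        mask.append(ch if same else wildcard)
    return "".join(mask)


def invariant_fraction(certs: Sequence[str], wildcard: str = ".") -> float:
    """Fraction of compared positions that never vary."""
    mask = common_mask(certs, wildcard)
    if not mask:
        return 0.0
    return sum(ch != wildcard for ch in mask) / len(mask)
```

With the toy inputs "MIIBabCCA" and "MIIBxyCCA" (not real certificates),
common_mask returns "MIIB..CCA". A high invariant fraction across freshly
generated certificates would indicate the kind of static structure
described above.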
  154. Other certificate fields
  155. It may be advantageous to also generate values for the O, L, ST, C, and OU
  156. certificate fields. The C and ST fields may be populated from GeoIP information
  157. that is already available to Tor to reflect a plausible geographic location
  158. for the OR. The other fields should contain some semblance of a word or
  159. grouping of words. It has been suggested[2] that we should look to guides for
  160. certificate generation that use OpenSSL as a reasonable baseline for
  161. understanding these fields, as well as other certificate properties.
  162. Certificate dating and validity issues
  163. TLS certificates found in the wild are generally found to be long-lived;
  164. they are frequently old and often even expired. The current Tor certificate
  165. validity time is a very small time window starting at generation time and
  166. ending shortly thereafter, as defined in or.h by MAX_SSL_KEY_LIFETIME
  167. (2*60*60).
  168. I propose that the certificate validity time length is extended to a period of
  169. twelve Earth months, possibly with a small random skew to be determined by the
  170. implementer. Tor should randomly set the start date in the past, within some
  171. currently unspecified window of time before the current date. This would
  172. more closely track the typical distribution of non-Tor TLS certificate
  173. expiration times.
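A rough sketch of choosing such a validity window. The skew and backdating
bounds are left unspecified above, so the parameter values here
(max_skew_days, max_backdate_days) are placeholders chosen purely for
illustration, and choose_validity is a hypothetical name.

```python
import secrets
from datetime import datetime, timedelta, timezone

DAY = 24 * 60 * 60


def choose_validity(
    now: datetime,
    lifetime_days: int = 365,       # roughly twelve months
    max_skew_days: int = 30,        # placeholder: skew on the lifetime
    max_backdate_days: int = 180,   # placeholder: how far back to start
):
    """Return (not_before, not_after) with a randomized start date in the
    past and a randomized total lifetime."""
    backdate = timedelta(seconds=secrets.randbelow(max_backdate_days * DAY + 1))
    skew_seconds = secrets.randbelow(2 * max_skew_days * DAY + 1) - max_skew_days * DAY
    not_before = now - backdate
    not_after = not_before + timedelta(days=lifetime_days, seconds=skew_seconds)
    return not_before, not_after
```

Drawing the offsets at one-second granularity avoids round values such as
exactly midnight or exactly 365 days, which could themselves become a
fingerprint.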
  174. The certificate values, such as expiration, should not be used for anything
  175. relating to security; for example, if the OR presents an expired TLS
  176. certificate, this does not imply that the client should terminate the
  177. connection (as would be appropriate for an ordinary TLS implementation).
  178. Rather, I propose we use a TOFU (trust on first use) style expiration policy - the certificate
  179. should never be trusted for more than a two hour window from first sighting.
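The client-side policy can be sketched as a small cache keyed by a digest
of the certificate. This is a sketch under assumptions: the class name
FirstSightingCache is hypothetical, SHA-256 is used only as a convenient
lookup key, and the in-memory dictionary ignores persistence and eviction.

```python
import hashlib


class FirstSightingCache:
    """Remember when each certificate was first seen and refuse to trust
    it for longer than a fixed window after that moment."""

    def __init__(self, max_trust_seconds: int = 2 * 60 * 60):
        self.max_trust_seconds = max_trust_seconds
        self._first_seen = {}

    def is_trusted(self, cert_der: bytes, now: float) -> bool:
        """Record the first sighting if needed, then apply the window.
        The certificate's own notAfter field is deliberately ignored."""
        key = hashlib.sha256(cert_der).hexdigest()
        first = self._first_seen.setdefault(key, now)
        return now - first <= self.max_trust_seconds
```

A certificate first seen at time t stays acceptable until t plus two
hours, regardless of the validity dates it advertises.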
  180. This policy should have two major impacts. The first is that an adversary will
  181. have to perform a differential analysis of all certificates for a given IP
  182. address rather than a single check. The second is that the server expiration
  183. time is enforced by the client and confirmed by keys rotating in the consensus.
  184. The expiration time should not be a fixed time that is simple to calculate by
  185. any Deep Packet Inspection device or it will become a new Tor TLS setup
  186. fingerprint.
  187. Custom Certificates
  188. It should be possible for a Tor relay operator to use a specifically supplied
  189. certificate and secret key. This will allow a relay or bridge operator to use a
  190. certificate signed by any member of any geographically relevant certificate
  191. authority racket; it will also allow for any other user-supplied certificate.
  192. This may be desirable in some kinds of filtered networks or when attempting to
  193. avoid attracting suspicion by blending in with the TLS web server certificate
  194. crowd.
  195. Problematic Diffie–Hellman parameters
  196. We currently send a static Diffie–Hellman parameter, prime p (or “prime p
  197. outlaw”) as specified in RFC2409 as part of the TLS Server Hello response.
  198. The use of this prime in TLS negotiations may, as a result, be filtered and
  199. effectively banned by certain networks. We do not have to use this particular
  200. prime in all cases.
  201. While amusing to have the power to make specific prime numbers into a new class
  202. of numbers (cf. imaginary, irrational, illegal [3]) - our new friend prime p
  203. outlaw is not required.
  207. I propose that the function to initialize and generate DH parameters be
  208. split into two functions.
  209. First, init_dh_param() should be used only for OR-to-OR DH setup and
  210. communication. Second, it is proposed that we create a new function
  211. init_tls_dh_param() that will have a two-stage development process.
  212. The first stage init_tls_dh_param() will use the same prime that
  213. Apache2.x [4] sends (or “dh1024_apache_p”), and this change should be
  214. made immediately. This is a known good and safe prime number ((p-1)/2
  215. is also prime) that is currently not known to be blocked.
  216. The second stage init_tls_dh_param() should randomly generate a new prime on a
  217. regular basis; this is designed to make the prime difficult to outlaw or
  218. filter. Call this a shape-shifting or "Rakshasa" prime. This should be added
  219. to the 0.2.3.x branch of Tor. This prime can be generated at setup or execution
  220. time and probably does not need to be stored on disk. Rakshasa primes only
  221. need to be generated by Tor relays as Tor clients will never send them. Such
  222. a prime should absolutely not be shared between different Tor relays nor
  223. should it ever be static after the 0.2.3.x release.
  224. As a security precaution, care must be taken to ensure that we do not generate
  225. weak primes or known filtered primes. Both weak and filtered primes will
  226. undermine the TLS connection security properties. OpenSSH solves this issue
  227. dynamically in RFC 4419 [5] and may provide a solution that works reasonably
  228. well for Tor. More research in this area, including the applicability of
  229. Miller-Rabin or AKS primality tests[6], will be needed, and the results will
  230. probably need to be added to Tor.
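To make the safe-prime condition concrete, here is a sketch of a
Miller-Rabin test and a check that both p and (p-1)/2 pass it. It is an
illustration only, not a proposal for Tor's implementation: the function
names and the round count are assumptions, and the sketch says nothing
about rejecting known filtered primes.

```python
import secrets

_SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def is_probable_prime(n: int, rounds: int = 40) -> bool:
    """Miller-Rabin probabilistic primality test with random bases."""
    if n < 2:
        return False
    for p in _SMALL_PRIMES:
        if n == p:
            return True
        if n % p == 0:
            return False
    # Write n - 1 as d * 2**r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = 2 + secrets.randbelow(n - 3)   # random base in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                   # a is a witness: n is composite
    return True


def is_safe_prime(p: int) -> bool:
    """True if p and (p - 1) / 2 both appear to be prime."""
    return p > 2 and p % 2 == 1 and is_probable_prime(p) and is_probable_prime((p - 1) // 2)
```

For instance, 23 is a safe prime because 11 is prime, while 29 is prime
but not safe because (29-1)/2 = 14 is composite.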
  231. Practical key size
  232. Currently we use a 1024 bit long RSA modulus. I propose that we increase the
  233. RSA key size to 2048 as an additional channel to signal support for the V3
  234. handshake setup. 2048 appears to be the most common key size[0] above 1024.
  235. Additionally, the increase in modulus size provides a reasonable security boost
  236. with regard to key security properties.
  237. The implementer should increase the 1024 bit RSA modulus to 2048 bits.
  238. Possible future filtering nightmares
  239. At some point it may be cost effective or politically feasible for a network
  240. filter to simply block all signed or self-signed certificates without a known
  241. valid CA trust chain. This will break many applications on the internet and
  242. hopefully, our option for custom certificates will ensure that this step is
  243. simply avoided by the censors.
  244. The Rakshasa prime approach may cause censors to specifically allow only
  245. certain known and accepted DH parameters.
  246. Appendix: Other issues
  247. What other obvious TLS certificate issues exist? What other static values are
  248. present in the Tor TLS setup process?
  249. [0] http://archives.seul.org/or/dev/Jan-2011/msg00051.html
  250. [1] http://archives.seul.org/or/dev/Feb-2011/msg00016.html
  251. [2] http://archives.seul.org/or/dev/Feb-2011/msg00039.html
  252. [3] To be fair this is hardly a new class of numbers. History is rife with
  253. similar examples of inane authoritarian attempts at mathematical secrecy.
  254. Probably the most dramatic example is the story of Hippasus of
  255. Metapontum, pupil of the famous Pythagoras, who, legend goes, proved the
  256. fact that the square root of 2 cannot be expressed as a fraction of whole numbers (now
  257. called an irrational number) and was assassinated for revealing this
  258. secret. Further reading on the subject may be found on the Wikipedia:
  259. http://en.wikipedia.org/wiki/Hippasus
  260. [4] httpd-2.2.17/modules/ssl/ssl_engine_dh.c
  261. [5] http://tools.ietf.org/html/rfc4419
  262. [6] http://archives.seul.org/or/dev/Jan-2011/msg00037.html