| 1234567891011121314151617181920212223242526272829303132333435363738394041424344 | 
							- 1. Scanning process
 
-    A. Non-HTML/JS HTTP mime types compared via SHA1 hash
 
-    B. Dynamic HTTP content filtered at 4 levels:
 
-       1. IP change+Tor cookie utilization
 
-          - Tor cookies replayed with new IP in case of changes
 
-       2. HTML Tag+Attribute+JS comparison
 
-          - Comparisons made based only on "relevant" HTML tags
 
-            and attributes 
 
-       3. HTML Tag+Attribute+JS diffing
 
-          - Tags, attributes and JS AST nodes that change during
 
-            Non-Tor fetches pruned from comparison
 
-       4. URLS with > N% of node failures removed
 
-          - results purged from filesystem at end of scan loop
 
-    C. SSL scanning handles some forms of dynamic certs
 
-       1. Catalogs certs for all IPs resolved locally
 
-          by getaddrinfo over the duration of the scan. 
 
-          - Updated each test.
 
-       2. If the domain presents a new cert for each IP, this
 
-          is noted on the failure result for the node
 
-       3. If the same IP presents two different certs locally,
 
-          the cert list is first refreshed, and if it happens
 
-          again, discarded
 
-       4. A N% node failure filter also applies
 
-    D. Scanner can be restarted from any point in the event
 
-       of scanner or system crashes, or graceful shutdown.
 
-       - Results+scan state pickled to filesystem continuously
 
- 2. Cron job checks results periodically for reporting
 
-    A. Divide failures into three types of BadExit based on type
 
-       and frequency over time and incident rate
 
-    B. write reject lines to approved-routers for those three types:
 
-       1. ID Hex based (for misconfig/network problems easily fixed)
 
-       2. IP based (for content modification)
 
-       3. IP+mask based (for continuous/egregious content modification)
 
-    C. Emails results to tor-scanners@freehaven.net
 
- 3. Human Review and Appeal
 
-    A. ID Hex-based BadExit is meant to be possible to removed easily
 
-       without needing to beg us.
 
-       - Should this behavior be encouraged? 
 
-    B. Optionally can reserve IP based badexits for human review
 
-       1. Results are encapsulated fully on the filesystem and can be
 
-          reviewed without network access
 
-       2. Soat has --rescan to rescan failed nodes from a data directory
 
-          - New set of URLs used
 
 
  |