7 years ago · a229d6c2f8
--- a/doc/HACKING/CodingStandardsRust.md
+++ b/doc/HACKING/CodingStandardsRust.md
@@ -0,0 +1,469 @@
 
				+
			
 
				+ Rust Coding Standards
			
 
				+=======================
			
 
				+
			
 
				+You MUST follow the standards laid out in `.../doc/HACKING/CodingStandards.md`,
			
 
				+where applicable.
			
 
				+
			
 
				+ Module/Crate Declarations
			
 
				+---------------------------
			
 
				+
			
 
				+Each Tor C module which is being rewritten MUST be in its own crate.
			
 
				+See the structure of `.../src/rust` for examples.
			
 
				+
			
 
				+In your crate, you MUST use `lib.rs` ONLY for pulling in external
			
 
				+crates (e.g. `extern crate libc;`) and exporting public objects from
			
 
				+other Rust modules (e.g. `pub use mymodule::foo;`).  For example, if
			
 
				+you create a crate in `.../src/rust/yourcrate`, your Rust code should
			
 
				+live in `.../src/rust/yourcrate/yourcode.rs` and the public interface
			
 
				+to it should be exported in `.../src/rust/yourcrate/lib.rs`.
			
 
				+
			
 
				+If your code is to be called from Tor C code, you MUST define a safe
			
 
				+`ffi.rs`.  See the "Safety" section further down for more details.
			
 
				+
			
 
				+For example, in a hypothetical `tor_addition` Rust module:
			
 
				+
			
 
				+In `.../src/rust/tor_addition/addition.rs`:
			
 
				+
			
 
				+    pub fn get_sum(a: i32, b: i32) -> i32 {
			
 
				+        a + b
			
 
				+    }
			
 
				+
			
 
				+In `.../src/rust/tor_addition/lib.rs`:
			
 
				+
			
 
				+    pub use addition::*;
			
 
				+
			
 
				+In `.../src/rust/tor_addition/ffi.rs`:
			
 
				+
			
 
				+    #[no_mangle]
			
 
				+    pub extern "C" fn tor_get_sum(a: c_int, b: c_int) -> c_int {
			
 
				+        get_sum(a, b)
			
 
				+    }
			
 
				+
			
 
				+If your Rust code must call out to parts of Tor's C code, you must
			
 
				+declare the functions you are calling in the `external` crate, located
			
 
				+at `.../src/rust/external`.
			
 
				+
			
 
				+<!-- XXX get better examples of how to declare these externs, when/how they -->
			
 
				+<!-- XXX are unsafe, what they are expected to do —isis                     -->
			
 
				+
			
 
				+Modules should strive to be below 500 lines (tests excluded). Single
			
 
				+responsibility and limited dependencies should be a guiding standard.
			
 
				+
			
 
				+If you have any external modules as dependencies (e.g. `extern crate
			
 
				+libc;`), you MUST declare them in your crate's `lib.rs` and NOT in any
			
 
				+other module.
			
 
				+
			
 
				+ Dependencies
			
 
				+--------------
			
 
				+
			
 
				+In general, we use modules from only the Rust standard library
			
 
				+whenever possible. We will review including external crates on a
			
 
				+case-by-case basis.
			
 
				+
			
 
				+ Documentation
			
 
				+---------------
			
 
				+
			
 
				+You MUST include `#[deny(missing_docs)]` in your crate.
			
 
				+
			
 
				+For function/method comments, you SHOULD include a one-sentence, "first person"
			
 
				+description of function behaviour (see requirements for documentation as
			
 
				+described in `.../src/HACKING/CodingStandards.md`), then an `# Inputs` section
			
 
				+for inputs or initialisation values, a `# Returns` section for return
			
 
				+values/types, a `# Warning` section containing warnings for unsafe behaviours or
			
 
				+panics that could happen.  For publicly accessible
			
 
				+types/constants/objects/functions/methods, you SHOULD also include an
			
 
				+`# Examples` section with runnable doctests.
			
 
				+
			
 
				+You MUST document your module with _module docstring_ comments,
			
 
				+i.e. `//!` at the beginning of each line.
			
 
				+
			
 
				+ Style
			
 
				+-------
			
 
				+
			
 
				+You SHOULD consider breaking up large literal numbers with `_` when it makes it
			
 
				+more human readable to do so, e.g. `let x: u64 = 100_000_000_000`.
			
 
				+
			
 
				+ Testing
			
 
				+---------
			
 
				+
			
 
				+All code MUST be unittested and integration tested.
			
 
				+
			
 
				+Public functions/objects exported from a crate SHOULD include doctests
			
 
				+describing how the function/object is expected to be used.
			
 
				+
			
 
				+Integration tests SHOULD go into a `tests/` directory inside your
			
 
				+crate.  Unittests SHOULD go into their own module inside the module
			
 
				+they are testing, e.g. in `.../src/rust/tor_addition/addition.rs` you
			
 
				+should put:
			
 
				+
			
 
				+    #[cfg(test)]
			
 
				+    mod test {
			
 
				+        use super::*;
			
 
				+
			
 
				+        #[test]
			
 
				+        fn addition_with_zero() {
			
 
				+            let sum: i32 = get_sum(5i32, 0i32);
			
 
				+            assert_eq!(sum, 5);
			
 
				+        }
			
 
				+    }
			
 
				+
			
 
				+ Benchmarking
			
 
				+--------------
			
 
				+
			
 
				+The external `test` crate can be used for most benchmarking.  However, using
			
 
				+this crate requires nightly Rust.  Since we may want to switch to a more
			
 
				+stable Rust compiler eventually, we shouldn't do things which will automatically
			
 
				+break builds for stable compilers.  Therefore, you MUST feature-gate your
			
 
				+benchmarks in the following manner.
			
 
				+
			
 
				+If you wish to benchmark some of your Rust code, you MUST put the
			
 
				+following in the `[features]` section of your crate's `Cargo.toml`:
			
 
				+
			
 
				+    [features]
			
 
				+    bench = []
			
 
				+
			
 
				+Next, in your crate's `lib.rs` you MUST put:
			
 
				+
			
 
				+    #[cfg(all(test, feature = "bench"))]
			
 
				+    extern crate test;
			
 
				+
			
 
				+This ensures that the external crate `test`, which contains utilities
			
 
				+for basic benchmarks, is only used when running benchmarks via `cargo
			
 
				+bench --features bench`.
			
 
				+
			
 
				+Finally, to write your benchmark code, in
			
 
				+`.../src/rust/tor_addition/addition.rs` you SHOULD put:
			
 
				+
			
 
				+    #[cfg(all(test, features = "bench"))]
			
 
				+    mod bench {
			
 
				+        use test::Bencher;
			
 
				+        use super::*;
			
 
				+
			
 
				+        #[bench]
			
 
				+        fn addition_small_integers(b: &mut Bencher) {
			
 
				+            b.iter(| | get_sum(5i32, 0i32));
			
 
				+        }
			
 
				+    }
			
 
				+
			
 
				+ Fuzzing
			
 
				+---------
			
 
				+
			
 
				+If you wish to fuzz parts of your code, please see the
			
 
				+[`cargo fuzz`](https://github.com/rust-fuzz/cargo-fuzz) crate, which uses
			
 
				+[libfuzzer-sys](https://github.com/rust-fuzz/libfuzzer-sys).
			
 
				+
			
 
				+ Whitespace & Formatting
			
 
				+-------------------------
			
 
				+
			
 
				+You MUST run `rustfmt` (https://github.com/rust-lang-nursery/rustfmt)
			
 
				+on your code before your code will be merged.  You can install rustfmt
			
 
				+by doing `cargo install rustfmt-nightly` and then run it with `cargo
			
 
				+fmt`.
			
 
				+
			
 
				+ Safety
			
 
				+--------
			
 
				+
			
 
				+You SHOULD read [the nomicon](https://doc.rust-lang.org/nomicon/) before writing
			
 
				+Rust FFI code.  It is *highly advised* that you read and write normal Rust code
			
 
				+before attempting to write FFI or any other unsafe code.
			
 
				+
			
 
				+Here are some additional bits of advice and rules:
			
 
				+
			
 
				+0. Any behaviours which Rust considers to be undefined are forbidden
			
 
				+
			
 
				+   From https://doc.rust-lang.org/reference/behavior-considered-undefined.html:
			
 
				+
			
 
				+   > Behavior considered undefined
			
 
				+   >
			
 
				+   > The following is a list of behavior which is forbidden in all Rust code,
			
 
				+   > including within unsafe blocks and unsafe functions. Type checking provides the
			
 
				+   > guarantee that these issues are never caused by safe code.
			
 
				+   > 
			
 
				+   > * Data races
			
 
				+   > * Dereferencing a null/dangling raw pointer
			
 
				+   > * Reads of [undef](http://llvm.org/docs/LangRef.html#undefined-values)
			
 
				+   >   (uninitialized) memory
			
 
				+   > * Breaking the
			
 
				+   >   [pointer aliasing rules](http://llvm.org/docs/LangRef.html#pointer-aliasing-rules)
			
 
				+   >   with raw pointers (a subset of the rules used by C)
			
 
				+   > * `&mut T` and `&T` follow LLVM’s scoped noalias model, except if the `&T`
			
 
				+   >   contains an `UnsafeCell<U>`. Unsafe code must not violate these aliasing
			
 
				+   >   guarantees.
			
 
				+   > * Mutating non-mutable data (that is, data reached through a shared
			
 
				+   >   reference or data owned by a `let` binding), unless that data is
			
 
				+   >   contained within an `UnsafeCell<U>`.
			
 
				+   > * Invoking undefined behavior via compiler intrinsics:
			
 
				+   >     - Indexing outside of the bounds of an object with
			
 
				+   >       `std::ptr::offset` (`offset` intrinsic), with the exception of
			
 
				+   >       one byte past the end which is permitted.
			
 
				+   >     - Using `std::ptr::copy_nonoverlapping_memory` (`memcpy32`/`memcpy64`
			
 
				+   >       intrinsics) on overlapping buffers
			
 
				+   > * Invalid values in primitive types, even in private fields/locals:
			
 
				+   >     - Dangling/null references or boxes
			
 
				+   >     - A value other than `false` (0) or `true` (1) in a `bool`
			
 
				+   >     - A discriminant in an `enum` not included in the type definition
			
 
				+   >     - A value in a `char` which is a surrogate or above `char::MAX`
			
 
				+   >     - Non-UTF-8 byte sequences in a `str`
			
 
				+   > * Unwinding into Rust from foreign code or unwinding from Rust into foreign
			
 
				+   >   code. Rust's failure system is not compatible with exception handling in other
			
 
				+   >   languages. Unwinding must be caught and handled at FFI boundaries.
			
 
				+
			
 
				+1. `unwrap()`
			
 
				+
			
 
				+   If you call `unwrap()`, anywhere, even in a test, you MUST include
			
 
				+   an inline comment stating how the unwrap will either 1) never fail,
			
 
				+   or 2) should fail (i.e. in a unittest).
			
 
				+
			
 
				+   You SHOULD NOT use `unwrap()` anywhere in which it is possible to handle the
			
 
				+   potential error with either `expect()` or the eel operator, `?`.
			
 
				+   For example, consider a function which parses a string into an integer:
			
 
				+
			
 
				+        fn parse_port_number(config_string: &str) -> u16 {
			
 
				+            u16::from_str_radix(config_string, 10).unwrap()
			
 
				+        }
			
 
				+
			
 
				+   There are numerous ways this can fail, and the `unwrap()` will cause the
			
 
				+   whole program to byte the dust!  Instead, either you SHOULD use `expect()`
			
 
				+   (or another equivalent function which will return an `Option` or a `Result`)
			
 
				+   and change the return type to be compatible:
			
 
				+
			
 
				+        fn parse_port_number(config_string: &str) -> Option<u16> {
			
 
				+            u16::from_str_radix(config_string, 10).expect("Couldn't parse port into a u16")
			
 
				+        }
			
 
				+
			
 
				+   or you SHOULD use `or()` (or another similar method):
			
 
				+
			
 
				+        fn parse_port_number(config_string: &str) -> Option<u16> {
			
 
				+            u16::from_str_radix(config_string, 10).or(Err("Couldn't parse port into a u16")
			
 
				+        }
			
 
				+
			
 
				+   Using methods like `or()` can be particularly handy when you must do
			
 
				+   something afterwards with the data, for example, if we wanted to guarantee
			
 
				+   that the port is high.  Combining these methods with the eel operator (`?`)
			
 
				+   makes this even easier:
			
 
				+
			
 
				+        fn parse_port_number(config_string: &str) -> Result<u16, Err> {
			
 
				+            let port = u16::from_str_radix(config_string, 10).or(Err("Couldn't parse port into a u16"))?;
			
 
				+
			
 
				+            if port > 1024 {
			
 
				+                return Ok(port);
			
 
				+            } else {
			
 
				+                return Err("Low ports not allowed");
			
 
				+            }
			
 
				+        }
			
 
				+
			
 
				+2. `unsafe`
			
 
				+
			
 
				+   If you use `unsafe`, you MUST describe a contract in your
			
 
				+   documentation which describes how and when the unsafe code may
			
 
				+   fail, and what expectations are made w.r.t. the interfaces to
			
 
				+   unsafe code.  This is also REQUIRED for major pieces of FFI between
			
 
				+   C and Rust.
			
 
				+
			
 
				+   When creating an FFI in Rust for C code to call, it is NOT REQUIRED
			
 
				+   to declare the entire function `unsafe`.  For example, rather than doing:
			
 
				+
			
 
				+        #[no_mangle]
			
 
				+        pub unsafe extern "C" fn increment_and_combine_numbers(mut numbers: [u8; 4]) -> u32 {
			
 
				+            for number in &mut numbers {
			
 
				+                *number += 1;
			
 
				+            }
			
 
				+            std::mem::transmute::<[u8; 4], u32>(numbers)
			
 
				+        }
			
 
				+
			
 
				+   You SHOULD instead do:
			
 
				+
			
 
				+        #[no_mangle]
			
 
				+        pub extern "C" fn increment_and_combine_numbers(mut numbers: [u8; 4]) -> u32 {
			
 
				+            for index in 0..numbers.len() {
			
 
				+                numbers[index] += 1;
			
 
				+            }
			
 
				+            unsafe {
			
 
				+                std::mem::transmute::<[u8; 4], u32>(numbers)
			
 
				+            }
			
 
				+        }
			
 
				+
			
 
				+3. Pass only integer types and bytes over the boundary
			
 
				+
			
 
				+   The only non-integer type which may cross the FFI boundary is
			
 
				+   bytes, e.g. `&[u8]`.  This SHOULD be done on the Rust side by
			
 
				+   passing a pointer (`*mut libc::c_char`) and a length
			
 
				+   (`libc::size_t`).
			
 
				+
			
 
				+   One might be tempted to do this via doing
			
 
				+   `CString::new("blah").unwrap().into_raw()`. This has several problems:
			
 
				+
			
 
				+   a) If you do `CString::new("bl\x00ah")` then the unwrap() will fail
			
 
				+      due to the additional NULL terminator, causing a dangling
			
 
				+      pointer to be returned (as well as a potential use-after-free).
			
 
				+
			
 
				+   b) Returning the raw pointer will cause the CString to run its deallocator,
			
 
				+      which causes any C code which tries to access the contents to dereference a
			
 
				+      NULL pointer.
			
 
				+
			
 
				+   c) If we were to do `as_raw()` this would result in a potential double-free
			
 
				+      since the Rust deallocator would run and possibly Tor's deallocator.
			
 
				+
			
 
				+   d) Calling `into_raw()` without later using the same pointer in Rust to call
			
 
				+      `from_raw()` and then deallocate in Rust can result in a
			
 
				+      [memory leak](https://doc.rust-lang.org/std/ffi/struct.CString.html#method.into_raw).
			
 
				+
			
 
				+      [It was determined](https://github.com/rust-lang/rust/pull/41074) that this
			
 
				+      is safe to do if you use the same allocator in C and Rust and also specify
			
 
				+      the memory alignment for CString (except that there is no way to specify
			
 
				+      the alignment for CString).  It is believed that the alignment is always 1,
			
 
				+      which would mean it's safe to dealloc the resulting `*mut c_char` in Tor's
			
 
				+      C code.  However, the Rust developers are not willing to guarantee the
			
 
				+      stability of, or a contract for, this behaviour, citing concerns that this
			
 
				+      is potentially extremely and subtly unsafe.
			
 
				+
			
 
				+4. Perform an allocation on the other side of the boundary
			
 
				+
			
 
				+   After crossing the boundary, the other side MUST perform an
			
 
				+   allocation to copy the data and is therefore responsible for
			
 
				+   freeing that memory later.
			
 
				+
			
 
				+5. No touching other language's enums
			
 
				+
			
 
				+   Rust enums should never be touched from C (nor can they be safely
			
 
				+   `#[repr(C)]`) nor vice versa:
			
 
				+
			
 
				+   >  "The chosen size is the default enum size for the target platform's C
			
 
				+   >  ABI. Note that enum representation in C is implementation defined, so this is
			
 
				+   >  really a "best guess". In particular, this may be incorrect when the C code
			
 
				+   >  of interest is compiled with certain flags."
			
 
				+
			
 
				+   (from https://gankro.github.io/nomicon/other-reprs.html)
			
 
				+
			
 
				+6. Type safety
			
 
				+
			
 
				+   Wherever possible and sensical, you SHOULD create new types in a
			
 
				+   manner which prevents type confusion or misuse.  For example,
			
 
				+   rather than using an untyped mapping between strings and integers
			
 
				+   like so:
			
 
				+
			
 
				+        use std::collections::HashMap;
			
 
				+
			
 
				+        pub fn get_elements_with_over_9000_points(map: &HashMap<String, usize>) -> Vec<String> {
			
 
				+            ...
			
 
				+        }
			
 
				+
			
 
				+   It would be safer to define a new type, such that some other usage
			
 
				+   of `HashMap<String, usize>` cannot be confused for this type:
			
 
				+
			
 
				+        pub struct DragonBallZPowers(pub HashMap<String, usize>);
			
 
				+
			
 
				+        impl DragonBallZPowers {
			
 
				+            pub fn over_nine_thousand<'a>(&'a self) -> Vec<&'a String> {
			
 
				+                let mut powerful_enough: Vec<&'a String> = Vec::with_capacity(5);
			
 
				+
			
 
				+                for (character, power) in &self.0 {
			
 
				+                    if *power > 9000 {
			
 
				+                        powerful_enough.push(character);
			
 
				+                    }
			
 
				+                }
			
 
				+                powerful_enough
			
 
				+            }
			
 
				+        }
			
 
				+
			
 
				+   Note the following code, which uses Rust's type aliasing, is valid
			
 
				+   but it does NOT meet the desired type safety goals:
			
 
				+
			
 
				+        pub type Power = usize;
			
 
				+
			
 
				+        pub fn over_nine_thousand(power: &Power) -> bool {
			
 
				+            if *power > 9000 {
			
 
				+                return true;
			
 
				+            }
			
 
				+            false
			
 
				+        }
			
 
				+
			
 
				+        // We can still do the following:
			
 
				+        let his_power: usize = 9001;
			
 
				+        over_nine_thousand(&his_power);
			
 
				+
			
 
				+7. Unsafe mucking around with lifetimes
			
 
				+
			
 
				+   Because lifetimes are technically, in type theory terms, a kind, i.e. a
			
 
				+   family of types, individual lifetimes can be treated as types.  For example,
			
 
				+   one can arbitrarily extend and shorten lifetime using `std::mem::transmute`:
			
 
				+
			
 
				+        struct R<'a>(&'a i32);
			
 
				+
			
 
				+        unsafe fn extend_lifetime<'b>(r: R<'b>) -> R<'static> {
			
 
				+            std::mem::transmute::<R<'b>, R<'static>>(r)
			
 
				+        }
			
 
				+
			
 
				+        unsafe fn shorten_invariant_lifetime<'b, 'c>(r: &'b mut R<'static>) -> &'b mut R<'c> {
			
 
				+            std::mem::transmute::<&'b mut R<'static>, &'b mut R<'c>>(r)
			
 
				+        }
			
 
				+
			
 
				+   Calling `extend_lifetime()` would cause an `R` passed into it to live forever
			
 
				+   for the life of the program (the `'static` lifetime).  Similarly,
			
 
				+   `shorten_invariant_lifetime()` could be used to take something meant to live
			
 
				+   forever, and cause it to disappear!  This is incredibly unsafe.  If you're
			
 
				+   going to be mucking around with lifetimes like this, first, you better have
			
 
				+   an extremely good reason, and second, you may as be honest and explicit about
			
 
				+   it, and for ferris' sake just use a raw pointer.
			
 
				+
			
 
				+   In short, just because lifetimes can be treated like types doesn't mean you
			
 
				+   should do it.
			
 
				+
			
 
				+8. Doing excessively unsafe things when there's a safer alternative
			
 
				+
			
 
				+   Similarly to #7, often there are excessively unsafe ways to do a task and a
			
 
				+   simpler, safer way.  You MUST choose the safer option where possible.
			
 
				+
			
 
				+   For example, `std::mem::transmute` can be abused in ways where casting with
			
 
				+   `as` would be both simpler and safer:
			
 
				+
			
 
				+        // Don't do this
			
 
				+        let ptr = &0;
			
 
				+        let ptr_num_transmute = unsafe { std::mem::transmute::<&i32, usize>(ptr)};
			
 
				+
			
 
				+        // Use an `as` cast instead
			
 
				+        let ptr_num_cast = ptr as *const i32 as usize;
			
 
				+
			
 
				+   In fact, using `std::mem::transmute` for *any* reason is a code smell and as
			
 
				+   such SHOULD be avoided.
			
 
				+
			
 
				+9. Casting integers with `as`
			
 
				+
			
 
				+   This is generally fine to do, but it has some behaviours which you should be
			
 
				+   aware of.  Casting down chops off the high bits, e.g.:
			
 
				+
			
 
				+        let x: u32 = 4294967295;
			
 
				+        println!("{}", x as u16); // prints 65535
			
 
				+
			
 
				+   Some cases which you MUST NOT do include:
			
 
				+
			
 
				+   * Casting an `u128` down to an `f32` or vice versa (e.g.
			
 
				+     `u128::MAX as f32` but this isn't only a problem with overflowing
			
 
				+     as it is also undefined behaviour for `42.0f32 as u128`),
			
 
				+
			
 
				+   * Casting between integers and floats when the thing being cast
			
 
				+     cannot fit into the type it is being casted into, e.g.:
			
 
				+
			
 
				+         println!("{}", 42949.0f32 as u8); // prints 197 in debug mode and 0 in release
			
 
				+         println!("{}", 1.04E+17 as u8);   // prints 0 in both modes
			
 
				+         println!("{}", (0.0/0.0) as i64); // prints whatever the heck LLVM wants
			
 
				+
			
 
				+     Because this behaviour is undefined, it can even produce segfaults in
			
 
				+     safe Rust code.  For example, the following program built in release
			
 
				+     mode segfaults:
			
 
				+
			
 
				+         #[inline(never)]
			
 
				+         pub fn trigger_ub(sl: &[u8; 666]) -> &[u8] {
			
 
				+             // Note that the float is out of the range of `usize`, invoking UB when casting.
			
 
				+             let idx = 1e99999f64 as usize;
			
 
				+             &sl[idx..] // The bound check is elided due to `idx` being of an undefined value.
			
 
				+         }
			
 
				+
			
 
				+         fn main() {
			
 
				+             println!("{}", trigger_ub(&[1; 666])[999999]); // ~ out of bound
			
 
				+         }
			
 
				+
			
 
				+      And in debug mode panics with:
			
 
				+
			
 
				+         thread 'main' panicked at 'slice index starts at 140721821254240 but ends at 666', /checkout/src/libcore/slice/mod.rs:754:4
			
--- a/doc/HACKING/GettingStartedRust.md
+++ b/doc/HACKING/GettingStartedRust.md
@@ -0,0 +1,158 @@
 
				+
			
 
				+ Hacking on Rust in Tor
			
 
				+========================
			
 
				+
			
 
				+ Getting Started
			
 
				+-----------------
			
 
				+
			
 
				+Please read or review our documentation on Rust coding standards
			
 
				+(`.../doc/HACKING/CodingStandardsRust.md`) before doing anything.
			
 
				+
			
 
				+Please also read
			
 
				+[the Rust Code of Conduct](https://www.rust-lang.org/en-US/conduct.html). We aim
			
 
				+to follow the good example set by the Rust community and be excellent to one
			
 
				+another.  Let's be careful with each other, so we can be memory-safe together!
			
 
				+
			
 
				+Next, please contact us before rewriting anything!  Rust in Tor is still an
			
 
				+experiment.  It is an experiment that we very much want to see succeed, so we're
			
 
				+going slowly and carefully.  For the moment, it's also a completely
			
 
				+volunteer-driven effort: while many, if not most, of us are paid to work on Tor,
			
 
				+we are not yet funded to write Rust code for Tor.  Please be patient with the
			
 
				+other people who are working on getting more Rust code into Tor, because they
			
 
				+are graciously donating their free time to contribute to this effort.
			
 
				+
			
 
				+ Resources for learning Rust
			
 
				+-----------------------------
			
 
				+
			
 
				+**Beginning resources**
			
 
				+
			
 
				+The primary resource for learning Rust is
			
 
				+[The Book](https://doc.rust-lang.org/book/).  If you'd like to start writing
			
 
				+Rust immediately, without waiting for anything to install, there is
			
 
				+[an interactive browser-based playground](https://play.rust-lang.org/).
			
 
				+
			
 
				+**Advanced resources**
			
 
				+
			
 
				+If you're interested in playing with various Rust compilers and viewing a very
			
 
				+nicely displayed output of the generated assembly, there is
			
 
				+[the Godbolt compiler explorer](https://rust.godbolt.org/)
			
 
				+
			
 
				+For learning how to write unsafe Rust, read
			
 
				+[The Rustonomicon](https://doc.rust-lang.org/nomicon/).
			
 
				+
			
 
				+For learning everything you ever wanted to know about Rust macros, there is
			
 
				+[The Little Book of Rust Macros](https://danielkeep.github.io/tlborm/book/index.html).
			
 
				+
			
 
				+ Compiling Tor with Rust enabled
			
 
				+---------------------------------
			
 
				+
			
 
				+You will need to run the `configure` script with the `--enable-rust` flag to
			
 
				+explicitly build with Rust. Additionally, you will need to specify where to
			
 
				+fetch Rust dependencies, as we allow for either fetching dependencies from Cargo
			
 
				+or specifying a local directory.
			
 
				+
			
 
				+**Fetch dependencies from Cargo**
			
 
				+
			
 
				+    ./configure --enable-rust --enable-cargo-online-mode
			
 
				+
			
 
				+**Using a local dependency cache**
			
 
				+
			
 
				+**NOTE**: local dependency caches which were not *originally* created via
			
 
				+  `--enable-cargo-online-mode` are broken. See https://bugs.torproject.org/22907
			
 
				+
			
 
				+To specify a local directory:
			
 
				+
			
 
				+    RUST_DEPENDENCIES='path_to_dependencies_directory' ./configure --enable-rust
			
 
				+
			
 
				+(Note that RUST_DEPENDENCIES must be the full path to the directory; it cannot
			
 
				+be relative.)
			
 
				+
			
 
				+You'll need the following Rust dependencies (as of this writing):
			
 
				+
			
 
				+    libc==0.2.22
			
 
				+
			
 
				+To get them, do:
			
 
				+
			
 
				+    mkdir path_to_dependencies_directory
			
 
				+    cd path_to_dependencies_directory
			
 
				+    git clone https://github.com/rust-lang/libc
			
 
				+    cd libc
			
 
				+    git checkout 0.2.22
			
 
				+    cargo package
			
 
				+    cd ..
			
 
				+    ln -s libc/target/package/libc-0.2.22 libc-0.2.22
			
 
				+
			
 
				+
			
 
				+ Identifying which modules to rewrite
			
 
				+======================================
			
 
				+
			
 
				+The places in the Tor codebase that are good candidates for porting to Rust are:
			
 
				+
			
 
				+1. loosely coupled to other Tor submodules,
			
 
				+2. have high test coverage, and
			
 
				+3. would benefit from being implemented in a memory safe language.
			
 
				+
			
 
				+Help in either identifying places such as this, or working to improve existing
			
 
				+areas of the C codebase by adding regression tests and simplifying dependencies,
			
 
				+would be really helpful.
			
 
				+
			
 
				+Furthermore, as submodules in C are implemented in Rust, this is a good
			
 
				+opportunity to refactor, add more tests, and split modules into smaller areas of
			
 
				+responsibility.
			
 
				+
			
 
				+A good first step is to build a module-level callgraph to understand how
			
 
				+interconnected your target module is.
			
 
				+
			
 
				+    git clone https://git.torproject.org/user/nickm/calltool.git
			
 
				+    cd tor
			
 
				+    CFLAGS=0 ./configure
			
 
				+    ../calltool/src/main.py module_callgraph
			
 
				+
			
 
				+The output will tell you each module name, along with a set of every module that
			
 
				+the module calls.  Modules which call fewer other modules are better targets.
			
 
				+
			
 
				+ Writing your Rust module
			
 
				+==========================
			
 
				+
			
 
				+Strive to change the C API as little as possible.
			
 
				+
			
 
				+We are currently targetting Rust nightly, *for now*. We expect this to change
			
 
				+moving forward, as we understand more about which nightly features we need. It
			
 
				+is on our TODO list to try to cultivate good standing with various distro
			
 
				+maintainers of `rustc` and `cargo`, in order to ensure that whatever version we
			
 
				+solidify on is readily available.
			
 
				+
			
 
				+ Adding your Rust module to Tor's build system
			
 
				+-----------------------------------------------
			
 
				+
			
 
				+0. Your translation of the C module should live in its own crate(s)
			
 
				+   in the `.../tor/src/rust/` directory.
			
 
				+1. Add your crate to `.../tor/src/rust/Cargo.toml`, in the
			
 
				+   `[workspace.members]` section.
			
 
				+2. Append your crate's static library to the `rust_ldadd` definition
			
 
				+   (underneath `if USE_RUST`) in `.../tor/Makefile.am`.
			
 
				+
			
 
				+ How to test your Rust code
			
 
				+----------------------------
			
 
				+
			
 
				+Everything should be tested full stop.  Even non-public functionality.
			
 
				+
			
 
				+Be sure to edit `.../tor/src/test/test_rust.sh` to add the name of your crate to
			
 
				+the `crates` variable! This will ensure that `cargo test` is run on your crate.
			
 
				+
			
 
				+Configure Tor's build system to build with Rust enabled:
			
 
				+
			
 
				+    ./configure --enable-fatal-warnings --enable-rust --enable-cargo-online-mode
			
 
				+
			
 
				+Tor's test should be run by doing:
			
 
				+
			
 
				+    make check
			
 
				+
			
 
				+Tor's integration tests should also pass:
			
 
				+
			
 
				+    make test-stem
			
 
				+
			
 
				+ Submitting a patch
			
 
				+=====================
			
 
				+
			
 
				+Please follow the instructions in `.../doc/HACKING/GettingStarted.md`.