Reasoning About Types

    2024-10-24 (last edit: 2022-11-07)

    Type traits

    Traits are a way to defined common behavior between different types. They can be compared to interfaces from many other mainstream languages or to type classes from Haskell, however, Rust is not an object-oriented language and there are some notable differences between type traits and Java interfaces.

    The way we describe behavior in Rust is through methods. Traits consist of a set of these methods which then should be implemented by a type. We've already encountered examples of these, like the Clone trait which specified that the clone() method can be called on some given type. Now, let's take a deeper look and try defining our own trait.

    #![allow(dead_code)]
    
    trait Summary {
        fn summarize(&self) -> String;
    }
    
    struct NewsArticle {
        headline: String,
        location: String,
        author: String,
        content: String,
    }
    
    impl Summary for NewsArticle {
        fn summarize(&self) -> String {
            format!("{}, by {} ({})", self.headline, self.author, self.location)
        }
    }
    
    struct Tweet {
        username: String,
        content: String,
    }
    
    impl Summary for Tweet {
        fn summarize(&self) -> String {
            format!("{}: {}", self.username, self.content)
        }
    }
    
    fn main() {
        let tweet = Tweet {
            username: String::from("horse_ebooks"),
            content: String::from("of course, as you probably already know, people"),
        };
    
        println!("1 new tweet: {}", tweet.summarize());
    }
    
    

    (Download the source code for this example: basic_trait.rs)

    Default implementations

    Trait definitions can also be provided with default implementations of behaviors.

    #![allow(dead_code)]
    
    struct Upload {
        filename: String,
    }
    
    #[allow(dead_code)]
    struct Photo {
        filename: String,
        width: u32,
        height: u32,
    }
    
    trait Description {
        fn describe(&self) -> String {
            String::from("No description available.")
        }
    }
    
    // All default implementations
    impl Description for Upload {}
    
    // Default implementations can be overwritten
    impl Description for Photo {
        fn describe(&self) -> String {
            format!("{} ({} x {})", self.filename, self.width, self.height)
        }
    }
    
    // Default implementations can rely on methods with no defaults
    trait Size {
        fn width(&self) -> u32;
        fn height(&self) -> u32;
    
        fn size(&self) -> u32 {
            self.width() * self.height()
        }
    }
    
    impl Size for Photo {
        fn width(&self) -> u32 {
            self.width
        }
    
        fn height(&self) -> u32 {
            self.height
        }
    
        // Using default impl of `size()`
    }
    
    fn main() {
        let upload = Upload {
            filename: String::from("notes.txt"),
        };
    
        println!("Upload: {}", upload.describe());
    
        let photo = Photo {
            filename: String::from("stock_crustacean.png"),
            width: 100,
            height: 150,
        };
    
        println!("Photo: {}", photo.describe());
        println!("Size: {}", photo.size());
    }
    
    

    (Download the source code for this example: trait_default.rs)

    What about derive?

    There is a trait-related thing we have used quite extensively and not explained yet, namely the #[derive] attribute. What it does is generate items (in our case a trait implementation) based on the given data definition (here a struct). Below you can find a list of derivable traits from the standard library. Writing derivation rules for user defined traits is also possible, but goes out of the scope of this lesson.

    Derivable traits:

    • Equality traits: Eq, PartialEq and comparison traits: Ord and PartialOrd. The Partial- versions exist because there are types which don't fulfill the reflexivity requirement of equality (NaN != NaN) or do not form a total order ( NaN < 0.0 == false and NaN >= 0.0 == false).

    • Data duplication traits: Clone and Copy

    • Hash - allows using values of that type as keys in a hashmap

    • Default - provides a zero-arg constructor function

    • Debug - provides a formatting of the value which can be used in debugging context. It should NOT be implemented manually. In general, if it's possible to derive the Debug, there are no reasons against doing it.

    When is it possible to derive a trait?

    When all fields of a struct/variants of an enum implement that trait.

    Should all traits always be derived if it is possible?

    No. Although it may be tempting to just slap #[derive(Clone, Copy)] everywhere, it would be counter-effective. For example, at some later point you might add a non-Copy field to the struct and your (or, what's worse, someone else's!) code would break. Another example: it makes little sense to use containers as keys in hashmaps or to compare tweets.

    Generics

    Suppose we want to find the largest element in a sequence and return it. Very much on purpose, we didn't specify what type these elements would be - ideally, we would love it to work on all types that have a defined notion of a largest element. However, to make things simpler for now, let's focus only on two primitive types: i32 and char. Let's try to write the code:

    fn largest_i32(list: &[i32]) -> i32 {
        let mut largest = list[0];
    
        for &item in list {
            if item > largest {
                largest = item;
            }
        }
    
        largest
    }
    
    fn largest_char(list: &[char]) -> char {
        let mut largest = list[0];
    
        for &item in list {
            if item > largest {
                largest = item;
            }
        }
    
        largest
    }
    
    fn main() {
        let number_list = vec![34, 50, 25, 100, 65];
    
        let result = largest_i32(&number_list);
        println!("The largest number is {}", result);
    
        let char_list = vec!['y', 'm', 'a', 'q'];
    
        let result = largest_char(&char_list);
        println!("The largest char is {}", result);
    }
    
    

    (Download the source code for this example: non_generic.rs)

    Perfect, it works! Now only twenty more types to go...

    Fortunately, Rust gives us a way to avoid all this code duplication and generalize the types we're working on.

    fn largest<T>(list: &[T]) -> T {
        let mut largest = list[0];
    
        for &item in list {
            if item > largest {
                largest = item;
            }
        }
    
        largest
    }
    

    Cleaner already - we merged possibly very many implementations into one. But, when we try to compile this:

    error[E0369]: binary operation `>` cannot be applied to type `T`
     --> src/main.rs:5:17
      |
    5 |         if item > largest {
      |            ---- ^ ------- T
      |            |
      |            T
      |
    help: consider restricting type parameter `T`
      |
    1 | fn largest<T: std::cmp::PartialOrd>(list: &[T]) -> T {
      |             ++++++++++++++++++++++
    

    Since T can be of absolutely any type now, the compiler cannot be sure that operator > is defined. This aligns with what we wanted, as without comparing elements we don't have a notion of the largest one either. As always, the compiler comes to our aid:

    fn largest<T: PartialOrd>(list: &[T]) -> T {
        let mut largest = list[0];
    
        for &item in list {
            if item > largest {
                largest = item;
            }
        }
    
        largest
    }
    

    We call this a trait bound, a way to provide constraints on what kind of types we are talking about in a given context. This implementation almost works now. Let's look at the new error.

    error[E0508]: cannot move out of type `[T]`, a non-copy slice
     --> src/main.rs:2:23
      |
    2 |     let mut largest = list[0];
      |                       ^^^^^^^
      |                       |
      |                       cannot move out of here
      |                       move occurs because `list[_]` has type `T`, which does not implement the `Copy` trait
      |                       help: consider borrowing here: `&list[0]`
    
    error[E0507]: cannot move out of a shared reference
     --> src/main.rs:4:18
      |
    4 |     for &item in list {
      |         -----    ^^^^
      |         ||
      |         |data moved here
      |         |move occurs because `item` has type `T`, which does not implement the `Copy` trait
      |         help: consider removing the `&`: `item`
    

    Our function attempts to take ownership, but, again, the compiler doesn't know whether T can just be trivially copied. Rust allows us to combine multiple trait bounds together:

    fn largest<T: PartialOrd + Copy>(list: &[T]) -> T {
        let mut largest = list[0];
    
        for &item in list {
            if item > largest {
                largest = item;
            }
        }
    
        largest
    }
    
    fn main() {
        let number_list = vec![34, 50, 25, 100, 65];
    
        let result = largest(&number_list);
        println!("The largest number is {}", result);
    
        let char_list = vec!['y', 'm', 'a', 'q'];
    
        let result = largest(&char_list);
        println!("The largest char is {}", result);
    }
    
    

    (Download the source code for this example: generic_largest.rs)

    A powerful tool

    There's a lot more that we can do with generics:

    #![allow(dead_code)]
    
    use std::fmt::Debug;
    
    // generic enums
    enum OurOption<T> {
        Some(T),
        None,
    }
    
    // generic structs
    struct Tuple2<T, U> {
        x: T,
        y: U,
    }
    
    // generic implementation
    impl<T, U> Tuple2<T, U> {
        fn new(x: T, y: U) -> Self {
            Self { x, y }
        }
    }
    
    struct Pair<T> {
        x: T,
        y: T,
    }
    
    // conditional implementation
    impl<T: PartialOrd + Copy> Pair<T> {
        fn largest(&self) -> T {
            if self.x > self.y {
                self.x
            } else {
                self.y
            }
        }
    }
    
    // alternative syntax
    impl<T> Pair<T>
    where
        T: PartialOrd + Copy,
    {
        fn smallest(&self) -> T {
            if self.x < self.y {
                self.x
            } else {
                self.y
            }
        }
    }
    
    // Here information about the concrete underlying type is erased
    // We can only either format or clone the result
    fn cloning_machine(item: &(impl Clone + Debug)) -> impl Clone + Debug {
        item.clone()
    }
    
    fn main() {
        let _opt = OurOption::Some(10);
    
        let _p1 = Tuple2 { x: 5, y: 10 };
        let _p2 = Tuple2::new(1, 2.5);
    
        let arr = [1, 2, 3];
        let arr2 = cloning_machine(&arr);
        // arr2[0]; // won't compile: cannot index into a value of type `impl std::clone::Clone + std::fmt::Debug`
        println!("{:?}", arr2)
    }
    
    

    (Download the source code for this example: generics.rs)

    A bit more involved example:

    use std::fmt::{Display, Formatter};
    
    trait DefaultishablyPrintable<T> {
        fn defaultish_print()
        where
            T: Display + Default,
        {
            println!("{}", T::default())
        }
    }
    
    struct Foo;
    
    struct Bar;
    
    impl Display for Bar {
        fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
            f.write_str("this is a bar")
        }
    }
    
    impl Default for Bar {
        fn default() -> Self {
            Bar // well, we have no other choice
        }
    }
    
    impl DefaultishablyPrintable<i32> for Foo {}
    
    impl DefaultishablyPrintable<Bar> for Foo {}
    
    fn main() {
        <Foo as DefaultishablyPrintable<i32>>::defaultish_print();
        <Foo as DefaultishablyPrintable<Bar>>::defaultish_print();
    }
    
    

    (Download the source code for this example: generics_fun.rs)

    Static vs dynamic dispatch

    trait Speak {
        fn speak(&self) -> &'static str;
    }
    
    struct Dog;
    
    impl Speak for Dog {
        fn speak(&self) -> &'static str {
            "Hau hau" // it's a Polish dog!
        }
    }
    
    struct Human;
    
    impl Speak for Human {
        fn speak(&self) -> &'static str {
            "Hello world"
        }
    }
    
    // It works like templates in C++
    // A different function will be generated for each T during compilation
    // This process is called "monomorphization"
    fn static_dispatch<T: Speak>(speaking: &T) {
        println!("{}!", speaking.speak());
    }
    
    // Only one copy of that function will exist in the compiled binary
    fn dynamic_dispatch(speaking: &dyn Speak) {
        println!("{}!", speaking.speak());
    }
    
    fn main() {
        let dog = Dog;
        let human = Human;
    
        static_dispatch(&dog);
        static_dispatch(&human);
    
        dynamic_dispatch(&dog);
        dynamic_dispatch(&human);
    
        // The observable behavior is identical
        // Static dispatch in general is a bit faster,
        // because there is no need to perform a "vtable lookup".
        // But it can also result in bigger binary sizes.
    }
    
    

    (Download the source code for this example: static_dynamic_dispatch.rs)

    Lifetimes

    Going back to the lesson about ownership, if we try to compile the following code:

    {
        let r;
    
        {
            let x = 5;
            r = &x;
        }
    
        println!("r: {}", r);
    }
    

    we should expect to get an error:

    error[E0597]: `x` does not live long enough
      --> src/main.rs:7:17
       |
    7  |             r = &x;
       |                 ^^ borrowed value does not live long enough
    8  |         }
       |         - `x` dropped here while still borrowed
    9  |
    10 |         println!("r: {}", r);
       |                           - borrow later used here
    

    Courtesy of the borrow checker, we didn't end up with a dangling reference. But what exactly is happening behind the scenes? Rust introduces a concept of annotated lifetimes, where the lifetime of each value is being marked and tracked by the checker. Let's look at some examples:

    {
        let r;                  // ---------+-- 'a
                                //          |
        {                       //          |
            let x = 5;          // -+-- 'b  |
            r = &x;             //  |       |
        }                       // -+       |
                                //          |
        println!("r: {}", r);   //          |
    }                           // ---------+
    
    {
        let x = 5;              // ----------+-- 'b
                                //           |
        let r = &x;             // --+-- 'a  |
                                //   |       |
        println!("r: {}", r);   //   |       |
                                // --+       |
    }                           // ----------+
    

    Annotations

    Let's consider the following code finding the longer out of two strings:

    fn longest(x: &str, y: &str) -> &str {
        if x.len() > y.len() {
            x
        } else {
            y
        }
    }
    
    fn main() {
        let string1 = String::from("abcd");
        let string2 = "xyz";
    
        let result = longest(string1.as_str(), string2);
        println!("The longest string is {}", result);
    }
    

    If we try to compile this, we will get an error:

    error[E0106]: missing lifetime specifier
     --> src/main.rs:9:33
      |
    9 | fn longest(x: &str, y: &str) -> &str {
      |               ----     ----     ^ expected named lifetime parameter
      |
      = help: this function's return type contains a borrowed value, but the signature does not say whether it is borrowed from `x` or `y`
    help: consider introducing a named lifetime parameter
      |
    9 | fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
      |           ++++     ++          ++          ++
    

    This is because Rust doesn't know which of the two provided strings (x or y) will be returned from the function. And because they potentially have different lifetimes, the lifetime of what we are returning remains unclear to the compiler - it needs our help.

    Rust provides syntax for specifying lifetimes. The lifetime parameter name from the example (a) doesn't have any concrete meaning - it's just an arbitrary name for this one lifetime.

    &i32        // a reference
    &'a i32     // a reference with an explicit lifetime
    &'a mut i32 // a mutable reference with an explicit lifetime
    

    So, knowing this, let's address the compiler's demands.

    fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
        if x.len() > y.len() {
            x
        } else {
            y
        }
    }
    

    When working with lifetimes, our work will usually revolve around specifying relationships between lifetimes of different values so that the compiler can successfully reason about the program's safety. In the context of the example above, this signature means that both of the function's arguments and its output will live at least as long as lifetime 'a. In practice, this means that the output's lifetime will be equal to the smaller of the two inputs' lifetimes.

    fn longest<'a>(first: &'a str, second: &'a str) -> &'a str {
        if first.len() > second.len() {
            first
        } else {
            second
        }
    }
    
    fn main() {
        let string1 = String::from("long string is long");
    
        {
            let string2 = String::from("xyz");
            let result = longest(string1.as_str(), string2.as_str());
            println!("The longest string is {}", result);
        }
    
        // This doesn't compile - incorrect lifetimes
        //
        // let string1 = String::from("long string is long");
        // let result;
        // {
        //     let string2 = String::from("xyz");
        //     result = longest(string1.as_str(), string2.as_str());
        // }
        // println!("The longest string is {}", result);
    }
    
    

    (Download the source code for this example: lifetimes_basic.rs)

    Trying to compile the second variant displeases the compiler (just like we hoped).

    error[E0597]: `string2` does not live long enough
     --> src/main.rs:6:44
      |
    6 |         result = longest(string1.as_str(), string2.as_str());
      |                                            ^^^^^^^^^^^^^^^^ borrowed value does not live long enough
    7 |     }
      |     - `string2` dropped here while still borrowed
    8 |     println!("The longest string is {}", result);
      |                                          ------ borrow later used here
    

    Lifetime elision

    We now know how to explicitly write lifetime parameters, but you might recall that we don't always have to that. Indeed, Rust will first try to figure out the lifetimes itself, applying a set of predefined rules. We call this lifetime elision.

    fn first_two(seq: &[u32]) -> &[u32] {
        if seq.len() < 2 {
            seq
        } else {
            &seq[..2]
        }
    }
    
    fn main() {
        let seq = [1, 2, 3, 4];
    
        println!(
            "First two elements of the sequence: {:?}",
            first_two(&seq[..])
        );
    }
    
    

    (Download the source code for this example: lifetimes_elision.rs)

    The above works, even though we didn't specify any lifetime parameters at all. The reason lies in the rules we mentioned, which are as follows (where input lifetimes are lifetimes on parameters and output lifetimes are lifetimes on return values):

    • Each parameter that is a reference gets its own lifetime parameter.

    • If there is exactly one input lifetime parameter, that lifetime is assigned to all output lifetime parameters.

    • If there are multiple input lifetime parameters, but one of them is &self or &mut self, the lifetime of self is assigned to all output lifetime parameters.

    Let's try to understand how the compiler inferred the lifetimes of our first_two functions. We start with the following signature:

    fn first_two(seq: &[u32]) -> &[u32] {
    

    Then, we apply the first rule:

    fn first_two<'a>(seq: &'a [u32]) -> &[u32] {
    

    Next, we check the second rule. It applies here as well.

    fn first_two<'a>(seq: &'a [u32]) -> &'a [u32] {
    

    With that, we arrive at a state where all lifetimes are specified.

    Static lifetime

    There exists one special lifetime called 'static, which means that a reference can live for the entire duration of the program. All string literals are annotated with this lifetime as they are stored directly in the program's binary. Full type annotation of a string literal in Rust is therefore as follows:

    let s: &'static str = "I have a static lifetime.";
    

    Obligatory reading