Fearless concurrency

    2024-11-14 (last edit: 2024-11-14)

    Parallelism vs Concurrency

    Concurrency is when tasks can make progress independently of each other.

    Parallelism is when multiple tasks make progress at the same time.

    Concurrency models in Rust

    Threads

    Nothing unusual here.

    Threads can be created with the thread::spawn function (docs: please read them!).

    This function returns a JoinHandle<T>, which can be used to wait for the thread to finish. T is the type of the thread's return value.

    Another way to spawn threads is to use scope. Threads created this way are mandatorily joined at the end of the scope, which guarantees that they borrow items for no longer than the lifetime of the scope. Hence, they can borrow non-'static items!
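    A minimal sketch of scoped threads (std::thread::scope, stable since Rust 1.63) borrowing a local vector; the closure bodies are just illustrative:

    use std::thread;

    fn main() {
        let v = vec![1, 2, 3];

        // Scoped threads may borrow `v`, because they are guaranteed
        // to be joined before the scope (and thus `v`) ends.
        thread::scope(|s| {
            s.spawn(|| println!("first borrow: {:?}", v));
            s.spawn(|| println!("second borrow: {:?}", v));
        });

        // `v` was only borrowed, so it is still usable here.
        println!("back in main: {:?}", v);
    }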

    Propagating panics

    In Rust, a panic in one thread doesn't affect other threads (similar to how Java handles exceptions in threads).
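    For example (a sketch; the panic message is arbitrary): join() returns an Err containing the panic payload, which the parent thread can inspect or re-raise:

    use std::thread;

    fn main() {
        let handle = thread::spawn(|| {
            panic!("something went wrong");
        });

        // The panic doesn't crash the main thread; it is reported
        // through the Result returned by join().
        match handle.join() {
            Ok(()) => println!("child finished normally"),
            Err(payload) => {
                println!("child panicked: {:?}", payload.downcast_ref::<&str>())
            }
        }
    }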

    Closures

    Closures used to spawn threads must take ownership of any values they capture, because the new thread may outlive the scope that created those values. Taking ownership can be forced with the move keyword.

    use std::thread;
    
    fn main() {
        let v = vec![1, 2, 3];
    
        let handle = thread::spawn(move || {
            println!("Here's a vector: {:?}", v);
        });
    
        handle.join().unwrap();
    }
    

    Normal ownership rules still apply. This means that once the vector has been moved into the spawned thread, the main thread can no longer access it, let alone mutate it!

    But what if we need to share some state?

    Send and Sync

    They are marker traits used to indicate that a type or a reference to it can be used across threads. See the nomicon for more information.

    • A type is Send if it is safe to move it (send it) to another thread.
    • A type is Sync if it is safe to share (sync) between threads (T is Sync if and only if &T is Send).

    This makes sense, because Sync is about sharing objects between threads, and & is the shared reference.

    There is also a great answer on the Rust forum, listing and explaining example types that are !Send or !Sync.

    For convenience, the examples are listed here:

    Send + !Sync:

    • UnsafeCell (=> Cell, RefCell);

    !Send + !Sync:

    • Rc
    • *const T, *mut T (raw pointers)

    !Send + Sync:

    • MutexGuard (hint: !Send for POSIX reasons)

    Exercise for the reader: explain reasons for all limitations of the above types.
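    To illustrate the first of the !Send limitations, the following sketch doesn't compile: Rc's reference count is not updated atomically, so Rc<i32> is not Send.

    use std::rc::Rc;
    use std::thread;

    fn main() {
        let counter = Rc::new(1);

        // Error: `Rc<i32>` cannot be sent between threads safely.
        // Two threads bumping the non-atomic reference count
        // concurrently could corrupt it.
        let handle = thread::spawn(move || {
            println!("count in thread: {}", counter);
        });

        handle.join().unwrap();
    }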

    Sharing state between threads

    Message passing

    One possible way is to use message passing. We can use a channel (the mpsc module, a "multi-producer, single-consumer FIFO queue") as a blocking queue to do it. We talked about blocking queues in the Concurrent programming class. In Rust, channels are strongly typed, and the sending and receiving ends have different types.
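    A minimal sketch of two producers and one consumer built on std::sync::mpsc (the channel API is from the standard library; the messages are just illustrative strings):

    use std::sync::mpsc;
    use std::thread;

    fn main() {
        // `tx` is the sending end (Sender<String>), `rx` the receiving end (Receiver<String>).
        let (tx, rx) = mpsc::channel();

        for id in 0..2 {
            // Senders can be cloned: "multi-producer".
            let tx = tx.clone();
            thread::spawn(move || {
                tx.send(format!("hello from producer {id}")).unwrap();
            });
        }
        // Drop the original sender so the channel closes once
        // both producer threads are done.
        drop(tx);

        // There is only one receiver: "single-consumer".
        // The iterator blocks on each message and ends when the channel closes.
        for msg in rx {
            println!("{msg}");
        }
    }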

    Mutexes

    In Rust, a mutex wraps a value and makes it thread-safe. Because it becomes a part of the type, it's impossible to access the underlying value in an unsynchronized manner. It is conceptually similar to the RefCell type.

    Arc is a smart pointer like Rc but it can be shared between threads.

    Please read more about them in the book.

    The docs also mention poisoning.
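    A sketch of the classic shared-counter pattern (essentially the counter example from the book): the value lives inside an Arc<Mutex<_>>, and lock() both synchronizes access and reports poisoning through a Result:

    use std::sync::{Arc, Mutex};
    use std::thread;

    fn main() {
        let counter = Arc::new(Mutex::new(0));
        let mut handles = Vec::new();

        for _ in 0..10 {
            let counter = Arc::clone(&counter);
            handles.push(thread::spawn(move || {
                // `lock()` returns Err only if another thread
                // panicked while holding the lock (poisoning).
                *counter.lock().unwrap() += 1;
            }));
        }

        for handle in handles {
            handle.join().unwrap();
        }

        println!("Result: {}", *counter.lock().unwrap());
    }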

    RwLocks

    RwLocks are similar to mutexes, but they distinguish between read and write locks.
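    A small sketch with std::sync::RwLock: any number of readers may hold the lock at the same time, but a writer needs exclusive access:

    use std::sync::RwLock;

    fn main() {
        let lock = RwLock::new(5);

        {
            // Multiple read guards can coexist.
            let r1 = lock.read().unwrap();
            let r2 = lock.read().unwrap();
            assert_eq!(*r1 + *r2, 10);
        } // the read guards are dropped here

        {
            // A write guard requires exclusive access.
            let mut w = lock.write().unwrap();
            *w += 1;
            assert_eq!(*w, 6);
        }
    }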

    Atomic types

    Atomic types are described in the docs.

    use std::sync::Arc;
    use std::sync::atomic::{AtomicUsize, Ordering};
    use std::{hint, thread};
    
    fn main() {
        let spinlock = Arc::new(AtomicUsize::new(1));
    
        let spinlock_clone = Arc::clone(&spinlock);
        let thread = thread::spawn(move|| {
            spinlock_clone.store(0, Ordering::SeqCst);
        });
    
        // Wait for the other thread to release the lock
        while spinlock.load(Ordering::SeqCst) != 0 {
            hint::spin_loop();
        }
    
        if let Err(panic) = thread.join() {
            println!("Thread had an error: {:?}", panic);
        }
    }
    

    Note that atomic values don't have to be wrapped in a mutex when shared across threads.

    Wait...

    If most types are Sync + Send, then what stops us from using a standard, non-atomic integer in the example above?

    let spinlock = Arc::new(1);
    
    let spinlock_clone = Arc::clone(&spinlock);
    let thread = thread::spawn(move|| {
        *spinlock_clone += 1;
    });
    
    while *spinlock != 0 {
        hint::spin_loop();
    }
    
    error[E0594]: cannot assign to data in an `Arc`
     --> src/main.rs:9:9
      |
    9 |         *spinlock_clone += 1;
      |         ^^^^^^^^^^^^^^^^^^^^ cannot assign
      |
      = help: trait `DerefMut` is required to modify through a dereference, but it is not implemented for `Arc<i32>`
    

    ...so we would have to use a RefCell to be able to modify the value through a shared reference...

    let spinlock = Arc::new(RefCell::new(1));
    
    let spinlock_clone = Arc::clone(&spinlock);
    let thread = thread::spawn(move|| {
        *spinlock_clone.borrow_mut() += 1;
    });
    
    // Wait for the other thread to release the lock
    while *spinlock.borrow() != 0 {
        hint::spin_loop();
    }
    

    ...but RefCell isn't Sync:

    error[E0277]: `RefCell<i32>` cannot be shared between threads safely
       --> src/main.rs:9:18
        |
    9   |     let thread = thread::spawn(move|| {
        |                  ^^^^^^^^^^^^^ `RefCell<i32>` cannot be shared between threads safely
        |
        = help: the trait `Sync` is not implemented for `RefCell<i32>`
        = note: required because of the requirements on the impl of `Send` for `Arc<RefCell<i32>>`
        = note: required because it appears within the type `[closure@src/main.rs:9:32: 11:6]`
    note: required by a bound in `spawn`
    

    And that bound mentioned in the last line looks like this:

    pub fn spawn<F, T>(f: F) -> JoinHandle<T> where
        F: FnOnce() -> T,
        F: Send + 'static,
        T: Send + 'static,
    

    Exercise for the reader

    Why is it impossible to share a reference to a Mutex between threads spawned with std::thread::spawn?

    Data parallelism with Rayon

    Rayon is a library for parallel data processing. It can be used to parallelize the execution of functions over a collection of data by switching the standard Iterator to a ParallelIterator. It works very similarly to Java's parallel streams.
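    A minimal sketch of that switch, essentially Rayon's introductory sum-of-squares example (assumes the rayon crate is added as a dependency; par_iter comes from rayon::prelude):

    use rayon::prelude::*;

    fn sum_of_squares(input: &[i64]) -> i64 {
        input
            .par_iter() // the only change compared to `iter()`
            .map(|&x| x * x)
            .sum()
    }

    fn main() {
        let v: Vec<i64> = (1..=1_000).collect();
        println!("{}", sum_of_squares(&v));
    }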

    Why do that? Because thread synchronization is hard! Rust prevents data races, but it cannot prevent logical race conditions or deadlocks.

    Rayon's FAQ is worth reading.

    Reading

    No assignment this week

    Please work on the first iteration of the big project instead.