|
| 1 | +% Choosing your Guarantees |
| 2 | + |
| 3 | +One important feature of Rust as language is that it lets us control the costs and guarantees |
| 4 | +of a program. |
| 5 | + |
| 6 | +There are various “wrapper type” abstractions in the Rust standard library which embody |
| 7 | +a multitude of tradeoffs between cost, ergonomics, and guarantees. Many let one choose between |
| 8 | +run time and compile time enforcement. This section will explain a few selected abstractions in |
| 9 | +detail. |
| 10 | + |
| 11 | +Before proceeding, it is highly recommended that one reads about [ownership][ownership] and |
| 12 | +[borrowing][borrowing] in Rust. |
| 13 | + |
| 14 | +[ownership]: ownership.html |
| 15 | +[borrowing]: references-and-borrowing.html |
| 16 | + |
| 17 | +# Basic pointer types |
| 18 | + |
| 19 | +## `Box<T>` |
| 20 | + |
| 21 | +[`Box<T>`][box] is pointer which is “owned”, or a “box”. While it can hand |
| 22 | +out references to the contained data, it is the only owner of the data. In particular, when |
| 23 | +something like the following occurs: |
| 24 | + |
| 25 | +```rust |
| 26 | +let x = Box::new(1); |
| 27 | +let y = x; |
| 28 | +// x no longer accessible here |
| 29 | +``` |
| 30 | + |
| 31 | +Here, the box was _moved_ into `y`. As `x` no longer owns it, the compiler will no longer allow the |
| 32 | +programmer to use `x` after this. A box can similarly be moved _out_ of a function by returning it. |
| 33 | + |
| 34 | +When a box (that hasn't been moved) goes out of scope, destructors are run. These destructors take |
| 35 | +care of deallocating the inner data. |
| 36 | + |
| 37 | +This is a zero-cost abstraction for dynamic allocation. If you want to allocate some memory on the |
| 38 | +heap and safely pass around a pointer to that memory, this is ideal. Note that you will only be |
| 39 | +allowed to share references to this by the regular borrowing rules, checked at compile time. |
| 40 | + |
| 41 | +[box]: ../std/boxed/struct.Box.html |
| 42 | + |
| 43 | +## `&T` and `&mut T` |
| 44 | + |
| 45 | +These are immutable and mutable references respectively. They follow the &lquo;read-write lock&rquo; |
| 46 | +pattern, such that one may either have only one mutable reference to some data, or any number of |
| 47 | +immutable ones, but not both. This guarantee is enforced at compile time, and has no visible cost at |
| 48 | +runtime. In most cases these two pointer types suffice for sharing cheap references between sections |
| 49 | +of code. |
| 50 | + |
| 51 | +These pointers cannot be copied in such a way that they outlive the lifetime associated with them. |
| 52 | + |
| 53 | +## `*const T` and `*mut T` |
| 54 | + |
| 55 | +These are C-like raw pointers with no lifetime or ownership attached to them. They just point to |
| 56 | +some location in memory with no other restrictions. The only guarantee that these provide is that |
| 57 | +they cannot be dereferenced except in code marked `unsafe`. |
| 58 | + |
| 59 | +These are useful when building safe, low cost abstractions like `Vec<T>`, but should be avoided in |
| 60 | +safe code. |
| 61 | + |
| 62 | +## `Rc<T>` |
| 63 | + |
| 64 | +This is the first wrapper we will cover that has a runtime cost. |
| 65 | + |
| 66 | +[`Rc<T>`][rc] is a reference counted pointer. In other words, this lets us have multiple "owning" |
| 67 | +pointers to the same data, and the data will be dropped (destructors will be run) when all pointers |
| 68 | +are out of scope. |
| 69 | + |
| 70 | +Internally, it contains a shared “reference count” (also called “refcount”), |
| 71 | +which is incremented each time the `Rc` is cloned, and decremented each time one of the `Rc`s goes |
| 72 | +out of scope. The main responsibility of `Rc<T>` is to ensure that destructors are called for shared |
| 73 | +data. |
| 74 | + |
| 75 | +The internal data here is immutable, and if a cycle of references is created, the data will be |
| 76 | +leaked. If we want data that doesn't leak when there are cycles, we need a garbage collector. |
| 77 | + |
| 78 | +#### Guarantees |
| 79 | + |
| 80 | +The main guarantee provided here is that the data will not be destroyed until all references to it |
| 81 | +are out of scope. |
| 82 | + |
| 83 | +This should be used when we wish to dynamically allocate and share some data (read-only) between |
| 84 | +various portions of yur program, where it is not certain which portion will finish using the pointer |
| 85 | +last. It's a viable alternative to `&T` when `&T` is either impossible to statically check for |
| 86 | +correctness, or creates extremely unergonomic code where the programmer does not wish to spend the |
| 87 | +development cost of working with. |
| 88 | + |
| 89 | +This pointer is _not_ thread safe, and Rust will not let it be sent or shared with other threads. |
| 90 | +This lets one avoid the cost of atomics in situations where they are unnecessary. |
| 91 | + |
| 92 | +There is a sister smart pointer to this one, `Weak<T>`. This is a non-owning, but also non-borrowed, |
| 93 | +smart pointer. It is also similar to `&T`, but it is not restricted in lifetime—a `Weak<T>` |
| 94 | +can be held on to forever. However, it is possible that an attempt to access the inner data may fail |
| 95 | +and return `None`, since this can outlive the owned `Rc`s. This is useful for cyclic |
| 96 | +data structures and other things. |
| 97 | + |
| 98 | +#### Cost |
| 99 | + |
| 100 | +As far as memory goes, `Rc<T>` is a single allocation, though it will allocate two extra words (i.e. |
| 101 | +two `usize` values) as compared to a regular `Box<T>` (for "strong" and "weak" refcounts). |
| 102 | + |
| 103 | +`Rc<T>` has the computational cost of incrementing/decrementing the refcount whenever it is cloned |
| 104 | +or goes out of scope respectively. Note that a clone will not do a deep copy, rather it will simply |
| 105 | +increment the inner reference count and return a copy of the `Rc<T>`. |
| 106 | + |
| 107 | +[rc]: ../std/rc/struct.Rc.html |
| 108 | + |
| 109 | +# Cell types |
| 110 | + |
| 111 | +&lquo;Cell&rquo;s provide interior mutability. In other words, they contain data which can be manipulated even |
| 112 | +if the type cannot be obtained in a mutable form (for example, when it is behind an `&`-ptr or |
| 113 | +`Rc<T>`). |
| 114 | + |
| 115 | +[The documentation for the `cell` module has a pretty good explanation for these][cell-mod]. |
| 116 | + |
| 117 | +These types are _generally_ found in struct fields, but they may be found elsewhere too. |
| 118 | + |
| 119 | +## `Cell<T>` |
| 120 | + |
| 121 | +[`Cell<T>`][cell] is a type that provides zero-cost interior mutability, but only for `Copy` types. |
| 122 | +Since the compiler knows that all the data owned by the contained value is on the stack, there's |
| 123 | +no worry of leaking any data behind references (or worse!) by simply replacing the data. |
| 124 | + |
| 125 | +It is still possible to violate your own invariants using this wrapper, so be careful when using it. |
| 126 | +If a field is wrapped in `Cell`, it's a nice indicator that the chunk of data is mutable and may not |
| 127 | +stay the same between the time you first read it and when you intend to use it. |
| 128 | + |
| 129 | +```rust |
| 130 | +# use std::cell::Cell; |
| 131 | +let x = Cell::new(1); |
| 132 | +let y = &x; |
| 133 | +let z = &x; |
| 134 | +x.set(2); |
| 135 | +y.set(3); |
| 136 | +z.set(4); |
| 137 | +println!("{}", x.get()); |
| 138 | +``` |
| 139 | + |
| 140 | +Note that here we were able to mutate the same value from various immutable references. |
| 141 | + |
| 142 | +This has the same runtime cost as the following: |
| 143 | + |
| 144 | +```rust,ignore |
| 145 | +let mut x = 1; |
| 146 | +let y = &mut x; |
| 147 | +let z = &mut x; |
| 148 | +x = 2; |
| 149 | +*y = 3; |
| 150 | +*z = 4; |
| 151 | +println!("{}", x); |
| 152 | +``` |
| 153 | + |
| 154 | +but it has the added benefit of actually compiling successfully. |
| 155 | + |
| 156 | +#### Guarantees |
| 157 | + |
| 158 | +This relaxes the “no aliasing with mutability” restriction in places where it's |
| 159 | +unnecessary. However, this also relaxes the guarantees that the restriction provides; so if your |
| 160 | +invariants depend on data stored within `Cell`, you should be careful. |
| 161 | + |
| 162 | +This is useful for mutating primitives and other `Copy` types when there is no easy way of |
| 163 | +doing it in line with the static rules of `&` and `&mut`. |
| 164 | + |
| 165 | +`Cell` does not let you obtain interior references to the data, which makes it safe to freely |
| 166 | +mutate. |
| 167 | + |
| 168 | +#### Cost |
| 169 | + |
| 170 | +There is no runtime cost to using `Cell<T>`, however if you are using it to wrap larger (`Copy`) |
| 171 | +structs, it might be worthwhile to instead wrap individual fields in `Cell<T>` since each write is |
| 172 | +otherwise a full copy of the struct. |
| 173 | + |
| 174 | + |
| 175 | +## `RefCell<T>` |
| 176 | + |
| 177 | +[`RefCell<T>`][refcell] also provides interior mutability, but isn't restricted to `Copy` types. |
| 178 | + |
| 179 | +Instead, it has a runtime cost. `RefCell<T>` enforces the read-write lock pattern at runtime (it's |
| 180 | +like a single-threaded mutex), unlike `&T`/`&mut T` which do so at compile time. This is done by the |
| 181 | +`borrow()` and `borrow_mut()` functions, which modify an internal reference count and return smart |
| 182 | +pointers which can be dereferenced immutably and mutably respectively. The refcount is restored when |
| 183 | +the smart pointers go out of scope. With this system, we can dynamically ensure that there are never |
| 184 | +any other borrows active when a mutable borrow is active. If the programmer attempts to make such a |
| 185 | +borrow, the thread will panic. |
| 186 | + |
| 187 | +```rust |
| 188 | +# use std::cell::RefCell; |
| 189 | +let x = RefCell::new(vec![1,2,3,4]); |
| 190 | +{ |
| 191 | + println!("{:?}", *x.borrow()) |
| 192 | +} |
| 193 | + |
| 194 | +{ |
| 195 | + let mut my_ref = x.borrow_mut(); |
| 196 | + my_ref.push(1); |
| 197 | +} |
| 198 | +``` |
| 199 | + |
| 200 | +Similar to `Cell`, this is mainly useful for situations where it's hard or impossible to satisfy the |
| 201 | +borrow checker. Generally we know that such mutations won't happen in a nested form, but it's good |
| 202 | +to check. |
| 203 | + |
| 204 | +For large, complicated programs, it becomes useful to put some things in `RefCell`s to make things |
| 205 | +simpler. For example, a lot of the maps in [the `ctxt` struct][ctxt] in the rust compiler internals |
| 206 | +are inside this wrapper. These are only modified once (during creation, which is not right after |
| 207 | +initialization) or a couple of times in well-separated places. However, since this struct is |
| 208 | +pervasively used everywhere, juggling mutable and immutable pointers would be hard (perhaps |
| 209 | +impossible) and probably form a soup of `&`-ptrs which would be hard to extend. On the other hand, |
| 210 | +the `RefCell` provides a cheap (not zero-cost) way of safely accessing these. In the future, if |
| 211 | +someone adds some code that attempts to modify the cell when it's already borrowed, it will cause a |
| 212 | +(usually deterministic) panic which can be traced back to the offending borrow. |
| 213 | + |
| 214 | +Similarly, in Servo's DOM there is a lot of mutation, most of which is local to a DOM type, but some |
| 215 | +of which crisscrosses the DOM and modifies various things. Using `RefCell` and `Cell` to guard all |
| 216 | +mutation lets us avoid worrying about mutability everywhere, and it simultaneously highlights the |
| 217 | +places where mutation is _actually_ happening. |
| 218 | + |
| 219 | +Note that `RefCell` should be avoided if a mostly simple solution is possible with `&` pointers. |
| 220 | + |
| 221 | +#### Guarantees |
| 222 | + |
| 223 | +`RefCell` relaxes the _static_ restrictions preventing aliased mutation, and replaces them with |
| 224 | +_dynamic_ ones. As such the guarantees have not changed. |
| 225 | + |
| 226 | +#### Cost |
| 227 | + |
| 228 | +`RefCell` does not allocate, but it contains an additional "borrow state" |
| 229 | +indicator (one word in size) along with the data. |
| 230 | + |
| 231 | +At runtime each borrow causes a modification/check of the refcount. |
| 232 | + |
| 233 | +[cell-mod]: ../std/cell/ |
| 234 | +[cell]: ../std/cell/struct.Cell.html |
| 235 | +[refcell]: ../std/cell/struct.RefCell.html |
| 236 | +[ctxt]: ../rustc/middle/ty/struct.ctxt.html |
| 237 | + |
| 238 | +# Synchronous types |
| 239 | + |
| 240 | +Many of the types above cannot be used in a threadsafe manner. Particularly, `Rc<T>` and |
| 241 | +`RefCell<T>`, which both use non-atomic reference counts (_atomic_ reference counts are those which |
| 242 | +can be incremented from multiple threads without causing a data race), cannot be used this way. This |
| 243 | +makes them cheaper to use, but we need thread safe versions of these too. They exist, in the form of |
| 244 | +`Arc<T>` and `Mutex<T>`/`RWLock<T>` |
| 245 | + |
| 246 | +Note that the non-threadsafe types _cannot_ be sent between threads, and this is checked at compile |
| 247 | +time. |
| 248 | + |
| 249 | +There are many useful wrappers for concurrent programming in the [sync][sync] module, but only the |
| 250 | +major ones will be covered below. |
| 251 | + |
| 252 | +[sync]: ../std/sync/index.html |
| 253 | + |
| 254 | +## `Arc<T>` |
| 255 | + |
| 256 | +[`Arc<T>`][arc] is just a version of `Rc<T>` that uses an atomic reference count (hence, "Arc"). |
| 257 | +This can be sent freely between threads. |
| 258 | + |
| 259 | +C++'s `shared_ptr` is similar to `Arc`, however in the case of C++ the inner data is always mutable. |
| 260 | +For semantics similar to that from C++, we should use `Arc<Mutex<T>>`, `Arc<RwLock<T>>`, or |
| 261 | +`Arc<UnsafeCell<T>>`[^4] (`UnsafeCell<T>` is a cell type that can be used to hold any data and has |
| 262 | +no runtime cost, but accessing it requires `unsafe` blocks). The last one should only be used if we |
| 263 | +are certain that the usage won't cause any memory unsafety. Remember that writing to a struct is not |
| 264 | +an atomic operation, and many functions like `vec.push()` can reallocate internally and cause unsafe |
| 265 | +behavior, so even monotonicity may not be enough to justify `UnsafeCell`. |
| 266 | + |
| 267 | +[^4]: `Arc<UnsafeCell<T>>` actually won't compile since `UnsafeCell<T>` isn't `Send` or `Sync`, but we can wrap it in a type and implement `Send`/`Sync` for it manually to get `Arc<Wrapper<T>>` where `Wrapper` is `struct Wrapper<T>(UnsafeCell<T>)`. |
| 268 | + |
| 269 | +#### Guarantees |
| 270 | + |
| 271 | +Like `Rc`, this provides the (thread safe) guarantee that the destructor for the internal data will |
| 272 | +be run when the last `Arc` goes out of scope (barring any cycles). |
| 273 | + |
| 274 | +#### Cost |
| 275 | + |
| 276 | +This has the added cost of using atomics for changing the refcount (which will happen whenever it is |
| 277 | +cloned or goes out of scope). When sharing data from an `Arc` in a single thread, it is preferable |
| 278 | +to share `&` pointers whenever possible. |
| 279 | + |
| 280 | +[arc]: ../std/sync/struct.Arc.html |
| 281 | + |
| 282 | +## `Mutex<T>` and `RwLock<T>` |
| 283 | + |
| 284 | +[`Mutex<T>`][mutex] and [`RwLock<T>`][rwlock] provide mutual-exclusion via RAII guards (guards are |
| 285 | +objects which maintain some state, like a lock, until their destructor is called). For both of |
| 286 | +these, the mutex is opaque until we call `lock()` on it, at which point the thread will block |
| 287 | +until a lock can be acquired, and then a guard will be returned. This guard can be used to access |
| 288 | +the inner data (mutably), and the lock will be released when the guard goes out of scope. |
| 289 | + |
| 290 | +```rust,ignore |
| 291 | +{ |
| 292 | + let guard = mutex.lock(); |
| 293 | + // guard dereferences mutably to the inner type |
| 294 | + *guard += 1; |
| 295 | +} // lock released when destructor runs |
| 296 | +``` |
| 297 | + |
| 298 | + |
| 299 | +`RwLock` has the added benefit of being efficient for multiple reads. It is always safe to have |
| 300 | +multiple readers to shared data as long as there are no writers; and `RwLock` lets readers acquire a |
| 301 | +"read lock". Such locks can be acquired concurrently and are kept track of via a reference count. |
| 302 | +Writers must obtain a "write lock" which can only be obtained when all readers have gone out of |
| 303 | +scope. |
| 304 | + |
| 305 | +#### Guarantees |
| 306 | + |
| 307 | +Both of these provide safe shared mutability across threads, however they are prone to deadlocks. |
| 308 | +Some level of additional protocol safety can be obtained via the type system. |
| 309 | +#### Costs |
| 310 | + |
| 311 | +These use internal atomic-like types to maintain the locks, which are pretty costly (they can block |
| 312 | +all memory reads across processors till they're done). Waiting on these locks can also be slow when |
| 313 | +there's a lot of concurrent access happening. |
| 314 | + |
| 315 | +[rwlock]: ../std/sync/struct.RwLock.html |
| 316 | +[mutex]: ../std/sync/struct.Mutex.html |
| 317 | +[sessions]: https://github.com/Munksgaard/rust-sessions |
| 318 | + |
| 319 | +# Composition |
| 320 | + |
| 321 | +A common gripe when reading Rust code is with types like `Rc<RefCell<Vec<T>>>` (or even more more |
| 322 | +complicated compositions of such types). It's not always clear what the composition does, or why the |
| 323 | +author chose one like this (and when one should be using such a composition in one's own code) |
| 324 | + |
| 325 | +Usually, it's a case of composing together the guarantees that you need, without paying for stuff |
| 326 | +that is unnecessary. |
| 327 | + |
| 328 | +For example, `Rc<RefCell<T>>` is one such composition. `Rc<T>` itself can't be dereferenced mutably; |
| 329 | +because `Rc<T>` provides sharing and shared mutability can lead to unsafe behavior, so we put |
| 330 | +`RefCell<T>` inside to get dynamically verified shared mutability. Now we have shared mutable data, |
| 331 | +but it's shared in a way that there can only be one mutator (and no readers) or multiple readers. |
| 332 | + |
| 333 | +Now, we can take this a step further, and have `Rc<RefCell<Vec<T>>>` or `Rc<Vec<RefCell<T>>>`. These |
| 334 | +are both shareable, mutable vectors, but they're not the same. |
| 335 | + |
| 336 | +With the former, the `RefCell<T>` is wrapping the `Vec<T>`, so the `Vec<T>` in its entirety is |
| 337 | +mutable. At the same time, there can only be one mutable borrow of the whole `Vec` at a given time. |
| 338 | +This means that your code cannot simultaneously work on different elements of the vector from |
| 339 | +different `Rc` handles. However, we are able to push and pop from the `Vec<T>` at will. This is |
| 340 | +similar to an `&mut Vec<T>` with the borrow checking done at runtime. |
| 341 | + |
| 342 | +With the latter, the borrowing is of individual elements, but the overall vector is immutable. Thus, |
| 343 | +we can independently borrow separate elements, but we cannot push or pop from the vector. This is |
| 344 | +similar to an `&mut [T]`[^3], but, again, the borrow checking is at runtime. |
| 345 | + |
| 346 | +In concurrent programs, we have a similar situation with `Arc<Mutex<T>>`, which provides shared |
| 347 | +mutability and ownership. |
| 348 | + |
| 349 | +When reading code that uses these, go in step by step and look at the guarantees/costs provided. |
| 350 | + |
| 351 | +When choosing a composed type, we must do the reverse; figure out which guarantees we want, and at |
| 352 | +which point of the composition we need them. For example, if there is a choice between |
| 353 | +`Vec<RefCell<T>>` and `RefCell<Vec<T>>`, we should figure out the tradeoffs as done above and pick |
| 354 | +one. |
| 355 | + |
| 356 | +[^3]: `&[T]` and `&mut [T]` are _slices_; they consist of a pointer and a length and can refer to a portion of a vector or array. `&mut [T]` can have its elements mutated, however its length cannot be touched. |
0 commit comments