| 1 | +% A 30-minute Introduction to Rust |
| 2 | + |
| 3 | +Rust is a systems programming language that focuses on strong compile-time correctness guarantees. |
| 4 | +It improves upon the ideas of other systems languages like C++, D, |
| 5 | +and Cyclone by providing very strong guarantees and explicit control over the life cycle of memory. |
| 6 | +Strong memory guarantees make writing correct concurrent Rust code easier than in other languages. |
| 7 | +This might sound very complex, but it's easier than it sounds! |
| 8 | +This tutorial will give you an idea of what Rust is like in about thirty minutes. |
| 9 | +It expects that you're at least vaguely familiar with a previous 'curly brace' language. |
| 10 | +The concepts are more important than the syntax, |
| 11 | +so don't worry if you don't get every last detail: |
| 12 | +the [tutorial](http://static.rust-lang.org/doc/master/tutorial.html) can help you out with that later. |
| 13 | + |
| 14 | +Let's talk about the most important concept in Rust, "ownership," |
| 15 | +and its implications for a task that programmers usually find very difficult: concurrency. |
| 16 | + |
| 17 | +## Ownership |
| 18 | + |
| 19 | +Ownership is central to Rust, |
| 20 | +and is one of its more interesting and unique features. |
| 21 | +"Ownership" refers to which parts of your code are allowed to modify various parts of memory. |
| 22 | +Let's start by looking at some C++ code: |
| 23 | + |
| 24 | +``` |
| 25 | +int *dangling(void) |
| 26 | +{ |
| 27 | + int i = 1234; |
| 28 | + return &i; |
| 29 | +} |
| 30 | +
|
| 31 | +int add_one(void) |
| 32 | +{ |
| 33 | + int *num = dangling(); |
| 34 | + return *num + 1; |
| 35 | +} |
| 36 | +``` |
| 37 | + |
| 38 | +The `dangling` function allocates an integer on the stack, |
| 39 | +and stores it in a variable, `i`. |
| 40 | +It then returns a pointer to the variable `i`. |
| 41 | +There's just one problem: |
| 42 | +stack memory becomes invalid when the function returns. |
| 43 | +This means that in the second line of `add_one`, |
| 44 | +`num` points to some garbage values, |
| 45 | +and we won't get the effect that we want. |
| 46 | +While this is a trivial example, |
| 47 | +it can happen quite often in C++ code. |
| 48 | +There's a similar problem when memory on the heap is allocated with `malloc` (or `new`), |
| 49 | +then freed with `free` (or `delete`), |
| 50 | +yet your code attempts to do something with the pointer to that memory. |
| 51 | +More modern C++ uses RAII with constructors/destructors, |
| 52 | +but it amounts to the same thing. |
| 53 | +This problem is called a 'dangling pointer,' |
| 54 | +and it's not possible to write Rust code that has it. |
| 55 | +Let's try: |
| 56 | + |
| 57 | +``` |
| 58 | +fn dangling() -> &int { |
| 59 | + let i = 1234; |
| 60 | + return &i; |
| 61 | +} |
| 62 | +
|
| 63 | +fn add_one() -> int { |
| 64 | + let num = dangling(); |
| 65 | + return *num + 1; |
| 66 | +} |
| 67 | +``` |
| 68 | + |
| 69 | +When you try to compile this program, you'll get an interesting (and long) error message: |
| 70 | + |
| 71 | +``` |
| 72 | +temp.rs:3:11: 3:13 error: borrowed value does not live long enough |
| 73 | +temp.rs:3 return &i; |
| 74 | +
|
| 75 | +temp.rs:1:22: 4:1 note: borrowed pointer must be valid for the anonymous lifetime #1 defined on the block at 1:22... |
| 76 | +temp.rs:1 fn dangling() -> &int { |
| 77 | +temp.rs:2 let i = 1234; |
| 78 | +temp.rs:3 return &i; |
| 79 | +temp.rs:4 } |
| 80 | + |
| 81 | +temp.rs:1:22: 4:1 note: ...but borrowed value is only valid for the block at 1:22 |
| 82 | +temp.rs:1 fn dangling() -> &int { |
| 83 | +temp.rs:2 let i = 1234; |
| 84 | +temp.rs:3 return &i; |
| 85 | +temp.rs:4 } |
| 86 | +error: aborting due to previous error |
| 87 | +``` |
| 88 | + |
| 89 | +In order to fully understand this error message, |
| 90 | +we need to talk about what it means to "own" something. |
| 91 | +So for now, |
| 92 | +let's just accept that Rust will not allow us to write code with a dangling pointer, |
| 93 | +and we'll come back to this code once we understand ownership. |
| 94 | + |
| 95 | +Let's forget about programming for a second and talk about books. |
| 96 | +I like to read physical books, |
| 97 | +and sometimes I really like one and tell my friends they should read it. |
| 98 | +While I'm reading my book, I own it: the book is in my possession. |
| 99 | +When I loan the book out to someone else for a while, they "borrow" it from me. |
| 100 | +And when you borrow a book, it's yours for a certain period of time, |
| 101 | +and then you give it back to me, and I own it again. Right? |
| 102 | + |
| 103 | +This concept applies directly to Rust code as well: |
| 104 | +some code "owns" a particular pointer to memory. |
| 105 | +It's the sole owner of that pointer. |
| 106 | +It can also lend that memory out to some other code for a while: |
| 107 | +the code "borrows" it. |
| 108 | +It borrows it for a certain period of time, called a "lifetime." |
| 109 | + |
| 110 | +That's all there is to it. |
| 111 | +That doesn't seem so hard, right? |
| 112 | +Let's go back to that error message: |
| 113 | +`error: borrowed value does not live long enough`. |
| 114 | +We tried to loan out a particular variable, `i`, |
| 115 | +using Rust's borrowed pointers: the `&`. |
| 116 | +But Rust knew that the variable would be invalid after the function returns, |
| 117 | +and so it tells us that: |
| 118 | +`borrowed pointer must be valid for the anonymous lifetime #1... but borrowed value is only valid for the block`. |
| 119 | +Neat! |
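For contrast, here is a minimal sketch of a borrow that the compiler accepts, because the owner outlives the loan. It is written in present-day Rust syntax (`i32` rather than the `int` used in this tutorial), and the function name `add_one_to` is made up for illustration.

```rust
// Borrow a value owned by the caller: the loan ends before the owner goes away.
fn add_one_to(num: &i32) -> i32 {
    *num + 1
}

fn main() {
    let i = 1234; // `main` owns `i`
    let result = add_one_to(&i); // lend `i` out for the duration of the call
    println!("{}", result);
}
```

The borrow lasts only as long as the call, while `i` lives until the end of `main`, so the borrowed value lives long enough.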
| 120 | + |
| 121 | +That's a great example for stack memory, |
| 122 | +but what about heap memory? |
| 123 | +Rust has a second kind of pointer, |
| 124 | +a 'unique' pointer, |
| 125 | +that you can create with a `~`. |
| 126 | +Check it out: |
| 127 | + |
| 128 | +``` |
| 129 | +fn dangling() -> ~int { |
| 130 | + let i = ~1234; |
| 131 | + return i; |
| 132 | +} |
| 133 | +
|
| 134 | +fn add_one() -> int { |
| 135 | + let num = dangling(); |
| 136 | + return *num + 1; |
| 137 | +} |
| 138 | +``` |
| 139 | + |
| 140 | +This code will successfully compile. |
| 141 | +Note that instead of a stack-allocated `1234`, |
| 142 | +we use an owned pointer to that value: `~1234`. |
| 143 | +You can roughly compare these two lines: |
| 144 | + |
| 145 | +``` |
| 146 | +// rust |
| 147 | +let i = ~1234; |
| 148 | +
|
| 149 | +// C++ |
| 150 | +int *i = new int; |
| 151 | +*i = 1234; |
| 152 | +``` |
| 153 | + |
| 154 | +Rust is able to infer the size of the type, |
| 155 | +then allocates the correct amount of memory and sets it to the value you asked for. |
| 156 | +This means that it's impossible to allocate uninitialized memory, |
| 157 | +and because Rust does not have the concept of null, the pointer always refers to a valid value. |
| 158 | +Hooray! |
| 159 | +There's one other difference between this line of Rust and the C++: |
| 160 | +The Rust compiler also figures out the lifetime of `i`, |
| 161 | +and then inserts a corresponding `free` call once it goes out of scope, |
| 162 | +like a destructor in C++. |
| 163 | +You get all of the benefits of manually allocated heap memory without having to do all the bookkeeping yourself. |
| 164 | +Furthermore, all of this checking is done at compile time, |
| 165 | +so there's no runtime overhead. |
| 166 | +You'll get (basically) the exact same code that you'd get if you wrote the correct C++, |
| 167 | +but it's impossible to write the incorrect version, thanks to the compiler. |
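The automatic `free` can be observed with a small sketch. This uses present-day Rust, where the `~int` owned pointer is spelled `Box<i32>`. The `Noisy` type and `make_boxed` function are hypothetical, added only so that the moment of deallocation prints a message.

```rust
// Hypothetical wrapper type whose destructor announces itself.
struct Noisy(i32);

impl Drop for Noisy {
    fn drop(&mut self) {
        println!("freeing {}", self.0);
    }
}

// Allocate on the heap and hand ownership to the caller.
fn make_boxed() -> Box<Noisy> {
    let boxed = Box::new(Noisy(1234));
    boxed
}

fn add_one() -> i32 {
    let num = make_boxed(); // `add_one` now owns the heap allocation
    num.0 + 1
    // `num` goes out of scope here: the compiler-inserted drop runs
    // and the heap memory is released.
}

fn main() {
    let result = add_one(); // prints "freeing 1234" before returning
    println!("{}", result); // prints "1235"
}
```

Nothing in `add_one` mentions deallocation, yet the message appears exactly once, at the point where the owner's scope ends.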
| 168 | + |
| 169 | +You've seen one way that ownership and lifetimes are useful to prevent code that would normally be dangerous in a less-strict language, |
| 170 | +but let's talk about another: concurrency. |
| 171 | + |
| 172 | +## Concurrency |
| 173 | + |
| 174 | +Concurrency is an incredibly hot topic in the software world right now. |
| 175 | +It's always been an interesting area of study for computer scientists, |
| 176 | +but as usage of the Internet explodes, |
| 177 | +people are looking to improve the number of users a given service can handle. |
| 178 | +Concurrency is one way of achieving this goal. |
| 179 | +There is a pretty big drawback to concurrent code, though: |
| 180 | +it can be hard to reason about, |
| 181 | +because it is non-deterministic. |
| 182 | +There are a few different approaches to writing good concurrent code, |
| 183 | +but let's talk about how Rust's notions of ownership and lifetimes can assist with achieving correct but concurrent code. |
| 184 | + |
| 185 | +First, let's go over a simple concurrency example in Rust. |
| 186 | +Rust allows you to spin up 'tasks,' |
| 187 | +which are lightweight, 'green' threads. |
| 188 | +These tasks do not have any shared memory, |
| 189 | +so we communicate between tasks with a 'channel'. |
| 190 | +Like this: |
| 191 | + |
| 192 | +``` |
| 193 | +fn main() { |
| 194 | + let numbers = [1,2,3]; |
| 195 | +
|
| 196 | + let (port, chan) = Chan::new(); |
| 197 | + chan.send(numbers); |
| 198 | +
|
| 199 | + do spawn { |
| 200 | + let numbers = port.recv(); |
| 201 | + println!("{:d}", numbers[0]); |
| 202 | + } |
| 203 | +} |
| 204 | +``` |
| 205 | + |
| 206 | +In this example, we create a vector of numbers. |
| 207 | +We then make a new `Chan`, |
| 208 | +which is the name of the type Rust implements channels with. |
| 209 | +This returns two different ends of the channel: |
| 210 | +a channel and a port. |
| 211 | +You send data into the channel end, and it comes out the port end. |
| 212 | +The `spawn` function spins up a new task. |
| 213 | +As you can see in the code, |
| 214 | +we call `port.recv()` (short for 'receive') inside of the new task, |
| 215 | +and we call `chan.send()` outside, |
| 216 | +passing in our vector. |
| 217 | +We then print the first element of the vector. |
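For readers on a current toolchain, the same shape can be sketched with today's standard library, under a few assumptions: the channel comes from `std::sync::mpsc`, `thread::spawn` starts an OS thread rather than a green task, and the helper name `first_via_channel` is invented for this illustration.

```rust
use std::sync::mpsc;
use std::thread;

// Send a vector down a channel and read its first element in another thread.
fn first_via_channel(numbers: Vec<i32>) -> i32 {
    // `sender` plays the role of `chan`, `receiver` the role of `port`.
    let (sender, receiver) = mpsc::channel();
    sender.send(numbers).unwrap(); // hand the vector to the channel

    let handle = thread::spawn(move || {
        let numbers: Vec<i32> = receiver.recv().unwrap();
        numbers[0]
    });

    // Wait for the spawned thread and take its return value.
    handle.join().unwrap()
}

fn main() {
    println!("{}", first_via_channel(vec![1, 2, 3]));
}
```

As in the example above, the sending side and the receiving side never touch the same memory at the same time; the data only travels through the channel.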
| 218 | + |
| 219 | +This works out because Rust copies the vector when it is sent through the channel. |
| 220 | +That way, if it were mutable, there wouldn't be a race condition. |
| 221 | +However, if we're making a lot of tasks, or if our data is very large, |
| 222 | +making a copy for each task inflates our memory usage with no real benefit. |
| 223 | + |
| 224 | +Enter Arc. |
| 225 | +Arc stands for 'atomically reference counted,' |
| 226 | +and it's a way to share immutable data between multiple tasks. |
| 227 | +Here's some code: |
| 228 | + |
| 229 | +``` |
| 230 | +extern mod extra; |
| 231 | +use extra::arc::Arc; |
| 232 | +
|
| 233 | +fn main() { |
| 234 | + let numbers = [1,2,3]; |
| 235 | +
|
| 236 | + let numbers_arc = Arc::new(numbers); |
| 237 | +
|
| 238 | + for num in range(0, 3) { |
| 239 | + let (port, chan) = Chan::new(); |
| 240 | + chan.send(numbers_arc.clone()); |
| 241 | +
|
| 242 | + do spawn { |
| 243 | + let local_arc = port.recv(); |
| 244 | + let task_numbers = local_arc.get(); |
| 245 | + println!("{:d}", task_numbers[num]); |
| 246 | + } |
| 247 | + } |
| 248 | +} |
| 249 | +``` |
| 250 | + |
| 251 | +This is very similar to the code we had before, |
| 252 | +except now we loop three times, |
| 253 | +making three tasks, |
| 254 | +and sending an `Arc` between them. |
| 255 | +`Arc::new` creates a new Arc, |
| 256 | +`.clone()` makes a new reference to that Arc, |
| 257 | +and `.get()` gets the value out of the Arc. |
| 258 | +So we make a new reference for each task, |
| 259 | +send that reference down the channel, |
| 260 | +and then use the reference to print out a number. |
| 261 | +Now we're not copying our vector. |
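A rough present-day equivalent, assuming `std::sync::Arc` and `std::thread` in place of `extra::arc::Arc` and `do spawn`, looks like the sketch below. The function name `read_shared` is hypothetical, and the closure captures the cloned `Arc` directly instead of receiving it over a channel.

```rust
use std::sync::Arc;
use std::thread;

// Each thread reads one element through its own reference to the shared vector.
fn read_shared(numbers: Vec<i32>) -> Vec<i32> {
    let numbers_arc = Arc::new(numbers);
    let mut handles = Vec::new();

    for index in 0..numbers_arc.len() {
        // A new reference to the same data; the vector itself is not copied.
        let local_arc = Arc::clone(&numbers_arc);
        handles.push(thread::spawn(move || local_arc[index]));
    }

    // Collect results in spawn order.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    println!("{:?}", read_shared(vec![1, 2, 3]));
}
```

Cloning the `Arc` only bumps a reference count, which is what keeps the per-thread cost small even when the shared data is large.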
| 262 | + |
| 263 | +Arcs are great for immutable data, |
| 264 | +but what about mutable data? |
| 265 | +Shared mutable state is the bane of the concurrent programmer. |
| 266 | +You can use a mutex to protect shared mutable state, |
| 267 | +but if you forget to acquire the mutex, bad things can happen. |
| 268 | + |
| 269 | +Rust provides a tool for shared mutable state: `RWArc`. |
| 270 | +This variant of an Arc allows the contents of the Arc to be mutated. |
| 271 | +Check it out: |
| 272 | + |
| 273 | +``` |
| 274 | +extern mod extra; |
| 275 | +use extra::arc::RWArc; |
| 276 | +
|
| 277 | +fn main() { |
| 278 | + let numbers = [1,2,3]; |
| 279 | +
|
| 280 | + let numbers_arc = RWArc::new(numbers); |
| 281 | +
|
| 282 | + for num in range(0, 3) { |
| 283 | + let (port, chan) = Chan::new(); |
| 284 | + chan.send(numbers_arc.clone()); |
| 285 | +
|
| 286 | + do spawn { |
| 287 | + let local_arc = port.recv(); |
| 288 | +
|
| 289 | + local_arc.write(|nums| { |
| 290 | + nums[num] += 1 |
| 291 | + }); |
| 292 | +
|
| 293 | + local_arc.read(|nums| { |
| 294 | + println!("{:d}", nums[num]); |
| 295 | + }) |
| 296 | + } |
| 297 | + } |
| 298 | +} |
| 299 | +``` |
| 300 | + |
| 301 | +We now use the `RWArc` type to get a read/write Arc. |
| 302 | +The read/write Arc has a slightly different API than `Arc`: |
| 303 | +`read` and `write` allow you to, well, read and write the data. |
| 304 | +They both take closures as arguments, |
| 305 | +and the read/write Arc will, in the case of write, |
| 306 | +acquire a mutex, |
| 307 | +and then pass the data to this closure. |
| 308 | +After the closure does its thing, the mutex is released. |
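The same acquire, mutate, release cycle can be sketched in present-day Rust, assuming an `Arc<RwLock<...>>` as the closest standard-library counterpart to `RWArc`. Instead of taking closures, `write()` and `read()` return guard values that release the lock when they go out of scope. The name `increment_all` is made up for this example.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

// Each thread takes the write lock and increments one element.
fn increment_all(numbers: Vec<i32>) -> Vec<i32> {
    let shared = Arc::new(RwLock::new(numbers));
    let len = shared.read().unwrap().len();

    let handles: Vec<_> = (0..len)
        .map(|index| {
            let local = Arc::clone(&shared);
            thread::spawn(move || {
                // The data is only reachable through the guard,
                // so it cannot be mutated without holding the lock.
                let mut nums = local.write().unwrap();
                nums[index] += 1;
                // The guard is dropped here, releasing the lock.
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // Take a read lock to copy the final contents out.
    let result: Vec<i32> = shared.read().unwrap().clone();
    result
}

fn main() {
    println!("{:?}", increment_all(vec![1, 2, 3]));
}
```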
| 309 | + |
| 310 | +You can see how this makes it impossible to mutate the state without remembering to acquire the lock. |
| 311 | +We gain the efficiency of shared mutable state, |
| 312 | +while retaining the safety of disallowing shared mutable state. |
| 313 | + |
| 314 | +But wait, how is that possible? |
| 315 | +We can't both allow and disallow mutable state. |
| 316 | +What gives? |
| 317 | + |
| 318 | +## A footnote: unsafe |
| 319 | + |
| 320 | +So, the Rust language does not allow for shared mutable state, |
| 321 | +yet I just showed you some code that has it. |
| 322 | +How's this possible? The answer: `unsafe`. |
| 323 | + |
| 324 | +You see, while the Rust compiler is very smart, |
| 325 | +and saves you from making mistakes you might normally make, |
| 326 | +it's not an artificial intelligence. |
| 327 | +Because we're smarter than the compiler, |
| 328 | +we sometimes need to override this safe behavior. |
| 329 | +For this purpose, Rust has an `unsafe` keyword. |
| 330 | +Within an `unsafe` block, |
| 331 | +Rust turns off many of its safety checks. |
| 332 | +If something bad happens to your program, |
| 333 | +you only have to audit what you've done inside `unsafe`, |
| 334 | +and not the entire program itself. |
| 335 | + |
| 336 | +If one of the major goals of Rust was safety, |
| 337 | +why allow that safety to be turned off? |
| 338 | +Well, there are really only three main reasons to do it: |
| 339 | +interfacing with external code, |
| 340 | +such as doing FFI into a C library, |
| 341 | +performance (in certain cases), |
| 342 | +and to provide a safe abstraction around operations that normally would not be safe. |
| 343 | +Our Arcs are an example of this last purpose. |
| 344 | +We can safely hand out multiple references to the `Arc`, |
| 345 | +because we are sure the data is immutable, |
| 346 | +and therefore it is safe to share. |
| 347 | +We can hand out multiple references to the `RWArc`, |
| 348 | +because we know that we've wrapped the data in a mutex, |
| 349 | +and therefore it is safe to share. |
| 350 | +But the Rust compiler can't know that we've made these choices, |
| 351 | +so _inside_ the implementation of the Arcs, |
| 352 | +we use `unsafe` blocks to do (normally) dangerous things. |
| 353 | +But we expose a safe interface, |
| 354 | +which means that the Arcs are impossible to use incorrectly. |
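The pattern of a safe interface over an `unsafe` body can be shown at a much smaller scale than an Arc. The sketch below is not how the Arcs are implemented; it is a hypothetical helper, `first_or_none`, written in present-day Rust, that skips a bounds check only after proving the check is unnecessary.

```rust
// Hypothetical helper: returns the first element, or None for an empty slice.
fn first_or_none(values: &[i32]) -> Option<i32> {
    if values.is_empty() {
        None
    } else {
        // SAFETY: the slice was just checked to be non-empty,
        // so index 0 is in bounds.
        Some(unsafe { *values.get_unchecked(0) })
    }
}

fn main() {
    println!("{:?}", first_or_none(&[7, 8, 9]));
    println!("{:?}", first_or_none(&[]));
}
```

Callers never write `unsafe` themselves and cannot trigger the out-of-bounds read, so any audit is confined to the few lines inside the function.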
| 355 | + |
| 356 | +This is how Rust's type system allows you to not make some of the mistakes that make concurrent programming difficult, |
| 357 | +yet get the efficiency of languages such as C++. |
| 358 | + |
| 359 | +## That's all, folks |
| 360 | + |
| 361 | +I hope that this taste of Rust has given you an idea of whether Rust is the right language for you. |
| 362 | +If it is, |
| 363 | +I encourage you to check out [the tutorial](http://static.rust-lang.org/doc/0.9/tutorial.html) for a full, |
| 364 | +in-depth exploration of Rust's syntax and concepts. |