Commit 518a9c0

Merge branch 'master' into linkcheck
2 parents 6431365 + 28b7fab commit 518a9c0

13 files changed

+995
-33
lines changed

src/SUMMARY.md

+7-4
Original file line number | Diff line number | Diff line change
@@ -5,16 +5,19 @@
55
- [Using the compiler testing framework](./running-tests.md)
66
- [Walkthrough: a typical contribution](./walkthrough.md)
77
- [High-level overview of the compiler source](./high-level-overview.md)
8+
- [Queries: demand-driven compilation](./query.md)
9+
- [Incremental compilation](./incremental-compilation.md)
810
- [The parser](./the-parser.md)
911
- [Macro expansion](./macro-expansion.md)
1012
- [Name resolution](./name-resolution.md)
11-
- [HIR lowering](./hir-lowering.md)
13+
- [The HIR (High-level IR)](./hir.md)
1214
- [The `ty` module: representing types](./ty.md)
1315
- [Type inference](./type-inference.md)
1416
- [Trait resolution](./trait-resolution.md)
1517
- [Type checking](./type-checking.md)
16-
- [MIR construction](./mir-construction.md)
17-
- [MIR borrowck](./mir-borrowck.md)
18-
- [MIR optimizations](./mir-optimizations.md)
18+
- [The MIR (Mid-level IR)](./mir.md)
19+
- [MIR construction](./mir-construction.md)
20+
- [MIR borrowck](./mir-borrowck.md)
21+
- [MIR optimizations](./mir-optimizations.md)
1922
- [trans: generating LLVM IR](./trans.md)
2023
- [Glossary](./glossary.md)

src/about-this-guide.md

+7-7
@@ -1,14 +1,14 @@
11
# About this guide
22

3-
This guide is meant to help document how rustc -- the Rust compiler --
3+
This guide is meant to help document how rustc (the Rust compiler)
44
works, as well as to help new contributors get involved in rustc
5-
development. It is not meant to replace code documentation -- each
6-
chapter gives only high-level details, the kinds of things that
5+
development. It is not meant to replace code documentation; each
6+
chapter gives only high-level details, the kinds of things that
77
(ideally) don't change frequently.
88

9-
The guide itself is of course open source as well, and the sources can
10-
be found at [the GitHub repository]. If you find any mistakes in the
11-
guide, please file an issue about it -- or, even better, open a PR
9+
The guide itself is of course open-source as well, and the sources can
10+
be found at the [GitHub repository]. If you find any mistakes in the
11+
guide, please file an issue about it, or even better, open a PR
1212
with a correction!
1313

14-
[the GitHub repository]: https://github.com/rust-lang-nursery/rustc-guide/
14+
[GitHub repository]: https://github.com/rust-lang-nursery/rustc-guide/

src/glossary.md

+10-9
@@ -9,23 +9,24 @@ AST | the abstract syntax tree produced by the syntax crate
99
codegen unit | when we produce LLVM IR, we group the Rust code into a number of codegen units. Each of these units is processed by LLVM independently from one another, enabling parallelism. They are also the unit of incremental re-use.
1010
cx | we tend to use "cx" as an abbreviation for context. See also `tcx`, `infcx`, etc.
1111
DefId | an index identifying a definition (see `librustc/hir/def_id.rs`). Uniquely identifies a `DefPath`.
12-
HIR | the High-level IR, created by lowering and desugaring the AST. See `librustc/hir`.
12+
HIR | the High-level IR, created by lowering and desugaring the AST ([see more](hir.html))
1313
HirId | identifies a particular node in the HIR by combining a def-id with an "intra-definition offset".
14-
'gcx | the lifetime of the global arena (see `librustc/ty`).
14+
'gcx | the lifetime of the global arena ([see more](ty.html))
1515
generics | the set of generic type parameters defined on a type or item
1616
ICE | internal compiler error. When the compiler crashes.
1717
infcx | the inference context (see `librustc/infer`)
18-
MIR | the Mid-level IR that is created after type-checking for use by borrowck and trans. Defined in the `src/librustc/mir/` module, but much of the code that manipulates it is found in `src/librustc_mir`.
19-
obligation | something that must be proven by the trait system; see `librustc/traits`.
18+
MIR | the Mid-level IR that is created after type-checking for use by borrowck and trans ([see more](./mir.html))
19+
obligation | something that must be proven by the trait system ([see more](trait-resolution.html))
2020
local crate | the crate currently being compiled.
2121
node-id or NodeId | an index identifying a particular node in the AST or HIR; gradually being phased out and replaced with `HirId`.
22-
query | perhaps some sub-computation during compilation; see `librustc/maps`.
23-
provider | the function that executes a query; see `librustc/maps`.
22+
query | a sub-computation performed during compilation ([see more](query.html))
23+
provider | the function that executes a query ([see more](query.html))
2424
sess | the compiler session, which stores global data used throughout compilation
2525
side tables | because the AST and HIR are immutable once created, we often carry extra information about them in the form of hashtables, indexed by the id of a particular node.
2626
span | a location in the user's source code, used for error reporting primarily. These are like a file-name/line-number/column tuple on steroids: they carry a start/end point, and also track macro expansions and compiler desugaring. All while being packed into a few bytes (really, it's an index into a table). See the Span datatype for more.
2727
substs | the substitutions for a given generic type or item (e.g., the `i32`, `u32` in `HashMap<i32, u32>`)
28-
tcx | the "typing context", main data structure of the compiler (see `librustc/ty`).
28+
tcx | the "typing context", main data structure of the compiler ([see more](ty.html))
29+
'tcx | the lifetime of the currently active inference context ([see more](ty.html))
2930
trans | the code to translate MIR into LLVM IR.
30-
trait reference | a trait and values for its type parameters (see `librustc/ty`).
31-
ty | the internal representation of a type (see `librustc/ty`).
31+
trait reference | a trait and values for its type parameters ([see more](ty.html)).
32+
ty | the internal representation of a type ([see more](ty.html)).

src/high-level-overview.md

+1-6
@@ -135,9 +135,4 @@ take:
135135
6. **Linking**
136136
- Finally, those `.o` files are linked together.
137137

138-
139-
140-
141-
The first thing you may wonder if
142-
143-
[query model]: query.html
138+
[query model]: query.html

src/hir-lowering.md renamed to src/hir.md

+2-2
@@ -1,4 +1,4 @@
1-
# HIR lowering
1+
# The HIR
22

33
The HIR -- "High-level IR" -- is the primary IR used in most of
44
rustc. It is a desugared version of the "abstract syntax tree" (AST)
@@ -116,4 +116,4 @@ associated with an **owner**, which is typically some kind of item
116116
(e.g., a `fn()` or `const`), but could also be a closure expression
117117
(e.g., `|x, y| x + y`). You can use the HIR map to find the body
118118
associated with a given def-id (`maybe_body_owned_by()`) or to find
119-
the owner of a body (`body_owner_def_id()`).
119+
the owner of a body (`body_owner_def_id()`).

src/how-to-build-and-run.md

+15-2
@@ -1,7 +1,7 @@
11
# How to build the compiler and run what you built
22

33
The compiler is built using a tool called `x.py`. You will need to
4-
have Python installed to run it. But before we get to that, if you're going to
4+
have Python installed to run it. But before we get to that, if you're going to
55
be hacking on rustc, you'll want to tweak the configuration of the compiler. The default
66
configuration is oriented towards running the compiler as a user, not a developer.
77

@@ -48,6 +48,19 @@ use-jemalloc = false
4848

4949
### Running x.py and building a stage1 compiler
5050

51+
One thing to keep in mind is that `rustc` is a _bootstrapping_ compiler. That
52+
is, since `rustc` is written in Rust, we need to use an older version of the
53+
compiler to compile the newer version. In particular, the newer version of the
54+
compiler, `libstd`, and other tooling may use some unstable features
55+
internally. The result is that compiling `rustc` is done in stages.
56+
57+
- Stage 0: the current _beta_ compiler is compiled using the current _stable_ compiler.
58+
- Stage 1: the code in your clone is then compiled with the stage 0 compiler.
59+
- Stage 2: the code in your clone is then compiled with the stage 1 compiler (i.e. it builds itself).
60+
61+
For hacking, often building the stage 1 compiler is enough, but for testing and
62+
release, the stage 2 compiler is used.
63+
5164
Once you've created a config.toml, you are now ready to run
5265
`x.py`. There are a lot of options here, but let's start with what is
5366
probably the best "go to" command for building a local rust:
@@ -117,4 +130,4 @@ Here are a few other useful x.py commands. We'll cover some of them in detail in
117130
- `./x.py build` -- builds the stage2 compiler
118131
- Running tests (see the section [running tests](./running-tests.html) for more details):
119132
- `./x.py test --stage 1 src/libstd` -- runs the `#[test]` tests from libstd
120-
- `./x.py test --stage 1 src/test/run-pass` -- runs the `run-pass` test suite
133+
- `./x.py test --stage 1 src/test/run-pass` -- runs the `run-pass` test suite

src/incremental-compilation.md

+138
@@ -0,0 +1,138 @@
1+
# Incremental compilation
2+
3+
The incremental compilation scheme is, in essence, a surprisingly
4+
simple extension to the overall query system. We'll start by describing
5+
a slightly simplified variant of the real thing – the "basic algorithm" – and then describe
6+
some possible improvements.
7+
8+
## The basic algorithm
9+
10+
The basic algorithm is
11+
called the **red-green** algorithm[^salsa]. The high-level idea is
12+
that, after each run of the compiler, we will save the results of all
13+
the queries that we do, as well as the **query DAG**. The
14+
**query DAG** is a [DAG] that indexes which queries executed which
15+
other queries. So, for example, there would be an edge from a query Q1
16+
to another query Q2 if computing Q1 required computing Q2 (note that
17+
because queries cannot depend on themselves, this results in a DAG and
18+
not a general graph).
19+
20+
[DAG]: https://en.wikipedia.org/wiki/Directed_acyclic_graph
21+
22+
On the next run of the compiler, then, we can sometimes reuse these
23+
query results to avoid re-executing a query. We do this by assigning
24+
every query a **color**:
25+
26+
- If a query is colored **red**, that means that its result during
27+
this compilation has **changed** from the previous compilation.
28+
- If a query is colored **green**, that means that its result is
29+
the **same** as the previous compilation.
30+
31+
There are two key insights here:
32+
33+
- First, if all the inputs to query Q are colored green, then the
34+
query Q **must** result in the same value as last time and hence
35+
need not be re-executed (or else the compiler is not deterministic).
36+
- Second, even if some inputs to a query change, it may be that it
37+
**still** produces the same result as the previous compilation. In
38+
particular, the query may only use part of its input.
39+
- Therefore, after executing a query, we always check whether it
40+
produced the same result as the previous time. **If it did,** we
41+
can still mark the query as green, and hence avoid re-executing
42+
dependent queries.
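
The second insight can be illustrated with a small sketch. Everything here is hypothetical and not taken from the compiler: `fingerprint`, `line_count`, and `color_after_execution` are made-up names, and the standard library's `DefaultHasher` merely stands in for whatever stable hash the real implementation uses. The query only looks at part of its input (the number of lines), so an edit that keeps the line count the same leaves it green.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// The two colors described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Color {
    Red,
    Green,
}

/// Hypothetical helper: fingerprint a query result so that two
/// compilations can be compared without keeping the old value around.
fn fingerprint<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical query that uses only part of its input:
/// it counts lines and ignores what the lines contain.
fn line_count(source: &str) -> usize {
    source.lines().count()
}

/// After re-executing a query, compare the new fingerprint with the
/// saved one: an unchanged result is green, a changed result is red.
fn color_after_execution(previous: u64, current: u64) -> Color {
    if previous == current {
        Color::Green
    } else {
        Color::Red
    }
}
```

With this sketch, renaming a function in the source text changes the input but not the result of `line_count`, so queries that depend only on `line_count` would not need to be re-executed.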
43+
44+
### The try-mark-green algorithm
45+
46+
At the core of incremental compilation is an algorithm called
47+
"try-mark-green". It has the job of determining the color of a given
48+
query Q (which must not have yet been executed). In cases where Q has
49+
red inputs, determining Q's color may involve re-executing Q so that
50+
we can compare its output, but if all of Q's inputs are green, then we
51+
can conclude that Q must be green without re-executing it or inspecting
52+
its value at all. In the compiler, this allows us to avoid
53+
deserializing the result from disk when we don't need it, and in fact
54+
enables us to sometimes skip *serializing* the result as well
55+
(see the refinements section below).
56+
57+
Try-mark-green works as follows:
58+
59+
- First check if the query Q was executed during the previous compilation.
60+
  - If not, we can just re-execute the query as normal, and assign it the
61+
    color of red.
62+
- If yes, then load the 'dependent queries' of Q.
63+
- If there is a saved result, then we load the `reads(Q)` vector from the
64+
  query DAG. The "reads" is the set of queries that Q executed during
65+
  its execution.
66+
  - For each query R in `reads(Q)`, we recursively demand the color
67+
    of R using try-mark-green.
68+
    - Note: it is important that we visit each node in `reads(Q)` in the same order
69+
      as they occurred in the original compilation. See [the section on the query DAG below](#dag).
70+
    - If **any** of the nodes in `reads(Q)` wind up colored **red**, then Q is dirty.
71+
      - We re-execute Q and compare the hash of its result to the hash of the result
72+
        from the previous compilation.
73+
      - If the hash has not changed, we can mark Q as **green** and return. Otherwise, Q is **red**.
74+
    - Otherwise, **all** of the nodes in `reads(Q)` must be **green**. In that case,
75+
      we can color Q as **green** and return.
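
The steps above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the code in `src/librustc/dep_graph`: `Session`, `PrevNode`, `QueryId`, and `demo_session` are hypothetical names, "executing" a query is reduced to looking up the hash its result would have in this compilation, and leaf inputs are assumed to have been colored before try-mark-green is called.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Color {
    Red,
    Green,
}

// Hypothetical stand-in for a real query key.
type QueryId = &'static str;

/// What the previous compilation saved for one query (assumed layout).
struct PrevNode {
    /// Queries read by this query, in the order they were read.
    reads: Vec<QueryId>,
    /// Hash of the result computed last time.
    result_hash: u64,
}

struct Session {
    prev: HashMap<QueryId, PrevNode>,
    /// Colors decided so far; leaf inputs are assumed to be colored up front.
    colors: HashMap<QueryId, Color>,
    /// Stand-in for running a query: the hash its result has this time.
    current_hashes: HashMap<QueryId, u64>,
    /// Log of the queries that actually had to be re-executed.
    executed: Vec<QueryId>,
}

impl Session {
    /// Pretend to run `q`; panics if the demo data has no hash for it.
    fn execute(&mut self, q: QueryId) -> u64 {
        self.executed.push(q);
        self.current_hashes[&q]
    }

    fn try_mark_green(&mut self, q: QueryId) -> Color {
        if let Some(&color) = self.colors.get(&q) {
            return color;
        }
        let saved = self.prev.get(&q).map(|n| (n.reads.clone(), n.result_hash));
        let color = match saved {
            // Not executed last time: run it and color it red.
            None => {
                self.execute(q);
                Color::Red
            }
            Some((reads, old_hash)) => {
                // Walk the reads in their original order and stop at the
                // first red one, since Q may take a different path after it.
                let mut any_red = false;
                for r in reads {
                    if self.try_mark_green(r) == Color::Red {
                        any_red = true;
                        break;
                    }
                }
                // All reads green: Q is green without being executed.
                // Otherwise re-execute Q and compare result hashes.
                if !any_red || self.execute(q) == old_hash {
                    Color::Green
                } else {
                    Color::Red
                }
            }
        };
        self.colors.insert(q, color);
        color
    }
}

fn node(reads: &[QueryId], result_hash: u64) -> PrevNode {
    PrevNode { reads: reads.to_vec(), result_hash }
}

/// Demo data: `input_a` changed, `input_b` did not.
fn demo_session(subquery1_hash: u64) -> Session {
    Session {
        prev: HashMap::from([
            ("main_query", node(&["subquery1", "subquery2"], 10)),
            ("subquery1", node(&["input_a"], 1)),
            ("subquery2", node(&["input_b"], 2)),
        ]),
        colors: HashMap::from([("input_a", Color::Red), ("input_b", Color::Green)]),
        current_hashes: HashMap::from([
            ("main_query", 10),
            ("subquery1", subquery1_hash),
            ("subquery2", 2),
        ]),
        executed: Vec::new(),
    }
}
```

Calling `demo_session(1).try_mark_green("main_query")` re-executes only `subquery1`, because its result hash is unchanged and everything above it stays green. With `demo_session(99)`, `subquery1` turns red, so `main_query` is re-executed and `subquery2` is never visited.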
76+
77+
<a name="dag"></a>
78+
79+
### The query DAG
80+
81+
The query DAG code is stored in
82+
[`src/librustc/dep_graph`][dep_graph]. Construction of the DAG is done
83+
by instrumenting the query execution.
84+
85+
One key point is that the query DAG also tracks ordering; that is, for
86+
each query Q, we not only track the queries that Q reads, we track the
87+
**order** in which they were read. This allows try-mark-green to walk
88+
those queries back in the same order. This is important because once a subquery comes back as red,
89+
we can no longer be sure that Q will continue along the same path as before.
90+
That is, imagine a query like this:
91+
92+
```rust,ignore
93+
fn main_query(tcx) {
94+
    if tcx.subquery1() {
95+
        tcx.subquery2()
96+
    } else {
97+
        tcx.subquery3()
98+
    }
99+
}
100+
```
101+
102+
Now imagine that in the first compilation, `main_query` starts by
103+
executing `subquery1`, and this returns true. In that case, the next
104+
query `main_query` executes will be `subquery2`, and `subquery3` will
105+
not be executed at all.
106+
107+
But now imagine that in the **next** compilation, the input has
108+
changed such that `subquery1` returns **false**. In this case, `subquery2` would never
109+
execute. If try-mark-green were to visit `reads(main_query)` out of order,
110+
however, it might visit `subquery2` before `subquery1`, and hence execute it.
111+
This can lead to ICEs and other problems in the compiler.
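
To make the ordering concrete, here is a minimal sketch of how instrumenting query execution could record reads in order. `ReadTracker` and its `run` method are hypothetical names for illustration only; the sketch assumes no caching and no result hashing, and only records which query read which other queries and in what order.

```rust
use std::collections::HashMap;

/// Hypothetical tracker: one stack frame per query currently executing.
#[derive(Default)]
struct ReadTracker {
    stack: Vec<Vec<&'static str>>,
    /// Finished queries mapped to their reads, in execution order.
    reads: HashMap<&'static str, Vec<&'static str>>,
}

impl ReadTracker {
    /// Run `body` as the query `name`, recording an edge from whichever
    /// query is currently executing (if any) to `name`.
    fn run<T>(&mut self, name: &'static str, body: impl FnOnce(&mut Self) -> T) -> T {
        if let Some(parent) = self.stack.last_mut() {
            parent.push(name);
        }
        self.stack.push(Vec::new());
        let result = body(self);
        let reads = self.stack.pop().expect("frame pushed above");
        self.reads.insert(name, reads);
        result
    }
}

/// The example query from above, with `flag` standing in for the
/// result of `subquery1`.
fn main_query(tracker: &mut ReadTracker, flag: bool) -> u32 {
    tracker.run("main_query", |t| {
        if t.run("subquery1", |_| flag) {
            t.run("subquery2", |_| 2)
        } else {
            t.run("subquery3", |_| 3)
        }
    })
}
```

When `flag` is true, the recorded reads of `main_query` are `subquery1` followed by `subquery2`; when it is false, `subquery2` never appears. Replaying the reads in this recorded order is what lets try-mark-green stop before touching a query that the new compilation would never execute.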
112+
113+
[dep_graph]: https://github.com/rust-lang/rust/tree/master/src/librustc/dep_graph
114+
115+
## Improvements to the basic algorithm
116+
117+
In the description of the basic algorithm, we said that at the end of
118+
compilation we would save the results of all the queries that were
119+
performed. In practice, this can be quite wasteful – many of those
120+
results are very cheap to recompute, and serializing and deserializing
121+
them is not a particular win. Instead, what we would do is save
122+
**the hashes** of all the subqueries that we performed. Then, in select cases,
123+
we **also** save the results.
124+
125+
This is why the incremental algorithm separates computing the
126+
**color** of a node, which often does not require its value, from
127+
computing the **result** of a node. Computing the result is done via a simple algorithm
128+
like so:
129+
130+
- Check if a saved result for Q is available. If so, compute the color of Q.
131+
If Q is green, deserialize and return the saved result.
132+
- Otherwise, execute Q.
133+
- We can then compare the hash of the result and color Q as green if
134+
it did not change.
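
As a rough sketch of this result lookup, assuming a made-up on-disk layout: `Saved`, `hash_of`, and `get_result` are hypothetical names, results are plain strings, the `color` closure stands in for try-mark-green, and the `execute` closure stands in for running the query.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Color {
    Red,
    Green,
}

/// What the previous compilation left behind for one query (assumed layout).
struct Saved {
    /// The hash is always saved.
    result_hash: u64,
    /// The value itself is saved only in select cases.
    result: Option<String>,
}

/// Hypothetical stand-in for a stable result hash.
fn hash_of(value: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Return the result of a query together with its color.
fn get_result(
    saved: Option<&Saved>,
    color: impl FnOnce() -> Color,
    execute: impl FnOnce() -> String,
) -> (String, Color) {
    // Only bother computing the color if there is a saved value to reuse.
    if let Some(s) = saved {
        if let Some(cached) = &s.result {
            if color() == Color::Green {
                return (cached.clone(), Color::Green);
            }
        }
    }
    // Otherwise execute the query, then compare against the saved hash.
    let value = execute();
    let result_color = match saved {
        Some(s) if s.result_hash == hash_of(&value) => Color::Green,
        _ => Color::Red,
    };
    (value, result_color)
}
```

Note that a query whose value was not saved can still come out green here: it is re-executed, but because its hash matches, the queries that depend on it do not need to be.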
135+
136+
# Footnotes
137+
138+
[^salsa]: I have long wanted to rename it to the Salsa algorithm, but it never caught on. -@nikomatsakis
