Commit 2d3a197

comment various region-related things better
1 parent 7022ede commit 2d3a197

2 files changed: +60 -148 lines changed

src/rustc/middle/region.rs

Lines changed: 16 additions & 136 deletions
@@ -1,134 +1,9 @@
-/*
+/*!

-Region resolution. This pass runs before typechecking and resolves region
-names to the appropriate block.
-
-This seems to be as good a place as any to explain in detail how
-region naming, representation, and type check works.
-
-### Naming and so forth
-
-We really want regions to be very lightweight to use. Therefore,
-unlike other named things, the scopes for regions are not explicitly
-declared: instead, they are implicitly defined. Functions declare new
-scopes: if the function is not a bare function, then as always it
-inherits the names in scope from the outer scope. Within a function
-declaration, new names implicitly declare new region variables. Outside
-of function declarations, new names are illegal. To make this more
-concrete, here is an example:
-
-    fn foo(s: &a.S, t: &b.T) {
-        let s1: &a.S = s; // a refers to the same a as in the decl
-        let t1: &c.T = t; // illegal: cannot introduce new name here
-    }
-
-The code in this file is what actually handles resolving these names.
-It creates a couple of maps that map from the AST node representing a
-region ptr type to the resolved form of its region parameter. If new
-names are introduced where they shouldn't be, then an error is
-reported.
-
-If regions are not given an explicit name, then the behavior depends
-a bit on the context. Within a function declaration, all unnamed regions
-are mapped to a single, anonymous parameter. That is, a function like:
-
-    fn foo(s: &S) -> &S { s }
-
-is equivalent to a declaration like:
-
-    fn foo(s: &a.S) -> &a.S { s }
-
-Within a function body or other non-binding context, an unnamed region
-reference is mapped to a fresh region variable whose value can be
-inferred as normal.
-
-The resolved form of regions is `ty::region`. Before I can explain
-why this type is setup the way it is, I have to digress a little bit
-into some ill-explained type theory.
-
-### Universal Quantification
-
-Regions are more complex than type parameters because, unlike type
-parameters, they can be universally quantified within a type. To put
-it another way, you cannot (at least at the time of this writing) have
-a variable `x` of type `fn<T>(T) -> T`. You can have an *item* of
-type `fn<T>(T) -> T`, but whenever it is referenced within a method,
-that type parameter `T` is replaced with a concrete type *variable*
-`$T`. To make this more concrete, imagine this code:
-
-    fn identity<T>(x: T) -> T { x }
-    let f = identity; // f has type fn($T) -> $T
-    f(3u); // $T is bound to uint
-    f(3); // Type error
-
-You can see here that a type error will result because the type of `f`
-(as opposed to the type of `identity`) is not universally quantified
-over `$T`. That's fancy math speak for saying that the type variable
-`$T` refers to a specific type that may not yet be known, unlike the
-type parameter `T` which refers to some type which will never be
-known.
-
-Anyway, regions work differently. If you have an item of type
-`fn(&a.T) -> &a.T` and you reference it, its type remains the same:
-only when the function *is called* is `&a` instantiated with a
-concrete region variable. This means you could call it twice and give
-different values for `&a` each time.
-
-This more general form is possible for regions because they do not
-impact code generation. We do not need to monomorphize functions
-differently just because they contain region pointers. In fact, we
-don't really do *anything* differently.
-
-### Representing regions; or, why do I care about all that?
-
-The point of this discussion is that the representation of regions
-must distinguish between a *bound* reference to a region and a *free*
-reference. A bound reference is one which will be replaced with a
-fresh type variable when the function is called, like the type
-parameter `T` in `identity`. They can only appear within function
-types. A free reference is a region that may not yet be concretely
-known, like the variable `$T`.
-
-To see why we must distinguish them carefully, consider this program:
-
-    fn item1(s: &a.S) {
-        let choose = fn@(s1: &a.S) -> &a.S {
-            if some_cond { s } else { s1 }
-        };
-    }
-
-Here, the variable `s1: &a.S` that appears within the `fn@` is a free
-reference to `a`. That is, when you call `choose()`, you don't
-replace `&a` with a fresh region variable, but rather you expect `s1`
-to be in the same region as the parameter `s`.
-
-But in this program, this is not the case at all:
-
-    fn item2() {
-        let identity = fn@(s1: &a.S) -> &a.S { s1 };
-    }
-
-To distinguish between these two cases, `ty::region` contains two
-variants: `re_bound` and `re_free`. In `item1()`, the outer reference
-to `&a` would be `re_bound(rid_param("a", 0u))`, and the inner reference
-would be `re_free(rid_param("a", 0u))`. In `item2()`, the inner reference
-would be `re_bound(rid_param("a", 0u))`.
-
-#### Implications for typeck
-
-In typeck, whenever we call a function, we must go over and replace
-all references to `re_bound()` regions within its parameters with
-fresh type variables (we do not, however, replace bound regions within
-nested function types, as those nested functions have not yet been
-called).
-
-Also, when we typecheck the *body* of an item, we must replace all
-`re_bound` references with `re_free` references. This means that the
-region in the type of the argument `s` in `item1()` *within `item1()`*
-is not `re_bound(re_param("a", 0u))` but rather `re_free(re_param("a",
-0u))`. This is because, for any particular *invocation of `item1()`*,
-`&a` will be bound to some specific region, and hence it is no longer
-bound.
+This file actually contains two passes related to regions. The first
+pass builds up the `region_map`, which describes the parent links in
+the region hierarchy. The second pass infers which types must be
+region parameterized.

 */

@@ -153,10 +28,10 @@ type binding = {node_id: ast::node_id,
                 name: ~str,
                 br: ty::bound_region};

-// Mapping from a block/expr/binding to the innermost scope that
-// bounds its lifetime. For a block/expression, this is the lifetime
-// in which it will be evaluated. For a binding, this is the lifetime
-// in which it is in scope.
+/// Mapping from a block/expr/binding to the innermost scope that
+/// bounds its lifetime. For a block/expression, this is the lifetime
+/// in which it will be evaluated. For a binding, this is the lifetime
+/// in which it is in scope.
 type region_map = hashmap<ast::node_id, ast::node_id>;

 type ctxt = {
@@ -198,8 +73,8 @@ type ctxt = {
     parent: parent
 };

-// Returns true if `subscope` is equal to or is lexically nested inside
-// `superscope` and false otherwise.
+/// Returns true if `subscope` is equal to or is lexically nested inside
+/// `superscope` and false otherwise.
 fn scope_contains(region_map: region_map, superscope: ast::node_id,
                   subscope: ast::node_id) -> bool {
     let mut subscope = subscope;
@@ -212,6 +87,9 @@ fn scope_contains(region_map: region_map, superscope: ast::node_id,
     ret true;
 }

+/// Finds the nearest common ancestor (if any) of two scopes. That
+/// is, finds the smallest scope which is greater than or equal to
+/// both `scope_a` and `scope_b`.
 fn nearest_common_ancestor(region_map: region_map, scope_a: ast::node_id,
                            scope_b: ast::node_id) -> option<ast::node_id> {

@@ -262,6 +140,7 @@ fn nearest_common_ancestor(region_map: region_map, scope_a: ast::node_id,
     }
 }

+/// Extracts the current parent from cx, failing if there is none.
 fn parent_id(cx: ctxt, span: span) -> ast::node_id {
     alt cx.parent {
         none {
@@ -273,6 +152,7 @@ fn parent_id(cx: ctxt, span: span) -> ast::node_id {
     }
 }

+/// Records the current parent (if any) as the parent of `child_id`.
 fn record_parent(cx: ctxt, child_id: ast::node_id) {
     alt cx.parent {
         none { /* no-op */ }
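
The doc comments added in this file describe `region_map` as a table of parent links, with `scope_contains` and `nearest_common_ancestor` walking those links. The following is a minimal sketch of that idea in present-day Rust, not the code from this commit: `NodeId`, `RegionMap`, `ancestors`, and `example_map` are hypothetical names chosen for illustration, and the map is assumed to be acyclic.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for `ast::node_id`.
type NodeId = u32;

// Maps each scope to its parent scope, in the spirit of `region_map`.
type RegionMap = HashMap<NodeId, NodeId>;

/// Returns true if `subscope` is equal to or nested inside `superscope`.
fn scope_contains(map: &RegionMap, superscope: NodeId, mut subscope: NodeId) -> bool {
    while subscope != superscope {
        match map.get(&subscope) {
            Some(&parent) => subscope = parent,
            None => return false, // reached a root without meeting `superscope`
        }
    }
    true
}

/// Lists `scope` and all of its ancestors, innermost first.
fn ancestors(map: &RegionMap, mut scope: NodeId) -> Vec<NodeId> {
    let mut chain = vec![scope];
    while let Some(&parent) = map.get(&scope) {
        chain.push(parent);
        scope = parent;
    }
    chain
}

/// Finds the smallest scope that contains both `a` and `b`, if any.
fn nearest_common_ancestor(map: &RegionMap, a: NodeId, b: NodeId) -> Option<NodeId> {
    let chain_b = ancestors(map, b);
    // The first ancestor of `a` (innermost first) that also encloses `b`.
    ancestors(map, a).into_iter().find(|s| chain_b.contains(s))
}

/// Scope 1 is a function body, 2 and 3 are sibling blocks inside it,
/// and 4 is nested inside 2.
fn example_map() -> RegionMap {
    HashMap::from([(2, 1), (3, 1), (4, 2)])
}
```

With `example_map()`, scope 4 is contained in scope 1 but not in scope 3, and the nearest common ancestor of 4 and 3 is 1. A scope that does not appear in the map at all shares no ancestor with the others, which corresponds to the `option` return type in the diff.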

src/rustc/middle/ty.rs

Lines changed: 44 additions & 12 deletions
@@ -307,6 +307,13 @@ enum closure_kind {
     ck_uniq,
 }

+/// Innards of a function type:
+///
+/// - `purity` is the function's effect (pure, impure, unsafe).
+/// - `proto` is the protocol (fn@, fn~, etc).
+/// - `inputs` is the list of arguments and their modes.
+/// - `output` is the return type.
+/// - `ret_style` indicates whether the function returns a value or fails.
 type fn_ty = {purity: ast::purity,
               proto: ast::proto,
               inputs: ~[arg],
@@ -315,13 +322,32 @@ type fn_ty = {purity: ast::purity,

 type param_ty = {idx: uint, def_id: def_id};

-// See discussion at head of region.rs
+/// Representation of regions:
 enum region {
+    /// Bound regions are found (primarily) in function types. They indicate
+    /// region parameters that have yet to be replaced with actual regions
+    /// (analogous to type parameters, except that due to the monomorphic
+    /// nature of our type system, bound type parameters are always replaced
+    /// with fresh type variables whenever an item is referenced, so type
+    /// parameters only appear "free" in types. Regions in contrast can
+    /// appear free or bound). When a function is called, all bound regions
+    /// tied to that function's node-id are replaced with fresh region
+    /// variables whose value is then inferred.
     re_bound(bound_region),
+
+    /// When checking a function body, the types of all arguments and so forth
+    /// that refer to bound region parameters are modified to refer to free
+    /// region parameters.
     re_free(node_id, bound_region),
+
+    /// A concrete region naming some expression within the current function.
     re_scope(node_id),
+
+    /// Static data that has an "infinite" lifetime.
+    re_static,
+
+    /// A region variable. Should not exist after typeck.
     re_var(region_vid),
-    re_static // effectively `top` in the region lattice
 }

 enum bound_region {
@@ -332,16 +358,22 @@ enum bound_region {

 type opt_region = option<region>;

-// The type substs represents the kinds of things that can be substituted into
-// a type. There may be at most one region parameter (self_r), along with
-// some number of type parameters (tps).
-//
-// The region parameter is present on nominative types (enums, resources,
-// classes) that are declared as having a region parameter. If the type is
-// declared as `enum foo&`, then self_r should always be non-none. If the
-// type is declared as `enum foo`, then self_r will always be none. In the
-// latter case, typeck::ast_ty_to_ty() will reject any references to `&T` or
-// `&self.T` within the type and report an error.
+/// The type substs represents the kinds of things that can be substituted to
+/// convert a polytype into a monotype. Note however that substituting bound
+/// regions other than `self` is done through a different mechanism.
+///
+/// `tps` represents the type parameters in scope. They are indexed according
+/// to the order in which they were declared.
+///
+/// `self_r` indicates the region parameter `self` that is present on nominal
+/// types (enums, classes) declared as having a region parameter. `self_r`
+/// should always be none for types that are not region-parameterized and
+/// some(_) for types that are. The only bound region parameter that should
+/// appear within a region-parameterized type is `self`.
+///
+/// `self_ty` is the type to which `self` should be remapped, if any. The
+/// `self` type is rather funny in that it can only appear on interfaces and
+/// is always substituted away to the implementing type for an interface.
 type substs = {
     self_r: opt_region,
     self_ty: option<ty::t>,
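
The new comments on `enum region` describe two rewrites of bound regions: they become free regions when a function body is checked, and fresh region variables when the function is called. The sketch below illustrates both in present-day Rust under simplifying assumptions. It is not the typeck code: `Region` only mirrors the variants shown in the diff, `BoundRegion` is a hypothetical one-field stand-in (the real `bound_region` variants are not visible in this hunk), a type is reduced to a flat list of regions, and nested function types are not modelled.

```rust
// Hypothetical stand-in for `node_id`.
type NodeId = u32;

// Hypothetical stand-in for `bound_region`: just a region name.
#[derive(Clone, Debug, PartialEq)]
struct BoundRegion(String);

// Simplified mirror of the variants of `ty::region` shown in the diff.
#[derive(Clone, Debug, PartialEq)]
enum Region {
    Bound(BoundRegion),
    Free(NodeId, BoundRegion),
    Scope(NodeId),
    Static,
    Var(u32),
}

/// Checking a body: every bound region becomes a free region tied to the
/// function's node id.
fn free_bound_regions(fn_id: NodeId, regions: &[Region]) -> Vec<Region> {
    regions
        .iter()
        .map(|r| match r {
            Region::Bound(br) => Region::Free(fn_id, br.clone()),
            other => other.clone(),
        })
        .collect()
}

/// Calling a function: each distinct bound region gets one fresh variable,
/// so repeated uses of the same name share a variable within a call.
fn instantiate_bound_regions(regions: &[Region], next_var: &mut u32) -> Vec<Region> {
    let mut assigned: Vec<(BoundRegion, u32)> = Vec::new();
    regions
        .iter()
        .map(|r| match r {
            Region::Bound(br) => {
                let existing = assigned.iter().find(|(b, _)| b == br).map(|(_, v)| *v);
                let vid = existing.unwrap_or_else(|| {
                    let v = *next_var;
                    *next_var += 1;
                    assigned.push((br.clone(), v));
                    v
                });
                Region::Var(vid)
            }
            other => other.clone(),
        })
        .collect()
}
```

Calling `instantiate_bound_regions` twice on the same input yields different variables each time, which matches the removed region.rs text saying a function can be called twice with different values for `&a`.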
