Skip to content

Don't try to forge provenance from a nil pointer #23

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 19, 2022
Merged

Don't try to forge provenance from a nil pointer #23

merged 1 commit into from
Oct 19, 2022

Conversation

dancrossnyc
Copy link
Collaborator

As pointed out by Ben Kimock in issue #22, NULL carries no provenance, and attempting to forge a dereferenceable pointer from NULL is UB.

Remove that paradigm from phbl by applying different techniques:

  1. Where we just need a pointer to get its address and nothing else, core::ptr::invalid serves nicely.
  2. When we want to get a pointer from a freshly-mapped address, as when loading the kernel, ask the page table to use its provenance to create the pointer. Indeed, the page table can even validate that it properly maps the given address.
  3. For early UART initialization, introduce an MMIO_BASE symbol in assembly language that we can leverage to get a provenance-providing pointer.

I believe these changes together address address the UB issue, but would appreciate @saethlin's input. Thank you again!

Fixes: #22
Signed-off-by: Dan Cross [email protected]

@dancrossnyc
Copy link
Collaborator Author

FYI, tested on Gimlet.

src/mmu.rs Outdated
return Err(Error::BadPointer);
}
let base = self as *const _ as *const u8;
let ptr = base.with_addr(va);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should use from_exposed_addr instead. The provenance of base covers the page table itself, not whatever it maps.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

src/uart.rs Outdated
Comment on lines 444 to 446
let base = crate::phbl::mmio_base();
let config = base.with_addr(self.addr()) as *mut ConfigMmio;
let uart = unsafe { &mut *config };
Copy link

@saethlin saethlin Oct 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I shift MMIO_BASE from an extern static to just a static and slap on a fn main with calls to init() and entry, Miri complains about UB here (that it even gets here is almost shocking, considering how much of this codebase is things Miri doesn't understand).

I believe this is still UB, but for a slightly more subtle reason: Instead of having no provenance at all, this pointer now has provenance, but the allocation it corresponds to has zero size. To the best of my understanding, an extern static is initialized by something else and you can set its address externally, but it still behaves like the type that it is. And in this case, since it is a zero-length array, it is a zero-sized type, and the pointer is permitted to access zero bytes. So this reborrow is still no good, because the referent type is not zero-sized (references must be safe to access through (ish)).

I agree with bjorn3 that the morally right thing to do here is call from_exposed_addr. I just put up a PR which adjusts the docs to be clear that this use of that function is valid: rust-lang/rust#103106 we will see what Ralf and Aria think.

If you really really want to not call that function or do an int-to-ptr cast (transmute is the same as ptr::invalid), you can probably set up a big enough allocation by having an extern static whose extent covers all the addresses you'll need to touch through this pointer (and pointers derived from it). MaybeUninit<u8> is probably a reasonable element type. Being about 6 miles out of my depth I have no idea if this is isn't an option because you need to access a huge range of addresses and you can't tolerate a huge array (if I had your equipment I'd probably try this out, might at least fail in an interesting way).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I shift MMIO_BASE from an extern static to just a static and slap on a fn main with calls to init() and entry, Miri complains about UB here (that it even gets here is almost shocking, considering how much of this codebase is things Miri doesn't understand).

I believe this is still UB, but for a slightly more subtle reason: Instead of having no provenance at all, this pointer now has provenance, but the allocation it corresponds to has zero size. To the best of my understanding, an extern static is initialized by something else and you can set its address externally, but it still behaves like the type that it is. And in this case, since it is a zero-length array, it is a zero-sized type, and the pointer is permitted to access zero bytes. So this reborrow is still no good, because the referent type is not zero-sized (references must be safe to access through (ish)).

I agree with bjorn3 that the morally right thing to do here is call from_exposed_addr. I just put up a PR which adjusts the docs to be clear that this use of that function is valid: rust-lang/rust#103106 we will see what Ralf and Aria think.

If you really really want to not call that function or do an int-to-ptr cast (transmute is the same as ptr::invalid), you can probably set up a big enough allocation by having an extern static whose extent covers all the addresses you'll need to touch through this pointer (and pointers derived from it). MaybeUninit<u8> is probably a reasonable element type. Being about 6 miles out of my depth I have no idea if this is isn't an option because you need to access a huge range of addresses and you can't tolerate a huge array (if I had your equipment I'd probably try this out, might at least fail in an interesting way).

I have no moral objection to from_exposed_addr. :-) The language in the docs about having previously "exposed" provenance kind of scared me; in particular, in the early UART code, there is no suitable pointer from which to expose such provenance, and I thought if I could be sufficiently clever I could just avoid the whole issue, but it seems my shenanigans were just wrong. I think my mental model here is just lacking: it seems provenance attaches itself to spans of memory (for lack of a better term), not simply pointers.

I was trying to remain cognizant of your footnote in the initial issue, but wasn't exactly sure what you meant. If Ralf and Aria agree with the change you sent, however, that more than satisfies my ignorant concerns, and it sounds like that's already the consensus from behind the scenes. It's not quite clear to me that, "I'm the kernel and I just mapped this address, so this pointer I created into the newly mapped region definitely points to something" case is explicitly covered, but I would really love it to be: I think I speak for all kernel programmers when I say we'd really like to be able to work with the language to make some guarantees about the weird stuff we do from time to time (synthesizing addresses based on external knowledge of the machine or processes or whatever that's not easily representable in programmatic form), and one of the reasons I started working with Rust in this domain was because C was becoming increasingly hostile to that kind of thing.

As for testing this out....A problem with MMIO stuff in general, is that very often you just don't know where the MMIO regions are until you start running; PCI BARs are often set by firmware, for instance (though this is not true on Oxide machines) and their address regions in the physical address space don't "exist" until set my software, and are therefore not knowable at compile time (let along where they get mapped to in the virtual address space, which is entirely at the kernel's discretion at run time, and while this program is a deliberately simple loader, it is isomorphic to a kernel in this sense). Regardless, testing can probably be done on just about any machine. We happen to know where this UART is because we built the system, but one may be able to simulate something similar by playing around with mmap to reserve a large chunk of address space and then allocating something real in the middle of that at an explicit virtual address.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you haven't, you should read this: https://faultlore.com/blah/tower-of-weakenings/

I think the root cause of this conversation and the situation with the docs is that in Rust generally there is a whole lot of interest/excitement about the top of the tower because things up there are much easier: They easier to formally reason about, easeir to write optimizations for, and easier to check in Miri (specifically, they are checkable at all). But as soon as you talk about MMIO you can't operate at this layer of the tower.

I generally see this kind of code referred to as accessing memory outside the Abstract Machine. That's because aliasing optimizations are concerned with doing creative things with memory which is allocated by Rust. If the memory comes from somewhere else, the issue of provenance as a means to justify optimizations (which is its only purpose) is kind of void, so long as you don't definitively produce the wrong provenance for the accesses.

So yeah in general Rust is committed to making this kind of code work. There's just precious few people working on specifying exactly how, so progress is slow. And I'm not one of them, my contribution in this area is trying to make Miri better. Eventually I hope it'll be able to do MMIO and execute some inline assembly.

If you or anyone else wants to follow behind-the-scenes, the best places are the Zulip: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-lang.2Fwg-unsafe-code-guidelines and the repo: https://github.com/rust-lang/unsafe-code-guidelines (it's misnamed guidelines instead of discussions). Things can move fast at times so I wouldn't blame anyone who has better things to do.

As pointed out by Ben Kimock in issue #22, NULL carries no
provenance, and attempting to forge a dereferenceable pointer
from NULL is UB.

Remove that paradigm from `phbl` by applying different
techniques:

1. Where we just need a pointer to get its address and nothing
   else, `core::ptr::invalid` serves nicely.
2. When we want to get a pointer from a freshly-mapped address,
   as when loading the kernel, ask the page table to validate
   the pointer, and then use `core::ptr::from_exposed_addr`.
   Indeed, the page table can even validate that it properly
   maps the given address, and we forge provenance.
3. For early UART initialization, just use `from_exposed_addr`
   directly: we have no real way to validate the address
   before we do so, but an invariant is that we enter Rust code
   with the MMIO region properly mapped.

I believe these changes together address address the UB issue,
but would appreciate @saethlin's and @bjorn3's input.

Thank you again!

Fixes: #22
Signed-off-by: Dan Cross <[email protected]>
@dancrossnyc dancrossnyc requested review from saethlin and bjorn3 and removed request for saethlin and bjorn3 October 17, 2022 13:44
Comment on lines +444 to +445
let regs = core::ptr::from_exposed_addr_mut::<ConfigMmio>(self.addr());
let uart = unsafe { &mut *regs };
Copy link

@saethlin saethlin Oct 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pointer from from_exposed_addr(something.addr()) would be UB to deref if this were an address inside the AM. And this pattern in general is very lintable and trips the linter between my ears. I'd suggest you change if it I could come up with a better name instead of addr, but just FYI at some point a new nightly clippy might be upset by this.

And maybe soon we will have offset_of! in order to do this address computation for the write_volatile in a way that for sure doesn't have the aliasing implications of &mut. I hope.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oophf, sorry, I've been trying to reply for two days but have been a bit under the weather and haven't gotten it done.

If I'm interpreting this correctly, this is probably fine for now because the MMIO addresses for the UART fall outside of the AM, caveat that at some point things might be less happy with the overall pattern. This seems like something to revisit at some point; perhaps using something like core::ptr::addr_of_mut!?

I remain curious about what to do about memory that's been freshly mapped. It occurs to me that this can probably be approximated in a hosted environment by playing around with mmap.

@dancrossnyc
Copy link
Collaborator Author

I'm going to go ahead and submit this now, but probably follow-up with some other things.

@dancrossnyc dancrossnyc merged commit 0d20339 into main Oct 19, 2022
@dancrossnyc dancrossnyc deleted the ub branch October 19, 2022 19:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ptr::null{_mut} + with_addr + dereference -> UB
3 participants