Skip to content

Improve gimlet qspi error reporting #2071

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

labbott
Copy link
Collaborator

@labbott labbott commented May 19, 2025

We currently ignore any errors when working with the QSPI driver. The biggest place this is noticable is if we happen to give a bad address for reading we will loop forever. Fix this up to actually handle errors.

@labbott labbott requested review from mkeeter, hawkw and cbiffle May 19, 2025 16:41
self.poll_for_write_complete(None);
self.qspi
.page_program(addr, data)
.map_err(|x| qspi_to_hf(x))?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: you can write this without the lambda, e.g. map_err(qspi_to_hf)

(same note applies in a bunch of other places in the PR)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another thought (take it or leave it) is that there are a number of places where we call the same method on self.qspi (in particular, read_status) and then have to add the map_err; if we wanted to reduce the repetition of that, we could consider adding wrappers like:

impl ServerImpl {
    fn read_status(&self) -> Result<u8, HfError> {
         self.qspi.read_status().map_err(qspi_to_hf)
    }

   // ... perhaps other wrappers as well?
}

to avoid having to write .map_err(qspi_to_hf) quite as much?

i'm not sure if this is actually worth the effort or not, but i figured it was worth suggesting.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took the suggestion for read_status. Maybe it should apply elsehwere?

Comment on lines 121 to 128
// If we can't read the id there's a good chance nothing else is going to
// work. `panic` would probably just be a crash loop. looping forever
// is slightly more polite.
loop {
// We are dead now.
hl::sleep_for(1000);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elsewhere, we have tasks that die permanently by calling sys_recv_notification with an empty notification mask, like this:

// All these moments will be lost in time, like tears in rain...
// Time to die.
loop {
// Sleeping with all bits in the notification mask clear means
// we should never be notified --- and if one never wakes up,
// the difference between sleeping and dying seems kind of
// irrelevant. But, `rustc` doesn't realize that this should
// never return, we'll stick it in a `loop` anyway so the main
// function can return `!`
sys_recv_notification(0);
}

i think this is a bit nicer as we don't get periodically woken up by the timer, remember that we are just looping on it, and go back to sleep. not that this case is particularly worth optimizing for, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, is there, perhaps, something we ought to do to indicate that we are in a permanently-bused state?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think we would benefit from a LIKELY_VERY_VERY_DEAD state

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC Humility shows tasks in sys_recv_notification(0) as (DEAD), so there's precedent for treating it as a canonically dead state.

@@ -61,6 +61,14 @@ struct Config {
pub clock: u8,
}

// There isn't a great crate to do `From` implementation so do this manually
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, i suppose the alternative is giving drv-hf-api a dependency on drv_stm32h7_qspi? which, yeah, seems sad...

self.poll_for_write_complete(None);
self.qspi
.page_program(addr, data)
.map_err(|x| qspi_to_hf(x))?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another thought (take it or leave it) is that there are a number of places where we call the same method on self.qspi (in particular, read_status) and then have to add the map_err; if we wanted to reduce the repetition of that, we could consider adding wrappers like:

impl ServerImpl {
    fn read_status(&self) -> Result<u8, HfError> {
         self.qspi.read_status().map_err(qspi_to_hf)
    }

   // ... perhaps other wrappers as well?
}

to avoid having to write .map_err(qspi_to_hf) quite as much?

i'm not sure if this is actually worth the effort or not, but i figured it was worth suggesting.

@@ -258,6 +282,18 @@ impl Qspi {
// perform transfers.
let mut out = out;
while !out.is_empty() {
if self.reg.sr.read().tof().bit_is_set() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about reading self.reg.sr just once, then checking tof, tef, and flevel (below) on that value?

@@ -61,6 +61,14 @@ struct Config {
pub clock: u8,
}

// There isn't a great crate to do `From` implementation so do this manually
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a slight bias towards making drv-hf-api depend on drv-stm32h7-qspi, so that it can define the From implementation, but I don't feel strongly about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose the downside of that is that it probably makes drv-hf-api not compile for target MCUs other than STM32H7. But, we don't currently have any reason to implement this API on other MCUs, so 🤷‍♀️

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also use drv-hf-api on cosmo which is still an STM32H7 but does not use drv-stm32h7-qspi for host flash so it felt a little weird to add that there. I don't love any option here tbh.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Industrial Strength version would be to add a stm32 feature to drv-hf-api which then brings in the QSPI crate and adds the From implementation, but that's Even More Shenanigans 🤷🏻

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternatively QspiError could go in some kinda qspi-types crate or something that both the drv-stm32h7-qspi and drv-hf-api crates depend on. but...that's a Lot...

Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

Comment on lines 205 to 225
if sr.tof().bit_is_set() {
self.reg.fcr.modify(|_, w| w.ctof().set_bit());
self.disable_all_interrupts();
return Err(QspiError::Timeout);
}
if sr.tef().bit_is_set() {
self.reg.fcr.modify(|_, w| w.ctef().set_bit());
self.disable_all_interrupts();
return Err(QspiError::TransferError);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It occurs to me that, rather than having all the error paths have to remember to explicitly disable IRQs, we could maybe wrap this in an inner function that we call and then match the Result from, so that there's one place where we clear the IRQs regardless of how we exited from the function. Not sure if this is worth the effort unless there's a possibility of adding additional error paths here in the future.

Comment on lines 323 to 328
self.disable_all_interrupts();
return Err(QspiError::Timeout);
}
if sr.tef().bit_is_set() {
self.reg.fcr.modify(|_, w| w.ctef().set_bit());
self.disable_all_interrupts();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similarly, is it worth refactoring this so that the error path disables interrupts in one place instead of multiple?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll give this a shot (and I deeply appreciate you having an eye for things like this!)

We currently ignore any errors when working with the QSPI driver.
The biggest place this is noticable is if we happen to give a
bad address for reading we will loop forever. Fix this up to
actually handle errors.
@labbott labbott force-pushed the improve_gimlet_hf_errors branch from 2eecf2c to 91e2f6e Compare May 21, 2025 14:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants