Improve gimlet qspi error reporting #2071

labbott · 2025-05-19T16:41:23Z

We currently ignore any errors when working with the QSPI driver. The most noticeable consequence is that if we happen to give a bad address for reading, we loop forever. Fix this up to actually handle errors.

mkeeter · 2025-05-19T16:56:06Z

drv/gimlet-hf-server/src/main.rs

-        self.poll_for_write_complete(None);
+        self.qspi
+            .page_program(addr, data)
+            .map_err(|x| qspi_to_hf(x))?;


Nit: you can write this without the lambda, e.g. map_err(qspi_to_hf)

(same note applies in a bunch of other places in the PR)

another thought (take it or leave it) is that there are a number of places where we call the same method on self.qspi (in particular, read_status) and then have to add the map_err; if we wanted to reduce the repetition of that, we could consider adding wrappers like:

    impl ServerImpl {
        fn read_status(&self) -> Result<u8, HfError> {
            self.qspi.read_status().map_err(qspi_to_hf)
        }
        // ... perhaps other wrappers as well?
    }

to avoid having to write .map_err(qspi_to_hf) quite as much?

i'm not sure if this is actually worth the effort or not, but i figured it was worth suggesting.

I took the suggestion for read_status. Maybe it should apply elsewhere?
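For reference, a minimal sketch of the two suggestions in this thread: passing `qspi_to_hf` by name instead of through a closure, and a `read_status` wrapper on `ServerImpl`. The type definitions, the variant mapping, the mock `Qspi`, and the `write_in_progress` caller (including its assumption that bit 0 is a busy flag) are hypothetical stand-ins, not the PR's code.

```rust
// Hypothetical stand-ins for the driver and API error types.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum QspiError {
    Timeout,
    TransferError,
}

#[derive(Debug, PartialEq, Clone, Copy)]
pub enum HfError {
    QspiTimeout,
    QspiTransferError,
}

// Manual conversion function; the variant mapping is assumed.
pub fn qspi_to_hf(e: QspiError) -> HfError {
    match e {
        QspiError::Timeout => HfError::QspiTimeout,
        QspiError::TransferError => HfError::QspiTransferError,
    }
}

// Mock driver that returns whatever status result it was built with.
pub struct Qspi {
    pub status: Result<u8, QspiError>,
}

impl Qspi {
    pub fn read_status(&self) -> Result<u8, QspiError> {
        self.status
    }
}

pub struct ServerImpl {
    pub qspi: Qspi,
}

impl ServerImpl {
    // The wrapper converts the error once, so callers can use plain `?`.
    pub fn read_status(&self) -> Result<u8, HfError> {
        // Passing the function by name is equivalent to
        // `.map_err(|x| qspi_to_hf(x))`.
        self.qspi.read_status().map_err(qspi_to_hf)
    }

    // Hypothetical caller: no `map_err` at the call site.
    // Assumes bit 0 of the status byte is a busy flag.
    pub fn write_in_progress(&self) -> Result<bool, HfError> {
        Ok((self.read_status()? & 1) != 0)
    }
}
```

With the wrapper in place, every call site that previously needed `.map_err(qspi_to_hf)` after `self.qspi.read_status()` can call `self.read_status()?` instead.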


hawkw · 2025-05-19T17:05:00Z

drv/gimlet-hf-server/src/main.rs

+            // If we can't read the id there's a good chance nothing else is going to
+            // work. `panic` would probably just be a crash loop. looping forever
+            // is slightly more polite.
+            loop {
+                // We are dead now.
+                hl::sleep_for(1000);
+            }


elsewhere, we have tasks that die permanently by calling sys_recv_notification with an empty notification mask, like this:

hubris/drv/gimlet-seq-server/src/main.rs

Lines 174 to 184 in 7a24b88

    // All these moments will be lost in time, like tears in rain...
    // Time to die.
    loop {
        // Sleeping with all bits in the notification mask clear means
        // we should never be notified --- and if one never wakes up,
        // the difference between sleeping and dying seems kind of
        // irrelevant. But, `rustc` doesn't realize that this should
        // never return, we'll stick it in a `loop` anyway so the main
        // function can return `!`
        sys_recv_notification(0);
    }

i think this is a bit nicer as we don't get periodically woken up by the timer, remember that we are just looping on it, and go back to sleep. not that this case is particularly worth optimizing for, though.

also, is there, perhaps, something we ought to do to indicate that we are in a permanently-busted state?

I do think we would benefit from a LIKELY_VERY_VERY_DEAD state

IIRC Humility shows tasks in sys_recv_notification(0) as (DEAD), so there's precedent for treating it as a canonically dead state.

hawkw · 2025-05-19T17:08:13Z

drv/gimlet-hf-server/src/main.rs

@@ -61,6 +61,14 @@ struct Config {
    pub clock: u8,
 }

+// There isn't a great crate to do `From` implementation so do this manually


ah, i suppose the alternative is giving drv-hf-api a dependency on drv_stm32h7_qspi? which, yeah, seems sad...

hawkw · 2025-05-19T17:11:51Z

drv/gimlet-hf-server/src/main.rs

-        self.poll_for_write_complete(None);
+        self.qspi
+            .page_program(addr, data)
+            .map_err(|x| qspi_to_hf(x))?;


another thought (take it or leave it) is that there are a number of places where we call the same method on self.qspi (in particular, read_status) and then have to add the map_err; if we wanted to reduce the repetition of that, we could consider adding wrappers like:

    impl ServerImpl {
        fn read_status(&self) -> Result<u8, HfError> {
            self.qspi.read_status().map_err(qspi_to_hf)
        }
        // ... perhaps other wrappers as well?
    }

to avoid having to write .map_err(qspi_to_hf) quite as much?

i'm not sure if this is actually worth the effort or not, but i figured it was worth suggesting.

mkeeter · 2025-05-19T17:13:13Z

drv/stm32h7-qspi/src/lib.rs

@@ -258,6 +282,18 @@ impl Qspi {
        // perform transfers.
        let mut out = out;
        while !out.is_empty() {
+            if self.reg.sr.read().tof().bit_is_set() {


How about reading self.reg.sr just once, then checking tof, tef, and flevel (below) on that value?
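A rough sketch of what "read once, check several fields" could look like. The `Sr` snapshot type and its bit positions are placeholders for illustration only; they are not the STM32H7 register layout or the PAC's API.

```rust
// Hypothetical snapshot of a status register value. The bit positions
// below are placeholders, not the real hardware layout.
#[derive(Clone, Copy)]
pub struct Sr(pub u32);

impl Sr {
    pub fn tef(self) -> bool {
        (self.0 & (1 << 0)) != 0
    }
    pub fn tof(self) -> bool {
        (self.0 & (1 << 1)) != 0
    }
    pub fn flevel(self) -> u32 {
        (self.0 >> 8) & 0x3f
    }
}

#[derive(Debug, PartialEq)]
pub enum QspiError {
    Timeout,
    TransferError,
}

// All checks look at the same snapshot, so the register is read once
// per loop iteration and the flags cannot change between checks.
pub fn check_status(sr: Sr) -> Result<u32, QspiError> {
    if sr.tof() {
        return Err(QspiError::Timeout);
    }
    if sr.tef() {
        return Err(QspiError::TransferError);
    }
    // No error flags set: report the FIFO level to the caller.
    Ok(sr.flevel())
}
```

In the driver loop this would correspond to something like `let sr = self.reg.sr.read();` at the top of each iteration, followed by checks on `sr` rather than repeated register reads.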

mkeeter · 2025-05-19T17:15:18Z

drv/gimlet-hf-server/src/main.rs

@@ -61,6 +61,14 @@ struct Config {
    pub clock: u8,
 }

+// There isn't a great crate to do `From` implementation so do this manually


I have a slight bias towards making drv-hf-api depend on drv-stm32h7-qspi, so that it can define the From implementation, but I don't feel strongly about it.

I suppose the downside of that is that it probably makes drv-hf-api not compile for target MCUs other than STM32H7. But, we don't currently have any reason to implement this API on other MCUs, so 🤷‍♀️

We also use drv-hf-api on cosmo which is still an STM32H7 but does not use drv-stm32h7-qspi for host flash so it felt a little weird to add that there. I don't love any option here tbh.

The Industrial Strength version would be to add a stm32 feature to drv-hf-api which then brings in the QSPI crate and adds the From implementation, but that's Even More Shenanigans 🤷🏻

alternatively QspiError could go in some kinda qspi-types crate or something that both the drv-stm32h7-qspi and drv-hf-api crates depend on. but...that's a Lot...
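To make the trade-off concrete, here is a sketch of the `From`-based alternative, with two modules standing in for the two crates. The module names, the `HfError` variants, and the feature name in the comment are all hypothetical. Within a single crate the orphan rule does not apply, so this only illustrates the resulting ergonomics, not the dependency problem itself.

```rust
// Stand-in for the driver crate.
pub mod qspi_driver {
    #[derive(Debug, PartialEq, Clone, Copy)]
    pub enum QspiError {
        Timeout,
        TransferError,
    }
}

// Stand-in for the API crate. It has to name `QspiError` to implement
// `From`, which is exactly the dependency being debated.
pub mod hf_api {
    use super::qspi_driver::QspiError;

    #[derive(Debug, PartialEq, Clone, Copy)]
    pub enum HfError {
        QspiTimeout,
        QspiTransferError,
    }

    // In the feature-gated variant, this impl would sit behind a Cargo
    // feature, e.g. `#[cfg(feature = "stm32")]` (hypothetical name).
    impl From<QspiError> for HfError {
        fn from(e: QspiError) -> Self {
            match e {
                QspiError::Timeout => HfError::QspiTimeout,
                QspiError::TransferError => HfError::QspiTransferError,
            }
        }
    }
}

use hf_api::HfError;
use qspi_driver::QspiError;

// With `From` in place, `?` performs the conversion implicitly and the
// call sites need no `map_err`.
pub fn read_status(raw: Result<u8, QspiError>) -> Result<u8, HfError> {
    Ok(raw?)
}
```

The manual `qspi_to_hf` function in the PR avoids the cross-crate dependency at the cost of an explicit `map_err` at each call site.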


hawkw

Looks good to me!


hawkw · 2025-05-20T22:42:10Z

drv/stm32h7-qspi/src/lib.rs

+            if sr.tof().bit_is_set() {
+                self.reg.fcr.modify(|_, w| w.ctof().set_bit());
+                self.disable_all_interrupts();
+                return Err(QspiError::Timeout);
+            }
+            if sr.tef().bit_is_set() {
+                self.reg.fcr.modify(|_, w| w.ctef().set_bit());
+                self.disable_all_interrupts();
+                return Err(QspiError::TransferError);
+            }


It occurs to me that, rather than having all the error paths have to remember to explicitly disable IRQs, we could maybe wrap this in an inner function that we call and then match the Result from, so that there's one place where we clear the IRQs regardless of how we exited from the function. Not sure if this is worth the effort unless there's a possibility of adding additional error paths here in the future.

hawkw · 2025-05-20T22:42:52Z

drv/stm32h7-qspi/src/lib.rs

+                self.disable_all_interrupts();
+                return Err(QspiError::Timeout);
+            }
+            if sr.tef().bit_is_set() {
+                self.reg.fcr.modify(|_, w| w.ctef().set_bit());
+                self.disable_all_interrupts();


similarly, is it worth refactoring this so that the error path disables interrupts in one place instead of multiple?

I'll give this a shot (and I deeply appreciate you having an eye for things like this!)
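One possible shape for that refactor, sketched against a mock driver: the fallible work lives in an inner function, and the public method disables interrupts in exactly one place before returning the result. The `Qspi` fields, `read_inner`, and the simulated fault are hypothetical; in the real driver, flag clearing that is specific to one error (such as `ctof` or `ctef`) would stay on its own error path.

```rust
#[derive(Debug, PartialEq)]
pub enum QspiError {
    Timeout,
    TransferError,
}

// Mock driver: `fault` simulates a pending hardware error, and
// `interrupts_enabled` records the interrupt state for inspection.
pub struct Qspi {
    pub fault: Option<QspiError>,
    pub interrupts_enabled: bool,
}

impl Qspi {
    fn disable_all_interrupts(&mut self) {
        self.interrupts_enabled = false;
    }

    // Inner function: may return early on any error without having to
    // remember the interrupt cleanup.
    fn read_inner(&mut self, out: &mut [u8]) -> Result<(), QspiError> {
        self.interrupts_enabled = true;
        if let Some(e) = self.fault.take() {
            return Err(e);
        }
        out.fill(0xff);
        Ok(())
    }

    pub fn read(&mut self, out: &mut [u8]) -> Result<(), QspiError> {
        let result = self.read_inner(out);
        // Single cleanup point, reached on success and on every error.
        self.disable_all_interrupts();
        result
    }
}
```

Any error path added to `read_inner` later gets the same cleanup without further changes to `read`.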


labbott requested review from mkeeter, hawkw and cbiffle May 19, 2025 16:41

mkeeter reviewed May 19, 2025

mkeeter reviewed May 19, 2025

hawkw reviewed May 19, 2025

mkeeter reviewed May 19, 2025

mkeeter reviewed May 20, 2025

mkeeter reviewed May 20, 2025

mkeeter approved these changes May 20, 2025

hawkw approved these changes May 20, 2025

Improve gimlet qspi error reporting

91e2f6e

We currently ignore any errors when working with the QSPI driver. The most noticeable consequence is that if we happen to give a bad address for reading, we loop forever. Fix this up to actually handle errors.

labbott force-pushed the improve_gimlet_hf_errors branch from 2eecf2c to 91e2f6e Compare May 21, 2025 14:52
