Skip to content

Add the Static<T> type to make global mutable state easier #103

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed

Add the Static<T> type to make global mutable state easier #103

wants to merge 3 commits into from

Conversation

Sp00ph
Copy link
Contributor

@Sp00ph Sp00ph commented Jan 27, 2021

This is useful for example when setting a flag inside an interrupt handler to then execute something later. This was previously only possible using static mut variables, which can't be accessed safely and can lead to memory unsoundness like reference invalidation.

@Lokathor
Copy link
Member

I seem to recall investigating this a bit previously.

I'll do a full check later, but isn't the main advantage over Cell that we can mark it as Sync so that it can be used in a static value?

@Sp00ph
Copy link
Contributor Author

Sp00ph commented Jan 27, 2021

Yes tbh that's basically the only difference but I had to reimplement the API because I can't just unsafe impl<T> Sync for Cell<T>. If you look at the code you'll see it's really just a wrapper around Cell

@Lokathor
Copy link
Member

I think there also need to be compiler fences all over the place if this is to be shared with the interrupt handler and main program.

I'll try to find my notes when i get home tonight.

This is necessary to prevent the compiler from introducing optimizations
that would break control flow when using `Static<T>` from within the
irq handler
@Sp00ph
Copy link
Contributor Author

Sp00ph commented Jan 30, 2021

Ok so I found that in a quick little example program I wrote the compiler actually seemed to break the program when using a Static<T> from within the irq handler. I rewrote the implementation so now it uses UnsafeCell<T> and only does volatile reads/writes to change the value, which should fix the unwanted compiler optimizations

@Sp00ph
Copy link
Contributor Author

Sp00ph commented Jan 30, 2021

Ok so I wrote a little example program to show the bug that this change fixed:

#![no_std]
#![feature(start)]
#![forbid(unsafe_code)]

use gba::{
    fatal,
    io::{
        display::{DisplayStatusSetting, DISPSTAT},
        irq::{set_irq_handler, IrqEnableSetting, IrqFlags, BIOS_IF, IE, IME},
    },
    Static,
};

#[panic_handler]
fn panic(info: &core::panic::PanicInfo) -> ! {
    fatal!("{}", info);
    loop {}
}

static SHOULD_UPDATE: Static<bool> = Static::new(false);

extern "C" fn irq_handler(flags: IrqFlags) {
    static VBLANK: Static<u16> = Static::new(0);
    if flags.vblank() {
        BIOS_IF.write(BIOS_IF.read().with_vblank(true));
        if VBLANK.update(|n| n + 1) == 10 {
            gba::warn!("irq says should update");
            VBLANK.set(0);
            SHOULD_UPDATE.set(true);
        }
    }
}

#[start]
fn main(_: isize, _: *const *const u8) -> isize {
    IME.write(IrqEnableSetting::IRQ_YES);
    IE.write(IrqFlags::new().with_vblank(true));
    
    DISPSTAT.write(DisplayStatusSetting::new().with_vblank_irq_enable(true));
    
    set_irq_handler(irq_handler);

    loop {
        while !SHOULD_UPDATE.get() {}
        SHOULD_UPDATE.set(false);
        gba::warn!("main says should update")
    }
}

If you run this inside mGBA with

gba = { git = "https://github.com/Sp00ph/gba", rev = "54ba8e5"}

as a dependency, which is the commit without the fix, the logs will only show irq says should update.
If you run it using

gba = { git = "https://github.com/Sp00ph/gba"}

which has the fix, the logs will say both irq says should update and main says should update.

@Lokathor
Copy link
Member

This still needs compiler fences if it's being used as a way to communicate between main and interrupt, and needs to enforce that they're always placed into the code, so I don't think that you can allow a &mut T or *mut T to "escape" the methods provided, because then the user could do reads and writes that aren't fenced.

@Sp00ph
Copy link
Contributor Author

Sp00ph commented Jan 30, 2021

What exactly does this fencing look like? I assume it would be possible to return a VolAddress<T> from Static::<T>::as_ptr?
As for the get_mut method, would it maybe be possible to provide something among the lines of

pub fn with<U, F: FnOnce(&mut T) -> U>(&mut self, f: F) -> U

where any fencing operations are handled in the with method body before control is given to the user? This would make it impossible to have the &mut T escape the Static<T>.

@Lokathor
Copy link
Member

https://doc.rust-lang.org/beta/core/sync/atomic/fn.compiler_fence.html

you gotta call this guy once after each write to the location, and i think possibly after each read, so that standard memory accesses can't be moved to before the volatile memory access.

@Sp00ph
Copy link
Contributor Author

Sp00ph commented Jan 30, 2021

So does that mean that read_volatile and write_volatile aren't enough to guarantee correct codegen? Because in that case, would it even be correct to read from/write to a VolAddress inside the irq handler? Also, what Ordering do I need to give the fence function? I tried to look into how Orderings work but I'm kinda lost on that front

@Lokathor
Copy link
Member

So the thing is that volatile accesses are only ordered relative to other volatile accesses. Standard accesses can be moved to before or after a volatile access if the optimizer thinks it's a good idea.

The compiler fence isn't an actual assembly operation, it's just a sign for rustc/llvm that it can't move any operations across the fence during optimizations. I'm not an expert, but this is what I've been told by others who do know what they're doing. Theoretically a non-volatile op could be moved around a volatile op and then if your stack data and global data were supposed to stay in sync, the interrupt could witness a moment of desync. And this is deeply unlikely and probably not a big deal, but we should still prevent such problems entirely if we can.

As for ordering, SeqCst is kinda the "Default" to pick if you don't want to think about it too hard. It's never wrong, but sometimes it blocks a potential optimization here and there, so it might be slower in some situations.

@Sp00ph
Copy link
Contributor Author

Sp00ph commented Jan 30, 2021

I'm not sure I understand this quite yet. If I remove the as_ptr and the get_mut methods from Static<T> and just put a fence(Ordering::SeqCst) after every read_volatile/write_volatile, will that be enough to guarantee correct behavior?

@Lokathor
Copy link
Member

tentative "yes". I'll try to sit down and have a careful look when I'm not on my phone, but that should basically be all you need to do.

There's also the issue that if the value isn't 1, 2, or 4 bytes big then it can't be written as a single instruction and so you could get an interrupt as the write is happening and see a tear.

What I did to handle that was just make it explicitly always a u32 value, and then the user can cast to/from u32 on their own (eg word_cell.write(my_byte as u32);. I was imagining that you'd only have maybe one or two of these things per program, and so the low ergonomics of manually converting wouldn't be a big problem.

@Sp00ph
Copy link
Contributor Author

Sp00ph commented Jan 31, 2021

Ok so what about the following idea:

  • For all the integer types that fit in a word we make a static version (StaticBool, StaticI8, StaticU8, ..., StaticUsize).
    These can have the same API that Static<T> currently has, except we remove as_ptr and get_mut
  • We make a Static<T> type, which carries a boolean flag to indicate whether it's currently written to. We set that flag at the start of writing to the inner value and reset it after we're done. We provide a try_get method, which before copying the inner data checks that the value isn't being written to. That way, when the irq handler gets called while the value is being written to, you can't access any teared data. We'd also need a try_set method for the same reason.

This idea has the advantage that for the primitives, there should be basically no overhead over using a static mut variable. Also, Static<T> can be used with any T at the cost of a runtime overhead. I also thought of something like a StaticWord<T> which would only allow T that fit in a word, but I don't think it's currently possible to check the size of a type at compile time. We could just assert!(size_of::<T>() <= size_of::<usize>()) in the new function, but that seems like an unelegant solution. What do you think?

@Sp00ph
Copy link
Contributor Author

Sp00ph commented Jan 31, 2021

Also I just noticed that you couldn't even do the assertion in the constructor because you can't assert in a const fn, and the constructor needs to be const.

@Lokathor
Copy link
Member

to assert in const fn you do some dumb nonsense:

// this array can be any type, as long as the length is 1
[()][!(test_here) as usize]

But it does panic on test failure.

This is necessary to ensure memory safety
@Sp00ph
Copy link
Contributor Author

Sp00ph commented Jan 31, 2021

From what I found online, fence(AcqRel) should be enough to ensure that the optimizer doesn't reorder any memory reads or writes across a read/write to one of these static types

@Sp00ph
Copy link
Contributor Author

Sp00ph commented Feb 1, 2021

Ok so I just modified the small example program from above to check whether these things actually work, and as it turns out, you get a linker error:

= note: C:\devkitPro\devkitARM\bin\arm-none-eabi-ld.exe: warning: cannot find entry symbol __start; defaulting to 0000000008000000
          C:\devkitPro\devkitARM\bin\arm-none-eabi-ld.exe: C:\VSCODE\dev\rust\gba\gba-test\target\thumbv4-none-agb\release\deps\gba_test-efc740bf2731ac5a.gba_test.d53p249f-cgu.3.rcgu.o: in function `gba_test::irq_handler':
          gba_test.d53p249f-cgu.3:(.text._ZN8gba_test11irq_handler17h06ede0151e42d430E+0x1e): undefined reference to `__sync_synchronize'
          C:\devkitPro\devkitARM\bin\arm-none-eabi-ld.exe: gba_test.d53p249f-cgu.3:(.text._ZN8gba_test11irq_handler17h06ede0151e42d430E+0x2c): undefined reference to `__sync_synchronize'
          C:\devkitPro\devkitARM\bin\arm-none-eabi-ld.exe: gba_test.d53p249f-cgu.3:(.text._ZN8gba_test11irq_handler17h06ede0151e42d430E+0x6c): undefined reference to `__sync_synchronize'
          C:\devkitPro\devkitARM\bin\arm-none-eabi-ld.exe: gba_test.d53p249f-cgu.3:(.text._ZN8gba_test11irq_handler17h06ede0151e42d430E+0x74): undefined reference to `__sync_synchronize'
          C:\devkitPro\devkitARM\bin\arm-none-eabi-ld.exe: C:\VSCODE\dev\rust\gba\gba-test\target\thumbv4-none-agb\release\deps\gba_test-efc740bf2731ac5a.gba_test.d53p249f-cgu.3.rcgu.o: in function `gba_test::main':
          gba_test.d53p249f-cgu.3:(.text._ZN8gba_test4main17h4dbc65bb62b27685E+0x32): undefined reference to `__sync_synchronize'
          C:\devkitPro\devkitARM\bin\arm-none-eabi-ld.exe: C:\VSCODE\dev\rust\gba\gba-test\target\thumbv4-none-agb\release\deps\gba_test-efc740bf2731ac5a.gba_test.d53p249f-cgu.3.rcgu.o:gba_test.d53p249f-cgu.3:(.text._ZN8gba_test4main17h4dbc65bb62b27685E+0x3e): more undefined references to `__sync_synchronize' follow

I would guess that this is because of the fence(AcqRel) calls, as I don't know where else the undefined reference to __sync_synchronize error would come from. I guess I could try to change the fence to a compiler_fence and see if that helps

@Sp00ph
Copy link
Contributor Author

Sp00ph commented Feb 1, 2021

Ok so as it turns out the errors are because of the fence calls, but changing them to compiler_fence calls doesn't change anything and the error still happens. If I just remove the fence calls completely it compiles and works in this case but it might not always work exactly the way one expects

@Lokathor
Copy link
Member

Lokathor commented Feb 1, 2021

Very odd indeed that compiler_fence causes problems. I was told that it's an intrinsics that emits no code.

@Sp00ph
Copy link
Contributor Author

Sp00ph commented Feb 1, 2021

I also find that very weird considering that even in the compiler_fence docs is specifically says that no machine code will be emitted from a call to compiler_fence. So I don't understand why there would be a linker error

@Lokathor
Copy link
Member

@Sp00ph since we merged #107, does that do enough for global mutable state for your concerns?

Base automatically changed from master to main February 23, 2021 06:50
@Sp00ph
Copy link
Contributor Author

Sp00ph commented Feb 23, 2021

Yea I think the way it is implemented in #107 should be enough for most applications. Closing this now

@Sp00ph Sp00ph closed this Feb 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants