This repository was archived by the owner on Jan 24, 2022. It is now read-only.

(Zero cost) stack overflow protection #34

Closed
japaric opened this issue Sep 27, 2017 · 7 comments · Fixed by #43

Comments

@japaric
Member

japaric commented Sep 27, 2017

Today, the layout of RAM looks like this (assuming no heap and a single RAM region):

[Figure: RAM layout today]

With this layout, a stack overflow silently overwrites (corrupts) static variables.

This scenario can be avoided by simply changing the memory layout to look like this:

[Figure: ideal RAM layout]

In this new scenario a stack overflow will hit the lower RAM boundary. Trying to write beyond the
RAM boundaries raises a HardFault exception. Thus, in theory, the HardFault exception handler
could be used as a stack overflow handler.
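To make the difference between the two layouts concrete, here is a small host-side model (plain Rust, not embedded code) that asks where the deepest stack write lands for a given stack depth. It assumes a single RAM region, no heap, and, as the text above does, that any access outside RAM faults. The `Layout` and `Region` types are hypothetical illustrations, not part of cortex-m-rt.

```rust
use std::ops::Range;

/// Where a byte address falls in a simplified single-region RAM model.
#[derive(Debug, PartialEq)]
enum Region {
    Statics,    // .bss + .data
    Stack,      // free RAM available to the call stack
    OutsideRam, // assumed to raise a HardFault on real hardware
}

/// Simplified RAM layout: one RAM region, no heap.
struct Layout {
    ram: Range<u32>,
    statics: Range<u32>,
    stack_start: u32, // initial stack pointer; the stack grows downwards from here
}

impl Layout {
    /// "today": statics at the bottom of RAM, stack starts at the top of RAM.
    fn today(ram_start: u32, ram_len: u32, statics_len: u32) -> Layout {
        let ram_end = ram_start + ram_len;
        Layout {
            ram: ram_start..ram_end,
            statics: ram_start..ram_start + statics_len,
            stack_start: ram_end,
        }
    }

    /// "ideal": statics at the top of RAM, stack starts right below them.
    fn ideal(ram_start: u32, ram_len: u32, statics_len: u32) -> Layout {
        let ram_end = ram_start + ram_len;
        Layout {
            ram: ram_start..ram_end,
            statics: ram_end - statics_len..ram_end,
            stack_start: ram_end - statics_len,
        }
    }

    fn classify(&self, addr: u32) -> Region {
        if !self.ram.contains(&addr) {
            Region::OutsideRam
        } else if self.statics.contains(&addr) {
            Region::Statics
        } else {
            Region::Stack
        }
    }

    /// Region hit by the lowest byte written once the stack holds `depth` bytes.
    fn deepest_write(&self, depth: u32) -> Region {
        match self.stack_start.checked_sub(depth) {
            Some(addr) => self.classify(addr),
            None => Region::OutsideRam, // would wrap below address 0
        }
    }
}
```

For example, with 0x5000 bytes of RAM at 0x2000_0000 and 8 bytes of statics, a stack 0x4ffc bytes deep lands in the statics with today's layout but below RAM with the ideal one.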

In systems that use a heap (which by default starts where the static region ends and grows
upwards), a similar reordering of the regions can be applied.

Implementation

To my knowledge, this can't be implemented using only linker scripts. With a linker script you can
instruct the linker where a memory region starts, but you can't specify the address at which the
region ends.

The related bits of the linker script we use are shown below:

```
PROVIDE(_stack_start = ORIGIN(RAM) + LENGTH(RAM));

SECTIONS
{
  /* .. */
  .bss : ALIGN(4)
  {
    _sbss = .;
    *(.bss .bss.*);
    . = ALIGN(4);
    _ebss = .;
  } > RAM

  .data : ALIGN(4)
  {
    _sidata = LOADADDR(.data);
    _sdata = .;
    *(.data .data.*);
    . = ALIGN(4);
    _edata = .;
  } > RAM AT > FLASH

  /* The heap starts right after the .bss + .data section ends */
  _sheap = _edata;

  /* .. */
}
```

A C implementation of this region reordering uses a two-step linking process. See
this StackOverflow answer for details.
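The second step of such a process is plain address arithmetic: once a first link has measured the size of `.bss` and `.data`, the final link pushes them against the end of RAM and gives the stack everything below. A minimal sketch of that arithmetic, assuming 4-byte section alignment as in the linker script above; `StaticSizes`, `FinalLayout` and `second_pass` are hypothetical names, and the first link itself is not modeled.

```rust
/// Sizes measured by a first, throw-away link (hypothetical inputs).
struct StaticSizes {
    bss: u32,
    data: u32,
}

/// Addresses chosen for the second, final link.
#[derive(Debug, PartialEq)]
struct FinalLayout {
    stack_bottom: u32, // lowest stack address (= ORIGIN(RAM))
    stack_size: u32,
    sbss: u32,        // start of .bss
    sdata: u32,       // start of .data
    stack_start: u32, // initial stack pointer
}

/// Round up to a multiple of 4 (the ALIGN(4) used for each section).
const fn align4(x: u32) -> u32 {
    (x + 3) & !3
}

/// Pass 2: place .bss + .data at the end of RAM; the stack gets everything below.
/// Returns `None` if the statics don't fit in RAM.
fn second_pass(ram_origin: u32, ram_len: u32, sizes: &StaticSizes) -> Option<FinalLayout> {
    let statics = align4(sizes.bss) + align4(sizes.data);
    let ram_end = ram_origin.checked_add(ram_len)?;
    let sbss = ram_end.checked_sub(statics)?;
    if sbss < ram_origin {
        return None;
    }
    Some(FinalLayout {
        stack_bottom: ram_origin,
        stack_size: sbss - ram_origin,
        sbss,
        sdata: sbss + align4(sizes.bss),
        stack_start: sbss, // the stack grows down from the start of .bss
    })
}
```

With 0x5000 bytes of RAM at 0x2000_0000 and 4-byte `.bss` and `.data`, this gives a 0x4ff8-byte stack, `.bss` at 0x20004ff8 and `.data` at 0x20004ffc, the same numbers reported in the `swap-ld` commit message further down.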

Other options for stack overflow protection

Assuming that we can't change the memory layout, we could:

  • Use the MPU (Memory Protection Unit) to mark a small region at the upper boundary of the static
    region as read-only. In this scenario, when a stack overflow occurs a MemManage exception is
    raised. This has an initialization cost but no runtime cost. The downside is that not all
    microcontrollers have an MPU.

  • Implement stack probes for the Cortex-M targets. I believe enabling stack probes carries a runtime
    cost (per function call?) but I'm not sure.
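For the MPU option, most of the setup work is picking a region that satisfies the MPU's placement rules. The sketch below assumes the ARMv7-M MPU rules (region size a power of two of at least 32 bytes, base address aligned to that size, `RASR.SIZE` encoding the size as 2^(SIZE+1)) and only computes where a guard region directly above the static region would go. Programming the actual MPU registers and access permissions is not shown, and `guard_above_statics` is a hypothetical helper.

```rust
/// A candidate MPU guard region (ARMv7-M placement rules assumed).
#[derive(Debug, PartialEq)]
struct GuardRegion {
    base: u32,       // aligned to `size`
    size: u32,       // power of two, >= 32 bytes
    size_field: u32, // RASR.SIZE encoding: size == 2^(size_field + 1)
}

/// Place a guard region at or just above the end of the static region.
/// Returns `None` if `size` is not a valid MPU region size.
fn guard_above_statics(statics_end: u32, size: u32) -> Option<GuardRegion> {
    if size < 32 || !size.is_power_of_two() {
        return None;
    }
    // Round the base up to the next multiple of `size` (alignment requirement).
    let base = statics_end.checked_add(size - 1)? & !(size - 1);
    Some(GuardRegion {
        base,
        size,
        size_field: size.trailing_zeros() - 1,
    })
}
```

Because the base is rounded up, a few bytes between the end of the statics and the guard may go unused, but a stack growing down from the top of RAM still reaches the guard before any static variable.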

@pftbest
Contributor

pftbest commented Sep 27, 2017

You can achieve a similar result if you set a fixed size for a stack. You just create a section with that size at the beginning of RAM, and set the stack pointer to the end of a section.

The downside is that you waste some memory at the end.

@japaric
Member Author

japaric commented Oct 2, 2017

@pftbest Yeah, I thought about that option but decided not to include it here because it seems to be far from ideal. Reasons: (a) it requires user input (there's no sensible default, I think), (b) it's error-prone (e.g. if the selected stack space is too large you'll get an error about there not being enough space to fit the static variables) and (c) it's not efficient (if you pick a stack size that's too small then you leave space unused, which is what you mentioned).
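A sketch of the fixed-size-stack layout suggested above makes tradeoffs (b) and (c) visible: a stack that is too large leaves no room for the statics, and one that is too small leaves RAM unused at the end. The function and type names are hypothetical, and section alignment is ignored for brevity.

```rust
/// Result of placing a fixed-size stack section at the start of RAM.
#[derive(Debug, PartialEq)]
struct FixedStackLayout {
    stack_start: u32,   // initial stack pointer = end of the stack section
    statics_start: u32, // .bss/.data follow the stack section
    unused: u32,        // RAM left over after the statics
}

#[derive(Debug, PartialEq)]
enum FixedStackError {
    StaticsDontFit, // stack_size + statics exceed the RAM size
}

fn fixed_stack(
    ram_origin: u32,
    ram_len: u32,
    stack_size: u32,
    statics_len: u32,
) -> Result<FixedStackLayout, FixedStackError> {
    // Sum in u64 so the check itself cannot overflow.
    let needed = u64::from(stack_size) + u64::from(statics_len);
    if needed > u64::from(ram_len) {
        return Err(FixedStackError::StaticsDontFit);
    }
    Ok(FixedStackLayout {
        stack_start: ram_origin + stack_size,
        statics_start: ram_origin + stack_size,
        unused: ram_len - stack_size - statics_len,
    })
}
```

With 0x5000 bytes of RAM and 8 bytes of statics, a 0x1000-byte stack leaves 0x3ff8 bytes unused, while a 0x5000-byte stack fails outright.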

@whitequark
Contributor

> Implement stack probes for the Cortex-M targets. I believe enabling stack probes carries a runtime
> cost (per function call?) but I'm not sure.

Stack probes only (currently) exist in Rust to avoid "jumping over" the guard page of the thread stack and into the heap. The old mechanism, which relied on segmented stacks and TLS slots, is dead and it would be really annoying to resurrect for this issue.

@whitequark
Contributor

@japaric Anyway, I know how to do this. Place the static sections twice. The first time you place them, they go into /dev/null (via /DISCARD/ or something) but you can get the size with the SIZEOF operator. The second time, you use that result. So it's two-pass linking in one ld invocation.

japaric added a commit that referenced this issue Oct 20, 2017
so that the stack can never collide into them

closes #34
japaric added a commit that referenced this issue Nov 9, 2017
This is one possible solution to the stack overflow problem described in #34. This approach uses a
linker wrapper, called [swap-ld], to generate the desired memory layout. See #34 for a description
of the desired memory layout and #41 for a description of how `swap-ld` works.

The observable effects of this change in cortex-m programs are:

- the `_sbss` symbol is now override-able.
- there is now a `.stack` linker section that denotes the span of the call stack. `.stack` won't be
  loaded into the program; it just exists for informative purposes (`swap-ld` uses this
  information).

Given the following program:

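``` rust
use core::ptr;

fn main() {
    // Zero-initialized, so it is placed in .bss
    static mut X: u32 = 0;
    // Non-zero initial value, so it is placed in .data
    static mut Y: u32 = 1;

    loop {
        unsafe {
            // Volatile writes keep the compiler from optimizing the statics away
            ptr::write_volatile(ptr::addr_of_mut!(X), X.wrapping_add(1));
            ptr::write_volatile(ptr::addr_of_mut!(Y), Y.wrapping_add(1));
        }
    }
}
```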

If you link this program using the `arm-none-eabi-ld` linker, which is the cortex-m-quickstart
default, you'll get the following memory layout:

``` console
$ arm-none-eabi-size -Ax app
section                                                                   size         addr
.vector_table                                                            0x130    0x8000000
.text                                                                     0x94    0x8000130
.rodata                                                                    0x0    0x80001c4
.stack                                                                  0x5000   0x20000000
.bss                                                                       0x4   0x20000000
.data                                                                      0x4   0x20000004
```

Note how the space reserved for the stack (depicted by the `.stack` linker section) overlaps with
the space where .bss and .data reside.
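The overlap can be checked mechanically from the `size`/`addr` columns by treating each section as a half-open address range. A small sketch; `Section` and `overlaps` are hypothetical helpers, not part of `arm-none-eabi-size` or `swap-ld`:

```rust
/// One row of the size/addr output: a section's start address and size in bytes.
#[derive(Clone, Copy)]
struct Section {
    addr: u32,
    size: u32,
}

/// True if the half-open ranges [addr, addr + size) of two sections intersect.
fn overlaps(a: Section, b: Section) -> bool {
    if a.size == 0 || b.size == 0 {
        return false; // an empty section occupies no memory
    }
    // Widen to u64 so `addr + size` cannot overflow.
    let a_end = u64::from(a.addr) + u64::from(a.size);
    let b_end = u64::from(b.addr) + u64::from(b.size);
    u64::from(a.addr) < b_end && u64::from(b.addr) < a_end
}
```

Fed the rows above, `.stack` (0x5000 bytes at 0x20000000) overlaps `.bss`; fed the `swap-ld` rows below, no pair of RAM sections overlaps.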

If you, instead, link this program using `swap-ld` you'll get the following memory layout:

``` console
$ arm-none-eabi-size -Ax app
section                                                                   size         addr
.vector_table                                                            0x130    0x8000000
.text                                                                     0x94    0x8000130
.rodata                                                                    0x0    0x80001c4
.stack                                                                  0x4ff8   0x20000000
.bss                                                                       0x4   0x20004ff8
.data                                                                      0x4   0x20004ffc
```

Note that no overlap exists in this case and that the call stack size has been reduced to
accommodate the .bss and .data sections.

Unlike in #41, the addresses of the static variables are now correct:

``` console
$ arm-none-eabi-objdump -CD app
Disassembly of section .vector_table:

08000000 <_svector_table>:
 8000000:       20004ff8        strdcs  r4, [r0], -r8 ; initial Stack Pointer

08000004 <cortex_m_rt::RESET_VECTOR>:
 8000004:       08000131        stmdaeq r0, {r0, r4, r5, r8}

08000008 <EXCEPTIONS>:
 8000008:       080001bd        stmdaeq r0, {r0, r2, r3, r4, r5, r7, r8}
 (..)

Disassembly of section .stack:

20000000 <.stack>:
        ...

Disassembly of section .bss:

20004ff8 <cortex_m_quickstart::main::X>:
20004ff8:       00000000        andeq   r0, r0, r0

Disassembly of section .data:

20004ffc <_sdata>:
20004ffc:       00000001        andeq   r0, r0, r1
```

closes #34

[swap-ld]: https://github.com/japaric/swap-ld
japaric added a commit that referenced this issue Nov 25, 2017
japaric added a commit that referenced this issue Feb 17, 2018
@ia0

ia0 commented Jan 1, 2022

Hi,

As far as I understand, the fix in #43 was reverted as a side effect of #64. It looks like no support from cortex-m-rt is actually needed to get stack overflow protection as well as maximum heap usage (related to #5), by using a memory.x like the following:

```
__stack_size = 0x10000;

MEMORY
{
  FLASH : ORIGIN = 0x00000000, LENGTH = 0x00100000
  RAM   : ORIGIN = 0x20000000 + __stack_size, LENGTH = 0x00040000 - __stack_size
}

_stack_start = ORIGIN(RAM);
__eheap = ORIGIN(RAM) + LENGTH(RAM);
```

Then initializing the heap with:

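``` rust
extern "C" {
    // Heap bounds: `__sheap` is defined by cortex-m-rt's linker script,
    // `__eheap` by the memory.x above
    static mut __sheap: u32;
    static mut __eheap: u32;
}
let sheap = unsafe { &mut __sheap } as *mut u32 as usize;
let eheap = unsafe { &mut __eheap } as *mut u32 as usize;
assert!(sheap < eheap);
// `ALLOCATOR` is the program's `#[global_allocator]` (not shown).
// Unsafe: Called only once before any allocation.
unsafe { ALLOCATOR.init(sheap, eheap - sheap) }
```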

However, I wonder if cortex-m-rt should help the user to achieve this. I'm not sure how, so I'm asking in this issue. I can open a new issue if needed.

Thanks!
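The address arithmetic in the memory.x above can be spelled out as follows: the stack gets the bottom `__stack_size` bytes of physical RAM and grows down towards the physical start of RAM, while the heap runs from `__sheap` to the end of RAM. This is a host-side sketch with hypothetical names; it assumes the 256 KiB RAM at 0x2000_0000 and the 64 KiB stack from the example, and treats `__sheap` as an input because its value depends on where the linker places the statics.

```rust
/// Addresses implied by a memory.x that reserves `stack_size` bytes at the
/// bottom of physical RAM for the stack.
#[derive(Debug, PartialEq)]
struct MemoryXLayout {
    ram_origin: u32,  // ORIGIN(RAM) as the linker sees it
    ram_length: u32,  // LENGTH(RAM)
    stack_start: u32, // _stack_start = ORIGIN(RAM)
    eheap: u32,       // __eheap = ORIGIN(RAM) + LENGTH(RAM)
    stack_floor: u32, // lowest stack address before leaving physical RAM
}

fn memory_x_layout(phys_ram_origin: u32, phys_ram_len: u32, stack_size: u32) -> MemoryXLayout {
    let ram_origin = phys_ram_origin + stack_size;
    let ram_length = phys_ram_len - stack_size;
    MemoryXLayout {
        ram_origin,
        ram_length,
        stack_start: ram_origin,
        eheap: ram_origin + ram_length,
        stack_floor: phys_ram_origin,
    }
}

/// Heap size handed to the allocator; `None` mirrors the `assert!(sheap < eheap)` check.
fn heap_size(sheap: u32, eheap: u32) -> Option<u32> {
    eheap.checked_sub(sheap).filter(|&n| n > 0)
}
```

With these values the stack occupies [0x2000_0000, 0x2001_0000), so overflowing it walks off the bottom of RAM instead of into the statics, and the heap extends to 0x2004_0000, the end of RAM.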

@adamgreig
Member

The concept from #43 ended up in https://github.com/knurling-rs/flip-link which is probably the better way to do it (for now at least!). The problem with doing it in the linker script is you end up having to know in advance a specific size for the stack, so either you overestimate and things don't fit, or you end up with lots of unused RAM that could have been stack. With a tool like flip-link, the stack is automatically made as big as it can be. Ideally, the linker itself could figure this out for us, but I don't believe that's currently possible.

@ia0

ia0 commented Jan 2, 2022

Hi @adamgreig ,

I'm aware of flip-link, but I've hit knurling-rs/flip-link#43 for one of the boards I want to support (nRF52840-dongle using DFU, where I need to reserve space at the beginning of the flash for the bootloader), so I stopped using it.

> The problem with doing it in the linker script is you end up having to know in advance a specific size for the stack

This is not a problem if you use a heap, because both the stack and the heap are variable-sized. As long as you use only one (you always use a stack, so as long as you don't use a heap), you don't need a linker script. But as soon as you use a heap, you need to decide how to assign the non-data RAM between the stack and the heap, essentially choosing a size for both.

So the main issue I see with doing it in the linker script is that it's not convenient. The best scenario would be something like:

  • Specify the heap with some magic, e.g. `#[cortex_m_rt::heap] static mut HEAP: [u8; HEAP_SIZE] = [0; HEAP_SIZE];` (the macro would just use the size of this symbol and omit it from the binary). If the magic is missing, no heap is used.
  • Use flip-link (once the memory.x override bug is fixed) to get the stack-overflow protection.

5 participants