Rust generates code appearing to reference nonexistent const data


#1

I’ve run into an issue that I don’t quite understand and I’m curious if it is expected behavior or if I should be looking at rustc. My setup is that I’m using nightly and compiling through xargo to get no_std for a microcontroller. I am working with some raw const data that represents bitmaps:

/// IBM 8x8 font, public domain
const FONT_8X8_DATA: [[u8; 8]; 37] = [
    [ 0x3E, 0x63, 0x73, 0x7B, 0x6F, 0x67, 0x3E, 0x00 ],   // U+0030 (0)
//...snip...
    [ 0x1E, 0x33, 0x33, 0x3E, 0x30, 0x18, 0x0E, 0x00 ],   // U+0039 (9)
    [ 0x0C, 0x1E, 0x33, 0x33, 0x3F, 0x33, 0x33, 0x00 ],   // U+0041 (A)
//...snip
    [ 0x7F, 0x63, 0x31, 0x18, 0x4C, 0x66, 0x7F, 0x00 ],   // U+005A (Z)
    [ 0x1C, 0x36, 0x36, 0x1C, 0x00, 0x00, 0x00, 0x00 ],   // U+00B0 (degree)
];

This data is exactly 296 bytes long, as you would expect given that it is a [[u8; 8]; 37]. Originally, I was accessing it through a method that looks like this:
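The 296-byte figure can be pinned down at compile time. This is a minimal sketch, assuming a toolchain where `assert!` is allowed in const context; `FontTable` and `FONT_TABLE_BYTES` are names made up for illustration and do not appear in the original code.

```rust
use core::mem::size_of;

/// Same shape as FONT_8X8_DATA in the post: 37 glyphs, 8 bytes each.
/// (`FontTable` is a hypothetical alias, not part of the original code.)
pub type FontTable = [[u8; 8]; 37];

/// Table size in bytes, evaluated at compile time.
pub const FONT_TABLE_BYTES: usize = size_of::<FontTable>();

// Compile-time check: the build fails if the table is not 37 * 8 bytes.
const _: () = assert!(FONT_TABLE_BYTES == 37 * 8);
```

Because the element type is u8, there is no padding, so the size is simply the element count.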

pub fn get_8x8_character(character: char) -> Option<gfx::Bitmap<'static>> {
    if character.len_utf8() > 1 {
        return None
    }
    let literal: u8 = character as u8;
    match literal {
        b'0'..=b'9' => Some(gfx::Bitmap::new(&FONT_8X8_DATA[(literal - b'0') as usize], 8, 8)),
        b'A'..=b'Z' => Some(gfx::Bitmap::new(&FONT_8X8_DATA[(10 + literal - b'A') as usize], 8, 8)),
        0xB0 => Some(gfx::Bitmap::new(&FONT_8X8_DATA[36], 8, 8)),
        _ => None,
    }
}

This worked great and the font characters were displayed properly on the screen attached to my microcontroller. While messing with this and learning about &str and char (I have no String since I’m using no_std), I found that I could match directly on char ranges. So I changed my method to read as follows:

pub fn get_8x8_character(character: char) -> Option<gfx::Bitmap<'static>> {
    match character {
        '0'..='9' => Some(gfx::Bitmap::new(&FONT_8X8_DATA[character as usize - b'0' as usize][..], 8, 8)),
        'A'..='Z' => Some(gfx::Bitmap::new(&FONT_8X8_DATA[10 + character as usize - b'A' as usize][..], 8, 8)),
        '\u{00B0}' => Some(gfx::Bitmap::new(&FONT_8X8_DATA[36][..], 8, 8)),
        _ => None,
    }
}
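The index arithmetic in the char-matching version can be checked on its own, separately from gfx::Bitmap and the font data. The sketch below is only an illustration: `glyph_index` is a hypothetical helper that is not in the original code, and it uses `..=`, the current spelling of inclusive range patterns.

```rust
/// Hypothetical helper: maps a character to its row in a 37-entry table
/// laid out as '0'..='9', then 'A'..='Z', then U+00B0 (degree).
pub fn glyph_index(character: char) -> Option<usize> {
    match character {
        // Digits occupy rows 0 through 9.
        '0'..='9' => Some(character as usize - '0' as usize),
        // Upper-case letters occupy rows 10 through 35.
        'A'..='Z' => Some(10 + character as usize - 'A' as usize),
        // The degree sign is the last row.
        '\u{00B0}' => Some(36),
        _ => None,
    }
}
```

Every index this returns is below 37, so the lookups themselves are in bounds; whatever produces the garbage on screen, it is not an out-of-range index.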

I thought this looked much more concise and was great, until I loaded it on my microcontroller. I was greeted with garbage on the screen where the bitmaps were supposed to appear. I started investigating and looked side by side at the output from arm-none-eabi-size with my working code vs the garbage code:

Good code:

section                size        addr
.vector_table           188   134217728
.text                  3996   134217916
.init                    12   134221912
.fini                    12   134221924
.rodata                 872   134221936
.stack                 4096   536870912
.bss                    556   536870912
.data                     0   536871468
.debug_gdb_scripts       34   134217916
.ARM.attributes          49           0
.debug_str            51168           0
.debug_loc            10477           0
.debug_abbrev          2124           0
.debug_info           72206           0
.debug_ranges          5008           0
.debug_macinfo            2           0
.debug_pubnames       17609           0
.debug_pubtypes       22788           0
.debug_frame            920           0
.debug_line           16907           0
Total                209024

Garbage code:

section                size        addr
.vector_table           188   134217728
.text                  3846   134217916
.init                    12   134221764
.fini                    12   134221776
.rodata                 576   134221792
.stack                 4096   536870912
.bss                    556   536870912
.data                     0   536871468
.debug_gdb_scripts       34   134217916
.ARM.attributes          49           0
.debug_str            51160           0
.debug_loc            10499           0
.debug_abbrev          2124           0
.debug_info           72247           0
.debug_ranges          4664           0
.debug_macinfo            2           0
.debug_pubnames       17609           0
.debug_pubtypes       22788           0
.debug_frame            920           0
.debug_line           16896           0
Total                208278

Here’s where it gets weird: Looking at the .rodata section sizes I see that they are different by exactly 296 bytes, with the garbage code being smaller. I don’t see the 296 bytes being made up elsewhere either (in fact, the overall .text size is smaller). I looked at the disassembly of the binary and found that for some reason, the entire FONT_8X8_DATA array was missing in the garbage code variant.
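The section arithmetic can be written out explicitly. This sketch only restates numbers from the two arm-none-eabi-size listings above; the constant and function names are invented for illustration.

```rust
/// Section sizes in bytes, copied from the two listings in the post.
pub const GOOD_RODATA: usize = 872;
pub const GARBAGE_RODATA: usize = 576;
pub const GOOD_TEXT: usize = 3996;
pub const GARBAGE_TEXT: usize = 3846;

/// Bytes by which a section shrank between the working and failing builds.
pub fn shrinkage(good: usize, garbage: usize) -> usize {
    good - garbage
}
```

With these values, .rodata shrinks by 296 bytes (exactly 37 * 8, the size of the font table) and .text shrinks by 150 bytes, so the table has not simply moved into .text.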

My question: Why is that data being stripped from my binary? The Some(Bitmap) referencing FONT_8X8_DATA does not panic; instead, it displays garbage data from who knows where. This is certainly not safe, and it is something I would expect to happen when doing this in C, not Rust. Is there something I am doing wrong here to trigger this behavior? Or am I misunderstanding the situation?

Edit: I was mistaken about the non-match test (just returning a Some(Bitmap) instead of matching) also having this problem. I’ve removed references to that.


#2

Sounds like a code generation failure.

That said, you probably don’t want to use const. A const has no fixed location of its own in the final binary; its value is effectively copy+pasted everywhere it’s used. I believe that if you use const BLAH: &[Foo] = &[..];, the array data ends up being statically allocated once, but it’s probably safer to just use a static so you know for certain that it’s being stored in the binary.
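A minimal sketch of the static approach, assuming a cut-down two-glyph table in place of the real 37-entry font. `GLYPHS` and `glyph` are hypothetical names, and the function returns a plain byte slice rather than a gfx::Bitmap, since that type's definition is not shown in the thread.

```rust
/// Hypothetical two-glyph table standing in for the full font.
/// A `static` is a single object with one fixed address in the binary.
pub static GLYPHS: [[u8; 8]; 2] = [
    [0x3E, 0x63, 0x73, 0x7B, 0x6F, 0x67, 0x3E, 0x00], // U+0030 (0)
    [0x1C, 0x36, 0x36, 0x1C, 0x00, 0x00, 0x00, 0x00], // U+00B0 (degree)
];

/// Borrows one glyph row; the reference is valid for the whole program.
/// Returns None when the index is out of range.
pub fn glyph(index: usize) -> Option<&'static [u8]> {
    GLYPHS.get(index).map(|row| &row[..])
}
```

Every borrow of a static points into the same storage, so a 'static reference to its rows is always backed by data that is actually in the binary.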


#3

Huh. Changing it to static fixes the issue.

I guess there must be something confusing the compiler when I use const for that array and match on a char.


#4

Blind guess: it sees the const data, thinks to itself “I need to remember to allocate this somewhere”, and the backend just forgets to do so.