Variable annotation

I want to annotate some strings (mark them with a specific name) in Rust code and compile the code to WebAssembly using the wasm32-wasi target. I need to know the offsets at which they are stored in the data section.

Is there a similar solution that would let me extract these offsets using annotations?

Thanks.

The best way for an outside program to access certain data from a WebAssembly binary is using a custom section.

For example, given this code:

#[link_section = "my-data"]
pub static FIRST: [u8; 13] = *b"Hello, World!";

#[link_section = "some-more-data"]
pub static SECOND: [u8; 6] = [1, 2, 3, 4, 5, 6];

// you can add more data to the same section
#[link_section = "my-data"]
pub static THIRD: [u8; 4] = u32::to_be_bytes(0xdead_beef);

I can compile it to a *.wasm file and use wasm-objdump from WABT to see where the different sections are.

$ wasm-objdump --headers custom_sections.wasm

custom_sections.wasm:	file format wasm 0x1

Sections:

    Table start=0x0000000a end=0x0000000f (size=0x00000005) count: 1
   Memory start=0x00000011 end=0x00000014 (size=0x00000003) count: 1
   Global start=0x00000016 end=0x0000002f (size=0x00000019) count: 3
   Export start=0x00000031 end=0x00000056 (size=0x00000025) count: 3
   Custom start=0x0000005a end=0x00063bf2 (size=0x00063b98) ".debug_info"
   Custom start=0x00063bf5 end=0x00063d49 (size=0x00000154) ".debug_pubtypes"
   Custom start=0x00063d4d end=0x0008f1eb (size=0x0002b49e) ".debug_ranges"
   Custom start=0x0008f1ee end=0x00090247 (size=0x00001059) ".debug_abbrev"
   Custom start=0x00090249 end=0x0009025e (size=0x00000015) "some-more-data"
   Custom start=0x00090262 end=0x000d1ea5 (size=0x00041c43) ".debug_line"
   Custom start=0x000d1ea7 end=0x000d1ec0 (size=0x00000019) "my-data"
   Custom start=0x000d1ec4 end=0x0017980c (size=0x000a7948) ".debug_str"
   Custom start=0x00179810 end=0x001b7df2 (size=0x0003e5e2) ".debug_pubnames"
   Custom start=0x001b7df4 end=0x001b7e0d (size=0x00000019) "name"
   Custom start=0x001b7e0f end=0x001b7e64 (size=0x00000055) "producers"

I can also inspect the contents of the my-data section.

$ wasm-objdump --section=my-data --full-contents custom_sections.wasm

custom_sections.wasm:	file format wasm 0x1

Contents of section Custom:
00d1ea7: 076d 792d 6461 7461 4865 6c6c 6f2c 2057  .my-dataHello, W
00d1eb7: 6f72 6c64 21de adbe ef                   orld!....

You'll notice the bytes for Hello, World! and 0xdead_beef have been concatenated. Depending on your data, you may need to set things up so there's a way to tell where one variable ends and the next begins (e.g. by prefixing each variable's bytes with its length).
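Here is a minimal sketch of the length-prefix idea. The layout (one length byte followed by that many data bytes) is a convention I picked for illustration, not something rustc or the wasm format imposes, and `split_records`/`PAYLOAD` are hypothetical names. Note that code using such a static directly would have to skip the prefix byte (e.g. `&FIRST[1..]`).

```rust
/// Splits a custom-section payload into length-prefixed records.
///
/// Assumed (hypothetical) layout: every record is one length byte `n`
/// followed by exactly `n` data bytes.
/// Returns `None` if the payload ends in the middle of a record.
fn split_records(mut payload: &[u8]) -> Option<Vec<&[u8]>> {
    let mut records = Vec::new();
    while let Some((&len, rest)) = payload.split_first() {
        let len = usize::from(len);
        if rest.len() < len {
            return None; // truncated record
        }
        let (record, tail) = rest.split_at(len);
        records.push(record);
        payload = tail;
    }
    Some(records)
}

// On the producing side, each static would carry its own length byte:
//
//     #[link_section = "my-data"]
//     pub static FIRST: [u8; 14] = *b"\x0dHello, World!"; // 0x0d = 13
//
//     #[link_section = "my-data"]
//     pub static THIRD: [u8; 5] = [4, 0xde, 0xad, 0xbe, 0xef];
//
// If the linker emits them in declaration order (as it did in the dump
// above), the section payload looks like PAYLOAD below.
const PAYLOAD: &[u8] = b"\x0dHello, World!\x04\xde\xad\xbe\xef";

fn main() {
    let records = split_records(PAYLOAD).expect("well-formed payload");
    for record in &records {
        println!("{} bytes: {:02x?}", record.len(), record);
    }
}
```

I wouldn't rely on the linker preserving declaration order; if the order matters, it may be safer to put the variable's name in each record too.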

Most WebAssembly libraries have a way for reading custom sections. For example, the wasmparser crate has a Payload::CustomSection(...) variant, and in the browser you can use WebAssembly.Module.customSections().
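To show roughly what those readers do under the hood, here is a minimal std-only sketch that walks the sections of a module and collects the custom ones. It is a hand-rolled stand-in, not wasmparser's API; `custom_sections` and `CustomSection` are hypothetical names. It reports the file offset of each section's data just past the name, so for the `my-data` section in the dump above it should report 0x000d1eaf (0x000d1ea7 plus the 1-byte name length and the 7-byte name).

```rust
/// Reads an unsigned LEB128 u32 at `*pos`, advancing `pos` past it.
fn read_uleb(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let mut result = 0u32;
    let mut shift = 0;
    loop {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        result |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
        shift += 7;
        if shift > 28 {
            return None; // longer than 5 bytes: malformed u32
        }
    }
}

/// One custom section found in a module.
struct CustomSection<'a> {
    name: &'a str,
    /// File offset of the first data byte, i.e. just past the section name.
    data_offset: usize,
    data: &'a [u8],
}

/// Lists every custom section (section id 0) in a `.wasm` file.
fn custom_sections(wasm: &[u8]) -> Option<Vec<CustomSection<'_>>> {
    // "\0asm" magic followed by version 1 as a little-endian u32.
    if !wasm.starts_with(b"\0asm\x01\0\0\0") {
        return None;
    }
    let mut pos = 8;
    let mut found = Vec::new();
    while pos < wasm.len() {
        let id = wasm[pos];
        pos += 1;
        let size = read_uleb(wasm, &mut pos)? as usize;
        let end = pos.checked_add(size)?;
        let body = wasm.get(pos..end)?;
        if id == 0 {
            // A custom section starts with its name as a length-prefixed UTF-8 string.
            let mut p = 0;
            let name_len = read_uleb(body, &mut p)? as usize;
            let name_end = p.checked_add(name_len)?;
            let name = std::str::from_utf8(body.get(p..name_end)?).ok()?;
            found.push(CustomSection {
                name,
                data_offset: pos + name_end,
                data: &body[name_end..],
            });
        }
        pos = end; // skip to the next section
    }
    Some(found)
}

fn main() {
    // A tiny hand-assembled module: header, a memory section, one custom section.
    let mut module = b"\0asm\x01\0\0\0".to_vec();
    module.extend_from_slice(&[0x05, 0x03, 0x01, 0x00, 0x01]); // 1 memory, min 1 page
    module.extend_from_slice(&[0x00, 0x0d, 0x07]); // custom, 13 bytes, 7-byte name
    module.extend_from_slice(b"my-dataHello");
    for s in custom_sections(&module).expect("valid module") {
        println!("{} at file offset {:#x}: {:?}", s.name, s.data_offset, s.data);
    }
}
```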

Of course, your WebAssembly binary can use FIRST, SECOND, and THIRD just like any other static variable.

Thank you so much for your great idea!
I am working on wasmtime and it also has a Payload::CustomSection(...) variant.

The one thing I still don't understand is the relationship between the offset of the custom data in the wasm binary (for example 00d1ea7 in your example) and the offset of the data in linear memory. Is there a relationship that would let me find the offset in memory?

That 0x000d1ea7 is the offset from the start of the *.wasm file, but there should also be a second copy in linear memory for the static variable your code uses. If you add #[no_mangle], the variable should also be accessible via Instance::get_global().

When I added those annotations and recompiled, wasmer inspect showed me the following:

$ wasmer inspect custom_sections.wasm
Type: wasm
Size: 1.8 MB
Imports:
  Functions:
  Memories:
  Tables:
  Globals:
Exports:
  Functions:
  Memories:
    "memory": not shared (17 pages..)
  Tables:
  Globals:
    "FIRST": I32 (constant)
    "SECOND": I32 (constant)
    "THIRD": I32 (constant)
    "__data_end": I32 (constant)
    "__heap_base": I32 (constant)

Note that FIRST and friends won't be the bytes themselves, instead they'll contain the offset of the bytes in linear memory.
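If you need those offsets without instantiating the module (say, in an offline tool), a sketch of reading them straight out of the binary is below. It assumes the module imports no globals (so export indices line up with the global section) and that each global is initialised by a plain `i32.const N` followed by `end`; as far as I know that is what rustc and wasm-ld emit for exported statics, but the function deliberately gives up on anything else. `exported_i32_globals` is a hypothetical name. With wasmtime you should get the same numbers at runtime from `Instance::get_global()` followed by `Global::get()`.

```rust
/// Reads an unsigned LEB128 u32 at `*pos`, advancing `pos` past it.
fn read_uleb(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let (mut result, mut shift) = (0u32, 0);
    loop {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        result |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
        shift += 7;
        if shift > 28 {
            return None; // malformed: longer than 5 bytes
        }
    }
}

/// Reads a signed LEB128 i32 (the immediate of `i32.const`).
fn read_sleb(bytes: &[u8], pos: &mut usize) -> Option<i32> {
    let (mut result, mut shift) = (0i64, 0);
    loop {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        result |= i64::from(byte & 0x7f) << shift;
        shift += 7;
        if byte & 0x80 == 0 {
            if byte & 0x40 != 0 {
                result |= -1i64 << shift; // sign-extend from the last group
            }
            return i32::try_from(result).ok();
        }
        if shift > 28 {
            return None;
        }
    }
}

/// Maps each exported global to its `i32.const` initial value.
fn exported_i32_globals(wasm: &[u8]) -> Option<Vec<(String, i32)>> {
    if !wasm.starts_with(b"\0asm\x01\0\0\0") {
        return None;
    }
    let mut pos = 8;
    let mut inits: Vec<i32> = Vec::new(); // initial value of global #i
    let mut exports: Vec<(String, u32)> = Vec::new(); // (name, global index)
    while pos < wasm.len() {
        let id = wasm[pos];
        pos += 1;
        let size = read_uleb(wasm, &mut pos)? as usize;
        let end = pos.checked_add(size)?;
        let body = wasm.get(pos..end)?;
        let mut p = 0;
        match id {
            6 => {
                // Global section: valtype byte, mutability byte, init expression.
                for _ in 0..read_uleb(body, &mut p)? {
                    p += 2;
                    if *body.get(p)? != 0x41 {
                        return None; // not `i32.const`: unsupported here
                    }
                    p += 1;
                    inits.push(read_sleb(body, &mut p)?);
                    if *body.get(p)? != 0x0b {
                        return None; // expected `end`
                    }
                    p += 1;
                }
            }
            7 => {
                // Export section: name, kind (0x03 = global), index.
                for _ in 0..read_uleb(body, &mut p)? {
                    let len = read_uleb(body, &mut p)? as usize;
                    let name = std::str::from_utf8(body.get(p..p.checked_add(len)?)?).ok()?;
                    p += len;
                    let kind = *body.get(p)?;
                    p += 1;
                    let index = read_uleb(body, &mut p)?;
                    if kind == 0x03 {
                        exports.push((name.to_owned(), index));
                    }
                }
            }
            _ => {} // other sections are irrelevant here
        }
        pos = end;
    }
    exports
        .into_iter()
        .map(|(name, index)| Some((name, *inits.get(index as usize)?)))
        .collect()
}

fn main() {
    let mut module = b"\0asm\x01\0\0\0".to_vec();
    // Global section: one immutable i32 global initialised with i32.const 1024.
    module.extend_from_slice(&[0x06, 0x07, 0x01, 0x7f, 0x00, 0x41, 0x80, 0x08, 0x0b]);
    // Export section: global #0 exported as "FIRST".
    module.extend_from_slice(&[0x07, 0x09, 0x01, 0x05]);
    module.extend_from_slice(b"FIRST");
    module.extend_from_slice(&[0x03, 0x00]);
    for (name, offset) in exported_i32_globals(&module).expect("supported module") {
        println!("{name} lives at linear-memory offset {offset:#x}");
    }
}
```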

The first solution worked and I could implement it, thank you so much!

@Michael-F-Bryan I have another question. I want to change the string contents while parsing the sections, which might affect their length. That can change the offsets and break the program's logic. Is there any way to update the offsets automatically?

You might want to check out the walrus crate. It's a crate that was originally created by the wasm-bindgen project so they could patch *.wasm files generated by rustc.

That should give you everything you need. It even contains an IR and function builder for WebAssembly bytecode if you need to modify the functions themselves.
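If the strings you want to change only live in a custom section, you may not need a rewriting library for that part: as far as I know, nothing else in a linked module refers to a section by its position in the file, so resizing a custom section only means re-encoding that section's size. The sketch below does exactly that with std only (`replace_custom_section` is a hypothetical helper). Changing the copy in linear memory is the harder case, because the addresses of statics are baked into the code as constants, and that is where something like walrus is more appropriate.

```rust
/// Reads an unsigned LEB128 u32 at `*pos`, advancing `pos` past it.
fn read_uleb(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let (mut result, mut shift) = (0u32, 0);
    loop {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        result |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
        shift += 7;
        if shift > 28 {
            return None; // malformed: longer than 5 bytes
        }
    }
}

/// Appends `value` as unsigned LEB128.
fn write_uleb(out: &mut Vec<u8>, mut value: u32) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Returns a copy of `wasm` in which every custom section called `name`
/// carries `new_payload`; all other sections are copied byte for byte.
/// Returns `None` on malformed input or if no such section exists.
fn replace_custom_section(wasm: &[u8], name: &str, new_payload: &[u8]) -> Option<Vec<u8>> {
    const HEADER: &[u8] = b"\0asm\x01\0\0\0";
    if !wasm.starts_with(HEADER) {
        return None;
    }
    let mut out = HEADER.to_vec();
    let mut pos = HEADER.len();
    let mut replaced = false;
    while pos < wasm.len() {
        let start = pos;
        let id = wasm[pos];
        pos += 1;
        let size = read_uleb(wasm, &mut pos)? as usize;
        let end = pos.checked_add(size)?;
        let body = wasm.get(pos..end)?;
        let mut p = 0;
        // Only custom sections (id 0) have a name to compare against.
        let is_target = id == 0 && {
            let len = read_uleb(body, &mut p)? as usize;
            body.get(p..p.checked_add(len)?)? == name.as_bytes()
        };
        if is_target {
            let mut section = Vec::new();
            write_uleb(&mut section, u32::try_from(name.len()).ok()?);
            section.extend_from_slice(name.as_bytes());
            section.extend_from_slice(new_payload);
            out.push(0); // custom section id
            write_uleb(&mut out, u32::try_from(section.len()).ok()?); // new size
            out.extend_from_slice(&section);
            replaced = true;
        } else {
            out.extend_from_slice(&wasm[start..end]); // untouched section
        }
        pos = end;
    }
    replaced.then_some(out)
}

fn main() {
    let mut module = b"\0asm\x01\0\0\0".to_vec();
    module.extend_from_slice(&[0x05, 0x03, 0x01, 0x00, 0x01]); // memory section
    module.extend_from_slice(&[0x00, 0x0d, 0x07]); // custom section header + name length
    module.extend_from_slice(b"my-dataHello");
    let patched = replace_custom_section(&module, "my-data", b"Hello, walrus!")
        .expect("section exists");
    println!("{} -> {} bytes", module.len(), patched.len());
}
```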
