Review this fix for unsafe_libyaml, yaml_string_extend

The advisory has recently been posted addressing problems in libyml::string::yaml_string_extend, part of unsafe-libyaml.

This library, in one form or another, is still part of many YAML implementations (see here), including my own crate, until very recently. This code, auto-generated from C, provides low-level YAML parsing and generation. Hence, this issue potentially affects a large number of users. Let's not leave without attention.

Responding to the advisory, I switched immediately to unsafe-libyaml-norway as they recommend. However, after careful review to yaml_string_extend there, I suspect that the problem persists. The code there seems not much different from the old unsafe-libyaml and still computes sizes with
(*end).c_offset_from(*start)

which seem resolving to ptr::offset_from on *const u8. That operation is only defined when both pointers are derived from the same allocation (or one‑past). Otherwise, it is undefined behavior. Also, the size math uses force_mul(2_i64) which can silently overflow if buffers get large and offset_from for the pointer rebase (*pointer = new_start.wrapping_offset((*pointer).c_offset_from(*start) …)) has the same precondition: *pointer must be within the same allocation as *start**. This may be UB too.

I propose an alternative that handles the uninitialized (NULL) case, avoiding offset_from entirely. For size calculations, it uses integer address diffs and checks growth with checked_mul to prevent integer overflow.

pub(crate) unsafe fn yaml_string_extend(
    start: *mut *mut yaml_char_t,
    pointer: *mut *mut yaml_char_t,
    end: *mut *mut yaml_char_t,
) {
    let cur_start = *start;
    let cur_ptr   = *pointer;
    let cur_end   = *end;

    // Compute old capacity and used bytes without `offset_from`.
    // Handle the uninitialized (NULL) case explicitly.
    let (old_cap, used) = if cur_start.is_null() {
        (0usize, 0usize)
    } else {
        let s = cur_start as usize;
        let p = cur_ptr   as usize;
        let e = cur_end   as usize;

        // Sanity checks in debug builds; keeps Miri happy if invariants are violated upstream.
        debug_assert!(p >= s && p <= e, "pointer must be within [start, end]");
        debug_assert!(e >= s, "end must be >= start");

        (e - s, p.saturating_sub(s))
    };

    // Pick a new capacity (bytes). Start with a small nonzero capacity if previously uninitialized.
    let base = if old_cap == 0 { 16usize } else { old_cap };
    let new_cap = match base.checked_mul(2) {
        Some(v) => v,
        None => {
            // Overflow: leave unchanged or handle via your error reporting.
            // You could also clamp to isize::MAX if your allocator accepts it.
            return;
        }
    };

    // Allocate / reallocate.
    let new_start = yaml_realloc(cur_start as *mut c_void, new_cap as size_t) as *mut yaml_char_t;
    if new_start.is_null() {
        // Allocation failed; leave inputs unchanged (or set an error flag if you have one).
        return;
    }

    // Zero only the grown tail.
    if new_cap > old_cap {
        let tail = new_start.add(old_cap);
        memset(tail as *mut c_void, 0, (new_cap - old_cap) as size_t);
    }

    // Rebase `pointer` and `end` into the new allocation.
    let new_ptr = new_start.add(used);
    let new_end = new_start.add(new_cap);

    *pointer = new_ptr;
    *end     = new_end;
    *start   = new_start;
}

I also looked at STRING_EXTEND here, I think that also can be rewritten more robustly, I suggest

macro_rules! STRING_EXTEND {
    ($string:expr) => {{
        // Number of bytes you need ahead of `pointer`
        const NEED: usize = 5;
        let p = $string.pointer as usize;
        let e = $string.end     as usize;
        if $string.start.is_null() || p.saturating_add(NEED) > e {
            unsafe {
                yaml_string_extend(
                    core::ptr::addr_of_mut!($string.start),
                    core::ptr::addr_of_mut!($string.pointer),
                    core::ptr::addr_of_mut!($string.end),
                );
            }
        }
    }};
}

I would be great to have the community check if these fixes are sound. Then we can open a pull request to the maintained branches of unsafe-libyaml, or if (hopefully unlikely) the community is too passive, I will create my own fork.

Note that the advisory is for libyml, a vibe coded fork of unsafe-libyaml. Can you still reproduce UB in the legitimate crates?

It is, indeed, different. In libyml the function has been moved to the other file, string.rs, and the code looks like so:

pub unsafe fn yaml_string_extend(
    start: *mut *mut yaml_char_t,
    pointer: *mut *mut yaml_char_t,
    end: *mut *mut yaml_char_t,
) {
    let current_size = (*end).offset_from(*start) as size_t;
    let new_size = current_size * 2;

    let new_start: *mut yaml_char_t =
        yaml_realloc(*start as *mut libc::c_void, new_size)
            as *mut yaml_char_t;
    let _ = memset(
        new_start.add(current_size.try_into().unwrap())
            as *mut libc::c_void,
        0,
        current_size,
    );

    let offset = (*pointer).offset_from(*start);
    *pointer = new_start.add(offset.try_into().unwrap());
    *end = new_start.add((new_size as isize).try_into().unwrap());
    *start = new_start;
}

while both old and "Norway" versions look so:

pub(crate) unsafe fn yaml_string_extend(
    start: *mut *mut yaml_char_t,
    pointer: *mut *mut yaml_char_t,
    end: *mut *mut yaml_char_t,
) {
    let new_start: *mut yaml_char_t = yaml_realloc(
        *start as *mut libc::c_void,
        (((*end).c_offset_from(*start) as libc::c_long).force_mul(2_i64)) as size_t,
    ) as *mut yaml_char_t;
    memset(
        new_start.wrapping_offset((*end).c_offset_from(*start) as libc::c_long as isize)
            as *mut libc::c_void,
        0,
        (*end).c_offset_from(*start) as libc::c_ulong,
    );
    *pointer = new_start.wrapping_offset((*pointer).c_offset_from(*start) as libc::c_long as isize);
    *end = new_start.wrapping_offset(
        (((*end).c_offset_from(*start) as libc::c_long).force_mul(2_i64)) as isize,
    );
    *start = new_start;
}

The advisory reached me as the advisory about unsafe-libyaml so I started to look into that version code with lots of skeptic attention. Can we say the old version of the code is safe?

I took a quick at the code and:

It seems to resolve to a subtraction after casting to isize instead, which is completely safe:

force_mul seems to cause an abort (by double panic) on overflow.


Are you sure you "carefully" reviewed the code? It took me just one quick search on Github to find the above snippets, but maybe I'm missing something.

1 Like

The reason I posted it here was to get the second feedback. If you think it's fine, fine, I respect your opinion. If I were fully confident in my conclusions, I would not be doing.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.