Corrupted Strings


#1

Recently I started noticing corrupted strings on nightly, probably caused by memory corruption. For example, I have a method that expands “VERLAG[]” to [“VERLAG[]”, “VERLAG[].textindex”], but the result is
[“VERLAG[]”, “VERL�G[].textindex”]
The code runs in parallel (I am using rayon 1.0.1), but the method itself does not use any parallelism.
Any tips on how to track this down? Is anyone else experiencing something similar?

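```rust
// TEXTINDEX is not shown in the post; this value is assumed from the expected output above.
const TEXTINDEX: &str = ".textindex";

/// Returns every prefix of `path` that ends in an array segment (`[]`),
/// followed by the full path with the text-index suffix appended.
#[inline]
pub(crate) fn get_steps_to_anchor(path: &str) -> Vec<String> {
    let mut paths = vec![];
    let mut current = vec![];
    let parts = path.split('.');

    for part in parts {
        current.push(part.to_string());
        if part.ends_with("[]") {
            let joined = current.join(".");
            paths.push(joined);
        }
    }

    paths.push(path.to_string() + TEXTINDEX); // add path to index
    paths
}
```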

#2

Can you provide more context? Just posting a statement like this without any context (since when? in what situation? which program? can you show the source?) makes it very hard for us to help you…

That doesn’t look like corruption; it looks like someone used a lossy string conversion and encountered a non-UTF-8/non-UTF-16 character encoding.
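The “�” in the output is U+FFFD, the Unicode replacement character, which lossy conversions such as `String::from_utf8_lossy` insert for bytes that are not valid UTF-8. A minimal sketch, assuming (hypothetically) that a single byte of “VERLAG[]” had its high bit flipped; the post does not say which byte, if any, was actually damaged. `decode_lossy` is an illustrative helper, not code from the thread:

```rust
/// Decodes bytes the way a lossy conversion would: invalid sequences are
/// replaced with U+FFFD (displayed as �) instead of returning an error.
fn decode_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

fn main() {
    let mut bytes = b"VERLAG[]".to_vec();
    // Hypothetical damage: set the high bit of 'A' (0x41), giving 0xC1,
    // a byte that is never valid in UTF-8.
    bytes[4] |= 0x80;

    // The strict decoder rejects the bytes...
    assert!(std::str::from_utf8(&bytes).is_err());
    // ...while the lossy one quietly substitutes U+FFFD.
    println!("{}", decode_lossy(&bytes)); // prints VERL�G[]
}
```

A single bad byte shows up as a single “�” while the surrounding characters stay intact, which matches the shape of the output in the first post.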


#3

Sorry, I accidentally submitted the post early and didn’t have enough permissions to delete it. I have updated the original post.


#4

Where is the path arg coming from? Are you using any unsafe code? Is this reproducible, or does it only happen sometimes?


#5

I rechecked all unsafe usages, and that was indeed the source of the memory corruption. I replaced the unsafe code and it’s working fine now.


#6

Yay!

From the context, I assume you were doing an unsafe from_utf8_unchecked conversion, and replaced that with a Result-based from_utf8? (I’m curious, and it is always nice to document for others what patterns to avoid 🙂)
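For reference, the swap suggested here looks roughly like this. `str::from_utf8` validates the bytes and reports where decoding failed, while `str::from_utf8_unchecked` skips validation and is undefined behavior on invalid input. The `decode_checked` helper and the sample bytes are illustrative only:

```rust
use std::str;

/// Checked decode: returns an error instead of producing an invalid &str.
fn decode_checked(bytes: &[u8]) -> Result<&str, str::Utf8Error> {
    str::from_utf8(bytes)
}

fn main() {
    let good: &[u8] = b"VERLAG[].textindex";
    // 0xC1 can never appear in valid UTF-8.
    let bad: &[u8] = &[b'V', b'E', b'R', b'L', 0xC1, b'G'];

    match decode_checked(bad) {
        Ok(s) => println!("decoded: {}", s),
        // Prints "invalid UTF-8 after 4 valid bytes".
        Err(e) => println!("invalid UTF-8 after {} valid bytes", e.valid_up_to()),
    }

    // The unchecked variant skips validation entirely. It is only sound
    // when the bytes are already known to be valid UTF-8, as `good` is here;
    // calling it on `bad` would be undefined behavior.
    let s = unsafe { str::from_utf8_unchecked(good) };
    assert_eq!(Ok(s), decode_checked(good));
}
```

The checked version costs one validation pass over the bytes; in exchange, bad input becomes an `Err` with a position (`valid_up_to`) instead of a `&str` that silently breaks the UTF-8 invariant.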


#7

My guess, based on what @PSeitz mentioned initially, is that the path arg wasn’t actually pointing at memory guaranteed to be immutable or valid for the duration of get_steps_to_anchor(). But yeah, it would be interesting to know what the actual issue was.
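A sketch of how that could happen, assuming a hypothetical `str_from_raw` helper (nothing in the thread shows such code). An `unsafe fn` that builds a `&str` from a raw pointer returns an unbounded lifetime, so the compiler cannot check that the underlying memory outlives the string; a safe function that borrows from its input ties the two lifetimes together:

```rust
use std::{slice, str};

/// Hypothetical helper: builds a &str from a raw pointer and length.
/// The lifetime `'a` is unbounded, so nothing ties the result to the
/// memory's owner; keeping that memory alive is the caller's job.
///
/// # Safety
/// `ptr..ptr + len` must be initialized, valid UTF-8 that is neither
/// freed nor mutated while the returned &str is in use.
unsafe fn str_from_raw<'a>(ptr: *const u8, len: usize) -> &'a str {
    unsafe { str::from_utf8_unchecked(slice::from_raw_parts(ptr, len)) }
}

/// Safe alternative: the output borrows from `bytes`, so the borrow checker
/// rejects any use of the &str after its owner is dropped or mutated.
fn str_from_bytes(bytes: &[u8]) -> Result<&str, str::Utf8Error> {
    str::from_utf8(bytes)
}

fn main() {
    let owner = String::from("VERLAG[]");
    // SAFETY: `owner` is valid UTF-8 and is neither dropped nor modified
    // while `view` is in use.
    let view = unsafe { str_from_raw(owner.as_ptr(), owner.len()) };

    // With the safe version, dropping or mutating `owner` while `checked`
    // is still in use would be a compile error instead of a use-after-free.
    let checked = str_from_bytes(owner.as_bytes()).expect("valid UTF-8");
    println!("{} {}", view, checked);
}
```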


#8

I was writing outside the bounds of a vector with get_unchecked_mut and was basically overwriting random memory. I had an assumption about the size of the vector that used to be correct. Then I introduced a bug, and because the access was unsafe, the bug stayed silent instead of failing loudly.

It’s a fairly performance-critical component, which is why I used unsafe code, but I should at least have added some debug_assert statements so that it would crash in non-optimized builds.
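A sketch of the options, using a hypothetical counter update as a stand-in for the real (unshown) code: plain indexing, the non-panicking `get_mut`, and `get_unchecked_mut` guarded by `debug_assert!`. `debug_assert!` is compiled out when debug assertions are disabled (the default for release builds), so it costs nothing there, but an out-of-bounds index in the unchecked version is still undefined behavior in those builds:

```rust
/// Safe indexing: panics on an out-of-bounds index in every build profile.
fn bump_checked(counts: &mut [u32], idx: usize) {
    counts[idx] += 1;
}

/// Non-panicking alternative: reports whether the index was in bounds.
fn try_bump(counts: &mut [u32], idx: usize) -> bool {
    match counts.get_mut(idx) {
        Some(c) => {
            *c += 1;
            true
        }
        None => false,
    }
}

/// Unchecked access guarded by debug_assert!: a broken size assumption
/// panics when debug assertions are enabled; otherwise an out-of-bounds
/// idx silently writes to arbitrary memory (undefined behavior).
///
/// # Safety
/// `idx` must be less than `counts.len()`.
unsafe fn bump_unchecked(counts: &mut [u32], idx: usize) {
    debug_assert!(
        idx < counts.len(),
        "idx {} out of bounds for len {}",
        idx,
        counts.len()
    );
    // SAFETY: the caller guarantees idx < counts.len().
    unsafe {
        *counts.get_unchecked_mut(idx) += 1;
    }
}

fn main() {
    let mut counts = vec![0u32; 3];
    bump_checked(&mut counts, 0);
    // SAFETY: 2 < counts.len() == 3.
    unsafe { bump_unchecked(&mut counts, 2) };
    assert!(!try_bump(&mut counts, 3)); // out of bounds: nothing written
    println!("{:?}", counts); // prints [1, 0, 1]
}
```

Marking `bump_unchecked` as `unsafe fn` keeps the size assumption visible at every call site. It is also worth profiling before reaching for `get_unchecked_mut`, since the optimizer often removes bounds checks it can prove redundant (for example when iterating with `iter_mut`).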