I published my first open-source Rust crate!

The crate is called moostache and it's a Mustache template engine.

I've been coding Rust on and off for a few years already, so I'm not completely new to the language, but this is my first attempt at writing a library that other people may use, so I'm open to feedback on the crate if anyone has any!

9 Likes

Great, congrats for finishing a project and having the courage to post here. I'll start with one tip: use miri to find undefined behaviour. Maybe you'll discover something :wink:

2 Likes

@jendrikw I ran miri and it complained about this line. On paper, it's wildly unsafe, however I'm upholding safety invariants in the rest of my code so that that line doesn't cause any crashes, and so far it hasn't in any of my 120+ unit/doc tests, which isn't to say it's 100% safe, but if it is unsafe I'd be very interested in seeing how UB could be triggered using the public interface of the library.

The TL;DR behind that line is I wanted to have a self-referential struct like this:

struct Template {
  source: String,
  parsed: Vec<Fragment<'a>>, // holds references into source
}

But the compiler obviously doesn't allow that, so my workaround was to leak the String to turn it into a &'static str and store that in the Template struct so it becomes this:

struct Template {
  source: &'static str,
  parsed: Vec<Fragment<'static>>,
}

Which the compiler does allow, and then when Template gets dropped I reclaim the memory, i.e. turn the &'static str back into a String (this is the linked line above that miri complained about), so that my library doesn't actually leak memory over time.

I don't know if this "leak it and then reclaim it" trick is very smart or very stupid, but I would love for someone with more experience than me to tell me which it is.

That code is definitely unsound in a way unrelated to self-reference, because it forgets the capacity of the String and therefore deallocates with a size that is not equal to the original allocation. You need to keep around all three parts (as per String::{from,into}_raw_parts()) — or just keep the original String — and limit your reference manipulation to the references you make from the string, rather than how you store the original string.

That said, I would suggest using ouroboros instead. It provides a safe interface to the kind of self-reference you're trying to create, so you don’t have to write any unsafe code and can rely on other people having reviewed it. It's definitely imperfect (though much of the trouble, past and present, has to do with mutable self-reference), but IMO it’s still a better idea than doing it yourself from scratch unless you have a special use case.

#[ouroboros::self_referencing]
pub struct Template {
    source: String,

    #[borrows(source)]
    #[covariant]
    fragments: Vec<Fragment<'this>>,

    skips: Vec<SectionSkip>,
}
5 Likes

Perhaps you're misunderstanding miri… if miri complains, that means that your unit tests (or whatever other execution you ran through miri) have actually triggered UB. Miri cannot help with API soundness at all.

17 Likes

The simple fix is to use *mut str in StaticStr instead, and impl Send/Sync manually. This way you maintain the provenance of the original Box and retain the right to deallocate it.

Note: I haven't audited the rest of your crate, but based on what you have said here, I think this should be enough to cover the immediate issue.

    #[derive(Debug, PartialEq)]
    enum StaticStr {
        Real(&'static str),
-       Leaked(&'static str),
+       Leaked(*mut str),
    }

+   unsafe impl Send for StaticStr {}
+   unsafe impl Sync for StaticStr {}

    impl StaticStr {
        // SAFETY: this method should never be made
        // public and should only be used internally
        // in this crate
        fn as_str(&self) -> &'static str {
            match self {
                StaticStr::Real(real) => real,
-               StaticStr::Leaked(leaked) => leaked,
+               StaticStr::Leaked(leaked) => unsafe { &*leaked },
            }
        }
    }
    
    // safe Drop impl for leaked Strings
    impl Drop for StaticStr {
        fn drop(&mut self) {
            if let StaticStr::Leaked(leaked) = *self {
                // SAFETY:
                // - StaticStr cannot derive a Clone impl
                // - this should only be used within Template
                //   and remain as a private field
                let _: Box<str> = unsafe {
-                    Box::from_raw(std::ptr::from_ref::<str>(leaked).cast_mut())
+                    Box::from_raw(leaked)
                };
            }
        }
    }
    
    impl From<String> for StaticStr {
        fn from(value: String) -> Self {
-            StaticStr::Leaked(Box::leak(value.into_boxed_str()))
+            StaticStr::Leaked(Box::into_raw(value.into_boxed_str()))
        }
    }
3 Likes

For my two cents: even if a function is only meant to be used internally, you can't label it safe just for that. That makes the function unsound.

2 Likes

Awesome job, congratulations. Keep this amazing work. I still hesitate to publish my work, because I just learn Rust. For example, I use lot of self referencing structures, but I know it's a really bad style of programming, but when you do it for myself, I can do whatever I want :).

1 Like

@steffahn Yup, you're right. I misunderstood how miri worked. I thought it was more like a static analysis tool that reported lines of code that could theoretically cause UB, but I now understand it actually runs the code and finds actual instances of UB. Thanks for clarifying.

@kpreid I'm aware of crates like ouroboros, self_cell, and yoke. However, in the spirit of learning and having fun, I wanted to handroll my own solution, even if it came out a little hacky. If I was working on something more "serious" I'd probably just use yoke, as I think I like its API the most out of the three.

Anyway, I went back to the drawing board and refactored my self-referential code. I also added more unit tests, and refactored existing unit tests to make sure they all hit the unsafe code path. Now all 140+ unit tests pass when they're run with miri!

The new self-referential implementation doesn't leak anything, doesn't reclaim anything, doesn't use any raw pointers, doesn't use pin, doesn't use macros or 3rd-party crates. It just uses transmute to change a &'a str into a &'static str in one place. It's not nearly as robust or foolproof as something like yoke, but I'm pretty happy with the sheer simplicity of it. I've published moostache v0.5.0 which contains the new implementation.

@jendrikw @kpreid @steffahn @RustyYato Thanks again for all your help! I really appreciate it.

2 Likes

For comparison, I’ve made a version with yoke just for fun, see this PR :wink:

3 Likes

Woohoo, my crate's first external contribution! PR looks great, I merged it in and published it as moostache v0.6.0 to crates.io :smile:

1 Like