Optional string argument: Option<&str> feels so bloated

I have a function sandkiste::Comile::compile which takes an optional string argument for better diagnostics:

pub trait Compile<'a, C>: Machine<'a> {
    fn compile(
        &self,
        chunk_name: Option<String>,
        code: C
    ) -> Result<Self::Function<'_>, MachineError>;
}

When calling the function, I find it pretty much annoying to write:

compile(Some("main".to_string()), code);

I wonder how/if I could make it more ergonomic to use. Some ideas (example below uses a &'a str instead of String):

  • I could use a trait to allow passing a string or (). This is actually inspired by how the (unstable) Pattern trait is used by some string functions in std. Function bar below shows an example.
  • I could use the empty string as special value to indicate that no name has been given (function baz below). But this feels semantically wrong, even if an empty string is normally not used as value.

Of course I could also use the builder pattern, but I feel like this would make things rather complex than easier. Not sure though.

What are your experiences with this and how do other people work with optional arguments? Maybe there's a simple trick I missed which makes this easy?

fn foo(arg: Option<&str>) {
    match arg {
        Some(x) => println!("Got: {x}"),
        None => println!("Got none"),
    }
}

trait BarArg<'a> {
    fn option(self) -> Option<&'a str>;
}

impl<'a> BarArg<'a> for () {
    fn option(self) -> Option<&'a str> {
        None
    }
}

impl<'a> BarArg<'a> for &'a str {
    fn option(self) -> Option<&'a str> {
        Some(self)
    }
}

fn bar<'a, T: BarArg<'a>>(arg: T) {
    match arg.option() {
        Some(x) => println!("Got: {x}"),
        None => println!("Got none"),
    }
}

fn none_if_default<T: Default + PartialEq>(value: T) -> Option<T> {
    if value == Default::default() {
        None
    } else {
        Some(value)
    }
}

fn baz(arg: &str) {
    match none_if_default(arg) {
        Some(x) => println!("Got: {x}"),
        None => println!("Got none"),
    }
}

fn main() {
    foo(None);
    foo(Some("Hello"));
    bar(());
    bar("Hello");
    baz("");
    baz("Hello");
}

(Playground)

Output:

Got none
Got: Hello
Got none
Got: Hello
Got none
Got: Hello

If you're not keeping the string in the function, it would be better to take Option<&str> instead of Option<String>. Then at the call site, you won't need .to_string() for literals. If the call site has Option<String>, then there's .as_deref() to borrow the inner string.

There's impl Into<Option<&str>> that allows passing "…" or None. You just call .into() on it once inside the function body.

5 Likes

I think the () concept isn't that great. Or at least, I'd rather see the top one than the bottom here:

thing.compile(None, code);
thing.compile((), code);

So to me it sounds like you want to be able to convert from

  • Option<String>
  • String
  • Perhaps &str too
    • But more than one Option<_> would be ambiguous for None

If you just need the first two, Option<String> is already From<String> and From<Option<String>>.

Or you could use a newtype, which doesn't require importing another trait.

#[derive(Default)]
pub struct ChunkName(Option<String>);
impl From<Option<String>> for ChunkName { /* ... */ }
impl From<String> for ChunkName { /* ... */ }
impl<'a> From<&'a str> for ChunkName { /* ... */ }

Alternatively, you could supply different method(s).

fn compile_named(&self, chunk_name: String, code: C) -> ... {
    self.compile(Some(chunk_name), code)
}
// Probably this should have the shortest name though, sounds like
fn compile_anon(&self, code: C) -> ... {
    self.compile(None, code)
}

Or if you don't care about dyn safety

fn compile_named<S: ToString>(&self, chunk_name: S, code: C) -> ... {
    self.compile(Some(chunk_name.to_string()), code)
}
1 Like

The general pattern used by std is to offer two functions when dealing with optional parameters, e.g.

fn compile(&self, code: C) -> Result<Self::Function<'_>>;
fn compile_named(&self, code: C, name: String) -> Result<Self::Function<'_>>;

My first instinct would be to say that the type of code: C should already carry a name. Or if it's purely for diagnostics in this function's error, make the error type able to fill in the name retroactively. Or even just always require a name, and users without a more meaningful name provide "the code". A lot depends on the actual concrete use case (and this, like a lot of what I see you post about, is generalized past the point it's realistic to talk about concretely; [offtopic] what does it even mean to be generic over VMs which are capable of compiling code? It reminds me of Monads in that way where [/offtopic] it initially seems to be a generic interface more because multiple things have the same shape of interface than because it's useful to work over the class of thing uniformly.).

The function shape as written seems to want to take something like impl Into<Cow<str>>, so it can take both Some("name") and Some(String::new("name")), but unfortunately then it's not possible to infer the type for None (and it's a difficult problem to provide a default type) so you have to write e.g. None::<&str> instead. Taking Option<Cow<str>> instead inconveniences almost everyone equally, requiring Some(s.into()) for &str or String. Wanting two levels of "implicit into" doesn't work out well with inference.

2 Likes

Or, if you have a name parameter, you might as well
just pass an empty string when there is no name – after all, names are usually non-empty, so an empty string to signal the absence of a name is not surprising. You don't even have to treat it specially, then.

Thank you very much for your opinions, ideas, and notes on what's idiomatically done. I feel like I would rather go for something that's idiomatic, instead of doing some trickery (I think).

The problem is that I don't really know what the callee will prefer. The docs (which I wrote a while ago when making that Compile trait) say:

The chunk_name is always passed as owned String, as it will usually be stored in the returned Function.

Edit: When writing this post, I didn't realize I could also simply use impl Into<Option<String>> as an alternative, see follow up.

I know it might look like I over-generalize here :sweat_smile: and maybe I often over-generalize indeed? :thinking: Anyway, I really want the generalization here. This code is supposed to be used to later be able to plug-in different languages and machine concepts to use the same (language) generic interface. [1] That's why I'm doing an immense effort to also keep the (supposedly dynamically typed) VMs data type(s) generic, see sandkiste::types. And I still feel like it might not be generic enough in some cases.

The problem I run into is that I might not know what's really best for the callee to get: A chunk_name that is owned or one that is borrowed (because it's needed as owned String for some implementors or, by other implementors, passed to FFI as pointer to a non-null-terminated byte sequence and thus being copied anyway). The structurally right thing to do would (theoretically) be to use something like GenericCow, which I do in other contexts. But it feels even less idiomatic, so I didn't want to use it here. And it would also feel like overkill because we're just talking about cloning a single String in the worst case.

That's been my motivation for the current interface which is simply using Option<String>:

Keep it simple/idiomatic.

(But it is a bit annoying to write Some("…".to_string()) at the caller side when I just have a &'static str there.)

I'm generally not a friend of AsRef or Into to "improve" ergonomics on the caller side. I feel like if some "non String/str" is passed, it should be explicitly converted, e.g. using ToString::to_string() or AsRef::<str>::as_ref().

I like that very much. :smiley: Only thing I'm not so sure about is if I'm overcomplicating things by introducing that? But I like the ergonomics at the callers side. And I usually expect the callee to need an owned string (and if that's not the case, it's just one extra allocation, which isn't really a huge issue). I will definitely give this a thought, thanks!

I like that too, but I also feel like it's overkill to make an extra method just to fix this nuisance of writing Some/None at the callee site. Though it annoyed me enough to think about it. Maybe it's a good (idiomatic) alternative to the Option<String>.

I think it's not nice from the side of the implementor, though that will be a lot of less pain because there will be only a small number of implementors but a huge number of calls to that methods.

Side note: Thinking about this issue (hopefully) also helps me to make better decisions in future for other interfaces with other use-cases. I feel like a lot of people struggle being insecure about whether to taking Strings or &'short strs, or Cows or AsRef<str>s, etc. It feels like a general problem. (Sorry to over-generalize again :rofl:.)

Do you mean inferring the type of chunk_name through the result type of the method? That would require adding a type parameter to sandkiste::errors::MachineError. Again, I feel like this overcomplicates things a bit. And I think it would probably cause a lot of headache when dealing with these errors. (Unless you meant something different.)

Oh, that's (partly) the same conclusion I came to when thinking of GenericCow above. Except I didn't think yet of the None case yet.

:woozy_face:

Yeah, I like that because of the simplicity. However, I feel like it's a bit abusing the type model though? Is this seen in idiomatic code in practice when dealing with optional "identifiers" of some sort?

std::thread::Thread::name, for example, returns an Option<&str> instead of a &str. I also feel like the latter would be somewhat dangerous when users of that method forget to handle the "empty string" case and generate errors that then carry double spaces or quoted empty strings.

In practice, the type C will usually be a String or a &'short str. But I wanted to be generic because some languages might optionally have a (precompiled) byte-code. (In either case, the word "compile" is not entirely fitting anyway, but the idea is to convert some sort of "source(ish)" code into an executable sandkiste::Machine::Function. Including the name in C feels like not giving a real advantage, but I'm not sure.


Currently my favorites are:

  • @quinedot's newtype approach
  • providing two methods
  • leaving the API as is
  • edit: or impl Into<Option<String>>, see next post

By the way, see sandkiste_lua#Example_use for how these very abstract traits are used in a concrete case. And from the callee side, see impl<'a> Compile<'a, String> for LuaMachine<'a> for the (currently only existing) example.


  1. see some small example here ↩︎

Well, since the naïve Option<&str> case doesn't communicate or enforce non-emptiness either, I don't think it's any worse (users can currently pass Some("")).

If you wanted to do this in a perfectly principled way, that would be accepting an Option<ValidatedName> of some sort, but I don't think that's worth the added complexity for something so trivial.

I suppose that is this generic implementation?

impl<T> const From<T> for Option<T> {
    /// Moves `val` into a new [`Some`].
    ///
    /// # Examples
    ///
    /// ```
    /// let o: Option<u8> = Option::from(67);
    ///
    /// assert_eq!(Some(67), o);
    /// ```
    fn from(val: T) -> Option<T> {
        Some(val)
    }
}

So maybe in my case, I need impl Into<Option<String>> (because I mostly expect callees to require an owned version of the string. That could be the solution to avoid the None/Some, yet leaves me with the .to_string() when I provide a constant name.

To tackle that, I guess, I'd run into the problems, which @quinedot and @CAD97 pointed out here:

But I might just use impl Into<Option<String>>. It will also signal users of that API that an owned string is (usually) preferred.


That would then look similar to this:

trait Code {}
impl Code for String {}
impl Code for &str {}

fn compile<'a, C: Code>(_arg: impl Into<Option<String>>, _code: C) {}

fn main() {
    let owned_code = "print 'Hello World!'".to_string();
    compile("main_func".to_string(), owned_code); // at least I can omit the `Some` here
    compile(None, "print 'Hi'");
}

(Playground)

Well, I'm thinking more of the implementors of the Compile trait than of the users. Using the empty string as a special value to denote the absence of the name would work "by convention" only. If you miss the relevant part in the docs, it's easy to forget handling the "empty-string is used as special case" thing.

1 Like

I feel like implementors will usually keep it.

But the more I think about it, maybe it would be even reasonable for the sake of ergonomics to accept the extra clone here.

Then I tried:

trait Code {}
impl Code for String {}
impl Code for &str {}

fn compile<'a, C: Code>(_chunk_name: impl Into<Option<&'a str>>, _code: C) {
    // The following line clones unnecessarily…
    let _chunk_name: Option<String> = _chunk_name.into().map(ToOwned::to_owned);
    // …assuming we need an `Option<String>` here.
}

fn main() {
    let owned_code = "print 'Hello World!'".to_string();
    let chunk_name = "Program1".to_string();
    // But ergonomics is nicer:
    compile("main_func", owned_code);
    compile(None, "print 'Hi'");

    // But wait!?
    let owned_code = "print 'Program two!'".to_string();
    let chunk_name = "Program2".to_string();
    compile(&chunk_name, owned_code); // 😭🤬
    //compile(&*chunk_name, owned_code); // we need this
}

(Playground)

Not really satisfying in the case of passing an owned chunk_name. :weary:


Or I could mix several ideas, and do this:

trait Code {}
impl Code for String {}
impl Code for &str {}

fn compile_with_opt_name<C: Code>(_code: C, _name: Option<String>) {
    // real implementation goes here
}

fn compile_with_name<C: Code>(code: C, name: &str) {
    // we live with the unnecessary clone here if
    // `name` could have been provided as owned:
    compile_with_opt_name(code, Some(name.to_string()))
}

fn compile<C: Code>(code: C) {
    compile_with_opt_name(code, None);
}


fn main() {
    let owned_code = "print 'Hello World!'".to_string();
    let chunk_name = "Program1".to_string();
    compile_with_name(owned_code, "main_func");
    compile("print 'Hi'");

    let owned_code = "print 'Program two!'".to_string();
    let chunk_name = "Program2".to_string();
    compile_with_name(owned_code, &chunk_name);
    
    // Or slightly optimized:
    let owned_code = "print 'Program two!'".to_string();
    let chunk_name = "Program2".to_string();
    compile_with_opt_name(owned_code, Some(chunk_name));
}

(Playground)

It's almost tragic how such a simple problem can be so overly complex. I'm even willing to do unnecessary clones, and yet it feels not very straightforward. :slightly_frowning_face:


Oh, I missed this one:

But...

fn compile_with_name<C: Code, N: ToString>(code: C, name: N) {
    compile_with_opt_name(code, Some(name.to_string()))
}

fn main() {
    /* … */
    compile_with_name("return", true); // yuks!!
}

(Playground)


Maybe trying to avoiding the clone has been the root of all evil. I could perhaps do something simple like that:

trait Code {}
impl Code for String {}
impl Code for &str {}

fn compile<C: Code>(_name: Option<&str>, _code: C) {
    let _name = _name.map(ToOwned::to_owned); // it doesn't really hurt
}


fn main() {
    compile(None, "print 'anonymous'");
    compile(Some("main"), "print 'named'");
    
    let name = Some("main".to_string());
    let code = "print 'owned named'";
    compile(name.as_deref(), code); // it doesn't hurt sooo much either
}

(Playground)

I still have to write Some/None, but I feel like it got a bit less verbose from the caller side.

If you add the general version which takes Option<String>, then the other two methods can have default bodies that delegate to that one, so you have compile and compile_named for the callers but just e.g. compile_impl for the implementors (naming is hard). Of course, impls can also do so themselves; you could even provide a macro to forward both trait methods to an inherent one if you don't want to put it on the trait.

(You could also default forward to not using the name, so implementors who don't use it can ignore it, and those who do use it can reverse for whatever generic placeholder name they want to use.)

Another alternative if the "impl API" and "use API" don't line up super well is to have a separate trait with the impl API and a blanket impl from that to the user trait. (E.g. Iterator/Itertools or RngCore/Rng, although those are due to crate splits necessitating the trait split.)

In general, though, it's pick your poison between

  • more methods,
  • generic Into-style arguments and more fragile type inference, or
  • sometimes awkward type conversions on the caller side.

No, what I'm saying is that if the error is only used in the error case, instead of machine.compile(code, Some("name".into())), you could potentially have machine.compile(code).map_err(|e| e.with_name("name")) and provide the name directly to the error after the compilation.

1 Like

Very good summary of the problem, I think.

Oh right, it's not really that much overhead. I like it. Here with just one extra method compile2:

trait Compile<C> {
    fn compile(name: Option<String>, code: C) -> CompiledFunc;
    fn compile2(name: &str, code: C) -> CompiledFunc {
        Self::compile(Some(name.to_string()), code)
    }
}

fn main() {
    let s = "main".to_string();
    Machine::compile(None, "print 'Hi'");
    Machine::compile(Some(s), "print 'Hi'");
    Machine::compile2("main", "print 'Hi'");
}

(Playground)

For the implementor of Compile<C>, nothing changes, and the caller has the compile2 convenience-method.

Yes. :expressionless:

I'm not too worried about breaking backwards compatibility yet. Afterall, I'm at version 0.1.0, (even if numbers IMHO don't matter, but I don't think there will be any actual users other than myself of the relatively new crate yet… but who knows).

But even if I allowed myself to rename compile, I'm not really sure how I should name those methods. I feel like keeping compile as is and providing a convenience method might be best.

Perhaps:

  • fn compile(name: Option<String>, code: C) (required)
  • fn compile_with_str_name(name: &str, code: C) (with default impl)

But "_with_str_name" is verbose again. If I just wrote _with_name or _named, then it would be shorter. But maybe a bit confusing, because compile allows naming too. On the other hand, the convenience method always takes a name, so this could work:

  • fn compile(name: Option<String>, code: C) (required)
  • fn compile_named(name: &str, code: C) (with default impl)

If I go that way, it isn't a major or minor change in regard to semver compatibility (but still possibly-breaking).


Oh, I see. I think that's not an option for me, because the chunk_name will be used by other methods later too, e.g. sandkiste::Function::call, which may return a sandkiste::MachineError, which also has a chunk_name as an Option<String> field (in case of the implementation for Lua, I get that Option<String> from lua_Debug, which requires that I set the chunk name previously).

It's not relevant for my use-case, but I was curious if providing "convenience" methods which utilize default implementations pose an overhead in regard to the vtable. But testing it, I got weird results. Bare and Fancy are traits, but when I try to get the size of the trait object, I get:

fn main() {
    let s = S;
    println!("{}", std::mem::size_of_val(&s as &dyn Bare));
    println!("{}", std::mem::size_of_val(&s as &dyn Fancy));
}

(Playground)

Output:

0
0

Why do I get zero here as a result in both cases? (See Playground link above for full source.)

size_of_val returns the size of the object itself. This does not include the vtable. The vtable is a property of the type, not of individual instances of the type which share the same vtable.

3 Likes

And how could I print the size of the vtable? I mean, if I store a Box<dyn T>, a bigger vtable requires to allocate more memory, right?

The vtable itself is stored in static memory along with the program code — it is not part of any run-time allocation and never copied. The only thing that needs to be stored at run time is a pointer to the vtable, and that pointer lives in the “fat pointer” to the dyn value, not the value itself.

2 Likes

Ah, that explains this:

fn main() {
    let s = S;
    println!("{}", std::mem::size_of_val(&(&s as &dyn Bare)));
    println!("{}", std::mem::size_of_val(&(&s as &dyn Fancy)));
}

(Playground)

Output:

16
16

So to answer my question:

Yes, the vtable grows, but it's not a problem from a performance p.o.v. because it neither makes the references (or wide-pointers) nor the pointed-to values bigger. The only thing that's bigger is one piece of static memory (which is pointed-to by the metadata of the wide-pointer).

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.