Naming an iterator over enum { char, u8 }

I'm parsing a small, custom, sexpr-like DSL (which I didn't invent and can't change) that includes quoted strings:

(foo "bar\n\377baz")

These strings are UTF-8 with some escaping rules: in addition to \n, \", and a few others, you can refer to an arbitrary byte using an octal escape like \377. That means the "unescaped" form of the string data is not a sequence of char, but a sequence of

enum CharOrByte {
    Char(char),
    Byte(u8),
}

(How exactly the unescaping works, e.g. whether you can unescape U+2014 from \342\200\224, is an annoying point but not relevant to this question.) What I'm wondering is, how would you name a type that implements Iterator<Item = CharOrByte>?

If you call “char” by its true name, “Unicode scalar value”, then maybe that could inspire renaming the enum first

enum Scalar {
    Unicode(char), // a unicode scalar value
    Byte(u8), // non-unicode scalar value
}

Then the iterator could just be Scalars.

Whether such generic names make sense might of course depend on your context… where does this functionality live? Are there lots of other things completely unrelated to strings, so that “scalar” is too unspecific? How is the iterator created? (Often there’s methods/functions whose name iterators can mirror.) Is the usage supposed to be qualified or unqualified? In case the module name relates nicely, a shorter, more unspecific name can be used with the intention of qualified usage, users only importing the module, not the iterator type, the enum type, or its variants.

2 Likes

In case that best-effort-parsing byte sequences as UTF-8 and thus interpreting valid UTF-8 sequences of byte escapes as equivalent to unicode data is desired and you are wondering about the implementation, the implementation of std::str::Utf8Chunks::next seems like a reasonable source of inspiration.

2 Likes

I think I like scalars for the method, Scalars for the iterator, and Scalar for the item. Thanks!

More details on why exactly it's implemented that way can be found in bstr.

3 Likes

In case anyone's interested, you can see my implementation of the Scalars iterator here:

I stole the UTF-8 state machine from @BurntSushi and adapted it for this (slightly different) use-case.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.