Indexing_fmt: A helper crate to format super¹²³ and sub₈₄₀ scripts

Example

use indexing_fmt::*;

let index = 12;
let name = format!("Ship{}", index.to_superscript());
assert_eq!(name, "Ship¹²");

let index = 840;
let name = format!("Docking-Bay{}", index.to_subscript());
assert_eq!(name, "Docking-Bay₈₄₀");

Can you please explain why "extern crate std;" is needed for your project:

mod test {
    use super::*;
    extern crate std;

From what I have read recently, modern Rust no longer needs the extern keyword in most cases. For example, using the popular rand crate now requires only an entry in Cargo.toml, and no longer "extern crate rand". But there are exceptions, as your project shows. Under which circumstances is the extern keyword still needed? I am especially surprised that it is required for the standard library in your case.

The code of the library is marked as #![no_std] (line 1 in src/lib.rs). This means that the overall build does not use any std functionality. However, for testing, we can use std, and in order to do so I have used the extern crate std syntax. This is purely for testing, and the only component that I use is the format! macro.

To be honest, I do not know if there is a more elegant way of avoiding the extern crate std syntax. It could be the case and I simply do not know.


No, this is completely fine (though it is more common to put the extern crate std conditionally at the root). It is possible to avoid extern crate syntax by making the no_std attribute conditional,

#[cfg_attr(not(test), no_std)]

but this is actually worse, because it means your code gets compiled with the std prelude active despite not intending to use it. What you’ve done is, IMO, best practice; though if there were multiple test modules, it would make sense to write the extern crate at the root:

#[cfg(test)]
extern crate std;

Pedantically, you could also avoid std in favor of alloc, because the format macro is in alloc:

mod tests {
    extern crate alloc;
    use alloc::format;

    #[test]
    fn superscript_single_digit() {
        let res = format!("value{}", 1.to_superscript());
        ...

But there is no specific advantage to doing that in tests, since tests (currently) always depend on std anyway (unless you are using a custom test harness). The place where this would matter is if your library's non-test code needed to allocate strings, in which case it would be good for it to depend on alloc but not std.

The circumstances where you need extern crate are whenever you are using a non-default crate from the sysroot — the set of libraries that are distributed pre-compiled with Rust toolchains. Cargo doesn’t know about these crates, so it cannot declare them for you.

You can see the list of such crates by going to std - Rust and looking at the crate list on the left sidebar. Of those crates:

  • It’s normally not necessary to declare std because it's implicitly declared, but #![no_std] removes that implicit declaration.
  • It’s never necessary to declare core.
  • alloc is never declared implicitly, so you always have to declare it to use it directly, but in practice you only need to do that in #![no_std] code because std re-exports all of alloc.
  • proc_macro is only used in macros.
  • test is unstable.

So, explicitly declaring a crate is rare and usually only appears in #![no_std] code. There are also occasional uses for extern crate foo as bar; to bring in a crate under a different name.
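As a minimal sketch of the alloc case mentioned above (non-test library code that needs to build strings without std), something like the following could work. The function name docking_bay_label is hypothetical and not part of indexing_fmt, and the #![no_std] attribute that a real crate root would carry is left out so the snippet also compiles on its own.

```rust
// In a real library, src/lib.rs would start with `#![no_std]`.
// It is omitted here (assumption) so the snippet compiles standalone.

// `alloc` lives in the sysroot but is not declared implicitly,
// so it has to be brought in by hand.
extern crate alloc;

use alloc::string::String;

/// Hypothetical helper: builds a label using only `alloc`, never `std`.
pub fn docking_bay_label(bay: u32) -> String {
    // `format!` is defined in `alloc`; `std::format!` is a re-export of it.
    alloc::format!("Docking-Bay-{}", bay)
}
```

The same code keeps working if the crate later gains an optional std feature, since std re-exports these items.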


Thanks for this very clear and long answer :+1:

I was wondering if it is also possible to declare no_alloc somehow but after some digging through the web, I did not find anything.

#![no_std] implies "no alloc", so you must explicitly opt into alloc (i.e., extern crate alloc). There is no way to opt out of std only. In other words, you did declare no_alloc by using no_std, even though it's not obvious.

Not really related; but on crates.io, you can use the category slug no-std::no-alloc to try and "advertise" that your crate is not dependent on alloc (and thus, std).


Neat. I guess I'm curious whether there are any plans to add char-based sub/superscripts such as Aₓ, perhaps via a try_from from char to Option<Superscript<char>> or some such.

This would be much more complicated. Right now I am using a simple lookup table: I decompose every number in base 10 and then simply insert the corresponding Unicode character. Doing something like this would probably not be so easy for general chars. It would also require much more manual work. For example, the Unicode char for "ₐ" is '\u{2090}', but the next entry, '\u{2091}', is "ₑ". Try searching for a subscript of the small Latin letter "b".
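For readers who want to see the idea, here is a rough sketch of the lookup-table approach described above. It is not the crate's actual source: the names SUBSCRIPT_DIGITS, SUPERSCRIPT_DIGITS, map_digits, subscript_digits and superscript_digits are made up for illustration, and it returns a String for simplicity, which the real #![no_std] crate may well avoid.

```rust
/// Subscript digits 0..=9, contiguous at U+2080..=U+2089.
const SUBSCRIPT_DIGITS: [char; 10] = [
    '\u{2080}', '\u{2081}', '\u{2082}', '\u{2083}', '\u{2084}',
    '\u{2085}', '\u{2086}', '\u{2087}', '\u{2088}', '\u{2089}',
];

/// Superscript digits 0..=9. These are NOT contiguous in Unicode:
/// 1, 2 and 3 live in Latin-1, the rest at U+2070 and U+2074..=U+2079.
const SUPERSCRIPT_DIGITS: [char; 10] = [
    '\u{2070}', '\u{00B9}', '\u{00B2}', '\u{00B3}', '\u{2074}',
    '\u{2075}', '\u{2076}', '\u{2077}', '\u{2078}', '\u{2079}',
];

/// Decomposes `n` in base 10 and maps each digit through `table`.
fn map_digits(mut n: u32, table: &[char; 10]) -> String {
    // Digits come out least-significant first, so reverse at the end.
    let mut digits = Vec::new();
    loop {
        digits.push(table[(n % 10) as usize]);
        n /= 10;
        if n == 0 {
            break;
        }
    }
    digits.into_iter().rev().collect()
}

pub fn subscript_digits(n: u32) -> String {
    map_digits(n, &SUBSCRIPT_DIGITS)
}

pub fn superscript_digits(n: u32) -> String {
    map_digits(n, &SUPERSCRIPT_DIGITS)
}
```

The separate superscript table is the reason a plain "add an offset to the digit" trick is not enough, and general letters are even less regular.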

You could use a lookup table derived from the official UCD <sub> decompositions, e.g.:

2090;LATIN SUBSCRIPT SMALL LETTER A;Lm;0;L;<sub> 0061;;;;N;;;;;
2091;LATIN SUBSCRIPT SMALL LETTER E;Lm;0;L;<sub> 0065;;;;N;;;;;
2092;LATIN SUBSCRIPT SMALL LETTER O;Lm;0;L;<sub> 006F;;;;N;;;;;
2093;LATIN SUBSCRIPT SMALL LETTER X;Lm;0;L;<sub> 0078;;;;N;;;;;
2094;LATIN SUBSCRIPT SMALL LETTER SCHWA;Lm;0;L;<sub> 0259;;;;N;;;;;
2095;LATIN SUBSCRIPT SMALL LETTER H;Lm;0;L;<sub> 0068;;;;N;;;;;
2096;LATIN SUBSCRIPT SMALL LETTER K;Lm;0;L;<sub> 006B;;;;N;;;;;
2097;LATIN SUBSCRIPT SMALL LETTER L;Lm;0;L;<sub> 006C;;;;N;;;;;
2098;LATIN SUBSCRIPT SMALL LETTER M;Lm;0;L;<sub> 006D;;;;N;;;;;
2099;LATIN SUBSCRIPT SMALL LETTER N;Lm;0;L;<sub> 006E;;;;N;;;;;
209A;LATIN SUBSCRIPT SMALL LETTER P;Lm;0;L;<sub> 0070;;;;N;;;;;
209B;LATIN SUBSCRIPT SMALL LETTER S;Lm;0;L;<sub> 0073;;;;N;;;;;
209C;LATIN SUBSCRIPT SMALL LETTER T;Lm;0;L;<sub> 0074;;;;N;;;;;

... but you're right that there's no <sub> 0062 for "b".

There's better coverage for <super>, but still not the whole Latin alphabet.


Indeed, exactly why I suggested it should return an Option as a try fn.
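A fallible lookup along those lines might look like the sketch below. The function name try_subscript_char is hypothetical (the crate has no such API as far as this thread shows), and the table covers only the thirteen <sub> entries quoted above, so every other letter, including "b", yields None.

```rust
/// Hypothetical helper: maps a letter to its subscript form using only
/// the UCD `<sub>` entries U+2090..=U+209C quoted in this thread.
pub fn try_subscript_char(c: char) -> Option<char> {
    let sub = match c {
        'a' => '\u{2090}',
        'e' => '\u{2091}',
        'o' => '\u{2092}',
        'x' => '\u{2093}',
        '\u{0259}' => '\u{2094}', // schwa
        'h' => '\u{2095}',
        'k' => '\u{2096}',
        'l' => '\u{2097}',
        'm' => '\u{2098}',
        'n' => '\u{2099}',
        'p' => '\u{209A}',
        's' => '\u{209B}',
        't' => '\u{209C}',
        // No entry in this table, e.g. 'b'.
        _ => return None,
    };
    Some(sub)
}
```

Returning Option keeps the gaps in Unicode's coverage visible to the caller instead of silently passing the original letter through.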

Is this something that you would use and like to see in this crate @ratmice? Or was this just asked out of curiosity?


Not sure why I couldn't see the subscript. I am using a Google Pixel 8.


Out of curiosity. Most of my uses of sub- and superscripts are embedded within parsers/lexical analysers, where I end up embedding the Unicode characters directly in regular expressions.

However, most of the super/subscripts I actually end up using are characters rather than digits, which is why I pointed it out.

I think if I were going to actually use the crate, it'd be within the AST, and I'd also need some sort of from_subscripted_text, which takes a string including trailing subscripts and produces a Subscript. But I've never really had problems just working with Unicode characters directly in the AST, so it would likely not be necessary.
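To make the from_subscripted_text idea concrete, here is one possible shape for it, restricted to subscript digits. Everything here is an assumption: the name split_trailing_subscript, the (&str, Option<u32>) return type, and the choice to treat an overflowing digit run as "no subscript" are all illustrative and not something the crate provides.

```rust
/// Hypothetical helper: splits `text` into a base name and the number
/// encoded by its trailing run of subscript digits (U+2080..=U+2089).
pub fn split_trailing_subscript(text: &str) -> (&str, Option<u32>) {
    let is_sub_digit = |c: char| ('\u{2080}'..='\u{2089}').contains(&c);

    // Walk backwards over the trailing subscript digits; the last item
    // visited is the byte index where that run starts.
    let start = text
        .char_indices()
        .rev()
        .take_while(|&(_, c)| is_sub_digit(c))
        .last()
        .map(|(i, _)| i);

    let Some(i) = start else {
        return (text, None);
    };

    let (name, digits) = text.split_at(i);
    // Fold the digits into a number, bailing out on overflow.
    let value = digits.chars().try_fold(0u32, |acc, c| {
        acc.checked_mul(10)?.checked_add(c as u32 - 0x2080)
    });

    match value {
        Some(v) => (name, Some(v)),
        // Assumption: on overflow, leave the text untouched.
        None => (text, None),
    }
}
```

Only a trailing run is considered, so a subscript in the middle of a name stays part of the base name.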

The actual regular expressions including all the sub/superscripts do admittedly end up inscrutable, but I don't see how the crate can help with that, since the regexes are read from a yacc-like format. So there is no convenient mechanism to use any crate: I'd end up both writing and reading the file from build.rs to use the crate, which is pretty nasty, and the regexp is pretty well commented already.

I sometimes use subscripts and superscripts, but qaz.wtf is enough for me – precisely because not all letters can be used with subscripts or superscripts.

I wonder who would use these in a non-interactive way, and why…

I see the same thing on my phone if I use Chrome. But I don't use Chrome on my phone and instead use Firefox (for the uBlock Origin extension), and I don't see that issue.


Seems like a browser-specific problem. Sometimes parts of the spec are simply not supported. However, I do not know for sure about this particular implementation. The crate has some tests which check every sub- and superscript for each digit at least once.

My problem was that I wanted to generate a number of variable names with subscripts. So something like

for n in 0..n_vars {
    let sub = Subscript(n);
    println!("parameter{}", sub);
}

This problem is solved nicely by the crate.

I am very curious to hear about this use-case. Are you parsing mathematical equations?

Not parsing mathematical equations; it just allows sub/superscripts in type and variable names. It's just a theorem prover where variable and type names may contain super/subscripts... It'd be easiest for me to just point to an example from the language Lean, since that is the language which inspired my implementation...

def List.last : (as : List α) → as ≠ [] → α
  | [a],         _ => a
  | _::a₂:: as, _ => (a₂::as).last (by simp)

Most Lean code is input via VS Code, using an input manager in the text editor, such as the vscode-lean plugin, which binds backslash sequences to Unicode input.

I've done some experiments trying to use bidirectional parsers to go from the ASCII "raw input" to the Unicode text and back losslessly, somewhat combining the input manager and the language.

That is basically the gist of what I was doing; it's been a while since I've had a chance to work on it.