UNIC 0.7.0 and unic-cli App

UNIC 0.7.0 is finally out, with new goodies! :tada:

In this release, we have many new components, and the first UNIC application: unic-cli.

unic-cli App

https://crates.io/crates/unic-cli/

The unic-echo command works similarly to the Unix echo command, but can read and write various formats, including Unicode codepoints (UTF-32), UTF-16 hex, and UTF-8 hex. It can also output Rust string literals, for when you want to keep your Rust source files free of control or non-ASCII characters.
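To make those output formats concrete, here is a minimal sketch using only the Rust standard library. It is not the unic-echo implementation; the function names (`codepoints`, `utf16_hex`, `utf8_hex`, `rust_literal`) and the exact output formatting are assumptions chosen for illustration.

```rust
// Illustrative only: std-based helpers, NOT the unic-echo implementation.
// All function names and output formats below are hypothetical.

/// Format each Unicode scalar value as `U+XXXX`.
fn codepoints(s: &str) -> String {
    s.chars()
        .map(|c| format!("U+{:04X}", c as u32))
        .collect::<Vec<_>>()
        .join(" ")
}

/// Format each UTF-16 code unit as hex.
fn utf16_hex(s: &str) -> String {
    s.encode_utf16()
        .map(|u| format!("0x{:04X}", u))
        .collect::<Vec<_>>()
        .join(" ")
}

/// Format each UTF-8 byte as hex.
fn utf8_hex(s: &str) -> String {
    s.bytes()
        .map(|b| format!("0x{:02X}", b))
        .collect::<Vec<_>>()
        .join(" ")
}

/// Produce a Rust string literal with control and non-ASCII characters
/// escaped, using `str::escape_default` from the standard library.
fn rust_literal(s: &str) -> String {
    format!("\"{}\"", s.escape_default())
}
```

For example, `rust_literal("café")` yields a literal in which `é` is written as `\u{e9}`, so the source file stays pure ASCII.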

The unic-inspector command is almost unique :slight_smile:, second only to Ruby's uniscribe, by Jan Lelis! It helps you inspect and debug your strings right at the command prompt. (Options for more character properties and algorithms are coming in the next releases.)

As usual, use Cargo to have the new commands at your fingertips:

$ cargo install unic-cli


The Release Notes

https://github.com/behnam/rust-unic/releases/tag/v0.7.0

UNIC Applications

UNIC Applications are binary crates hosted in the same repository as the unic super-crate, under the apps/ directory. These crates are not internal parts of the unic library, but tools designed and developed for a general audience, which also serve as a test bed for the UNIC API. We are starting with CLI applications, and may expand to GUI and web applications as well.

  • [unic-cli] The new UNIC CLI application provides command-line tools for working with Unicode characters and strings. In this release, the first versions of the unic-echo and unic-inspector commands are implemented.

New Components

Character Property

  • [unic-ucd-common] Common character properties (alphabetic, alphanumeric, control, numeric, and white_space).
  • [unic-ucd-ident] Unicode Identifier character properties.
  • [unic-ucd-segment] Unicode Segmentation character properties.
  • [unic-emoji-char] Unicode Emoji character properties.
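As a rough illustration of what the five common properties mean, the sketch below classifies a character using the similarly named predicates on `char` in the Rust standard library. It does not use the unic-ucd-common API, and the helper name `describe` is hypothetical; results from std and from UNIC may differ if they track different Unicode versions.

```rust
// Illustrative only: uses std `char` predicates, NOT the unic-ucd-common API.
// `describe` is a hypothetical helper name.

/// Return the names of the common properties that hold for `c`.
fn describe(c: char) -> Vec<&'static str> {
    let mut props = Vec::new();
    if c.is_alphabetic() {
        props.push("alphabetic");
    }
    if c.is_alphanumeric() {
        props.push("alphanumeric");
    }
    if c.is_control() {
        props.push("control");
    }
    if c.is_numeric() {
        props.push("numeric");
    }
    if c.is_whitespace() {
        props.push("white_space");
    }
    props
}
```

A character can carry several of these properties at once: a tab, for instance, is both a control character and white space.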

String Algorithm

  • [unic-segment] Implementation of Unicode Text Segmentation algorithms (Grapheme Cluster and Word boundaries).

Other Updates

This release was delayed for a couple of cycles because of problems with running tests in a workspace with a mix of std and no-std crates. The issue is resolved as of Rust 1.22.0.

  • Enable no_std for many of the existing components.
  • Bumped minimum Rust to 1.22.0.
  • Lots of small fixes for data types and internal structure updates.

These tools look amazing. I recently started using the idna crate for punycode conversion support, which works great. I briefly looked at unic-idna, but opted for idna, which appeared to be far more widely used. Is there any reason I might prefer to switch to unic-idna?

Thanks!


Thanks, @bluejekyll.

As noted in unic-idna's README file, the code is based on idna. At the moment, their functionality is exactly the same, and we keep both at the latest Unicode version.

For unic-idna, we plan to change the API as we revise the string-level iteration and conversion APIs in UNIC. On the other hand, idna has a stable API and is the better choice for projects that need stability.

Regarding implementation, unic-idna uses unic-ucd (and its sub-components) for the character-level data, while idna maintains everything in-repo. Therefore, if you need access to other UCD character properties, unic-idna would save you some memory.


Thank you for the explanation. Given your understanding of the stability of the API, I may opt to stay with idna for now. I don't have any strong need for more character properties at the moment, but will keep that in mind.

I'll watch for the API to stabilize and switch over at that point. Awesome work on this project; it's very detailed.
