I see that, for example, std::io::Read::chars and std::io::BufReader::chars have been deprecated since Rust 1.27.0, and it's suggested that str::from_utf8 be used instead. How is that a replacement for a char-at-a-time iterator? It seems fine for decoding slices already in memory from bytes to text, but I can't see how to use it in a character-at-a-time scenario.
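To be concrete, the slice case I mean is something like the following (just my understanding of what the deprecation note suggests):

use std::str;

fn main() {
    // Fine when the whole buffer is already in memory and is valid UTF-8 ...
    let bytes: &[u8] = "föo".as_bytes();
    for c in str::from_utf8(bytes).unwrap().chars() {
        println!("{}", c);
    }
    // ... but there is no obvious way to feed bytes to it incrementally from a stream.
}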
What is the recommended approach for applications that want to:
Process a stream (rather than an in-memory slice) of bytes.
Choose a bytes-to-chars encoding.
Place a decoder layer using that encoding on top of the byte stream.
Read decoded characters one at a time from the decoder (roughly the shape sketched after this list).
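The kind of layering I have in mind would look roughly like the sketch below; CharDecoder and its fields are made up here purely to illustrate the shape of the API I am looking for, not real library code.

use std::io::{self, Read};

// Hypothetical type, named only to illustrate the desired layering:
// wrap any byte source in a decoder for a chosen encoding and pull chars from it.
struct CharDecoder<R: Read> {
    byte_stream: R,
    // ... decoder state for the chosen encoding would live here ...
}

impl<R: Read> Iterator for CharDecoder<R> {
    type Item = io::Result<char>;

    fn next(&mut self) -> Option<Self::Item> {
        // The actual decoding logic is the part I am asking about.
        unimplemented!()
    }
}

fn process(byte_stream: impl Read) -> io::Result<()> {
    let char_stream = CharDecoder { byte_stream };
    for c in char_stream {
        process_char(c?);
    }
    Ok(())
}

fn process_char(_c: char) {}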
I've looked at Kang Seonghoon's rust-encoding project and its documentation, but I don't see anything there or in the stdlib that provides the functionality of e.g. Python's codecs module, where you can do something along the lines of
stream_reader = codecs.getreader(encoding)
char_stream = stream_reader(byte_stream)
for char in char_stream:
    process_char(char)
What would be the equivalent available to Rust application developers?
Since a char in a Rust string isn't necessarily what you intuitively think of as a character, what you really want depends on your use case.
If what you want is iterating over 'f', 'ö' and 'o' when you have an input string "föo" (note the umlaut, i.e. this is a Unicode-aware solution, not ASCII-only), you may want to take a look at the unicode-segmentation crate.
It provides an iterator over grapheme clusters that you can obtain by calling UnicodeSegmentation::graphemes(s, true) (or s.graphemes(true)) for a string slice s; the boolean selects extended grapheme clusters.
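A minimal sketch of that usage (assuming unicode-segmentation is added as a dependency; note that the iterator yields &str grapheme clusters, not chars):

use unicode_segmentation::UnicodeSegmentation;

fn main() {
    let s = "föo";
    // `true` selects extended grapheme clusters, which is usually what you want.
    for g in s.graphemes(true) {
        println!("{}", g); // prints "f", "ö", "o" on separate lines
    }
}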
Thanks for the pointer. It seems to operate on strings rather than streams, but it's good to know about the grapheme processing functionality (I was interested just in iterating over code points, at least initially).
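For the code-points-over-a-stream case specifically, one workaround (assuming the input is UTF-8, which sidesteps the "choose an encoding" requirement) is a small hand-rolled adapter over any Read. This is only a sketch, without production-grade error handling:

use std::io::{self, Read};

// Sketch: yields code points one at a time from any UTF-8 byte stream.
struct Utf8Chars<R: Read> {
    inner: R,
}

impl<R: Read> Iterator for Utf8Chars<R> {
    type Item = io::Result<char>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut buf = [0u8; 4];
        // Read the leading byte; a clean EOF ends the iterator.
        match self.inner.read(&mut buf[..1]) {
            Ok(0) => return None,
            Ok(_) => {}
            Err(e) => return Some(Err(e)),
        }
        // The leading byte tells us how long the UTF-8 sequence is.
        let len = match buf[0] {
            0x00..=0x7F => 1,
            0xC2..=0xDF => 2,
            0xE0..=0xEF => 3,
            0xF0..=0xF4 => 4,
            _ => return Some(Err(io::Error::new(io::ErrorKind::InvalidData, "invalid UTF-8 lead byte"))),
        };
        // Pull in the continuation bytes, if any.
        if len > 1 {
            if let Err(e) = self.inner.read_exact(&mut buf[1..len]) {
                return Some(Err(e));
            }
        }
        // Validate the sequence and extract the single decoded char.
        match std::str::from_utf8(&buf[..len]) {
            Ok(s) => s.chars().next().map(Ok),
            Err(_) => Some(Err(io::Error::new(io::ErrorKind::InvalidData, "invalid UTF-8 sequence"))),
        }
    }
}

fn main() -> io::Result<()> {
    // Any Read works here, e.g. a File; a byte slice keeps the sketch self-contained.
    let byte_stream: &[u8] = "föo".as_bytes();
    let chars = Utf8Chars { inner: byte_stream };
    for c in chars {
        println!("{}", c?);
    }
    Ok(())
}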