Iterating over chars from readers

I see that, for example, `std::io::Read::chars` and `std::io::BufReader::chars` have been deprecated since Rust 1.27.0, and that `str::from_utf8` is suggested instead. How is that a replacement for a char-at-a-time iterator? It seems fine for decoding byte slices that are already in memory into text, but I can't see how to use it in character-at-a-time scenarios.
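One way `str::from_utf8` can stand in for the old iterator is as the validation step of a small char-at-a-time decoder. The sketch below uses a hypothetical `Utf8Chars` type and only std. It pulls bytes from any `BufRead` one at a time and asks `str::from_utf8` whether they form a complete character yet. When `Utf8Error::error_len()` returns `None`, the bytes are a valid but incomplete prefix, so one more byte is read. It favours clarity over speed (one `from_utf8` call per byte) and reports malformed input as `io::ErrorKind::InvalidData`.

```rust
use std::io::{self, BufRead};
use std::str;

/// Hypothetical char-at-a-time UTF-8 decoder over any `BufRead`.
struct Utf8Chars<R> {
    reader: R,
}

fn invalid(msg: &'static str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg)
}

impl<R: BufRead> Iterator for Utf8Chars<R> {
    type Item = io::Result<char>;

    fn next(&mut self) -> Option<io::Result<char>> {
        // A UTF-8 encoded char is at most 4 bytes long.
        let mut bytes = [0u8; 4];
        let mut len = 0;
        loop {
            let available = match self.reader.fill_buf() {
                Ok(buf) => buf,
                Err(e) => return Some(Err(e)),
            };
            if available.is_empty() {
                // End of stream: fine only if no sequence was started.
                return if len == 0 {
                    None
                } else {
                    Some(Err(invalid("truncated UTF-8 sequence")))
                };
            }
            bytes[len] = available[0];
            match str::from_utf8(&bytes[..=len]) {
                Ok(s) => {
                    self.reader.consume(1);
                    // The collected bytes form exactly one complete char.
                    return s.chars().next().map(Ok);
                }
                Err(e) if e.error_len().is_some() => {
                    // Malformed. Consume the byte only if it started the sequence;
                    // otherwise it is left to start the next char.
                    if len == 0 {
                        self.reader.consume(1);
                    }
                    return Some(Err(invalid("invalid UTF-8 sequence")));
                }
                Err(_) => {
                    // Valid but incomplete prefix: keep it and read one more byte.
                    self.reader.consume(1);
                    len += 1;
                }
            }
        }
    }
}
```

Wrapping a file as `Utf8Chars { reader: BufReader::new(file) }` should then give an iterator that can be used in a plain `for` loop, much like the deprecated `chars()`.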

What is the recommended approach for an application that wants to:

  1. Process a stream of bytes (rather than an in-memory slice).
  2. Choose the encoding used to turn those bytes into chars.
  3. Place a decoder for that encoding on top of the byte stream.
  4. Read decoded characters from the decoder one at a time.
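These four steps map fairly directly onto an iterator adapter around `std::io::Read`. The following is only a sketch under stated assumptions: `Encoding` and `CharDecoder` are hypothetical names, and std alone has no general encoding registry, so just ISO-8859-1 and UTF-16LE (without BOM detection) are implemented. A real application would plug in a dedicated crate such as encoding_rs for the encoding tables, but the layering is the same: bytes go in, and each `next()` call yields one `io::Result<char>`.

```rust
use std::io::{self, Read};

/// Encodings supported by this sketch (a hypothetical, deliberately tiny set).
#[derive(Clone, Copy, Debug)]
enum Encoding {
    /// ISO-8859-1: every byte maps directly to U+0000..=U+00FF.
    Latin1,
    /// UTF-16, little-endian, no BOM handling.
    Utf16Le,
}

/// A decoder layer on top of a byte stream that yields one `char` at a time.
struct CharDecoder<R> {
    inner: R,
    encoding: Encoding,
    /// A UTF-16 unit read ahead while looking for a low surrogate.
    pending: Option<u16>,
}

impl<R: Read> CharDecoder<R> {
    fn new(inner: R, encoding: Encoding) -> Self {
        CharDecoder { inner, encoding, pending: None }
    }

    /// Reads exactly `N` bytes, or returns `Ok(None)` at end of stream.
    /// In this sketch a trailing partial group (e.g. an odd final byte) is dropped.
    fn read_array<const N: usize>(&mut self) -> io::Result<Option<[u8; N]>> {
        let mut buf = [0u8; N];
        match self.inner.read_exact(&mut buf) {
            Ok(()) => Ok(Some(buf)),
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(None),
            Err(e) => Err(e),
        }
    }

    fn next_unit(&mut self) -> io::Result<Option<u16>> {
        Ok(self.read_array::<2>()?.map(u16::from_le_bytes))
    }

    fn next_char(&mut self) -> io::Result<Option<char>> {
        match self.encoding {
            Encoding::Latin1 => Ok(self.read_array::<1>()?.map(|[b]| char::from(b))),
            Encoding::Utf16Le => {
                let first = match self.pending.take() {
                    Some(unit) => unit,
                    None => match self.next_unit()? {
                        Some(unit) => unit,
                        None => return Ok(None),
                    },
                };
                if !(0xD800..0xDC00).contains(&first) {
                    // BMP character, or an unpaired low surrogate (not a valid char).
                    let c = char::from_u32(u32::from(first));
                    return Ok(Some(c.unwrap_or(char::REPLACEMENT_CHARACTER)));
                }
                // High surrogate: it must be followed by a low surrogate.
                match self.next_unit()? {
                    Some(low) if (0xDC00..0xE000).contains(&low) => {
                        let code = 0x10000
                            + ((u32::from(first) - 0xD800) << 10)
                            + (u32::from(low) - 0xDC00);
                        Ok(char::from_u32(code))
                    }
                    other => {
                        // Keep the extra unit for the next call; report the lone surrogate.
                        self.pending = other;
                        Ok(Some(char::REPLACEMENT_CHARACTER))
                    }
                }
            }
        }
    }
}

impl<R: Read> Iterator for CharDecoder<R> {
    type Item = io::Result<char>;

    fn next(&mut self) -> Option<Self::Item> {
        self.next_char().transpose()
    }
}
```

Since every `read_exact` call goes to the underlying reader, the source should be wrapped in a `BufReader` first, e.g. `CharDecoder::new(BufReader::new(file), Encoding::Utf16Le)`.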

I've looked at Kang Seonghoon's rust-encoding project and its documentation, but I don't see anything there or in the stdlib that provides the functionality of, e.g., Python's codecs module, where you can do something along the lines of:

```python
import codecs

stream_reader = codecs.getreader(encoding)
char_stream = stream_reader(byte_stream)
# Iterating a StreamReader directly yields lines, so read one character at a time
for char in iter(lambda: char_stream.read(1), ""):
    process_char(char)
```

What would be the equivalent available to Rust application developers?
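For UTF-8 input specifically, the standard library gets reasonably close to the Python loop without extra crates. A minimal sketch, assuming UTF-8 and a hypothetical `process_char` callback: `BufRead::read_line` decodes the stream a line at a time (keeping the `\n` and failing with `InvalidData` on malformed bytes), and each line is then walked with `chars()`. The trade-off is that a whole line is buffered in memory before its characters are processed, and other encodings still need a crate such as rust-encoding.

```rust
use std::io::{self, BufRead};

/// Calls `process_char` (a hypothetical per-character handler) for every
/// char of a UTF-8 byte stream, decoding one line at a time.
fn for_each_char_utf8<R: BufRead>(
    mut reader: R,
    mut process_char: impl FnMut(char),
) -> io::Result<()> {
    let mut line = String::new();
    loop {
        line.clear();
        // read_line keeps the trailing '\n' and returns Ok(0) at end of stream.
        if reader.read_line(&mut line)? == 0 {
            return Ok(());
        }
        line.chars().for_each(&mut process_char);
    }
}
```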


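A Rust `char` is a Unicode scalar value (roughly, a code point), which isn't always what you'd intuitively call a character: one user-perceived character can be made of several `char`s. What you really want therefore depends on your use case.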

If what you want is to iterate over 'f', 'ö' and 'o' given the input string "föo" (note the umlaut: this calls for a Unicode-aware solution, not an ASCII-only one), you may want to take a look at the unicode-segmentation crate.
It provides an iterator over grapheme clusters that you can create by calling `UnicodeSegmentation::graphemes(s, true)` (or `s.graphemes(true)`) on a string slice `s`; the `true` argument selects extended grapheme clusters.
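The gap between a `char` and what a reader perceives as one character can be seen with std alone. In this illustrative sketch (the constant names and `code_points` are hypothetical), the same visible text "föo" is written once with a precomposed 'ö' and once as 'o' followed by a combining diaeresis. `chars()` yields three items for the first and four for the second, whereas a grapheme iterator such as the one in unicode-segmentation should yield three clusters for both.

```rust
/// "föo" spelled two ways that normally render identically.
const PRECOMPOSED: &str = "f\u{00F6}o"; // 'ö' is one code point, U+00F6
const DECOMPOSED: &str = "fo\u{0308}o"; // 'o' + U+0308 COMBINING DIAERESIS

/// Lists the code points (Rust `char`s) of `s` in "U+XXXX" notation.
fn code_points(s: &str) -> Vec<String> {
    s.chars().map(|c| format!("U+{:04X}", u32::from(c))).collect()
}
```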

> If what you want is iterating over 'f', 'ö' and 'o' if you have an input string "föo" (note the umlaut i.e. this is a unicode-aware solution, not ASCII-only)

That is indeed the type of thing I was after.

> you may want to take a look at the unicode-segmentation crate.

Thanks for the pointer. It seems to operate on strings rather than streams, but it's good to know about the grapheme processing functionality (I was interested just in iterating over code points, at least initially).

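An adapter for streams appears to be available as a crate. Make sure to wrap your reader in a `BufReader`, since the adapter probably takes its input from `reader.bytes()`, and `Read::bytes` on an unbuffered reader makes one `read` call per byte.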
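The `BufReader` advice follows from the std documentation of `Read::bytes`, which warns that the default implementation calls `read` for every byte. A small experiment can make that visible: the sketch below wraps an in-memory reader in a hypothetical `CountingReader` that counts `read` calls, then drains it through `bytes()` with and without a `BufReader` in between.

```rust
use std::io::{self, BufReader, Read};

/// Hypothetical instrumentation: counts how often `read` reaches the wrapped reader.
struct CountingReader<R> {
    inner: R,
    calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

/// Drains `data` through `Read::bytes`, with or without a `BufReader`, and
/// returns the number of `read` calls made on the underlying reader.
fn read_calls(data: &[u8], buffered: bool) -> usize {
    let mut counter = CountingReader { inner: data, calls: 0 };
    if buffered {
        for byte in BufReader::new(&mut counter).bytes() {
            byte.expect("reading from memory should not fail");
        }
    } else {
        for byte in (&mut counter).bytes() {
            byte.expect("reading from memory should not fail");
        }
    }
    counter.calls
}
```

With 1,000 bytes of input, the unbuffered path should need roughly one call per byte, while the buffered one should need only a couple, because `BufReader` refills its internal buffer in large chunks.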


Thanks, Georg!