How to do low friction text input in Rust?

Hi! What is the most concise way of reading text data from standard input if I am allowed to use only the standard library? Suppose I need to read a matrix in the following format (height, width, then the elements):

2 3
92 42 62
0  1  2

In C++, I would do the following:

#include <iostream>
#include <vector>
#include <cstdint>

std::vector<std::vector<int32_t>> read_matrix() {
    size_t n;
    std::cin >> n;
    size_t m;
    std::cin >> m;

    std::vector<std::vector<int32_t>> result(n, std::vector<int32_t>(m, 0));
    for (size_t i = 0; i < n; ++i) {      // n rows
        for (size_t j = 0; j < m; ++j) {  // m columns per row
            std::cin >> result[i][j];
        }
    }

    return result;
}

In Java, there are Scanner and StreamTokenizer. In Python, there are .splitlines(), .split() and map(int, line). What should I do in Rust?

I've come up with

use std::io;
use std::io::prelude::*;

fn read_matrix() -> Vec<Vec<i32>> {
    let stdin = io::stdin();
    let handle = stdin.lock();
    let mut lines = handle.lines();

    let dimensions_line = lines.next().unwrap().unwrap();
    let mut dimensions = dimensions_line.split_whitespace();
    let n: usize = dimensions.next().unwrap().parse().unwrap();
    let m: usize = dimensions.next().unwrap().parse().unwrap();

    let mut result = vec![vec![0i32; m]; n];
    for row in 0..n {
        let line = lines.next().unwrap().unwrap();
        let mut words = line.split_whitespace();
        for col in 0..m {
            result[row][col] = words.next().unwrap().parse().unwrap();
        }
    }
    result
}

But this is horribly ugly because of all the double unwraps and "useless" temporaries. Is there a better way?

For the context, I want to use Rust to solve typical problems from the algorithms class, so I am constrained to only use the standard library, but I can trust the input and just panic! if it does not conform to the expected format.

First I'd separate reading and parsing if you can. Implementing FromStr will let you call .parse() on any string.
You could do something like:

let matrix: Vec<Vec<i32>> = s.lines().map(|line| {
    line.split_whitespace().map(|n| n.parse().unwrap()).collect()
}).collect();
// check the size of your matrix...
// return a Result
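
To make the FromStr suggestion concrete, here is a minimal sketch. The `Matrix` wrapper type and the `String` error type are assumptions made for illustration, not something from the original question. It parses the "height width, then rows" format and checks the dimensions:

```rust
use std::str::FromStr;

/// Hypothetical wrapper type, introduced only for this sketch.
#[derive(Debug, PartialEq)]
struct Matrix(Vec<Vec<i32>>);

impl FromStr for Matrix {
    // A plain String keeps the sketch short; a real error enum would be nicer.
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Skip blank lines so a trailing newline does not count as a row.
        let mut lines = s.lines().filter(|line| !line.trim().is_empty());

        // First line: "height width".
        let header = lines.next().ok_or("missing dimensions line")?;
        let dims = header
            .split_whitespace()
            .map(|t| t.parse::<usize>().map_err(|e| format!("bad dimension {:?}: {}", t, e)))
            .collect::<Result<Vec<usize>, String>>()?;
        if dims.len() != 2 {
            return Err(format!("expected 2 dimensions, got {}", dims.len()));
        }
        let (n, m) = (dims[0], dims[1]);

        // Remaining lines: one row each, elements separated by whitespace.
        let rows = lines
            .map(|line| {
                line.split_whitespace()
                    .map(|t| t.parse::<i32>().map_err(|e| format!("bad element {:?}: {}", t, e)))
                    .collect::<Result<Vec<i32>, String>>()
            })
            .collect::<Result<Vec<Vec<i32>>, String>>()?;

        // Check the size of the matrix against the header.
        if rows.len() != n || rows.iter().any(|row| row.len() != m) {
            return Err(format!("expected a {}x{} matrix", n, m));
        }
        Ok(Matrix(rows))
    }
}
```

With this in place, `input.parse::<Matrix>()` returns a Result, and the caller decides whether to unwrap or report the error.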

I'd use scan-rules, the crate. The reason that I'm answering with a crate is that libstd doesn't have a low-friction way to do this.


Yep, there are nice solutions for parsing in the ecosystem, but unfortunately it is not that easy to add external crates to the testing systems.

I use whiteread (shameless plug). It has a single-file implementation, so one can just copy-paste it. (It's quite long, but mostly because of comments.)

There's also a short macro implementation, rust-si, which is also a single file and under 100 lines. Its small disadvantages are that it parses the format string at runtime and allocates a String for each scanned value. Usually that's not a problem, I guess, but it could become one when you have millions of ints to read.


Hm, actually splitting everything on whitespace is a great idea!

It is possible to write a 15-line library to do this!

use std::io::{self, Read};
use std::str::FromStr;

struct TextReader<'a> {
    tokens: std::str::SplitWhitespace<'a>
}

impl<'a> TextReader<'a> {
    fn new(text: &'a str) -> TextReader<'a> {
        TextReader { tokens: text.split_whitespace() }
    }

    fn read<T: FromStr>(&mut self) -> T {
        let token = self.tokens.next().expect("EOF reached");
        match token.parse() {
            Ok(x) => x, 
            Err(_) => panic!("Failed to parse {:?}", token),
        }
    }
}

fn read_matrix() -> Vec<Vec<i32>> {
    let mut buffer = String::new();
    io::stdin().read_to_string(&mut buffer).unwrap();
    let mut r = TextReader::new(&buffer);

    let n = r.read::<usize>();
    let m = r.read::<usize>();

    let mut result = vec![vec![0; m]; n];
    for row in 0..n {
        for col in 0..m {
            result[row][col] = r.read::<i32>();
        }
    }
    result
}

I wonder if it is somehow possible to simplify the API to just

let r = TextReader::from_stdin();

If I try to do it in the straightforward way, I run into the "store a value and a reference to it inside the same struct" problem...

That's a really interesting problem whose solution I would love to know.
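
One safe way to sidestep the self-reference problem is to store a byte offset into the owned String instead of a borrowed iterator. The sketch below is only that, a sketch: the `pos` field and the `from_string` constructor are names made up here, and it rescans from the cursor on every call rather than reusing SplitWhitespace.

```rust
use std::io::{self, Read};
use std::str::FromStr;

struct TextReader {
    source: String,
    // Byte offset of the first unread character in `source`.
    pos: usize,
}

impl TextReader {
    // Hypothetical helper, handy for tests and for non-stdin input.
    fn from_string(source: String) -> TextReader {
        TextReader { source, pos: 0 }
    }

    fn from_stdin() -> TextReader {
        let mut source = String::new();
        io::stdin()
            .read_to_string(&mut source)
            .expect("failed to read stdin");
        TextReader::from_string(source)
    }

    fn read<T: FromStr>(&mut self) -> T {
        // Skip leading whitespace, then find where the token ends.
        let rest = self.source[self.pos..].trim_start();
        let start = self.source.len() - rest.len();
        let len = rest.find(char::is_whitespace).unwrap_or(rest.len());
        if len == 0 {
            panic!("EOF reached");
        }
        self.pos = start + len;

        let token = &self.source[start..self.pos];
        match token.parse() {
            Ok(x) => x,
            Err(_) => panic!("Failed to parse {:?}", token),
        }
    }
}
```

No lifetime parameter is needed, so the reader can be returned from a function or stored in another struct freely. The cost is a little index arithmetic on each call.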

What we really need is for BufRead to implement split_at_whitespace(). It already has split(), which splits on a single specific byte and yields Vec<u8>s that can be passed to String::from_utf8() without additional allocations. Adding split_at_whitespace there would be far more memory efficient than loading a whole file into memory.
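
Until something like that exists, a line-buffered reader gets most of the memory benefit with today's BufRead. A minimal sketch, assuming that holding one line in memory at a time is acceptable; `Scanner`, `token` and `read_matrix_from` are names invented for this example, not std API:

```rust
use std::io::BufRead;
use std::str::FromStr;

struct Scanner<R> {
    reader: R,
    // The current line; only one line is held in memory at a time.
    line: String,
    // Byte offset of the first unread character in `line`.
    pos: usize,
}

impl<R: BufRead> Scanner<R> {
    fn new(reader: R) -> Self {
        Scanner { reader, line: String::new(), pos: 0 }
    }

    fn token<T: FromStr>(&mut self) -> T {
        loop {
            let rest = self.line[self.pos..].trim_start();
            if !rest.is_empty() {
                let start = self.line.len() - rest.len();
                let len = rest.find(char::is_whitespace).unwrap_or(rest.len());
                self.pos = start + len;
                let tok = &self.line[start..self.pos];
                return match tok.parse() {
                    Ok(v) => v,
                    Err(_) => panic!("Failed to parse {:?}", tok),
                };
            }
            // Current line is exhausted: reuse the buffer for the next one.
            self.line.clear();
            self.pos = 0;
            let bytes = self.reader.read_line(&mut self.line).expect("read failed");
            if bytes == 0 {
                panic!("EOF reached");
            }
        }
    }
}

// Usage sketch: works with stdin().lock(), a BufReader<File>, or a byte slice.
fn read_matrix_from<R: BufRead>(reader: R) -> Vec<Vec<i32>> {
    let mut sc = Scanner::new(reader);
    let n: usize = sc.token();
    let m: usize = sc.token();
    (0..n)
        .map(|_| (0..m).map(|_| sc.token::<i32>()).collect())
        .collect()
}
```

Calling `read_matrix_from(std::io::stdin().lock())` reads the matrix from standard input. The line buffer is reused between reads, so the only per-token work is the parse itself.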


I tried some ways to make (owned, self-borrowed) structs generically usable, but I found myself at a loss after various attempts. So I gave up and did the simple thing specialized for your case. TextReader below should be safely moveable into a different context than where you read your data to a string.

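struct TextReader {
    // Owns the text. `tokens` points into this String's heap buffer, so
    // `source` must never be mutated or replaced after construction.
    source: String,
    tokens: std::str::SplitWhitespace<'static>
}

impl TextReader {
    fn new(text: String) -> TextReader {
        // SAFETY: relies on the conditions listed under "Safety" below. The
        // &str points at the String's heap buffer, which does not move when
        // the String itself is moved into the struct. The 'static lifetime is
        // a lie, so `tokens` must never be handed out or outlive `source`.
        let tokens = unsafe {
            std::mem::transmute::<&str, &'static str>(text.as_str()).split_whitespace()
        };
        TextReader { source: text, tokens }
    }

    // ...
}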


Safety

  • This technique is safe only if moving the owned value does not invalidate the borrowed reference. Specifically, this is not safe if instead of String you had [T; N] or some other string representation using short-string optimization, since moving the TextReader struct would invalidate the (unsafe) borrow.
  • It is also important for safety that SplitWhitespace does not implement Drop (which it has no need to do here), since a Drop impl could touch the borrowed text after `source` has been freed.
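
The first point can be checked with a small experiment. This is only an illustration, and `Holder` and `heap_ptr_across_move` are hypothetical names: it records the address of a String's heap buffer, moves the String twice, and returns both addresses for comparison.

```rust
// Hypothetical struct standing in for TextReader's `source` field.
struct Holder {
    source: String,
}

/// Returns the address of the String's heap buffer before and after the
/// String is moved into a struct and that struct is moved into a Box.
fn heap_ptr_across_move(text: String) -> (*const u8, *const u8) {
    let before = text.as_ptr();

    // Each move copies only the (pointer, length, capacity) triple;
    // the heap buffer holding the characters is not touched.
    let holder = Holder { source: text };
    let boxed = Box::new(holder);

    let after = boxed.source.as_ptr();
    (before, after)
}
```

For a non-empty String the two addresses are equal, which is what the borrowed tokens rely on. An inline buffer such as [T; N] lives inside the struct itself, so its address follows the struct when it moves and a reference taken earlier would dangle.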