Correct way to take large input from the keyboard

Hello, I have been using Rust for quite a while and I do not have a C++ background. I am using the code below to take input from the terminal.
What is the best way to read such a large input many times? Also, I know only a little C++ and do not understand why the C++ program takes less time.

Input format:
10566000
100000 1
100000 1
100000 1
...
100000 1
(the "100000 1" line is repeated 10566000 times)
// cpp
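    #include <iostream>
    using namespace std;

    int main()
    {
        // Stop syncing with C stdio and stop flushing cout before every read
        ios_base::sync_with_stdio(false);
        cin.tie(NULL);
        int cases;  // `case` is a C++ keyword and cannot be used as a variable name
        cin >> cases;
        int c = 0;
        while (cases > 0)
        {
            int x, y;
            cin >> x >> y;
            c += 1;
            cases -= 1;
        }
        cout << c << endl;
    }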
//$ time bash v_cpp.sh
//  10566000
//  real    0m1.027s
//  user    0m0.993s
//  sys     0m0.083s

Original post by @kkroy22

// @kkroy22
    use std::io::{stdin, BufRead, BufReader};

    fn main() {
        let mut buf = BufReader::new(stdin()).lines();
        let mut case = buf.next().unwrap().unwrap().parse::<usize>().unwrap();
        let mut c = 0;
        while case > 0 {
            let xy = buf.next().unwrap().unwrap().split(" ").map(|x| x.parse::<usize>().unwrap()).collect::<Vec<usize>>();
            let _x = xy[0];
            let _y = xy[1];
            c += 1;
            case -= 1;
        }
        println!("{}", c);
    }
//$ time bash v_rust.sh
//  10566000
//  real    0m1.801s
//  user    0m1.790s
//  sys     0m0.094s

Optimization by @notoria

// @notoria
use std::io::{stdin, BufRead, BufReader};

fn main() {
    let s = stdin();
    let mut buf = BufReader::new(s.lock()).lines();
    let mut case = buf.next().unwrap().unwrap().parse::<usize>().unwrap();
    let mut c = 0;
    while case > 0 {
        let input = buf.next().unwrap().unwrap();
        let mut iter = input.split(' ');
        let _x: usize = iter.next().map(|x| x.parse::<usize>().unwrap()).unwrap();
        let _y: usize = iter.next().map(|x| x.parse::<usize>().unwrap()).unwrap();
        c += 1;
        case -= 1;
    }
    println!("{}", c);
}
//$ time bash v_rust.sh
//  10566000
//  real    0m0.777s
//  user    0m0.764s
//  sys     0m0.060s

C++-equivalent version by @Michael-F-Bryan

// @Michael-F-Bryan
use std::io::{stdin, BufRead, BufReader};

fn main() {
    let s = stdin();
    let mut reader = BufReader::new(s.lock());
    let mut line = String::new();
    reader.read_line(&mut line).unwrap();
    let mut case: usize = line.trim().parse().unwrap();
    let mut c = 0;
    while case > 0 {
        line.clear();
        reader.read_line(&mut line).unwrap();
        let mut words = line.trim().split(" ");
        let _x: usize = words.next().unwrap().parse().unwrap();
        let _y: usize = words.next().unwrap().parse().unwrap();
        c += 1;
        case -= 1;
    }
    println!("{}", c);
}
//$ time bash v_rust.sh
//  10566000
//  real    0m0.880s
//  user    0m0.852s
//  sys     0m0.080s

Thank you, community!


Use stdin().lock() to avoid locking on every read call. You can also create the BufReader with a larger buffer using BufReader::with_capacity; the default capacity is currently 8 KiB.
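A minimal sketch of both suggestions, assuming the input format from the first post. `count_pairs` is a hypothetical helper and the 64 KiB capacity is an arbitrary choice. As pointed out further down the thread, StdinLock is already buffered, so wrapping it in another BufReader mostly adds an extra copy; this only shows how the API calls fit together.

```rust
use std::io::{self, BufRead, BufReader, Read};

// Hypothetical helper: read the header count, then count the "x y" lines
// that follow. `capacity` is the size of the BufReader's buffer in bytes.
fn count_pairs<R: Read>(source: R, capacity: usize) -> io::Result<usize> {
    let mut reader = BufReader::with_capacity(capacity, source);
    let mut line = String::new();
    reader.read_line(&mut line)?;
    let cases = line
        .trim()
        .parse::<usize>()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;

    let mut count = 0;
    for _ in 0..cases {
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            break; // input ended before `cases` lines were read
        }
        count += 1;
    }
    Ok(count)
}

fn main() {
    // Lock stdin once for the whole run instead of once per read call.
    let stdin = io::stdin();
    let locked = stdin.lock();
    // 64 KiB buffer (an arbitrary choice); the BufReader default is 8 KiB.
    match count_pairs(locked, 64 * 1024) {
        Ok(n) => println!("{}", n),
        Err(e) => eprintln!("error: {}", e),
    }
}
```

Because `count_pairs` accepts any `Read`, it can also be exercised with a byte slice instead of real stdin.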

let _ = … collect::<Vec<usize>>(); 

is a total waste. You're allocating memory and dropping it. Use for_each if you want to iterate, but I don't get what the code is supposed to do, since the C++ version doesn't do anything like it.
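A small sketch of the for_each idea: each word on the line is parsed as it is visited, and no Vec is built. `sum_line` is a hypothetical name, and summing is only a placeholder for whatever you actually do with each value.

```rust
// Parse every whitespace-separated number on one line without collecting
// them into a Vec; `for_each` consumes the iterator word by word.
fn sum_line(line: &str) -> usize {
    let mut sum = 0;
    line.split_whitespace()
        .for_each(|word| sum += word.parse::<usize>().expect("not a number"));
    sum
}

fn main() {
    // One line of the benchmark input.
    println!("{}", sum_line("100000 1"));
}
```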

If you want to keep values from the while loop, use Vec::with_capacity() and then vec.push, or vec.extend(buf.filter_map(|x| x.unwrap().parse().ok())) or similar.
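A sketch of the Vec::with_capacity approach, assuming you do want to keep every (x, y) pair. It uses push in a loop rather than the extend/filter_map variant, and `read_pairs` plus the choice of u32 for the values are assumptions for illustration. Note that a bogus, very large header count would make the up-front reservation fail.

```rust
use std::io::{self, BufRead};

// Read the header count, then store every (x, y) pair in a Vec whose
// capacity is reserved up front, so pushing never reallocates.
// Returns None on any I/O or parse problem.
fn read_pairs<R: BufRead>(mut reader: R) -> Option<Vec<(u32, u32)>> {
    let mut line = String::new();
    reader.read_line(&mut line).ok()?;
    let n: usize = line.trim().parse().ok()?;

    let mut pairs = Vec::with_capacity(n);
    for _ in 0..n {
        line.clear(); // reuse the same String for every line
        reader.read_line(&mut line).ok()?;
        let mut words = line.split_whitespace();
        let x: u32 = words.next()?.parse().ok()?;
        let y: u32 = words.next()?.parse().ok()?;
        pairs.push((x, y));
    }
    Some(pairs)
}

fn main() {
    let stdin = io::stdin();
    match read_pairs(stdin.lock()) {
        Some(pairs) => println!("{}", pairs.len()),
        None => eprintln!("malformed input"),
    }
}
```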


Actually, here is the scenario:
The program above just reads the input and counts the number of times it goes through the while loop.

I just tried locking stdin():

use std::io::{stdin, BufRead, BufReader};

fn main() {
    let s = stdin();
    let mut buf = BufReader::new(s.lock()).lines();
    let mut case = buf.next().unwrap().unwrap().parse::<usize>().unwrap();
    let mut c = 0;
    while case > 0 {
        let _ = buf.next().unwrap().unwrap().split(" ").map(|x| x.parse::<usize>().unwrap()).collect::<Vec<usize>>();
        //let x = xy[0];
        //let y = xy[1];
        c += 1;
        case -= 1;
    }
    println!("{}", c);
}

However, I did not get any improvement in execution time. I think allocating and deallocating memory inside the while loop, along with the way I split on the space character, is not a good idea.

Try this:

use std::io::{stdin, BufRead, BufReader};

fn main() {
    let s = stdin();
    let mut buf = BufReader::new(s.lock()).lines();
    let mut case = buf.next().unwrap().unwrap().parse::<usize>().unwrap();
    let mut c = 0;
    while case > 0 {
        let input = buf.next().unwrap().unwrap();
        let mut iter = input.split(' ');
        let _x: usize = iter.next().map(|x| x.parse::<usize>().unwrap()).unwrap();
        let _y: usize = iter.next().map(|x| x.parse::<usize>().unwrap()).unwrap();
        c += 1;
        case -= 1;
    }
    println!("{}", c);
}

This would be an equivalent program which only uses the one buffer (adapted to run on the playground).

use anyhow::{Context, Error};
use std::io::{self, BufRead, BufReader, Read};

fn count_lines<R: Read>(reader: R) -> Result<(), Error> {
    let mut reader = BufReader::new(reader);
    let mut line = String::new();

    // read the line count (not really necessary if you are just reading to the
    // end of input)
    reader.read_line(&mut line)?;
    let mut case: usize = line.trim().parse()?;

    // Note: this is also just "case" but counting up instead of down
    let mut c = 0;

    while case > 0 {
        // make sure our line buffer is empty
        line.clear();
        reader.read_line(&mut line)?;

        let mut words = line.trim().split(" ");
        let x: usize = words.next().context("No first word")?.parse()?;
        let y: usize = words.next().context("No second word")?.parse()?;

        c += 1;
        case -= 1;
    }

    println!("{}", c);

    Ok(())
}

fn main() -> Result<(), Error> {
    let input = r#"8
                   100000 1
                   100000 1
                   100000 1
                   100000 1
                   100000 1
                   100000 1
                   100000 1
                   100000 1"#;

    count_lines(input.as_bytes())?;

    Ok(())
}


I'm guessing this is a simplification of your real application because x and y never get used, and case and c are actually just the same thing counting down instead of up. Similarly, the first line indicating the number of entries isn't necessary if you just read until EOF.


.lines() will generate a sequence of Strings, each allocated on the heap. You can avoid this by reusing a single String, like this:

    let mut input_line = String::new();

    loop {
        input_line.clear();
        match buf.read_line(&mut input_line) {
            Ok(0) => break,     // quit the loop, EOF
            Ok(_) => {}         // process the line
            Err(_) => continue, // ignore the line and read the next one. Line is probably non-utf8
        }

        // process `input_line`
        // ...
    }
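A runnable version of that loop, as a sketch. `count_until_eof` is a hypothetical name, and one behaviour differs from the fragment: only invalid-UTF-8 errors (ErrorKind::InvalidData) are skipped, while any other I/O error stops the loop, because `Err(_) => continue` would spin forever on an error that keeps repeating. This also shows the "read until EOF" idea, so the header count is not needed.

```rust
use std::io::{self, BufRead};

// Count lines until EOF, reusing one String for every line instead of
// letting `.lines()` allocate a fresh String each time.
fn count_until_eof<R: BufRead>(mut reader: R) -> usize {
    let mut input_line = String::new();
    let mut count = 0;
    loop {
        input_line.clear();
        match reader.read_line(&mut input_line) {
            Ok(0) => break, // EOF
            Ok(_) => {}     // got a line, process it below
            // Not valid UTF-8: the bytes were consumed, so skip to the next line.
            Err(e) if e.kind() == io::ErrorKind::InvalidData => continue,
            Err(_) => break, // other I/O error: stop instead of retrying forever
        }
        // process `input_line` here; this sketch only counts it
        count += 1;
    }
    count
}

fn main() {
    let stdin = io::stdin();
    let lines = count_until_eof(stdin.lock());
    // The first line is the header count, so leave it out of the total.
    println!("{}", lines.saturating_sub(1));
}
```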

Hello, I think this code now performs better than the C++ code. Here is the timing for your version of the code:

$ time bash v_rust.sh
10566000
real    0m0.777s
user    0m0.764s
sys     0m0.060s

Thank you very much

You are welcome. I think @Michael-F-Bryan's and @qaopm's solutions/ideas should make it even faster.

Hello,
I changed the syntax while keeping your method and found it to be faster than the C++ version. I also loved seeing the more advanced version of my program using the anyhow crate. Thank you very much.

use std::io::{stdin, BufRead, BufReader};

fn main() {
    let s = stdin();
    let mut reader = BufReader::new(s.lock());
    let mut line = String::new();
    reader.read_line(&mut line).unwrap();
    let mut case: usize = line.trim().parse().unwrap();
    let mut c = 0;
    while case > 0 {
        line.clear();
        reader.read_line(&mut line).unwrap();
        let mut words = line.trim().split(" ");
        let _x: usize = words.next().unwrap().parse().unwrap();
        let _y: usize = words.next().unwrap().parse().unwrap();
        c += 1;
        case -= 1;
    }
    println!("{}", c);
}

//$ time bash v_rust.sh
//  10566000
//  real    0m0.880s
//  user    0m0.852s
//  sys     0m0.080s

As a summary of the optimisations we've used:

  • Lock stdin at the start of the process so we don't need to re-lock it on every read
  • Use a buffered reader so we read more input less frequently (Edit: this actually isn't needed because std::io::StdinLock is already buffered and implements BufRead)
  • Reuse the buffer we write each line into (line)
  • Instead of collect()-ing the parsed words into a new Vec<usize> which is immediately thrown away, manually iterate over line.split() by calling the iterator's next() method
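A sketch that combines the points in that list, assuming the same input format: stdin is locked once and the StdinLock is passed in directly (no extra BufReader, since it already implements BufRead), one `line` buffer is reused, and the two numbers are pulled out with `next()`. The helper name `count_pairs` and the Box<dyn Error> error type are choices made for this sketch.

```rust
use std::error::Error;
use std::io::{self, BufRead};

// Works with any BufRead; main passes the locked stdin straight in.
fn count_pairs<R: BufRead>(mut input: R) -> Result<usize, Box<dyn Error>> {
    let mut line = String::new();
    input.read_line(&mut line)?;
    let cases: usize = line.trim().parse()?;

    let mut count = 0;
    for _ in 0..cases {
        line.clear(); // reuse the same buffer for every line
        input.read_line(&mut line)?;
        // Take the two words one at a time instead of collect()-ing a Vec.
        let mut words = line.split_ascii_whitespace();
        let _x: usize = words.next().ok_or("missing x")?.parse()?;
        let _y: usize = words.next().ok_or("missing y")?.parse()?;
        count += 1;
    }
    Ok(count)
}

fn main() {
    let stdin = io::stdin();
    match count_pairs(stdin.lock()) {
        Ok(n) => println!("{}", n),
        Err(e) => eprintln!("error: {}", e),
    }
}
```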

std::io::stdin() is already buffered. Re-buffering it only increases the number of memcpy calls.

Edit: this behavior is documented.


That's a good point.

It looks like the StdinLock we get when locking stdin also implements BufRead, so we can drop the intermediate BufReader and just call locked_stdin.read_line() directly. That should cut the number of copies in half.


Oh, I didn't know about this. I wish Clippy had a lint for this. I created an issue: "BufReader Stdin or StdinLock" (rust-lang/rust-clippy issue #6755 on GitHub).


I suggest reading all input at once into a String and then parsing the numbers.
It will be faster, but the downside is that it has to reserve memory for the whole input (~91 MB).

use std::{io, io::prelude::*};

fn main() {
    let stdin = io::stdin();
    let mut buf = String::new();
    stdin.lock().read_to_string(&mut buf).unwrap();

    let mut tokens = buf.split_ascii_whitespace();
    let n: usize = tokens.next().unwrap().parse().unwrap();

    let x: Vec<_> = tokens.take(n*2).map(|s| s.parse::<i32>().unwrap()).collect();
    println!("{} {}", n, x.len());
}
$ time target/release/kbdread < input.dat 
10566000 21132000

real    0m0,492s
user    0m0,319s
sys     0m0,172s
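A variation on the same idea, not from the thread and not benchmarked here: read the raw bytes with read_to_end and parse the digits by hand, which skips UTF-8 validation and str::parse. `parse_all` is a hypothetical helper; it treats any non-digit byte as a separator, handles no signs, and does not check for overflow, so it only suits trusted, digits-and-whitespace input like this benchmark. Unlike the version above, it prints the count of all numbers after the header rather than taking exactly n*2.

```rust
use std::io::{self, Read};

// Parse unsigned decimal integers straight out of a byte buffer.
// Any non-digit byte ends the current number; no sign or overflow handling.
fn parse_all(bytes: &[u8]) -> Vec<u64> {
    let mut out = Vec::new();
    let mut current: Option<u64> = None;
    for &b in bytes {
        if b.is_ascii_digit() {
            let d = u64::from(b - b'0');
            current = Some(current.unwrap_or(0) * 10 + d);
        } else if let Some(v) = current.take() {
            out.push(v); // a separator finishes the number in progress
        }
    }
    if let Some(v) = current {
        out.push(v); // number at the very end of the input
    }
    out
}

fn main() {
    let mut bytes = Vec::new();
    io::stdin().lock().read_to_end(&mut bytes).expect("failed to read stdin");
    let numbers = parse_all(&bytes);
    // First number is the pair count; everything after it is the pairs.
    if let Some((&n, rest)) = numbers.split_first() {
        println!("{} {}", n, rest.len());
    }
}
```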
