Vector of str from stdin - possible?

Hello everyone,

I want to create a vector of str (not String) elements. All the data will come from io::stdin().
Every element has the same size.
push works fine for String but not for str.
Is it possible to achieve this?

&T is a reference to T. &str is a reference to a str, which is a contiguous region of UTF-8 encoded bytes. For a reference to exist, something else must own the value for the entire lifetime of the reference. In this case, who will own the strs?

I know, but how does this apply to my question?
Here is an example of what I want to do:
Input:
A022
A045
A458

Now I want to put this into a vector of str. I know that the length of each element will always be 4 bytes.
So where is the problem? Is it in the way the push function works?

You are trying to use a borrowed type in a place that requires ownership. Use an owned type instead. You can use String, or something like [u8; 4].
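As a rough sketch of the owned, non-allocating option, the input lines can be copied into `[u8; 4]` values so that the vector itself owns the bytes. The helper name `parse_codes` and the error message are made up for illustration, and the sketch assumes every line is exactly 4 bytes long.

```rust
// Hypothetical helper: copy each fixed-width line into an owned 4-byte array.
fn parse_codes(input: &str) -> Result<Vec<[u8; 4]>, String> {
    input
        .lines()
        .map(|line| {
            let bytes = line.as_bytes();
            // Assumption from the thread: every element is exactly 4 bytes.
            if bytes.len() != 4 {
                return Err(format!("expected 4 bytes, got {:?}", line));
            }
            let mut code = [0u8; 4];
            code.copy_from_slice(bytes);
            Ok(code)
        })
        .collect()
}

fn main() {
    let codes = parse_codes("A022\nA045\nA458\n").expect("every line should be 4 bytes");
    for code in &codes {
        // Each array holds a whole line taken from a &str, so it is valid UTF-8.
        println!("{}", std::str::from_utf8(code).unwrap());
    }
}
```

The vector stores the arrays inline, so there is one allocation for the vector rather than one per element.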

I know how to use String, but its performance isn't satisfying for me :wink:
So is there no way to put data from stdin into a vector of str values?

To have a &str generated at runtime, you must have an owner such as a String somewhere. A &str only borrows; it can't store anything itself.

Are you using the --release flag when you compile?

That's not the problem. To be precise, it works quite fast, but I know it could still be faster, so I am looking for a way to improve it :slight_smile:

If you want an owned type that doesn't allocate, it will have to be something like [u8; 4].

Thank you, I will check whether I can use something like this for my purposes. I didn't know this could also contain letters.

I mean, a string is just a sequence of bytes that happen to be valid UTF-8. You may find a wrapper like this one useful:

use std::ops::Deref;
use std::fmt;

#[derive(Copy, Clone)]
struct FourByteString {
    inner: [u8; 4],
}

impl FourByteString {
    pub fn new(s: &str) -> Self {
        if s.len() != 4 {
            panic!("Invalid length");
        }
        let mut inner = [0; 4];
        inner.copy_from_slice(s.as_bytes());
        Self {
            inner,
        }
    }
    
    pub fn as_str(&self) -> &str {
        std::str::from_utf8(&self.inner).unwrap()
    }
}

impl Deref for FourByteString {
    type Target = str;
    fn deref(&self) -> &str {
        self.as_str()
    }
}
impl fmt::Display for FourByteString {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Display::fmt(self.as_str(), f)
    }
}
impl fmt::Debug for FourByteString {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Debug::fmt(self.as_str(), f)
    }
}

fn takes_str(a: &str) {
    println!("{}", a);
}

fn main() {
    let fbs = FourByteString::new("abcd");
    
    // This call works because of the Deref impl.
    takes_str(&fbs);

    // This call works because of the Display impl.
    println!("{}", fbs);
}

Thank you, I will do some more research and check this later. I hope this will be helpful; if not, I will find another way to improve my code.

What are you comparing it against? For a micro-benchmark like this there isn't much room for optimisation.

  1. Read all input into a sufficiently large buffer at the very start
  2. Use your knowledge that the input string is ASCII to unsafe-ly skip UTF-8 validation when yielding lines
  3. Use your knowledge about the input format (every line is 4 ASCII characters followed by a newline) to let you calculate the number of lines and their position in O(1)

All of this can be done by making a wrapper around a Vec<u8>, but by that point your solution will be so customised for this particular use case that you can't use it elsewhere.
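A minimal sketch of what such a wrapper might look like is below. The type `FixedLines` and the constants are hypothetical names, and the sketch assumes every record is 4 ASCII bytes followed by a single `\n`, including the last one. It validates the layout once up front, which is what makes the later unchecked conversion sound. In a real program the buffer would come from `read_to_end` on stdin rather than a literal.

```rust
const RECORD_LEN: usize = 4; // assumed width of every line
const STRIDE: usize = RECORD_LEN + 1; // record plus its trailing newline

/// Hypothetical wrapper around a buffer of fixed-width ASCII lines.
struct FixedLines {
    buffer: Vec<u8>,
}

impl FixedLines {
    /// Checks the layout once so that lookups can skip UTF-8 validation.
    fn new(buffer: Vec<u8>) -> Option<Self> {
        let layout_ok = buffer.len() % STRIDE == 0
            && buffer
                .chunks_exact(STRIDE)
                .all(|chunk| chunk[RECORD_LEN] == b'\n' && chunk[..RECORD_LEN].is_ascii());
        if layout_ok {
            Some(Self { buffer })
        } else {
            None
        }
    }

    /// Number of lines, computed in O(1) from the buffer length.
    fn len(&self) -> usize {
        self.buffer.len() / STRIDE
    }

    /// Line at `index`, located in O(1) by multiplication.
    fn get(&self, index: usize) -> Option<&str> {
        if index >= self.len() {
            return None;
        }
        let start = index * STRIDE;
        let bytes = &self.buffer[start..start + RECORD_LEN];
        // SAFETY: `new` verified that every record is ASCII, which is valid UTF-8.
        Some(unsafe { std::str::from_utf8_unchecked(bytes) })
    }
}

fn main() {
    // Stand-in for data read from stdin.
    let input = b"A022\nA045\nA458\n".to_vec();
    match FixedLines::new(input) {
        Some(lines) => {
            for i in 0..lines.len() {
                println!("{}", lines.get(i).unwrap());
            }
        }
        None => eprintln!("input does not match the fixed-width layout"),
    }
}
```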

The "best" solution which still works for arbitrary inputs would be something like this:

use std::{
    error::Error, str,
    io::{self, Read},
};

fn main() -> Result<(), Box<dyn Error>> {
    let mut stdin = io::stdin();

    let mut buffer = Vec::new();
    stdin.read_to_end(&mut buffer)?;

    let lines: Vec<&str> = str::from_utf8(&buffer)?.lines().collect();

    Ok(())
}

The idea of using a buffer seems great. The question is how I can print it quickly to stdout.

I tried your code like this:

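use std::{
    error::Error,
    io::{self, Read},
};

fn main() -> Result<(), Box<dyn Error>> {
    let mut stdin = io::stdin();
    let mut buffer = Vec::new();
    stdin.read_to_end(&mut buffer)?;
    // Each `i` is a `&u8`, so this prints numeric byte values, not characters.
    for i in buffer.iter() {
        print!("{}", i);
    }
    Ok(())
}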

Printing is quite slow even this way, without converting the numbers to the equivalent characters.
For 10 million lines of input, performance was similar to the construction below:

use std::{
    error::Error,
    io::{self, Read},
    str,
};

fn main() -> Result<(), Box<dyn Error>> {
    let mut stdin = io::stdin();

    let mut buffer = Vec::new();
    stdin.read_to_end(&mut buffer)?;
    // Skips UTF-8 validation; only sound if the input really is valid UTF-8.
    let lines = unsafe { str::from_utf8_unchecked(&buffer) };
    let tests: Vec<&str> = lines.split("\n").collect();
    for i in tests.iter() {
        println!("{}", i);
    }
    Ok(())
}

Do you have any hints on how I could improve printing to stdout?

println! locks stdout every call, so when calling it in a loop it's slow. You can try acquiring the lock manually.
https://nnethercote.github.io/perf-book/io.html
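For illustration, acquiring the lock manually could look roughly like this. The helper `print_lines` is a made-up name; it is generic over `Write` only so that it can also be exercised with an in-memory buffer.

```rust
use std::io::{self, Write};

// Hypothetical helper: writes each line followed by a newline to any writer.
fn print_lines<W: Write>(out: &mut W, lines: &[&str]) -> io::Result<()> {
    for line in lines {
        writeln!(out, "{}", line)?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    // Take the lock once up front instead of once per println! call.
    let mut handle = stdout.lock();
    print_lines(&mut handle, &["A022", "A045", "A458"])
}
```

Note that the locked handle is still line buffered, so this removes the repeated locking but not the flush on every newline.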

Good to know. Unfortunately, acquiring the lock manually makes no difference in performance. I just tried the code from your link.

Try wrapping stdout in a BufWriter

Could you explain this in more detail?
Do you mean that I should create a new BufWriter and collect the data from stdin in it?

use std::io::{self, BufWriter, Write, Error};

fn main() -> Result<(), Error> {
  let input: String = ...;

  // Wrap stdout in a buffered writer
  let stdout = io::stdout();
  let mut writer = BufWriter::new(stdout);

  // Write your output
  for line in input.lines() {
    writeln!(writer, "{}", line)?;
  }

  // Then flush the buffered writer so it's seen on the screen
  writer.flush()?;
  
  Ok(())
}

Every write that reaches stdout is a call into the kernel, which can be quite expensive. Additionally, Rust's stdout is line buffered by default, so every time it sees a newline it flushes the output to whoever is reading your process's stdout (the terminal).

By wrapping a writer (anything that implements std::io::Write) in a BufWriter, each call to the buffered writer's write() method will just copy the bytes into an internal Vec<u8>. It will then periodically write that data to the wrapped writer in bulk, so the result is that you do a couple of large writes (e.g. 4k) instead of hundreds of little writes (e.g. 5 bytes - the 4 byte input and a newline).

Processors are really good at copying bytes around and making a round trip into the kernel when you write to stdout is relatively expensive (it has to save registers and switch to the kernel stack, find the right handle/pipe for stdout, possibly notify the reading process that data is available, etc.) so doing writes in bulk helps to cut down on the write call's overhead.
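To make the effect visible without measuring time, here is a small sketch that counts how many writes reach the underlying writer with and without a BufWriter. `CountingWriter` is a hypothetical stand-in for stdout, and the 4096-byte capacity is chosen only for the example.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical stand-in for stdout that counts calls to `write`.
struct CountingWriter {
    calls: usize,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        Ok(buf.len()) // pretend every byte was accepted
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes `n` five-byte records directly and returns the number of writes.
fn direct_calls(n: usize) -> io::Result<usize> {
    let mut sink = CountingWriter { calls: 0 };
    for _ in 0..n {
        sink.write_all(b"A022\n")?;
    }
    Ok(sink.calls)
}

/// Writes the same records through a BufWriter and returns the number of
/// writes that actually reached the wrapped writer.
fn buffered_calls(n: usize) -> io::Result<usize> {
    let mut writer = BufWriter::with_capacity(4096, CountingWriter { calls: 0 });
    for _ in 0..n {
        writer.write_all(b"A022\n")?;
    }
    writer.flush()?;
    Ok(writer.get_ref().calls)
}

fn main() -> io::Result<()> {
    println!("direct:   {} write calls", direct_calls(1000)?);
    println!("buffered: {} write calls", buffered_calls(1000)?);
    Ok(())
}
```

With a real stdout each of those underlying writes would be a system call, which is where the saving comes from.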

If I have a vector of structures like this:

    struct JustForShow {
        el1: String,
        el2: u64,
        el3: String,
    }

and I want to concatenate all the strings from el3 into one long String, is there a better way to achieve this than using push_str in a loop?
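For reference, two ways this is commonly written are sketched below: a push_str loop that reserves the full capacity first, and an iterator that collects into a String. The function names are made up for illustration, and whether either is measurably faster for a given input is something to benchmark rather than assume.

```rust
#[allow(dead_code)]
struct JustForShow {
    el1: String,
    el2: u64,
    el3: String,
}

// Hypothetical helper: push_str in a loop, with the capacity reserved up front.
fn concat_el3(items: &[JustForShow]) -> String {
    // Summing the lengths first lets the result be allocated once.
    let total: usize = items.iter().map(|item| item.el3.len()).sum();
    let mut joined = String::with_capacity(total);
    for item in items {
        joined.push_str(&item.el3);
    }
    joined
}

// Hypothetical helper: the same result via String's FromIterator<&str> impl.
fn concat_el3_collect(items: &[JustForShow]) -> String {
    items.iter().map(|item| item.el3.as_str()).collect()
}

fn main() {
    let items = vec![
        JustForShow { el1: String::from("x"), el2: 1, el3: String::from("A022") },
        JustForShow { el1: String::from("y"), el2: 2, el3: String::from("A045") },
    ];
    println!("{}", concat_el3(&items));
    println!("{}", concat_el3_collect(&items));
}
```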