Recommended way to create struct without a lot of duplication

Hello,

I am fairly new to Rust and I'd like to learn what is the best/recommended way to read struct fields (from a file, for example) without a lot of memory duplication.

Let me exemplify what I mean. Take the following C++ code:

struct foo {
  int bar;
  string baz;
};

foo make_foo_from_stdin() {
  foo f;
  cin >> f.bar >> f.baz;
  return f;
}

We're reading a foo instance directly from stdin, with no intermediate variables.

In Rust, how could we do this? Is there a way?
The best I know would be something along the lines of:
(not an exact replica because of C++'s streams processing of whitespace and all that)

struct Foo {
  bar: i32,
  baz: String,
}

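fn make_foo_from_stdin() -> std::io::Result<Foo> {
  let mut buf = String::new();
  let stdin = std::io::stdin();
  stdin.read_line(&mut buf)?;
  // parse() returns a Result; trim the trailing newline and map the parse error to an io::Error
  let bar = buf
    .trim()
    .parse::<i32>()
    .map_err(|e| std::io::Error::new(std::io::ErrorKind::InvalidData, e))?;
  buf.clear();
  stdin.read_line(&mut buf)?;

  Ok(Foo { bar, baz: buf })
}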

This is harder to read, harder to maintain (we possibly have to duplicate a lot of the struct's fields into local variables) and less efficient (we possibly have to duplicate a lot of memory).

I'm sure I'm doing something very wrong here, so I'd love to know how to properly do something like this in Rust.

Thanks in advance :)

First of all, you should not worry about “duplicate a lot of the struct's fields into local variables”. The compiler is good at eliminating unnecessary memory copies, and even a few extra copies are likely to be unnoticeable unless you are writing a bulk data processing program that works on very large inputs.
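To make the point about copies concrete: independently of any optimization, moving a String into a struct field transfers ownership of its existing heap buffer instead of duplicating the text. Here is a minimal sketch; `foo_from_parts` is a hypothetical helper name used only for illustration.

```rust
struct Foo {
    bar: i32,
    baz: String,
}

// Hypothetical helper: builds a Foo from values that were read elsewhere.
// `baz` is moved into the struct, so only the String's pointer, length and
// capacity are transferred; the heap allocation holding the text is reused.
fn foo_from_parts(bar: i32, baz: String) -> Foo {
    Foo { bar, baz }
}
```

Comparing `as_ptr()` on the String before the call with `foo.baz.as_ptr()` afterwards gives the same address, which is one way to convince yourself that no text was copied.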

Second, there are a lot of ways in which the cin >> C++ approach as you've written it is actually worse. For example, if the input is not a number (including if EOF is encountered), it will (if I remember correctly; it's been years since I touched C++ IO) either fill the field with 0 or leave the field uninitialized, depending on the C++ version and on how the read fails, instead of immediately signaling an error.

If you're doing a lot of converting string input into fields, you should probably use a parsing library (if it's a custom syntax for each field) or a deserialization library (if it's a uniform syntax like JSON). Yes, it's more work up front, but it means you are more precisely specifying the behavior of your program and in particular its response to incorrect input.
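As a rough illustration of what "precisely specifying the response to incorrect input" can look like without any external crate, here is a sketch that implements the standard `FromStr` trait for `Foo`. It assumes a hypothetical one-line format of the form `<integer> <word>`, separated by whitespace. The `ParseFooError` type and its variants are made up for this example, and a real parsing or deserialization library would give you much of this for free.

```rust
use std::fmt;
use std::num::ParseIntError;
use std::str::FromStr;

#[derive(Debug, PartialEq)]
struct Foo {
    bar: i32,
    baz: String,
}

// Hypothetical error type: each variant names one way the input can be wrong.
#[derive(Debug, PartialEq)]
enum ParseFooError {
    MissingBar,
    InvalidBar(ParseIntError),
    MissingBaz,
}

impl fmt::Display for ParseFooError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseFooError::MissingBar => write!(f, "missing integer field `bar`"),
            ParseFooError::InvalidBar(e) => write!(f, "invalid `bar`: {}", e),
            ParseFooError::MissingBaz => write!(f, "missing text field `baz`"),
        }
    }
}

impl std::error::Error for ParseFooError {}

impl FromStr for Foo {
    type Err = ParseFooError;

    // Assumed format: "<integer> <word>"; any further tokens are ignored.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let mut parts = s.split_whitespace();
        let bar = parts
            .next()
            .ok_or(ParseFooError::MissingBar)?
            .parse::<i32>()
            .map_err(ParseFooError::InvalidBar)?;
        let baz = parts.next().ok_or(ParseFooError::MissingBaz)?.to_string();
        Ok(Foo { bar, baz })
    }
}
```

With this in place, `"42 hello".parse::<Foo>()` yields a `Foo`, while empty input, a missing second field, or a non-numeric first field each produce a distinct error value instead of a half-initialized struct.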

You can also write your own “miniature parsing library” to suit your specific needs. Your function becomes almost as short as the C++ one with a couple of helpers:

use std::io;
use std::str::FromStr;

struct Foo {
    bar: i32,
    baz: String,
}

fn make_foo_from_stdin() -> io::Result<Foo> {
    Ok(Foo {
        bar: read_and_parse()?,
        baz: read_line()?,
    })
}

fn read_line() -> Result<String, io::Error> {
    let mut buf = String::new();
    io::stdin().read_line(&mut buf)?;
    Ok(buf)
}

fn read_and_parse<T: FromStr>() -> Result<T, io::Error>
where
    T::Err: std::error::Error + Send + Sync + 'static, // needed to wrap as io::Error
{
    read_line()?
        .trim() // read_line keeps the trailing newline, which would make parse() fail
        .parse()
        .map_err(|e| io::Error::new(io::ErrorKind::Other, e))
}

(Of course, as written, this allocates more strings than necessary. That could be fixed by making read_line and read_and_parse methods of a struct which holds a buffer to be reused, handing it off and making a new one whenever an actual String is wanted.)
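The buffer-reusing variant described in that note could look roughly like the following. This is only a sketch: `LineReader`, its field names and the `fill` method are hypothetical names chosen for illustration. It is generic over `BufRead`, so it can be tried on in-memory input as well as on stdin.

```rust
use std::io::{self, BufRead};
use std::str::FromStr;

// Hypothetical helper type: wraps a buffered reader and owns one reusable line buffer.
struct LineReader<R> {
    input: R,
    buf: String,
}

impl<R: BufRead> LineReader<R> {
    fn new(input: R) -> Self {
        LineReader { input, buf: String::new() }
    }

    // Reads the next line into the reused buffer, replacing its old contents.
    // Reaching end of input is reported as an error rather than an empty line.
    fn fill(&mut self) -> io::Result<()> {
        self.buf.clear();
        if self.input.read_line(&mut self.buf)? == 0 {
            return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "no more input"));
        }
        Ok(())
    }

    // Parses the next line. No String is handed out, so the buffer
    // (and its allocation) stays with the reader for the next call.
    fn read_and_parse<T: FromStr>(&mut self) -> io::Result<T>
    where
        T::Err: std::error::Error + Send + Sync + 'static,
    {
        self.fill()?;
        self.buf
            .trim()
            .parse()
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
    }

    // Hands the filled buffer to the caller and leaves a fresh, empty String behind.
    fn read_line(&mut self) -> io::Result<String> {
        self.fill()?;
        Ok(std::mem::take(&mut self.buf))
    }
}
```

For stdin you would construct it as `LineReader::new(io::stdin().lock())` and then write `Foo { bar: reader.read_and_parse()?, baz: reader.read_line()? }`. Note that `read_line` still returns the line with its trailing newline, as in the version above.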

Right, makes sense I suppose.

It sets off an exception, as far as I'm aware.

It feels a bit over-the-top to pull in a dependency that is most likely big just to parse a small file, but yeah, for bigger stuff it would make sense.

This looks way better than what I was doing, thanks!

But you really shouldn't invent your own format. If you use something common like JSON, then it will be much easier for others to interact with whatever code/data you generate from the serialized representation.

This is simply not true. Others have already commented on the memory use fallacy. The C++ iostream methods actually behave worse than Rust's approximate equivalents. They sometimes silently ignore errors, sometimes they behave counter-intuitively (you can only read a string up to the first whitespace using >>), or downright lead to undefined behavior (if the int overload fails, it can leave the variable unmodified, so you end up reading an uninitialized value, which is instant UB).

The Rust code is not longer just for the sake of being longer. You should not assume that Rust is deliberately trying to be painful or hard to maintain. It's the opposite – Rust is being explicit, making the code easier to read and harder to write incorrectly.
