File I/O beyond strings

I'm plodding through Rust By Example [RBE] and I have a question about the file I/O library. RBE's examples are limited (reasonably so; this is not a complaint) to writing and reading lines of text as strings. Where is a good resource for learning about binary I/O? Likewise for built-in types, such as f64? (I've been doing printf("%10.6f"... for so long I've almost forgotten life without it.) While awaiting your response, I'll just keep using string formatting as a workaround.

printf() is still text I/O. If you mean "formatted" I/O, that's performed by the write!() family of macros. For printing a float, since f64 implements Display, you could do

writeln!(file, "{}", the_float)?;
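
If you want the printf("%10.6f") layout specifically, the format string takes a width and a precision. Here is a minimal sketch; write_fixed is a hypothetical helper name I made up for illustration, and I'm writing into a Vec<u8> instead of a file to keep it self-contained:

```rust
use std::io::{self, Write};

// Hypothetical helper: writes `value` roughly like C's printf("%10.6f\n", value).
fn write_fixed<W: Write>(writer: &mut W, value: f64) -> io::Result<()> {
    // `{:10.6}` means: minimum width 10, 6 digits after the decimal point.
    // Numbers are right-aligned by default, as with printf.
    writeln!(writer, "{:10.6}", value)
}

fn main() -> io::Result<()> {
    // Any `Write` implementor works here, including a `std::fs::File`.
    let mut out = Vec::new();
    write_fixed(&mut out, 1.0 / 3.0)?;
    print!("{}", String::from_utf8_lossy(&out));
    Ok(())
}
```

The same width and precision syntax works in println!() and format!() as well.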

Thanks. I'm embarrassed to have asked that part of my question. I had read that in RBE but it didn't "stick".

I think this area is broad enough that you won't find any one answer.

Generally, you'll write code against the std::io::Read and std::io::Write traits directly. The byteorder crate has a useful byteorder::WriteBytesExt trait that can be used to write various types into something implementing std::io::Write. There's also a byteorder::ReadBytesExt extension trait that does the same thing for reading.

You can chain this together to implement your own quick'n'dirty file format. For example, say we had the following Person type:

use byteorder::{ReadBytesExt, WriteBytesExt, BE};
use std::io::{Error, ErrorKind, Read, Result, Write, Cursor};

#[derive(Debug)]
struct Person {
    name: String,
    age: u32,
    height: f32,
}

I could write each field to a writer as binary:

fn save_person<W>(writer: &mut W, person: &Person) -> Result<()>
where
    W: Write,
{
    // Save a `Person` struct by writing each of its fields one-by-one
    write_string(writer, &person.name)?;
    writer.write_u32::<BE>(person.age)?;
    writer.write_f32::<BE>(person.height)?;

    Ok(())
}

fn write_string<W>(writer: &mut W, s: &str) -> Result<()>
where
    W: Write,
{
    // Prefix the string with its length so the reader knows ahead of time how
    // large a buffer to allocate, rather than either reading byte-by-byte
    // until a null byte (C-style) or accidentally reading too much.
    // A string longer than u32::MAX bytes is reported as an error, not a panic.
    let length = u32::try_from(s.len())
        .map_err(|e| Error::new(ErrorKind::InvalidInput, e))?;
    writer.write_u32::<BE>(length)?;
    // Now that the length field is written, write the string's bytes as-is.
    writer.write_all(s.as_bytes())?;
    Ok(())
}

Reading it is essentially the same. We just need to read the fields in the same order they were written, otherwise we'll see garbage.

fn read_person<R>(reader: &mut R) -> Result<Person>
where
    R: Read,
{
    let name = read_string(reader)?;
    let age = reader.read_u32::<BE>()?;
    let height = reader.read_f32::<BE>()?;
    Ok(Person { name, age, height })
}

fn read_string<R>(reader: &mut R) -> Result<String>
where
    R: Read,
{
    let length = reader.read_u32::<BE>()?;
    let mut buffer = vec![0; length as usize];
    reader.read_exact(&mut buffer)?;

    // Bytes that aren't valid UTF-8 mean the stored data is malformed.
    String::from_utf8(buffer).map_err(|e| Error::new(ErrorKind::InvalidData, e))
}

Here's how you might test that code:

fn main() -> Result<()> {
    let person = Person {
        name: "Michael".to_string(),
        age: 42,
        height: 3.14,
    };

    let mut buffer = Vec::new();
    save_person(&mut buffer, &person)?;
    println!("raw bytes: {buffer:?}");
    println!("reading as text: \"{}\"", String::from_utf8_lossy(&buffer));
    
    let mut reader = Cursor::new(&buffer);
    let person = read_person(&mut reader)?;
    println!("{person:?}");
    
    Ok(())
}

Which prints out the following:

raw bytes: [0, 0, 0, 7, 77, 105, 99, 104, 97, 101, 108, 0, 0, 0, 42, 64, 72, 245, 195]
reading as text: "Michael*@H��"
Person { name: "Michael", age: 42, height: 3.14 }

You can see that leading 0, 0, 0, 7, which is 7 as a big-endian number ("Michael".len()). Further down you can see a 0, 0, 0, 42 which corresponds to the age field, and so on.
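
To make the "same order or garbage" point concrete, here is a small sketch that decodes the last eight bytes of that dump twice: once in the order they were written, and once swapped. It uses only the standard library (from_be_bytes) rather than byteorder, and the helper names (read_4, read_age_height, read_height_age) are hypothetical ones invented for this illustration:

```rust
use std::io::{self, Read};

// Hypothetical helper: read exactly four bytes from any reader.
fn read_4<R: Read>(reader: &mut R) -> io::Result<[u8; 4]> {
    let mut bytes = [0u8; 4];
    reader.read_exact(&mut bytes)?;
    Ok(bytes)
}

// Same order as the writer: a u32 age, then an f32 height (both big-endian).
fn read_age_height<R: Read>(reader: &mut R) -> io::Result<(u32, f32)> {
    let age = u32::from_be_bytes(read_4(reader)?);
    let height = f32::from_be_bytes(read_4(reader)?);
    Ok((age, height))
}

// Deliberately wrong order: treats the age bytes as an f32 and vice versa.
fn read_height_age<R: Read>(reader: &mut R) -> io::Result<(f32, u32)> {
    let height = f32::from_be_bytes(read_4(reader)?);
    let age = u32::from_be_bytes(read_4(reader)?);
    Ok((height, age))
}

fn main() -> io::Result<()> {
    // The age and height fields from the raw bytes printed earlier.
    let tail: [u8; 8] = [0, 0, 0, 42, 64, 72, 245, 195];

    // `&[u8]` implements `Read`, so a slice can stand in for a file.
    let (age, height) = read_age_height(&mut &tail[..])?;
    println!("right order: age = {age}, height = {height}");

    let (height, age) = read_height_age(&mut &tail[..])?;
    println!("wrong order: age = {age}, height = {height:e}");
    Ok(())
}
```

Neither read fails in the swapped version. You just get an age of 1078523331 and a vanishingly small height, because the bytes themselves carry no type information.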

If you save buffer to a file with std::fs::write() and point xxd at it, you'll see the hex dump:

$ xxd -c 8 file.bin
00000000: 0000 0007 4d69 6368  ....Mich
00000008: 6165 6c00 0000 2a40  ael...*@
00000010: 48f5 c3              H..

Note that while I'm writing into a Vec<u8> and reading from a Cursor<&Vec<u8>>, I could swap in a std::fs::File and everything would work fine because File implements both Read and Write.
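
Here is roughly what that swap looks like, as a sketch rather than a recipe. It writes a single f64 (the type from the original question) using only the standard library. The helper names and the temp-file name are made up for the example, and wrapping the File in BufWriter/BufReader is my own addition to avoid a system call per small read or write:

```rust
use std::fs::File;
use std::io::{self, BufReader, BufWriter, Read, Write};
use std::path::Path;

// Hypothetical helper: write an f64 as 8 big-endian bytes.
fn write_f64_be<W: Write>(writer: &mut W, value: f64) -> io::Result<()> {
    writer.write_all(&value.to_be_bytes())
}

// Hypothetical helper: read 8 big-endian bytes back into an f64.
fn read_f64_be<R: Read>(reader: &mut R) -> io::Result<f64> {
    let mut bytes = [0u8; 8];
    reader.read_exact(&mut bytes)?;
    Ok(f64::from_be_bytes(bytes))
}

// The generic functions above don't care that the target is now a file.
fn roundtrip_through_file(path: &Path, value: f64) -> io::Result<f64> {
    {
        let mut writer = BufWriter::new(File::create(path)?);
        write_f64_be(&mut writer, value)?;
        // Flush explicitly so a write error is reported instead of being
        // silently dropped when the BufWriter goes out of scope.
        writer.flush()?;
    }
    let mut reader = BufReader::new(File::open(path)?);
    read_f64_be(&mut reader)
}

fn main() -> io::Result<()> {
    // Hypothetical file name in the system temp directory.
    let path = std::env::temp_dir().join("binary_io_demo.bin");
    let value = roundtrip_through_file(&path, 2.75)?;
    println!("read back: {value}");
    std::fs::remove_file(&path)?;
    Ok(())
}
```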

All binary I/O eventually reduces to this sort of code. It's just that most formats will add extra layers to make sure we don't read things in the wrong order, add backwards compatibility, or they might even make the file self-describing. This is where serde and the various serde_* crates come in (serde_cbor, serde_bson, bincode, etc.).
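
As a taste of what one of those "extra layers" might look like, here is a hand-rolled header with a magic number and a version field. This is only a sketch under my own assumptions: the "PRSN" magic bytes, the VERSION constant, and the function names are all invented for illustration and don't come from any real format or crate:

```rust
use std::io::{self, Error, ErrorKind, Read, Write};

// Hypothetical magic bytes and format version for a made-up file format.
const MAGIC: [u8; 4] = *b"PRSN";
const VERSION: u16 = 1;

// Write the header before any records.
fn write_header<W: Write>(writer: &mut W) -> io::Result<()> {
    writer.write_all(&MAGIC)?;
    writer.write_all(&VERSION.to_be_bytes())
}

// Read and validate the header, returning the version found in the data.
fn read_header<R: Read>(reader: &mut R) -> io::Result<u16> {
    let mut magic = [0u8; 4];
    reader.read_exact(&mut magic)?;
    if magic != MAGIC {
        // Catches the "pointed it at the wrong file" kind of mistake early.
        return Err(Error::new(ErrorKind::InvalidData, "bad magic number"));
    }

    let mut version = [0u8; 2];
    reader.read_exact(&mut version)?;
    let version = u16::from_be_bytes(version);
    if version > VERSION {
        // Older versions could be handled here; newer ones are rejected.
        return Err(Error::new(ErrorKind::InvalidData, "unsupported version"));
    }
    Ok(version)
}

fn main() -> io::Result<()> {
    let mut buffer = Vec::new();
    write_header(&mut buffer)?;
    let version = read_header(&mut buffer.as_slice())?;
    println!("header ok, version {version}");

    // Data that doesn't start with the magic bytes is rejected up front.
    let err = read_header(&mut &b"oops!!"[..]).unwrap_err();
    println!("rejected: {err}");
    Ok(())
}
```

A reader that checks the version first can then branch on it to stay compatible with older files, which is the sort of bookkeeping the serde-based formats handle for you.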
