Tokio::fs memory errors

I am trying to use tokio::fs to read and write files/directories as fast as possible. I initially wrote this code just to learn how to use tokio::fs, but I ended up using it in my application because it worked well enough:

use pyo3::prelude::*;
use pyo3::types::PyBytes;

#[pyfunction]
pub fn read_file(py: Python, file: String) -> PyResult<PyObject> {
    // run the async read on the tokio runtime; tokio::fs::read loads the whole file into memory
    let data = pyo3_asyncio::tokio::run(py, async move {
        return Ok(tokio::fs::read(file).await);
    });
    // the first ? unwraps the runtime's PyResult, the second ? unwraps the inner io::Result
    return Ok(PyBytes::new(py, &data??).into());
}

(btw, for those who don't know/use pyo3: #[pyfunction] doesn't affect the code in this case, the return type PyResult<PyObject> is just shorthand for Result<PyObject, PyErr>, i.e. either a Python object or a Python error, and PyBytes::new(py, &data??).into() just turns the bytes into a PyObject)

As you can see, this code works well, but if I try to open a very large file (I tested on a 2 GB file), I get a memory error. My initial thought was to use a buffer, but tokio::fs::read does not seem to have an option to take one. So I came here to ask for suggestions: how can I make sure that the program doesn't crash, even if it is using too much memory?

fs::read — both std and tokio — reads the entirety of a file into memory. When the function returns, every single byte of the file is in RAM. If your computer can't handle that, kaboom. Buffering is irrelevant, as that just affects the speed at which the bytes are read.

The obvious question is: What are you trying to do, and why do you think reading a giant file into memory is the solution? It's likely that you can get what you really want by not reading the entire file at once but instead reading some bytes, operating on them, throwing the bytes away, reading more bytes, repeat.
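For example, that read-a-chunk / process / discard loop might look something like this with tokio (a minimal sketch; process_chunk and the 64 KiB chunk size are placeholders, not anything from your code):

use tokio::fs::File;
use tokio::io::AsyncReadExt;

// Sketch: read the file in fixed-size chunks instead of all at once.
async fn process_file(path: &str) -> std::io::Result<()> {
    const CHUNK_SIZE: usize = 64 * 1024; // placeholder chunk size
    let mut file = File::open(path).await?;
    let mut buf = vec![0u8; CHUNK_SIZE];
    loop {
        // read() may return fewer bytes than the buffer holds; 0 means end of file
        let n = file.read(&mut buf).await?;
        if n == 0 {
            break;
        }
        process_chunk(&buf[..n]); // operate on this chunk, then let the buffer be reused
    }
    Ok(())
}

// placeholder for whatever per-chunk work your application actually does
fn process_chunk(_chunk: &[u8]) {}

Only one chunk is ever held in memory at a time, so peak memory use stays at CHUNK_SIZE no matter how big the file is.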

Can you expand on what you mean by "memory error"?

If you run your program under a debugger, it'll automatically pause on segfaults and give you a chance to print a backtrace to see where the issue came from.

The program crashes on the line

return Ok(tokio::fs::read(file).await);

and I know this is a memory error because it only happens when RAM usage reaches 100%.
Now that I think about it, it could be something else, but I have no idea what.

The issue is that I'm using this as a Python library in a bigger project. My goal was to take the parts that ended up being too slow in Python and do them in Rust. I can try to change the way the file reading works, but if I need to change the input/output then I would have to change the entire project, and I really don't want to do that.

Well, you should post the error message. But if the problem is that you're trying to read a file into RAM that's bigger than the RAM you have available, that has nothing to do with Rust or Tokio; it's a running-out-of-RAM problem, and the only way to fix it is not to read it all into RAM at once, but only a bounded amount at any moment.

That's a shame... I thought there might be some way of pausing the file read until more memory can be allocated, but if there isn't, then I guess I'll just emit a warning when the file is too large for the available RAM (rough sketch at the end of this post).

Well, at least now there's a way to fix this, thanks!

(btw, I just got the error message while fixing the issue and wanted to share it:

thread '<unnamed>' panicked at C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\pyo3-0.19.2\src\err\mod.rs:789:5:
Python API call failed

as you can see, there's not much to learn from it, so I didn't include it in the original question)
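For the warning idea above, a rough sketch of the size check could look like this (the threshold constant is an arbitrary placeholder; it could instead be derived from the machine's available memory):

use tokio::fs;

// arbitrary placeholder threshold for "too large to read into memory in one go"
const MAX_IN_MEMORY_BYTES: u64 = 2 * 1024 * 1024 * 1024; // 2 GiB

async fn read_with_warning(path: &str) -> std::io::Result<Vec<u8>> {
    // check the file size before reading the whole thing
    let size = fs::metadata(path).await?.len();
    if size > MAX_IN_MEMORY_BYTES {
        eprintln!("warning: {path} is {size} bytes; reading it fully into memory may fail");
    }
    fs::read(path).await
}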

If you need to read files that are bigger than memory, then you can't use tokio::fs::read, which is defined to read the entire file before returning control to you.

You can, however, use tokio::fs::File::open to open the file for reading, then read or read_buf to read the file in smaller chunks; note that (unlike tokio::fs::read) there's no guarantee that these methods will read more than 1 byte at a time, so you'll probably need a loop to read "enough" data into memory.

That way, you get control and can pause reading the file while you free up more memory from somewhere else, or split the file into chunks, or otherwise handle a very large file.
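For the pyo3 setup in the original question, a chunked variant might look roughly like this (a sketch only: read_file_chunk is a hypothetical name, and the Python side would have to call it repeatedly with increasing offsets):

use pyo3::prelude::*;
use pyo3::types::PyBytes;
use tokio::io::{AsyncReadExt, AsyncSeekExt, SeekFrom};

// Hypothetical chunked counterpart of read_file: reads at most `len` bytes starting
// at `offset`, so only one chunk is ever held in memory at a time.
#[pyfunction]
pub fn read_file_chunk(py: Python, file: String, offset: u64, len: usize) -> PyResult<PyObject> {
    let data = pyo3_asyncio::tokio::run(py, async move {
        let mut f = tokio::fs::File::open(file).await?;
        f.seek(SeekFrom::Start(offset)).await?;
        let mut buf = vec![0u8; len];
        let mut filled = 0;
        // read() may return short reads, so loop until the buffer is full or we hit EOF
        while filled < len {
            let n = f.read(&mut buf[filled..]).await?;
            if n == 0 {
                break; // end of file
            }
            filled += n;
        }
        buf.truncate(filled);
        Ok(buf)
    });
    Ok(PyBytes::new(py, &data?).into())
}

The Python caller would then loop, advancing offset by the length of each returned bytes object and stopping once it gets back an empty one.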
