Ownership and stream parsing


#1

Hi all,

I am working on a Rust parser for the Apache Parquet file format.
At the end of the file there is a file metadata structure that holds the offset of the first data page in the file. So I need to seek() to the end, read the file metadata, find the first page offset, seek to that offset, and start reading.
Code:


Data format:

The problem is Rust’s ownership. The file will be owned by the Thrift deserializer, but at the same time I need to own the same file to perform seeks.

As I think about it, Rust is correct to deny such an operation: if a file you own suddenly changes state (position, buffering) because of another owner, the behaviour becomes unpredictable.
In my case the deserializer is a lightweight object and I do not really need to hold on to it. As soon as the deserializer is done, I can drop it. But now I am struggling with another problem: how do I scope the deserializer locally but still return its result? Do I understand correctly that once I call:

let mut protocol = TCompactInputProtocol::new(buffered);

my “buffered” is gone and there is no way to return it from the function except by using Rc<>?
Do you think it would make more sense to modify Thrift’s Rust API so that it does not take ownership of the file stream in the constructor, but instead borrows a reference during deserialization?


#2

A &File also implements std::io::Read so you can share a reference across multiple users.
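
To make that concrete, here is a minimal sketch using only the standard library. The function and file names are made up for illustration, and the toy file contents have nothing to do with Parquet. It reads the tail and then the head of one file through two copies of the same shared reference:

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom, Write};

/// Reads the last `n` bytes and then the first `n` bytes of `file`,
/// each time through its own copy of the shared reference.
fn read_tail_then_head(file: &File, n: usize) -> io::Result<(Vec<u8>, Vec<u8>)> {
    // `&File` implements Read and Seek. The binding is `mut` because the
    // trait methods take `&mut self`, where `Self` is `&File` here.
    let mut tail_reader = file;
    tail_reader.seek(SeekFrom::End(-(n as i64)))?;
    let mut tail = vec![0u8; n];
    tail_reader.read_exact(&mut tail)?;

    // A second user of the same File; nothing was moved above.
    let mut head_reader = file;
    head_reader.seek(SeekFrom::Start(0))?;
    let mut head = vec![0u8; n];
    head_reader.read_exact(&mut head)?;

    Ok((tail, head))
}

/// Hypothetical helper for the demo: writes `bytes` to a file in the
/// system temp directory and reopens it for reading.
fn temp_file_with(name: &str, bytes: &[u8]) -> io::Result<File> {
    let path = std::env::temp_dir().join(name);
    File::create(&path)?.write_all(bytes)?;
    File::open(&path)
}

fn main() -> io::Result<()> {
    let file = temp_file_with("shared_file_demo.bin", b"HEADpayloadTAIL")?;
    let (tail, head) = read_tail_then_head(&file, 4)?;
    println!("{} {}", String::from_utf8_lossy(&tail), String::from_utf8_lossy(&head));
    Ok(())
}
```

One caveat: every reference shares the single underlying file position, so a seek through one of them moves the cursor for all of them. That is why the sketch seeks explicitly before each read.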


#3

Not sure what you mean. Use &File when I return the value? Or pass &File into the deserializer (I can’t, because Apache Thrift has already specified it as a move parameter)?


#4
// Open the File once and hand out shared references to it. &File implements
// std::io::Read and Seek, so you can pass it to any place that requires those
// traits without moving the File itself.
let file = File::open(...)?; // or some other means to get a File
// use it here
let buffered = BufReader::new(&file);
let mut protocol = TCompactInputProtocol::new(buffered);
// use it to read data at the start
let mut buffered = BufReader::new(&file); // `mut` because seek() takes &mut self
buffered.seek(...)?;
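
A self-contained version of the same idea, as a rough sketch. `OwningDecoder` is a hypothetical stand-in for a deserializer whose constructor takes its reader by value; it is not the Thrift API. The file layout (payload followed by an 8-byte little-endian offset) is a toy format invented for the example, not real Parquet:

```rust
use std::fs::File;
use std::io::{self, BufReader, Read, Seek, SeekFrom, Write};

/// Hypothetical stand-in for a deserializer that owns its reader.
struct OwningDecoder<R: Read> {
    inner: R,
}

impl<R: Read> OwningDecoder<R> {
    fn new(inner: R) -> Self {
        OwningDecoder { inner }
    }

    fn read_u64(&mut self) -> io::Result<u64> {
        let mut buf = [0u8; 8];
        self.inner.read_exact(&mut buf)?;
        Ok(u64::from_le_bytes(buf))
    }
}

/// Reads the toy footer: the last 8 bytes hold the offset of the data.
/// The decoder and its BufReader are dropped when this function returns;
/// only the decoded value leaves the scope.
fn read_footer_offset(file: &File) -> io::Result<u64> {
    let mut reader = BufReader::new(file);
    reader.seek(SeekFrom::End(-8))?;
    let mut decoder = OwningDecoder::new(reader); // moves the BufReader, not the File
    decoder.read_u64()
}

/// Seeks to the offset found in the footer and reads `len` bytes from there.
fn read_pages(file: &File, len: usize) -> io::Result<Vec<u8>> {
    let offset = read_footer_offset(file)?;
    let mut reader = BufReader::new(file); // fresh buffer over the same File
    reader.seek(SeekFrom::Start(offset))?;
    let mut data = vec![0u8; len];
    reader.read_exact(&mut data)?;
    Ok(data)
}

/// Hypothetical helper for the demo: writes `bytes` to a temp file and reopens it.
fn temp_file_with(name: &str, bytes: &[u8]) -> io::Result<File> {
    let path = std::env::temp_dir().join(name);
    File::create(&path)?.write_all(bytes)?;
    File::open(&path)
}

fn main() -> io::Result<()> {
    let mut bytes = b"xxxxPAGEDATA".to_vec();
    bytes.extend_from_slice(&4u64.to_le_bytes()); // footer: data starts at byte 4
    let file = temp_file_with("footer_demo.bin", &bytes)?;
    let pages = read_pages(&file, 8)?;
    println!("{}", String::from_utf8_lossy(&pages));
    Ok(())
}
```

Creating a new BufReader for the second pass is deliberate: the first one may have buffered bytes past the point the decoder consumed, so reusing it without an explicit seek would not be reliable.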

#5

OK, I ended up hacking Thrift itself:

That seems like a more proper solution to me.
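
For anyone who would rather not patch the library: if the constructor is generic over a reader type bounded by std::io::Read (an assumption; check the signature in the version you use), a by-value parameter can still be lent a reader, because the standard library implements Read for `&mut R` whenever `R: Read`. A small sketch, where `OwningDecoder` is again a hypothetical stand-in and not the Thrift API:

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Hypothetical stand-in for a deserializer that takes its reader by value.
struct OwningDecoder<R: Read> {
    inner: R,
}

impl<R: Read> OwningDecoder<R> {
    fn new(inner: R) -> Self {
        OwningDecoder { inner }
    }

    fn read_u8(&mut self) -> io::Result<u8> {
        let mut buf = [0u8; 1];
        self.inner.read_exact(&mut buf)?;
        Ok(buf[0])
    }
}

/// Reads the last byte and then the first byte of `stream`, lending the
/// stream to a short-lived decoder each time instead of giving it away.
fn peek_last_then_first<S: Read + Seek>(stream: &mut S) -> io::Result<(u8, u8)> {
    stream.seek(SeekFrom::End(-1))?;
    // Here R = &mut S, so the decoder owns only a mutable borrow, which
    // ends when the temporary decoder is dropped at the end of the statement.
    let last = OwningDecoder::new(&mut *stream).read_u8()?;

    stream.seek(SeekFrom::Start(0))?; // the stream is usable again
    let first = OwningDecoder::new(&mut *stream).read_u8()?;
    Ok((last, first))
}

fn main() -> io::Result<()> {
    // An in-memory Cursor stands in for the file to keep the demo small.
    let mut stream = Cursor::new(vec![10u8, 20, 30]);
    let (last, first) = peek_last_then_first(&mut stream)?;
    println!("{} {}", last, first);
    Ok(())
}
```

No Rc<> is needed in this arrangement: the caller keeps ownership of the stream, and the decoder only ever holds a temporary borrow.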