Hi all,
I am working on a Rust parser for the Apache Parquet file format.
At the end of the file there is a file metadata structure, which contains the offset of the first data page. So I need to seek() to the end, read the file metadata, find the first page offset, seek to that offset and start reading.
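A minimal sketch of the footer step, assuming the layout described in the parquet-format spec: the file ends with the Thrift-encoded FileMetaData, then its length as a 4-byte little-endian integer, then the magic bytes `PAR1`. `read_footer_bytes` is a hypothetical helper name; the bytes it returns would still have to go through the Thrift compact deserializer. It only borrows the reader, so the caller keeps ownership for later seeks.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Magic bytes that end every Parquet file (they also start it).
const MAGIC: &[u8; 4] = b"PAR1";

/// Hypothetical helper: returns the raw, still Thrift-encoded FileMetaData.
/// Assumed file tail: <metadata bytes> <4-byte LE metadata length> "PAR1".
fn read_footer_bytes<R: Read + Seek>(reader: &mut R) -> io::Result<Vec<u8>> {
    // The last 8 bytes hold the metadata length followed by the magic.
    reader.seek(SeekFrom::End(-8))?;
    let mut tail = [0u8; 8];
    reader.read_exact(&mut tail)?;
    if tail[4..] != MAGIC[..] {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "missing PAR1 magic"));
    }
    let len = u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]) as i64;

    // Seek back over the metadata itself and read it into memory.
    reader.seek(SeekFrom::End(-8 - len))?;
    let mut meta = vec![0u8; len as usize];
    reader.read_exact(&mut meta)?;
    Ok(meta)
}
```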
Code:
https://github.com/vchekan/rust-parquet/blob/e5693ba3a2138d9345b43045bfe390c13f31649f/src/lib.rs#L61
Data format:
https://github.com/apache/parquet-format
The problem is Rust's ownership: the file will be owned by the Thrift deserializer, but at the same time I need to own the same file to perform seeks.
Thinking about it, Rust is right to deny such an operation, because unexpected behaviour is possible if a file you own suddenly changes state (position, buffering) through another owner.
In my case, the deserializer is a lightweight object, and I do not really need to hold on to it. As soon as the deserializer is done, I can drop it. But now I am struggling with another problem: how do I scope the deserializer locally but still return its result? Do I understand correctly that once I call:
let mut protocol = TCompactInputProtocol::new(buffered);
my "buffered" is gone, and there is no way to return it from the function except by using Rc<>?
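One pattern that may avoid Rc here, sketched under assumptions: the standard library implements `Read` for `&mut R` whenever `R: Read`, so a constructor that takes its reader by value can be handed a mutable borrow instead. `OwningDeserializer` is a hypothetical stand-in for TCompactInputProtocol (the real constructor's bounds may differ), and `read_offset_then_seek` is a hypothetical helper.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical stand-in for a deserializer that, like
/// TCompactInputProtocol::new, takes its transport by value.
struct OwningDeserializer<R: Read> {
    transport: R,
}

impl<R: Read> OwningDeserializer<R> {
    fn new(transport: R) -> Self {
        OwningDeserializer { transport }
    }

    /// Reads a little-endian u32; a placeholder for real Thrift decoding.
    fn read_u32(&mut self) -> io::Result<u32> {
        let mut buf = [0u8; 4];
        self.transport.read_exact(&mut buf)?;
        Ok(u32::from_le_bytes(buf))
    }
}

/// Decodes an offset, then seeks with the same reader afterwards.
fn read_offset_then_seek<R: Read + Seek>(reader: &mut R) -> io::Result<u32> {
    let offset = {
        // `&mut R` implements `Read`, so the deserializer owns only a
        // mutable borrow of the reader, not the reader itself.
        let mut protocol = OwningDeserializer::new(&mut *reader);
        protocol.read_u32()?
    }; // `protocol` is dropped here, which ends the borrow.

    // The reader is usable again, without Rc or RefCell.
    reader.seek(SeekFrom::Start(offset as u64))?;
    Ok(offset)
}
```

The explicit reborrow `&mut *reader` matters: passing `reader` itself to a generic parameter would move the `&mut R` into the deserializer, and the later `seek` would not compile. If a type really must own its transport, an `into_inner(self) -> R` method (as `std::io::BufReader` provides) is the other usual way to get it back; whether the Thrift crate offers an equivalent is not assumed here.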
Do you think it would make more sense to modify Thrift's Rust API so that it does not take ownership of the file stream in the constructor, but instead borrows a reference during deserialization?
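To make the proposal concrete, a rough sketch of what a borrow-based reader could look like: the protocol object keeps only its own decoding state, and each call receives the transport as `&mut R`. `BorrowingCompactReader` and its methods are hypothetical names, not the Thrift crate's API; the decoding shown is limited to the compact protocol's unsigned varint (LEB128) and zigzag-encoded i32.

```rust
use std::io::{self, Read};

/// Hypothetical borrow-based protocol reader. A real implementation would
/// keep protocol state here (e.g. the last field id for field-header deltas),
/// but never the transport itself.
struct BorrowingCompactReader;

impl BorrowingCompactReader {
    fn new() -> Self {
        BorrowingCompactReader
    }

    /// Reads an unsigned LEB128 varint, the integer encoding of the
    /// Thrift compact protocol. The reader is borrowed only for this call.
    fn read_varint<R: Read>(&mut self, reader: &mut R) -> io::Result<u64> {
        let mut result = 0u64;
        let mut shift = 0u32;
        loop {
            let mut byte = [0u8; 1];
            reader.read_exact(&mut byte)?;
            result |= u64::from(byte[0] & 0x7f) << shift;
            if byte[0] & 0x80 == 0 {
                return Ok(result);
            }
            shift += 7;
            if shift >= 64 {
                return Err(io::Error::new(io::ErrorKind::InvalidData, "varint too long"));
            }
        }
    }

    /// Reads a zigzag-encoded signed 32-bit integer (compact protocol i32).
    fn read_i32<R: Read>(&mut self, reader: &mut R) -> io::Result<i32> {
        let n = self.read_varint(reader)? as u32;
        Ok(((n >> 1) as i32) ^ -((n & 1) as i32))
    }
}
```

Because the reader is borrowed only for the duration of each call, the caller can seek() freely between reads. The trade-off is that every read method needs the extra parameter.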