I have example code that reads data from a parquet file. I found an example that reads from a file on local disk, but I want to load the data from memory, because my data is already loaded into memory by another method. How can I load the file data from memory? In the code below I have file_data in memory, but I don't know how to use it with DataFusion. Thank you.
src/main.rs
use arrow::record_batch::RecordBatch;
use datafusion::{
    error::DataFusionError,
    prelude::{ParquetReadOptions, SessionContext},
};

#[tokio::main]
async fn main() -> Result<(), DataFusionError> {
    let ctx = SessionContext::new();

    // load file data
    let file_data = std::fs::read("data/example.parquet").unwrap();
    println!("{}", file_data.len());

    // How can we load the data from memory (file_data) into DataFusion?
    // TODO!
    // ctx.register_parquet_from("foo", file_data, ParquetReadOptions::default()).await?;

    // create the dataframe
    ctx.register_parquet(
        "foo",
        "data/example.parquet",
        ParquetReadOptions::default(),
    )
    .await?;

    // create a plan
    let df = ctx.sql("SELECT count(*) FROM foo").await?;

    // execute the plan
    let results: Vec<RecordBatch> = df.collect().await?;

    // format the results
    let pretty_results = arrow::util::pretty::pretty_format_batches(&results)?.to_string();
    println!("{}", pretty_results);

    Ok(())
}
Cargo.toml
[package]
name = "datafusion"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
arrow = { version = "20.0.0", features = ["prettyprint"] }
datafusion = "11.0"
tokio = "1.0"