One way is to consume the source objects. It looks like this works by using an owned Value rather than a borrowed Value. The into_array method consumes the Value and gives us a regular Vec. The Vec's values can be consumed using into_iter. This compiles for me.
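use simd_json::{owned::Value, prelude::ValueIntoContainer, Result};

pub async fn handle_data(mut data: Vec<u8>) -> Result<()> {
    // Parse into an owned Value, which does not borrow from `data`.
    let obj: Value = simd_json::to_owned_value(data.as_mut_slice())?;
    // Consume the Value; this panics if the top-level value is not an array.
    let entries = obj.into_array().unwrap();
    for entry in entries.into_iter() {
        // Each task takes ownership of its entry, so it satisfies 'static.
        tokio::spawn(async move { entry });
    }
    Ok(())
}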
That would definitely be better, to reduce allocations caused by creating owned objects. But I tried it and the problem is that to_borrowed_value borrows from the Vec<u8> data, and the resulting Value obj has the lifetime of the borrow. So we can't move both of these (data and obj) into an Arc, or at least I don't know how to do it without self_cell or something similar. (And in fact, self_cell won't work here because it won't allow borrowing mutably from the data to construct the obj.)
In that case there isn't a simple change, because you can't make a borrow of a local last for 'static, so you'd have to put the root of the data (i.e. the byte buffer) into an Arc. However, that would mean re-parsing the same data independently in every task.
Instead, I would suggest that you stop trying to make this architecture work. Just do the parsing in one task, then submit each entry to a thread pool through a queue (e.g. mpsc). You can then use the thread::scope() API to get rid of the 'static requirement.
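To make the queue plus thread::scope() idea concrete, here is a minimal sketch using only the standard library. It is not simd_json code: parse_entries is a hypothetical stand-in for the parsing step, and plain &str slices stand in for borrowed Values. The names handle_data and workers are also just illustrative. The point it tries to show is that borrowed entries can go through an mpsc channel to worker threads without any 'static bound, because the scope joins every worker before the buffer can be dropped.

```rust
use std::sync::mpsc;
use std::sync::Mutex;
use std::thread;

/// Hypothetical stand-in for parsing: splits the buffer into borrowed entries.
fn parse_entries(buf: &str) -> Vec<&str> {
    buf.split(',').map(str::trim).filter(|s| !s.is_empty()).collect()
}

/// Sends each borrowed entry to a small pool of scoped worker threads.
/// Returns the total number of bytes the workers processed.
pub fn handle_data(buf: &str, workers: usize) -> usize {
    let (tx, rx) = mpsc::channel::<&str>();
    // mpsc::Receiver is single-consumer, so the workers share it behind a Mutex.
    let rx = Mutex::new(rx);

    thread::scope(|s| {
        let handles: Vec<_> = (0..workers)
            .map(|_| {
                s.spawn(|| {
                    let mut total = 0usize;
                    loop {
                        // The lock guard is dropped at the end of this statement.
                        let msg = rx.lock().unwrap().recv();
                        match msg {
                            Ok(entry) => total += entry.len(), // the "work"
                            Err(_) => break, // sender dropped, queue drained
                        }
                    }
                    total
                })
            })
            .collect();

        // Parse once, on this thread, and hand out borrowed entries.
        for entry in parse_entries(buf) {
            tx.send(entry).unwrap();
        }
        // Closing the channel lets the workers leave their loops.
        drop(tx);

        handles.into_iter().map(|h| h.join().unwrap()).sum::<usize>()
    })
}
```

Calling handle_data("a, bb, ccc", 2) parses the buffer once and spreads the three borrowed slices over two workers. Nothing is copied out of buf. The trade-off is that the caller blocks until the scope finishes, which is what allows the borrows in the first place.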
Actually, simd_json has the interesting into_static() method, but that doesn't seem to work either, as the object is still dropped once the function returns.
Regarding sending through channels: wouldn't that defeat the purpose of borrowing, forcing me to copy the array members? I want to stick with the original buffer as much as possible.
Thanks for all the suggestions though, much appreciated!
into_static clones everything, and is similar to the owned Value approach. You should be able to use a consuming iterator and have it work. But you might as well use the owned API from the start.
There is no magic wand to make borrowing not have local lifetimes without giving up something.
I'm not sure exactly which suggestion[1] you're talking about here. But 'static bounds are the thing that forces you to make things owned. E.g. tokio tasks inherently require that.
Depending on the architecture, it's possible to use channels with borrowed data. Channels as a concept aren't inherently incompatible with borrowing.
Thanks, I actually thought for some reason it doesn't clone everything, but I read more carefully and that's true.
Of course, there's no magic wand. But I wanted to make sure that I'm not missing any feature or possibility in Rust before I give up and clone the data.
Trying to mentally model the issue - I need to get Rust to understand that the obj lifetime is longer than the function calling it. I can pass obj around, but iiuc it'll still have an owner that can't promise its lifetime will be longer than all the rest of the tasks.
I was wondering if there's a way to signal the tasks that entry lives long enough for them to use with no worries, or maybe make sure I join all the tasks somewhere and only then drop obj and hopefully it'll satisfy the requirement.
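For plain OS threads, "join everything first, only then drop the data" is what std::thread::scope guarantees. A minimal sketch, assuming threads rather than tokio tasks, with newline-separated byte slices standing in for the parsed entries (entry_lengths is a hypothetical name):

```rust
use std::thread;

/// Spawns one scoped thread per borrowed entry and collects their results.
pub fn entry_lengths(data: &[u8]) -> Vec<usize> {
    // Borrowed "entries": sub-slices of the original buffer, no copies.
    let entries: Vec<&[u8]> = data.split(|b| *b == b'\n').collect();

    thread::scope(|s| {
        let handles: Vec<_> = entries
            .iter()
            // Each thread only gets a reference into `data`.
            .map(|entry| s.spawn(move || entry.len()))
            .collect();

        // The scope would join any remaining threads on exit anyway;
        // joining explicitly here is just to collect the results in order.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
    // `entries` and the borrow of `data` end only after all threads finished.
}
```

The compiler accepts the borrows here because the scope cannot return while a spawned thread is still running. tokio::spawn makes no such promise, which is why it demands 'static.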
It can be something that's logically correct, just not sure if there's a way in Rust to pull it off.
I'm not really the person to ask, but as I understand it there are async runtimes that support scoped (non-'static) tasks. But tokio is not one of them.