Solving a complicated lifetime issue

Hi all,

I've got an interesting issue where I'm receiving rather big json arrays (MBs) and I need to do some processing on each element of the array.

I want to split the work for each element in a different tokio task.

I've chosen simdjson for its performance benefits.

My code looks roughly like this:

pub async fn handle_data(
    mut data: Vec<u8>,
) -> Result<()> {
    let obj: Value = simd_json::to_borrowed_value(data.as_mut_slice())?;

    let entries = obj.as_array().unwrap();

    for entry in entries {
        tokio::spawn(async move {
            handle_entry(entry).await
        });
    }

    Ok(())
}

Obviously, this does not compile: entry's type is &Value, so it borrows obj, but obj is dropped when the function returns, while the spawned tasks may still be running.

I'm looking for a way for obj to outlive all the sub-tasks I'm spawning, and only then drop it.

Thanks!

Put it in an Arc (and move the .as_array() call into the task worker function).

One way is to consume the source object. This works by using an owned Value rather than a borrowed one: the into_array method consumes the Value and gives us a regular Vec, whose elements can then be consumed with into_iter. This compiles for me.

        use simd_json::{owned::Value, prelude::ValueIntoContainer, Result};

        pub async fn handle_data(mut data: Vec<u8>) -> Result<()> {
            let obj: Value = simd_json::to_owned_value(data.as_mut_slice())?;

            let entries = obj.into_array().unwrap();

            for entry in entries.into_iter() {
                // placeholder body: the task now owns `entry` outright
                tokio::spawn(async move { entry });
            }

            Ok(())
        }

That would definitely be better, to reduce the allocations caused by creating owned objects. But I tried it, and the problem is that to_borrowed_value borrows from the Vec<u8> data, so the resulting Value obj has the lifetime of that borrow. That means we can't move both data and obj into an Arc, or at least I don't know how to do it without self_cell or something similar. (And in fact, self_cell won't work here, because it doesn't allow borrowing mutably from the data to construct the obj.)

In that case there isn't a simple change, because you can't make a borrow of a local last for 'static, so you'd have to put the root of the data (i.e. the byte buffer) into an Arc. That would, however, mean re-parsing the same data independently in every task.

I would instead suggest you stop trying to make this architecture work. Do the parsing in one task, then submit each entry to a thread pool through a queue (e.g. an mpsc channel). You can then use the thread::scope() API to get rid of the 'static requirement.


Actually, simdjson has the interesting into_static() method, but that doesn't seem to work either, as the object still drops once the function returns.

Regarding sending through channels - wouldn't that defeat the purpose of borrowing, forcing me to copy the array members? If possible, I want to stick with the original buffer as much as possible.

Thanks for all the suggestions though, much appreciated! :pray:

into_static clones everything, and is similar to the owned Value approach. You should be able to use a consuming iterator and have it work. But you might as well use the owned API from the start.

There is no magic wand to make borrowing not have local lifetimes without giving up something.

I'm not sure exactly which suggestion[1] you're talking about here. But 'static bounds are the thing that forces you to make things owned. E.g. tokio tasks inherently require that.

Depending on the architecture, it's possible to use channels with borrowed data. Channels as a concept isn't inherently incompatible with borrowing.


  1. or part of which suggestion ↩ī¸Ž


They're talking about the scoped threads suggestion from @paramagnetic :


Thanks, I actually thought for some reason that it doesn't clone everything, but I read more carefully and it does.

Of course, there's no magic wand :slight_smile: . But I wanted to make sure I'm not missing any feature or possibility in Rust before giving up and cloning the data.

Trying to mentally model the issue - I need Rust to understand that obj outlives the function that creates it. I can pass obj around, but iiuc it'll still have an owner that can't promise it will outlive all the rest of the tasks.

I was wondering if there's a way to signal to the tasks that entry lives long enough for them to use safely, or maybe to join all the tasks somewhere and only then drop obj, and hopefully that would satisfy the requirement.

It can be something that's logically correct, just not sure if there's a way in Rust to pull it off.

I hope it better explains my question.

Thanks!

I'm not really the person to ask, but as I understand it there are async runtimes that support scoped (non-'static) tasks. But tokio is not one of them.

I believe smol is the only runtime that does. The moro crate, however, provides this for any runtime, though it is experimental.


(EDITED: originally I thought this may be a solution, but I overlooked the tokio-incompatibility part :upside_down_face: )

A friend has referred me to this:

But indeed turns out tokio is not relevant here.

Thanks everyone, I learned a lot!

Although experimental, moro did work for someone with the same sort of problem in an earlier thread.

I haven't looked into moro yet, but the friend actually also suggested to look in here:

Which seems to be superseded by:

If I give any of them a go, I'll make sure to document my findings here :slight_smile:


This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.