Global/Singleton variable without Send/Sync

For a simple HTTP service I'm trying to use the library wkhtmltopdf-rs that can be initialized only once per-process (more info here and here).

Basically, I just need a global instance of PdfApplication. I thought I could check whether the variable was empty and initialize it if so, or initialize it in main(), but I'm having no luck.

use wkhtmltopdf::PdfApplication;

static mut PDF: Option<PdfApplication> = None;

async fn build(req: web::Json<BuildRequest>) -> impl Responder {
    let pdf_app = &PDF.unwrap();
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    use actix_web::{App, HttpServer};

    PDF = Some(PdfApplication::new().expect("Err"));

    HttpServer::new(|| {
        App::new().service(build)
    })
    .bind("127.0.0.1:3000")?
    .run()
    .await
}

But the code throws the error:

cannot move out of static item PDF
move occurs because PDF has type std::option::Option<wkhtmltopdf::PdfApplication>, which does not implement the Copy trait

I've then tried with thread_local but it hangs from time to time and I feel like it's a dirty/incorrect solution.

Tried with lazy_static and OnceCell too but it was complaining about a missing Sync trait:

static PDF: OnceCell<PdfApplication> = OnceCell::new();
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ *const () cannot be shared between threads safely
|
= help: within PdfApplication, the trait Sync is not implemented for *const ()
= note: required because it appears within the type PhantomData<*const ()>
= note: required because it appears within the type PdfGuard
= note: required because it appears within the type PdfApplication
= note: required because of the requirements on the impl of Sync for once_cell::imp::OnceCell<PdfApplication>
= note: required because it appears within the type once_cell::sync::OnceCell<PdfApplication>
= note: shared static variables must have a type that implements Sync

And, finally, with data() of Actix it raised errors about a missing Copy/Clone trait.

What should be the correct way-to-go? If that's possible at all.

Thank you!

How about creating one instance of PdfApplication in a thread and then sending work to it via mpsc from your other threads?
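A minimal sketch of that idea, using only the standard library. FakePdfApp, Job, spawn_pdf_worker and render_on_worker are hypothetical names invented for illustration; FakePdfApp merely stands in for PdfApplication and is made non-Send the same way the compiler error reports (PhantomData<*const ()>).

```rust
use std::marker::PhantomData;
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for PdfApplication: the raw-pointer PhantomData
// makes it neither Send nor Sync.
struct FakePdfApp {
    _not_send: PhantomData<*const ()>,
}

impl FakePdfApp {
    fn new() -> Self {
        FakePdfApp { _not_send: PhantomData }
    }

    // Pretend to render HTML; real code would call into wkhtmltopdf here.
    fn render(&self, html: &str) -> Vec<u8> {
        format!("PDF({html})").into_bytes()
    }
}

// One job: the HTML to render plus a channel on which to send the result back.
struct Job {
    html: String,
    reply: mpsc::Sender<Vec<u8>>,
}

// Spawn the single thread that creates and owns the application.
// Only the Sender leaves this function, and Sender<Job> is Send + Clone.
fn spawn_pdf_worker() -> mpsc::Sender<Job> {
    let (tx, rx) = mpsc::channel::<Job>();
    thread::spawn(move || {
        let app = FakePdfApp::new(); // initialized once, on this thread only
        for job in rx {
            // Ignore send errors: the requester may have stopped waiting.
            let _ = job.reply.send(app.render(&job.html));
        }
    });
    tx
}

// Callable from any thread: enqueue a job and block until the worker replies.
fn render_on_worker(worker: &mpsc::Sender<Job>, html: &str) -> Option<Vec<u8>> {
    let (reply, result) = mpsc::channel();
    worker.send(Job { html: html.to_string(), reply }).ok()?;
    result.recv().ok()
}
```

Each request handler would hold a clone of the Sender and call render_on_worker. Since recv blocks, an async handler would presumably want to run that call on a blocking thread pool rather than directly on the executor.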

You probably want once_cell::sync::OnceCell but currently, you are likely trying to work with once_cell::unsync::OnceCell.

You could use something like send_wrapper::SendWrapper. Note that it will panic if you try to access it from a thread other than the one that created it. To protect against the panic, you would need to check PDF.valid() and only use PDF if that returns true.

Something like

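use once_cell::sync::Lazy;
use send_wrapper::SendWrapper;
use wkhtmltopdf::PdfApplication;
// Lazy::new expects a closure that builds the value on first access.
static PDF: Lazy<SendWrapper<PdfApplication>> = Lazy::new(|| SendWrapper::new(PdfApplication::new().unwrap()));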
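To illustrate what "check validity before use" means, here is a simplified, std-only imitation of the idea. ThreadBound and try_get are hypothetical names, not part of send_wrapper. Unlike the real crate, this sketch adds no unsafe Send/Sync impls, so it only demonstrates the thread-identity check itself.

```rust
use std::thread::{self, ThreadId};

// Hypothetical, simplified imitation of the idea behind a send wrapper:
// remember the creating thread and refuse access from any other.
struct ThreadBound<T> {
    value: T,
    owner: ThreadId,
}

impl<T> ThreadBound<T> {
    fn new(value: T) -> Self {
        ThreadBound { value, owner: thread::current().id() }
    }

    // True only when called on the thread that created the wrapper.
    fn valid(&self) -> bool {
        thread::current().id() == self.owner
    }

    // Non-panicking access: None when called from the wrong thread.
    fn try_get(&self) -> Option<&T> {
        if self.valid() { Some(&self.value) } else { None }
    }
}
```

When the check fails, the caller cannot use the value on that thread at all. The realistic options are to return an error for that request or to hand the work to the owning thread.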

Note that the wkhtmltopdf crate seems weird / confusing / probably unsound in its API though. I noticed two points:

  • The builder method claims to be a &mut self method for soundness in its documentation, yet it accepts &self
  • The returned PdfBuilder is Send and Sync, and the PdfApplication is not. I don’t see any good reason why PdfApplication couldn’t be Send if PdfBuilder is. Either using wkhtmltopdf (the C library) from multiple threads (but not concurrently) is sound, in which case PdfApplication could just as well implement Send (and Sync) [and in a global variable, you’d need to put it in a Mutex], or PdfBuilder has an unsound Send/Sync implementation.

From the docs of PdfApplication::new:

Wkhtmltopdf may only be initialized once per process, and all PDF generation must happen from the same thread that initialized wkhtmltopdf.

In Rust you cannot dodge the concern of thread safety. A static can be used by code from any thread, and therefore anything in a static must be safe to use from any thread. That is why statics require Sync.
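A small std-only sketch of that rule. NotThreadSafe, COUNTER, assert_sync, bump and check are hypothetical names for illustration; NotThreadSafe imitates the PhantomData<*const ()> marker named in the compiler error above.

```rust
use std::marker::PhantomData;
use std::sync::Mutex;

// Hypothetical stand-in: PhantomData<*const ()> opts the type out of
// both Send and Sync.
#[allow(dead_code)]
struct NotThreadSafe {
    _marker: PhantomData<*const ()>,
}

// static BAD: Option<NotThreadSafe> = None;
// ^ rejected: `*const ()` cannot be shared between threads safely

// Mutex<T> is Sync whenever T is Send, so this static is accepted.
static COUNTER: Mutex<u32> = Mutex::new(0);

// Compile-time probe: can only be instantiated for Sync types.
fn assert_sync<T: Sync>() {}

// Any thread may call this; the Mutex serializes access.
fn bump() -> u32 {
    let mut guard = COUNTER.lock().unwrap();
    *guard += 1;
    *guard
}

fn check() {
    assert_sync::<Mutex<u32>>();
    // assert_sync::<NotThreadSafe>();        // would not compile
    // assert_sync::<Mutex<NotThreadSafe>>(); // would not compile either
}
```

The last commented line is the relevant one here: wrapping a type in a Mutex only helps if the type is Send, so a Mutex alone would not make a non-Send PdfApplication usable in a static.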

I agree with @ZiCog 's solution.

Something does seem off here. I haven't looked at it fully (hard to from my phone), but PdfBuilder probably should contain either a &'a PdfApplication (to make it un-Sync) or a &'a mut PdfApplication (to prevent multiple from existing). I suspect it might be the latter because build_from_path could re-entrantly use another builder through the AsRef<Path> argument.

I see, then the ”make PdfBuilder not Send” approach is the correct one. Sync probably doesn’t matter since you cannot even do anything with a &PdfBuilder.

The builder method accepting &self might actually not be problematic as long as everything’s single-threaded. Sure, the AsRef thing could be re-entrant, but as long as this .as_ref call doesn’t happen in a callback from the C code, or otherwise in the middle of multiple calls to the C library that must not be interleaved incorrectly, then there might not be any soundness problems after all.


I agree with @ZiCog’s solution if the intention is to actually use the PdfApplication from multiple threads.


Reading more of the crate, I hit another weirdness: The PdfOutput type has a lifetime, even though all methods of creating it can generically create a PdfOutput<'b> of any lifetime, including PdfOutput<'static>, and not related to anything else.

The intention is... Well, just to use it whenever the route (and therefore build() function) is invoked. I don't really know much about multi threading, sorry. It just needs to be a simple microservice with a very low traffic. What's your suggestion here? To initialize PdfApplication in a different thread and communicate via mpsc?

Hey @steffahn thank you for your answer. As you said, it works just fine until it panics. What do you mean with "check PDF.valid() and only use PDF"? I mean, it's clear, but what should you do if PDF is not valid?

FYI the current code is something like

static PDF: Lazy<SendWrapper<PdfApplication>> = Lazy::new(||{
    SendWrapper::new(PdfApplication::new().unwrap())
});

async fn build(req: web::Json<BuildRequest>) -> Result<NamedFile> {
    let pdf_app = &PDF; // <-- It may panic here

    let mut pdf = pdf_app.builder()
        .build_from_html(&req.html)
        .expect("Failed to build PDF");

  // ... Other stuff
}

P.S.: in the end it seems I can just solve this by setting Actix to 1 worker only, so that it doesn't spawn new threads, although I'd love to know how to solve this properly :frowning:

I’ve continued thinking a bit about “proper ways” of doing this. I’ve found an interesting crate, procspawn, that makes it fairly straightforward to set up a whole pool of processes, which would allow multiple wkhtmltopdf instances to run in parallel. As far as I understand, spawning something on the pool puts it into a queue immediately, and you can also inspect the queue length in order to reject requests when there are too many pending at once. Joining on the resulting handle is not async-aware, though, i.e. it can block. To interact with it more properly, you could use… well… disclaimer, I’m not familiar with actix_web at all, but I found actix_web::web::block, which should be able to resolve any potential problems of blocking when calling join on a JoinHandle.

When using procspawn, you could keep using the global static PDF thing in a Lazy<SendWrapper>; it should only be running a single thread in each process. The main process would probably never initialize the lazy PDF value at all, but that’s working as intended. Make sure to desugar the actix_web::main macro, in order to insert the procspawn::init call before everything else. AFAICT, this desugaring involves using something like: actix_rt::System::new("some name...").block_on(async move { /* remaining contents of main function */}), with the correct/matching version of actix_rt; or maybe actix_web::rt::System exists, depending on your actix_web version...

Btw, it is slightly confusing to me that I’m not able to find a way to do this without an additional dependency, even though the actix_web::main macro exists... maybe you can find a better approach by examining the result of the macro with cargo expand.

Now, using procspawn would require everything sent between threads to be (de-)serializable via serde, for inter process communication. On the other hand, as far as I can tell actix_web requires serde for some things, too, so perhaps it’s not too hard to find an appropriate interface point.

Uh, thank you, it seems way more complex than I expected. Btw yes, actix already requires serde to deserialize request data. I'll take a shot at this, but at the end of the day it's probably just easier to actually run everything single-threaded and keep my fingers crossed.

Thank you for the effort in looking for a solution :slight_smile:

Btw, this sounds like a nice use case for:

in case you didn't need the parallelism.

Basically non-Send values are given their own dedicated thread to live in, and "uses of it" are actually "usage queries" sent to that thread to be performed therein.
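The pattern described here can be sketched with the standard library alone. Remote, Query, spawn and with are hypothetical names made up for this illustration and are not the API of any particular crate.

```rust
use std::sync::mpsc;
use std::thread;

// A boxed closure that runs on the owning thread with access to the value.
type Query<T> = Box<dyn FnOnce(&mut T) + Send>;

// Handle to a value that lives on its own thread. T itself need not be Send,
// because it is constructed on that thread and never leaves it.
struct Remote<T> {
    queries: mpsc::Sender<Query<T>>,
}

impl<T: 'static> Remote<T> {
    // `init` runs on the new thread, so only the constructor has to be Send.
    fn spawn(init: impl FnOnce() -> T + Send + 'static) -> Self {
        let (queries, inbox) = mpsc::channel::<Query<T>>();
        thread::spawn(move || {
            let mut value = init();
            // Serve queries until every handle has been dropped.
            for query in inbox {
                query(&mut value);
            }
        });
        Remote { queries }
    }

    // Run `f` against the value on its home thread and wait for the result.
    fn with<R: Send + 'static>(
        &self,
        f: impl FnOnce(&mut T) -> R + Send + 'static,
    ) -> Option<R> {
        let (reply, result) = mpsc::channel();
        let query: Query<T> = Box::new(move |value: &mut T| {
            let _ = reply.send(f(value));
        });
        self.queries.send(query).ok()?;
        result.recv().ok()
    }
}
```

A call such as remote.with(|app| /* use app */) then runs entirely on the dedicated thread, and only the closure and its result cross thread boundaries. The trade-off is that all uses are serialized through one thread, which matches the "in case you didn't need the parallelism" remark.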


Thanks. That's a crate I had seen before, and that I've unsuccessfully searched for when writing an earlier answer for this thread. Now knowing its name again, I also found where I’ve first learned about it.
