Fastest way to validate media files

I have lots (tens of thousands) of media files, some of which are corrupted. I'm looking for a way to check their integrity in Rust, so I need a non-panicking way to parse them (and to validate that, given a resolution of w by h, there are indeed w*h pixels).
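Whatever decoder ends up being used, the "w*h pixels" part of the check is just arithmetic on the header dimensions and the decoded buffer length. Below is a minimal sketch, assuming the decoder hands back the width, height, channel count and a flat buffer of samples (buffer_matches_dimensions is a hypothetical helper, not from any crate). Many decoders may already return an error on truncated data instead of a short buffer, so treat this as a belt-and-braces check. For 16-bit formats, count samples rather than bytes.

```rust
/// Returns true when a decoded buffer holds exactly width * height * channels samples.
/// Checked multiplication means bogus dimensions (e.g. from a corrupted header)
/// yield false instead of overflowing.
fn buffer_matches_dimensions(width: u32, height: u32, channels: u8, samples: usize) -> bool {
    (width as usize)
        .checked_mul(height as usize)
        .and_then(|pixels| pixels.checked_mul(channels as usize))
        .map_or(false, |expected| expected == samples)
}

fn main() {
    // A 4x3 RGB image (3 channels) should decode to 36 samples.
    println!("{}", buffer_matches_dimensions(4, 3, 3, 36)); // prints "true"
    // A buffer that stops short (e.g. missing scanlines) fails the check.
    println!("{}", buffer_matches_dimensions(4, 3, 3, 30)); // prints "false"
}
```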

The files are mostly jpeg, png and mp4, with a few other formats mixed in.

What solution would be the fastest? I'm considering

  1. Invoking ffmpeg -i ... -f null /dev/null and checking the exit code;
  2. Using the image Rust crate, plus something else for video.
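Option 1 could look roughly like this with std::process::Command. This is a sketch, not a tested recipe: it uses the commonly documented "-f null -" form instead of /dev/null (same null muxer, and it also works on Windows), and adds -nostdin, -v error and -xerror. Because ffmpeg may conceal minor decode errors and still exit 0, the sketch also treats any error output on stderr as a failure; that heuristic is an assumption and may flag files with harmless glitches. The helper names ffmpeg_args and validate_with_ffmpeg are hypothetical.

```rust
use std::ffi::OsString;
use std::io;
use std::path::Path;
use std::process::{Command, Stdio};

/// Arguments for a decode-only ffmpeg run on `path`.
/// -nostdin: never wait for keyboard input; -v error: only print errors;
/// -xerror: stop at the first error; -f null -: decode everything, write nothing.
fn ffmpeg_args(path: &Path) -> Vec<OsString> {
    let mut args: Vec<OsString> = ["-nostdin", "-v", "error", "-xerror", "-i"]
        .iter()
        .map(|s| OsString::from(*s))
        .collect();
    args.push(path.as_os_str().to_os_string());
    args.extend(["-f", "null", "-"].iter().map(|s| OsString::from(*s)));
    args
}

/// Ok(true): ffmpeg exited successfully and printed no errors.
/// Err: ffmpeg could not be spawned at all (e.g. it is not installed).
fn validate_with_ffmpeg(path: &Path) -> io::Result<bool> {
    let output = Command::new("ffmpeg")
        .args(ffmpeg_args(path))
        .stdin(Stdio::null())
        .stdout(Stdio::null()) // stderr is still captured by output()
        .output()?;
    Ok(output.status.success() && output.stderr.is_empty())
}

fn main() {
    for arg in std::env::args_os().skip(1) {
        let path = Path::new(&arg);
        match validate_with_ffmpeg(path) {
            Ok(true) => println!("ok      {}", path.display()),
            Ok(false) => println!("CORRUPT {}", path.display()),
            Err(e) => eprintln!("could not run ffmpeg: {}", e),
        }
    }
}
```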

All else being equal, spawning a subprocess per file will always be slower than not doing that.

But you might not want to link against ffmpeg, and you might have trouble finding pure-Rust libraries for all the formats you want to process.

You could use a mixed strategy — check the extension or magic numbers, and then validate it using image or ffmpeg or whatever else can handle the format and is readily available to you.
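The magic-number half of that mixed strategy needs only the first few bytes of each file. A minimal sketch using the well-known signatures: JPEG starts with FF D8 FF, PNG with its fixed 8-byte signature, and MP4 belongs to the ISO base media file format, which has "ftyp" at bytes 4..8 (so MOV, HEIC and AVIF files can match too). The enums, sniff, sniff_file and route are hypothetical names, and routing images in-process and everything else to ffmpeg is just one possible split.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

#[derive(Debug, PartialEq, Eq)]
enum Kind { Jpeg, Png, IsoBmff, Unknown }

#[derive(Debug, PartialEq, Eq)]
enum Validator { InProcess, Ffmpeg }

/// Classifies a file from its leading bytes.
fn sniff(header: &[u8]) -> Kind {
    const PNG_SIG: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];
    if header.starts_with(&[0xFF, 0xD8, 0xFF]) {
        Kind::Jpeg
    } else if header.starts_with(&PNG_SIG) {
        Kind::Png
    } else if header.len() >= 8 && &header[4..8] == b"ftyp" {
        Kind::IsoBmff // MP4 and relatives
    } else {
        Kind::Unknown
    }
}

/// Reads at most 12 bytes from the file and sniffs them.
fn sniff_file(path: &Path) -> io::Result<Kind> {
    let mut header = Vec::with_capacity(12);
    File::open(path)?.take(12).read_to_end(&mut header)?;
    Ok(sniff(&header))
}

/// Still images are checked in-process; everything else goes to ffmpeg.
fn route(kind: &Kind) -> Validator {
    match kind {
        Kind::Jpeg | Kind::Png => Validator::InProcess,
        Kind::IsoBmff | Kind::Unknown => Validator::Ffmpeg,
    }
}

fn main() -> io::Result<()> {
    for arg in std::env::args_os().skip(1) {
        let path = Path::new(&arg);
        let kind = sniff_file(path)?;
        println!("{:?} -> {:?}: {}", kind, route(&kind), path.display());
    }
    Ok(())
}
```

Sniffing the content rather than trusting the extension also catches files that were renamed to the wrong extension.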

While the spawning overhead might be considerable, ffmpeg is just unreasonably faster than the image crate. Additionally, this looks like a one-shot thing, so you could maybe link against ffmpeg or opencv (for opencv there's already a good crate that will do everything for you; I don't know about ffmpeg, though, maybe write a bit of extern "C", since you're not shipping it anyway).

Can you take a small sample and try out every option? The task itself is trivial; there's just a lot of volume, so trying all the options may not be that time-consuming.
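Trying every option on a sample mostly needs a small timing harness that runs each strategy over the same files. A minimal sketch with std::time::Instant, assuming each strategy is wrapped as a closure that returns true for "looks fine" (for example, a call to ffmpeg or an in-process decode); time_strategy and Report are hypothetical names, and the metadata-based check in main is only a placeholder. The first pass over a sample may be dominated by disk reads, so running each strategy twice gives a fairer comparison.

```rust
use std::path::{Path, PathBuf};
use std::time::{Duration, Instant};

/// Outcome of running one validation strategy over a sample of files.
#[derive(Debug)]
struct Report {
    elapsed: Duration,
    flagged: Vec<PathBuf>, // files for which `check` returned false
}

/// Runs `check` on every file in `sample` and times the whole run.
fn time_strategy<F>(sample: &[PathBuf], mut check: F) -> Report
where
    F: FnMut(&Path) -> bool,
{
    let start = Instant::now();
    let flagged = sample
        .iter()
        .filter(|p| !check(p.as_path()))
        .cloned()
        .collect();
    Report { elapsed: start.elapsed(), flagged }
}

fn main() {
    let sample: Vec<PathBuf> = std::env::args_os().skip(1).map(PathBuf::from).collect();
    // Placeholder strategy (hypothetical): flags empty or unreadable files.
    let report = time_strategy(&sample, |p| {
        std::fs::metadata(p).map(|m| m.len() > 0).unwrap_or(false)
    });
    println!(
        "{:?} for {} files, {} flagged",
        report.elapsed,
        sample.len(),
        report.flagged.len()
    );
}
```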


Video is such a mess that you should probably just shell out to ffmpeg, yes. Video decoding is expensive enough that the process-startup overhead is probably negligible by comparison.

But maybe make a local fast path for small jpgs?
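One possible cheap pre-check for JPEGs (not necessarily the fast path meant above, which may simply be decoding them in-process) is to look at the markers: a JPEG starts with SOI (FF D8) and ends with EOI (FF D9), and a file cut off partway through usually lacks the trailing EOI. This reads only four bytes and decodes nothing, so it is a heuristic in both directions: a pass does not prove the compressed data is intact, and some writers may append extra bytes after EOI, so failures are best handed to a full decoder rather than discarded. The function names are hypothetical.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

const SOI: [u8; 2] = [0xFF, 0xD8]; // JPEG start-of-image marker
const EOI: [u8; 2] = [0xFF, 0xD9]; // JPEG end-of-image marker

/// True if `data` begins with SOI and ends with EOI.
fn looks_like_complete_jpeg(data: &[u8]) -> bool {
    data.len() >= 4 && data.starts_with(&SOI) && data.ends_with(&EOI)
}

/// Same check on a file, reading only its first and last two bytes.
fn jpeg_file_looks_complete(path: &Path) -> io::Result<bool> {
    let mut file = File::open(path)?;
    if file.metadata()?.len() < 4 {
        return Ok(false);
    }
    let mut head = [0u8; 2];
    file.read_exact(&mut head)?;
    let mut tail = [0u8; 2];
    file.seek(SeekFrom::End(-2))?;
    file.read_exact(&mut tail)?;
    Ok(head == SOI && tail == EOI)
}

fn main() -> io::Result<()> {
    for arg in std::env::args_os().skip(1) {
        let path = Path::new(&arg);
        if jpeg_file_looks_complete(path)? {
            println!("markers ok {}", path.display());
        } else {
            // Possibly truncated (or trailing data): send to a full decoder.
            println!("suspect    {}", path.display());
        }
    }
    Ok(())
}
```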

I've found that there are ffmpeg wrappers, so I'll probably use one of them.

As for a local fast path for small jpgs: that's a very good idea! At this point, my plan is to just throw them into a separate bin, which also lets me sort out which ones I need.
