I've tried searching crates.io and a basic web search with no luck. Does anyone have libraries they recommend for validating files that are uploaded via the web?
I'm thinking of things like making sure the file format actually matches the MIME type and extension, as well as some basic sanitization for certain file types (PNG and PDF, for example). File size checks aren't necessary, as I'm handling that elsewhere. Filename sanitization also doesn't need to be included.
I know the basics of how to check some things with some formats, but it's a really big topic and I'd prefer a library if there are any good ones.
Oh, and if this should be in help, I'm open to reposting or just having it moved. I wasn't really sure where something like this should be posted since it's not directly code or language concept related.
Validation sort of depends on what you're doing with the files. Are you trying to validate them before passing them to a parser to try and avoid a malicious file triggering an exploit? If so you would probably be better off isolating that code in some sort of sandbox to minimize the damage an exploit can do.
If you're just storing them and allowing a user to retrieve them, you almost certainly shouldn't do validation on the file contents.
I would also caution against OVER-validating files because you could end up rejecting valid files, which is extremely frustrating for users. This happens a lot with email address validation on web forms for example, and it's incredibly annoying when a real top level domain is rejected because it isn't on the short list the site is validating against.
If you want to check that a file that's supposed to be a PNG or a PDF is actually a valid PNG or a PDF, then the only reliable way to tell is to attempt a full parse with a compliant parser.
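As a rough illustration of what the first layer of such a parse looks like for PNG, here is a std-only sketch. The names `check_png_structure` and `PngError` are made up for this example and don't come from any crate. It only walks the chunk layout (signature, a 13-byte IHDR first, IEND last, every length in bounds). It does not verify CRCs, require an IDAT chunk, or decompress pixel data, so it is much weaker than a real decoder and should not be read as a substitute for one.

```rust
/// Why a buffer failed the structural check (hypothetical error type).
#[derive(Debug, PartialEq)]
enum PngError {
    BadSignature,
    BadIhdr,
    Truncated,
    MissingIend,
    TrailingData,
}

const PNG_SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];

/// Walk the chunk list and return the number of chunks seen.
/// CRCs and compressed image data are NOT checked here.
fn check_png_structure(data: &[u8]) -> Result<usize, PngError> {
    if !data.starts_with(&PNG_SIGNATURE) {
        return Err(PngError::BadSignature);
    }
    let mut pos = PNG_SIGNATURE.len();
    let mut chunks = 0usize;
    loop {
        if pos == data.len() {
            // Ran out of data before reaching IEND.
            return Err(if chunks == 0 {
                PngError::BadIhdr
            } else {
                PngError::MissingIend
            });
        }
        // Each chunk: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC.
        let header = data.get(pos..pos + 8).ok_or(PngError::Truncated)?;
        let len = u32::from_be_bytes([header[0], header[1], header[2], header[3]]) as usize;
        let kind = &header[4..8];
        // The first chunk must be IHDR with a 13-byte payload.
        if chunks == 0 && (kind != b"IHDR" || len != 13) {
            return Err(PngError::BadIhdr);
        }
        // Skip the header, the payload, and the CRC.
        let end = pos
            .checked_add(8)
            .and_then(|p| p.checked_add(len))
            .and_then(|p| p.checked_add(4))
            .ok_or(PngError::Truncated)?;
        if end > data.len() {
            return Err(PngError::Truncated);
        }
        chunks += 1;
        pos = end;
        if kind == b"IEND" {
            // Nothing may follow the IEND chunk.
            return if pos == data.len() {
                Ok(chunks)
            } else {
                Err(PngError::TrailingData)
            };
        }
    }
}
```

A file that passes this can still fail to decode, which is the point above: only handing the bytes to a full decoder tells you whether the image is actually valid.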
Given that PDFs are not really machine-readable, and neither are PNG images, I am highly suspicious of what you are actually trying to accomplish, because you probably don't actually need this sort of validation. Given that you won't be able to use the result of parsing for much (unless you are trying to do OCR or fancy image processing stuff), you are probably trying to create a "this is not a valid document" popup, which is definitely in the category of unnecessary annoyance.
The immediate application is a small custom content management application.
Uploaded files (images) will be displayed to the public. Malicious files are part of the concern, but so is format validation. A PNG needs to actually be a PNG. The same goes for JPG, BMP, GIF, etc. PDF and other formats may be a consideration later for different purposes.
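For the "a PNG needs to actually be a PNG" part, the cheapest first check is comparing the leading signature bytes against the declared MIME type. Below is a minimal std-only sketch of that idea. `ImageKind`, `sniff`, `declared`, and `signature_matches` are hypothetical names for illustration, not an existing crate's API, and the check only looks at the first few bytes, so it says nothing about whether the rest of the file decodes.

```rust
/// Image formats this sketch recognises (hypothetical enum, not from any crate).
#[derive(Debug, PartialEq, Clone, Copy)]
enum ImageKind {
    Png,
    Jpeg,
    Gif,
    Bmp,
}

/// Guess the format from the leading signature ("magic") bytes of the upload.
fn sniff(bytes: &[u8]) -> Option<ImageKind> {
    if bytes.starts_with(&[0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A]) {
        Some(ImageKind::Png)
    } else if bytes.starts_with(&[0xFF, 0xD8, 0xFF]) {
        Some(ImageKind::Jpeg)
    } else if bytes.starts_with(b"GIF87a") || bytes.starts_with(b"GIF89a") {
        Some(ImageKind::Gif)
    } else if bytes.starts_with(b"BM") {
        // Two bytes is a weak signature, so BMP matches are the least trustworthy.
        Some(ImageKind::Bmp)
    } else {
        None
    }
}

/// Map a client-declared MIME type to the format it claims to be.
fn declared(mime: &str) -> Option<ImageKind> {
    match mime.to_ascii_lowercase().as_str() {
        "image/png" => Some(ImageKind::Png),
        "image/jpeg" => Some(ImageKind::Jpeg),
        "image/gif" => Some(ImageKind::Gif),
        "image/bmp" => Some(ImageKind::Bmp),
        _ => None,
    }
}

/// True only when the declared type is known and the content's signature agrees.
fn signature_matches(mime: &str, bytes: &[u8]) -> bool {
    match (declared(mime), sniff(bytes)) {
        (Some(want), Some(got)) => want == got,
        _ => false,
    }
}
```

With this, `signature_matches("image/png", upload)` rejects a GIF that was renamed to `.png`, but it will happily accept a truncated or corrupted PNG, so it only works as a pre-filter in front of a real decoder.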