Malware in audio file

I want to create a project, and one thing really terrifies me. I want to let users upload audio files and play them back afterwards. I want to stream these files to clients (web, desktop), but I'm scared of a potential security hole. Is it possible to exploit audio files to execute malware? Are there any good practices for detecting malware?

Audio files will be uploaded as binary. The server will generate a name and save the binary as an audio file. The audio file will then be streamed to users over WebSockets.

If the decoder implementation has bugs, it's in principle possible to create special audio files that exploit those bugs. But it would require an attacker to have pretty specialized knowledge of (1) the audio format in question, (2) the decoder implementation, and (3) writing malware and exploiting bugs.

That combination will be a pretty rare skillset, so while it's not impossible to do something like that, it is relatively improbable that it will happen unless your project really becomes popular (as in millions and millions of users).

Wow, thanks a lot. I'm doing this project for myself and my friends, but who knows how much people will like it. I hope I'll gain the knowledge in the future to remove or minimize these kinds of exploits.

This sort of thing can and has happened with other file formats, though I don't know of any specific incidents with audio decoders and it’s less likely in simpler formats. However, there is a robust way to protect your users: when the file is uploaded, decode it (using safe Rust code only, so your server is not vulnerable!), then re-encode it using your own encoder. This ensures that if there is any weirdness in the container structure, it is discarded, and the audio samples themselves cannot carry malware because they are arbitrary numbers that are not parsed.
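A minimal sketch of the re-encode half, assuming the upload has already been decoded to interleaved 16-bit PCM by a safe-Rust decoder (for example the pure-Rust symphonia crate; that step is not shown). The helper name `write_wav_pcm16` is hypothetical, and plain WAV is used only because its container is simple enough to write by hand; in practice you would probably re-encode to a compressed format with a proper encoder.

```rust
/// Hypothetical helper: wraps already-decoded 16-bit PCM samples in a freshly
/// built WAV (RIFF) container. Nothing from the uploaded file's container
/// (tags, extra chunks, padding) is copied; only the sample values are.
fn write_wav_pcm16(samples: &[i16], sample_rate: u32, channels: u16) -> Vec<u8> {
    let bits_per_sample: u16 = 16;
    let block_align: u16 = channels * bits_per_sample / 8;
    let byte_rate: u32 = sample_rate * block_align as u32;
    let data_len: u32 = (samples.len() * 2) as u32;

    let mut out = Vec::with_capacity(44 + data_len as usize);
    // RIFF header: the size field counts everything after these 8 bytes.
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes());
    out.extend_from_slice(b"WAVE");
    // "fmt " chunk describing plain PCM.
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk body size
    out.extend_from_slice(&1u16.to_le_bytes()); // audio format 1 = PCM
    out.extend_from_slice(&channels.to_le_bytes());
    out.extend_from_slice(&sample_rate.to_le_bytes());
    out.extend_from_slice(&byte_rate.to_le_bytes());
    out.extend_from_slice(&block_align.to_le_bytes());
    out.extend_from_slice(&bits_per_sample.to_le_bytes());
    // "data" chunk: the samples, little-endian.
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());
    for s in samples {
        out.extend_from_slice(&s.to_le_bytes());
    }
    out
}
```

Every header byte here is generated by the server, so whatever was odd about the original container simply doesn't make it into the file you stream.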

It is possible to further enhance security by compiling said Rust decoder to WASM and running it as a module in e.g. wasmer, i.e. by sandboxing the decoder.
That will prevent any I/O from taking place that isn't explicitly allowed, and thus limit the damage that an exploit can do to the running VM instance.

assuming the WASM engine doesn't have any exploitable bugs...

The nature of securing things is not to reach 100% security. That's both technically and economically infeasible.

Instead, the goal is to make it as uneconomical as reasonably possible for an attacker to attack, where what's reasonable will differ between contexts. From this POV, running the decoder in a WASM engine is a security win: it takes relatively little effort to include, but makes the goal of successfully exploiting that part of the pipeline much harder to achieve.

It's happened before: for example, a quick search shows that the open-source ALAC codec had exploitable versions deployed on common versions of Android in 2021.

You should generally not be particularly concerned about the web as a target unless you have cause to think your users are extremely high value (targeted by government organizations, for example): if someone has an audio exploit that's viable in common browsers, your service is the least of anyone's problems. For desktop clients, it comes down to the quality and maintenance of the codecs you use there. Anything that ships with the OS (and therefore gets updates from it) is fine, but if you're bundling ffmpeg or something similar, it's on you to stay on top of updates.

As a mitigation in either case, limiting the supported formats to one or two very common ones would be a good idea: MP3 and Ogg will be far better battle-tested than, say, some RealMedia codec from 2003.
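A cheap first gate is to check the file signature before any decoder touches the upload. This sketch (with a hypothetical `sniff_format` function and `AllowedFormat` enum) only accepts data that starts like an Ogg page (`OggS`), an MP3 with a leading ID3v2 tag (`ID3`), or a bare MPEG Layer III frame header. It is a filter, not validation: a file can have a valid signature and still be malformed, and it doesn't check which codec sits inside the Ogg container, so the real test remains whether your decoder accepts it.

```rust
/// The only formats this hypothetical service accepts.
#[derive(Debug, PartialEq)]
enum AllowedFormat {
    Mp3,
    Ogg,
}

/// Returns the format if the upload starts with an accepted signature,
/// or None so the upload can be rejected before any decoder sees it.
fn sniff_format(bytes: &[u8]) -> Option<AllowedFormat> {
    if bytes.starts_with(b"OggS") {
        // Ogg page capture pattern.
        return Some(AllowedFormat::Ogg);
    }
    if bytes.starts_with(b"ID3") {
        // MP3 preceded by an ID3v2 metadata tag.
        return Some(AllowedFormat::Mp3);
    }
    if bytes.len() >= 2
        && bytes[0] == 0xFF
        && (bytes[1] & 0xE0) == 0xE0 // last 3 bits of the 11-bit frame sync
        && (bytes[1] & 0x06) == 0x02 // layer bits 01 = Layer III
    {
        return Some(AllowedFormat::Mp3);
    }
    None
}
```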

You can embed anything, at least in MP3 files, though not in the music itself (that's also possible, but only in limited amounts). MP3 files can carry embedded metadata such as images. A bug in an image viewer may let such an image execute code; I guess you remember the famous Apple bug where simply displaying an image could jailbreak Apple's OS. Because of all those fears, I use a Java music player, which adds one level of isolation in the form of the VM. Rust gives you another kind of protection, so as long as you don't step outside Rust code, you should be safe.
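If you don't need the embedded metadata, you can drop it on the server before storing the file. A minimal sketch, assuming the metadata is a leading ID3v2 tag (ID3v1 tags live in the last 128 bytes and are not handled here); `strip_id3v2` is a hypothetical name. The tag size is a 28-bit "syncsafe" integer (7 bits per byte), and the final slice is taken with `get`, so a tag claiming to be larger than the file is rejected instead of being read past the end.

```rust
/// Hypothetical helper: returns the bytes after a leading ID3v2 tag, the
/// whole input if there is no tag, or None if the tag header is malformed.
fn strip_id3v2(bytes: &[u8]) -> Option<&[u8]> {
    if !bytes.starts_with(b"ID3") {
        return Some(bytes); // no ID3v2 tag at the start
    }
    // 10-byte header: "ID3", 2 version bytes, 1 flags byte, 4 size bytes.
    let header = bytes.get(..10)?;
    let flags = header[5];
    // The size is "syncsafe": 4 bytes with 7 significant bits each.
    let mut size: usize = 0;
    for &b in &header[6..10] {
        if b & 0x80 != 0 {
            return None; // high bit set: not a valid syncsafe integer
        }
        size = (size << 7) | b as usize;
    }
    // The size excludes the header; a footer (flag 0x10, ID3v2.4) adds 10 bytes.
    let footer = if flags & 0x10 != 0 { 10 } else { 0 };
    // `get` returns None if the tag claims to extend past the end of the data.
    bytes.get(10 + size + footer..)
}
```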

Example: libwebp had a buffer overflow that could be exploited by a very, very well crafted image file.

There are people who spend weeks in their basements trying to go from a dumb buffer overflow to remote code execution. Never consider "it's too complicated to exploit" sufficient security.

Fortunately most of those bugs are memory safety issues, so safe Rust should reduce the risk.
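To make that concrete for a container parser, here is a minimal sketch that walks the chunks of a RIFF/WAVE buffer (the function name `riff_chunks` is hypothetical). In C, a chunk whose length field lies is a classic route to an out-of-bounds read; here offsets use checked arithmetic and slices are taken with `get`, so a bogus length becomes an `Err`. Even plain indexing in safe Rust would panic rather than read out of bounds, which is still a crash (a denial of service), just not memory corruption.

```rust
/// Hypothetical helper: walks the chunks of a RIFF/WAVE buffer and returns
/// (chunk id, chunk body) pairs, or an error for any inconsistent length.
fn riff_chunks(buf: &[u8]) -> Result<Vec<([u8; 4], &[u8])>, &'static str> {
    if buf.len() < 12 || !buf.starts_with(b"RIFF") || &buf[8..12] != &b"WAVE"[..] {
        return Err("not a RIFF/WAVE file");
    }
    let mut chunks = Vec::new();
    let mut pos: usize = 12;
    while pos < buf.len() {
        // Each chunk header is a 4-byte id followed by a 4-byte little-endian length.
        let body_start = pos.checked_add(8).ok_or("offset overflow")?;
        let header = buf.get(pos..body_start).ok_or("truncated chunk header")?;
        let id = [header[0], header[1], header[2], header[3]];
        let len = u32::from_le_bytes([header[4], header[5], header[6], header[7]]) as usize;
        // A lying length makes `get` return None instead of reading out of bounds.
        let body_end = body_start.checked_add(len).ok_or("offset overflow")?;
        let body = buf.get(body_start..body_end).ok_or("chunk length exceeds file size")?;
        chunks.push((id, body));
        // Chunk bodies are padded to an even number of bytes.
        pos = body_end.checked_add(len & 1).ok_or("offset overflow")?;
    }
    Ok(chunks)
}
```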

Any code you use can have bugs in it, and as long as it takes any unprocessed input, be it a TCP request, an HTTP request, or a file the user is allowed to upload that you later process (an image, a text file, an audio or video file), it may have bugs that allow a malicious user to execute code remotely. In other words, if you don't want your server to be hacked, don't connect it to the internet, and don't power it on. Only then will you be safe!

Or, you know, take the risk and deal with the possibility of being hacked.

Slightly tangential to your original question, but you are much more likely to see more common attacks, such as denial-of-service attacks.

For example, if a user can upload any file to your service, does your service check to prevent arbitrary quantities of data from being uploaded?

If a sufficiently large file is uploaded, it could run your machine out of memory or storage space.
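A minimal sketch of enforcing a cap while reading, using only `std::io::Read::take`; `read_limited` is a hypothetical name and the limit is whatever you pick. Reading one byte past the limit tells you whether the upload was too large without ever buffering more than `limit + 1` bytes. If your WebSocket library has its own message-size limit, it's worth setting that as well, since it can reject oversized frames before they reach your code.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads at most `limit` bytes from `reader`, returning
/// an error (instead of the data) if the input is longer than that.
fn read_limited<R: Read>(reader: R, limit: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // Allow one extra byte: receiving it proves the upload is over the limit.
    reader.take(limit.saturating_add(1)).read_to_end(&mut buf)?;
    if buf.len() as u64 > limit {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "upload exceeds size limit",
        ));
    }
    Ok(buf)
}
```

For a 10 MB cap this would be called as `read_limited(stream, 10 * 1024 * 1024)`, where `stream` is anything implementing `Read`.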

Do you use one that implements the entire decoder in Java, or something like JavaFX's Media class, which seems to use the native GStreamer library internally for decoding? See jfx/modules/javafx.media/src/main/java/com/sun/media/jfxmediaimpl/NativeMediaManager.java at 163bf6d42fde7de0454695311746964ff6bc1f49 in openjdk/jfx on GitHub.

I haven't implemented anything yet, as I was scared of the kinds of attacks I wouldn't be able to defend against. I do know the basic stuff, like DDoS. Also, I will probably limit audio files to a maximum of 10 MB; that should be enough for 99% of tracks.

All decoders are pure Java in my case. They work fine on Android and any other platform where Java is supported. But I also currently have a plan to migrate some decoders to Rust to take advantage of the underlying hardware capabilities for decent playback.

Actual high performance (e.g. 4K@60 and up) requires at least GPU decoding, and then you lose the memory-protection benefits of Rust. There are also hardware implementations (probably what you're referring to). I'm not too familiar with them, but the drive-by references I've seen imply they don't really improve the security side (other than providing an implementation you hope has been well validated!).

I assumed the question was about dedicated audio playback. Video playback is, sure, outside my knowledge. You don't need a GPU at all; a range of USB DACs can be used instead. My implementation also works for multichannel audio.