[Solved] Rust/wasm32 code obfuscation techniques?

  1. "cargo web deploy --release", compiled without debug info, probably already produces output that is harder to read than most JS code.

  2. However, are there other flags we can use to make the rust/wasm32 code even more obfuscated?

  3. I know that the server should treat all client input as untrusted / potentially malicious. The focus here is on making client functionality hard to reverse engineer.

Thanks!

Well, mostly what is exported to the JavaScript side are the entry points for the wasm module (panic, println/console.log, main, etc.). So a reasonable answer would be to make most of your code run in Rust, where native decompilation is difficult.


Sorry, I wasn't clear:

  1. 99.9% of my code is Rust. The non-Rust portion = index.html + css.

  2. My question is: how do I make rust/wasm32 decompilation/hijacking difficult :slight_smile:

The short answer is: you don't.

The longer answer is: obfuscation can possibly increase the time it takes to reverse engineer the code, but it will never prevent the most determined from stealing your precious IP.

If you need to keep IP secret, provide it as a SaaS. If you are even more paranoid, you'll also encrypt everything (in flight and at rest); use only strong ciphers, keys, and pass phrases; authn with OTPs and 2FA; lock everything behind a firewall, preferably with only non-routable network addresses and bastion hosts for both ingress and egress... And after all that, still get hacked in a few years anyway. :joy:


It is possible to prevent reverse engineering: see "What is Indistinguishability Obfuscation?" on Cryptography Stack Exchange.

Unfortunately, the constant-factor overhead / slowdown is too large to be practical for general use.

=====

I do agree: afaik, there is currently no practical technique for obfuscation.

However, it's a matter of tradeoff. If there's some LLVM script that I can run on my wasm32 code that (1) increases code size by 50%, (2) slows down the program by 10%, and (3) makes any reverse engineer hate their life, I'm happy to run said script. :slight_smile:

I believe indistinguishability obfuscation is effective for code that never executes. But the moment it executes (e.g. a user purchases a license for your software), all bets are off: it is trivial to trace the execution, record inputs and outputs, throw away the overhead, and produce compatible software with the same functionality.

Again, this does not prevent reverse engineering, it only delays its inevitability.


In my (limited) understanding of IO:

  1. I hand you an encrypted program P
  2. You can run P on whatever input you want.
  3. But you can't "learn anything about inner workings of P"

In particular, imagine:

P(x) = sign_message(k, x + "00");

Then, if I hand you P' (P after obfuscation):

  1. You can evaluate P'(x) on any input, which evaluates sign_message(k, x + "00") for arbitrary x.

  2. Without breaking standard crypto assumptions, you can't extract k.

  3. P' is useless for signing messages that end in "01", "10", or "11"
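
To make the shape of P concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `sign_message` is a toy keyed hash (FNV-1a over key and message), not a real signature scheme, and the key `K` is made up. It only shows that the "00" suffix is fixed by the program, not by the caller.

```rust
// Toy stand-in for a real signature scheme: FNV-1a over key || message.
// NOT cryptographically secure; it only illustrates the shape of P.
fn sign_message(key: &[u8], msg: &str) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325; // FNV-1a 64-bit offset basis
    for &b in key.iter().chain(msg.as_bytes()) {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3); // FNV-1a 64-bit prime
    }
    h
}

// Hypothetical embedded secret; in a plain build it sits in the binary.
const K: &[u8] = b"hypothetical-secret-key";

// P(x) = sign_message(k, x + "00"): every signed message ends in "00".
fn p(x: &str) -> u64 {
    let msg = format!("{}00", x);
    sign_message(K, &msg)
}
```

Calling `p("ab")` yields the same value as `sign_message(K, "ab00")`. The caller has no way to make `p` sign a message ending in "01" unless they can recover `K` from the program itself.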

Just for my understanding ... In crypto terms, if P(x) = sign_message(k, x + "00"), then is P'(x) = verify_message(p, x + "00"), i.e. the public side of an asymmetric cipher?

Even if this is the case, any bad actor who gains access to P'(x) with input(s) outside of the "00" suffix can still reproduce a function which works on those inputs. This is what I meant by "a user purchases a license". It's a fun idea, but has the same kind of heat-death future.

  1. I've formally studied crypto, but not IO -- so my notation here is messy. Sorry for that.

  2. Here, P = sign_message(k, x + "00"), as some Rust code, and P' = P compiled via an IO system.

  3. If we just compiled P via rustc, it would be fairly straightforward to pull out the key "k".

  4. The point of "compiling" P to P' via IO is that we preserve the functionality, i.e. for all x, P'(x) = P(x), but in such a way that we don't learn anything else about how P works (unless we break standard crypto assumptions).
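
As a rough illustration of why a plain rustc build leaks the key, here is a sketch of a scan for printable byte runs, similar in spirit to the Unix `strings` tool. The function name and the idea that the key is stored as ASCII are assumptions for the example; a real key is more likely raw bytes, which an attacker would find by following references from the signing routine instead.

```rust
// Collect runs of printable ASCII of at least `min_len` bytes,
// similar in spirit to the Unix `strings` tool.
fn printable_runs(blob: &[u8], min_len: usize) -> Vec<String> {
    let mut runs = Vec::new();
    let mut current = String::new();
    for &b in blob {
        if b.is_ascii_graphic() || b == b' ' {
            current.push(b as char);
        } else {
            // A non-printable byte ends the current run.
            if current.len() >= min_len {
                runs.push(current.clone());
            }
            current.clear();
        }
    }
    // Flush a run that reaches the end of the blob.
    if current.len() >= min_len {
        runs.push(current);
    }
    runs
}
```

Run over the bytes of an unobfuscated module, a constant such as an embedded ASCII key shows up verbatim in the result.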

This is kind of silly, but imagine a situation where:

  1. I want to grant you the power to sign, locally and without contacting a server or even having an internet connection, all messages that end in "00".

  2. I do not want you to be able to sign messages ending in "01", "10", or "11".

====

For (1), I need some way of handing you the "secret key." For (2), I need you to be unable to extract the key.

In my limited understanding, IO solves this problem -- which also happens to solve the "general" obfuscation problem.

If I've read the cliff's notes correctly, IO is implemented as a series of circuits. Of note is that circuits are well-known to be implemented with look-up tables. LUTs also satisfy the requirement that P'(x) provides the appropriate functionality without divulging the inner workings of the function.

The operations hiding behind some LUTs can be "obvious", e.g. matching a 4-input NAND truth table. Even if only 1% of the LUTs can be inferred in this manner, the overall shape of the algorithm can be slowly whittled down into sensible code using a combination of TDD, brute force, and a lot of patience.
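
A small sketch of what "matching a truth table" could look like, assuming 4-input LUTs packed into 16 bits (bit i holds the output for input pattern i). The function names and the list of known gates are made up for the example.

```rust
// A 4-input LUT packed into 16 bits: bit i holds the output for input pattern i.
fn build_lut(f: impl Fn(u8) -> bool) -> u16 {
    let mut lut = 0u16;
    for i in 0..16u8 {
        if f(i) {
            lut |= 1u16 << i;
        }
    }
    lut
}

// Compare an opaque table against a few well-known 4-input gates.
fn identify(lut: u16) -> Option<&'static str> {
    match lut {
        0x8000 => Some("AND4"),  // 1 only when all four inputs are 1
        0x7FFF => Some("NAND4"), // 0 only when all four inputs are 1
        0xFFFE => Some("OR4"),   // 0 only when all four inputs are 0
        0x0001 => Some("NOR4"),  // 1 only when all four inputs are 0
        0x6996 => Some("XOR4"),  // 1 when an odd number of inputs are 1
        _ => None,
    }
}
```

For example, `identify(build_lut(|i| i != 0b1111))` returns `Some("NAND4")`, while a table that matches none of the known patterns returns `None`.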

Fair warning: I don't actually know if all IO functions are describable with LUTs. Nor do I have any context on the sizes of inputs or outputs. Compared with circuits (as in FPGAs and CPLDs) the LUTs there are very small.

I know enough about neither IO nor about what you are describing to argue what is wrong. However, this line seems suspect. In the output circuit, you know exactly what the inputs, outputs, and gate type of every node of the circuit are. It just so turns out that knowing this is not very useful (besides for evaluating) because the circuit is "garbled".

There is a related technology called "Yao's Garbled Circuit"; see https://www.youtube.com/watch?v=s9AUtz1na5E


That video strongly implies that circuits in functional encryption are entirely unrelated to electrical circuits, so I must have made a bad assumption. The second thing I got out of it is that there is a strong presence of asymmetric encryption between two remote parties. That sounds to me like another layer of security in SaaS.

I'm still not convinced that cryptography solves the problem of making reverse engineering impossible, or even impractical. Here's a slide from the presentation that visually shows why I believe it's dubious:

Alice is providing Bob a subset of the key which only recovers a subset of plain text messages. This is a representation of how the "trial" or "demo" build of an application compiled with indistinguishability obfuscation works, according to the quoted answer on stackexchange. The "holes" in the key are the parts of the interface that are removed from the trial build.

So here's the problem with this method. Charlie comes along and purchases a full version of your obfuscated software. His key now has fewer holes (or perhaps no holes at all) giving him unfettered access to most or all functionality provided in the application. Charlie can either share his key (pirating), or do black box testing to reproduce your software.

Another weakness of obfuscation is clean room implementation. This is the distinction between Photoshop and GIMP or Krita. Neither project had to reverse engineer Photoshop to make an application that can edit photos and digital paint. And these are all examples of very non-trivial software.

If you want to invest in protecting your IP, your best bet is probably a legal approach not a technical one.

I think we have different definitions of obfuscation.

  1. Your definition of obfuscation seems to be "prevents pirating" -- and I agree, that type of obfuscation can't exist because "cp" exists.

  2. My definition of obfuscation is "you can copy the program, you can evaluate the function on arbitrary input, but you can't learn anything else."

===

Going back to the sign("00") example. I give you a program P'(x) = sign_message(k, x + "00"). I don't care how many copies of P' you make or who you give it to.

All I care about is that you can't abuse P' to have it sign messages ending in "01", "10", or "11".

The above is a situation where (1) pirating is trivial, (2) in an "obfuscation = no pirating" world, obfuscation is impossible, and (3) in an "obfuscation = you can't do anything else" world, obfuscation is still possible.

Not quite. I was using pirating as an example of how even an obfuscated program can still make its way into the hands of someone with enough determination to reverse engineer it, regardless of setting up a price gate or similar. Once the function can be evaluated, it can (to some extent) be reverse engineered. This is well-known in reverse engineering circles:

Black Box Testing

Black-box testing helps to examine the functionality of an application depending on its specifications and without peering into its internal workings or structures. It is sometimes called Specifications based testing. This method of testing is usually applied to all levels of software testing such as integration, unit, system, as well as acceptance. It is made of mostly higher-level testing and is also dominant in unit testing. Here, test cases are centered around specifications, design parameters, and requirements. Tests used are fundamentally functional in nature, although non-functional tests may also be used. Usual black-box test design techniques comprise of all-pairs testing, decision table testing, equivalence partitioning, cause-effect graph, boundary value analysis, error guessing, use case testing, state transition testing, user story testing, combining technique, and domain analysis. Black box testing involves analyzing a running program by probing it with different inputs. Bear in mind that black box testing can be done even without access to the binary code.


Reverse Engineering Enhanced State Models of Black Box Software Components to support Integration Testing

When considering the black box components where no source code and formal specifications are available, most approaches rely on inference from the system execution and deriving the formal models from the observations.


And related material shows up fairly often at conferences like CCC, REcon, DEF CON, and Black Hat.

I agree. You can absolutely "reverse engineer" from the (input, output) pairs. The point is that in the reverse engineering process, the binary of P' is useless -- and the only useful thing you can get from P' is evaluating it.

In particular, I agree with you that:

  • you can eval P' on arbitrary inputs
  • you can get lots of (input, output) pairs
  • from the (input, output) pairs you can put in time to reverse engineer the function / create a clean room implementation from the (input, output) pairs

However, the source of P' is useless for reverse engineering. In particular, from the reverse engineering perspective, the only useful thing you can do with P' is evaluate it.
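
The black-box route can be sketched as a differential test: evaluate the opaque function and a clean-room guess on the same inputs and look for the first disagreement. `opaque` below is a hypothetical stand-in for P' (it happens to compute 3x + 7), and all names are made up for the example.

```rust
// Returns the first input where the candidate disagrees with the oracle, if any.
fn first_mismatch<O, C>(
    oracle: O,
    candidate: C,
    inputs: impl IntoIterator<Item = u32>,
) -> Option<u32>
where
    O: Fn(u32) -> u32,
    C: Fn(u32) -> u32,
{
    inputs.into_iter().find(|&x| oracle(x) != candidate(x))
}

// Hypothetical opaque function standing in for P'; we only ever call it.
fn opaque(x: u32) -> u32 {
    (x << 1).wrapping_add(x).wrapping_add(7)
}

// Clean-room guess derived purely from observed (input, output) pairs.
fn guess(x: u32) -> u32 {
    x.wrapping_mul(3).wrapping_add(7)
}
```

`first_mismatch(opaque, guess, 0..10_000)` returns `None` here, meaning the guess agrees with the oracle on every tested input. Nothing in this loop ever looks inside the binary of `opaque`.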

Going back to the P'(x) = sign_message(k, x + "00") example:

I agree that:

  • You can evaluate P'(x) for many values of x to get many (input, output) pairs
  • You can set up a cluster and brute force "k" by checking vs the (input, output) pairs.

I'm okay with both of the above -- as long as: even with the binary of P', you can't extract bits of k.
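
A sketch of that brute-force search, under loud assumptions: `toy_sign` is a made-up keyed function with a deliberately tiny 16-bit key so the whole key space can be enumerated in an instant. With a realistic key size the same loop is infeasible.

```rust
// Toy keyed function with a deliberately tiny 16-bit key (hypothetical, not a real MAC).
fn toy_sign(key: u16, x: u32) -> u32 {
    // Spread the key over 32 bits, mix it into the input, then scramble.
    let mut h = x ^ ((key as u32) << 16 | key as u32);
    h = h.wrapping_mul(0x9E37_79B1);
    h ^ (h >> 15)
}

// Try every possible key and keep those consistent with all observed pairs.
fn brute_force(pairs: &[(u32, u32)]) -> Vec<u16> {
    (0..=u16::MAX)
        .filter(|&k| pairs.iter().all(|&(x, y)| toy_sign(k, x) == y))
        .collect()
}
```

Given a few pairs produced with one secret key, `brute_force` returns exactly that key. The search uses only (input, output) pairs and never inspects how the function is implemented.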

I'm clearly missing a lot of context, here. And I guess it doesn't really matter why you would want to protect P; I'm sure you have your reasons.

This P' has nothing to do with my actual use, it's a pure hypothetical example of something IO makes possible.

My actual use case: I'm building a webapp, and want to make it hard for competitors to steal/copy parts of the tech.

So basically allowing only full evaluation of your program. There should be no possibility for reverse engineers to retrieve 'readable' intermediate states.
This could only work through sandboxing or emulation because your program effectively boils down to being a black box. WebAssembly is not built as a black box emulation platform; the bytecode and behaviour are defined and open to the world.
I've heard of tools that run some kind of software defined virtual machine that processes custom bytecode, used for DRM purposes. I lack the knowledge/experience to provide more information though. (Some people also refer to these programs as obfuscation software.)
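
For a feel of what a software-defined VM with custom bytecode could look like, here is a minimal stack-machine sketch. The opcode values and names are entirely hypothetical. Note that the interpreter itself still ships in the client, so this raises the effort required without making analysis impossible.

```rust
// Hypothetical opcode numbering, chosen arbitrarily for this example.
const PUSH: u8 = 0xA7; // followed by one immediate byte
const ADD: u8 = 0x3C;
const MUL: u8 = 0xE1;

// Interpret the custom bytecode; returns None on malformed programs.
fn run(code: &[u8]) -> Option<i64> {
    let mut stack: Vec<i64> = Vec::new();
    let mut pc = 0;
    while pc < code.len() {
        match code[pc] {
            PUSH => {
                let imm = *code.get(pc + 1)?; // missing immediate => None
                stack.push(imm as i64);
                pc += 2;
            }
            ADD => {
                let (b, a) = (stack.pop()?, stack.pop()?);
                stack.push(a.wrapping_add(b));
                pc += 1;
            }
            MUL => {
                let (b, a) = (stack.pop()?, stack.pop()?);
                stack.push(a.wrapping_mul(b));
                pc += 1;
            }
            _ => return None, // unknown opcode
        }
    }
    // A well-formed program leaves exactly one value on the stack.
    if stack.len() == 1 { stack.pop() } else { None }
}
```

The program `[PUSH, 3, PUSH, 4, ADD, PUSH, 5, MUL]` computes (3 + 4) * 5. A reverse engineer now has to understand the interpreter before the bytecode means anything, which is the extra layer such tools add.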

When stepping down the requirement of 'unreadable' to a 'hard to interpret' intermediate state, there are indeed code obfuscators that do this. I found this in my browsing history, GitHub - obfuscator-llvm/obfuscator, but never used it. It's old, looks unmaintained, but it can push you towards an answer to your question (maybe).

It sounds intuitive to look for obfuscators that work on the LLVM intermediate language, because that would make the tool deployable for all kinds of targets. WebAssembly is also pretty young so there might not be such a tool built specifically for wasm.

This P' has nothing to do with my actual use, it’s a pure hypothetical example of something IO makes possible.

I said "protect P", which was not a typo. P being your source code, and you have some reasons to protect it. Regardless, your application will be reproduced. Content owners have been searching for the Holy Grail of DRM for decades, and they will continue the quest indefinitely. Not even hardware is safe from cloning.
