Strange code/strings embedded in the final build of an actix-web project

Hello everyone,

I've been using Rust for 2 months, and I had a hard time convincing my team to redevelop an enterprise application from Java/Spring to Rust/Actix-web.
Everything went well until I discovered that an official Actix-web middleware embeds some code (HTML + JS) and some weird text in the release build (--release).
Analyzing the strings contained in the executable, we found references to Amazon, Facebook, and other oddities.

This happens even in an empty project containing only Actix-web.

Here is a sample of what I found in the final executable (from an empty project containing only actix-web with its compression middleware enabled):

I tried to find the cause, but I'm still a beginner in this language, so I did the best I could to find a logical explanation.

I opened a ticket describing what I found and included what is needed to reproduce the experiment. Unfortunately, I didn't get any answer from the Actix project team. I fully understand that these good people have a life, and that every second spent on the open source community is out of kindness and not out of obligation.

The link to the post:

I'm relying on your experience to help me find the cause, because the migration project is completely on hold for security reasons: we can't deploy a program containing unknown code in an enterprise environment.

Is it test code from the library that ended up in the final build by mistake?

Can someone explain what is really going on?

Thank you in advance.

This is likely due to the inclusion of the compression algorithm Brotli, written by Google (and FOR Google, really), which uses a static dictionary formed using real-world data -- the most popular strings traveling through their own servers. So naturally, it'd contain popular website names. I'm somewhat surprised they didn't remove trademarked names from the dictionary, though. This is exactly why I thought Brotli was a little silly when it came out.

Thanks @jessa0 for your responsiveness, what you say is really interesting. I searched the .cargo folder to find one of these occurrences in the library's source code, but I didn't find all the embedded words.

Is it normal to have an HTML structure and JS file names?

@jessa0 That seems correct. The dropbox/rust-brotli-decompressor repository on GitHub (rust-brotli-decompressor/ at 0045b6e18a91b12937ffc45a7a4f934ffeab7d2e) has lots of data, but it's individual bytes in hexadecimal. So I converted 'facebook' to bytes:

$ echo facebook | hexdump -C
00000000  66 61 63 65 62 6f 6f 6b  0a                       |facebook.|

And after a little grep, I found these bytes on line 2633:

$ rg '0x66, 0x61, 0x63, 0x65, 0x62'
2633:   0x61, 0x6d, 0x62, 0x69, 0x65, 0x6e, 0x74, 0x65, 0x66, 0x61, 0x63, 0x65, 0x62, 0x6f, 0x6f, 0x6b,
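
For anyone who wants to repeat this search for other words, here is a minimal sketch of a helper that builds the same kind of pattern. The function name `hex_pattern` is made up for illustration, and the sketch assumes the source file writes its bytes as lowercase `0xNN` literals separated by a comma and a space, as in the line above.

```rust
/// Format the bytes of `word` as they would appear in a byte-array
/// literal such as `0x66, 0x61, ...` (hypothetical helper, not part of
/// any Brotli crate).
fn hex_pattern(word: &str) -> String {
    word.bytes()
        .map(|b| format!("0x{:02x}", b)) // two lowercase hex digits per byte
        .collect::<Vec<_>>()
        .join(", ")
}

fn main() {
    // Reproduces the pattern passed to `rg` above.
    println!("{}", hex_pattern("faceb"));
}
```

A line-based search like this can miss a word whose bytes are split across two lines of the source file, so a shorter prefix (as with `faceb` here) is often a safer pattern than the full word.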

I did an ASCII/UTF-8 search; I didn't think of doing a hexadecimal search. What you did is really smart, thanks @mdHMUpeyf8yluPfXI.

I am really impressed by the level of responsiveness and technical experience of the Rust community. :+1:

But keep in mind that the answer to the question "why is there the string 'facebook' in my executable?" does not in any way help answer the questions "what is my definition of untrusted code?" and "how do I prevent untrusted code from being included in my binary?"

I found a gist of the Brotli dictionary in ASCII, sorted alphabetically, and indeed, it contains a lot of HTML/JavaScript-looking fragments.

If we look at the strings included in the executable, we find blocks of an HTML page structure with JavaScript file names. A website is code too, especially when it is included in a web server that will be exposed to the public.

Here are some file names included in the binary: checked.min.js, jquery.js, time.js, header.min.js, etc.
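
A rough sketch of how such names can be pulled out of a binary, similar in spirit to the `strings` tool. The function name `printable_runs`, the minimum run length of 6, and the `.js` filter are all arbitrary choices for illustration; the path to the executable is taken from the first command-line argument.

```rust
use std::{env, fs};

/// Collect every run of printable ASCII that is at least `min_len`
/// bytes long (hypothetical helper for illustration).
fn printable_runs(data: &[u8], min_len: usize) -> Vec<String> {
    let mut runs = Vec::new();
    let mut current: Vec<u8> = Vec::new();
    for &b in data {
        if b.is_ascii_graphic() || b == b' ' {
            current.push(b);
        } else {
            // A non-printable byte ends the current run.
            if current.len() >= min_len {
                runs.push(String::from_utf8_lossy(&current).into_owned());
            }
            current.clear();
        }
    }
    // Keep a run that reaches the end of the data.
    if current.len() >= min_len {
        runs.push(String::from_utf8_lossy(&current).into_owned());
    }
    runs
}

fn main() -> std::io::Result<()> {
    let path = match env::args().nth(1) {
        Some(p) => p,
        None => {
            eprintln!("usage: pass the path to the executable as the first argument");
            return Ok(());
        }
    };
    let data = fs::read(path)?;
    // Print only the runs that look like they mention a JavaScript file.
    for run in printable_runs(&data, 6) {
        if run.contains(".js") {
            println!("{}", run);
        }
    }
    Ok(())
}
```

Because the dictionary packs words back to back, a single run may contain several names glued together rather than one clean file name per line.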

But as @jessa0 said, this structure is also in the dictionary, so it's justified. It still leaves some doubts when we don't know the source (thanks to you, I know now).

I think users should be warned that the compression module includes this kind of dictionary, because it increases the size of the binary, which matters for those doing embedded development.
(Adding company names to the dictionary is very questionable.)

I am now relieved, thanks again.
