Security advisory for rustc (CVE-2021-42574)

This is a lightly edited cross-post of the official security advisory. The
official advisory contains a signed version with our PGP key, as well.

The Rust Security Response WG was notified of a security concern affecting
source code containing "bidirectional override" Unicode codepoints: in some
cases the use of those codepoints could lead to the reviewed code being
different than the compiled code.

This is an issue with how source code may be rendered in certain contexts, and
its assigned identifier is CVE-2021-42574. While the issue itself is not a flaw
in rustc, we're taking proactive measures to mitigate its impact on Rust developers.

Overview

Unicode has support for both left-to-right and right-to-left languages, and to
aid writing left-to-right words inside a right-to-left sentence (or vice versa)
it also features invisible codepoints called "bidirectional override".

These codepoints are normally used across the Internet to embed a word inside a
sentence of another language (with a different text direction), but it was
reported to us that they could be used to manipulate how source code is
displayed in some editors and code review tools, leading to the reviewed code
being different than the compiled code. This is especially bad if the whole
team relies on bidirectional-aware tooling.

As an example, the following snippet (with {U+NNNN} replaced with the Unicode
codepoint NNNN):

if access_level != "user{U+202E} {U+2066}// Check if admin{U+2069} {U+2066}" {

...would be rendered by bidirectional-aware tools as:

if access_level != "user" { // Check if admin
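To see why the rendered line is misleading, here is a minimal sketch that spells the same literal with Rust's `\u{...}` escapes and checks what the comparison actually does. Escapes are the form the new lints accept. The `HIDDEN` constant and the `enters_admin_branch` helper are hypothetical names chosen for illustration.

```rust
// The literal from the example above, written with escape sequences so the
// bidirectional codepoints are visible in the source:
// U+202E RIGHT-TO-LEFT OVERRIDE, U+2066 LEFT-TO-RIGHT ISOLATE,
// U+2069 POP DIRECTIONAL ISOLATE.
const HIDDEN: &str = "user\u{202E} \u{2066}// Check if admin\u{2069} \u{2066}";

/// Hypothetical helper: true when the guarded (admin-only) branch runs.
fn enters_admin_branch(access_level: &str) -> bool {
    access_level != HIDDEN
}

fn main() {
    // A reviewer reading `if access_level != "user"` expects plain users to
    // be kept out, but the real literal is much longer than "user", so the
    // comparison is true for "user" and the branch runs anyway.
    assert!(enters_admin_branch("user"));
    println!(
        "literal has {} chars, \"user\" has {}",
        HIDDEN.chars().count(),
        "user".chars().count()
    );
}
```

The "// Check if admin" text that looks like a trailing comment is really part of the string being compared.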

Affected Versions

Rust 1.56.1 introduces two new lints to detect and reject code containing the
affected codepoints. Rust 1.0.0 through Rust 1.56.0 do not include such lints,
leaving your source code vulnerable to this attack if you do not perform
out-of-band checks for the presence of those codepoints.

To assess the security of the ecosystem we analyzed all crate versions ever
published on crates.io (as of 2021-10-17), and only 5 crates have the affected
codepoints in their source code, with none of the occurrences being malicious.

Mitigations

We will be releasing Rust 1.56.1 today, 2021-11-01, with two new
deny-by-default lints detecting the affected codepoints, respectively in string
literals and in comments. The lints will prevent source code files containing
those codepoints from being compiled, protecting you from the attack.

If your code has legitimate uses for the codepoints, we recommend replacing them
with the corresponding escape sequences. The error messages will suggest the
right escapes to use.

If you can't upgrade your compiler version, or your codebase also includes
non-Rust source code files, we recommend periodically checking that the
following codepoints are not present in your repository and your dependencies:
U+202A, U+202B, U+202C, U+202D, U+202E, U+2066, U+2067, U+2068, U+2069.
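Any tool that can search for these codepoints will do for the periodic check. As one possibility, here is a minimal std-only Rust sketch that walks the paths given on the command line and reports each occurrence with its line and column. The names `BIDI_CODEPOINTS`, `find_bidi` and `scan_path` are hypothetical, and the sketch assumes source files are UTF-8 (anything else is skipped).

```rust
use std::env;
use std::fs;
use std::io;
use std::path::Path;

/// The nine codepoints listed in the advisory.
const BIDI_CODEPOINTS: [char; 9] = [
    '\u{202A}', '\u{202B}', '\u{202C}', '\u{202D}', '\u{202E}',
    '\u{2066}', '\u{2067}', '\u{2068}', '\u{2069}',
];

/// Returns (1-based line, 1-based column in chars, codepoint) for every
/// bidirectional control found in `text`.
fn find_bidi(text: &str) -> Vec<(usize, usize, char)> {
    let mut hits = Vec::new();
    for (line_no, line) in text.lines().enumerate() {
        for (col, ch) in line.chars().enumerate() {
            if BIDI_CODEPOINTS.contains(&ch) {
                hits.push((line_no + 1, col + 1, ch));
            }
        }
    }
    hits
}

/// Recursively scans `path`, printing each hit; returns how many were found.
fn scan_path(path: &Path) -> io::Result<usize> {
    // symlink_metadata does not follow symlinks, so links are skipped and
    // cannot cause infinite recursion.
    let meta = fs::symlink_metadata(path)?;
    let mut count = 0;
    if meta.is_dir() {
        for entry in fs::read_dir(path)? {
            count += scan_path(&entry?.path())?;
        }
    } else if meta.is_file() {
        // Files that are not valid UTF-8 (e.g. binaries) fail to read as a
        // String and are skipped; a stricter tool would report them.
        if let Ok(text) = fs::read_to_string(path) {
            for (line, col, ch) in find_bidi(&text) {
                println!("{}:{}:{}: U+{:04X}", path.display(), line, col, ch as u32);
                count += 1;
            }
        }
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    let mut total = 0;
    for arg in env::args().skip(1) {
        total += scan_path(Path::new(&arg))?;
    }
    // A non-zero exit status makes the check usable in CI.
    if total > 0 {
        std::process::exit(1);
    }
    Ok(())
}
```

To cover dependencies as well, one option is to run `cargo vendor` first and pass both the repository root and the generated `vendor` directory to the scanner.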

Timeline of events

  • 2021-07-25: we received the report and started working on a fix.
  • 2021-09-14: the date for the embargo lift (2021-11-01) was communicated to us.
  • 2021-10-17: performed an analysis of all the source code ever published to
    crates.io to check for the presence of this attack.
  • 2021-11-01: embargo lifts, the vulnerability is disclosed and Rust 1.56.1 is
    released.

Acknowledgments

Thanks to Nicholas Boucher and Ross Anderson from the University of
Cambridge for disclosing this to us according to our security policy!

We also want to thank the members of the Rust project who contributed to the
mitigations for this issue. Thanks to Esteban Küber for developing the lints,
Pietro Albini for leading the security response, and many others for their
involvement, insights and feedback: Josh Stone, Josh Triplett, Manish
Goregaokar, Mara Bos, Mark Rousskov, Niko Matsakis, and Steve Klabnik.

Appendix: Homoglyph attacks

As part of their research, Nicholas Boucher and Ross Anderson also uncovered a
similar security issue identified as CVE-2021-42694 involving homoglyphs inside
identifiers. Rust already includes mitigations for that attack since Rust
1.53.0. Rust 1.0.0 through Rust 1.52.1 are not affected due to the lack of
support for non-ASCII identifiers in those releases.


Ooo, I could have reported this problem. Years ago I had lots of fun confusing people with backward text in JavaScript, and forgot all about it: secure_express_demo/feh.js at master · ZiCog/secure_express_demo · GitHub

struct Sﺍ {
    ﺍ: i32,
}

struct Si {
    J: u32,
}

fn main() {
    let ﻝ = Sﺍ { ﺍ: 666 };
    let i = Si { J: 42 };

    println!("{}", ﻝ.ﺍ); // WTF?
    println!("{}", i.J);
}

Depending on the fonts used and the characters chosen, it can be very hard to distinguish rogue code from normal code.

Anyway, the new lint does indeed complain:

warning: identifier contains uncommon Unicode codepoints

Great stuff.


This lint/CVE isn't for the confusable identifiers.

This is specifically about text in string literals / comments showing up outside the string / comment that contains them, leading to incorrect human reading of code.

What are the exact codepoints linted against here?

Per my previous Unicode BiDi understanding, LTR/RTL override and embed should be less bad, in that the rendered string literal will still visually contain all of its contents (the containing LTR context surrounds the embedded RTL context). Though I suppose {RTL}resu{LTR} could still be a confusable attack.

If I understand correctly, it's the ISOLATE which leads to part of the string literal showing up after the end of the LTR line.
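The distinction drawn here follows the pairing rules of Unicode's bidirectional algorithm (UAX #9): embeddings and overrides (U+202A, U+202B, U+202D, U+202E) are closed by POP DIRECTIONAL FORMATTING (U+202C), while isolates (U+2066 to U+2068) are closed by POP DIRECTIONAL ISOLATE (U+2069). A control left open at the closing quote keeps affecting the text after it on the same line. The sketch below is a simplified illustration of that pairing, not what rustc does (the 1.56.1 lints reject all nine codepoints outright). It ignores UAX #9's depth limit and paragraph separators, and `BidiKind`, `classify` and `unterminated_controls` are hypothetical names.

```rust
/// Kinds of bidirectional controls, following UAX #9 terminology.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BidiKind {
    /// LRE, RLE, LRO, RLO: closed by PDF.
    EmbedOrOverride,
    /// PDF (U+202C).
    PopFormatting,
    /// LRI, RLI, FSI: closed by PDI.
    Isolate,
    /// PDI (U+2069).
    PopIsolate,
}

/// Maps each of the nine controls to its Unicode name and kind.
fn classify(ch: char) -> Option<(&'static str, BidiKind)> {
    use BidiKind::*;
    Some(match ch {
        '\u{202A}' => ("LEFT-TO-RIGHT EMBEDDING", EmbedOrOverride),
        '\u{202B}' => ("RIGHT-TO-LEFT EMBEDDING", EmbedOrOverride),
        '\u{202C}' => ("POP DIRECTIONAL FORMATTING", PopFormatting),
        '\u{202D}' => ("LEFT-TO-RIGHT OVERRIDE", EmbedOrOverride),
        '\u{202E}' => ("RIGHT-TO-LEFT OVERRIDE", EmbedOrOverride),
        '\u{2066}' => ("LEFT-TO-RIGHT ISOLATE", Isolate),
        '\u{2067}' => ("RIGHT-TO-LEFT ISOLATE", Isolate),
        '\u{2068}' => ("FIRST STRONG ISOLATE", Isolate),
        '\u{2069}' => ("POP DIRECTIONAL ISOLATE", PopIsolate),
        _ => return None,
    })
}

/// Returns how many controls are still open at the end of `s`.
/// PDF closes the innermost embedding/override only if no isolate was
/// opened after it; PDI closes the innermost open isolate together with
/// everything opened inside it. Unmatched pops are ignored.
fn unterminated_controls(s: &str) -> usize {
    let mut stack: Vec<BidiKind> = Vec::new();
    for ch in s.chars() {
        match classify(ch).map(|(_, kind)| kind) {
            Some(BidiKind::EmbedOrOverride) => stack.push(BidiKind::EmbedOrOverride),
            Some(BidiKind::Isolate) => stack.push(BidiKind::Isolate),
            Some(BidiKind::PopFormatting) => {
                if stack.last() == Some(&BidiKind::EmbedOrOverride) {
                    stack.pop();
                }
            }
            Some(BidiKind::PopIsolate) => {
                if let Some(pos) = stack.iter().rposition(|k| *k == BidiKind::Isolate) {
                    stack.truncate(pos);
                }
            }
            None => {}
        }
    }
    stack.len()
}

fn main() {
    // The contents of the string literal from the advisory's example.
    let literal = "user\u{202E} \u{2066}// Check if admin\u{2069} \u{2066}";
    for ch in literal.chars() {
        if let Some((name, _)) = classify(ch) {
            println!("U+{:04X} {}", ch as u32, name);
        }
    }
    println!("still open at the closing quote: {}", unterminated_controls(literal));
}
```

For the advisory's literal, the RIGHT-TO-LEFT OVERRIDE and the final LEFT-TO-RIGHT ISOLATE are never closed, which is how "// Check if admin" ends up displayed outside the string.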

What about ZWJ/ZWNJ? I could easily write == "us{ZWNJ}er" and that will also just render (correctly!) identically to "user" (unless the font has a ligature which would be broken by the ZWNJ).
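The ZWNJ case is easy to check: U+200C ZERO WIDTH NON-JOINER is an ordinary character as far as string comparison is concerned, and it is not among the nine codepoints the advisory lists. A small sketch (the variable name is illustrative) showing that the comparison fails while the string usually renders identically, and that Rust's `{:?}` formatting makes the hidden character visible:

```rust
fn main() {
    // "user" with U+200C ZERO WIDTH NON-JOINER between "us" and "er".
    let spoofed = "us\u{200C}er";

    // The strings differ (5 chars vs 4), so an equality check against
    // "user" fails even though most fonts draw both the same way.
    assert_ne!(spoofed, "user");
    assert_eq!(spoofed.chars().count(), 5);

    // Debug formatting escapes invisible format characters, so the ZWNJ
    // shows up as an escape sequence in logs or test output.
    println!("{:?}", spoofed);
}
```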

ZW(N)J are also much more difficult to justify a lint against, as there are many more uses for them, and some text is impossible to represent without them (not least compound emojis!). And it's impossible to know whether a ZWNJ is justified: one legitimate use is preventing ligatures, and you can't know whether there's actually a ligature to avoid, since there are ~infinite fonts.

I definitely agree with linting against BiDi control characters, as they shouldn't usually be needed for common BiDi text and aren't typically inserted transparently (as far as I understand it, which is little tbf). But I also fear the slippery-slope conclusion of linting against all format/invisible codepoints, since many are genuinely useful but abusable.

the solution is to not have security critical stringly typed code
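To illustrate the "stringly typed" remark with the advisory's access-level example: if the level is an enum rather than a string, the security check compares variants and contains no string literal in which invisible characters could hide. The string comparison does not vanish; it moves into a single parsing function at the input boundary. `AccessLevel` and `is_admin` are hypothetical, written only for this sketch.

```rust
use std::str::FromStr;

/// Hypothetical access levels for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AccessLevel {
    User,
    Admin,
}

impl FromStr for AccessLevel {
    type Err = String;

    /// The only place where raw strings are inspected.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "user" => Ok(AccessLevel::User),
            "admin" => Ok(AccessLevel::Admin),
            other => Err(format!("unknown access level {:?}", other)),
        }
    }
}

fn is_admin(level: AccessLevel) -> bool {
    // Exhaustive match: adding a new level forces this check to be revisited.
    match level {
        AccessLevel::Admin => true,
        AccessLevel::User => false,
    }
}

fn main() {
    let level: AccessLevel = "user".parse().expect("valid access level");
    if is_admin(level) {
        println!("admin area");
    } else {
        println!("access denied");
    }
}
```

Input that does not exactly match a known level (including one with a hidden ZWNJ) is rejected at parse time instead of silently comparing unequal.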


You can see the exact patch here: [master] Fix CVE-2021-42574 by pietroalbini · Pull Request #90462 · rust-lang/rust · GitHub


Hmm... OK. Whatever it is, it seems to be caused by the general confusion and chaos that is Unicode.

Just for context for everyone
