Is the "format" macro platform-independent?

I ran into a very weird problem. I use format! to build an XML string, like this:

format!(r###"<Output OutputFormat="Text">{}</Output>"###, line)

The value of line is "ABC\0\0\0\0". On my Mac the result looks like <Output OutputFormat="Text">ABC </Output>, but on Windows it looks like <Output OutputFormat="Text">ABC. That's very weird, so I want to know: is the "format" macro platform-independent?


What do you expect to see? What's the output on your Mac? What are you using to display the output XML? The nul byte is a non-printable control character, so "ABC" is a correct (but not necessarily the only correct) way to render line. In any case, it's not about format! but your terminal/editor/viewer program.

Given that you're printing XML, the program should replace nuls with &#0; or &#x0; (and other non-printables similarly), but Rust isn't going to do that for you because it can't know that it's XML (and indeed doesn't even know what XML is). You'll need to do it yourself or get a third-party XML library to do that for you.


It's the same on Linux:

It would really help to know what your expected output is, as was already requested.

Actually, both a null character and &#x0; are not well-formed XML. An XML document may not contain a null character in any representation.

https://www.w3.org/TR/xml11/Overview.html#NT-Char


Hmm, are you sure? A naked nul byte cannot exist in XML, making OP’s code actually ill-formed, but I believe a character reference can. The relevant chapter and verse seems to be 4.1 where the production does accept &#0;.

The key phrase in 4.1 is:

Well-formedness constraint: Legal Character

Characters referred to using character references MUST match the production for Char.

And “the production for Char” omits the value zero.

The grammar in 4.1 simply doesn't attempt to syntactically reject a string of all zero digits. (That would be possible to write, 0*[1-9][0-9]* for the decimal form, but doesn't reflect how a typical real parser would parse the text or report errors. It's simpler to check the numeric value after parsing. Also, Char excludes surrogates and writing a grammar to reject them too would be even more complex, so it makes more sense to define the allowed character set in a single place.)
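To make "check the numeric value after parsing" concrete, here is a minimal sketch. The function names is_xml11_char and parse_char_ref are made up for illustration, and the ranges follow the Char production of XML 1.1 (the spec linked above); XML 1.0 allows even fewer control characters. It is not how any particular XML parser is implemented.

```rust
/// XML 1.1 `Char` production, checked on the numeric value.
/// Zero and the surrogate range are simply not in any of the ranges.
fn is_xml11_char(code: u32) -> bool {
    matches!(code, 0x1..=0xD7FF | 0xE000..=0xFFFD | 0x10000..=0x10FFFF)
}

/// Hypothetical helper: accepts "&#65;" or "&#x41;" style references.
/// The syntax check is lenient about zeros; the value check is not.
fn parse_char_ref(s: &str) -> Option<char> {
    let body = s.strip_prefix("&#")?.strip_suffix(';')?;
    let (digits, radix) = match body.strip_prefix('x') {
        Some(hex) => (hex, 16),
        None => (body, 10),
    };
    // Syntax: one or more digits of the right radix (no sign, no spaces).
    if digits.is_empty() || !digits.chars().all(|c| c.is_digit(radix)) {
        return None;
    }
    // Overflowing values fail here and are rejected as well.
    let code = u32::from_str_radix(digits, radix).ok()?;
    // Well-formedness constraint "Legal Character": value must match Char.
    if is_xml11_char(code) {
        char::from_u32(code)
    } else {
        None
    }
}

fn main() {
    println!("{:?}", parse_char_ref("&#65;"));
    println!("{:?}", parse_char_ref("&#0;"));
}
```

So &#0; passes the digit check but fails the value check, which is the situation described above.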

Sorry, I didn't express the problem clearly. The result looks like <Output OutputFormat="Text">ABC, but when I tried it on other Windows PCs, most of the results looked like <Output OutputFormat="Text">ABC </Output>. So weird.

The differences are in how the text is being displayed by the tools you are using to view it, not in what text is produced by format!(). The cut off </Output> is likely because some tools (those written in C, typically) will treat a null as signaling the end of the string.
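As a rough illustration of that cut-off, the sketch below mimics what a viewer that treats the first nul as the end of the string would show. The helper c_style_view is a made-up name for this example; it only imitates C-style string handling and is not taken from any real tool.

```rust
/// Hypothetical helper: returns the bytes a nul-terminated-string
/// consumer would see, i.e. everything before the first 0 byte.
fn c_style_view(bytes: &[u8]) -> &[u8] {
    let end = bytes.iter().position(|&b| b == 0).unwrap_or(bytes.len());
    &bytes[..end]
}

fn main() {
    let line = "ABC\0\0\0\0";
    let xml = format!(r###"<Output OutputFormat="Text">{}</Output>"###, line);

    // The Rust String holds everything, closing tag included.
    assert!(xml.ends_with("</Output>"));

    // A C-style consumer stops at the first nul.
    let seen = c_style_view(xml.as_bytes());
    println!("{}", String::from_utf8_lossy(seen));
    // prints: <Output OutputFormat="Text">ABC
}
```

The string produced by format! is identical in both cases; only the consumer's view of it differs.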

You should probably not be putting null bytes in a string. Particularly for XML, as previously noted, but in general, they will often cause trouble.
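If the nuls are just padding (for example from a fixed-size buffer, which is a guess about where line comes from), one option is to strip them before formatting. A minimal sketch, with strip_nul_padding being a name invented here:

```rust
/// Hypothetical helper: removes trailing nul characters only.
/// Interior nuls, if any, are left alone and still need handling.
fn strip_nul_padding(s: &str) -> &str {
    s.trim_end_matches('\0')
}

fn main() {
    let line = "ABC\0\0\0\0";
    let xml = format!(
        r###"<Output OutputFormat="Text">{}</Output>"###,
        strip_nul_padding(line)
    );
    println!("{xml}");
    // prints: <Output OutputFormat="Text">ABC</Output>
}
```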


If you want to see the raw bytes of the text, do this:

let string = format!(r###"<Output OutputFormat="Text">{}</Output>"###, line);
let bytes = string.as_bytes();
println!("{bytes:?}");

This will show the same output on all platforms. You can even assert_eq! the bytes variable to the string b"<Output OutputFormat=\"Text\">ABC\0\0\0\0</Output>"[1].


  1. which is just a more readable way of describing the bytes [60, 79, 117, 116, 112, 117, 116, 32, 79, 117, 116, 112, 117, 116, 70, 111, 114, 109, 97, 116, 61, 34, 84, 101, 120, 116, 34, 62, 65, 66, 67, 0, 0, 0, 0, 60, 47, 79, 117, 116, 112, 117, 116, 62]
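For completeness, the assert_eq! comparison can be written out as a small self-contained program. The render function is only a wrapper added for this example; the format string is the one from the question.

```rust
/// Wrapper around the format! call from the question (name is illustrative).
fn render(line: &str) -> String {
    format!(r###"<Output OutputFormat="Text">{}</Output>"###, line)
}

fn main() {
    let string = render("ABC\0\0\0\0");
    let bytes = string.as_bytes();

    // Passes on every platform: the four nul bytes and the closing tag
    // are all present in the formatted string.
    assert_eq!(bytes, b"<Output OutputFormat=\"Text\">ABC\0\0\0\0</Output>");

    println!("{bytes:?}");
}
```

If this assertion passes on both the Mac and the Windows machines, any remaining difference comes from whatever is displaying the text.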


Keep in mind that format! doesn't understand XML at all. The {} placeholders are not smart. format! won't try to use correct syntax for the file format you're trying to generate.

In particular, if you don't escape your data yourself, your program using format! to generate XML will very likely generate invalid XML, and may even be vulnerable to injection attacks similar to XSS.

If you must use strings, remember to escape & as &amp; and < as &lt; everywhere, and " as &quot; in attributes. Ideally you should use some XML library to generate valid XML markup from structured data.
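If you do go the string route, a hand-rolled escaper might look roughly like the sketch below. escape_xml is a made-up name, not a standard library function; it also drops nul characters, since those can't appear in XML at all. Other control characters are not handled here, so treat it as a starting point rather than a replacement for an XML library.

```rust
/// Hypothetical helper: escapes the five XML special characters and
/// drops nul characters. Safe for both text content and attribute values.
fn escape_xml(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            '\0' => {} // not representable in XML, so skip it
            _ => out.push(c),
        }
    }
    out
}

fn main() {
    let line = "A<B & \"C\"\0\0";
    let xml = format!(
        r###"<Output OutputFormat="Text">{}</Output>"###,
        escape_xml(line)
    );
    println!("{xml}");
    // prints: <Output OutputFormat="Text">A&lt;B &amp; &quot;C&quot;</Output>
}
```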
