Format!-like writing to non-utf8 file?

Background: I'm writing a library for writing PDF files. PDF is a partially text-like binary format; it has no fixed encoding, but large parts of a file are readable as ASCII text. However, that text sometimes contains strings in an arbitrary 8-bit encoding.

Question: Is there something like the format! macro where an argument can implement something like Display but write arbitrary binary data instead of UTF-8? Or is there a way to write &[u8] instead of &str to a std::fmt::Formatter?

Or in other words, is there any way to replace this code:

try!(output.write_all(b") Tj\n"));

With anything like this:

try!(write!(output, b"{} Tj\n", encoded_string(text, &encoding)));

(Where encoded_string creates a struct that implements something like the Display trait.)

This may be a bad idea, but can you format! it and then use as_bytes?

I would think that shouldn't cost too much in terms of performance.

Edit: basically, instead of answering your question, I flipped it. Could you use a UTF-8 formatted String and then just write it as bytes to your output?
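A minimal sketch of that suggestion, assuming the text is already a valid &str (the function name write_show_text is made up for illustration, and it uses ? where older code would use try!):

```rust
use std::io::{self, Write};

/// Formats the operator line as a UTF-8 String, then writes its bytes.
/// `write_show_text` is a hypothetical name, not part of any library.
fn write_show_text(out: &mut dyn Write, text: &str) -> io::Result<()> {
    // format! always produces valid UTF-8 ...
    let line = format!("({}) Tj\n", text);
    // ... and as_bytes() exposes exactly those UTF-8 bytes.
    out.write_all(line.as_bytes())
}
```

Note that this only ever emits UTF-8: a character such as U+00E5 is written as the two bytes C3 A5, not as the single byte E5 that an 8-bit encoding like Latin-1 would use.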

The non-UTF-8 byte string is what I need to pass as an argument to the formatter, not (only) the output. The following actually seems to work:

write!(self.output, "({}) Tj\n", unsafe { str::from_utf8_unchecked(&bytes)})

But the documentation for from_utf8_unchecked says "This function is unsafe because it does not check that the bytes passed to it are valid UTF-8. If this constraint is violated, undefined behavior results, as the rest of Rust assumes that &strs are valid UTF-8." So I assume this is not actually a good idea ...
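For comparison, here is a small sketch of the checked conversion, str::from_utf8, which refuses such input instead of invoking undefined behavior. The helper name checked_view is an assumption for illustration only:

```rust
use std::str::{self, Utf8Error};

/// Views `bytes` as a &str only if they really are valid UTF-8.
/// `checked_view` is a hypothetical helper wrapping std's str::from_utf8.
fn checked_view(bytes: &[u8]) -> Result<&str, Utf8Error> {
    str::from_utf8(bytes)
}
```

With Latin-1 input such as b"Bl\xE5b", this returns an error whose valid_up_to() is 2, because the byte E5 followed by an ASCII letter is not a valid UTF-8 sequence. So the safe API cannot be used to pass arbitrary 8-bit text through write!.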

ah, you have non-utf8 INPUTS...

Hmm... well, I can't help you there, BUT I would be surprised if "undefined behavior" actually hurt you. The way I would implement the formatter for writing strings would be to just write all its bytes, which is what you expect. I think "undefined behavior" would only happen if you tried to READ the string as UTF-8.

I would glance at the source and write a couple of tests to validate that things work as expected. If everything works as expected, I would open an issue against rust to get this behavior put in the documentation and tested with unit tests. I can't see why it should not be allowed.

actually, it kind of is documented right here: "as the rest of Rust assumes that &strs are valid UTF-8"

Since you are not going to read self.output as an &str I would think you should be fine.

Again, I would write a couple of unit tests and make sure everything works as expected, and maybe even open an issue. This is certainly an interesting use case.


No, unfortunately that's generally not OK. In this case, the formatting internals are Unicode aware and are free to assume that you're writing UTF-8 (parts of the formatting code will, e.g., parse the string as Unicode to count chars). It may be OK in this case but that could change at any time.
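One place where this Unicode awareness is visible from safe code is width padding, which counts chars rather than bytes. A small sketch, with pad_right_aligned as a made-up helper name:

```rust
/// Right-aligns `s` in a field of `width` columns using std formatting.
/// `pad_right_aligned` is a hypothetical helper for illustration.
fn pad_right_aligned(s: &str, width: usize) -> String {
    // The formatter decides how much padding to add by counting chars in `s`.
    format!("{:>width$}", s, width = width)
}
```

Padding "\u{e5}" to width 3 gives a string of 3 chars but 4 bytes, so the formatter must have decoded the argument as UTF-8 to count it. That decoding step is exactly where invalid bytes would break its assumptions.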


I think you may just be stuck doing this manually for now (or write some form of macro to make it nicer). Sorry.
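As a rough sketch of what "doing this manually" could look like: a byte-oriented stand-in for Display that writes straight to an io::Write. The names WriteBytes, PdfString and show_text are all made up for this example, and the backslash escaping of parentheses is my assumption about what a PDF literal string needs, so check it against the PDF spec before relying on it.

```rust
use std::io::{self, Write};

/// Hypothetical byte-oriented counterpart to Display: the value writes
/// raw bytes to an io::Write instead of text to a fmt::Formatter.
trait WriteBytes {
    fn write_bytes(&self, out: &mut dyn Write) -> io::Result<()>;
}

/// Hypothetical wrapper for text already encoded in some 8-bit encoding.
struct PdfString<'a>(&'a [u8]);

impl<'a> WriteBytes for PdfString<'a> {
    fn write_bytes(&self, out: &mut dyn Write) -> io::Result<()> {
        out.write_all(b"(")?;
        for &b in self.0 {
            match b {
                // Assumption: escape delimiters and backslash with a backslash.
                b'(' | b')' | b'\\' => out.write_all(&[b'\\', b])?,
                // Every other byte goes out unchanged, valid UTF-8 or not.
                _ => out.write_all(&[b])?,
            }
        }
        out.write_all(b")")
    }
}

/// Writes "(<text>) Tj\n" without ever treating `text` as a &str.
fn show_text(out: &mut dyn Write, text: &[u8]) -> io::Result<()> {
    PdfString(text).write_bytes(out)?;
    out.write_all(b" Tj\n")
}
```

Calling show_text(&mut output, &bytes) then replaces the unsafe write! call, and no &str is ever created from non-UTF-8 data. A small macro_rules! wrapper could chain several such parts if the call sites get repetitive.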
