I get an error trying to convert Vec<u8> to string


#1

I have a vector with a length of 32. I want to convert that to a string of 32 characters. I’ve tried several different paths with no success. The most obvious attempt is listed below; it panics.

the code:

println!("converting pub key {:?} length {}", pub_address, pub_address.len());
let pub_address_str = str::from_utf8(&pub_address).unwrap();

the output:

converting pub key [225, 158, 21, 197, 56, 190, 97, 51, 247, 219, 177, 62, 82, 68, 241, 125, 22, 213, 234, 1, 205, 96, 13, 30, 239, 27, 146, 76, 94, 198, 196, 88] length 32
thread 'success_create_payment_with_seed_returns_address' panicked at 'called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 0, error_len: Some(2) }', libcore/result.rs:945:5

On conversion, I expect pub_address_str to look something like this: "E3h79gOkwrgWehQw7MaguDHwyfTcDwls".

I appreciate the help. My apologies if this is answered elsewhere. I can’t find an answer.

Thnx
Matt


#2

Hi Matt,

You are trying to convert a vector of bytes, but the bytes are not a valid UTF-8 encoding. So str::from_utf8 returned an Err with a Utf8Error inside it instead of returning an Ok with a &str.
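
As a minimal sketch of handling that Err instead of unwrapping it (the helper name describe_utf8 is made up for illustration), one could match on the Result and ask the Utf8Error how far the valid prefix extends:

```rust
use std::str;

/// Hypothetical helper: reports whether `bytes` can be read as UTF-8 text.
fn describe_utf8(bytes: &[u8]) -> String {
    match str::from_utf8(bytes) {
        // The bytes are valid UTF-8, so we get a borrowed &str back.
        Ok(text) => format!("valid UTF-8: {}", text),
        // Otherwise the error says how many leading bytes were still valid.
        Err(e) => format!(
            "invalid UTF-8: {} valid byte(s) before the first error",
            e.valid_up_to()
        ),
    }
}

fn main() {
    println!("{}", describe_utf8(b"hello"));
    // The first four bytes of the vector from the question.
    println!("{}", describe_utf8(&[225, 158, 21, 197]));
}
```

This only avoids the panic; it does not make the key bytes into text.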

Looking at your input, the first byte is 225, which corresponds to á in the Latin-1 encoding. If you expect a string of pure ASCII characters (a-z, A-Z, 0-9, …) then all bytes will have values less than 128 when encoded as UTF-8. So that can be a quick guide to see if you’ve read the data you’re dealing with correctly.

I tried decoding the bytes using a few different encodings (UTF-8 with BOM, UTF-16, UTF-32, …), but didn’t manage to find one that worked. Running it through chardet.detect from Python’s chardet library also didn’t yield anything sensible.


#3

225 in binary is 11100001, which in UTF-8 would be the start of a 3-byte sequence: 1110xxxx 10yyyyyy 10zzzzzz for the codepoint xxxxyyyyyyzzzzzz. Your 158 (10011110) could be a continuation byte, but 21 (00010101) is not, so the sequence is invalid.
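
To make the bit patterns concrete, here is a rough sketch that classifies a byte by its high bits only. The ByteKind enum and classify function are invented for this example; a real validator (such as the one behind str::from_utf8) also rejects overlong encodings and out-of-range code points, which this does not attempt.

```rust
/// Role a single byte can play in a UTF-8 stream, judged by its high bits only.
#[derive(Debug, PartialEq)]
enum ByteKind {
    Ascii,        // 0xxxxxxx
    Continuation, // 10xxxxxx
    Lead2,        // 110xxxxx, starts a 2-byte sequence
    Lead3,        // 1110xxxx, starts a 3-byte sequence
    Lead4,        // 11110xxx, starts a 4-byte sequence
    Invalid,      // 11111xxx, never appears in UTF-8
}

/// Hypothetical helper: counts the leading 1 bits to pick the kind.
fn classify(byte: u8) -> ByteKind {
    match byte.leading_ones() {
        0 => ByteKind::Ascii,
        1 => ByteKind::Continuation,
        2 => ByteKind::Lead2,
        3 => ByteKind::Lead3,
        4 => ByteKind::Lead4,
        _ => ByteKind::Invalid,
    }
}

fn main() {
    // The first three bytes from the question.
    for byte in [225u8, 158, 21].iter() {
        println!("{:3} = {:08b} -> {:?}", byte, byte, classify(*byte));
    }
}
```

For the first three bytes this gives a 3-byte lead, a continuation byte, and then an ASCII byte where a second continuation byte was required.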


#4

If the bytes aren’t a proper UTF-8 string, you can fudge it this way:

[225u8, 158, 21, 197, 56, 190, 97, 51, 247, 219, 177, 62, 82, 68, 241, 125, 22, 213, 234, 1, 205, 96, 13, 30, 239, 27, 146, 76, 94, 198, 196, 88]
.iter().map(|&c| c as char).collect::<String>();

This gives the wrong result for non-ASCII characters (it treats the bytes as a Latin-1-ish encoding?), but it works.


#5

Yes, you’re right – that is exactly a Latin-1 decoding, since Unicode embeds Latin-1 as its first 256 code points.
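
A small sketch of that Latin-1 mapping in both directions, with made-up function names. Note that the resulting String has one char per input byte, but bytes of 128 and above take two bytes each once stored as UTF-8, so the string is longer in memory than the original vector.

```rust
/// Decodes bytes as Latin-1: each byte becomes the code point with the same value.
fn latin1_to_string(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

/// Reverses `latin1_to_string`; returns None if any char is above U+00FF.
fn string_to_latin1(text: &str) -> Option<Vec<u8>> {
    text.chars()
        .map(|c| {
            let code = c as u32;
            if code <= 0xFF { Some(code as u8) } else { None }
        })
        .collect()
}

fn main() {
    let bytes = [225u8, 158, 21, 65];
    let text = latin1_to_string(&bytes);
    println!(
        "{} chars, {} bytes when stored as UTF-8",
        text.chars().count(),
        text.len()
    );
    // The mapping is reversible, unlike a lossy UTF-8 decode.
    assert_eq!(string_to_latin1(&text), Some(bytes.to_vec()));
}
```

The result round-trips, but it contains control characters and is rarely what a consumer expecting a printable key wants.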


#6

but the bytes are not a valid UTF-8 encoding.

For simplicity I didn’t include what generated these bytes. I’m using sodiumoxide::crypto::sign to create key pair. I need to return a string to the code that “consumes” my method.

Thnx
Matt


#7

sodiumoxide::crypto::sign gives raw binary bytes, not text at all. You’ll need some kind of transformation to stringify it as needed, like hex or base64.
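
For the hex option, a dependency-free sketch might look like the following. The function names to_hex and from_hex are assumptions for this example (crates such as hex provide the same thing); note that 32 bytes become 64 hex characters, not 32.

```rust
/// Encodes bytes as lowercase hex, two characters per byte.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Decodes a hex string back into bytes.
/// Returns None on odd length or on any non-hex character.
fn from_hex(text: &str) -> Option<Vec<u8>> {
    if text.len() % 2 != 0 || !text.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    // All characters are ASCII here, so slicing at even offsets is safe.
    (0..text.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&text[i..i + 2], 16).ok())
        .collect()
}

fn main() {
    let key = [225u8, 158, 21, 197];
    let encoded = to_hex(&key);
    println!("hex encoded: {}", encoded);
    // Decoding recovers the original bytes exactly.
    assert_eq!(from_hex(&encoded), Some(key.to_vec()));
}
```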


#8

Okay, so you’re really dealing with a vector of completely random bytes. You can encode that in many different ways: one way would be to base-64 encode it:

extern crate base64;

fn main() {
    let bytes: &[u8] = &[
        225, 158, 21, 197, 56, 190, 97, 51, 247, 219, 177, 62, 82, 68, 241, 125,
        22, 213, 234, 1, 205, 96, 13, 30, 239, 27, 146, 76, 94, 198, 196, 88,
    ];

    let base64_encoded = base64::encode(&bytes);
    println!("base64 encoded: {}", base64_encoded);
}

That prints for your vector:

base64 encoded: 4Z4VxTi+YTP327E+UkTxfRbV6gHNYA0e7xuSTF7GxFg=

For fun, I also tried to base85 encode the bytes:

base85 encoded: &HQ!6ike/S{Uh@pqBH567s?WJ=0Q#4)&dA-uDf]=

See the playground.


#9

Thank you @mgeisler and @cuviper. That is the solution.

Matt


#10

If you don’t care about invalid bytes, there’s also a lossy version that replaces invalid byte sequences with the Unicode replacement character (U+FFFD):

https://doc.rust-lang.org/std/string/struct.String.html#method.from_utf8_lossy
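
For reference, a short sketch of what the lossy conversion does (the lossy wrapper is just a convenience for this example). Different invalid inputs can collapse to the same output, so the original bytes cannot be recovered from the result.

```rust
/// Thin wrapper around String::from_utf8_lossy that returns an owned String.
fn lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

fn main() {
    // Valid input passes through unchanged.
    assert_eq!(lossy(b"abc"), "abc");

    // 0xFF and 0xFE can never appear in UTF-8; both become U+FFFD.
    let a = lossy(&[0xFF]);
    let b = lossy(&[0xFE]);
    assert_eq!(a, "\u{FFFD}");
    assert_eq!(a, b); // two different inputs, one identical output

    println!("{:?} and {:?} are indistinguishable", a, b);
}
```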


#11

@gilescope: I’d vehemently recommend against using lossy algorithms to serialise crypto secrets, which mattraffel is apparently doing:

although admittedly, he didn’t specify this in his opening post, so I can see where the confusion comes from (classical XY problem)


#12

@juleskers You’re absolutely right, I didn’t understand the problem. I’m learning Rust and trying to implement the use of crypto, which I’m just starting to get into. Thanks for your understanding. I appreciate everyone’s help.


#13

That’s perfectly fine, we were all new to this once! (And you could say, we all still are, and always will be)

Learning how to ask about problems is also quite a skill, because you need to know what you don’t know. That needs training just as much as debugging, writing documentation, or programming itself. :slight_smile:

General advice with crypto though: stay as far away from it as possible, and reuse whatever the specialised crypto-programmers offer (such as sodiumoxide). The offensive side of crypto is insane in what they can do, so defending is extremely hard. Think we-derived-your-password-from-the-power-fluctuations-in-the-usb-keyboard-insane, and don’t get me started about “Rowhammer”…


#14

In addition to what has been posted that fixes the actual problem, if you just want to avoid the panic call, use unwrap_or().


#15

That is sound general advice if there is a sensible default value you can return instead of the “real” result.
However, that might be exceedingly dangerous to use in this specific context, because it makes it harder to see if you got the “default” signing value back, or a successful “true” signing.
You wouldn’t want to accidentally use the default value, because that makes collision attacks (same crypto result for different input, used to impersonate, etc.) far easier.
The best way of avoiding the default is to not have it in the first place.

Alternative suggestion: if let

// crypto::sign, input and your_format are placeholders, not real APIs
if let Ok(signature) = crypto::sign(input) {
    println!("success, key is {}", your_format(signature));
} else {
    println!("oh noooo");
}

(Typed on my phone, apologies for the probably dozens of syntax errors…)


#16

Very good point. I generally use a default value that indicates specifically that it was the result of a failure, and ideally where the failure occurred. But I can see that in a crypto context that is undesirable.


#17

That sounds pretty much exactly like what the Result::Err type is intended for, so why not use that instead?
With a default value of the “correct” type, you can easily forget if you checked already (we are all only human, and I have done way dumber things already :wink: ), and thus accidentally use the error-value as a normal value.
With Result::Err, the compiler will always remind you :smile:
(This is a highly powerful concept that Rust borrows from the ML/Haskell world, it works perfectly with match expressions)
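
As a rough illustration of an Err that records what failed and where, consider the sketch below. The KeyError enum, the key_to_text function, and the fixed length of 32 are all invented for this example and are not part of sodiumoxide.

```rust
/// Hypothetical error type that says what went wrong and where.
#[derive(Debug, PartialEq)]
enum KeyError {
    WrongLength { expected: usize, actual: usize },
    NotUtf8 { valid_up_to: usize },
}

/// Hypothetical helper: views a 32-byte key as text, or explains why it cannot.
fn key_to_text(bytes: &[u8]) -> Result<&str, KeyError> {
    if bytes.len() != 32 {
        return Err(KeyError::WrongLength { expected: 32, actual: bytes.len() });
    }
    std::str::from_utf8(bytes)
        .map_err(|e| KeyError::NotUtf8 { valid_up_to: e.valid_up_to() })
}

fn main() {
    let mut raw = [b'a'; 32];
    raw[5] = 0xFF; // a byte that is never valid in UTF-8

    // The caller cannot reach the &str without handling the Err case first.
    match key_to_text(&raw) {
        Ok(text) => println!("key text: {}", text),
        Err(problem) => println!("cannot use key: {:?}", problem),
    }
}
```

Because the success and failure cases have different types, there is no placeholder string that could be mistaken for a real key.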

Recommended reading:
https://doc.rust-lang.org/book/first-edition/error-handling.html

Also useful, for the “where it comes from” part:
The highly powerful failure crate, which automatically generates stack traces etc.
It is the current “best practice” error handling library in the Rust world. (But maybe overkill in simple settings).


#18

I feel like I’m derailing the thread, but I use Result::Err commonly as well. I think there are some valid instances when I want to both return the default type AND know that it was an error. Unwrapping an error generates a panic. For instance, if I’m parsing 1000s of XML records, I don’t want the process to stop if one record lacks a specific field value. I do however want to know if the returned value is valid later.

Granted I have mostly been doing relatively simple applications to this point. I’m sure for more complex cases this approach is counter-productive.
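
One way to keep a batch running without unwrapping or substituting defaults is to keep each Result and sort them afterwards. This is only a sketch: the "name=value" record format and both function names are made up here and stand in for the XML parsing described above.

```rust
/// Hypothetical record parser: expects "name=value" with an integer value.
fn parse_record(line: &str) -> Result<(String, i64), String> {
    let (name, value) = line
        .split_once('=')
        .ok_or_else(|| format!("missing '=' in {:?}", line))?;
    let value = value
        .trim()
        .parse::<i64>()
        .map_err(|e| format!("bad value in {:?}: {}", line, e))?;
    Ok((name.trim().to_string(), value))
}

/// Parses every line, separating good records from error descriptions.
/// A bad record never stops the loop and never turns into a fake value.
fn parse_all(lines: &[&str]) -> (Vec<(String, i64)>, Vec<String>) {
    let mut good = Vec::new();
    let mut bad = Vec::new();
    for line in lines {
        match parse_record(line) {
            Ok(record) => good.push(record),
            Err(problem) => bad.push(problem),
        }
    }
    (good, bad)
}

fn main() {
    let (good, bad) = parse_all(&["a=1", "b", "c=x", "d=4"]);
    println!("parsed {} record(s), {} failure(s)", good.len(), bad.len());
    for problem in &bad {
        println!("  {}", problem);
    }
}
```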


#19

You’re right, and I’ve contributed to the drift… let’s stop :slight_smile: (and my apologies to the thread)