Iterating over &[u8] produces unexpected output

I wrote an application that iterates over a reference to a [u8] array.
It is as simple as this:

#![allow(unused)]
use std::char;

fn main() {
    let vsparkle_heart = &[240, 159, 119, 150];
    let mut ic: usize = 0;
    let mut sasciiheart = String::with_capacity(vsparkle_heart.len());

    for uc in vsparkle_heart {
        print!(
            "; {} - {}:'{}', {:?}",
            ic,
            uc,
            char::from(*uc),
            (*uc >= 32 as u8 && *uc < 127 as u8) || (*uc == 10 as u8)
        );

        if (*uc >= 32 as u8 && *uc < 127 as u8) || (*uc == 10 as u8) {
            print!(" - ascii");

            //Add the valid ASCII Character
            sasciiheart.push(char::from(*uc));
        } else {
            print!(" - non-ascii");
        }

        ic += 1;
    }

    print!("; chr cnt: '{}'", ic);
    println!("\nascii: '{}'", &sasciiheart);
}

This works nicely on the Playground:

Standard Error

   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 1.80s
     Running `target/debug/playground`

Standard Output

; 0 - 240:'ð', false - non-ascii; 1 - 159:'Ÿ', false - non-ascii; 2 - 119:'w', true - ascii; 3 - 150:'–', false - non-ascii; chr cnt: '4'
ascii: 'w'

But when I run it on my local system, part of the output goes missing:
ASCII Heart Example

$ cargo run --example ascii-heart
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/examples/ascii-heart`
; 0 - 240:'ð', false - non-ascii; 1 - 159:'', false - non-ascii; chr cnt: '4'
ascii: 'w'

So it seems the loop ran to completion (the count is 4 and 'w' was collected), but the output stops after byte 159: the entries for 119 and 150 never show up.
Writing the output to a report String first shows the same behaviour.

This is a serious roadblock when troubleshooting complex data.

Can someone offer some insight into this?

This has to do with how the terminal handles non-printable control characters. The playground ignores them, but your terminal interprets them. If you redirect the output to a file, you'll see that the characters really are there.
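To see which of the bytes end up as control characters, one can check them without involving the terminal at all. This is only a minimal sketch: `classify` is a hypothetical helper name, not something from the original program. It relies on `char::from(u8)` mapping each byte to the code point with the same value (U+0000 to U+00FF) and on `char::is_control`.

```rust
/// For each byte, return the byte, the code point produced by
/// `char::from`, and whether that char is a control character.
fn classify(bytes: &[u8]) -> Vec<(u8, u32, bool)> {
    bytes
        .iter()
        .map(|&b| {
            // char::from(u8) maps 0..=255 onto U+0000..=U+00FF
            let c = char::from(b);
            (b, c as u32, c.is_control())
        })
        .collect()
}

fn main() {
    for (byte, code_point, is_control) in classify(&[240, 159, 119, 150]) {
        println!("{:>3} -> U+{:04X}, control: {}", byte, code_point, is_control);
    }
}
```

With the bytes from the question, 159 and 150 are reported as control characters, while 240 and 119 are not.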

Thank you.

Based on your idea with the file, I thought about how I could inspect the content of the report String.
And println!("report: '{:?}'", &sreport); already did the trick:

#![allow(unused)]
use std::char;

fn main() {
    let vsparkle_heart = &[240, 159, 151, 119, 150];
    let mut ic: usize = 0;
    let mut sasciiheart = String::with_capacity(vsparkle_heart.len());
    let mut sreport = String::new();

    for uc in vsparkle_heart {
        sreport.push_str(&format!(
            "; {} - {}:'{}', {:?}",
            ic,
            uc,
            char::from(*uc),
            (*uc >= 32 as u8 && *uc < 127 as u8) || (*uc == 10 as u8)
        ));

        if (*uc >= 32 as u8 && *uc < 127 as u8) || (*uc == 10 as u8) {
            sreport.push_str(" - ascii");

            //Add the valid ASCII Character
            sasciiheart.push(char::from(*uc));
        } else {
            sreport.push_str(" - non-ascii");
        }

        ic += 1;
    }

    sreport.push_str(&format!("; chr cnt: '{}'", ic));

    println!("report: '{:?}'", &sreport);

    println!("report: '{}'", &sreport);

    let vsttrpt: Vec<char> = String::from_utf8_lossy(sreport.as_bytes()).to_mut().chars().collect();

    println!("stt rpt chrs (count : '{}'):\n{:?}", vsttrpt.len(), vsttrpt);

    println!("stt chrs ascii:");

    for c in &vsttrpt {
        if !c.is_ascii() {
            print!("{}|", c.escape_unicode().to_string());
        } else {
            print!("{}|", c);
        }
    } //for c in &vsttrpt

    println!();

    println!("ascii: '{}'", &sasciiheart);
}

which then produces:

$ cargo run --example ascii-heart
warning: unused manifest key: build
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/examples/ascii-heart`
report: '"; 0 - 240:\'ð\', false - non-ascii; 1 - 159:\'\u{9f}\', false - non-ascii; 2 - 151:\'\u{97}\', false - non-ascii; 3 - 119:\'w\', true - ascii; 4 - 150:\'\u{96}\', false - non-ascii; chr cnt: \'5\'"'
report: '; 0 - 240:'ð', false - non-ascii; 1 - 159:'', false - non-ascii; 3 - 119:'w', true - ascii; 4 - 150:'', false - non-ascii; chr cnt: '5''
stt rpt chrs (count : '169'):
[';', ' ', '0', ' ', '-', ' ', '2', '4', '0', ':', '\'', 'ð', '\'', ',', ' ', 'f', 'a', 'l', 's', 'e', ' ', '-', ' ', 'n', 'o', 'n', '-', 'a', 's', 'c', 'i', 'i', ';', ' ', '1', ' ', '-', ' ', '1', '5', '9', ':', '\'', '\u{9f}', '\'', ',', ' ', 'f', 'a', 'l', 's', 'e', ' ', '-', ' ', 'n', 'o', 'n', '-', 'a', 's', 'c', 'i', 'i', ';', ' ', '2', ' ', '-', ' ', '1', '5', '1', ':', '\'', '\u{97}', '\'', ',', ' ', 'f', 'a', 'l', 's', 'e', ' ', '-', ' ', 'n', 'o', 'n', '-', 'a', 's', 'c', 'i', 'i', ';', ' ', '3', ' ', '-', ' ', '1', '1', '9', ':', '\'', 'w', '\'', ',', ' ', 't', 'r', 'u', 'e', ' ', '-', ' ', 'a', 's', 'c', 'i', 'i', ';', ' ', '4', ' ', '-', ' ', '1', '5', '0', ':', '\'', '\u{96}', '\'', ',', ' ', 'f', 'a', 'l', 's', 'e', ' ', '-', ' ', 'n', 'o', 'n', '-', 'a', 's', 'c', 'i', 'i', ';', ' ', 'c', 'h', 'r', ' ', 'c', 'n', 't', ':', ' ', '\'', '5', '\'']
stt chrs ascii:
;| |0| |-| |2|4|0|:|'|\u{f0}|'|,| |f|a|l|s|e| |-| |n|o|n|-|a|s|c|i|i|;| |1| |-| |1|5|9|:|'|\u{9f}|'|,| |f|a|l|s|e| |-| |n|o|n|-|a|s|c|i|i|;| |2| |-| |1|5|1|:|'|\u{97}|'|,| |f|a|l|s|e| |-| |n|o|n|-|a|s|c|i|i|;| |3| |-| |1|1|9|:|'|w|'|,| |t|r|u|e| |-| |a|s|c|i|i|;| |4| |-| |1|5|0|:|'|\u{96}|'|,| |f|a|l|s|e| |-| |n|o|n|-|a|s|c|i|i|;| |c|h|r| |c|n|t|:| |'|5|'|
ascii: 'w'

Now I think the operation char::from(*uc) on the byte 159 has already produced the non-printable control character \u{9f}, which then breaks any attempt to write it to the terminal.
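One possible way around this, sketched here under the assumption that an escaped form is acceptable in the report, is to escape control characters before they reach the terminal. `display_safe` is a hypothetical helper name, not part of the program above.

```rust
/// Render a byte (interpreted via `char::from`) so that it is safe to
/// print: control characters are replaced by their `\u{..}` escape.
fn display_safe(byte: u8) -> String {
    let c = char::from(byte);
    if c.is_control() {
        // e.g. U+009F becomes the six visible characters \u{9f}
        c.escape_unicode().to_string()
    } else {
        c.to_string()
    }
}

fn main() {
    let bytes: &[u8] = &[240, 159, 151, 119, 150];
    for (i, &b) in bytes.iter().enumerate() {
        print!("; {} - {}:'{}'", i, b, display_safe(b));
    }
    println!();
}
```

Note that `is_control` also covers ASCII controls such as newline, so a byte of 10 would be shown as \u{a} by this sketch.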

Yes, \u{9f} is the APC (Application Program Command) control code, which takes a string as an argument. The terminal is consuming some of your output as that string. (It's supposed to be terminated by ST, \u{9c}, but \u{96} is doing so in this case.) You can probably reproduce the effect with this:

# \u9f is \xc2\x9f in utf8, etc
echo "something" $'\xc2\x9f' "followed by more things" $'\xc2\x96' "and more things"
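The UTF-8 byte sequences used in the echo command can be double-checked from Rust. A small sketch, where `utf8_bytes` is a hypothetical helper name:

```rust
/// Return the UTF-8 encoding of a single char as a byte vector.
fn utf8_bytes(c: char) -> Vec<u8> {
    let mut buf = [0u8; 4]; // a char needs at most 4 bytes in UTF-8
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

fn main() {
    // APC, the character that ended the string here, and ST
    for c in ['\u{9f}', '\u{96}', '\u{9c}'] {
        println!("{} -> {:02x?}", c.escape_unicode(), utf8_bytes(c));
    }
}
```

Each of these code points encodes to two bytes, the first of which is 0xc2, matching the comment in the shell snippet.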

Incidentally, as an alternative to using a file, you can also run:

cargo run --example ascii-heart | cat -A

This escapes tabs and non-printable characters, and adds $ at the end of each line.

Thank you.
The cat -A trick is a good hint.
