Modern Rust style


#1

After not using Rust for about a year, I’m back. The language has changed a bit. I’d like style advice on how to write this better. This is a word wrap test that understands graphemes.

use std::io;
use std::str;
extern crate unicode_segmentation;
use unicode_segmentation::UnicodeSegmentation;

fn main() {
    println!("Word wrap test.  Type some text.");
    let input = match readinputlines() {
        Ok(s) => s,
        Err(_) => String::from("ERROR"),
    };

    println!("You typed: {}\n", input);
    let ss = wordwrap(&input, 30, 15);
    println!("Word wrapped: \n{}", ss);
}

//
//  readinputlines -- only reads one line for now
//
fn readinputlines() -> io::Result<String> {
    // read some lines from input
    let mut input = String::new();
    io::stdin().read_line(&mut input)?;
    Ok(input)
}


//
//  rfind for an array of graphemes
//
fn rfind(s: &[&str], key: &str) -> Option<usize> {
    (0 .. s.len()).rev().find(|&i| s[i] == key)     // search from right
}

//
//  wordwrap -- understands graphemes
//
//  Wrapping assumes all graphemes have the same visual width.
//
//  One grapheme is an &str.
//  A collection of graphemes is a Vec::<&str>
//
fn wordwrap(s: &str, maxline: usize, maxword: usize) -> String {
    assert!(maxword < maxline);                     // sanity check on params
    let mut outlines = Vec::<String>::new();        // accum output lines here
    for bline in s.lines() {                        // for all input lines
        let line = UnicodeSegmentation::graphemes(bline, true).collect::<Vec<&str>>(); // line as vec of graphemes
        let mut wline = &line[..];                  // mut ref to array of graphemes as slice                    
        let mut sline = Vec::<&str>::new();         // working vector of graphemes, which are string slices
        while wline.len() > maxline {               // while line too long
            let ix = rfind(&wline[maxline-maxword .. maxline]," ");              // find rightmost space
            match ix {                              // usable word break point?
            None =>     { sline.extend(&wline[0 .. maxline]); // no, really long word which must be broken at margin
                          sline.push("\n");
                          wline = &wline[maxline ..]; }
            Some(ix) => { sline.extend(&wline[0 .. ix + maxline - maxword]); // yes, break at space
                          sline.push("\n"); 
                          wline = &wline[ix+maxline-maxword+1 ..]; }
            }
        }
        sline.extend(&wline[0..]);                  // accum remainder of line
        outlines.push(sline.join(""));              // combine graphemes into string
    }
    outlines.join("\n")                             // join output lines
}

Questions:

  1. Is “&line[…]” the right idiom for getting a slice? I understand “.as_slice()” is deprecated.
  2. Is there avoidable allocation here? “.extend()” implies cloning. But it’s just cloning a slice
    descriptor which then goes into an array. Does that actually do an allocation?

Thanks.


#2
  1. Yes 🙂
  2. Extend doesn’t imply any intermediate allocation. Extend takes an IntoIterator type and iterates over it, moving the items yielded by the iterator into the receiving type. In this case, that means pushing them into a vector; this may cause the vector’s backing buffer to reallocate. In general, extend should not perform intermediate allocations.

I guess though you could be accumulating these into a String instead of creating a vector of &strs and then joining them together.
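
As a rough sketch of that idea, the per-line loop could push graphemes straight into a String instead of collecting a Vec of &strs and joining it. The function name wordwrapline_string is made up for illustration, and the sketch assumes the caller has already split the line into graphemes (for example with unicode_segmentation), so it only needs std:

```rust
/// Wrap one line, given as a slice of graphemes, writing directly into a String.
/// Assumption: `line` comes from a grapheme splitter; any slice of &str pieces works.
fn wordwrapline_string(line: &[&str], maxline: usize, maxword: usize) -> String {
    assert!(maxword < maxline); // sanity check on params
    let mut out = String::new(); // output accumulates here, no intermediate Vec
    let mut wline = line; // remainder of the line still to be wrapped
    while wline.len() > maxline {
        let start = maxline - maxword; // left edge of the search window
        // rightmost space inside the window [start, maxline)
        let brk = wline[start..maxline].iter().rposition(|&g| g == " ");
        let (head, tail) = match brk {
            // break at the space and drop the space itself
            Some(ix) => (&wline[..start + ix], &wline[start + ix + 1..]),
            // no usable space: break the long word at the margin
            None => (&wline[..maxline], &wline[maxline..]),
        };
        out.extend(head.iter().copied()); // String implements Extend<&str>
        out.push('\n');
        wline = tail;
    }
    out.extend(wline.iter().copied()); // remainder of the line
    out
}
```

The String may still reallocate as it grows, but the separate Vec of slices and the final join go away.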


#3
  • You can either use let ref mut or &line[..].
  • Doc comments on functions use three forward slashes (///) instead of two, and rustdoc renders their contents as Markdown.
  • Instead of the assert! macro, you can use debug_assert!, which is only checked in debug builds, if you don’t want the check to run in release builds.
  • Instead of panicking in a release build, you can return an error and let the caller handle it.
  • There is also a method where you can completely eliminate your Vec heap allocations if you design your algorithm using an Iterator approach.
  • You can create a handle to stdin and pass that to the read_input_lines() function so that you don’t have to continually create handles to stdin every time you read from it.
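
Two of the points above, returning an error instead of panicking and passing in a reader handle, might look roughly like this. The names BadWrapParams, check_params and read_input_line are hypothetical and not from the original code; only the std::io calls are real API:

```rust
use std::io::{self, BufRead};

/// Hypothetical error type describing invalid wrapping parameters.
#[derive(Debug, PartialEq)]
struct BadWrapParams {
    maxline: usize,
    maxword: usize,
}

/// Returns an error instead of panicking when maxword is not below maxline.
fn check_params(maxline: usize, maxword: usize) -> Result<(), BadWrapParams> {
    if maxword < maxline {
        Ok(())
    } else {
        Err(BadWrapParams { maxline, maxword })
    }
}

/// Reads one line from any buffered reader, such as a locked stdin handle.
/// The returned string keeps its trailing newline, as read_line does.
fn read_input_line<R: BufRead>(reader: &mut R) -> io::Result<String> {
    let mut input = String::new();
    reader.read_line(&mut input)?;
    Ok(input)
}
```

In main you would create the handle once, for example with let stdin = io::stdin(); let mut handle = stdin.lock(); and then pass &mut handle to every read_input_line call. Taking any BufRead also makes the function easy to test with an in-memory reader.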

#4

OK, round 2. This has a more functional style. I considered doing “wordwrapline” recursively to get an entirely functional program, but since Rust doesn’t guarantee tail call optimization, that’s probably a bad idea.

///
///  rfind for an array of graphemes
///
fn rfind(s: &[&str], key: &str) -> Option<usize> {
    (0 .. s.len()).rev().find(|&i| s[i] == key)     // search from right
}

///
///  wordwrapline --  wordwrap for one line
///
fn wordwrapline(line: &[&str], maxline: usize, maxword: usize) -> String {
    let mut wline = &line[..];                  // mut ref to array of graphemes as slice                    
    let mut sline = Vec::<&str>::new();         // working vector of graphemes, which are string slices
    while wline.len() > maxline {               // while line too long
        let ix = rfind(&wline[maxline-maxword .. maxline]," ");              // find rightmost space
        wline = match ix {                      // usable word break point?
        None =>     { sline.extend(&wline[0 .. maxline]); // no, really long word which must be broken at margin
                      sline.push("\n");
                      &wline[maxline ..] }      // return shorter wline
        Some(ix) => { sline.extend(&wline[.. ix + maxline - maxword]); // yes, break at space
                      sline.push("\n"); 
                      &wline[ix+maxline-maxword+1 ..] } // return shorter wline
        }
    }
    sline.extend(&wline[..]);                   // accum remainder of line
    sline.join("")                              // return string
}

///
///  wordwrap -- understands graphemes
///
fn wordwrap(s: &str, maxline: usize, maxword: usize) -> String {
    debug_assert!(maxword < maxline);           // sanity check on params
    s.lines()
        .map(|bline| UnicodeSegmentation::graphemes(bline, true)    // yields vec of graphemes (&str)
            .collect::<Vec<&str>>())
        .map(|line| wordwrapline(&line, maxline, maxword))
        .collect::<Vec<String>>()        
        .join("\n")
}

#5

To create an iterator, define a struct that holds your three variables (s, maxline, maxword) plus a field that tracks the iterator’s current state, such as a counter marking where to resume or the remaining string slice. Then implement the Iterator trait on that struct like so:

impl<'a> Iterator for WordWrapper<'a> {
    type Item = &'a str;

    // 'a comes from the impl header, so next() does not declare it again
    fn next(&mut self) -> Option<&'a str> {
        // your wrapping logic goes here
    }
}

Fill in the blank above with your actual code, which should update its internal state and return either Some(value) when a value is found or None when finished. You can then call your iterator like any other iterator:

for wrapped_line in WordWrapper::new(&input) {
    output.push_str(wrapped_line);
    output.push('\n');
}
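
For illustration only, here is one way such a WordWrapper might be filled in. It is a sketch under loud assumptions: the input is a single line, every char counts as one column (a grapheme-aware version would take its boundaries from a grapheme splitter rather than char_indices), and new takes maxline and maxword explicitly, unlike the one-argument call above. The field and helper names are invented:

```rust
/// Hypothetical WordWrapper: yields wrapped lines as slices of the input,
/// so iterating allocates no Vec or String.
struct WordWrapper<'a> {
    rest: &'a str,  // text not yet returned
    maxline: usize, // maximum columns per line
    maxword: usize, // how far back from the margin to look for a space
}

impl<'a> WordWrapper<'a> {
    fn new(s: &'a str, maxline: usize, maxword: usize) -> Self {
        assert!(maxword < maxline);
        WordWrapper { rest: s, maxline, maxword }
    }

    /// Byte offset of column `col`, or None if the remaining text is too short.
    fn byte_at(&self, col: usize) -> Option<usize> {
        self.rest.char_indices().nth(col).map(|(byte, _)| byte)
    }
}

impl<'a> Iterator for WordWrapper<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        if self.rest.is_empty() {
            return None;
        }
        let rest = self.rest;
        // If there is no column `maxline`, the remainder fits on one line.
        let margin = match self.byte_at(self.maxline) {
            Some(byte) => byte,
            None => {
                self.rest = "";
                return Some(rest);
            }
        };
        let start = self.byte_at(self.maxline - self.maxword).unwrap_or(0);
        // Rightmost space in the window just before the margin.
        match rest[start..margin].rfind(' ') {
            Some(off) => {
                self.rest = &rest[start + off + 1..]; // skip the space itself
                Some(&rest[..start + off])
            }
            None => {
                self.rest = &rest[margin..]; // long word: break at the margin
                Some(&rest[..margin])
            }
        }
    }
}
```

Because each item borrows from the original string, the caller decides whether to allocate at all, for example by pushing each line into an output String as in the loop above.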

#6

You can use Iterator::rposition.

wline[a .. b].iter().rposition(|&s| s == " ")
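
A minimal sketch of the rfind helper from earlier in the thread rewritten on top of Iterator::rposition, keeping the same signature:

```rust
/// Index of the rightmost element equal to `key`, or None if there is none.
fn rfind(s: &[&str], key: &str) -> Option<usize> {
    // rposition searches from the back and returns a front-based index
    s.iter().rposition(|&g| g == key)
}
```

As with the original helper, the returned index is relative to the slice passed in, so when searching a window such as wline[a .. b] you still need to add a to get a position in wline.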