Need help to convert example to Rust


#1

Hello! I am writing an article and I need an idiomatic example of Rust code that is as close as possible to this D code:

import std.algorithm, std.stdio, std.string;
// Count words in a file using ranges.
void main()
{
    auto file = File("file.txt"); // Open for reading
    const wordCount = file.byLine()            // Read lines
                          .map!split           // Split into words
                          .map!(a => a.length) // Count words per line
                          .sum();              // Total word count
    writeln(wordCount);
}

Could you help me and provide an example of this code in Rust? With comments if you can, please! Thanks!


#2

I don’t think Rust is old enough for everyone to agree what “idiomatic” is in this case. The major issue is that the D code has absolutely no error handling, whereas Rust doesn’t allow you to ignore it like that. So:

Standard disclaimer: it is not idiomatic to use unwrap in Rust code where you know errors might happen. The only reason I’m doing it here is because the D code also doesn’t perform any error handling that doesn’t boil down to “explode”.

use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() {
    let word_count =
        // Open file, discard errors, wrap in a buffer.
        BufReader::new(File::open("file.txt").unwrap())
        // Build line iterator from buffered file.
        .lines()
        // Discard IO errors caused during reading lines.
        .map(Result::unwrap)
        // Split each line into words and count them.
        .map(|line| line.split_whitespace().count())
        // Sum word counts.
        .fold(0, |a, b| a + b);
    println!("{}", word_count);
}

Also to note: the reason the “split” and “count” steps are a single step here is that separating them into two steps would require an intermediate allocation to “hold” the lines long enough for the split to be counted. It’s easier to just not make it two steps.
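
To make the allocation point concrete, here is a rough sketch of what a two-step "split, then count" version might look like. The function name count_words_two_step is made up for illustration, and the reader is passed in as a parameter (an assumption, so the sketch does not depend on a particular file). The words borrow from the line, which is dropped at the end of the closure, so they have to be copied into an owned Vec first:

```rust
use std::io::BufRead;

// Hypothetical helper, not part of the original post.
fn count_words_two_step<R: BufRead>(reader: R) -> usize {
    reader
        .lines()
        // Discard IO errors, as in the version above.
        .map(Result::unwrap)
        // Step 1: split. The words borrow from `line`, which dies at the end
        // of this closure, so each word is copied into an owned String.
        .map(|line| {
            line.split_whitespace()
                .map(str::to_owned)
                .collect::<Vec<String>>()
        })
        // Step 2: count the words of each line.
        .map(|words| words.len())
        // Sum word counts.
        .fold(0, |a, b| a + b)
}
```

The extra Vec<String> per line is pure overhead here, which is why the single-step version is the one to prefer.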

Here’s one that actually does error handling:

use std::fs::File;
use std::io::{self, BufRead, BufReader};

fn count_words() -> Result<(), io::Error> {
    let line_counts =
        // Open file, wrap in a buffer.
        BufReader::new(try!(File::open("file.txt")))
        // Build line iterator from buffered file.
        .lines()
        // Split each line into words and count them.
        .map(|rl| rl.map(|l| l.split_whitespace().count()));

    // Sum line counts, aborting on error.
    let mut word_count = 0;
    for count in line_counts {
        word_count += try!(count);
    }

    println!("{}", word_count);
    Ok(())
}

fn main() {
    // Discard errors.
    count_words().unwrap();
}

The reason this is so verbose is basically that there’s no way to combine fold and try! short of writing a custom try_fold iterator adaptor or something, which is what I would do if I was writing a lot of code like this.
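
For readers on a newer toolchain: later Rust releases added Iterator::try_fold to the standard library, so the custom adaptor is no longer needed. A minimal sketch, assuming such a toolchain; the function name and the choice to pass the reader in as a parameter are illustrative, not from the code above:

```rust
use std::io::{self, BufRead};

// Hypothetical helper: counts words, stopping at the first IO error.
fn count_words_try_fold<R: BufRead>(reader: R) -> io::Result<usize> {
    reader
        // Build line iterator from the buffered reader.
        .lines()
        // Add each line's word count to the running total.
        // try_fold returns early as soon as the closure yields an Err.
        .try_fold(0usize, |total, line| {
            line.map(|l| total + l.split_whitespace().count())
        })
}
```

It could be called as count_words_try_fold(BufReader::new(File::open("file.txt")?)) from a function that returns an io::Result.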

You can do the fold with error handling inside of it, but this has the downside of continuing to try and process the input even after a failure. In this specific case, it should probably be fine, since I would expect the Lines iterator to stop on an error, but in general (where each element of a sequence could potentially fail independently), it’s a bit of a pain. Here’s a version with error handling in the fold:

use std::fs::File;
use std::io::{self, BufRead, BufReader};

fn count_words() -> Result<(), io::Error> {
    let word_count =
        // Open file, wrap in a buffer.
        BufReader::new(try!(File::open("file.txt")))
        // Build line iterator from buffered file.
        .lines()
        // Split each line into words and count them.
        .map(|rl| rl.map(|l| l.split_whitespace().count()))
        // Sum line counts, propagating errors.
        .fold(Ok(0), |a, b| a.and_then(|a| b.map(|b| a+b)));
    println!("{}", try!(word_count));
    Ok(())
}

fn main() {
    // Discard errors.
    count_words().unwrap();
}
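
On newer Rust releases, where sum() is stable, the "keeps processing after a failure" downside can probably be avoided by summing directly into a Result: the standard library implements Sum for Result, and that implementation stops taking elements at the first Err. A sketch under that assumption, with a made-up function name and the reader passed in as a parameter:

```rust
use std::io::{self, BufRead};

// Hypothetical helper: sums per-line word counts into an io::Result.
fn count_words_sum<R: BufRead>(reader: R) -> io::Result<usize> {
    reader
        // Build line iterator from the buffered reader.
        .lines()
        // Split each line into words and count them, keeping any IO error.
        .map(|rl| rl.map(|l| l.split_whitespace().count()))
        // Sum line counts; the first Err ends the iteration and is returned.
        .sum()
}
```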

Like I said: what counts as “idiomatic”, especially in a tiny example like this, isn’t entirely clear.


#3

As I’m learning Rust, I thought this was a good exercise.

Here is my version, without error handling (just like the D version)

    use std::fs::File;
    use std::io::prelude::*;
    use std::path::Path;
    use std::string::*;

    fn main() 
    {

        let mut text = String::new(); // writable/growable string instance

        File::open(&Path::new("hello.txt")) //open a file
                               .unwrap() //don't check errors and get the File instance from the Result returned
                               .read_to_string(&mut text).ok(); //insert the content of the file in the mutable string

        let word_count = &text[..] //convert the growable String into a fixed str type
                        .split(' ') //split by whitespaces
                        .count(); // return word count

        println!("file has {} words", word_count); //print
    }

I’ll be interested to read your article.


#4

That doesn’t split by whitespace: it splits by spaces. There’s more than one kind of whitespace.
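
A small sketch of the difference, with two illustrative helper names that are not from the posts above:

```rust
/// Splits on the single character ' ' only.
fn count_by_space(text: &str) -> usize {
    text.split(' ').count()
}

/// Splits on any run of Unicode whitespace (spaces, tabs, newlines, ...).
fn count_by_whitespace(text: &str) -> usize {
    text.split_whitespace().count()
}
```

For the input "one two\tthree\nfour  five", count_by_space returns 4 (the tab and newline do not split, and the double space yields an empty piece), while count_by_whitespace returns 5. An empty string gives 1 with split(' ') and 0 with split_whitespace().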


#5

I tried to use split_whitespace on the str but, on 1.2, I wasn’t able to for some reason.


#6

A few simplifications:

The std::string import isn’t needed; String is in the default prelude.

Path isn’t needed either; you can pass a str to open() directly.

“Don’t check errors” isn’t an accurate description of unwrap(): it does check for an error, and it exits the process when one is found.

A version that just doesn’t put anything in text when opening fails could be

File::open("...").map(|mut f| f.read_to_string(&mut text)).ok();

This one you could call “don’t check errors”.

The explicit &text[..] conversion isn’t necessary; you can call str methods on Strings.


#7

[quote=“raizam, post:3, topic:2658”]
convert the growable String into a fixed str type
[/quote]
All str methods work on String directly thanks to deref coercions.


#8

Playpen says it works on stable (1.2).

Also, &text[..] doesn’t do what I suspect you think it does. The & operator binds more loosely than indexing and method calls, so what you’re actually doing is borrowing the result of .count(). The code still works because a reference to a displayable value can itself be displayed, so println! prints the count either way.
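
A short sketch of the three spellings side by side; precedence_demo is a made-up name, and the type annotations are there only to show what each expression produces:

```rust
// Hypothetical helper returning the count computed three ways.
fn precedence_demo(input: &str) -> (usize, usize, usize) {
    let text = String::from(input);

    // Parsed as &(text[..].split(' ').count()): a reference to the count.
    let a: &usize = &text[..].split(' ').count();

    // With explicit parentheses, the slice is borrowed first.
    let b: usize = (&text[..]).split(' ').count();

    // No slicing at all: str methods can be called on a String directly.
    let c: usize = text.split(' ').count();

    (*a, b, c)
}
```

All three give the same number; only the type of the first one differs.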


#9

Thank you @DanielKeep, @birkenfeld for the explanations, I’m learning a lot :)

Here is an updated version, let me know if it’s fine.

    use std::fs::File;
    use std::io::prelude::*;

    fn main()
    {

        let mut text = String::new(); // writable/growable string instance

        File::open("hello.txt") //open a file
                               .unwrap() //exits process if errors while opening, and get the File instance from the Result returned
                               .read_to_string(&mut text).ok(); //insert the content of the file in the mutable string, don't check read errors

        let word_count = text.lines() //split lines
                        .map(|x| x.split_whitespace().count()) //returns a Vec<u32> with a sum of words for each lines
                        .fold(0, |acc, val| acc + val); //aggregates word count from each line
              

        println!("file has {} words", word_count); //print
    }

split_whitespace actually does work on 1.2.
I was using Visual Rust as the IDE, and I don’t know which version of Rust it uses; I had assumed it was 1.2.


#10

Nice! Note that there will be a sum() method on iterators; it’s just still unstable at the moment.

In general the approach using a BufReader and its lines() iterator will be preferable, because it doesn’t need to read the whole file into memory at once.
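
As a rough sketch of the streaming approach, the following reads one line at a time into a single reused buffer, so memory use is bounded by the longest line rather than the file size. The function name is made up, the reader is passed in as a parameter for illustration, and the ? operator assumes a toolchain newer than the one discussed in this thread:

```rust
use std::io::{self, BufRead};

// Hypothetical helper: counts words without holding the whole input in memory.
fn count_words_streaming<R: BufRead>(mut reader: R) -> io::Result<usize> {
    let mut line = String::new(); // reused for every line
    let mut total = 0;

    // read_line appends to the buffer and returns Ok(0) at end of input.
    while reader.read_line(&mut line)? != 0 {
        total += line.split_whitespace().count();
        line.clear(); // keep the allocation, drop the contents
    }

    Ok(total)
}
```

Wrapping a File in BufReader::new(...) gives a suitable reader for this.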


#11

So from my understanding, implementing a deref trait on type A, returning a type B as *value would merge the functions of B to A? (Not sure I’m formulating correctly)

Sounds really amazing


#12

[quote=“raizam, post:9, topic:2658”]
returns a Vec<u32> with a sum of words for each lines
[/quote]
map returns another iterator, not a vector. It’s important to understand that lazy iterators process the data as it comes and allow you to not store what you don’t need.

[quote=“raizam, post:11, topic:2658”]
So from my understanding, implementing a deref trait on type A, returning a type B as *value would merge the functions of B to A?
[/quote]
Yes. It mostly makes sense for smart pointers and wrappers as you can see from the list of implementations.
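
A minimal sketch of the idea, using a made-up Wrapper type around String:

```rust
use std::ops::Deref;

// Hypothetical wrapper type, named for illustration only.
struct Wrapper(String);

impl Deref for Wrapper {
    type Target = String;

    fn deref(&self) -> &String {
        &self.0
    }
}

fn lengths(w: &Wrapper) -> (usize, usize) {
    // len() is found on String (Wrapper -> String);
    // split_whitespace() is found on str (Wrapper -> String -> str).
    (w.len(), w.split_whitespace().count())
}
```

Method lookup follows the Deref chain, which is why the methods of the target appear to be available on the wrapper.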


#13

The first thing that came to my mind is:

impl<T, E> Deref for Result<T, E> {
    fn deref(&self) -> &T {
        self.unwrap()
    }
}

I know it’s a terrible idea, but so tempting =)
Also, I’m disappointed I couldn’t test it, it seems I cannot use impl on types from foreign crates : /
Any reason for this?

edit

error: type parameter T must be used as the type parameter for some local type (e.g. MyStruct<T>); only traits defined in the current crate can be implemented for a type parameter [E0210]


#14

No; the rule is that either the trait or the type you’re implementing it for must be local to your crate.

This is to avoid several problems:

  • local and third-party implementations of a trait for the same type
  • multiple third-party crates implementing the trait for the same type
  • how to get the impl into scope? how to select which one is the right one?
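
Two common ways to stay within the rule, sketched with made-up names (WordCount and Unwrapped are illustrative, not standard library items): define the trait locally, or wrap the foreign type in a local newtype.

```rust
use std::ops::Deref;

// Option 1: a local trait implemented for a foreign type.
trait WordCount {
    fn word_count(&self) -> usize;
}

impl WordCount for str {
    fn word_count(&self) -> usize {
        self.split_whitespace().count()
    }
}

// Option 2: a local newtype around the foreign type, which may then
// implement a foreign trait such as Deref.
struct Unwrapped<T, E>(Result<T, E>);

impl<T, E> Deref for Unwrapped<T, E> {
    type Target = T;

    fn deref(&self) -> &T {
        match &self.0 {
            Ok(value) => value,
            // Still a terrible idea: a plain dereference can now panic.
            Err(_) => panic!("dereferenced an Err value"),
        }
    }
}
```

In both cases either the trait or the type is local to the crate, so no other crate can provide a conflicting implementation.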

#15

I see, it makes sense, thanks!