More ergonomy notes


#1

If you use the Rust std library/Prelude for a while, you keep running into small annoyances that I’d really like to avoid. As an example, here is part of a solution to Project Euler problem #107 ( https://projecteuler.net/problem=107 ): a function that just loads the input data. The input file looks like:

-,-,-,427,668,495,377,678,-,177,-,-,870,-,869,624,...
-,-,262,-,-,508,472,799,-,956,578,363,940,143,...

In Python you can load the network like this:

# Code#1
def read_network(path):
    return sorted((int(v[l]), k, l)
                  for k, v in enumerate(line.split(',') for line in file(path))
                  for l in xrange(len(v))
                  if v[l][0] != "-")

This is a possible (working) Rust function that does the same (I don’t load the file lazily because the input file is less than 6000 chars long, so it’s not a problem):

// Code#2
fn read_network(path: &str) -> Vec<(usize, usize, usize)> {
    use std::fs::File;
    use std::io::prelude::Read;

    let mut data = String::new();
    File::open(path).unwrap().read_to_string(&mut data).unwrap();

    let mut result =
        data
        .lines()
        .map(|row| row.split(',').collect::<Vec<_>>())
        .enumerate()
        .flat_map(|(k, v1)| {
            let v2 = v1.clone();
            (0 .. v1.len())
            .filter(move |&l| !v1[l].starts_with("-"))
            .map(move |l| (v2[l].trim().parse().unwrap(), k, l))
        })
        .collect::<Vec<_>>();

    result.sort();
    result
}

Writing Rust code with flat_map() is sometimes a bit painful because of the borrow checker.

This is “fantasy” Rust code, that should do about the same (some small bits could be wrong):

// Code#3
fn read_network(path: &str) -> Vec<(usize, usize, usize)> {
    seq!{(v[l].trim().parse().unwrap(), k, l)
         for (k, v) in Text::new(path)
                       .unwrap()
                       .lines()
                       .map(|r| r.unwrap().split(',').to_vec())
                       .enumerate()
         for l in 0 .. v.len()
         if !v[l].starts_with("-")}
    .sorted()
}

seq!{} would be a standard Prelude macro for creating lazy multi-for comprehensions, similar to lazy Python generator expressions. It avoids a lot of map/filter/flat_map calls and simplifies a lot of code.
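
To give a rough idea of what such a macro could expand to, here is a toy sketch written with macro_rules!. It is not the proposed Prelude macro: the clause syntax (result expression first, clauses separated by semicolons, at most one trailing if) and the helper odd_sum_pairs are assumptions made for illustration only.

```rust
// Toy sketch of a lazy comprehension macro. Not a std macro: the name and
// the `expr; for ..; for ..; if ..` clause syntax are assumptions.
macro_rules! seq {
    // Last clause is a `for`: map each item to the result expression.
    ($e:expr; for $p:pat in $it:expr) => {
        ($it).into_iter().map(move |$p| $e)
    };
    // Last `for` followed by one trailing `if`: filter and map in one step.
    ($e:expr; for $p:pat in $it:expr; if $c:expr) => {
        ($it).into_iter().filter_map(move |$p| if $c { Some($e) } else { None })
    };
    // More clauses follow: flatten the iterator built from the remaining ones.
    ($e:expr; for $p:pat in $it:expr; $($rest:tt)+) => {
        ($it).into_iter().flat_map(move |$p| seq!($e; $($rest)+))
    };
}

/// Pairs (x, y) with y < x < n and x + y odd, produced lazily.
fn odd_sum_pairs(n: u32) -> impl Iterator<Item = (u32, u32)> {
    seq!((x, y); for x in 0..n; for y in 0..x; if (x + y) % 2 == 1)
}
```

Every clause becomes a move closure, so this sketch only works when the captured values are Copy or can be moved; a real macro would need to be smarter about borrowing.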

.to_vec() replaces the very common and far too long .collect::<Vec<_>>().

Text::new(path).unwrap().lines() reads the buffered text file lazily by lines.

You can also add a simple eager load function read_file():
for (k, v) in read_file(path).unwrap().lines().map(|r| r.split(',').to_vec()).enumerate()
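
A minimal sketch of such an eager helper, assuming the name read_file() from above (it is not a std function) and simply wrapping std::fs::read_to_string. The rows() helper is also hypothetical; it shows the split step with a plain collect in place of the wished-for to_vec().

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical helper named after the post's read_file(); not a std function.
// It loads the whole file into memory at once.
fn read_file<P: AsRef<Path>>(path: P) -> io::Result<String> {
    fs::read_to_string(path)
}

// Splits every line of the loaded text into its comma-separated cells.
// The cells borrow from `text`, so the String must outlive the result.
fn rows(text: &str) -> Vec<Vec<&str>> {
    text.lines().map(|r| r.split(',').collect()).collect()
}
```

Because the cells borrow from the loaded String, the caller has to keep that String alive while working with the rows.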

.sorted() is similar to Python’s sorted() function: it collects the items into an array, sorts it, and returns it.
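
A method like this can already be sketched in user code with an extension trait. The trait name Sorted below is an assumption, not something std provides (the itertools crate offers a similar sorted() adapter).

```rust
// Hypothetical extension trait; std has no `sorted()` on iterators.
trait Sorted: Iterator + Sized {
    // Collects all items into a Vec, sorts it, and returns it.
    fn sorted(self) -> Vec<Self::Item>
    where
        Self::Item: Ord,
    {
        let mut items: Vec<Self::Item> = self.collect();
        items.sort();
        items
    }
}

// Make the method available on every iterator.
impl<I: Iterator> Sorted for I {}
```

With the trait in scope, a chain can end in .sorted() and skip the separate let mut / sort() / return steps of Code#2.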

Code#3 (11 lines) is still a little heavy compared to Code#1 (5 lines), but I think it’s considerably more readable than Code#2 (20 lines).


#2

Your proposed Text::new(path).unwrap().lines() is equivalent to BufReader::new(File::open(path).unwrap()).lines(), I think?

seq! is sort of a neat idea… but it’s not really buying much over the imperative for ... { for ... { if ... { v.push(x) } } } pattern.

Cleaned-up version of the main part of your Code#2:

    let mut result =
        data
        .lines()
        .map(|row| row.split(','))
        .enumerate()
        .flat_map(|(k, v)| {
            v
            .enumerate()
            .filter(|&(l, v)| !v.starts_with("-"))
            .map(|(l, v)| (v.trim().parse().unwrap(), k, l))
            .collect::<Vec<_>>()
        })
        .collect::<Vec<_>>();

#3

There’s some difference:

fn foo1() {
    use std::fs::File;
    use std::io::{BufReader, BufRead};
    let rows = BufReader::new(File::open("data_file.txt").unwrap()).lines();
}

fn foo2() {
    use std::io::Text;
    let rows = Text::new("data_file.txt").unwrap().lines();
}

Shorter code, and fewer things to remember. I don’t know if it’s worth having in the std lib.
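
For reference, a thin wrapper along these lines can be written in a few lines of user code. This is only a sketch: Text is the hypothetical type from foo2() (std::io has no such type), and the field name is made up.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, Lines};
use std::path::Path;

// Hypothetical `Text` type from the post; std has no such type.
struct Text {
    reader: BufReader<File>,
}

impl Text {
    // Opens the file and wraps it in a buffered reader.
    fn new<P: AsRef<Path>>(path: P) -> io::Result<Text> {
        Ok(Text { reader: BufReader::new(File::open(path)?) })
    }

    // Lazily yields the lines of the file, each as io::Result<String>.
    fn lines(self) -> Lines<BufReader<File>> {
        self.reader.lines()
    }
}
```

As with BufReader, each line arrives as an io::Result<String>, so the caller still has to unwrap or propagate per-line errors.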

I don’t agree; I think that macro is worth having in the standard Prelude.

seq!{} is lazy, while those nested loops with a push are eager.
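
To make the lazy/eager difference concrete, here is a small sketch using only std iterator adapters (first_pairs is an illustrative name). The outer range is unbounded, yet take(n) stops the whole chain after n items; an eager pair of nested loops over the same range would need an explicit counter and break.

```rust
// Returns the first n pairs (x, y) with 1 <= y <= x, in generation order.
fn first_pairs(n: usize) -> Vec<(u32, u32)> {
    (1u32..) // unbounded source: only safe because the chain is lazy
        .flat_map(|x| (1..=x).map(move |y| (x, y)))
        .take(n) // nothing beyond the n-th pair is ever computed
        .collect()
}
```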

Also seq!{} could be chained with other things, as I’ve shown with a sorted(), unlike the nested loops. It works with and not against the iterators-chain style of programming.

Compared to for loops, (lazy) sequence comprehensions get “chunked” better by the mind of the programmer. This means you see them more easily as a single “thing”, which speeds up both coding and code understanding. I have seen this while programming in both Python and Haskell.

That’s a bit nicer, thank you. Compared to the (hypothetical and untested) seq!{} version, your code has an extra array copy, with a .collect().

I can further modify it a bit, like this (also to remove a warning):

let mut result =
    data
    .lines()
    .map(|row| row.split(','))
    .enumerate()
    .flat_map(|(k, v)|
        v
        .enumerate()
        .filter(|&(_, v)| !v.starts_with("-"))
        .map(|(l, v)| (v.trim().parse().unwrap(), k, l))
        .collect::<Vec<_>>())
    .collect::<Vec<_>>();

#4

The inner collect can be eliminated if the k owned by the flat_map closure is moved into the map closure, thereby keeping the borrow-checker happy.

let result =
    data
    .lines()
    .map(|row| row.split(','))
    .enumerate()
    .flat_map(|(k, v)| {
        v
        .enumerate()
        .filter(|&(_, v)| !v.starts_with("-"))
        .map(move |(l, v)| (v.trim().parse::<i64>().unwrap(), k, l))
    })
    .collect::<Vec<_>>();
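
Put together with the sort from Code#2, the whole loader might look like the sketch below. The name parse_network is illustrative, and the function takes the file contents as a &str instead of a path so it can be tried without an input file.

```rust
// Parses the comma-separated matrix text into sorted (weight, row, column)
// triples. `parse_network` is an illustrative name, not from the thread.
fn parse_network(data: &str) -> Vec<(usize, usize, usize)> {
    let mut result = data
        .lines()
        .map(|row| row.split(','))
        .enumerate()
        .flat_map(|(k, v)| {
            v.enumerate()
                // Cells starting with '-' mark a missing edge and are skipped.
                .filter(|&(_, cell)| !cell.starts_with('-'))
                .map(move |(l, cell)| (cell.trim().parse().unwrap(), k, l))
        })
        .collect::<Vec<(usize, usize, usize)>>();
    result.sort();
    result
}
```

Like the original, this panics on a cell that is neither a number nor starts with '-'.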


(Edited for correctness) To elaborate: The flat_map closure creates another closure that is passed into map using the k that’s on the stack when flat_map is invoked. But since the map iterator is lazy and not immediately consumed, the k is unavailable on the stack by the time the map is enumerated. Thus, by moving k from the stack into the map closure you ensure that the map iterator is self-contained and not referencing invalidated data.
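
The same capture issue can be reduced to a few lines. In this sketch (tag_rows is a made-up name) the inner closure is returned from the flat_map closure inside a lazy iterator, so it has to own k. Without the move keyword the closure would only borrow k, and rustc rejects that because the closure may outlive the call that owns k (error E0373, as far as I know).

```rust
// Pairs every value with the index of the row it came from.
fn tag_rows(rows: Vec<Vec<u32>>) -> Vec<(usize, u32)> {
    rows.into_iter()
        .enumerate()
        // `move` copies k into the inner closure; a plain `|v| (k, v)`
        // would borrow k from this call, and the borrow would dangle.
        .flat_map(|(k, row)| row.into_iter().map(move |v| (k, v)))
        .collect()
}
```

Since k is a usize, which is Copy, moving it is just a cheap copy.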

Edit: Any reason that the rust formatting on the block of code does not survive the final rendering? (It shows up fine on the preview.)


#5

Sorry, I just have to respond to all this. Ergonomy is nice and all, fewer lines of code and all that, but personally I think readability is starting to suffer here. I don’t think the goal of programming should only be “Can I do this with the least number of characters?” You should also consider “Can I come back to this code in a year and understand what the hell I was trying to do?”

Just my two cents…


#6

Quick note: I updated the explanation since I conflated a closure owning something with that something being on the stack, which is the real cause of the compiler error.


#7

Yes, that’s precisely the main point of this thread. There are several ways to improve the current Rust situation regarding code readability, code simplicity, and long-term maintainability. Introducing some features like seq!{}, sorted, and a few more is probably a win-win, because it shortens the code and makes it less noisy and easier to fix and improve when you read it 6 months later.