Choosing randomly from zip files (beginner exercise)

I'm trying to learn Rust. To begin with, I set myself the exercise of
choosing a random picture from a bunch of .zip files. (The next step
is to incorporate this into a simple web framework to generate a slideshow
as I'm interested in learning Rust web frameworks, having explored
Python, PHP and Node.)

I can do this easily in Python, and the Python probably does a better job of what I'm trying to do in Rust.

from glob import glob
import random
import zipfile

# Exercise 1: pick a .zip at random
zips = glob("*.zip")
z = random.choice(zips)
print(z)

# Exercise 2: pick a .zip at random, and pick a file at random
zips = glob("*.zip")
z = random.choice(zips)
with zipfile.ZipFile(z) as zf:
  fn = random.choice(zf.namelist())
  print(z,fn)

# Exercise 3: make a single list of pairs (zip_file_name,file_name)
# Then choose one at random
files = []
zips = glob("*.zip")
for zfn in zips:
  with zipfile.ZipFile(zfn) as zf:
    for fn in zf.namelist():
      files.append((zfn,fn))
print(random.choice(files))

My first attempt got lost in a pile of .unwrap()s and similar, and I was at a loss how to turn a glob into something I can choose randomly from. (In Python we can use list(x) to convert any sequence x to a list. In Rust I've come across .collect(), but then I need to know the resulting type.)

For example

use glob::glob;

fn main() {
  let zips_glob = glob::glob("*.zip").expect("Failed to read glob pattern");
  let zips = zips_glob.collect();
  println!("{:?}",zips);
}

yields

error[E0282]: type annotations needed
  --> src/main.rs:12:8
   |
12 |    let zips = zips_glob.collect();
   |        ^^^^
   |
help: consider giving `zips` an explicit type
   |
12 |    let zips: _ = zips_glob.collect();

So how do I work out the right type annotation? And do I need to do some map with unwrap or something, so that I end up with a bunch of paths I can choose randomly from?

For example, where in Python we can use list(x) for any sequence x to convert it to a list. In Rust I've come across .collect(), but then I need to know the resulting type.

The key difference here is that collect supports collecting into different types of collections. You can specify the collection type without necessarily specifying the item type:

let zips: Vec<_> = zips_glob.collect();

Or equivalently, using the type parameter on collect() itself:

let zips = zips_glob.collect::<Vec<_>>();

(The _ here just means "not specifying this type parameter", in the same way as you were previously not specifying the type of the variable at all.)
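To make that concrete, here is a small sketch using only the standard library. The function name collect_three_ways is made up for illustration. It feeds the same kind of iterator into three different collections and lets the compiler fill in the item type wherever _ appears:

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: collects the same items into three different collections.
fn collect_three_ways<'a>(words: &[&'a str]) -> (Vec<&'a str>, BTreeSet<&'a str>, String) {
    // Annotation on the variable; the item type is left as `_`.
    let as_vec: Vec<_> = words.iter().copied().collect();
    // Turbofish on collect() itself; the set drops duplicates and sorts.
    let as_set = words.iter().copied().collect::<BTreeSet<_>>();
    // String can also be built from an iterator of &str (concatenation).
    let joined: String = words.iter().copied().collect();
    (as_vec, as_set, joined)
}

fn main() {
    let (as_vec, as_set, joined) = collect_three_ways(&["pear", "apple", "pear"]);
    println!("{:?}", as_vec);
    println!("{:?}", as_set);
    println!("{:?}", joined);
}
```

The iterator is identical in all three cases; only the requested target type changes what collect() builds.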

If you want to specify the item type too (which can be good for clarity), one way is to provoke a type error on purpose by setting the type to (), which will then tell you what it doesn't match:

let zips: Vec<()> = zips_glob.collect();

error[E0277]: a value of type `Vec<()>` cannot be built from an iterator over elements of type `Result<PathBuf, GlobError>`
 --> src/lib.rs:4:35
  |
4 | ... = zips_glob.collect(); ...
  |                 ^^^^^^^ value of type `Vec<()>` cannot be built from `std::iter::Iterator<Item=Result<PathBuf, GlobError>>`
  |
  = help: the trait `FromIterator<Result<PathBuf, GlobError>>` is not implemented for `Vec<()>`
  = help: the trait `FromIterator<T>` is implemented for `Vec<T>`

In this, look at the Item type of the Iterator: it is now a Result<PathBuf, GlobError>. So, the type you will probably want to collect is a Vec<PathBuf>, but there are some errors to handle first. The easy way to tackle those errors is to take advantage of collect's ability to produce a Result. Specifically, if you have an iterator of Result<T, E>, then collect() can give you a Result<Vec<T>, E>, where it aborts iteration at the first Err.

use std::path::PathBuf;

fn main() {
    let zips_glob = glob::glob("*.zip").expect("Failed to read glob pattern");
    let zips = zips_glob
        .collect::<Result<Vec<PathBuf>, glob::GlobError>>()
        .expect("failed to read directory");
    println!("{:?}", zips);
}

You could also map the iterator, zips_glob.map(|res| res.expect("oops")), but that is a less flexible pattern since the only error handling you can do that way is panicking or returning a different placeholder; it can't return the error.
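As a rough illustration of why collecting into a Result is the more flexible pattern, here is a sketch that avoids the glob crate and uses string parsing instead (parse_all is a hypothetical name). The function hands the first error back to its caller rather than panicking:

```rust
use std::num::ParseIntError;

/// Hypothetical helper: parses every input, stopping at the first failure.
/// The caller decides what to do with the error.
fn parse_all(inputs: &[&str]) -> Result<Vec<i32>, ParseIntError> {
    // An iterator of Result<i32, ParseIntError> collects into
    // Result<Vec<i32>, ParseIntError>.
    inputs.iter().map(|s| s.parse::<i32>()).collect()
}

fn main() {
    match parse_all(&["1", "2", "3"]) {
        Ok(numbers) => println!("parsed: {:?}", numbers),
        Err(err) => eprintln!("bad input: {err}"),
    }
    match parse_all(&["1", "x", "3"]) {
        Ok(numbers) => println!("parsed: {:?}", numbers),
        Err(err) => eprintln!("bad input: {err}"),
    }
}
```

With .map(|s| s.parse::<i32>().expect("oops")) the second call would panic instead, and the caller would get no chance to recover.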

All that is just a complication of the fact that this particular iterator can return errors, because it is performing IO. If there were no possible errors then it would be just

let zips: Vec<PathBuf> = zips_glob.collect();

Also, for comparison purposes, here is a function that behaves just like Python list():

fn list<I: IntoIterator>(it: I) -> Vec<I::Item> {
    it.into_iter().collect()
}

Thanks, that was very helpful. With that and a little googling I got 'exercise 3' working too. Obviously it's of a 'messy beginner' standard, but here's what I eventually came up with (which may be of use to other beginners). Comments on better ways are welcome.

use std::fs::File;
use std::path::PathBuf;
use std::io::BufReader;
use zip::read::ZipArchive;
use rand::seq::SliceRandom;
use glob::glob;

// Exercise 2:
//   take a glob pattern (meant to be a collection of zip files)
//   choose one matching .zip at random (will crash if glob doesn't match)
//   from that matching filename, (assuming it is a .zip, else crash)
//      open it as a File
//      create a BufReader from the open file
//      create a ZipArchive from the BufReader
//      get a list of filenames in the .zip
//      choose one at random
//      print out the filename
fn ex2(pattern : &String) {
    let zips_glob = glob(&pattern).expect("Failed to read glob pattern");
    let zips : Vec<_> = zips_glob.map(|x| x.expect("oops_001")).collect();

    let chosen = zips.choose(&mut rand::thread_rng()).unwrap();
    println!("Zip {:?}",&chosen);

    let file = File::open(&chosen).expect("Failed to open zip file");
    let reader = BufReader::new(file);
    let zip = ZipArchive::new(reader).expect("Failed to make ZipArchive from file reader");

    let filenames : Vec<_> = zip.file_names().collect();
    let chosen_file = filenames.choose(&mut rand::thread_rng()).unwrap();

    println!("Filename: {:?}",chosen_file);
}

// Exercise 3:
//  The aim is to make a list of all files in all matching .zips

fn files_in_zip(zip_filename : &PathBuf) -> Vec<String> {
    let file = File::open(&zip_filename).expect("Failed to open zip file");
    let reader = BufReader::new(file);
    let zip = ZipArchive::new(reader).expect("Failed to make ZipArchive from file reader");

    zip.file_names().map(|x| x.to_string()).collect()
}

fn ex3(pattern : &String) {
    // this is the 'index' we're building
    let mut index : Vec<(PathBuf,String)> = vec![];

    // find matching .zips as before
    let zips_glob = glob(&pattern).expect("Failed to read glob pattern");
    let zips : Vec<_> = zips_glob.map(|x| x.expect("oops_001")).collect();

    // this is probably not the most efficient way to do this
    for zfn in zips {
        println!("Zip: {:?}",&zfn);
        let filenames = files_in_zip(&zfn);
        println!("  {:?} files",filenames.len());
        for f in filenames {
            index.push((zfn.clone(),f.clone()))
        }
    }
    let chosen_file = index.choose(&mut rand::thread_rng()).unwrap();
    println!("Chosen: {:?}",chosen_file);
}

fn main() {
    let pattern = "/path/to/*.zip".to_string();
    ex2(&pattern);
    ex3(&pattern);
}

  1. You don't necessarily need to collect to choose a random item: see rand::seq::IteratorRandom::choose.

  2. The clones in index.push((zfn.clone(),f.clone())) are probably unnecessary. In general you do not need to clone things when you're moving them from some temporary structure to their final destination.

    cargo clippy can warn you about unnecessary clones automatically.

  3. fn ex3(pattern : &String) { — Don't use &String; it is unnecessarily restrictive. Use &str instead, or String when the function needs an owned String. Similarly, use &Path in place of &PathBuf.
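
A minimal sketch of points 2 and 3, using only the standard library. The names pair_with_zip and describe are made up for illustration and are not part of the code above. Each file name is moved into the index rather than cloned; the zip path still has to be copied once per entry, because many entries share it:

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: pairs one archive path with each of its entry names.
/// `names` is consumed, so every String is moved into the result, not cloned.
fn pair_with_zip(zip_path: &Path, names: Vec<String>) -> Vec<(PathBuf, String)> {
    names
        .into_iter()
        .map(|name| (zip_path.to_path_buf(), name))
        .collect()
}

/// Hypothetical helper: takes &str, so it accepts literals and &String alike.
fn describe(pattern: &str) -> String {
    format!("pattern: {pattern}")
}

fn main() {
    let owned = String::from("*.zip");
    println!("{}", describe(&owned)); // &String coerces to &str
    println!("{}", describe("*.zip")); // a literal works too

    let zip = PathBuf::from("a.zip");
    // &PathBuf coerces to &Path in the same way.
    let index = pair_with_zip(&zip, vec!["x.png".to_string(), "y.png".to_string()]);
    println!("{:?}", index);
}
```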


Longer term, you're also going to want a Result strategy. Here are my rules of thumb. Sorry they're so long, but there's a lot of info that isn't gathered in one place and that I wish I had known when I started:

Errors that "should never happen", i.e. ones that mean there's a bug somewhere, should panic. For example, if a function documents that it returns an error if you give it an odd number of items, and you are sure you're giving it an even number, you should simply .expect() the result. If that gets hit, either you were wrong, the docs were wrong, or the function is broken. You don't want to try to continue in any of those cases.
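
Here's a small sketch of that odd/even case. The function pair_up is invented for the example. Since the caller passes a fixed literal with four items, an Err here could only mean a bug, so .expect() is appropriate:

```rust
/// Hypothetical function: documented to return an error for an odd number of items.
fn pair_up(items: &[i32]) -> Result<Vec<(i32, i32)>, String> {
    if items.len() % 2 != 0 {
        return Err(format!("expected an even number of items, got {}", items.len()));
    }
    Ok(items.chunks_exact(2).map(|c| (c[0], c[1])).collect())
}

fn main() {
    // The literal has four items, so a failure here would be a programming
    // error, not a runtime condition worth handling.
    let pairs = pair_up(&[1, 2, 3, 4]).expect("literal has an even number of items");
    println!("{:?}", pairs);
}
```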

If an error can happen because of something at runtime (for example, trying to open a file that's not there) and you can't deal with it inside the function, the function should return a Result so that it can pass the failure to the caller as an Err.

This will most often happen because you called a function that itself returned an error. Simply using ? will work most of the time: it will either immediately return the error from the whole function, or unwrap the success value.
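
For instance, something along these lines (first_line is a hypothetical helper, and this version only needs one error type, std::io::Error):

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: returns the first line of a text file.
fn first_line(path: &Path) -> io::Result<String> {
    // On failure, `?` returns the io::Error from first_line immediately;
    // on success it unwraps the file contents.
    let text = fs::read_to_string(path)?;
    Ok(text.lines().next().unwrap_or("").to_string())
}

fn main() {
    // The caller, not first_line, decides how to react to a missing file.
    match first_line(Path::new("does-not-exist.txt")) {
        Ok(line) => println!("first line: {line}"),
        Err(err) => eprintln!("could not read file: {err}"),
    }
}
```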

You need a single error type returned for a function. Broadly, your best options are:

There's nothing wrong with doing your error types differently, but most of the time there's just not much reason to. (An example of when you might want to do something differently is when the error message needs some external context not available when the error is created).

Both of these let you provide a human readable message, and wrap an inner error with more context, so you're generally going to be covered.

You can mix and match error types between modules or functions, or have just one error type you use for the whole crate.

If you have a public error type, you should also have a public Result alias that binds the error type; std::io::Result is the common example. You can name these either FooError and FooResult or just Error and Result, depending on how you expect them to be imported. io::Result, for instance, is usually reached through use std::io.
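
A hand-rolled sketch of that shape, using only the standard library. PickError, PickResult, first_name and read_text are all hypothetical names in the FooError/FooResult style, and a helper crate could generate most of the boilerplate:

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Hypothetical public error type for a crate.
#[derive(Debug)]
pub enum PickError {
    /// An I/O failure, wrapped with a note about what was being attempted.
    Io { context: String, source: io::Error },
    /// There was nothing to choose from.
    Empty,
}

impl fmt::Display for PickError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PickError::Io { context, source } => write!(f, "{context}: {source}"),
            PickError::Empty => write!(f, "nothing to choose from"),
        }
    }
}

impl Error for PickError {
    // Exposes the wrapped inner error to callers that want the full chain.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            PickError::Io { source, .. } => Some(source),
            PickError::Empty => None,
        }
    }
}

/// Public alias that binds the error type, in the style of std::io::Result.
pub type PickResult<T> = Result<T, PickError>;

pub fn first_name(names: &[String]) -> PickResult<&String> {
    names.first().ok_or(PickError::Empty)
}

pub fn read_text(path: &str) -> PickResult<String> {
    std::fs::read_to_string(path).map_err(|source| PickError::Io {
        context: format!("reading {path}"),
        source,
    })
}

fn main() {
    match read_text("does-not-exist.txt") {
        Ok(text) => println!("{} bytes", text.len()),
        Err(err) => eprintln!("error: {err}"),
    }
}
```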

These rules are mostly about giving your user a nice experience, especially if you're creating a shared library where you don't know how your code is going to be used. Use your judgment when you think an unwrap or expect is never actually going to be hit, or when it doesn't matter if you crash (maybe you're the only user, or it's some backend service where crashing is just as good). In general, though, it's pretty easy to do the "right thing" with the libraries available now, so it doesn't matter too much.
