Remove duplicate slashes: how to do this idiomatically?

Hi there,

I just began to learn Rust and am very new to this community, so please tell me if this kind of post is unwanted.

I have a string describing a URL; it may contain duplicate slashes (like //foo//bar/baz). I want to remove the duplicate slashes and return a tidy URL. I came up with this solution; how would one write this in a more idiomatic way?

fn clean_url(path: &String) -> String {
    let mut i = 0;
    let len = path.len();
    let mut clean_path = String::with_capacity(len);
    let chars: Vec<char> = path.chars().collect();

    if &path[0..7] == "http://" {
        i = 7;
        clean_path.push_str("http://");
    } else if &path[0..8] == "https://" {
        i = 8;
        clean_path.push_str("https://");
    }

    while i < len {
        if i > 0 && chars[i - 1] == '/' && chars[i] == '/' {
            i += 1;
            continue;
        }

        clean_path.push(chars[i]);
        i += 1;
    }

    return clean_path;
}

Best regards,
CK

Someone else might be able to do a better job but this is my translation of your code into more idiomatic Rust. Feel free to ask if anything is unclear.

fn clean_url(path: &str) -> String {
    let len = path.len();
    let mut clean_path = String::with_capacity(len);
    
    // Don't clean `https://` and `http://` prefixes
    let sub_path = if path.starts_with("https://") || path.starts_with("http://") {
        // push the prefix to our clean string
        let pos = "http://".len(); // pos = 7
        clean_path.push_str(&path[..pos]);

        // assign `sub_path` to use a slice of the path that skips over the prefix
        &path[pos..]
    } else {
        // otherwise use the full path
        path
    };
    
    let mut last_chr = '\0'; // can be any value except `/`
    for chr in sub_path.chars() {
        if chr == '/' && last_chr == '/' {
            continue;
        }
        clean_path.push(chr);
        last_chr = chr;
    }
    
    clean_path
}

You can try it using the Rust playground.

Thank you. Your trick of not distinguishing between the http and https prefixes is neat. I hadn't noticed that it doesn't matter which one it is when you set the start position to "http://".len(), but you are right.
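
For anyone else reading along, here is a small sketch of why slicing at "http://".len() works for both prefixes. The function name split_at_http_len is made up for illustration and is not part of the answer above.

```rust
// Hypothetical helper: split a path at the byte length of "http://",
// the same position the answer above uses for both prefixes.
fn split_at_http_len(path: &str) -> (&str, &str) {
    let pos = "http://".len(); // 7
    path.split_at(pos)
}
```

For "http://a//b" this gives ("http://", "a//b"), and for "https://a//b" it gives ("https:/", "/a//b"). In the https case the second slash of the prefix ends up at the start of the remainder, and the loop keeps it because last_chr starts out as a non-slash character. One consequence, as far as I can tell: "https:///a" collapses to "https://a", while "http:///a" keeps its third slash.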

Why do you use a &str instead of a &String?

&str is usually preferred because you can also use static strings without having to allocate a String. For example, with a function that takes &str we can do:

clean_url("//foo//bar/baz")

And this will just work. Whereas with &String we would have to do:

clean_url(&String::from("//foo//bar/baz"))
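
Nothing is lost by taking &str, because a &String is automatically coerced to &str at the call site. A minimal sketch, using a made-up helper count_slashes instead of clean_url to keep it short:

```rust
// Hypothetical helper, only here to show which argument types are accepted.
fn count_slashes(path: &str) -> usize {
    path.chars().filter(|&c| c == '/').count()
}

// Calls the helper with three different kinds of string argument.
fn demo() -> (usize, usize, usize) {
    let owned: String = String::from("//foo//bar/baz");
    (
        count_slashes("//foo//bar/baz"), // string literal: already a &str
        count_slashes(&owned),           // &String coerces to &str
        count_slashes(&owned[2..]),      // a sub-slice of a String is a &str too
    )
}
```

With a &String parameter, only the second call would compile as written.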

You can also use regexes if you like.

use regex::Regex;

fn clean_url(path: &str) -> String {
    // Match with two groups: the first is "http://", "https://", or the
    // leading characters up to the first slash; the second is the rest.
    let re_proto = Regex::new(r"^(https?://|[^/]*)(.*)$").unwrap();
    let caps = re_proto.captures(path).unwrap();
    // Match one or more slashes.
    let re_path = Regex::new(r"/+").unwrap();
    caps[1].to_string() + &re_path.replace_all(&caps[2], "/")
}

You can further optimize this if you use lazy_static to store the compiled patterns re_proto and re_path.

Try it on playground.


Here's my idiomatic version.

fn clean_url<T: AsRef<str>>(path: T) -> String {
    let path = path.as_ref();

    let mut prefix = path
        .find("://")
        .map(|i| (&path[..i + 3]).to_string())
        .unwrap_or_else(String::new);

    let mut slash = false;
    prefix.extend((&path[prefix.len()..]).chars().filter(|c| {
        let keep = !slash || *c != '/';
        slash = *c == '/';
        keep
    }));
    prefix
}

You could also use the url crate to parse the url and then reconstruct it ditching the empty path segments. Maybe it's even capable of that somewhat automatically, but I haven't tried it.

It might avoid issues with double slashes appearing anywhere except in the path segment.


I read that there is a regex module but wasn't yet able to get a regex running. Something was always wrong with the types. Thanks for a working example!

To be honest, I don't really understand your version. The documentation of AsRef doesn't make things more comprehensible for me. Can you elaborate or give me a pointer to a tutorial explaining this?

The usage of extend and filter, on the other hand, taught me a lot. Thanks!

I did ignore the URL crate because for learning purposes I wanted to build it myself, but thanks for the pointer!

I believe the T: AsRef<str> bound in the function signature just means the argument can be any type that implements AsRef<str>, i.e. anything that can be cheaply borrowed as a &str. That lets the function accept both String and &str values.
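
A minimal sketch of the same bound on a smaller function. The name starts_with_slash is made up for illustration:

```rust
// Hypothetical function, generic over anything that can be viewed as a &str.
fn starts_with_slash<T: AsRef<str>>(path: T) -> bool {
    // as_ref() yields a &str regardless of the concrete type passed in.
    path.as_ref().starts_with('/')
}
```

All of starts_with_slash("/a"), starts_with_slash(String::from("/a")) and starts_with_slash(&String::from("/a")) compile, since &str, String and &String each implement AsRef<str>. Note that passing a String by value moves it into the function, so pass a reference if you still need the string afterwards.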

100% agree: use a URL library in production code, but for a learning exercise...

Maybe split on slashes and filter out empty strings, then recombine?

fn clean_url(url: &str) -> String {
    let mut url_parts = url.splitn(2, "://");
    let prefix = url_parts.next().unwrap_or("");
    let path = url_parts.next().unwrap_or("");
    let clean_path = path
        .split("/")
        .filter(|p| !p.is_empty())
        .collect::<Vec<&str>>()
        .join("/");
    [prefix, clean_path.as_str()].join("://")
}

#[cfg(test)]
mod test {
    use super::*;

    #[test]
    fn test_clean_url() {
        assert_eq!(clean_url("https://a/b/c.html"), "https://a/b/c.html");
        assert_eq!(clean_url("https://a/b//c.html"), "https://a/b/c.html");
        assert_eq!(clean_url("https://a/b///c.html"), "https://a/b/c.html");
    }
}

I'm not sure about .next(), maybe there's a nice way to use .map() over the split?

Edit: here it is with just .map(). Although, I think this also runs the split/filter/join on the prefix and allocates a String for it unnecessarily, if you care about that sort of thing.

fn clean_url(url: &str) -> String {
    url.splitn(2, "://")
        .map(|s| s
            .split("/")
            .filter(|p| !p.is_empty())
            .collect::<Vec<&str>>()
            .join("/"))
        .collect::<Vec<String>>()
        .join("://")
}

#[cfg(test)]
mod test {
    use super::*;

    #[test]
    fn test_clean_url() {
        assert_eq!(clean_url(""), "");
        assert_eq!(clean_url("https://"), "https://");
        assert_eq!(clean_url("https://a/b/c.html"), "https://a/b/c.html");
        assert_eq!(clean_url("https://a/b//c.html"), "https://a/b/c.html");
        assert_eq!(clean_url("https://a/b///c.html"), "https://a/b/c.html");
    }
}

This also fails the following test, which rust-url (https://crates.io/crates/url) would probably handle correctly:

assert_eq!(clean_url("file:///a/b/c.html"), "file:///a/b/c.html");
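
For the learning exercise, one std-only way to get that test to pass is to keep everything up to and including the first "://" verbatim and only collapse runs of slashes in the remainder. This is just a sketch under that assumption, not a real URL parser (it knows nothing about query strings or fragments, for example). It uses Vec::dedup_by, which is a different technique from the split/filter/join above:

```rust
fn clean_url(url: &str) -> String {
    // Keep everything up to and including the first "://" verbatim.
    // Without a "://", the prefix is empty and the whole input is cleaned.
    let split_at = url.find("://").map_or(0, |i| i + "://".len());
    let (prefix, rest) = url.split_at(split_at);

    // Collapse each run of consecutive slashes in the remainder into one.
    let mut chars: Vec<char> = rest.chars().collect();
    chars.dedup_by(|a, b| *a == '/' && *b == '/');

    let mut clean = String::with_capacity(url.len());
    clean.push_str(prefix);
    clean.extend(chars);
    clean
}
```

Unlike the split/filter/join versions, this keeps a leading slash after the prefix, which is why "file:///a/b/c.html" comes back unchanged, and it also leaves input without a scheme (like "//foo//bar/baz") as a plain path, "/foo/bar/baz".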
