I just started learning Rust and am very new to this community, so please tell me if this kind of post is unwanted.
I have a string describing a URL; it may contain duplicate slashes (like //foo//bar/baz). I want to remove the duplicate slashes and return a tidy URL. I came up with this solution; how would one write this in a more idiomatic way?
fn clean_url(path: &String) -> String {
    let mut i = 0;
    let len = path.len();
    let mut clean_path = String::with_capacity(len);
    let chars: Vec<char> = path.chars().collect();
    if &path[0..7] == "http://" {
        i = 7;
        clean_path.push_str("http://");
    } else if &path[0..8] == "https://" {
        i = 8;
        clean_path.push_str("https://");
    }
    while i < len {
        if i > 0 && chars[i - 1] == '/' && chars[i] == '/' {
            i += 1;
            continue;
        }
        clean_path.push(chars[i]);
        i += 1;
    }
    return clean_path;
}
Someone else might be able to do a better job, but this is my translation of your code into more idiomatic Rust. Feel free to ask if anything is unclear.
fn clean_url(path: &str) -> String {
    let len = path.len();
    let mut clean_path = String::with_capacity(len);
    // Don't clean `https://` and `http://` prefixes
    let sub_path = if path.starts_with("https://") || path.starts_with("http://") {
        // push the prefix to our clean string
        let pos = "http://".len(); // pos = 7
        clean_path.push_str(&path[..pos]);
        // assign `sub_path` to use a slice of the path that skips over the prefix
        &path[pos..]
    } else {
        // otherwise use the full path
        path
    };
    let mut last_chr = '\0'; // can be any value except `/`
    for chr in sub_path.chars() {
        if chr == '/' && last_chr == '/' {
            continue;
        }
        clean_path.push(chr);
        last_chr = chr;
    }
    clean_path
}
Thank you. Your trick of not distinguishing between the http and https prefixes is neat. I hadn't noticed that it doesn't matter when the start position is set to "http://".len(), but you are right.
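For anyone wondering why the 7-byte slice also works for https: with an "https://" input, &path[..7] pushes "https:/", and the loop then pushes the one remaining '/' because last_chr starts out as '\0'. If you would rather not rely on that, here is a minimal sketch that makes the prefix handling explicit with str::strip_prefix. The name clean_url_explicit is hypothetical and only used to keep it apart from the version above; the slash-collapsing loop is the same.

```rust
// Sketch only: `clean_url_explicit` is a hypothetical name, not from the thread.
fn clean_url_explicit(path: &str) -> String {
    // Split off a known scheme prefix, if there is one.
    let (prefix, rest) = if let Some(rest) = path.strip_prefix("https://") {
        ("https://", rest)
    } else if let Some(rest) = path.strip_prefix("http://") {
        ("http://", rest)
    } else {
        ("", path)
    };

    let mut clean_path = String::with_capacity(path.len());
    clean_path.push_str(prefix);

    // Collapse runs of '/' in everything after the prefix.
    let mut last_chr = '\0'; // any value except '/'
    for chr in rest.chars() {
        if chr == '/' && last_chr == '/' {
            continue;
        }
        clean_path.push(chr);
        last_chr = chr;
    }
    clean_path
}
```

Because strip_prefix returns None when the prefix is missing, this also avoids slicing by a hard-coded byte position.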
&str is usually preferred because you can also use static strings without having to allocate a String. For example, with a function that takes &str we can do:
clean_url("//foo//bar/baz")
And this will just work. Whereas with &String we would have to do:
clean_url(&"//foo//bar/baz".to_string())
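A minimal sketch of both call styles side by side. The helpers takes_str and takes_string are hypothetical and exist only to show what each signature accepts; they are not part of the code discussed above.

```rust
// Hypothetical helpers, only here to compare the two parameter types.
fn takes_str(path: &str) -> usize {
    path.len()
}

fn takes_string(path: &String) -> usize {
    path.len()
}

// Calls each helper in the ways its signature allows and returns the results.
fn demo() -> (usize, usize, usize) {
    let owned = String::from("//foo");
    (
        takes_str("//foo"),                 // a literal is already a &str
        takes_str(&owned),                  // &String coerces to &str automatically
        takes_string(&"//foo".to_string()), // a literal must be allocated first
    )
}
```

So a &str parameter accepts both literals and borrowed Strings, while a &String parameter forces callers who only have a literal to allocate.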
use regex::Regex;

fn clean_url(path: &str) -> String {
    // Match with two groups: the first group is "http://", "https://" or the
    // beginning characters until the first slash; the second group is the rest.
    let re_proto = Regex::new(r"^(https?://|[^/]*)(.*)$").unwrap();
    let caps = re_proto.captures(path).unwrap();
    // Match one or more slashes.
    let re_path = Regex::new(r"/+").unwrap();
    caps[1].to_string() + &re_path.replace_all(&caps[2], "/")
}
You can further optimize this by using lazy_static to store the compiled patterns re_proto and re_path, so that they are compiled once instead of on every call.
You could also use the url crate to parse the url and then reconstruct it ditching the empty path segments. Maybe it's even capable of that somewhat automatically, but I haven't tried it.
It might avoid issues with double slashes appearing anywhere except in the path segment.
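The "ditch the empty path segments" idea can be sketched with the standard library alone. To be clear, this is not the url crate's API; drop_empty_segments is a hypothetical helper, and it assumes it is given only the path part of a URL (no scheme, host, or query string).

```rust
// Sketch only: a hypothetical std-only helper, not the url crate's API.
// Assumes `path` is just the path component of a URL.
fn drop_empty_segments(path: &str) -> String {
    // Splitting "//foo//bar" on '/' yields empty strings between
    // consecutive slashes; filtering them out removes the duplicates.
    let segments: Vec<&str> = path
        .split('/')
        .filter(|segment| !segment.is_empty())
        .collect();
    // Rebuild with a single leading slash. Note that a trailing slash
    // in the input is dropped by this approach.
    format!("/{}", segments.join("/"))
}
```

Unlike the character loop, this always produces a leading slash and never a trailing one, which may or may not be what you want.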
I read that there is a regex module but wasn't yet able to get a regex running. Something was always wrong with the types. Thanks for a working example!
To be honest, I don't really understand your version. The documentation of AsRef doesn't make things more comprehensible for me. Can you elaborate or give me a pointer to a tutorial explaining this?
The usage of extend and filter, on the other hand, taught me a lot. Thanks!
I believe the T: AsRef<str> in the function signature really just means that your function argument can be any type that implements the AsRef<str> trait, that is, anything that can be cheaply borrowed as a &str. It just makes it so that your function can accept both String and &str arguments.
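As a rough illustration, here is a minimal sketch of what a function with a T: AsRef<str> bound, extend, and filter might look like. It is not necessarily the version being discussed; the name clean_path_generic is hypothetical, and it only collapses slashes without any special handling of http:// or https:// prefixes.

```rust
// Sketch only: `clean_path_generic` is a hypothetical name, and this is
// not necessarily the version referred to above.
fn clean_path_generic<T: AsRef<str>>(path: T) -> String {
    // `as_ref()` gives us a &str no matter which concrete type was passed.
    let path = path.as_ref();
    let mut clean_path = String::with_capacity(path.len());
    let mut last_chr = '\0'; // any value except '/'

    // `filter` drops a '/' that directly follows another '/';
    // `extend` appends the surviving chars to the String.
    clean_path.extend(path.chars().filter(|&chr| {
        let keep = !(chr == '/' && last_chr == '/');
        last_chr = chr;
        keep
    }));
    clean_path
}
```

The same function can then be called with a string literal, an owned String, or a &String, because all three implement AsRef<str>.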