Transforming a String (URL decode)

I know there is an API for this, but to help myself understand Rust better, I've written my own URL decoder in two different ways. Both iterate over the characters of a string and build up a new one from them: one uses filter_map and the other a for loop. Both work, but I wonder whether there is a neater way using some iterator feature I haven't met yet, or whether you would prefer one over the other for reasons I don't know about yet.

I found it interesting that I could make a closure change data outside the closure!

What do you think?

fn main() {
    let url =
        "https%3A%2F%2Fdoc.rust-lang.org%2Fstd%2Fiter%2Ftrait.Iterator.html%23method.try_reduce"
            .to_string();

    println!("{}", decode_url1(&url));
    println!("{}", decode_url2(&url));
}

fn decode_url2(url: &str) -> String {
    let mut is_hex = false;
    let mut hex_char = String::new();

    let url_decode = |c| {
        if c == '%' {
            hex_char.clear();
            is_hex = true;
            None
        } else if is_hex {
            hex_char.push(c);
            if hex_char.len() == 2 {
                is_hex = false;
                Some(hex_to_char(&hex_char))
            } else {
                None
            }
        } else {
            Some(c)
        }
    };

    url.chars().filter_map(url_decode).collect()
}

fn decode_url1(url: &str) -> String {
    let mut ret = String::with_capacity(url.len());
    let mut is_hex = false;
    let mut hex_char = String::new();

    for c in url.chars() {
        if c == '%' {
            hex_char.clear();
            is_hex = true;
        } else if is_hex {
            hex_char.push(c);
            if hex_char.len() == 2 {
                is_hex = false;
                ret.push(hex_to_char(&hex_char));
            }
        } else {
            ret.push(c);
        }
    }
    ret
}

fn hex_to_char(s: &str) -> char {
    // Panics if `s` is not a valid two-digit hex number.
    char::from(u8::from_str_radix(s, 16).unwrap())
}

The point of a closure is that it can access ("capture") data outside its own scope.
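
As a minimal sketch of what capturing looks like (the function name `count_percent_signs` and the closure name `bump` are made up for this illustration), a closure that mutates a captured variable borrows it mutably and therefore has to be declared `mut` itself:

```rust
// Hypothetical helper, for illustration only: counts '%' characters
// with a closure that mutates a variable declared outside of it.
fn count_percent_signs(s: &str) -> usize {
    let mut count = 0;

    // `bump` captures `count` by mutable reference, which makes it an
    // `FnMut` closure; the binding must be `mut` to be callable.
    let mut bump = |c: char| {
        if c == '%' {
            count += 1;
        }
    };

    for c in s.chars() {
        bump(c);
    }

    // The mutable borrow held by `bump` ends after its last use,
    // so `count` can be read again here.
    count
}

fn main() {
    println!("{}", count_percent_signs("a%20b%2Fc")); // prints 2
}
```

This is the same mechanism that lets the `url_decode` closure in `decode_url2` update `is_hex` and `hex_char`.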

I'd avoid mutable state, like this:

fn decode_url3(s: &str) -> String {
    let mut parts = s.split('%');
    let head = parts.next().map(str::chars).into_iter().flatten();
    let rest = parts.flat_map(|fragment| {
        let byte = u8::from_str_radix(&fragment[..2], 16).unwrap();
        core::iter::once(char::from(byte)).chain(fragment[2..].chars())
    });
    
    head.chain(rest).collect()
}

or even more purely:

fn decode_url4(s: &str) -> String {
    let (head, rest) = match s.split_once('%') {
        None => return s.into(),
        Some(pair) => pair,
    };
    let rest = rest.split('%').flat_map(|fragment| {
        let byte = u8::from_str_radix(&fragment[..2], 16).unwrap();
        core::iter::once(char::from(byte)).chain(fragment[2..].chars())
    });
    
    head.chars().chain(rest).collect()
}
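
Both versions lean on how `split('%')` cuts up the input: the first piece is plain text, and every later piece starts with the two hex digits of an escape. A small sketch to show the pieces (the helper name `fragments` is made up for this illustration):

```rust
// Hypothetical helper, for illustration only: returns the pieces that
// `split('%')` produces for a percent-encoded string.
fn fragments(s: &str) -> Vec<&str> {
    s.split('%').collect()
}

fn main() {
    // The first piece is literal text; the others begin with two hex digits.
    println!("{:?}", fragments("a%20b%2Fc")); // ["a", "20b", "2Fc"]

    // A leading '%' yields an empty first piece, so the head is still
    // safe to treat as literal text.
    println!("{:?}", fragments("%41bc")); // ["", "41bc"]
}
```

That is why the head is handled separately from the rest: only the pieces after the first one carry a hex prefix.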

Apart from that, you should probably be handling encoding errors.
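
To make that concrete: `&fragment[..2]` panics when a fragment is shorter than two bytes or when index 2 is not on a character boundary, and `unwrap()` panics on non-hex digits. Also, `char::from(byte)` maps each byte to the code point of the same value, so a multi-byte UTF-8 sequence such as `%C3%A9` comes out as two characters instead of `é`. Here is one way error handling might look. It is only a sketch, and it swaps the split-based approach above for a plain byte loop; `try_decode_url`, `hex_val` and `DecodeError` are names made up for this example.

```rust
// Hypothetical error type for this sketch; not part of any library.
#[derive(Debug, PartialEq)]
enum DecodeError {
    // A '%' was not followed by two hex digits.
    BadEscape,
    // The decoded bytes were not valid UTF-8.
    InvalidUtf8,
}

// Converts one ASCII hex digit to its value, or `None` for anything else.
fn hex_val(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

// Decodes into raw bytes first and validates UTF-8 once at the end,
// so multi-byte sequences are reassembled correctly.
fn try_decode_url(s: &str) -> Result<String, DecodeError> {
    let mut out = Vec::with_capacity(s.len());
    let mut bytes = s.bytes();

    while let Some(b) = bytes.next() {
        if b == b'%' {
            let hi = bytes.next().and_then(hex_val).ok_or(DecodeError::BadEscape)?;
            let lo = bytes.next().and_then(hex_val).ok_or(DecodeError::BadEscape)?;
            out.push((hi << 4) | lo);
        } else {
            out.push(b);
        }
    }

    String::from_utf8(out).map_err(|_| DecodeError::InvalidUtf8)
}

fn main() {
    println!("{:?}", try_decode_url("caf%C3%A9")); // Ok("café")
    println!("{:?}", try_decode_url("100%"));      // Err(BadEscape)
}
```

Returning a `Result` leaves it to the caller to decide whether a malformed URL should be rejected, logged, or passed through unchanged.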


I find using next() directly more convenient than having is_hex and hex_char around:

fn urldecode(s: &str) -> String {
    let mut res = String::new();
    let mut iter = s.chars();

    while let Some(c) = iter.next() {
        if c == '%' {
            let left = iter.next();
            let right = iter.next();
            match (left, right) {
                (Some(l), Some(r)) => {
                    let byte = u8::from_str_radix(&format!("{}{}", l, r), 16).unwrap();
                    res.push(byte as char);
                }
                _ => panic!(),
            }
        } else {
            res.push(c);
        }
    }
    res
}

Also, if you want to see an optimized version of URL decoding, look at dec.rs in the kornelski/rust_urlencoding repository on GitHub.


@jofas Thank you. I've been slowly working my way through your example and I think I now understand what it is all doing. My solutions look very awkward in comparison.
Thanks for the link to the API. That will be an interesting study. In my real little app I'll use the bona fide API, but it is good to see how things work under the covers.

