What would be the proper way to fix extra slashes in URLs?

Context:
I got bitten by a bug as an atuin user: an extra slash in the user configuration. There's a PR that fixes that issue but raises another (follow the rabbit hole and you'll find out). I then searched extensively and found a discussion about the url crate, the usual Rust library for this, that should solve all the problems (but doesn't): url::join behaviour.

The thing is, this is a very common scenario: you let your users configure some URL, and what they enter may or may not end with a slash.

And that shouldn't be an issue, but problems arise when you start combining parts to build an end-point.

Anyway, I searched again and found a discussion on this forum from someone just starting out with Rust.

(Sorry, tried to link it but new users can only put 2 links, you can search it though: " Remove duplicate slashes: how to do this idiomatic?" by user cjk101010 aka Christian Kruse.)

At some point someone suggested this should be done using the url crate's functions; however, I've been trying some things with no success (sorry, I cannot link the playground because of the first-post rules):

Questions:

  1. Is there a simple function I may have missed in the url crate, or in another library, that can join two URL parts without obliterating what I already have (the path problem shown in my playground) and without producing duplicate slashes? In essence I'd want this behaviour:
fn url.append(a: &str, b: &str) -> Url {...}
fn url.append(a: Url, b: Url) -> Url {...}
// and also combinations of Url and string
  2. If not, what would be the suggested way to deal with this issue: a) clean the string parts beforehand with a regexp and then try to build a Url, or b) start by parsing the initial part of the string into a Url and then work toward a proper append by analyzing the parts the parser produces?

  3. Any other, simpler solution? I think someone must have been bitten by this already.

Thanks everyone for reading. Full disclosure: I'm not a Rust programmer, nor am I in the process of learning the language, so please don't assume I know anything.


On OP's behalf: Remove duplicate slashes: how to do this idiomatic?

If you have code that you can't post on the playground, are you allowed to put it in a code block in a comment here? You can do that by surrounding the code in ```rust and ``` lines.

Also, are you able to provide a few examples of string/Url pairs that cause issues, and what output you want them to produce? You mentioned "path problem shown in my playground", but obviously couldn't link to it.

For future reference, I would prioritise linking to a playground showing the issue directly. You can always include more links in comments in the playground to get around the forum restrictions. :wink:


Url::path_segments_mut returns a result that contains a PathSegmentsMut that has a push method:

use url::Url;

fn main() {
    let mut no_trailing_slash = Url::parse("https://users.rust-lang.org").unwrap();
    let mut trailing_slash = Url::parse("https://users.rust-lang.org/").unwrap();
    
    no_trailing_slash.path_segments_mut().unwrap().push("path");
    trailing_slash.path_segments_mut().unwrap().push("path");
    
    println!("{no_trailing_slash}"); // https://users.rust-lang.org/path
    println!("{trailing_slash}"); // https://users.rust-lang.org/path
}

A little inconvenient, but you could write your own wrapper methods for this with proper error handling instead of unwraps. For example:

fn join(a: &str, b: &str) -> anyhow::Result<Url> {
    let mut url = Url::parse(a)?;
    url.path_segments_mut()
        .map_err(|_| anyhow::anyhow!("{a} cannot be a base Url"))?
        .push(b);
    Ok(url)
}

Thanks for the help linking Daniel. It slipped my mind to put the code here :man_facepalming: Here it goes:

use url::Url;

pub fn extend(addr: &str) -> Url {
    let mut url = Url::parse(addr).unwrap();
    url.path_segments_mut().unwrap().extend(["bar", "baz"]);
    url
}

pub fn join(addr: &str) -> Url {
    let url = Url::parse(addr).unwrap();
    url.join("/bar/baz").unwrap()
}

pub fn set_path(addr: &str) -> Url {
    let mut url = Url::parse(addr).unwrap();
    url.set_path("/bar/baz");
    url
}

pub fn segments_push(addr: &str) -> Url {
    let mut url = Url::parse(addr).unwrap();
    //url.path_segments_mut()?.push(addr);
    url.path_segments_mut().unwrap().push(addr);
    url
}

fn main() {
    // think of these constants as prepending WITH_
    // shortened for demo/clarity purposes
    const NO_SLASH: &str = "https://example.com";
    const SLASH: &str = "https://example.com/";
    const MULTIPLE_SLASHES: &str = "https://example.com//";
    const PATH: &str = "https://example.com/path";
    
    dbg!(extend(NO_SLASH).to_string()); // ok
    dbg!(extend(SLASH).to_string()); // ok
    // ! extend doesn't fix multiple slashes
    dbg!(extend(MULTIPLE_SLASHES).to_string()); // WRONG!
    dbg!(extend(PATH).to_string()); // ok
    
    dbg!(join(NO_SLASH).to_string()); // ok
    dbg!(join(SLASH).to_string()); // ok
    dbg!(join(MULTIPLE_SLASHES).to_string()); // ok
    // but join obliterates existing paths
    dbg!(join(PATH).to_string()); // WRONG!
    
    dbg!(set_path(NO_SLASH).to_string()); // ok
    dbg!(set_path(SLASH).to_string()); // ok
    dbg!(set_path(MULTIPLE_SLASHES).to_string()); // ok
    // same problem as join
    dbg!(set_path(PATH).to_string()); // WRONG!
    
    // ALL WRONG!, saw this on an old Url discussion and thought it could work
    // https://github.com/servo/rust-url/issues/333#issuecomment-1407648587
    dbg!(segments_push(NO_SLASH).to_string()); // WRONG!
    dbg!(segments_push(SLASH).to_string()); // WRONG!
    dbg!(segments_push(MULTIPLE_SLASHES).to_string()); // WRONG!
    dbg!(segments_push(PATH).to_string()); // WRONG!    
}

Here's the best I've come up with:

pub fn join(addr: &str) -> Url {
    let mut url = Url::parse(addr).unwrap();
    url.path_segments_mut().unwrap().pop_if_empty().push("");
    url.join("bar/baz").unwrap()
}

The .pop_if_empty().push("") part first drops one empty trailing segment if there is one (so the trailing // in MULTIPLE_SLASHES becomes /), then pushes an empty segment so that the path ends with a slash.

I also removed the leading / from "/bar/baz". That leading slash is the reason join is removing the existing path: you're joining an absolute path, which is resolved from the root of the base URL and so replaces whatever path was there. If you want to allow a leading slash, your best bet is probably to use something like "/bar/baz".trim_start_matches('/') to strip any leading slashes.
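
As a minimal sketch of that last suggestion, using only the standard library: the helper name strip_leading_slashes is made up for illustration and is not part of the url crate. It turns the part to append into a relative path before it is handed to something like Url::join.

```rust
/// Hypothetical helper (not part of the `url` crate): drops every leading
/// slash so the string is treated as a relative path rather than an
/// absolute one.
fn strip_leading_slashes(part: &str) -> &str {
    part.trim_start_matches('/')
}
```

With this, strip_leading_slashes("/bar/baz") and strip_leading_slashes("//bar/baz") both give "bar/baz", and a part without a leading slash comes back unchanged.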

Hope that helps.


What's the expected output from these?

Thank you very much Daniel. I had read about a leading slash causing this kind of replacement in similar functions that work with filesystem paths, but I didn't make the mental connection. It's probably also written in the Url::join documentation.

I wrote a solution using a regexp to clean both the base URL and the part to append, but I like yours more. I'll probably keep the regexp to clean the second part, though.

Anyway, thanks a lot :slight_smile:

All should produce "https://example.com/bar/baz", or "https://example.com/path/bar/baz" when the base already has a "path".
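
For reference, here is a string-only sketch (option a from my original question, without a regexp and without the url crate) that gives those outputs for the four bases in my demo. The name append_path is made up, and it assumes the base has no query string or fragment and no duplicate slashes in the middle of its path.

```rust
/// Hypothetical string-level helper, not part of the `url` crate.
/// Assumes `base` has no query string or fragment.
fn append_path(base: &str, rest: &str) -> String {
    // Drop every trailing slash from the base,
    // e.g. "https://example.com//" becomes "https://example.com".
    let base = base.trim_end_matches('/');
    // Split the appended part on '/' and discard empty segments, which
    // removes leading, trailing, and doubled slashes.
    let segments: Vec<&str> = rest.split('/').filter(|s| !s.is_empty()).collect();
    if segments.is_empty() {
        // Nothing to append: return the cleaned base on its own.
        return base.to_string();
    }
    format!("{}/{}", base, segments.join("/"))
}
```

So append_path("https://example.com//", "/bar/baz") gives "https://example.com/bar/baz", and append_path("https://example.com/path", "/bar/baz") gives "https://example.com/path/bar/baz". It does no validation at all, so running the result through Url::parse afterwards is probably still a good idea.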


I should have thought to mention this in one of my replies, but here it is for future reference: the absolute best kind of example code would include a test. Something like:

#[test]
fn help_my_code_is_broken() {
    assert_eq!(some_function(), "the answer you expected");
}

Those can be run in the playground, and let people quickly see both what the code is doing and what the expected output is supposed to be.

