A `&'static str` version of `std::path::MAIN_SEPARATOR`?

`std::path::MAIN_SEPARATOR` is a `char`. I need a `&'static str` version of it in stable Rust.

There is `std::sys::path::MAIN_SEP_STR`, but it is private.

Please advise, thanks.

Can you be a little bit more specific about why you need it?

Sure, see SEP_STR and dirname in https://github.com/danielpclark/faster_path/pull/132/files

To my untrained eye, your dirname() function seems to assume that "/" is a valid root directory, which AFAIK is only guaranteed to be true on Unices. If that is correct, then I think hardcoding SEP_STR to "/" in your code would not cause a loss of portability.
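
A minimal sketch of the hardcoding idea that also covers Windows, assuming the separator only ever differs between Windows and everything else. `SEP_STR` mirrors the name used in the PR; `sep_str_matches_std` is a hypothetical helper that sanity-checks the constant against `MAIN_SEPARATOR`.

```rust
use std::path::MAIN_SEPARATOR;

// Assumption: Windows is the only target whose main separator is not "/".
#[cfg(windows)]
pub const SEP_STR: &str = "\\";
#[cfg(not(windows))]
pub const SEP_STR: &str = "/";

/// Hypothetical helper: true when SEP_STR is exactly one character and that
/// character equals the standard library's MAIN_SEPARATOR.
pub fn sep_str_matches_std() -> bool {
    let mut chars = SEP_STR.chars();
    chars.next() == Some(MAIN_SEPARATOR) && chars.next().is_none()
}
```

This needs no unsafe code and no runtime initialization, at the cost of duplicating knowledge that std already has.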

You can do this with some unsafe code:

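use std::path;

static SEP: char = path::MAIN_SEPARATOR;

#[inline(always)]
fn sep_as_slice() -> &'static str {
    unsafe {
        // Build a one-element [char] slice over SEP, then reinterpret the
        // (pointer, length = 1) pair as a &str.
        std::mem::transmute(std::slice::from_raw_parts(&SEP as *const _, 1))
    }
}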

lazy_static is probably another option.
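
A safe variant of the lazy_static idea, sketched here with `std::sync::OnceLock` from the standard library instead of the lazy_static crate. That swap is my own assumption, and it needs Rust 1.70 or newer. `sep_str` is a hypothetical name.

```rust
use std::path::MAIN_SEPARATOR;
use std::sync::OnceLock;

/// Hypothetical helper: returns the main separator as a &'static str.
/// The String is built once, on first call, and lives for the rest of the
/// program because it is stored in a static.
pub fn sep_str() -> &'static str {
    static SEP: OnceLock<String> = OnceLock::new();
    SEP.get_or_init(|| MAIN_SEPARATOR.to_string()).as_str()
}
```

The trade-off is one small heap allocation and an initialization check on each call, in exchange for no unsafe code and no endianness concerns.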

I filed rust-lang/rust#46712 to make MAIN_SEPARATOR.as_ref() or something like that give you a &'static str. For now I agree with @vitalyd, go with their unsafe code or lazy_static.
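
For anyone reading this on a newer toolchain: as far as I know, Rust 1.68 and later expose `std::path::MAIN_SEPARATOR_STR` as a public constant, which makes the workarounds unnecessary. A minimal sketch, assuming such a toolchain; `sep_str`, `agrees_with_char`, and `join_with_sep` are hypothetical helpers.

```rust
use std::path::{MAIN_SEPARATOR, MAIN_SEPARATOR_STR};

/// Hypothetical helper: the separator as a &'static str, straight from std.
pub fn sep_str() -> &'static str {
    MAIN_SEPARATOR_STR
}

/// Hypothetical helper: checks that the str and char constants agree.
pub fn agrees_with_char() -> bool {
    let mut chars = MAIN_SEPARATOR_STR.chars();
    chars.next() == Some(MAIN_SEPARATOR) && chars.next().is_none()
}

/// Hypothetical helper: joins path fragments with the platform separator.
pub fn join_with_sep(parts: &[&str]) -> String {
    parts.join(MAIN_SEPARATOR_STR)
}
```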

Beware that this assumes little-endian byte order!

And in general, only ASCII characters (< 0x80) can work directly as UTF-8.
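
To illustrate the ASCII point with safe code: a `char` holds a 32-bit Unicode scalar value, and only for values below 0x80 is the UTF-8 encoding that single low byte. `utf8_bytes` and `low_byte_is_utf8` are hypothetical helper names.

```rust
/// Hypothetical helper: the UTF-8 encoding of `c` as a byte vector.
pub fn utf8_bytes(c: char) -> Vec<u8> {
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

/// Hypothetical helper: true when the low byte of the scalar value is,
/// on its own, the complete UTF-8 encoding of `c`.
pub fn low_byte_is_utf8(c: char) -> bool {
    utf8_bytes(c) == [(c as u32) as u8]
}
```

Both `'/'` and `'\\'` pass this check, while `'\u{e9}'` (é) does not: its scalar value is 0xE9 but its UTF-8 encoding is the two bytes C3 A9.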

How so?

In little-endian, '/' is stored as 2F 00 00 00, but in big-endian it's 00 00 00 2F. So directly casting a pointer to the char and reading one byte will get you "\0" on big-endian targets. You could offset the pointer to reach the least significant byte, though.
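
The two layouts can be checked without any unsafe code by converting the char to its `u32` scalar value. A small sketch; the function names are hypothetical.

```rust
/// Bytes of `c`'s 32-bit value in the order a little-endian target stores them.
pub fn char_bytes_le(c: char) -> [u8; 4] {
    (c as u32).to_le_bytes()
}

/// Bytes of `c`'s 32-bit value in the order a big-endian target stores them.
pub fn char_bytes_be(c: char) -> [u8; 4] {
    (c as u32).to_be_bytes()
}

/// The byte found at the lowest address on the machine running this code,
/// which is what a one-byte read through a pointer to the char would see.
pub fn first_byte_native(c: char) -> u8 {
    (c as u32).to_ne_bytes()[0]
}
```

`first_byte_native('/')` is 0x2F on little-endian targets and 0x00 on big-endian ones.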

I get the bit layout part. How many bytes does a read through the &str actually touch? I thought it would be 4, but I guess that's either wrong, or you mean that pulling individual bytes out of the &str is what goes wrong?

It will read only the length of the slice, which you told it was 1.

I told it 1 of type T, which is char here. I was going by the assumption (which sounds like it'd be wrong) that if I have the loader place a static SEP in memory (in whatever endianness the machine uses), with it being of type char (4 bytes), then I can form a slice to it and read data through it.

But then the transmute changes the type, without changing the contents. So that fat (ptr, 1) which was (*const char, usize) is now considered (*const u8, usize), a UTF-8 byte slice of length 1.
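
A safe way to see why the length means something different after the transmute: both reference types are the same size, but a length of 1 covers 4 bytes in a `[char]` and only 1 byte in a `str`. The helper names below are hypothetical.

```rust
use std::mem::{size_of, size_of_val};

/// Hypothetical helper: sizes of the two reference types that the
/// transmute converts between, as (&[char], &str).
pub fn fat_pointer_sizes() -> (usize, usize) {
    (size_of::<&[char]>(), size_of::<&str>())
}

/// Hypothetical helper: bytes covered by a length-1 [char] slice versus a
/// length-1 str, as (char slice bytes, str bytes).
pub fn bytes_covered() -> (usize, usize) {
    let chars: &[char] = &['/'];
    let text: &str = "/";
    (size_of_val(chars), size_of_val(text))
}
```

`bytes_covered()` returns `(4, 1)`, so after the transmute the str only looks at the first of the char's four bytes.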

If it did read 4 bytes in the str, you'd get either "/\0\0\0" or "\0\0\0/" depending on endianness.

Right, ok - I see what you mean.

Presumably adjusting the slice length to account for char->byte views would work irrespective of endianness?

This works on big-endian:

use std::path;

static SEP: char = path::MAIN_SEPARATOR;

#[inline(always)]
fn sep_as_slice() -> &'static str {
    unsafe {
        // On big-endian, the low byte of the 4-byte char is the last one.
        let bytes = (&SEP as *const _ as *const u8).offset(3);
        std::mem::transmute(std::slice::from_raw_parts(bytes, 1))
    }
}

You could alternate this with #[cfg(target_endian = "...")].
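
A sketch of how the two variants might be combined, assuming the separator is always ASCII (true for '/' and '\\'). `LOW_BYTE_OFFSET` is a hypothetical name, and I use `std::str::from_utf8_unchecked` rather than `transmute` so the intent is spelled out.

```rust
use std::path::MAIN_SEPARATOR;

static SEP: char = MAIN_SEPARATOR;

// Offset of the least significant byte inside the 4-byte char.
#[cfg(target_endian = "little")]
const LOW_BYTE_OFFSET: usize = 0;
#[cfg(target_endian = "big")]
const LOW_BYTE_OFFSET: usize = 3;

pub fn sep_as_slice() -> &'static str {
    // Assumption: the separator is ASCII, so its low byte alone is valid UTF-8.
    debug_assert!(SEP.is_ascii());
    unsafe {
        // Point at the low byte of the static char and view that one byte as a str.
        let low_byte = (&SEP as *const char as *const u8).add(LOW_BYTE_OFFSET);
        std::str::from_utf8_unchecked(std::slice::from_raw_parts(low_byte, 1))
    }
}
```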

Right, you could do that to get back to a single-byte [u8] for BE. But can the following work for either endianness:

#[inline(always)]
fn sep_as_slice() -> &'static str {
    unsafe {
        std::mem::transmute(std::slice::from_raw_parts(
            &SEP as *const _ as *const u8,
            ::std::mem::size_of::<char>(),
        ))
    }
}

No, that's what I said will get either "/\0\0\0" or "\0\0\0/". Those 0 bytes are each valid UTF-8.
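
The same result can be reproduced safely by taking the char's four bytes in native order and validating them as UTF-8. A small sketch; `char_memory_as_string` is a hypothetical name.

```rust
/// Hypothetical helper: interprets the 4 in-memory bytes of `c` as UTF-8.
/// Returns None when those bytes are not valid UTF-8, which can happen for
/// non-ASCII chars.
pub fn char_memory_as_string(c: char) -> Option<String> {
    let bytes = (c as u32).to_ne_bytes();
    String::from_utf8(bytes.to_vec()).ok()
}
```

For `'/'` this yields a four-byte string, either "/\0\0\0" or "\0\0\0/" depending on the target, and never the one-byte "/".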

Ok, understood (I didn't see that part above). For some (now seemingly) silly reason I thought going through a slice would do a "fixup", but that's a thinko on my part :)

Maybe a dumb question, but do we have a usability problem if "I need the path to the root directory" devolves into discussions of endianness and unsafety?

Isn't there a path::filesystem_root or something?

That would need to be context-dependent since some popular operating systems (cough cough Windows) do not have a single user-visible filesystem root, but rather one root per mounted logical drive.

But it does seem to me that, so far, this thread has demonstrated the need for a filesystem_root() function returning a string more than the need for a path separator string.
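
A rough sketch of what a context-dependent version could look like, built on `Path::components`. To be clear, `filesystem_root` is a hypothetical function, not something std provides, and it derives the root from a given path rather than from the system.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: returns the root portion of `path` ("/" on Unix,
/// something like "C:\" on Windows), or None when the path is relative.
pub fn filesystem_root(path: &Path) -> Option<PathBuf> {
    let mut root = PathBuf::new();
    for component in path.components() {
        match component {
            // A drive or UNC prefix (Windows only) and the root directory
            // together make up the root of the path.
            Component::Prefix(_) | Component::RootDir => root.push(component.as_os_str()),
            // The first ordinary component ends the root portion.
            _ => break,
        }
    }
    if root.as_os_str().is_empty() {
        None
    } else {
        Some(root)
    }
}
```

Taking the path as an argument sidesteps the one-root-per-drive problem, since the answer comes from whichever drive the path names.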
