Is there a better way to get a substring from a String with a Range?

Hello,

First, I'm new to Rust. Let me explain my issue:

I would really like an interface like this on String:

impl String { fn sub(&self, range: Range<usize>) -> String }

I'm not using .get(0..2) because it indexes by byte, and I want to index by Unicode scalar values (chars).
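
To make the difference concrete, here is a small sketch of how .get behaves on this kind of string. It assumes the accented letters are the precomposed forms (U+00E9 and U+00E8), which take 2 bytes each in UTF-8; the escapes are used only to make that assumption explicit.

```rust
fn main() {
    // "éèéè" spelled with escapes, assuming precomposed code points.
    let s = "\u{e9}\u{e8}\u{e9}\u{e8}";

    // 4 chars, but 8 bytes, because each letter is 2 bytes in UTF-8.
    assert_eq!(s.chars().count(), 4);
    assert_eq!(s.len(), 8);

    // str::get takes byte offsets: 0..2 covers exactly the first letter.
    assert_eq!(s.get(0..2), Some("\u{e9}"));

    // 0..1 ends in the middle of a letter, so get returns None
    // (plain indexing with s[0..1] would panic instead).
    assert_eq!(s.get(0..1), None);

    println!("{:?}", s.get(0..2));
}
```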

use std::ops::Range;

#[derive(Debug)]
struct BetterString(String);

impl BetterString {
    fn sub(&self, range: Range<usize>) -> String {
        self.0.chars().skip(range.start).take(range.end - range.start).collect()
    }
}
fn main() {
    let s = BetterString(String::from("éèéè"));

    dbg!(&s);
    dbg!(s.sub(0..1));
}

This code compiles and prints:

     Running `target/debug/general`
[src/main.rs:14] &s = BetterString(
    "éèéè",
)
[src/main.rs:15] s.sub(0..1) = "é"

Thanks for your help or suggestions.

If you want to have that method directly on String, you can put it in a trait that you implement for String:

use std::ops::Range;

trait Substring {
    fn sub(&self, range: Range<usize>) -> String;
}

impl Substring for String {
    fn sub(&self, range: Range<usize>) -> String {
        self.chars().skip(range.start).take(range.end - range.start).collect()
    }
}

fn main() {
    let s = String::from("éèéè");

    dbg!(&s);
    dbg!(s.sub(0..1));
}
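
A possible variation, as a sketch rather than a recommendation: implementing the trait for str instead of String makes the method available on both &str and String (through deref), and saturating_sub avoids the subtraction overflow that a reversed range would otherwise cause in a debug build. The trait and method names are just the ones used above.

```rust
use std::ops::Range;

trait Substring {
    fn sub(&self, range: Range<usize>) -> String;
}

// Implemented for str, so String values can call it too via deref.
impl Substring for str {
    fn sub(&self, range: Range<usize>) -> String {
        // A reversed range (end < start) yields a length of 0
        // instead of overflowing.
        let len = range.end.saturating_sub(range.start);
        self.chars().skip(range.start).take(len).collect()
    }
}

fn main() {
    let owned = String::from("éèéè");
    let borrowed: &str = "éèéè";

    dbg!(owned.sub(0..1));
    dbg!(borrowed.sub(1..3));

    // A reversed range gives an empty String rather than a panic.
    let reversed = Range { start: 3, end: 1 };
    assert_eq!(owned.sub(reversed), "");
}
```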

Just a nit, but in Rust-speak this is not an extension trait but simply a trait that is implemented for a foreign type. Extension traits are something like futures::future::FutureExt that extends the Future trait and has a blanket implementation for everything that implements Future.

Thanks for the correction, I have edited my post.

Thank you for your suggestion, but do you know if there is a native way to do the same thing?

What do you mean by a native way? You can't just create an inherent impl block on a foreign type; those are only allowed within the crate where the type is defined:

An implementing type must be defined within the same crate as the original type definition.

Sorry, I meant: why is there no function that does something similar directly in the std implementation of String?

You mean getting a slice based on Unicode scalar value offsets? No. Very intentionally not. It's almost always not what you want.
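
One way to see why, sketched with std only: a user-perceived character can be made of several Unicode scalar values, so cutting at char offsets can split it. The helper name first_chars is made up for this example.

```rust
// Hypothetical helper: collect the first n chars of a string.
fn first_chars(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
}

fn main() {
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT renders as a single
    // accented letter, but it is two chars.
    let s = "e\u{301}a";
    assert_eq!(s.chars().count(), 3);

    // Taking "one character" by char offset drops the accent.
    assert_eq!(first_chars(s, 1), "e");

    // Two chars are needed to keep the accented letter intact.
    assert_eq!(first_chars(s, 2), "e\u{301}");
}
```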

Should we mention the unicode-segmentation crate at this point, and perhaps point to some discussions about why strings are way more complex than I, for one, ever realised before learning Rust?

I believe you'd still need a trait to add a String method that slices on graphemes instead of chars.

This is somewhat subjective, but here's what I think. Because a String is stored as UTF-8, each char takes from 1 to 4 bytes in it, so there is no efficient (constant-time) way to index chars or ranges of chars. Indexing is assumed to be very fast in systems programming languages like Rust, so it would be misleading to provide indexing functions on chars, since these functions would have to iterate over the chars internally. In the std lib, it is better to provide only char iteration, so the cost is not hidden.
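
To illustrate the cost being visible, here is a rough sketch of a helper that walks the string with char_indices to turn char offsets into byte offsets, then returns a borrowed slice without allocating. The name char_slice and the choice to clamp out-of-range offsets to the end of the string are assumptions made for this example, not anything std provides.

```rust
use std::ops::Range;

// Hypothetical helper: slice by char offsets, returning a borrowed &str.
// Each lookup is a linear scan from the start of the string.
fn char_slice(s: &str, range: Range<usize>) -> &str {
    // Byte offset of the n-th char, or s.len() if n is past the end.
    let byte_at = |n: usize| s.char_indices().nth(n).map_or(s.len(), |(i, _)| i);

    let start = byte_at(range.start);
    // Clamp so that a reversed range yields an empty slice.
    let end = byte_at(range.end).max(start);
    &s[start..end]
}

fn main() {
    let s = "éèéè";
    println!("{}", char_slice(s, 0..1));
    println!("{}", char_slice(s, 1..3));

    // Offsets past the end are clamped to the end of the string.
    assert_eq!(char_slice(s, 2..10), &s[4.min(s.len())..]);
    assert_eq!(char_slice(s, 10..20), "");
}
```

Both offsets land on char boundaries by construction, so the final byte slice cannot panic.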

OK, thanks for your replies.
I understand; that makes sense.
