Abstract data type StringADT


#1

I want to define an abstract data type StringADT that abstracts over String and Vec<char>. The problem is how to return a borrow when the borrowed value would be a local variable: in the Vec<char> case, the String has to be built inside the method. I solve this by moving the value into a wrapper, RefStringADT, and I abuse Deref for ergonomic reasons.

I would like to know

  1. whether there is a better solution,
  2. whether RefStringADT is zero cost in case of string_string.

The implementation:

pub mod string_string {
    use std::ops::Deref;
    pub struct RefStringADT<'a> {
        data: &'a str
    }
    impl<'a> Deref for RefStringADT<'a> {
        type Target = str;
        fn deref(&self) -> &str {self.data}
    }
    pub struct StringADT {
        data: String
    }
    impl StringADT {
        pub fn to_str(&self) -> RefStringADT {
            return RefStringADT{data: &self.data};
        }
    }
    impl<'a> From<&'a str> for StringADT {
        fn from(s: &str) -> Self {Self{data: String::from(s)}}
    }
}

pub mod vec_char_string {
    use std::ops::Deref;
    pub struct RefStringADT {
        data: String
    }
    impl Deref for RefStringADT {
        type Target = str;
        fn deref(&self) -> &str {&self.data}
    }
    pub struct StringADT {
        data: Vec<char>
    }
    impl StringADT {
        pub fn to_str(&self) -> RefStringADT {
            RefStringADT{data: self.data.iter().collect()}
        }
    }
    impl<'a> From<&'a str> for StringADT {
        fn from(s: &str) -> Self {Self{data: s.chars().collect()}}
    }
}

use string_string::StringADT;
// use vec_char_string::StringADT;

fn main() {
    let s = StringADT::from("Text");
    let p: &str = &s.to_str();
    println!("{}",p);
}

#2

What are your goals? The standard library String type is implemented as a Vec<char> already. Look at its implementation of as_bytes.


#3

Notably, it’s implemented as a Vec<u8>, not Vec<char>. Vec<char> has the advantage of constant-time indexing for unicode code points, but the disadvantage of taking up more space and being incompatible with String and &str.

The best thing I can think of is to have a type like

enum StringADT {
    String(String),
    CharVec(Vec<char>),
}
enum RefStringADT<'a> {
    String(&'a str),
    CharVec(&'a [char]),
}

Or having a trait which defines common operations, like:

trait StringADT: Display + IntoIterator<Item=char> {
    type Iter: Iterator<Item=char>;
    fn iter(&self) -> Self::Iter;
    // any other common operations you want
}
impl StringADT for String { ... }
impl StringADT for Vec<char> { ... }

It would mean you wouldn’t be able to implement Deref<Target=str>. But converting a Vec<char> to a String is as expensive as creating a new String, and to_str does that on every call, so it seems to me that this defeats the purpose of having a separate Vec<char>-backed StringADT in the first place.
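The trait approach could be sketched roughly as follows. This is only an illustration under some assumptions: the names `CharSeq`, `chars_boxed`, `char_at`, and `count_uppercase` are hypothetical and not from this thread, and the iterator is boxed to keep the signatures simple rather than for performance.

```rust
// Hypothetical trait; the names here are illustrative, not an established API.
trait CharSeq {
    /// Iterate over the Unicode scalar values of the sequence.
    fn chars_boxed(&self) -> Box<dyn Iterator<Item = char> + '_>;

    /// Return the `index`-th char. The default walks the iterator, so it is O(n).
    fn char_at(&self, index: usize) -> Option<char> {
        self.chars_boxed().nth(index)
    }
}

impl CharSeq for String {
    fn chars_boxed(&self) -> Box<dyn Iterator<Item = char> + '_> {
        Box::new(self.chars())
    }
}

impl CharSeq for Vec<char> {
    fn chars_boxed(&self) -> Box<dyn Iterator<Item = char> + '_> {
        Box::new(self.iter().copied())
    }

    // Override the default with constant-time slice indexing.
    fn char_at(&self, index: usize) -> Option<char> {
        self.get(index).copied()
    }
}

/// Generic code written against the trait works with either representation.
fn count_uppercase<S: CharSeq>(s: &S) -> usize {
    s.chars_boxed().filter(|c| c.is_uppercase()).count()
}
```

Code written against such a trait never asks for a &str, so the Vec<char> implementation does not have to build a String on every call.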


These two types, String and Vec<char>, are fairly different. If you want to abstract over both of them, I would suggest thinking critically about what kinds of things you want to be able to do with both. For instance, indexing means different things for each: a String can only be sliced by byte offsets (string[a..b]), while char_vec[loc] indexes by Unicode code point, so the same numbers can select different text.
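To make the indexing difference concrete, here is a small sketch. The helper name `slice_both` is made up for this illustration. It takes the range 1..3 from both representations of the same text, using the non-panicking `get` so that an invalid byte range shows up as None.

```rust
/// Take the range 1..3 from a String and from a Vec<char> built from the same text.
fn slice_both(text: &str) -> (Option<String>, Option<String>) {
    let s = String::from(text);
    let v: Vec<char> = text.chars().collect();

    // Byte range: `get` returns None if a bound is out of range
    // or does not fall on a UTF-8 character boundary.
    let from_string = s.get(1..3).map(str::to_owned);

    // Element range: one element per Unicode scalar value.
    let from_vec = v.get(1..3).map(|cs| cs.iter().collect::<String>());

    (from_string, from_vec)
}
```

For "héllo" the byte range 1..3 covers only the two-byte "é", while the Vec<char> range covers "él". For "éab" the byte range starts in the middle of "é" and is rejected, while the Vec<char> range yields "ab".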

As for whether RefStringADT is zero cost in the string_string case: yes, it is.
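On the zero-cost question for string_string, one quick sanity check is to compare the size of the wrapper with the size of a plain &str. The sketch below repeats the wrapper from the first post; `sizes` and `wrap` are hypothetical helpers added only for the check. Equal sizes do not prove anything about generated code, but together with the trivial deref they make it plausible that the wrapper compiles away.

```rust
use std::mem::size_of;
use std::ops::Deref;

// Same shape as `string_string::RefStringADT` from the first post.
pub struct RefStringADT<'a> {
    data: &'a str,
}

impl<'a> Deref for RefStringADT<'a> {
    type Target = str;
    fn deref(&self) -> &str {
        self.data
    }
}

/// Sizes in bytes of the wrapper and of a plain `&str`.
fn sizes() -> (usize, usize) {
    (size_of::<RefStringADT<'static>>(), size_of::<&str>())
}

/// Hypothetical constructor, used only to exercise `Deref`.
fn wrap(s: &str) -> RefStringADT<'_> {
    RefStringADT { data: s }
}
```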


#4

Ah, right. My mistake.


#5

Please describe the problem you’re trying to solve first.


#6

The desire is indeed fast indexing, assuming normalized Unicode and ignoring any remaining combining characters. Say you have a large interpreter that uses a string data type, but you don’t yet know which implementation to choose. Changing the representation later would mean changing a large code base. I thought that, rather than finding the most ingenious implementation one can come up with, it would be even better not to commit to one at all, i.e. to provide an ADT that abstracts over many possible implementations.

Now, sometimes a method returns a borrow. But this means that some information about the internal structure of that type is leaked. As a consequence, the return value must also be of an abstract data type.

So my desire is for strictness, zero cost and some ergonomics. I was a little inaccurate in not enforcing that both interfaces are exactly equal, including the lifetime signatures. Does Rust allow stating the interface separately from the implementation, or would this be a weak spot? Using traits, I was able to purify my example:

trait RefStringInterface<'a>: std::ops::Deref<Target=str> {}

trait StringInterface<'a,'s>: From<&'s str> {
    type RefString: RefStringInterface<'a>;
    fn to_str(&'a self) -> Self::RefString;
}

pub mod string_string {
    use std::ops::Deref;
    use super::{StringInterface,RefStringInterface};

    pub struct RefStringADT<'a> {
        data: &'a str
    }
    impl<'a> Deref for RefStringADT<'a> {
        type Target = str;
        fn deref(&self) -> &str {self.data}
    }
    impl<'a> RefStringInterface<'a> for RefStringADT<'a> {}

    pub struct StringADT {
        data: String
    }
    impl<'a,'s> StringInterface<'a,'s> for StringADT {
        type RefString = RefStringADT<'a>;
        fn to_str(&'a self) -> Self::RefString {
            return RefStringADT{data: &self.data};
        }
    }
    impl<'s> From<&'s str> for StringADT {
        fn from(s: &str) -> Self {Self{data: String::from(s)}}
    }
}

pub mod vec_char_string {
    use std::ops::Deref;
    use super::{StringInterface,RefStringInterface};

    pub struct RefStringADT {
        data: String
    }
    impl Deref for RefStringADT {
        type Target = str;
        fn deref(&self) -> &str {&self.data}
    }
    impl<'a> RefStringInterface<'a> for RefStringADT {}

    pub struct StringADT {
        data: Vec<char>
    }
    impl<'a,'s> StringInterface<'a,'s> for StringADT {
        type RefString = RefStringADT;
        fn to_str(&self) -> Self::RefString {
            RefStringADT{data: self.data.iter().collect()}
        }
    }
    impl<'s> From<&'s str> for StringADT {
        fn from(s: &str) -> Self {Self{data: s.chars().collect()}}
    }
}

use string_string::StringADT;
// use vec_char_string::StringADT;

fn main() {
    let s = StringADT::from("Text");
    let p: &str = &s.to_str();
    println!("{}",p);
}

#7

There’s no way to get a &str from a &Vec<char> without building a new String, which makes it pretty useless.
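Since a &str cannot be borrowed out of a Vec<char>, one alternative to a custom wrapper type is to return std::borrow::Cow<str>, which borrows when it can and owns when it must. This was not proposed in the thread; it is only a sketch, and the names `AsText`, `as_text`, and `shout` are hypothetical.

```rust
use std::borrow::Cow;

// Hypothetical interface; the names are illustrative.
trait AsText {
    /// Borrow the text if the representation already holds UTF-8,
    /// otherwise build an owned String.
    fn as_text(&self) -> Cow<'_, str>;
}

impl AsText for String {
    fn as_text(&self) -> Cow<'_, str> {
        // No allocation: this is just a borrow of the existing buffer.
        Cow::Borrowed(self.as_str())
    }
}

impl AsText for Vec<char> {
    fn as_text(&self) -> Cow<'_, str> {
        // Allocates: the chars have to be re-encoded as UTF-8.
        Cow::Owned(self.iter().collect())
    }
}

/// Example consumer: `Cow<str>` derefs to `str`, so `str` methods are available.
fn shout<T: AsText>(t: &T) -> String {
    t.as_text().to_uppercase()
}
```

The Vec<char> case still pays for the conversion on every call, so this only tidies up the interface and does not remove the cost discussed above.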