A trait for generic character sequence?


#1

Hi! This is mostly a hypothetical question, but what trait(s) should I use if I want to consume any character sequence (str, &[char], xi-rope, etc.)? A lot of string functions are implemented in terms of &str (the regex crate operates on str, for example), and this looks like a severe limitation. It looks like Iterator<Item=char> won’t be enough, because strings are sort of random access and have a length, although not every index is valid, and the length may be measured in different units.
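
To make the “not every index is valid” and “different units” points concrete, here is a small sketch using only standard str methods. The helper names length_units and valid_offsets are made up for illustration.

```rust
/// Length of `s` in two different units: (UTF-8 bytes, Unicode scalar values).
fn length_units(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Every byte offset in `0..=s.len()` at which `s` may legally be sliced.
fn valid_offsets(s: &str) -> Vec<usize> {
    (0..=s.len()).filter(|&i| s.is_char_boundary(i)).collect()
}
```

For "héllo", length_units gives 6 bytes but 5 chars, and valid_offsets skips offset 2 because it falls in the middle of the two-byte “é”.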


#2

Iterating over segmented storage one element at a time doesn’t optimize well, so you’d ideally expose something like a sequence of &str chunks or a sequence of &[T], so that the inner loops can operate on contiguous data.
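
A minimal sketch of what such a chunk-based trait might look like. Everything here is hypothetical: TextChunks, ToyRope and count_byte are names invented for this example, ToyRope is just a Vec of Strings standing in for a real rope, and the boxed iterator is used only to keep the sketch short.

```rust
/// Hypothetical trait: text that can expose itself as contiguous UTF-8 chunks.
trait TextChunks {
    fn chunks(&self) -> Box<dyn Iterator<Item = &str> + '_>;
}

/// A plain str is a single chunk.
impl TextChunks for str {
    fn chunks(&self) -> Box<dyn Iterator<Item = &str> + '_> {
        Box::new(std::iter::once(self))
    }
}

/// Toy stand-in for a rope: a list of separately allocated pieces.
struct ToyRope {
    pieces: Vec<String>,
}

impl TextChunks for ToyRope {
    fn chunks(&self) -> Box<dyn Iterator<Item = &str> + '_> {
        Box::new(self.pieces.iter().map(|s| s.as_str()))
    }
}

/// Generic consumer: the outer loop walks the chunks, the inner loop
/// scans one contiguous byte slice at a time.
fn count_byte<T: TextChunks + ?Sized>(text: &T, needle: u8) -> usize {
    text.chunks()
        .map(|chunk| chunk.bytes().filter(|&b| b == needle).count())
        .sum()
}
```

With this, count_byte("a,b,c", b',') and count_byte on a ToyRope holding the pieces "a," and "b,c" both go through the same code path. Anything that needs to match across a chunk boundary (a regex, for instance) would have to carry state between chunks, which this sketch does not attempt.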


#3

It’s tough, because you have to define what “character sequence” is. For example, &str is not a “character sequence” like &[char] is… by “character sequence” do you mean “string”?


#4

Like I said, it’s a vague question, so I don’t know :slight_smile: The problem is that if I write, say, a super fast text editor in Rust, and use a rope as the internal data structure ( :slight_smile: ), I won’t be able to use the excellent regex crate, because it expects a &str. Though I think I should be able to do that?

In Java, for example, this problem is solved by the CharSequence interface, and Java regexes operate on that. Though it’s somewhat easier in Java, because strings are random-access arrays of 16-bit code units, with surrogate pairs for characters that don’t fit in one unit.

A fun story: there is a fork of JFlex inside IntelliJ IDEA ( https://github.com/JetBrains/intellij-community/tree/master/tools/lexer ) because JFlex uses char[] instead of CharSequence. And the original JFlex issue ( https://github.com/jflex-de/jflex/issues/153 ) was reported by someone named briansmith :slight_smile: