Strings, ropes, chunks and other text editing related stuff

Currently i'm writing tui text editor. I've ran into problem that i don't understand anything in strings, how to work with them in context of text editing etc etc.

I'm using this repo as reference:led

I still don't understand some concepts like Mark, MarkSet in marks.rs and some other things, but they seem pretty doable for me. Just need a bit more understanding where they are used.

I need some help in formatter.rs.

The concept of chunks, graphemes, ropes, gap buffers and working with text in context of editing overall is new for me.

I want to know more about it but i'm having trouble phrasing the question, so i don't even know how to ask google what i'm looking for.
Could you please help me find some resources on those topics?

languages have some form of "native" string types, for example, in rust, the type str (and String) must be valid unicode string encoded in utf8 format. in C, the "string" type is defined by convention: a NUL terminated array of chars.

however, when it comes to an editor, it's not the best choice to use the "native" string representation from the language, because they are not efficient, for example, to implement the most common edit operations, insert and delete, you must shift the entire contents after the cursor, each time a character is inserted or deleted.

basically, there are two main schools of approaches to solve the editing efficiency problem:

  • a dumb, simple array with a gap (or multiple gaps) in it, often called gap buffers, essentially just a non continous variant of native strings, and they are widely used in old school editors like emacs;

  • or a fancy, non-linear, usually tree-shaped data structure, such as many implementations of rope [1], functional persistent vectors, etc, popularized by "modern" editors like xi, which is (was?) a research project after all.

you can just search terms like "gap buffer", "data structure rope", or "string representation". if you are really into this, you may find ewig interesting, you can find the link a cppcon talk about its data structures and the implementation in the readme. I also recommend you read the very extended "rope science" series by Raph Levien:


side note:

when you search gap buffers, you'll find many discussions/debates about gap buffers vs rope-like data structures. many claim gap buffers are not suitable for "modern" editor usage patterns like multiple cursors, but personally, I like the simplicity of a gap buffer.

a dumb, simple data structure can beat a fancy, complex data structure in many cases, just by the virtue of being simple and not having unnecessary overhead. one classic example is Casey's refterm experiment: it's not about text editor, but still interesting to study:


  1. note the term rope is used very loosely here. I don't know who first coined the term, but the precise definition of "rope the data structure" can vary if you ask different people. when you search "rope data structure", you'll likely see the wikipedia entry, just keep in mind, when someone says "rope", he/she might refer to something (similar but) slightly different than others â†Šī¸Ž

2 Likes

Oh that's really helpful. Thank you for your answer