How to add text to a string and update a grapheme-based cursor position?

Say I have a function like this as part of a text editor. It inserts text at a cursor position and returns the new cursor position. For simple cases, it seems easy, but …

Imagine I have the text “cafe” and then I type the Unicode combining character \u{301}, which will create “café”. It seems I should also normalize my string storage, so that it replaces the “e + combining” pair with the single-code-point version of “é”. So in this case the return value of insert_text("cafe", 4, "\u{301}") should be 4. It doesn’t move.

But I’m not sure how the function could know this. Am I missing something simple, or am I going to need to look at the graphemes before and after the current cursor position, and then deduce from that whether something “merged”?

use unicode_normalization::UnicodeNormalization;
use unicode_segmentation::UnicodeSegmentation;

// Assumed minimal definition; the real Span may have more fields.
pub struct Span {
    pub content: String,
}

/// Return the new cursor position.
/// `offset` is a visual offset: a grapheme-cluster position in `span.content`.
pub fn insert_text(span: &mut Span, offset: usize, text: &str) -> usize {
    let byte_offset = grapheme_to_byte_offset(&span.content, offset);
    span.content.insert_str(byte_offset, text);
    span.content = span.content.nfc().to_string();
    todo!("??? return the new cursor position")
}

pub fn grapheme_to_byte_offset(s: &str, grapheme_offset: usize) -> usize {
    s.grapheme_indices(true)
        .nth(grapheme_offset)
        .map(|(i, _)| i)
        .unwrap_or(s.len())
}

pub fn grapheme_count(s: &str) -> usize {
    s.graphemes(true).count()
}
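One way to fill in the missing return value, assuming you skip normalization so byte offsets stay valid: after the insert, the new cursor is the number of grapheme clusters that start before the end of the inserted text. That handles merging without comparing "before" and "after" graphemes: inserting \u{301} after "cafe" gives "cafe\u{301}", which still has 4 clusters, so the cursor stays at 4. To keep this sketch dependency-free, `toy_cluster_starts` is a hypothetical stand-in that only treats U+0300–U+036F as combining marks, and this `insert_text` takes a plain `&mut String` instead of a `Span`. Real code would use `grapheme_indices(true)` from unicode_segmentation in its place.

```rust
/// Hypothetical stand-in for grapheme segmentation, NOT Unicode UAX #29:
/// a cluster is one char plus any following combining diacritics in
/// U+0300..=U+036F. Real code would use `grapheme_indices(true)` from the
/// `unicode_segmentation` crate here instead.
fn toy_cluster_starts(s: &str) -> Vec<usize> {
    s.char_indices()
        .filter(|&(i, c)| i == 0 || !('\u{300}'..='\u{36F}').contains(&c))
        .map(|(i, _)| i)
        .collect()
}

/// Insert `text` at cluster position `offset` and return the new cursor.
/// No normalization is applied, so the byte offset of the end of the
/// inserted text is still meaningful after the insert.
fn insert_text(content: &mut String, offset: usize, text: &str) -> usize {
    // Byte offset of the `offset`-th cluster, or the end of the string.
    let byte_offset = toy_cluster_starts(content)
        .get(offset)
        .copied()
        .unwrap_or(content.len());
    content.insert_str(byte_offset, text);
    let end = byte_offset + text.len();
    // New cursor = number of clusters that start before the end of the
    // inserted text. If the insert merged into the previous cluster, no new
    // cluster starts inside it, so the cursor does not advance.
    toy_cluster_starts(content)
        .iter()
        .take_while(|&&start| start < end)
        .count()
}
```

If you normalized to NFC after the insert, `end` would no longer point at the same place, because NFC can change byte lengths; that is the main reason this sketch leaves the text as typed.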

Before worrying about that, I'd be dubious about normalizing strings in the first place. If the user types in a particular sequence, there are cases where that exact sequence matters.
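A small std-only check of why the exact sequence can matter: the decomposed and precomposed spellings of "café" render the same but are different strings with different byte lengths, so a byte offset taken before normalizing may not even be valid afterwards. The constant and function names below are illustrative.

```rust
/// "café" spelled with 'e' followed by U+0301 COMBINING ACUTE ACCENT.
pub const DECOMPOSED: &str = "cafe\u{301}";
/// "café" spelled with U+00E9 LATIN SMALL LETTER E WITH ACUTE (its NFC form).
pub const PRECOMPOSED: &str = "caf\u{e9}";

/// Returns (byte length, number of Unicode scalar values) for `s`.
pub fn sizes(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// True if `byte_offset` can still be used to slice or insert into `s`.
pub fn offset_still_valid(s: &str, byte_offset: usize) -> bool {
    s.is_char_boundary(byte_offset)
}
```

Byte 4 is a char boundary in DECOMPOSED (where U+0301 starts) but falls in the middle of the two-byte "é" in PRECOMPOSED, and byte 6 (the end of DECOMPOSED) is past the end of PRECOMPOSED.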


Note that if you're writing a text editor, you often want more specialized storage for the text. A rope (see "Rope (data structure)" on Wikipedia) is the default expectation.
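A minimal sketch of the rope idea, purely illustrative: the type and method names are hypothetical, it never rebalances, and offsets are byte offsets that must land on char boundaries. Leaves hold chunks of text and each internal node caches the byte length of its left subtree, so an insert splits along one root-to-leaf path instead of shifting everything after the cursor. A real editor would more likely use an existing crate such as ropey.

```rust
/// Hypothetical minimal rope: a binary tree whose leaves hold text chunks.
/// `weight` caches the byte length of the left subtree. No rebalancing.
pub enum Rope {
    Leaf(String),
    Node { left: Box<Rope>, right: Box<Rope>, weight: usize },
}

impl Rope {
    pub fn new(s: &str) -> Self {
        Rope::Leaf(s.to_string())
    }

    /// Total length in bytes.
    pub fn len(&self) -> usize {
        match self {
            Rope::Leaf(s) => s.len(),
            Rope::Node { right, weight, .. } => weight + right.len(),
        }
    }

    fn concat(left: Rope, right: Rope) -> Rope {
        let weight = left.len();
        Rope::Node { left: Box::new(left), right: Box::new(right), weight }
    }

    /// Split into (bytes before `at`, bytes from `at` on).
    /// Panics if `at` is past the end or not on a char boundary.
    fn split(self, at: usize) -> (Rope, Rope) {
        match self {
            Rope::Leaf(mut s) => {
                let tail = s.split_off(at);
                (Rope::Leaf(s), Rope::Leaf(tail))
            }
            Rope::Node { left, right, weight } => {
                if at <= weight {
                    let (a, b) = (*left).split(at);
                    (a, Rope::concat(b, *right))
                } else {
                    let (a, b) = (*right).split(at - weight);
                    (Rope::concat(*left, a), b)
                }
            }
        }
    }

    /// Insert `text` at byte offset `at` by splitting and re-joining.
    pub fn insert(self, at: usize, text: &str) -> Rope {
        let (a, b) = self.split(at);
        Rope::concat(Rope::concat(a, Rope::new(text)), b)
    }

    /// Flatten back into a single String (for display or testing).
    pub fn to_text(&self) -> String {
        let mut out = String::new();
        self.collect_into(&mut out);
        out
    }

    fn collect_into(&self, out: &mut String) {
        match self {
            Rope::Leaf(s) => out.push_str(s),
            Rope::Node { left, right, .. } => {
                left.collect_into(out);
                right.collect_into(out);
            }
        }
    }
}
```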

You can also use the cursor as a "finger": you usually want to modify text not at the end (where String is efficient) but next to the cursor, in the middle of the file, and that should stay highly efficient even in a giant file with lots of content before and after the cursor.
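One concrete way to treat the cursor as a finger is to split the text into two stacks at the cursor, which is essentially what a gap buffer or zipper does (a swapped-in illustration, not a structure the post names). Typing and one-step moves only push or pop at the ends of two Vecs, so they stay cheap however much text lies on either side; jumping the cursor far away costs time proportional to the distance. `FingerBuffer` is hypothetical, and its cursor counts chars rather than grapheme clusters.

```rust
/// Hypothetical two-stack buffer: text before the cursor is in `before`,
/// text after the cursor is in `after` in reverse order, so the char right
/// after the cursor is the last element of `after`.
pub struct FingerBuffer {
    before: Vec<char>,
    after: Vec<char>,
}

impl FingerBuffer {
    /// Start with the cursor at the end of `text`.
    pub fn new(text: &str) -> Self {
        FingerBuffer { before: text.chars().collect(), after: Vec::new() }
    }

    /// Cursor position, counted in chars from the start.
    pub fn cursor(&self) -> usize {
        self.before.len()
    }

    /// Insert at the cursor; the cursor ends up after the inserted text.
    pub fn insert(&mut self, text: &str) {
        self.before.extend(text.chars());
    }

    /// Move the cursor one char left; returns false at the start.
    pub fn move_left(&mut self) -> bool {
        match self.before.pop() {
            Some(c) => {
                self.after.push(c);
                true
            }
            None => false,
        }
    }

    /// Move the cursor one char right; returns false at the end.
    pub fn move_right(&mut self) -> bool {
        match self.after.pop() {
            Some(c) => {
                self.before.push(c);
                true
            }
            None => false,
        }
    }

    /// Reassemble the full text.
    pub fn text(&self) -> String {
        self.before.iter().chain(self.after.iter().rev()).collect()
    }
}
```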


Yes, I was mistaken there. I might need to normalize when searching for a string, but for the basic editing, no. Thanks.

Sometimes I appear to be an idiot, but I tell myself it’s because I’ve been working too hard and lacking sleep… 🙂

Cool data structure. This editor is actually for rich/formatted text, so I have blocks and spans, and I think that will divide things into enough “chunks” that I can probably use plain strings within each span, for now anyway.
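A rough sketch of that layout, with hypothetical names and offsets counted in chars instead of grapheme clusters to stay dependency-free: a block-level cursor is mapped to a (span index, offset within span) pair, after which an edit only touches that one span's String.

```rust
/// Hypothetical rich-text model: each span owns a plain String.
pub struct Span {
    pub content: String,
}

/// A block is an ordered list of spans.
pub struct Block {
    pub spans: Vec<Span>,
}

impl Block {
    /// Map a block-level offset (in chars) to (span index, offset in span).
    /// An offset on a span boundary resolves to the end of the earlier span.
    /// Returns None if the offset is past the end of the block.
    pub fn locate(&self, mut offset: usize) -> Option<(usize, usize)> {
        for (i, span) in self.spans.iter().enumerate() {
            let len = span.content.chars().count();
            if offset <= len {
                return Some((i, offset));
            }
            offset -= len;
        }
        None
    }
}
```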