Hi, I have met an algorithm issue about how to calculate the difference between two text content, I am trying to implement a text editor.
Please let me explain it in detail.
I am using the ropey crate as the data structure to store the text content of a text file in my project. In short, ropey provide a set of APIs let you directly updates the text content by lines and every chars on a line, with a great performance. The line index and char index on each line start from 0. Here we have:
struct Rope { ... }
And then I am trying to implement the undo, redo features. I want to create a structure Edit, it saves each editing operation from the user: inserting, deleting, changing.
struct Insert {
line: usize, // Insert at this line
char: usize, // Insert at this char of the line
payload: String, // Inserted payload
}
struct Delete {
line: usize, // Delete at this line
char: usize, // Delete at this char of the line
count: usize, // Deleted chars from the position
}
struct Change {
line: usize, // Change at this line. Note: A change only changes 1 line at a time.
start_char: usize, // Change start from this char of the line.
end_char: usize, // Change end at this char of the line.
payload: String // New payload
}
struct MultiLineChange {
lines: BTreeMap<usize, Change>, // Multiple line change, maps from line to the change.
}
enum Edit {
Insert(Insert),
Delete(Delete),
Change(Change),
MultiLineChange(MultiLineChange),
}
struct EditHistory {
edits: Vec<Edit>
}
So in my thinking, the EditHistory is a list of Edit (i.e. Vec<Edit>), the Rope is always the latest text content. So we can recover the text content before last N edits by applying the last N Edit operation reversely. This operation is a O(N) time complexity operation, we need to do N reverse editing on the current Rope.
Finally, here are my questions:
- Is there any algorithm, that can merge 2 or more continuously edit operations (
&[Edit]) into 1 edit operation (Edit)? This can help me reduce some duplicated/repeated operations such as first add a char, then delete it. - With 1 or more edit operations, we actually have two ropes
Rope-latest(latest Rope after some edit operations) andRope-snapshot(snapshotted/origin Rope before these edit operations). How can I map every line and every char per line fromRope-latesttoRope-snapshot, i.e. I want to query the char(line,char)ofRope-snapshotbyRope-latestand these edit operations, and also query in the reverse direction. - Now we have two rope snapshots
Rope-AandRope-B, theBis updated fromAafter some editing operations. Is there any algorithm that can calculate the difference between them? The difference result should be represented by a singleEditstructure above.It seems this question can vary based on different requirements:
- Line-iteration compare: Go through all the lines on
AandB, for the same line, just string-compare the two lines inAandB(I found some algorithms here: wu-diff-rs, lcs-diff-rs, diff.rs). - Some more advanced algorithms to minimize the edit payload just like
git diff?
- Line-iteration compare: Go through all the lines on