This is kind of two questions rolled into one. I'm trying to parse a vector of strings that needs multiple replacement operations, and build a new vector of strings while minimizing repeated work.
Essentially, I'm working on a superset of the snippet syntax defined by the LSP specification that would support modular snippets, i.e. snippets that can be composed of other snippets. I believe these would be easier to handle with a starting character separate from the one used for placeholder values and variables ($). Here are a few example snippets:
[if]
name = "if"
body = ["if ${1:expression}:", "\t${2:pass}"]
[else]
name = "else"
body = ["else:", "\t$1"]
["if/else"]
name = "if/else"
body = ["@if", "@{else}"]
My first question (and probably a shining example of premature optimization) is: how do I efficiently structure the replacement code for the syntax as it is?
I'm not exactly trying to do the operation in place, as the resulting vector of strings is going to be longer, so I figured it would be more efficient to just create a new vector and either insert strings or append the contents of other vectors than to try to insert the contents of a new vector into the middle of an existing vector.
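A minimal sketch of the "build a new vector" approach, assuming Rust and assuming the snippets have already been loaded into a map from name to body. The names `expand` and `reference_name` are made up for illustration. It only handles lines that consist entirely of `@name` or `@{name}`, ignores arguments like `(1&)`, and does no cycle detection:

```rust
use std::collections::HashMap;

/// Builds a new vector: lines that are a bare `@name` or `@{name}` reference
/// are replaced by the referenced snippet's body, everything else is copied.
fn expand(body: &[String], snippets: &HashMap<String, Vec<String>>) -> Vec<String> {
    let mut out = Vec::with_capacity(body.len());
    for line in body {
        match reference_name(line).and_then(|name| snippets.get(name)) {
            // Recurse so that a referenced snippet may itself contain references.
            // Note: no cycle detection, so a self-referencing snippet would not terminate.
            Some(child) => out.extend(expand(child, snippets)),
            // Unknown names and ordinary lines pass through unchanged.
            None => out.push(line.clone()),
        }
    }
    out
}

/// Returns the snippet name if the whole line is `@name` or `@{name}`.
fn reference_name(line: &str) -> Option<&str> {
    let rest = line.strip_prefix('@')?;
    Some(
        rest.strip_prefix('{')
            .and_then(|r| r.strip_suffix('}'))
            .unwrap_or(rest),
    )
}
```

Appending with `extend` keeps every insertion at the end of the output, so nothing already written has to be shifted the way it would with repeated `Vec::insert` calls into the middle of the original vector.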
I'm also pretty sure this is an impossible task for a single regex, but it might be optimal to use multiple precompiled regexes depending on the current match group. I think I could use match arms on the capture group to do things like count placeholders and pass an offset to the functions fetching the referenced snippets. The one thing I'm considering, but am not sure is viable, is passing slices of the string to functions based on the indices of match starts and match ends (treating each submatch as independent). I don't know if it's possible to find the opening and closing of a placeholder without backreferences or lookaround: something like ${1:example_of_embedded_placeholder${2}} would capture the first } if operating on the entire string, though my initial idea to just grab the leftmost } wouldn't work either (ex: ${1:somebody_who_hates_coders}={$2:None}).
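For the closing-brace problem specifically, a small depth counter sidesteps regex altogether. This is a sketch, assuming Rust; `matching_brace` is a hypothetical helper, and it does not handle escaped braces such as `\}`:

```rust
/// Returns the byte index of the `}` that closes the `{` at byte index `open`,
/// or `None` if `open` is not a `{` or the braces never balance.
fn matching_brace(s: &str, open: usize) -> Option<usize> {
    if s.as_bytes().get(open) != Some(&b'{') {
        return None;
    }
    let mut depth = 0usize;
    // Scanning bytes is safe here: `{` and `}` are ASCII and never occur
    // inside a multi-byte UTF-8 sequence.
    for (i, b) in s.bytes().enumerate().skip(open) {
        match b {
            b'{' => depth += 1,
            b'}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(i);
                }
            }
            _ => {}
        }
    }
    None
}
```

On the two examples above, starting at the `{` right after the first `$`, this returns the final `}` for the nested placeholder and the first `}` for the `={$2:None}` case, so the slice between `open + 1` and the returned index is exactly the placeholder's contents.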
My second question: since there isn't really a hard requirement for the syntax to be a true superset, what changes would you guys suggest to make it faster to parse while still keeping it easy to write?
EDIT
Since this may not be clear, these are the operations I have to do:
- count placeholders and replace them with placeholder plus offset
- recursively rebuild the child snippets with offsets
- potentially make specific placeholders of the child snippets match supplied placeholders from the top-level snippet, ex: @{else(1&)}
- potentially replace a specific placeholder of the children, removing it as a snippet, ex: @{else(1!:break)}
- variable substitutions (though I'm not worried about that now)
- match things like {}() etc. ONLY if inside brackets after a character signifying that the snippet parser needs to do work
- finally, spit out a new vector of strings composed of the sum of all the previous work
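The first item in that list (count placeholders and renumber them by an offset) can be sketched without regex. This assumes Rust; `shift_placeholders` is a hypothetical name, it treats any `$` or `${` followed by digits as a placeholder, leaves variables like `$NAME` alone, and does not handle an escaped `\$`:

```rust
/// Rewrites every `$N` and `${N...}` in `line` to use index `N + offset`.
/// Returns the rewritten line and the highest original index seen (0 if none).
fn shift_placeholders(line: &str, offset: u32) -> (String, u32) {
    let mut out = String::with_capacity(line.len());
    let mut max_seen = 0;
    let mut rest = line;
    while let Some(pos) = rest.find('$') {
        // Copy everything up to and including the `$`.
        out.push_str(&rest[..=pos]);
        rest = &rest[pos + 1..];
        // An optional `{` may sit between the `$` and the digits.
        if let Some(after) = rest.strip_prefix('{') {
            out.push('{');
            rest = after;
        }
        let digits = rest.bytes().take_while(|b| b.is_ascii_digit()).count();
        // An empty digit run fails to parse, so `$NAME` is left untouched.
        if let Ok(n) = rest[..digits].parse::<u32>() {
            max_seen = max_seen.max(n);
            out.push_str(&(n + offset).to_string());
            rest = &rest[digits..];
        }
    }
    out.push_str(rest);
    (out, max_seen)
}
```

For the if/else example, the `if` body reports a highest index of 2, so the `else` body would be shifted by 2 and its `$1` becomes `$3`.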
The major subproblems seem to be matching each bracket with the appropriate closing bracket and keeping track of indices. I'm thinking the best solution would be to split the string(s) into non-overlapping substrings, pass the slices to different regex matchers/functions, then recombine the results. Since the functions operate on slices, I'm thinking that should still be in place until the step where the results have to be concatenated.
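A rough sketch of that splitting step, assuming Rust. The `Segment` enum and both function names are invented for illustration; only the braced forms `${...}` and `@{...}` are recognized, and escapes are ignored. Every segment borrows from the input, so nothing is copied until the results are concatenated:

```rust
#[derive(Debug, PartialEq)]
enum Segment<'a> {
    /// Literal text, copied through unchanged.
    Text(&'a str),
    /// A `${...}` placeholder; the slice is the content between the braces.
    Placeholder(&'a str),
    /// A `@{...}` snippet reference; the slice is the content between the braces.
    Reference(&'a str),
}

/// Splits `s` into non-overlapping segments that borrow from `s`.
fn split_segments(s: &str) -> Vec<Segment<'_>> {
    let bytes = s.as_bytes();
    let mut segments = Vec::new();
    let mut text_start = 0; // start of the pending literal run
    let mut i = 0;
    while i + 1 < bytes.len() {
        let is_sigil = bytes[i] == b'$' || bytes[i] == b'@';
        if is_sigil && bytes[i + 1] == b'{' {
            if let Some(close) = find_close(bytes, i + 1) {
                if text_start < i {
                    segments.push(Segment::Text(&s[text_start..i]));
                }
                let inner = &s[i + 2..close];
                segments.push(if bytes[i] == b'$' {
                    Segment::Placeholder(inner)
                } else {
                    Segment::Reference(inner)
                });
                i = close + 1;
                text_start = i;
                continue;
            }
        }
        i += 1;
    }
    if text_start < s.len() {
        segments.push(Segment::Text(&s[text_start..]));
    }
    segments
}

/// Index of the `}` matching the `{` at `open`, found by counting nesting depth.
fn find_close(bytes: &[u8], open: usize) -> Option<usize> {
    let mut depth = 0usize;
    for (i, &b) in bytes.iter().enumerate().skip(open) {
        match b {
            b'{' => depth += 1,
            b'}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(i);
                }
            }
            _ => {}
        }
    }
    None
}
```

A `match` over the resulting segments can then hand each slice to the appropriate function, and nested placeholders can be handled by calling `split_segments` again on a placeholder's inner slice.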