Optimizing a function that recursively modifies vector of strings from a map structure

a bit of background info:

I'm working on a cross-editor snippet manager featuring modular snippets sets. the snippet manager will function similar to an LSP, in that it will be a process independent of the editor that communicates via a client-side mediator. In order for snippet sets to be modular, without trying to pass the contents of the snippet back and forth between client and server, the snippets will have to be parsed twice: once server side, and each snippet request client side. the function I'm currently working on is the server side parsing: which I'm not doing when the snippet is requested rather than loaded/deserialized. the reasoning being end users might only use a trivially small subset of the snippets loaded, so it makes since to defer work until it's necessary but do as much work as possible (to make future work lazier) at that point.

function explanation

here's the function I am working on

this function is basically checking whether a given snippet( here's the serialized data) needs to be rebuilt(true on first call), then parsing for embedded snippets, which need to be built, and tabstops, whose values need to be incremented by an offset before being returned.

I'm more than willing to spend a week on this function if I learn that the goal is feasible, given that this has the potential to be a noticeable bottleneck client side(it does all the work when a snippet is requested for the first time) but the more I work on it, and consider the complexity of the ideal scenario, the more I wonder if it's will actually work the way I'm hoping.

Goal

I'm trying to convert this:

[
    "@if", 
    "@elif", 
    "@else"
]

to this:

[
    "if ${1:expression}:",
    "\t${2:pass}",
    "elif ${3:expression}:",
    "\t${4:pass}",
    "else:",
    "\t${5:pass}",
]

in a single pass.
and currently trying to create these in the process:

{
    "if" : [
        "if ${1:expression}:",
        "\t${2:pass}",
    ],
    "elif": [
        "elif ${1:expression}:",
        "\t${2:pass}",
    ],
    "else": [
        "else:",
        "\t${1:pass}",
    ]
}

so here's the components I want feedback on:

parsing body without copying

right now, I'm working on a clone of the vector of strings that is the body. I would rather avoid this given that cloning comes with an associated memory and runtime cost. I'm hoping to parse the snippet body directly, but that currently isn't achievable given the embedded recursive call (at this line) means that self gets borrowed twice(tried that approach here

here's what I'm currently considering:

  1. removing the entry from the map, If I understand move semantics correctly this would be a transfer of ownership rather than a reallocation, the map structure shouldn't be resized which means that it will have the same capacity as before which should mean that reinsertion after modification should have the same cost as retrieval. of course I could be dead wrong, and if I am please correct my misunderstanding
  2. using some kind of (A)Rc<T> pointing to the snippet, this should allow me to parse the existing snippet body and reassign it's values without needing to interact with the map itself.

however depending on the strategy I use for handling the next problem

correct tabstops (and arguments) for subsnippets

so part of the problem with my current approach is the tabstops would be correct for top-level snippets but not sub snippets. In the last working state this was the value of the children snippets:

{
    "if" : [
        "if ${1:expression}:",
        "\t${2:pass}",
    ],
    "elif": [
        "elif ${3:expression}:",
        "\t${4:pass}",
    ],
    "else": [
        "else:",
        "\t${5:pass}",
    ]
}

there are three ways I can think of for handling this each with some major drawbacks for deeply nested calls

  1. make the return value a vector of vectors of strings, with len = depth+1 of the call. at the top level the return would be a vector of lenth 1 (only the top level snippet body), at call depth 1(in this example if, elif, or else) the vector would have two elements: the vector to containing the body for the parent snippet, and the vector containing the body of the current snippet.
    • if I'm not mistaken this means that the both required memory and the number of string operations necessary exponentially blows up, and while this might be acceptable for if/elif/else, deeply nested snippet structures could pose a problem.
    • this also means that snippet arguments need to be a copied with correct placeholders as well
    • i'm not sure if the vector should be resized after each call, or at the end of the call of top level snippet. the former would mean less memory is required during the lifetime of the function, but I think that would mean the vector would need to be resized at each branch of nested calls.
  2. having two different functions called (either depending on depth or at each step), with the second call being asynchronous. the first is the call to parse and rebuild the snippet, the second actually modifies the stored snippet.
    • I think this could lead to a deadlocked state if I use the second strategy to the previous problem (parsing body without copying)
    • this also could lead to issues if that threads work isn't done by the time the child snippet is requested if using the first strategy, and given that a major case for snippets made of snippets is conditional and branching structures, this could potentially be a problem. may still wind up at deadlock two threads remove or insert at once
    • my plan currently(liable to change) is to make most of the internal code serial, with access top level struct(sniper) being passed between threads, in other words the client side request would be asynchronous, the work performed to achieve the result would be serial. so this solution would mean rethinking a lot of the design of the project as a whole.
  3. building the current snippet by parsing it's children but deferring the actual building of children snippets until they are called
    • on one hand, given that nested snippets are likely to be used on code structures that are highly related and likely to be used together, this means more work is performed server side, but unless the snippets requested are a series of deeply nested structures, less chance of noticeable lag client side.
    • on the other, this cost seems minor compared to the cost of the previous two solutions if there isn't a way to get around the mentioned drawbacks.

using a different data structure altogether for snippet bodies, or adding fields to snippet struct to make handling easier

I don't think this is possible but I figured I would ask about it, do you think there a better way to represent snippets other than a vector of strings, as in something that would allow me to directly access tabstops rather than trying to parse the structure on initial request? or do you think there are fields I could add to the snippet struct to make this kind of manipulation more efficient without adding a ton of memory used per snippet?

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.