I am in the process of improving my `rowan` syntax tree library, and there's a design question that has been bugging me for weeks and for which I don't have a good answer. So I'd like to dump my thoughts here in case someone comes up with a good way to solve it.
Simplified, the syntax tree looks like this:
```rust
struct SyntaxKind(u16);

struct Node {
    kind: SyntaxKind,
    len: usize,
    children: Vec<NodeOrToken>,
}

struct Token {
    kind: SyntaxKind,
    text: String,
}

enum NodeOrToken {
    Node(Node),
    Token(Token),
}
```
In the real implementation, an elaborate memory management scheme is used instead, but that is not relevant to the API question at hand.
The interesting bit is the `kind`. The idea is that each language defines its own set of kinds, so, for example, Rust would have `const IF_KEYWORD: SyntaxKind = SyntaxKind(10);` and TOML would have `const L_BRACKET: SyntaxKind = SyntaxKind(10);`. The same raw number thus means different things in different languages.
However, when I print a `Token` of a specific language, I want to see `SyntaxToken("if keyword")` and not `SyntaxToken(SyntaxKind(10))`.
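To make the problem concrete, here is a minimal, self-contained sketch. The per-language constant modules and the `rust_kind_name` lookup are hypothetical illustrations, not part of `rowan`. Because both languages map a kind to the same raw `u16`, a language-agnostic `Debug` impl can only print the number; the readable name has to come from language-specific code.

```rust
use std::fmt;

// Language-agnostic kind, as in the simplified tree.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct SyntaxKind(u16);

// Hypothetical per-language kind tables: the same raw value means different things.
mod rust_kinds {
    use super::SyntaxKind;
    pub const IF_KEYWORD: SyntaxKind = SyntaxKind(10);
}
mod toml_kinds {
    use super::SyntaxKind;
    pub const L_BRACKET: SyntaxKind = SyntaxKind(10);
}

struct Token {
    kind: SyntaxKind,
    text: String,
}

// All a language-agnostic Debug impl can do: print the raw number.
impl fmt::Debug for Token {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_tuple("SyntaxToken").field(&self.kind).finish()
    }
}

// Hypothetical Rust-specific lookup that knows what the numbers mean.
fn rust_kind_name(kind: SyntaxKind) -> &'static str {
    match kind {
        rust_kinds::IF_KEYWORD => "if keyword",
        _ => "unknown",
    }
}

fn main() {
    let tok = Token { kind: rust_kinds::IF_KEYWORD, text: "if".to_string() };
    // Both languages use the same number, so the tree alone cannot tell them apart.
    assert_eq!(rust_kinds::IF_KEYWORD, toml_kinds::L_BRACKET);
    assert_eq!(tok.text, "if");
    println!("{:?}", tok); // SyntaxToken(SyntaxKind(10))
    println!("SyntaxToken({:?})", rust_kind_name(tok.kind)); // SyntaxToken("if keyword")
}
```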
The easiest way to do that would be to make all types generic over the syntax kind, `K: Copy + Debug`. I don't like this solution, though, for the following reasons:
- as there's a family of types, threading `K` everywhere is a pain
- the implementation doesn't actually use `K` anywhere, so making it generic seems unnecessary
- the implementation is tricky (unsafe), so not worrying about generics helps quite a bit
- similar to the previous point, each language will get its own monomorphised copy of the tree code, which is wasteful
- because trees for different languages are now fundamentally different types, you can't, for example, put TOML and Rust trees into the same hash map, which might be useful.
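For comparison, a sketch of the generic alternative, with hypothetical `RustKind` and `TomlKind` enums standing in for per-language kinds. `Debug` output becomes readable for free, but `Token<RustKind>` and `Token<TomlKind>` are unrelated types, which is the hash-map problem from the last point.

```rust
use std::collections::HashMap;
use std::fmt::Debug;

// The rejected alternative: tree types generic over the kind type `K`.
#[derive(Debug)]
struct Token<K: Copy + Debug> {
    kind: K,
    text: String,
}

// Hypothetical per-language kind enums.
#[derive(Clone, Copy, Debug)]
enum RustKind {
    IfKeyword,
}

#[derive(Clone, Copy, Debug)]
enum TomlKind {
    LBracket,
}

fn main() {
    let rust_tok = Token { kind: RustKind::IfKeyword, text: "if".to_string() };
    let toml_tok = Token { kind: TomlKind::LBracket, text: "[".to_string() };

    // Readable Debug output comes for free...
    println!("{:?}", rust_tok); // Token { kind: IfKeyword, text: "if" }

    // ...but the two token types are distinct, so one map cannot hold both.
    let mut tokens: HashMap<&str, Token<RustKind>> = HashMap::new();
    tokens.insert("a.rs", rust_tok);
    // tokens.insert("a.toml", toml_tok); // would not compile: mismatched types
    println!("{:?}", toml_tok);
}
```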
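So, this is the `fmt::Debug` problem. There's another one: the syntax tree is a pretty foundational, public datatype. For this reason, I'd love to be able to write inherent and trait impls for syntax trees, even when `rowan` is used as a library. Rust's coherence rules get in the way here: inherent impls are only allowed in the crate that defines the type, and the orphan rule forbids implementing a foreign trait for a foreign type.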
Because of these two issues, I want to find a way to conveniently wrap a family of types in newtypes.
Specifically, I expect clients like rust-analyzer to do something like this:
```rust
pub struct RustNode(rowan::Node);
impl fmt::Debug for RustNode { ... }
pub struct RustToken(rowan::Token);
```
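A sketch of what this buys a client, under simplifying assumptions: the `rowan` module below is a hypothetical stand-in with public fields, not the real crate's API, and `is_keyword` is a made-up inherent method. Because `RustToken` is defined in the client crate, both the foreign-trait impl (`fmt::Debug`) and the inherent impl are permitted.

```rust
use std::fmt;

// Hypothetical stand-in for the `rowan` crate; not its real API.
mod rowan {
    #[derive(Clone, Copy, PartialEq, Eq, Debug)]
    pub struct SyntaxKind(pub u16);

    #[derive(Clone, Debug)]
    pub struct Token {
        pub kind: SyntaxKind,
        pub text: String,
    }
}

// Client crate: a local newtype around the library type.
pub struct RustToken(rowan::Token);

const IF_KEYWORD: rowan::SyntaxKind = rowan::SyntaxKind(10);

// Allowed: `RustToken` is local, so a foreign trait can be implemented for it.
impl fmt::Debug for RustToken {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let name = match self.0.kind {
            IF_KEYWORD => "if keyword",
            _ => "unknown",
        };
        f.debug_tuple("SyntaxToken").field(&name).finish()
    }
}

// Allowed: inherent methods on a local type (hypothetical example method).
impl RustToken {
    pub fn is_keyword(&self) -> bool {
        self.0.kind == IF_KEYWORD
    }
}

fn main() {
    let tok = RustToken(rowan::Token { kind: IF_KEYWORD, text: "if".to_string() });
    println!("{:?}", tok); // SyntaxToken("if keyword")
    assert!(tok.is_keyword());
    assert_eq!(tok.0.text, "if");
}
```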
What would be the most convenient way to do that?
The API for `SyntaxNode` is pretty big and complex (several methods return iterators). Manually wrapping each and every function in `RustNode` seems like a lot of busywork, and it would have to be repeated for every language.
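The busywork looks roughly like this. Again, the `rowan` module is a hypothetical, heavily simplified stand-in (the real `SyntaxNode` API is much larger and has its own iterator types). Each library method needs a hand-written forwarding method on the newtype, and iterator-returning ones need an extra `map`.

```rust
// Hypothetical, simplified stand-in for `rowan::Node`; not the real crate's API.
mod rowan {
    #[derive(Clone, Debug)]
    pub struct Node {
        pub kind: u16,
        pub children: Vec<Node>,
    }

    impl Node {
        pub fn first_child(&self) -> Option<Node> {
            self.children.first().cloned()
        }
        pub fn children(&self) -> impl Iterator<Item = Node> + '_ {
            self.children.iter().cloned()
        }
    }
}

#[derive(Clone, Debug)]
pub struct RustNode(rowan::Node);

// Hand-written forwarding: one wrapper per library method, repeated by every language.
impl RustNode {
    pub fn kind(&self) -> u16 {
        self.0.kind
    }
    pub fn first_child(&self) -> Option<RustNode> {
        self.0.first_child().map(RustNode)
    }
    // Iterator-returning methods need an extra `map` over the library iterator.
    pub fn children(&self) -> impl Iterator<Item = RustNode> + '_ {
        self.0.children().map(RustNode)
    }
}

fn main() {
    let leaf = rowan::Node { kind: 1, children: vec![] };
    let root = RustNode(rowan::Node { kind: 0, children: vec![leaf.clone(), leaf] });
    assert_eq!(root.kind(), 0);
    assert_eq!(root.children().count(), 2);
    assert_eq!(root.first_child().map(|n| n.kind()), Some(1));
}
```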
I can imagine this being helped a bit with roughly the following trait setup:
```rust
trait TreeApi {
    type Node;
    type Token;
    fn node_from_rowan(node: rowan::Node) -> Self::Node;
    fn node_to_rowan(node: &Self::Node) -> &rowan::Node;
    // a ton of other wrappers-unwrappers
}

// `Sized` is needed so that default methods can return `Self` by value.
trait NodeApi: Sized {
    type TreeApi: TreeApi<Node = Self>;
    fn parent(&self) -> Option<Self> {
        Self::TreeApi::node_to_rowan(self)
            .parent()
            .map(Self::TreeApi::node_from_rowan)
    }
    // ...
}
```
But this seems very complicated, especially for iterator-returning methods.
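One possible way the iterator-returning methods could fit into this trait setup, sketched under assumptions: it needs Rust 1.75 or later (return-position `impl Trait` in trait methods), uses a hypothetical stand-in for `rowan`, and renames the associated type to `Tree` to avoid clashing with the trait name. Each iterator method is then a default method written once as a `map` over the library iterator, and a language only implements the two conversions. The cost is the extra indirection through a marker type such as the hypothetical `RustTree`.

```rust
// Hypothetical, heavily simplified stand-in for `rowan`; not the real crate's API.
mod rowan {
    #[derive(Clone, Debug)]
    pub struct Node {
        pub children: Vec<Node>,
    }

    impl Node {
        pub fn children(&self) -> impl Iterator<Item = Node> + '_ {
            self.children.iter().cloned()
        }
    }
}

// Conversions between a language's newtype and the untyped library node.
trait TreeApi {
    type Node;
    fn node_from_rowan(node: rowan::Node) -> Self::Node;
    fn node_to_rowan(node: &Self::Node) -> &rowan::Node;
}

// Methods written once in terms of the conversions.
// Requires Rust 1.75+ for `impl Trait` in trait method return position.
trait NodeApi: Sized {
    type Tree: TreeApi<Node = Self>;

    fn children(&self) -> impl Iterator<Item = Self> {
        <Self::Tree as TreeApi>::node_to_rowan(self)
            .children()
            .map(<Self::Tree as TreeApi>::node_from_rowan)
    }
}

// What each language writes: a newtype, a marker type, and the conversions.
#[derive(Clone, Debug)]
struct RustNode(rowan::Node);

struct RustTree; // hypothetical marker type

impl TreeApi for RustTree {
    type Node = RustNode;
    fn node_from_rowan(node: rowan::Node) -> RustNode {
        RustNode(node)
    }
    fn node_to_rowan(node: &RustNode) -> &rowan::Node {
        &node.0
    }
}

impl NodeApi for RustNode {
    type Tree = RustTree;
}

fn main() {
    let leaf = rowan::Node { children: vec![] };
    let root = RustNode(rowan::Node { children: vec![leaf.clone(), leaf] });
    let kids: Vec<RustNode> = root.children().collect();
    assert_eq!(kids.len(), 2);
}
```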
I can also write a macro that generates the stupid boilerplate, but I'd love to avoid macro-based solutions: a macro would make this completely impenetrable for people new to the code base.
Here's what the API I would like to wrap looks like: rowan/syntax_node.rs at e7a34eafdc0ccc1a6938cf74d07d88a69eae27cb (rust-analyzer/rowan on GitHub)