Choosing the right syntax for small syntax extensions?

I want to implement a function-like procedural macro whose body is a superset of regular Rust: basically 99% regular Rust, plus support for a new field access operator. My question is: what kind of syntax is prudent to accept, and what is not?

For example maybe I want foo->bar to be valid and for it to compile to if !foo.is_null() { unsafe { (*foo).bar } } else { panic!("null!") }.
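To make the intended expansion concrete, here is a rough hand-written equivalent for one case. The struct Foo, its i32 field, and the function names are made up for illustration, and the sketch only checks for null; it says nothing about whether the pointer is otherwise valid.

```rust
// Hypothetical struct standing in for whatever `foo` points to.
struct Foo {
    bar: i32,
}

// Hand-written equivalent of what `foo->bar` is meant to expand to.
//
// Safety (assumed, not checked here): if `foo` is non-null, it must point to
// a live, properly aligned `Foo`. Only the null case is detected.
unsafe fn arrow_bar(foo: *const Foo) -> i32 {
    if !foo.is_null() {
        unsafe { (*foo).bar }
    } else {
        panic!("null!")
    }
}

// Usage with a pointer that is known to be valid for the whole call.
fn demo() -> i32 {
    let value = Foo { bar: 7 };
    unsafe { arrow_bar(&value as *const Foo) }
}
```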

I see a few potential problems:

  • Does -> tokenize? My understanding is that everything inside a procedural macro invocation still needs to tokenize.
  • Can I be sure that there is no existing language context where -> is used?
  • Can I be sure that Rust won't add it as valid syntax in the future?

Basically, I'm wondering if there are any idioms or patterns around crafting these sorts of syntax extensions, where you want to allow all valid Rust but add something on top. The closest thing I've seen is the use of @ for dispatching from one macro pattern to another, which seems to be based on the assumption that @ will never be given an important use in the language.

I hope that was just for the sake of an example, because that code is unsound.

Yes, -> is a valid punctuation token.

No, -> is already used for return types in function declarations, closure expressions, function pointer types, and the closure traits (Fn, FnMut, FnOnce).

No. Editions can generally make breaking changes to the syntax. Moreover, given that -> is not allowed after an expression fragment inside macro_rules, I don't think giving it a meaning there would even be considered a breaking change.
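The tokenization point can be checked with a small macro_rules experiment. This is only a sketch: one! and count_tts! are hypothetical helpers written for this post, not part of any library. They count the token trees in the input, and foo->bar comes out as three. (As far as I know, a procedural macro sees the same -> as two Punct values with joint spacing, but that is not what this sketch exercises.)

```rust
// Hypothetical helper: expands to 1 for any single token tree.
macro_rules! one {
    ($t:tt) => {
        1usize
    };
}

// Hypothetical helper: counts the token trees it is given.
macro_rules! count_tts {
    ($($t:tt)*) => {
        0usize $(+ one!($t))*
    };
}

// `foo`, `->` and `bar` are never resolved as names here; they are only
// counted as tokens, so nothing needs to be in scope.
fn arrow_token_count() -> usize {
    count_tts!(foo->bar)
}
```

Because the input only has to tokenize, not parse as an expression, the macro accepts it even though foo->bar is not valid Rust on its own.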

It was just for the sake of example but what is inherently unsound about it? Doesn't the unsoundness entirely depend on whether the pointer is valid, rather than anything in that snippet?

The pointer could be valid when performing the null check but invalid when actually dereferencing. I've run into that exact situation before in unsafe Rust.

I don't see how that can be the case unless we're also assuming the pointed-to object or memory is known to another thread. Again, it seems like it depends on whether the user knows the pointer is still good.

Take, for example:

let foo = 42 as usize as *const String;

foo would pass the null test but invoke UB upon dereferencing it.
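A minimal sketch of just the first half of that claim, with a hypothetical function name. It only inspects the pointer and deliberately never dereferences it, since doing so would be UB:

```rust
// Returns whether the made-up pointer gets past the null check.
fn passes_null_check() -> bool {
    // Integer-to-pointer cast: non-null, but it does not point to a live String.
    let foo = 42 as usize as *const String;
    // Inspecting a raw pointer is safe; only dereferencing it would be UB.
    !foo.is_null()
}
```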

Well yeah, a race condition generally implies the existence of threads or some other form of concurrency. But that in and of itself is enough to mean that it can't be done in a guaranteed manner. Again, this isn't theoretical — I have issued a CVE for a situation that checked to see if the pointer was null and then used it. Between those two operations, the pointer was invalidated such that I got a segfault.

So... yes, the soundness entirely depends on context: whether you know the pointer is valid whenever it is non-null, and that this won't change between the check and the use due to actions on a separate thread. That's always true for pointers, which is why blanket describing it as unsound struck me as weird. To be clear, I never meant to imply that checking for null is always enough to know a dereference is safe. That doesn't prevent it from being an extremely common pattern in contexts where it is known to be enough, though.

I think this is a digression from the original question, which is: is there a future-proof way in Rust to write proc macros that let the user write blocks where a small syntax extension to Rust is allowed? The answer I'm inferring is no, there is no approved way to superset the syntax directly to add things like custom operators. You can have a proc macro that takes a block and interprets a completely different language inside it, or you can have a proc macro that takes a block of valid Rust and copies or transforms it, but you can't safely blend the two. Or rather, an actual macro invocation surrounding your syntax is the only official way to blend.
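For what it's worth, here is roughly what that last option could look like with plain macro_rules, wrapping each use of the operator in its own invocation. The arrow! macro and the Foo struct are hypothetical names for this sketch, and the expansion keeps the same caveat discussed above: it only rules out null, so the caller still has to know the pointer is valid.

```rust
// Hypothetical struct standing in for whatever `foo` points to.
struct Foo {
    bar: i32,
}

// Hypothetical macro: the arrow syntax is only accepted inside `arrow!(...)`.
// An `ident` fragment may be followed by any token, so `->` is allowed here.
macro_rules! arrow {
    ($ptr:ident -> $field:ident) => {{
        if $ptr.is_null() {
            panic!("null!")
        }
        // Assumption, not checked: a non-null `$ptr` points to a live value.
        unsafe { (*$ptr).$field }
    }};
}

// Usage with a pointer that stays valid for the whole expression.
fn demo() -> i32 {
    let value = Foo { bar: 7 };
    let foo: *const Foo = &value;
    arrow!(foo -> bar)
}
```

The rest of the surrounding code stays ordinary Rust, so nothing here depends on how the language treats -> after an expression in the future.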
