Is it possible to track function usage with macros?

Hi, I have seen some C++ APIs that do some curious things... like allowing you to call functions that are not always supported. That is not an issue there, but in some bindings and in Rust it is not ideal.

So the question here is how to handle that kind of situation properly: we would need to keep track of which functions are called on a struct. A simple use case would be this:

struct Foo;

impl Foo {
        fn f1(&self) {
                todo!()
        }
        fn f2(&self) {
                todo!()
        }
        fn f3(&self) {
                todo!()
        }
        fn f4(&self) {
                todo!()
        }
}

fn main() {
        let foo = Foo;
        // We need to know here which functions will be used in this struct
        let f_list = foo.functions_use_in_the_future();
        assert_eq!(f_list, ["f1", "f2", "f3"]);
        foo.f1();
        a(&foo);
        b(&foo);
}

fn a(x: &Foo) {
        x.f2()
}

fn b(x: &Foo) {
        c(x);
}

fn c(x: &Foo) {
        x.f3();
}

One challenge here is the timing: we would need to know, before each call, which functions will be called. In this case, we need to know right after the creation of foo which functions will be called later...

I don't know if Rust macros can pull off a trick like this; I don't have much experience with them either, so I can't tell whether it could be done.

Thx!

That's what cargo features are for, used as:

impl Foo {
    #[cfg(feature = "f1")]
    fn f1(&self) { }
    #[cfg(feature = "f2")]
    fn f2(&self) { }
}

As for the list of all the functions available for a given struct - you most certainly can, though I wouldn't recommend it. For one, you'd only be able to create an f_list of functions_use_in_the_future() for a given impl block at a time, as in:

#[fn_list] 
impl Foo {
    fn f1(&self) { }
    fn f2(&self) { }
    fn f3(&self) { }
    fn f4(&self) { }
}

which would traverse the AST and compile a list of ["f1", "f2", "f3", "f4"]. This one would easily get out of sync with the actual list of methods available on a Foo, however, as anyone (including you) could easily declare additional impl Foo blocks elsewhere, making your functions_use_in_the_future() quite useless. Stick with #[cfg(feature = "...")] instead.
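For reference, here is a minimal sketch of what such an attribute macro could look like, assuming syn 2.x and quote in a proc-macro crate; the #[fn_list] name and the generated FN_LIST constant are purely illustrative, and generics or additional impl blocks are not handled:

use proc_macro::TokenStream;
use quote::quote;
use syn::{parse_macro_input, ImplItem, ItemImpl};

#[proc_macro_attribute]
pub fn fn_list(_attr: TokenStream, item: TokenStream) -> TokenStream {
    // Parse the annotated `impl` block.
    let item_impl = parse_macro_input!(item as ItemImpl);

    // Collect the name of every method declared in this one impl block.
    let names: Vec<String> = item_impl
        .items
        .iter()
        .filter_map(|item| match item {
            ImplItem::Fn(method) => Some(method.sig.ident.to_string()),
            _ => None,
        })
        .collect();

    let self_ty = &item_impl.self_ty;

    // Re-emit the original impl block, plus a constant listing its methods.
    quote! {
        #item_impl

        impl #self_ty {
            pub const FN_LIST: &'static [&'static str] = &[#(#names),*];
        }
    }
    .into()
}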

Hi, sadly that solution doesn't work well here, because in C++ you could have all the functions available, or only some specific ones, just by having a specific input like:

// case inner:
//    0: f1, f2 available
//    1: f4 available
//    2: f1, f2, f3, available
struct Foo {inner: usize};

inner can be anything that C++ uses behind the scenes to know which functions will work and which won't. In this case we assume we know it beforehand, so just knowing inner is enough to know which functions will be available and then check them.

One Foo can have different available functions; in general it is neither all of them nor a fixed subset, it all depends on the code behind the functions.

Glad to know this can be achieved with macros! But I don't quite get why it would become out-of-sync; can you explain a bit more, please?

For simplicity, if a function is called from an if statement or any other place, we can consider it used; being called once anywhere is enough.

Why would you declare it as an opaque struct then? From the few things you've mentioned so far, you're only a few steps away from recreating your own enum. Why not make it an obvious:

enum Foo {
    Case0(F1F2),
    Case1(F4),
    Case2(F1F2F3),
}

where

struct F1F2(...);

impl F1F2 {
    fn f1() {}
    fn f2() {}
}

// +

struct F4(...);

impl F4 {
    fn f4() {}
}

// +

struct F1F2F3(...);

impl F1F2F3 {
    fn f1() {}
    fn f2() {}
    fn f3() {}
}

which you could then use as

fn main() {
    // compute somewhere
    let foo = Foo::Case1(F4);
    // ... much later on
    match foo {
         Foo::Case0(f) => f.f1(),
         Foo::Case1(f) => f.f4(),
         Foo::Case2(f) => f.f3(),
    }
}

?

Hi, sorry about making this so confusing; the C++ APIs are not always very clear, nor do they have a Rust equivalent.

I tried to follow an approach like that; sadly I could not find a way to make it work well, for these reasons:

The issue with this method is that you need to know at compile time which option you want to use. If we could know that, we could use the impl approach, which is a lot simpler.

Sadly, Foo depends on user input, so we don't know at compile time which option will be chosen.

So at compile time we can know which functions will be used, but we don't know whether the type requested by the user will be able to handle those options.

Another big point is to avoid waiting until the function is actually called; some of them can be very expensive.

This puts us back at "how to know which functions will be used, right after the struct is created".

Which part of Foo depends on the user input, exactly?

How are you letting users "request" the "type" of the Foo - and, more importantly, why?

Still the most confusing part, in my mind. The struct allows you to impl whatever methods you require. If "at compile time we can know which functions will be used" - why would you force your users to "request" a "sub-type" of it, which would only implement a fraction of them?

Increasingly curious as to what kind of C++ APIs those would be. Can you share a link?

I'm curious, what does it mean that this depends on user input? Does it mean that after the program is compiled, the user inputs different options during the program's interaction? Or does it mean that the program, as a library, is used by the user when writing their application, and they write different code to call the library functions?

If it's the former, as C++ is a compiled language, I find it hard to understand how C++ could achieve this with zero-cost abstraction.
If it's the latter, since this situation is still determinable at compile time, then theoretically Rust should also be able to implement it.

To avoid the XY problem, could you provide the original problem you encountered when binding C++ to Rust? I believe an experienced Rust programmer might be able to propose a better solution that doesn't require using functions_use_in_the_future.

OK, let's go back to the start. I have been playing with the GDAL API; GDAL links internally to a lot of libraries, mainly for working with vector data and satellite data.

The whole point of GDAL, in summary, comes down to two aspects:

  1. Being able to transform and convert between different formats
  2. Having a common API to handle spatial information

This leads to the actual API design: in the most complex scenario, a Dataset (any open spatial file) allows you to call all the API functions. Here is the trick: each file format has specific capabilities.

In GDAL you write a spatial algorithm and pray that the output format chosen by the user supports every function you used in the algorithm, even if it means running for two hours before reaching a function that format does not support; obviously, when that happens, the app stops.

Obviously, if you use the satellite functions on vector data, the API will throw an error.

We can ask GDAL which capabilities each format has; the hard part is how to check that those capabilities are compatible with an arbitrary GDAL algorithm that we, or someone else, wrote.

Which brings us back to the start: we have a struct whose capabilities (functions) depend on user input, and we need to know beforehand whether what the user chose is compatible with an arbitrary algorithm that may use any function in an arbitrary way.

Part of this is about me learning how to design these things, but at the same time I think it is a waste of time to wait for an algorithm to run when we could check up front what it needs in order to be executed.

Interesting. Had to ask for some clarifications from an LLM, but I think I'm getting the picture.

Correct me if I'm wrong:

  1. you have arbitrary GDAL/output formats, provided by the user - let's call them A/B/C;
  2. each of these formats, as it's being processed, relies on the same basic algorithm, the implementation of which is provided by you and/or your team - with methods X/Y/Z;
  3. different formats support different capabilities/functions - X for A, Y for B, Z for C;
  4. your implementation might only include capabilities X and Z, but not Y -
  5. making your algorithm suitable for formats A and C, but not B;
  6. which is precisely what you want to check for/against?

Continuing the analogy: one user might select a format C, which requires a capability Z (implemented by your algo). Another user might select a format B, which requires a capability that is not.

You want to be able to tell, from the get go, whether your implementation supports it?

Fairly trivial to accomplish with a basic HashSet<Capability>, as required by the given format, matched against your own implementation, I'd say. If you're up to having your:

struct Capability(&'static str);

being a transparent container for the name of a given function, sure - a macro would do. You could just as easily extract all the fn <name>() from your files using a basic regex search, though.
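A minimal sketch of that check, with the Capability newtype as a plain wrapper over a function name (the capability names here are just placeholders):

use std::collections::HashSet;

// Transparent container for the name of a given function/capability.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct Capability(&'static str);

// True if the format provides every capability the implementation needs.
fn supported(format: &HashSet<Capability>, needed: &HashSet<Capability>) -> bool {
    needed.is_subset(format)
}

fn main() {
    let format = HashSet::from([Capability("f1"), Capability("f2"), Capability("f3")]);
    let needed = HashSet::from([Capability("f1"), Capability("f3")]);
    assert!(supported(&format, &needed));
}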

To be honest, I'm not a Rust expert, nor do I have a complete understanding of GDAL. So, if there are any glaring errors in my approach below, please forgive me.

Based on your description of your goal, it seems that you're not looking for the existence or non-existence of an API. Instead, your aim is to make a quick, high-performance decision about which functions to call without actually running the code. Compared to pre-determining the functions to be used, I think you could adopt a "shadow type" that implements the same functions, and then use this shadow type for summarizing and checking at the end.

PS: Although, after writing it, I realized that with macros and const, it might also be possible to implement a functionality similar to functions_use_in_the_future.

use std::collections::HashSet;
pub trait Methods {
    fn f1(&mut self);
    fn f2(&mut self);
    fn f3(&mut self);
}
struct Foo;

struct ShadowFoo {
    used_methods:HashSet<&'static str>,
}
impl ShadowFoo {
    pub fn new()->Self{
        Self{
            used_methods:HashSet::new()
        }
    }
    pub fn check(&self){
        // check according to used_method
        //...
    }
}
impl Methods for ShadowFoo {
    fn f1(&mut self){
        self.used_methods.insert("f1");
    }
    fn f2(&mut self){
        self.used_methods.insert("f2");
    }
    fn f3(&mut self){
        self.used_methods.insert("f3");
    }
}

impl Methods for Foo {
    fn f1(&mut self){
        todo!();
    }
    fn f2(&mut self){
        todo!()
    }
    fn f3(&mut self){
        todo!()
    }
}

fn main(){
    // shadowFoo just to check
    {
        let mut foo = ShadowFoo::new();
        foo.f1();
        a(&mut foo);
        b(&mut foo);
        foo.check();
    }
    // real Foo to compute
    let mut foo = Foo;
    foo.f1();
    a(&mut foo);
    b(&mut foo);
}

fn a<T:Methods>(x: &mut T) {
        x.f2()
}

fn b<T:Methods>(x: &mut T) {
        c(x);
}

fn c<T:Methods>(x: &mut T) {
        x.f3();
}

Thanks

The boilerplate code here, since the only differences are at the beginning and the end, can easily be handled with macros that automatically add the code for the ShadowFoo part.

The implementation of Methods for ShadowFoo is also simple and regular, and can be generated with macros as well.
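For instance, a declarative macro along these lines could generate the recording impl; this is only a sketch, assuming the ShadowFoo and Methods definitions above and that every method takes just &mut self:

// Sketch: generate `impl Methods for ShadowFoo` from a list of method names.
macro_rules! record_methods {
    ($shadow:ty, $trait_name:ident, [$($name:ident),* $(,)?]) => {
        impl $trait_name for $shadow {
            $(
                fn $name(&mut self) {
                    // Record the call instead of doing any real work.
                    self.used_methods.insert(stringify!($name));
                }
            )*
        }
    };
}

// Would replace the hand-written impl above:
// record_methods!(ShadowFoo, Methods, [f1, f2, f3]);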

Hi, very close. GDAL as an API is not designed for writing to one specific format, so when you write an algorithm with it, you use whichever of the available functions X/Y/Z you need.

This also means you can have two algorithms, one using X/Y and another using Y/Z.

GDAL lets you use the X/Y/Z functions, and we write algorithms with them, like K with X/Y and J with Y/Z.

The user can choose a format, for example A, which allows X/Y/Z, or B, which allows Y/Z.

GDAL functions:
X, Y, Z

Formats:
A: X/Y/Z
B: Y/Z

Algorithms:
K: X/Y
J: Y/Z

Here are the two important questions we want to answer:

Which GDAL functions does an algorithm use, without needing to run it? That way we don't waste time on expensive calls only to hit a function the format does not allow.

Which formats can the user choose in order to run an algorithm?

GDAL's concept is based on unification, one single API for everything. This also implies that usually you will not have two ways of doing the same thing; we can't replace function X with Y or some mix.

I don't quite see how to extract this properly with a regex: people will have a lot of files with a lot of calls, there can be collisions between function names, and a function can take more than one variable, so a single function can require X for one variable and Y/Z for another.
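For what it's worth, once both capability sets are known, the second question above (which formats can run an algorithm?) reduces to a subset check; a rough sketch using the letters from this post:

use std::collections::{HashMap, HashSet};

fn main() {
    // Capabilities offered by each format.
    let formats = HashMap::from([
        ("A", HashSet::from(["X", "Y", "Z"])),
        ("B", HashSet::from(["Y", "Z"])),
    ]);
    // Capabilities each algorithm needs.
    let algorithms = HashMap::from([
        ("K", HashSet::from(["X", "Y"])),
        ("J", HashSet::from(["Y", "Z"])),
    ]);

    for (algo, needed) in &algorithms {
        let compatible: Vec<&str> = formats
            .iter()
            .filter(|(_, offered)| needed.is_subset(offered))
            .map(|(name, _)| *name)
            .collect();
        println!("algorithm {algo} can run on formats {compatible:?}");
    }
}

The hard part, as discussed above, is producing the "needed" set for an algorithm without running it.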

Things are getting clearer. Conceptually, then:

  • GDAL's API (G.A.): defines (all) functions/capabilities - from A to Z.
  • Algorithm: implements a chain of functions - A then B then X then C.
  • Format: supports a sub-set of API's functions - A and X, but not B or C.

The G.A. is defined for you. The format is chosen by the user. The algorithm is yours.

Correct?

If so, start with the format itself.

  1. how many different formats are there?
  2. what are the exact functions/capabilities they support?
  3. which of the different formats do you want your crate/library/project to handle?

When given a format by the user, you'll only have to check whether your algorithm knows how to deal with that particular format - and nothing else. No macros, no functions_use_in_the_future(). If it's the user who defines it, and by doing so restricts the set of all available functions in the G.A. to the sub-set supported by the format itself, use that as your foundation. The G.A. determines the format, and the format determines your implementation. Not the other way around.
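A minimal sketch of that direction, assuming your crate decides up front on a small set of formats it supports (GTiff and GPKG are real GDAL driver short names, used here only as placeholders):

// The formats this particular tool has decided to support.
enum SupportedFormat {
    GeoTiff,
    GeoPackage,
}

// Map the runtime driver name, chosen by the user, onto that set.
fn parse_format(name: &str) -> Option<SupportedFormat> {
    match name {
        "GTiff" => Some(SupportedFormat::GeoTiff),
        "GPKG" => Some(SupportedFormat::GeoPackage),
        _ => None, // anything else is simply rejected
    }
}

fn main() {
    let requested = "GTiff"; // would come from user input
    match parse_format(requested) {
        Some(SupportedFormat::GeoTiff) => println!("running the GeoTIFF path"),
        Some(SupportedFormat::GeoPackage) => println!("running the GeoPackage path"),
        None => eprintln!("format {requested} is not supported by this tool"),
    }
}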

GDAL can handle arbitrary formats; there are a lot of them, more than 20, and you can register them at runtime.

The capabilities are also a runtime thing, for example when a new driver has been added. If it is a default one, you can check GDAL's metadata to get a list of capabilities; I would then have to do the manual work of figuring out which capabilities imply which set of functions.

GDAL is the manager; with that API we do not choose which formats we want to use, it is the user who chooses the format. So we end up in a situation where we don't know whether the current algorithm can run with the format specified at runtime.

We can't determine the implementation from the format, because the format works behind the scenes in GDAL; the format is arbitrary and chosen by the user at runtime. Or how do you think the implementation could be handled under these circumstances?

A very important point in GDAL is that the user usually does not need to think much about formats: as long as they choose a compatible one, everything just works.

Whether there are a lot of them or not isn't relevant. Are they standardized in any way? If so, just extract the functions/capabilities they support into arbitrary traits of your own choosing, as in:

trait FormatA {
    fn map_vectors_to_rasters(&self);
    fn geolocate_coordinates(&self);
    fn coordinate_geolocations(&self);
}

Then implement your algorithm:

struct AlgorithmX {
    inner: (), // TODO: whatever data you might need to share between your methods
}

impl AlgorithmX {
    // your custom implementation
    fn your_method_1(&self) {}
    fn your_method_2(&self) {}
    fn your_method_3(&self) {}
    fn your_method_4(&self) {}
    // then do what you need to:
    fn execute(&self) {
        self.your_method_3();
        self.your_method_1();
        self.your_method_4();
        self.your_method_2();
    }
}

And map your implementation into the trait of a format it supports:

impl FormatA for AlgorithmX {
    #[inline] // as necessary
    fn map_vectors_to_rasters(&self) {
        self.your_method_1();
    }
    fn geolocate_coordinates(&self) {
        self.your_method_2();
        self.your_method_4();
    }
    fn coordinate_geolocations(&self) {
        self.your_method_3();
        self.your_method_1();
    }
}

Sure, it's tedious - but you're saving yourself from quite a lot of headache down the line.

How are these capabilities loaded, and what does the metadata contain? If these bindings are to be trusted, you only have a handful of string slice names to check against. Is that what you need?

Will your implementation depend on these capabilities, or do the capabilities themselves depend on your algorithm and its implementation? What's the relationship between them?

You do not choose the formats you want to use/support - yet you need to determine whether those formats can be used/supported by your algorithm? The first thing doesn't quite mix with the second. If your user can define an arbitrary format with an arbitrary list of capabilities you have no way of knowing about before asking your algorithm to just "do its thing" - something is way off.

It sounds like we're back to square one. What, exactly, does the format allow you to inspect? If the GDAL's API is set, and the list of all the possible/available functions/capabilities is limited to a number N and/or a set of method names {S} - how can the format chosen be "arbitrary"?

Use of precise language makes an enormous difference when describing any technical problem, but English is not known for precision. We seem to be guessing too much from the OP's wording, and reading too little of their code. A YX problem, if you will.

So let's work from what we already know. I'm going to assume we can modify the Algorithms for the following reasons:

  1. It's been made abundantly clear that they have no control over the Formats.
  2. The problem is impossible to solve, if they weren't controlling the Formats or the Algorithms.
  3. Therefore, to have any chance of helping, we must assume some control over the Algorithms.

Now, the problem is:
Given a function (but not the values of its arguments), query whether it might transitively call another function.

I believe there are no automated solutions to do this. A macro around the outermost function cannot inspect its dependencies, so such a macro would have to decorate the entire codebase, which is rather inconvenient.

A solution on the language level would amount to an effect system. That would be a very difficult solution to implement. Also, you'd still have to annotate your functions.

For any realistic number of Algorithms, it'll be most practical to manually go through your codebase and hard-code functions_use_in_the_future. It might seem tedious and inefficient to do this by hand, but know that this is a hard problem to automate, so much so that after many years of work by language-design experts, no one has managed to do better.
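In code, the hand-maintained version could be as plain as this (a sketch; the method names are the ones from the opening post):

struct Foo;

impl Foo {
    // Manually maintained: the functions this algorithm is known to call.
    fn functions_use_in_the_future(&self) -> &'static [&'static str] {
        &["f1", "f2", "f3"]
    }
}

fn main() {
    let foo = Foo;
    assert_eq!(foo.functions_use_in_the_future(), &["f1", "f2", "f3"]);
}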

Hi, sorry took me some time to reply.

@00100011 Everything is standardized, but only the set of functions behind each capability; standardized does not mean every function will be available. Each set of functions is only available if the driver's capabilities say so.

If we knew at compile time which driver (format) will be used, we could use something like that: make traits for each capability and implement the capabilities on each driver. But the driver name is mostly only known at runtime.

I had not thought of implementing a driver for an algorithm; it is worth checking out, but I think it would be more useful in cases where we know the driver.

There are global capabilities we can use, which are not always available.
A driver allows some of them.
An algorithm will need others.

Knowing beforehand whether the algorithm and the driver are compatible helps us avoid running the algorithm only for it to panic partway through.
Knowing which drivers (formats) have the capabilities needed by the algorithm is nice, so we can tell the user which formats they can use.

The supported drivers depend on two things: the ones compiled into GDAL and the custom ones loaded at runtime by the user.

How can a chosen driver be arbitrary? In Rust it would be like making an enum with all the capabilities for each format, where every variant has the same set of functions; let a = Driver::FormatA; a.foo() dispatches foo to the chosen variant, and all the functions that are not supported by that driver (format) are filled in with todo!(), so it only panics when you actually call one of them.
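Something like this, just to illustrate what I mean (the names are made up):

// Every driver exposes the same method set; unsupported ones only fail
// when they are actually called.
enum Driver {
    FormatA,
    FormatB,
}

impl Driver {
    fn rasterize(&self) {
        match self {
            Driver::FormatA => { /* supported: do the real work */ }
            Driver::FormatB => todo!("not supported by this driver"),
        }
    }
}

fn main() {
    let a = Driver::FormatA;
    a.rasterize(); // fine
    // Driver::FormatB.rasterize(); // would panic, but only once called
}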

@doublequartz Your assumptions are very reasonable.

I didn't know that macros can't inspect dependencies; right, that is key. As you say, we can't have one macro inspect everything.

Hmm, based on your ideas, maybe a macro could inspect only its own function and create a new type that also stores the functions that will be used. It sounds annoying, but it could help automate things even across dependencies; in that case we couldn't use functions the way we usually do, but it could help automate this to some extent.

I have never worked with this kind of macro; I only got into procedural macros last week, so this is new to me.

How about the following approach?

  1. make a trait for each capability:
    trait CapA {
        fn do_a(&mut self);
    }
    
    trait CapB {
        fn do_b(&mut self, value: i32);
    }
    
    // ... more capabilities
    
  2. make an enum to identify them and add one more trait to detect support (or you can use strings instead):
    enum Cap { CapA, CapB, /* ... */ }
    
    trait HasCap {
        fn has_cap(&self, name: Cap) -> bool;
    }
    
  3. join all of them to a single trait
    trait Format: CapA + CapB /* + ... */ + HasCap {}
    impl<F> Format for F where F: CapA + CapB /* + ... */ + HasCap {}
    
  4. each format will implement all of the above traits, e.g.
    struct SomeFormat;
    
    // we support CapA
    impl CapA for SomeFormat { fn do_a(&mut self) {} }
    
    // we do not support CapB
    impl CapB for SomeFormat { fn do_b(&mut self, _: i32) { panic!() } }
    
    impl HasCap for SomeFormat {
        fn has_cap(&self, name: Cap) -> bool {
            // this must be consistent with the above impls
            matches!(name, Cap::CapA)
        }
    }
    
  5. each function working with format(s) will be generic and declare what capabilities it requires (we will only ever use dyn Format, but we don't tell the compiler that yet)
    fn do_something<F: ?Sized + CapA>(format: &mut F){
        format.do_a();
    }
    
  6. actual algorithms would take a dyn Format and look like this:
    fn some_algorithm(format: &mut dyn Format, value: i32) -> Option<()> {
        // this will be called only on `dyn Format` but we do not tell it to compiler so it can check what we are using
        //                   vvvv --- THIS ----------------------------------------+
        fn inner<F: ?Sized + CapB>(format: &mut F, value: i32) {                // |
            // actual implementation of the algorithm (still statically checked)   |
            format.do_b(2);                                                     // |
        }                                                                       // |
        // now the runtime checks                                                  |
        //                      vvvv --- and THIS must be consistent --------------+
        if !format.has_cap(Cap::CapB) { return None; }
        Some(inner(format, value))
    }
    
    the bounds on the inner and the calls to format.has_cap(...) need to be consistent for everything to work
  7. the functions of the above shape can now be generated by a macro. This is what I quickly made just to test the idea; you can go for something nicer (e.g. an attr macro)
    macro_rules! algorithm {
        ($fmt:ident : $F:ident : $($C:ident),* : $name:ident($($arg:ident : $ty:ty),*){ $($body:tt)* }) => {
            fn $name($fmt: &mut dyn Format $(,$arg : $ty),*) -> Option<()> {
                fn inner<$F: ?Sized $(+$C)*>($fmt: &mut F $(,$arg : $ty)*) {
                    $($body)*
                }
                $(if !$fmt.has_cap(Cap::$C) { return None; })*
                Some(inner($fmt $(,$arg)*))
            }
        }
    }
    
    ... it can be used like this:
    algorithm!{format:F:CapB: some_algorithm(value: i32) {
        format.do_b(2);
        // do_something(format);
    }}
    
  8. now you can do:
    fn main() {
        let mut format: Box<dyn Format> = Box::new(SomeFormat);
        dbg!(some_algorithm(&mut *format, 7));
    }
    

This approach does not generate a list of the capabilities required/used by an algorithm; you have to declare them explicitly wherever you use a format. But it will:

  • statically check that you are not using more than you declare
  • dynamically check (before the algorithm is run) that the format supports what you declared (because both the static bounds on inner and the runtime check are generated by the macro from the same input list)

So in the end the compiler will let you run any algorithm on any format, while making sure that all capabilities the algorithm uses are runtime-checked before the algorithm is actually run.
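To illustrate the failure path, building on the Cap / Format / some_algorithm definitions from the steps above (BareFormat is just a made-up example):

// A format that reports no capabilities at all.
struct BareFormat;

impl CapA for BareFormat { fn do_a(&mut self) { unreachable!() } }
impl CapB for BareFormat { fn do_b(&mut self, _: i32) { unreachable!() } }
impl HasCap for BareFormat {
    fn has_cap(&self, _name: Cap) -> bool { false }
}

fn main() {
    let mut format: Box<dyn Format> = Box::new(BareFormat);
    // some_algorithm requires CapB; BareFormat does not report it,
    // so the runtime check fails and the algorithm body never runs.
    assert_eq!(some_algorithm(&mut *format, 7), None);
}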