Discussion about code design - connecting/gluing Rust libraries with Python and minimizing type-conversion costs

SUMMARY: I want to operate on data provided from the Python side with functions written in either Rust or Python. I would like feedback on the tradeoffs between flexibility, ease of use and efficiency.
For this, I propose a data object with methods to add new data and to modify existing data (Python syntactic sugar welcome). My questions are: i) what is the best way for the Python and Rust code to interact, and ii) is there a way to modify "Rust objects" from Python, and then use them in Rust, that minimizes type conversions?

MOTIVATION: I interfaced a Rust library to Python (using PyO3). I chose Python because it is easy to use and lets me glue different libraries together. Python makes it easy to read files and prepare various kinds of input, it is flexible about loading data into objects, and it offers a large set of tools to run on the data. The language is also widely used, which makes it easier for people to use my Rust library. Another appeal is that users could easily combine my library with third-party Python libraries (e.g., for refinement or visualization) when no Rust tool is available (yet), or when they want to try out things that should or could be implemented later. They would not depend on me for things that I cannot (readily) do, or that they simply want to try once in a fast and practical way.

Now I am thinking about how to prepare the input for the interfaced Rust function. Ideally, the data object could be modified both by i) methods implemented on the Rust struct and ii) Python procedures.

POTENTIAL REALIZATION: I imagine having one central data object (per problem) that holds various data (such as measurement data, model parameters, model output, ...) and is passed to functions that modify or evaluate it. I think the data object is best based on a sufficiently flexible Rust struct, maybe containing a vector of enums or a hashmap, with a set of methods to access and change its elements. The object should be suited for future extensions.

Some examples to make clearer what I imagine:

  • Example 1: a (crude) data object holds an array of data. Some points are changed with a method implemented in Rust, e.g. obj.change_arr(index=slice(3, 7), values=[3., 4., 5., 6.]), or by multiplication with a scalar or subtraction of a same-sized array

  • Example 2: obj.coords[4].x = 0.25 or obj.coords['Ar_1'].y = 0.25 or obj.coords['car_27'].z = 0.25

  • Example 3: fast search of data points (e.g., neighboring points in the case of coordinates, or points sharing another characteristic); maybe store them in a k-d tree (that could implement periodic boundary conditions)

  • Example 4: independent_interfaced_rust_function(python_data) or independent_interfaced_rust_function(data_object.data)
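To make Examples 1 and 2 concrete, here is a minimal pure-Python sketch of the user-facing interface. Everything in it (DataObject, CoordTable, Coord, scale, add) is a hypothetical stand-in for what a Rust-backed class exposed through PyO3 might look like from the Python side; it is not an existing API, and the Rust side is not shown.

```python
from array import array


class Coord:
    """One point; x, y, z are plain floats."""

    def __init__(self, x=0.0, y=0.0, z=0.0):
        self.x, self.y, self.z = x, y, z


class CoordTable:
    """Lets coords be addressed by position (int) or by label (str)."""

    def __init__(self):
        self._labels = {}   # label -> position
        self._points = []   # position -> Coord

    def add(self, label, x, y, z):
        self._labels[label] = len(self._points)
        self._points.append(Coord(x, y, z))

    def __getitem__(self, key):
        index = self._labels[key] if isinstance(key, str) else key
        return self._points[index]


class DataObject:
    """Hypothetical stand-in for a Rust-backed class (assumption, not a real API)."""

    def __init__(self, values):
        self.arr = array("d", values)   # contiguous 64-bit floats
        self.coords = CoordTable()

    def change_arr(self, index, values):
        """Overwrite the elements selected by a slice."""
        new = array("d", values)
        if len(new) != len(range(*index.indices(len(self.arr)))):
            raise ValueError("values do not match the length of the slice")
        self.arr[index] = new

    def scale(self, factor):
        """Multiply every element by a scalar, in place."""
        for i in range(len(self.arr)):
            self.arr[i] *= factor


obj = DataObject([0.0] * 10)
obj.change_arr(index=slice(3, 7), values=[3.0, 4.0, 5.0, 6.0])
obj.scale(2.0)
obj.coords.add("Ar_1", 0.0, 0.0, 0.0)
obj.coords["Ar_1"].y = 0.25   # access by label
obj.coords[0].x = 0.5         # same point, access by position
```

The point of the sketch is only the shape of the interface: the user never touches the storage directly, so the backing store could be swapped for a Rust struct without changing the calling code.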

I do not want to constrain the user too much (someone might want to work with their own Python object and prefer to pass just the relevant data, e.g., a field of a class or an array, to the functions). But we need to keep in mind that the Rust functions cannot take just any data structure, and that a conversion is needed if something other than the expected type from our data object is fed to them (e.g., a list instead of an array, or a tuple in the wrong format). So either I check inside the Rust function which data arrive and whether they can be converted to the required types, or I make the user responsible for performing the transformation beforehand. Which is better?
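The first option (check at the boundary, convert only if needed) could look roughly like the sketch below. It is written in plain Python with the standard buffer protocol as a stand-in for the check that would otherwise live in the Rust function; as_f64_buffer is a hypothetical helper name, and "contiguous 1-D buffer of 64-bit floats" is an assumed target type.

```python
from array import array


def as_f64_buffer(data):
    """Return (view, copied): a 1-D memoryview of f64 values and a copy flag."""
    try:
        view = memoryview(data)
    except TypeError:
        view = None  # e.g. list or tuple: no buffer protocol
    if (view is not None and view.format == "d"
            and view.ndim == 1 and view.contiguous):
        return view, False  # already the right layout: no copy
    # Fallback: convert element by element (copies);
    # raises TypeError if an element is not a number.
    return memoryview(array("d", list(data))), True


# Right type already: the view shares memory with the caller's array.
source = array("d", [1.0, 2.0])
view, copied = as_f64_buffer(source)
view[0] = 9.0   # visible in `source`, because nothing was copied

# Wrong type (a list): accepted, but converted and copied.
view2, copied2 = as_f64_buffer([1, 2, 3])
```

With this design the user keeps the flexibility to pass a list, and the copy flag (or a warning in its place) makes the cost visible; the second option would simply raise instead of falling back.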

POTENTIAL PROBLEMS: I feel that conversions between Python and Rust types (and restructuring and copying in general!) should be kept to a minimum for maximum efficiency.

  • P1: I have not yet fully understood how much of an issue type conversions really are, and whether there are clever ways to circumvent them or reduce them to a minimum (e.g., if I change only a few elements). If I run very many loop iterations and change parts of the stored data in each cycle, I suspect that the time spent on conversions adds up.
  • P2: If type conversion can become an issue, is it possible, even in theory, to have an isolated object "living in Python" that holds its data in Rust types, so that the interfaced Rust function accepts it without any type conversion? Otherwise one would need to either i) implement every method one wants to use on the data in Rust, on that object, or ii) if one wants to keep the elements "slim" and the functionality separated, send the data to a Rust program that keeps them in memory and interacts with the other Rust libraries (and which can still be addressed from Python, like a handler that transmits the data object to some server). (Btw: would a Python interpreter written in Rust change anything?)
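One way to get a feeling for P1 before committing to a design is to time the two access patterns against each other. The sketch below does this in pure Python: kernel is a hypothetical stand-in for the interfaced Rust function, and the list-to-array copy stands in for a full conversion at the boundary. It does not measure PyO3 itself, so the numbers only indicate the trend, not the actual cost.

```python
import timeit
from array import array

N = 10_000
python_side = [float(i) for i in range(N)]


def kernel(buf):
    # Stand-in for the interfaced Rust function: reads a few elements.
    return buf[3] + buf[-1]


def convert_every_call(steps):
    """Change one element, then convert the whole list on every call."""
    result = 0.0
    for step in range(steps):
        python_side[3] = float(step)
        result = kernel(array("d", python_side))  # full copy per call
    return result


def keep_converted(steps):
    """Convert once, then change elements in the converted buffer."""
    buf = array("d", python_side)  # one copy up front
    result = 0.0
    for step in range(steps):
        buf[3] = float(step)       # only the changed element is written
        result = kernel(buf)
    return result


if __name__ == "__main__":
    for fn in (convert_every_call, keep_converted):
        seconds = timeit.timeit(lambda: fn(100), number=5)
        print(f"{fn.__name__}: {seconds:.4f} s")
```

Both functions compute the same result; only the amount of data copied per iteration differs, which is the quantity P1 is asking about.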

I hope my description is easy to read. I am still unclear about which conventions and which realization I should pick for my combined Python/Rust project.
What strikes you? Where do I need to cut down? I would be happy to receive feedback, tips and discussion to refine my thoughts. Thank you.
