Apache Arrow and newtype idiom


Can Apache Arrow play nicely with the newtype pattern?


For a while now, I've had half an eye on Apache Arrow and related technologies as a potential replacement for sequence-of-struct in-memory representation and HDF5 persistence, in my scientific number-crunching codes. But I struggle to find the time get a good understanding of the potential pros and cons.

One question that I have repeatedly failed to answer relates to my heavy use of uom to encode physical quantities in Rust's type system. Fundamentally, uom is a glorified newtype pattern whose purpose is to ensure things in the following categories

  • adding distance to time is a compile-time error,
  • dividing distance by time gives a result with velocity type,
  • adding metres to inches automatically takes care of converting between the different but compatible units.

Would using Arrow allow me to continue to use uom types ergonomically?

I don't see how Arrow would relate to this, so I don't see any obvious ways in which it would make the usage of uom any harder by itself.

I don't see how the types mentioned here allow me to express the difference between metres and kiloelectronvolts.

Specifically, arrow documentation is full of things like Float32Array which, for me, have to represent a limitless number of different types.

uom allows me to write code like this:

fn foo(d: Length, t: Time) -> Speed { d / t }

How would Arrow's Float32Array allow the compiler to keep track of the difference between Length, Time and Speed?

It doesn't. You wouldn't use Arrow for that. Arrow is an in-memory representation for fast analytic queries, not unlike a database. It's not a domain modelling framework.

You'd have to convert between the dynamically-typed Arrow representation and your domain types at the API boundaries.

1 Like

Which was exactly my fear.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.