Over-engineered? Generics over item type and item ownership


I’m working on an open-source library for tensor network computations for quantum physics and other sciences. I’d be grateful for some feedback regarding the following architectural ideas.

One basic concept of the library is a tensor, a kind of generalized n-dimensional matrix that can have various internal representations (dense, sparse, but there are also others).

One basic operation is contraction of two tensors, which is a generalization of matrix multiplication. It is important that contractions between different kinds of tensors are optimized, i.e. dense-dense, dense-sparse, sparse-sparse, etc. I'm considering methods that allow probing the capabilities of a tensor (i.e. whether it has a dense representation, whether it has a sparse representation). I might also define a hierarchy of traits and use that.
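A rough sketch of what such capability probing could look like (all names here are hypothetical, invented for illustration):

```rust
/// Hypothetical sketch of runtime capability probing on a tensor trait.
/// Default implementations return None; each representation overrides
/// the accessor it can actually provide.
trait Tensor {
    /// A flat dense view, if this tensor has a dense representation.
    fn as_dense(&self) -> Option<&[f32]> {
        None
    }
    /// (flat index, value) pairs, if a sparse representation exists.
    fn as_sparse(&self) -> Option<&[(usize, f32)]> {
        None
    }
}

struct DenseTensor {
    data: Vec<f32>,
}

impl Tensor for DenseTensor {
    fn as_dense(&self) -> Option<&[f32]> {
        Some(&self.data)
    }
}
```

A contraction routine could then probe both operands and pick the dense-dense, dense-sparse, etc. kernel accordingly.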

The other basic concept is a tensor network, that is a graph of interconnected tensors that offers various algorithms that transform the network while preserving some invariants. For example there would be various algorithms to contract a network into a single tensor.

I would like to employ Rust generics for performance reasons, but I also don't want to overdo the generality.

For example the whole library could be made generic over the floating point type, but I think that this is not necessary. That’s why I’m leaning towards using cargo features for specifying the floating point type: by default all computations would be done in f32 (say), and if the cargo feature “f64” is activated, all computations and data will switch to f64. Does this sound like a good idea?
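The feature-based approach could be as small as a single feature-gated type alias, something like (the `Float` name is made up):

```rust
// Sketch: the whole crate computes in `Float`, selected once at compile
// time by a cargo feature instead of making every API generic.
#[cfg(feature = "f64")]
pub type Float = f64;

#[cfg(not(feature = "f64"))]
pub type Float = f32;
```

Every internal signature then uses `Float`, and no `num-traits` bounds ever appear in the public API.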

Here are two cases where I think the use of generics is justified:

  • It is crucial that a network can hold a mixture of different tensor types. But in cases where the item type is fixed (all tensors in a network are dense, say), knowing this at compile time could save dynamic lookups of capabilities.

  • A network does not modify its tensors. It may no longer need some, or replace some. So tensors could be shared over networks within one thread or even across threads. But it feels overkill to use Arc<dyn Tensor> everywhere. After all quite often tensors will be created specifically for a given network and the network could own them.

In order to address both requirements I'm considering using Deref<Target = T> where T: Tensor as the item type of the network. Then one optimized network could hold items of type Box<DenseTensor> and another, more general network could hold Arc<dyn Tensor>.
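In code, the idea I have in mind looks roughly like this (a minimal sketch with invented names, using rank as a stand-in for real tensor operations):

```rust
use std::ops::Deref;
use std::sync::Arc;

trait Tensor {
    fn rank(&self) -> usize;
}

struct DenseTensor {
    shape: Vec<usize>,
}

impl Tensor for DenseTensor {
    fn rank(&self) -> usize {
        self.shape.len()
    }
}

// The network is generic over any pointer-like item that derefs to a Tensor.
struct Network<P>
where
    P: Deref,
    P::Target: Tensor,
{
    nodes: Vec<P>,
}

impl<P> Network<P>
where
    P: Deref,
    P::Target: Tensor,
{
    fn total_rank(&self) -> usize {
        self.nodes.iter().map(|p| p.rank()).sum()
    }
}
```

With this, `Network<Box<DenseTensor>>` is fully monomorphized (no dynamic dispatch), while `Network<Arc<dyn Tensor>>` is the shared, heterogeneous variant.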

Does this sound reasonable?

Is there a way to avoid using Box for the case where the network owns the tensors? I think I could define something like PseudoBox<T> that is simply a wrapper type and doesn’t put its payload on the heap. But is this even a good idea? Does something like this already exist, or is there a better way to achieve that?
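For what it's worth, the wrapper I have in mind would be a few lines; as far as I know std has no identity-Deref wrapper, so it would have to be defined in the library (names hypothetical):

```rust
use std::ops::Deref;

/// Sketch of the "PseudoBox" idea: a transparent wrapper that owns its
/// payload inline (no heap allocation) but still implements Deref, so it
/// can be used wherever the network expects a pointer-like item.
struct Direct<T>(pub T);

impl<T> Deref for Direct<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.0
    }
}
```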


For some prior art, look at ndarray

One thing to consider is whether or not you want to encode the tensor dimensions in the type system. On the one hand, that would let the compiler reject operations between incompatible tensors. On the other, you might end up with code bloat if the compiler emits a separate copy of every operation for the different tensor sizes in the program.

Thanks, but I know ndarray of course. I've been reading its source.

My library is going to use ndarray as a dependency, but it sits at a conceptually higher level. Like https://itensor.org/, but with an alternative (and superior, I believe) approach to indexing.

Yeah, I'm not going to encode the number of dimensions in the type system (like the nalgebra crate). The low-level approach is really more like the one of ndarray/NumPy.

I've been on this project for over a year now. I've iterated through several designs using a Python prototype. Currently I'm reflecting on implementation details in Rust - see my questions above.

I'll try to be more concrete.

I meant rather ndarray::ArrayD, because the ndarray crate also has fixed-dimensional arrays.

A simple model of a tensor network in Rust would be a petgraph::graph::UnGraph that holds ndarray::ArrayD for each node in some way (either directly, or wrapped as an Rc, Arc, or whatever).

Now I'd like to be able to implement algorithms that transform such a graph and make them generic over the kind (and even use or not) of smart pointer. (In this simple model no information is stored about the indices and how to contract them, but this is a detail that can be ignored for now.)

Most likely I'd like to follow the example of petgraph by defining a couple of network types that fulfill common traits, and algorithms that each require some of these traits. The difference from petgraph is that I need to work on weights (arrays) that - depending on the situation - could be stored differently (as Arc, Rc, Box, owned directly, ...).

Is the set of tensor types (dense, sparse...) known in one central place at compile time? If so, you could also try an enum-based approach as an alternative to a trait/generics approach:

enum Tensor {
    Dense(DenseTensor),
    Sparse(SparseTensor),
    // ...one variant per representation
}

As I wrote here a long time ago, the tradeoff between trait objects and enums has many sides. To paraphrase my old comment with some extra considerations applicable to your specific problem...

  • With enums, you don't need to Box and Arc everything (although you will likely want a Box or Arc somewhere for the actual floating-point data, at least for larger tensors. For smaller tensors, something like the smallvec approach may allow you to avoid those tiny allocations that are more expensive than actually manipulating the tensor behind them.)
  • In exchange for this avoidance of heap allocation, you will pay the price of having every individual Tensor whose subtype the compiler does not infer use a slot on the stack which is as large as the largest Tensor subtype. This can become a problem if the subtypes have widely different stack sizes.
  • With enums, dispatching over the right contraction implementation becomes as simple as a match (tensor1, tensor2), which can be compiled into a simple jump table when the compiler's optimizer is in a good mood. Code for this will also be a lot easier to read than an emulation of multiple dispatch over Rust traits, which only provide single dispatch.
  • Empirically, the compiler optimizer seems better at "de-enumifying" than devirtualizing traits in contexts where it actually knows at compile time which type of tensor you're dealing with.
  • In terms of code quality and cache locality, the trait approach tends to break down when you have too many methods, whereas the enum approach tends to break down when you have too many tensor types.
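To illustrate the `match (tensor1, tensor2)` dispatch (a standalone sketch with made-up payload types, not your actual representations; the function just reports which kernel would run):

```rust
// Simplified stand-ins for the real tensor representations.
enum Tensor {
    Dense(Vec<f32>),
    Sparse(Vec<(usize, f32)>),
}

/// Double dispatch over tensor kinds via a single exhaustive match.
/// A real implementation would call the optimized kernel in each arm.
fn kernel_name(a: &Tensor, b: &Tensor) -> &'static str {
    match (a, b) {
        (Tensor::Dense(_), Tensor::Dense(_)) => "dense-dense",
        (Tensor::Dense(_), Tensor::Sparse(_)) => "dense-sparse",
        (Tensor::Sparse(_), Tensor::Dense(_)) => "sparse-dense",
        (Tensor::Sparse(_), Tensor::Sparse(_)) => "sparse-sparse",
    }
}
```

Note that the match is exhaustive, so adding a new variant forces you to handle every new pairing, which is a nice safety net compared to emulated multiple dispatch.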

Concerning f32 vs f64 genericity, as we discussed previously I think that's extremely expensive in terms of code complexity with the current state of num-traits, which is why I try to get as much mileage as I can out of the feature-based approach until it breaks down and I am absolutely forced to implement such genericity.


Also, as a more general point, I think using generics to avoid the performance costs of dynamic dispatch only makes sense if you are doing lots of "small" operations, whose runtime cost is comparable to or much cheaper than that of dynamic dispatch. With "larger" operations, the cost of dynamic dispatch should become invisible in the face of other runtime costs in your program.

For example, in linear algebra, it makes sense to have genericity over matrix size (like nalgebra) when optimizing for the performance of small matrices, because this saves you from the dynamic allocations, dynamic dispatch points, and unspecialized code (which the compiler tends to tune for large problems only) of a purely dynamic approach. But if you are only or mostly dealing with large matrices, a purely dynamic approach along the lines of ndarray or BLAS will be simpler in terms of code, compile faster, and provide good enough runtime performance.

Sometimes, a hybrid like smallvec also makes sense. In this approach, you have a general case which uses dynamic constructs, but have optimized a few special cases with static constructs. Every time a method is called, you check if one of the static specializations is applicable, and if so you use it instead of the general code. But beware that this approach only makes sense when you are often mixing objects and operations of different "sizes":

  • If most of your objects are well handled by the general code path, then writing static specializations and checking for them at runtime is unnecessary development, compilation and runtime checking overhead.
  • If most of your objects need the static code path for performance, then better performance is achievable with a purely static approach like nalgebra, because you can skip the overhead of the entire checking/dispatch machinery used to dispatch from the dynamic case to the statically optimized cases.
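In case it helps, here is the hybrid pattern in miniature (a sketch of the smallvec idea applied to a tensor's data buffer; the inline capacity of 8 is arbitrary):

```rust
/// Storage that inlines small buffers on the stack and heap-allocates
/// large ones, smallvec-style.
enum Storage {
    Inline { len: usize, buf: [f32; 8] },
    Heap(Vec<f32>),
}

impl Storage {
    /// Runtime check at the dispatch point: small data takes the static
    /// (inline) path, everything else falls back to the general (heap) path.
    fn new(data: &[f32]) -> Self {
        if data.len() <= 8 {
            let mut buf = [0.0; 8];
            buf[..data.len()].copy_from_slice(data);
            Storage::Inline { len: data.len(), buf }
        } else {
            Storage::Heap(data.to_vec())
        }
    }

    fn as_slice(&self) -> &[f32] {
        match self {
            Storage::Inline { len, buf } => &buf[..*len],
            Storage::Heap(v) => v,
        }
    }
}
```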

Performance isn’t the only reason to choose a strongly-typed API, though. Compile-time correctness verification can also be a strong motivator. If I’m producing matrices that are incompatible with each other, it’s much nicer for the compiler to tell me up front that I can’t multiply them together than to have my program halt with a panic halfway through its calculation.

Whether this advantage is enough to balance out all of the downsides you’ve listed is a tricky question, and probably one that doesn’t have a definitive correct answer, but it should be considered during the analysis— The right approach may even be a hybrid one that exposes a strongly-typed interface to a dynamic calculation core.