Request for early design review - General Intervals crate

Hello All!

As background, I am working on a Link Budget simulation tool in Rust /w MIT license - I've built these before at prior employers. This time I want to carry signals through the system as piecewise step functions - and I want to make liberal use of units of measure to ensure computation units throughout (super helpful! Wish I had something similar in prior link budget simulators). But a piecewise step function implementation really wants a rigorous concept of interval representations - and when I look at what libraries are available in almost any language of choice (C++ / Haskell / Rust / you name it) I find the generality is lacking or the use is more dangerous than it could be (See my RFC discussion of Alternatives to writing a new crate). So I'm taking a crack at an intervals_general crate.

If you made it this far - thanks! Wondering if parties are open to providing me discussion on the design doc (presently the landing README.md in my github repo) - and some early prototype code against the implementation goals. I've got it up on Github now Here.

Thanks for taking a look!

Nice proposal!

My association with interval arithmetic goes back to the early formulations in the mid-1960s, when proponents decried the hexadecimal floating point of the then-new IBM 360 architecture. I worked again with it in the early 1970s, concurrent with the genesis of IEEE floating point and its interval-arithmetic-supporting rounding modes.

My recommendation is to start with the full Interval Enum variant taxonomy and to offer two sets of constructors: strict (your alternatives 1 and 3) and bound-order-independent (your alternatives 2 and 4), both returning an Option. Your alternative 5 is disjoint, so can be added to (or deleted from) the Interval Enum at any time before crate stabilization.

Given the mathematically-discontinuous nature of all finite-precision floating point representations, I do not consider strict enforcement of interval width to be generally achievable, so I would ignore that issue during initial development.

Whether or not to change the Enum variant on overflow/underflow is a design choice somewhat impacted by the method signature. I recommend using a feature flag to code both ways, then see if one variant has a significant run-time or code-size or usability penalty relative to the other.

Again, nice proposal! I am particularly pleased that it uses uom; unfortunately not many crates do. Errors in units of measurement have led to some spectacular failures, including NASA's Mars Climate Observer.

2 Likes

Thanks for the review Tom!

Really valuable to have floating point experts about - there is a lot to learn there w.r.t. corner cases (so often float behavior does not match intuition - I presume because it is not actually reasonable to have performant representation that is a true behavioral equivalence with real numbers+). As I snag the tip of that iceberg with this general_intervals crate design I got nervous. I really want to avoid assertions / exceptions if at all possible, but that becomes substantially more burdensome (as one example I think I would need to add support for num-traits Checked operations to detect underflow / overflow in Units of Measure?).

My recommendation is to start with the full Interval Enum variant taxonomy and to offer two sets of constructors: strict (your alternatives 1 and 3) and bound-order-independent (your alternatives 2 and 4), both returning an Option. Your alternative 5 is disjoint, so can be added to (or deleted from) the Interval Enum at any time before crate stabilization.

I don't know why - but I had gone into this thinking I had to settle on just one constructor. I guess because I thought that StructName::new() was such a "reserved keyword" (muscle memory if nothing else) and therefore you needed just one approach. Is it normal to offer multiple constructors and if so what is a nominal naming convention? I'll go dig through some Crates looking - but would it be something like:

  • BoundPair::new_strict_by_bounds(left_bound: T, right_bound: T) -> Option<BoundPair> // Alternative 1
    • Option returns None if left_bound >= right_bound
  • BoundPair::new_relaxed_by_bounds(one_bound: T, other_bound: T) -> Option<BoundPair> // Alternative 2
    • Swaps one_bound, other_bound if necessary when forming left_bound, right_bound
    • Option returns None if one_bound == other_bound
  • BoundPair::new_strict_left_and_width(left_bound: T, width: T) -> Option<BoundPair> // Alternative 3
    • Option returns None if width <= 0
  • BoundPair::new_relaxed_left_and_width(one_bound: T, width: T) -> Option<BoundPair> // Alternative 4
    • one_bound left / right-ness determined by positive vs negative width
    • Option returns None if width == 0

I had hoped that the relaxed alternatives would remove the Option needs, and they almost do - but the Singular Interval scenarios seem to be ruining that? At least if I intend to pass a BoundPair into the Interval Enum variant constructors (as there is no error handling possible in Enum construction?).

Instead of error checking exclusively in the BoundPair construction - I had wondered if there was a method to do so in the Enum constructors? But I wasn't sure what that would look like (would I need a constructor per variant? And could it then emit a differing type of Enum Variant than requested if for example bounds indicated a Singleton? But are we then hiding from the developer the fact that this was a Singleton when they though it was e.g. LeftHalfOpen?).

Given the mathematically-discontinuous nature of all finite-precision floating point representations, I do not consider strict enforcement of interval width to be generally achievable, so I would ignore that issue during initial development.

I worried about this - good to hear another opinion from a Floating Point expert. I could at least protect against width e.g. underflow / overflow if I manage to get Checked operations working? I'll de-prioritize dealing with this for now (which was sort of where I had settled in fear of deadlocking and not moving forward due to ratholes).

Whether or not to change the Enum variant on overflow/underflow is a design choice somewhat impacted by the method signature. I recommend using a feature flag to code both ways, then see if one variant has a significant run-time or code-size or usability penalty relative to the other.

I had not considered feature flags and I like that option. I'll add this to the design doc! Thank you.

Again, thanks for your time Tom!

That depends on the use cases that you want to support. I often use it when I have what I consider a default simple case, which gets the default name new, as well as more complex cases that need additional setup before use. The purpose of my comment here was simply to point out that you could develop all your mentioned alternatives, then prune before release if you decide to reduce the API surface.

I don't see how you can avoid providing some kind of "invalid use" feedback. Even adding an Invalid Interval Enum variant (your alternative 5) will just lead to users checking for this variant as a means of usage-error detection.

I feel you should detect underflow/overflow. Depending on the feature gate, you could either adapt the interval accordingly or return some sort of error. As in my earlier post, I question the utility of width as an interval attribute that can be meaningfully propagated, given the non-associative, non-distributive nature of finite-precision floating-point computation when loss-of-significance is occurring
(e.g., (x - y) * z ≠ x * z - y * z when x is numerically within 𝜖 of y)

Note that the Interval enum variant taxonomy from proofwiki is missing both unbounded singletons:
(-inf] and [inf). I find that taxonomy to be more problematic than helpful. Why not permit your bounds to include floating-point INF and -INF, and decide whether the distinction between open and closed intervals is of significant value? (That distinction clearly is potentially most useful at ±0.0, but is that sufficient to warrant the complexity of tracking openness/closedness vs simply detecting underflow and (maybe) denormalized results?)

As for constructors, for me only two constructors make sense: one using a pair of strict bounds including INF and -INF (e.g., (70, 80) km), and the other using a nominal (center) value and a symmetric error tolerance (e.g., 75 ± 5 km). Of course which of those constructors are relevant depends on your use case(s).

1 Like

Thanks again Tom! I'm going to need some time to digest and edit the design doc. I'm also working on the prototype piecewise function crate to see how use of Intervals goes.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.