Inversion-of-control resource management

I don't understand. If that is the trait interface and I call it with two mutable iterators from the same collection, won't the compiler complain?

I would guess that because the pointer arithmetic requires a raw pointer, it does not get the same safety checks as other references. I am not sure all collections would need unsafe code (iterating a linked list, for example, may not), but for a slice (the Rust equivalent of an array) we need pointer arithmetic, which means some unsafe code in the increment and dereference functions:

struct SliceIterator<T> {
    ptr: *mut T,
}

impl<T> SliceIterator<T> {
    fn increment(&mut self) {
        unsafe { self.ptr = self.ptr.offset(1); }
    }

    fn deref(&self) -> &mut T {
        unsafe { &mut *self.ptr }
    }
}

But that's just the implementation of a specific iterator. The generic methods and traits have no unsafe code, and you would need unsafe code to implement a collection like a vector or an array in the first place. My aim here is to replicate the EoP algorithms as they are for the moment. I have some ideas about adding safety checks at only specific points that would make the library safe to use, although it will require the library implementation to be correct. This follows my general rule that expert library implementors get to do unsafe things internally, but expose safe interfaces to the user (if I can make that bit work, of course).
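To make this concrete, here is a self-contained sketch of how such an unsafe slice iterator can be driven by a generic algorithm that contains no unsafe code at all. The `new` constructor, the one-past-the-end sentinel convention, and the `fill` algorithm are my own illustrative additions, not from EoP or the post above:

```rust
struct SliceIterator<T> {
    ptr: *mut T,
}

impl<T> SliceIterator<T> {
    // Produce a (begin, end) pair of iterators for the slice; `end` is
    // the usual one-past-the-end sentinel.
    fn new(slice: &mut [T]) -> (SliceIterator<T>, SliceIterator<T>) {
        let begin = slice.as_mut_ptr();
        let end = unsafe { begin.add(slice.len()) };
        (SliceIterator { ptr: begin }, SliceIterator { ptr: end })
    }

    fn increment(&mut self) {
        unsafe { self.ptr = self.ptr.offset(1); }
    }

    fn deref(&self) -> &mut T {
        unsafe { &mut *self.ptr }
    }
}

// A generic EoP-style algorithm over the half-open range [first, last):
// note there is no unsafe code at this level.
fn fill<T: Clone>(mut first: SliceIterator<T>, last: &SliceIterator<T>, value: T) {
    while first.ptr != last.ptr {
        *first.deref() = value.clone();
        first.increment();
    }
}
```

Of course, nothing stops a caller from constructing two overlapping mutable iterators here; that is exactly the safety gap being discussed in this thread.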

It seems to me that enforcing mutability exclusiveness at the low level is not desirable. There will always be unsafe code. There is no such thing as a compiler that only produces safe code. Semantic mutability bugs and "memory leaks" will still occur with Rust. The only way to improve matters is high-level encapsulation, as the examples I provided in this thread illustrate.

I actually agree with that :slight_smile: I think component writers are responsible for providing safe components that users can use without worrying about safety. Rust's collections, for example Vec, cannot be implemented purely in safe code.

This is why Rust's unsafe block does not propagate into the type system like a monad in Haskell. If you consider impurity to be Haskell's version of unsafety, you can see in Haskell you can only call monadic code from other monadic code, and the top level has to be monadic.
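A tiny illustration of that contrast (my own example, not from the thread): this function contains an unsafe block internally, yet its signature is an ordinary safe one, so callers need no `unsafe` and nothing propagates into their types:

```rust
// Safe wrapper around an unchecked slice access.
fn first_byte(bytes: &[u8]) -> Option<u8> {
    if bytes.is_empty() {
        None
    } else {
        // Sound because the bounds were just checked above.
        Some(unsafe { *bytes.get_unchecked(0) })
    }
}

// An ordinary safe caller; `unsafe` appears in neither its body nor its type.
fn caller() -> Option<u8> {
    first_byte(b"hi")
}
```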

I think it is good to prevent the user making such mistakes without being aware they are doing something dangerous, so for me Rust strikes a good compromise between the C/C++ approach, where everything is unsafe, and the Haskell approach, where impure code can only be called from impure code.

Rust's idea of "unsafe" is pretty much C's idea of "undefined behavior." Memory leaks and other nastiness are undesirable, but only dangling pointers, memory races, exclusive pointer aliasing, and invalid primitive types can cause demons to fly out of your nose (BTW, this list is significantly shorter than the equivalent list for C; it was chosen by the Rust devs to get maximum bang-for-buck in terms of compiler optimizations).

Maybe I'm just too blind to see it, but just because Rust's mutability rules do not fix all bugs does not make them useless; they rule out the insidious kind of bug that produces effects with no relation to the source code, and they do it without requiring five times as much memory as languages that don't.

In other words, it seems low-level because it is.

How do I in Rust allow two mutable iterators into the same collection so I can write algorithms that require them? Afaik, the mutability checker assumes those two can cause problems, because the checker is too low level and can't determine the semantics that ensure safety.

Upthread I devised a way to encapsulate the mutable iterators so I could potentially satisfy Rust's compile-time check, but it is still going to require some complex code (an inefficient closure design pattern, similar to my use of yield upthread to invert the control for iterators, which @keean noted made Ada slower) that wouldn't be necessary using @keean's approach. And at no other gain.

@keean argues that the unsafe escapes from mutability checking only need to be in the libraries, but I am doubting that; for example, consider wanting to receive an array of mutable iterators as indices in userland code. Rather, it seems the iterators themselves have to enforce safety. As for multi-threaded access safety, we then need immutable collections, which does increase memory and CPU load. But I don't see Rust's low-level mutability checking providing any solution for shared, multi-threaded access to a collection. I'm pondering whether mutability is a higher-level semantic issue that can't be typed (modelled) with low-level scopes. Do you disagree with my hypothesis that low-level scopes can't model higher-level encapsulation of mutability safety?

Get disjoint views of the collection and iterate over those. The standard library already has a variety of "split" methods for plain slices, for example, and the number of ways to get disjoint views like that is certainly not going to go down.
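For instance (a sketch of my own, using the real split_at_mut from the standard library), two mutable views of the same slice can be live at once, because the split guarantees they are disjoint:

```rust
// Add each element of the front half into the corresponding element of
// the back half; both views are borrowed from the same slice at once.
fn add_front_to_back(v: &mut [i32]) {
    let mid = v.len() / 2;
    let (front, back) = v.split_at_mut(mid);
    for (f, b) in front.iter().zip(back.iter_mut()) {
        *b += *f;
    }
}
```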

Or you can just use indices. If your algorithm's access pattern isn't just a forward or reverse walk, then that's probably what you need anyway. You pay the bounds check penalty, but if you haven't proved to the compiler that all accesses are in-bounds, then that's what we want it to do.
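As a sketch of the index-based alternative (my own example): an in-place reverse written with indices, where every access pays the bounds check:

```rust
fn reverse_by_indices<T>(v: &mut [T]) {
    let n = v.len();
    for i in 0..n / 2 {
        // Each call is bounds-checked; swap panics if an index is out of range.
        v.swap(i, n - 1 - i);
    }
}
```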

+1 Lol. Good one. I'll probably borrow that phrase.

That won't suffice for efficiently reversing a collection in place. Afaics, it also won't allow, for example, a filtering algorithm that operates on a sliding window of the data, e.g. bicubic interpolation.

This is duplication of work, and thus performs poorly. If a function has already determined the bounds and wants to return iterators, then you propose throwing away information and returning only indices. And if the underlying data structure is not O(1) for random access (e.g. a list), then the indices cause duplicate work by traversing the data structure more than once.

Afaics, the generative essence appears to be that scoping mutation at brace-delimited blocks (which thus includes function boundaries) is not resonant with the generalized semantics of mutation invariants. For example, a mutable iterator has a precondition disallowing two instances pointing at the same element in a multi-threaded context, and in most data structures even in the single-thread context there may be a precondition disallowing certain mutable operations on the collection while there exists a mutable iterator instance. Yet there is not a precondition disallowing two or more mutable iterators to modify elements of a collection in a non-threaded context. Mutation lifetime scoping at blocks can't model these invariants. Afaics, the only way to model those stated invariants is semantically in the design of the iterator library.

I am contemplating that Rust's compile-time checking of lifetimes (not just for mutation but also for memory deallocation) at the block scope, is probably incongruent with many of the general semantics programmers need to express. If that is true, then it means we are littering the code with lot of type annotations which have very low utility. And the problem is that for example, lifetime parameters end up being required every where (even if they are elided by the default ad hoc rules). That is why I am pondering about localizing the borrow checker to just function bodies, so that we can get some of its benefits without having lifetime checking grow tentacles into everything where it can't even model general semantics. Note that none of my comments are intended to advocate changing Rust (unless of course the super majority of the community somehow decided to, without me pushing it because I am not going to be the one to rock the boat and upset so many people who have invested in a lot of code already). I am trying to figure out what I want and then I have to figure out how to get what I want. I am sharing my thoughts and investigation of this matter because I believe in open sourcing ideas and providing a permanent record for others to access (including for myself to refer back to).

As @keean wrote, there can be a difference of priorities and focus that I have to respect and I do. But perhaps it could also be the case that those priorities are not rationally justified. I haven't yet answered that open question (and I may not be able to since omniscience is impossible and even getting a good cross-section of experience is time consuming and also maybe too burdensome on other readers here and their priorities).

While all of these discussions are interesting I'm thinking that may be better to have them on the internals list:

"Here is the hub for discussing everything related to the implementation and design of the Rust programming language"

Thanks. That makes sense. My mistake for not being aware of that forum. If I create any new threads like this one (very unlikely), I'll create them there.

P.S. I think I've already made a decision. But I'll be paying attention to counter examples and such. Thanks.

Discussion from private conversation that I think expounds on my prior reply to @notriddle:

But @keean's point was that the mutability checking would work fine in userland code and that only the library programmer would need to write unsafe code, but my example in my reply to @notriddle pointed out that is not true. So the mutability checking of Rust just ends up forcing unsafe code in userland code as well. Afaics, the "house of cards" (of globally disallowed mutability aliasing) comes crashing down.

That is why I asked you:


But afaics disallowing it always by default causes annotations of unsafe on that which might not be semantically unsafe.

There are other ways to get those same and even more aggressive optimizations such as declaring the data to be immutable.

I have no qualms with an optional annotation that disallows aliasing.

I presume you mean compile-time protection.

Perhaps this is the reason my proposed encapsulation by inverting the control with yield might be superior to @keean's design pattern employing instances of iterators? But still we'd need iterator instances to pass around positions in a collection. Seems we'd need to guarantee that the world can't get a mutable reference to the collection while an iterator instance exists. That sounds like dependent-typing, which is not Turing complete.

The only "solution" I can think of is the iterators should panic! or print a log file error entry.

Yes it will:

use std::mem::swap;

fn reverse<T>(slice: &mut [T]) {
    let len = slice.len();
    let (front, back) = slice.split_at_mut(len / 2);
    for (f, b) in front.iter_mut().zip(back.iter_mut().rev()) {
        swap(f, b);
    }
}

You can either iterate over the entire list repeatedly with chunks_mut, or use some streaming-iterator implementation (that is, a modified iterator that ties the lifetime of the values to the iterator itself). The latter has better spatial locality; the former is in the standard library.
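To sketch the chunks approach (my own example, using chunks_exact_mut from the standard library): a width-2 window can be approximated with two passes over disjoint mutable pairs, one pass at offset 0 and one at offset 1, so borrowck is satisfied in both passes:

```rust
// Add each element's right-hand neighbour into it, pairwise: the first
// pass covers pairs starting at even indices, the second pass (over the
// slice shifted by one) covers pairs starting at odd indices.
fn add_right_neighbour(v: &mut [i64]) {
    for pair in v.chunks_exact_mut(2) {
        pair[0] += pair[1];
    }
    if v.len() > 1 {
        for pair in v[1..].chunks_exact_mut(2) {
            pair[0] += pair[1];
        }
    }
}
```

Note that the second pass observes the first pass's writes, so this only approximates a true simultaneous sliding window; the streaming-iterator approach avoids that.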

Go ahead, but I didn't invent that phrase. It goes back to 1992 as an example of a valid result of performing undefined behavior in a C program (Rust has the same lack-of-semantics for UB, but fewer and more clearly marked ways to invoke it).

We're talking about a bounds check; people write performant Java programs with fewer ways to avoid them than Rust has.

Except for the Linked Lists example. In that case, we probably could use a RefCell-esque mutable Iterator, like you proposed later. But that adds a highly-predictable branch, just like an index does, so it's not all that useful for a Vec. In fairness, though, linked lists are pretty much only useful for stacks and queues, because heavily iterating over them is a very cache-unfriendly operation even if you do it right.
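A sketch of what such a RefCell-esque mutable iterator might look like (the names and API here are my own invention): the static exclusivity check is replaced by a dynamic borrow flag checked on every access, and a conflicting access panics; that check is the highly-predictable branch.

```rust
use std::cell::RefCell;

// A runtime-checked mutable "iterator": it holds a shared reference to a
// RefCell'd vector plus a position, and every access performs a dynamic
// borrow check (panicking on conflict) instead of a static one.
struct CellIter<'a, T> {
    data: &'a RefCell<Vec<T>>,
    pos: usize,
}

impl<'a, T: Clone> CellIter<'a, T> {
    fn get(&self) -> T {
        self.data.borrow()[self.pos].clone()
    }

    fn set(&self, value: T) {
        self.data.borrow_mut()[self.pos] = value;
    }

    fn advance(&mut self) {
        self.pos += 1;
    }
}
```

Two such iterators into the same vector compile fine; exclusivity is enforced only at each individual get/set.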

Strictly speaking, the Rust team is working on a rewrite of half of rustc that will have a better borrowck than the current one. It won't be able to solve all the problems you're having, but it will result in fewer dumb scopes added just to make a mutable borrow go out of scope. It's not tied to block scopes.

Actually, you're not allowed to do that in any context. The aliasing rules allow optimizing functions like this:

// For some type T with methods a() and b():
fn killjoy(x: &T) {
    x.a();
    x.b();
}

killjoy can load the contents of x into a CPU register once and call both methods on it, because the object cannot be mutated while the shared reference exists. Classic example of nasal demons if you use unsafe code to mutate it anyway.

I should hope that all annotations of unsafe would be semantically safe. That's the whole point.

Is this a joke :slight_smile: Every Java service I have ever been unfortunate enough to be involved with professionally has been a massive resource hog. We have far fewer problems with Python, and even fewer with Node.js. You cannot do fast signal processing in Java on mobile devices either, but it is possible in native 'C'.

use std::mem::swap;

fn reverse<T>(slice: &mut [T]) {
    let len = slice.len();
    let (front, back) = slice.split_at_mut(len / 2);
    for (f, b) in front.iter_mut().zip(back.iter_mut().rev()) {
        swap(f, b);
    }
}

This looks like a good example for my 'why functional coding style is bad' folder :slight_smile: . How would you implement an in-place quicksort? Actually it's probably not that hard; I think swapping odd and even elements is maybe more pathological for this style, though.
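It is indeed not that hard. Here is a sketch of my own (not from the thread) of an in-place quicksort in this style, using an index-based Lomuto partition and then split_at_mut for the two disjoint recursive calls:

```rust
// In-place quicksort: partition with bounds-checked index swaps, then
// recurse on the two non-overlapping mutable halves from split_at_mut.
fn quicksort<T: Ord>(slice: &mut [T]) {
    if slice.len() <= 1 {
        return;
    }
    let pivot = slice.len() - 1; // use the last element as the pivot
    let mut store = 0; // boundary of the "<= pivot" region
    for i in 0..pivot {
        if slice[i] <= slice[pivot] {
            slice.swap(i, store);
            store += 1;
        }
    }
    slice.swap(store, pivot); // move the pivot into its final position
    let (left, right) = slice.split_at_mut(store);
    quicksort(left);
    quicksort(&mut right[1..]); // skip the pivot itself
}
```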

I think the interesting thing is that a slice is effectively a pair of iterators defining a range, so I suspect you can mutably borrow a range and then operate on that.

Afaics, my point not addressed by @notriddle's reply is that a set of two mutable iterators in one borrow is afaics not a generalized abstraction. What if my algorithm needs three mutable iterators or two mutable ranges?


I didn't keep up with such arcana. Now I see John F. Wood employed colorful language. :laughing:

My point is, afaik (and I may be incorrect on this assumption), that unsafe annotations leak/litter into userland code that is semantically safe, or which can't be made compile-time safe. In either case, it seems we are fighting with the type system. My vision of a type system is one that we almost never have to cast away. I hate corner cases. They complicate everything from documentation to understanding to readability to consistency of executable and community. I hate incidental complexity.

That being said, nothing is perfect in life. And Rust's use cases and community may have good rational justifications for such priorities. I don't wish to interfere. I would like to know if any of my understanding is incorrect.

... more splitting? We're talking about a function that takes a slice, and getting two slices back. You can then split those slices in half, too, until you're just splitting a slice with one or zero items. :slight_smile: The Rust standard library has a pure safe implementation of binary search done like this.
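A sketch of that idea (my own simplified version, not the actual std code): binary search by repeated splitting, where each step splits the slice at the middle and keeps only the half that can contain the target:

```rust
use std::cmp::Ordering;

// Returns the index of `target` within `slice`, or None if absent.
// `slice` must be sorted.
fn binary_search_split<T: Ord>(mut slice: &[T], target: &T) -> Option<usize> {
    let mut base = 0; // index of slice[0] within the original slice
    while !slice.is_empty() {
        let mid = slice.len() / 2;
        let (front, back) = slice.split_at(mid);
        // `back` is non-empty, because mid < len whenever len > 0.
        match target.cmp(&back[0]) {
            Ordering::Equal => return Some(base + mid),
            Ordering::Less => slice = front,
            Ordering::Greater => {
                base += mid + 1;
                slice = &back[1..];
            }
        }
    }
    None
}
```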

Python does bounds checking, too. I'm going to take a wild guess that the Java services you've had the misfortune of working with were enterprise software, and were slow despite running on HotSpot, not because of it.

Pointer chasing that comes with using way too many objects will kill you, as will thrashing the GC by spawning way too many short-lived objects.

Okay, I didn't understand that the implementation employs a raw pointer, which turns off all borrow checking. That is why I thought it was employing a borrow-checked pair; but now I understand it instead turns off all safety in userland code.

If the implementation allows overlapping ranges then it is no longer safe. Perhaps some algorithm may require overlapping ranges.

The borrow checking is doing nothing to help ensure safety with multiple mutable iterators. We are relying on the semantics of the implementation, which the borrowck can't check.

--- no reply required to the following, just expressing my philosophy as it currently stands ---

What I am trying to explain is that, philosophically, I support a PL design principle that type systems are valuable when they aid documentation, clarify, simplify, and promote consistency, not when they must be routinely avoided and cast away. Type systems can't and shouldn't have the goal of catching all bugs. Of course I want a compiler to check for errors that don't require a globally enforced contract which is routinely cast away (especially silently).

Afaics, global lifetime and borrow checking is fundamentally fighting against the fact that the universe is composed of partial orders, not a total order. And for what gain? To guarantee non-aliasing contracts where it hasn't been SILENTLY turned off with raw pointers. Where is the consistency, clarity, etc? Afaik, the significant gains from tracking mutability come from partitioning to enable parallelism or immutability which enables concurrent scaling, but these are typically high-level, macro level design decisions. A low-level check that attempts to enforce a global, total order seems to me to be incongruent with the physics of our existence.

I do roughly envision a possibly very significant advantage of compile-time checking the borrowing of memory lifetimes local to the function body, if that can help offload from GC the destruction of temporary objects and if it won't require lifetime parameters. And that is because thrashing the GC with temporary objects appears to be one of the most significant weaknesses of GC. That is only enforcing a partial order, and not attempting to compile-time check a global consistency which can't exist. In short, I don't want a type system that silently lies to me very often. As another poignant example, subclassing is an anti-pattern because it routinely lies about the implicit invariants.

I like typeclasses without subtyping, because afaik it doesn't ever silently lie, unless you cast it away somehow. The way that you cast away traits is by destructuring the trait object with RTTI employing some cast to data type such as a match-case or instanceOf. But at least it is explicit and not silent in every instance. This becomes an anti-pattern in the sense that it disallows extension in one dimension of the Expression Problem. That is why I devised that complete solution to both dimensions of the Expression Problem.

P.S. I was drawn to Rust by the excellent typeclasses and Crate system. It is unfortunate that I can't adopt Rust for my current project because:

  1. The emphasis on low-level global borrow checking for resource lifetimes and mutability.
  2. Lack of GC semantics.
  3. Lack of runtime for asynchronous programming.
  4. Lack of Javascript compile output target (which would provide the JIT compilation portability I need).

I do realize the above are due to Rust's different focus and use case target.

split_at_mut cannot be used to alias mutable pointers, and is thus a safe API. It turns off safety within itself, as part of the implementation of slices, but whatever algorithm you implement on top of it is safe.

The type system is not powerful enough to prove that split_at_mut itself is correct, but borrowck is verifying that the function that uses it is correct (because the two returned slices have the same lifetime as the one input lifetime, they cannot be dangling).

The implementation of split_at_mut is not being checked, but borrowck is still checking that whatever user code calls split_at_mut is correct.
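To make that concrete, here is a sketch of my own showing the "tree" of disjoint views that borrowck tracks: each split_at_mut consumes one mutable slice reference and yields two non-overlapping ones, which can themselves be split again:

```rust
// Split a slice into quarters via two levels of split_at_mut and label
// each quarter; all four mutable views are live simultaneously because
// borrowck knows they are pairwise disjoint.
fn label_quarters(v: &mut [i32]) {
    let mid = v.len() / 2;
    let (left, right) = v.split_at_mut(mid);
    let lmid = left.len() / 2;
    let (q1, q2) = left.split_at_mut(lmid);
    let rmid = right.len() / 2;
    let (q3, q4) = right.split_at_mut(rmid);
    for x in q1 { *x = 1; }
    for x in q2 { *x = 2; }
    for x in q3 { *x = 3; }
    for x in q4 { *x = 4; }
}
```

Trying to touch `v` mutably while any of the quarter views is still live would be rejected by borrowck.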

Am I misunderstanding you, or are you claiming that Rust's ownership model is a list rather than a tree? Because I've always understood that an owner can own multiple things, and I've actually worked on large Rust projects. Rooted trees are partial orders, not total orders.

Actually, that's fairly similar to how Servo manages its layout tree. The big persistent data structure is a root BlockFlow with some number of child Arc<Flow>s. It uses Arc instead of a true GC, but that's a fairly minor difference.

Since borrow checking is turned off by the raw pointers employed by split_at_mut, what prevents invoking it more than once on the same slice? I presume the mutable reference to the slice is not borrowed by split_at_mut, since the raw pointers it employs are said to turn off borrowck.

How can borrowck know that split_at_mut mutably references the slice if it doesn't check what a raw pointer references? Are you saying that split_at_mut inputs a move of a reference from the slice and indicates to the borrowck that it moves that reference to the output which contains two raw pointers?

Also I noticed the code sample you provided seemed to show the front and back raw pointers being assigned to two different instance variables. Since you say these can be split further, how can the borrowck track that each is only split once, given your example split the mutable reference into two instances? It sounds to me like you are claiming that borrowck allows a tree of mutable reference duplicates as long as split_at_mut claims that they are disjoint?

If the last sentence of the prior paragraph is the way split_at_mut works, then I still ask how to implement an algorithm that requires overlapping mutable ranges. For example, if we invoke two separate algorithms (or the same algorithm twice with different input criteria), each of which returns a list of mutable references into the same collection, and we keep both results around, then how can we be sure that the two lists are disjoint?

The point is that afaics there is no way to express general abstraction by tracking the borrowing of mutability in the way that Rust wants to.

I don't understand how it can do this, per my statements/questions/bewilderment above. Especially it seems to be impossible in the case of general abstraction.

I appear to be claiming the opposite: that Rust checks borrowing as a tree of disjointness, but that is a total ordering of disjointness, whereas in real life we need multiple partial orders on disjointness (e.g. a list of lists, per my example above).