Code Review - First Rust Crate

Hey,

This is my first rust project where I ported a Lockless Work Stealing Job System from C++ to Rust. I would appreciate if someone with more rust experience could go over the existing code and provide some insights into things that could be improved or be updated to be more idiomatic.

You can find all the info/code at crates.io page.

Thanks in advance for your time!

Some comments in no particular order:

  • Very few of your public types implement standard traits like Debug, Clone, Default, Eq, Hash, etc.
  • These lines look like you are using unsafe to circumvent the borrow checker. This part does the same, and so does this code and this code. In 99% of the cases, this means that the resulting code has Undefined Behavior. What's the problem you are trying to solve here?
  • Please don't roll your own PRNG. There's the official rand crate by the Rust developers that is battle-tested, robust, fast, has a variety of predictable-quality PRNGs, and should be preferred.
  • This could be written simply as vec![Z::default(); vec_size].
  • Do not put trait bounds on struct and enum type definitions like this or this; they make instantiation of the types unnecessarily restrictive. Add such where clauses in impl blocks instead.
  • Here you could just round up to the next power of two and make the new() function infallible.
  • This line has a spurious move.
  • This doc comment sandwiched between two uses looks weird. What are you documenting here?
  • If you are using null pointers for signaling the absence of a pointed value, use Option<NonNull<T>> instead of *mut T. It has the same size due to niche optimization, but it is a lot more expressive.
  • Here it seems odd to intentionally downgrade a useful Result type to Result<(), ()>. You shouldn't do this unless specifically needed.
5 Likes

Hey @H2CO3, thanks for taking the time to take a look. In order of appearance:

  • Good Point
  • Regarding the first two links, I'm trying to divide the slice into separate sub-slices which are unique to each worker thread I couldn't find any way to achieve this without resorting to unsafe. Regarding the latter 2 links, I was under the impression that de-referencing pointers always needs to happen in an unsafe block, is this not the case?
  • Fair point! Old habits :sweat_smile:
  • Ah nice, didn't know about that :slight_smile:
  • Should I still avoid that even though I want those type to be restrictive or will having this on the implementation side actually achieve the same? In the end I don't want users of the API to use the JobHandle outside of that context.
  • The new function could still fail due thread failing to start, but I do agree this would make it nicer to use!
  • Noted.
  • Good catch, must have missed that in my cleanup.
  • That check is only there for a debug assert. Under no circumstances should that pointer be null. Good to know about Option<NonNull<T>> though.
  • I downgraded it since, the caller site can't really do anything useful. Would it bet better/more idiomatic to just pass the original error up?

split_at, split_at_mut.

3 Likes

Yes, but the question is why are you using raw pointers in the first place, instead of references or slices?

In that case, you should use a plain NonNull which is statically known to be not null, rather than falling back to a runtime assertion.

I wrote a Stack Overflow post about this topic. In your case it looks like your desire to put a bound on the type itself led you to use an improper PhantomData<T>, at least in StorageView – I'm not sure about JobHandle because JobHandlePrivate just contains a *mut Job and I don't know what that means as far as ownership. The type argument you use for PhantomData shouldn't always be T. Sometimes it can be &'a T or fn(T) -> T or *mut T – the choice is subtle and I don't understand the code well enough to tell you which one is most appropriate here.

In the face of ambiguity, normally, you should err on the side of not using PhantomData and moving the bounds to the impls and fns that actually use T. However, in the case of StorageView<T> that looks wrong because you're using it as a handle to a buffer (JobStorage) that actually does only hold one specific T at a time. This raises questions about whose responsibility it is to drop the T. If the answer is StorageView<T>, maybe my initial guess is wrong and PhantomData<T> actually is the right choice; however, that doesn't seem to be the case because StorageView<T> itself doesn't implement Drop. Since I don't grasp the ownership of the T I can't say with confidence what your PhantomData should be. If you also don't know... well, you probably shouldn't use unsafe with it until you can figure that out.

Other stuff:

  • Don't use transmute to cast between one raw pointer and another raw pointer; as works fine.
  • Put comments on your unsafe blocks to show how you guarantee the code inside is sound. In some cases I don't think you can guarantee it, which means it's unsound.
  • Case in point. This is unsafe because the output lifetime is not bound to the input lifetime, so you can call as_trait() two times in a row and get aliased mutable references. If changing this to return &mut dyn FnMut() breaks the rest of your code, it's almost certainly because you were using it incorrectly. Yeah, it's a private interface and maybe the unsafety isn't exposed to the outside world. But how do you know that for sure, if you don't use unsafe to document the places where soundness requirements are created / satisfied?
  • Another issue. You can cause unsoundness by calling drop twice on the same self. This should probably be part of a Drop impl, which can't be called manually, or be a function that takes self by value, or maybe be an unsafe fn. (Also note ptr::drop_in_place exists.) Again, maybe it's not exposed to the world, but I can't easily confirm that because the way you use unsafe isn't accurately flagging all the places where special soundness constraints need to be upheld.
  • Similarly, StorageViewTrait::do_drop should probably be an unsafe fn as well (do_run seems fine?, as long as you can guarantee StorageView<T> is valid for T)
  • On that topic, why is StorageViewTrait a thing? Seems like you could write those functions directly on StorageView itself. Maybe I'm missing something.

Stray idea: in Job, the combination of Option<JobFunctions> and JobStorage feels like a partial reimplementation of dyn Trait. You're storing stuff inline in Job itself, which means you can't use dyn Trait inside Job. But the only way it's exposed to the outside world (that I see) is through JobHandle, which is a pointer. Perhaps you could do something like push the T parameter into Job and JobStorage, but use Job as Job<dyn Stored> in JobHandle? You might be able to reduce the unsafe code surface by a lot, and it might also be better for inlining. You could possibly even do some const generics ninjitsu (currently in beta) and restrict T to sizes less than 64/128 at compile time, instead of having to rely on runtime checks.

1 Like

I took some more time to look over this today and here are some more observations:

  • TLS_THREAD_DATA_INDEX could easily be a Cell instead of RefCell
  • JobThread::new doesn't seem like it should exist. You only use it to populate a Vec and then overwrite the contents immediately. You can replace resize_with with reserve (which increases the capacity but not the length) to avoid needing "empty" JobThreads.
  • Similarly, with JobSystem::new, you start by creating the JobSystem with dummy values, and then you replace the dummy values with real values, when you could just create the real values and wrap them all up into a JobSystem at the end of the function.
  • ThreadDataWrapper and ThreadDataAccessor don't, in my opinion, seem to make access to ThreadData any better. It's still just as unsafe as if you were using UnsafeCell<ThreadData> and *mut/*const ThreadData directly.
  • You've got a lot of &muts for a lock-free queue. Atomics only require & (shared) access, which makes me suspect that the ThreadData bit is actually unsound because you're going through &mut to call functions like steal, push and pop that, to my understanding, can actually be called concurrently. It looks like you could replace most of those loads and stores with get_muts (note: that's a safe function), which strongly hints that they're not actually buying you anything.
  • What happens when there are multiple JobSystems? The JobHandles don't seem to be branded at all, so one could, for instance, get a JobHandle from sys1.create(...) and pass it to sys2.run(...). That seems very likely to be unsound.
  • Documentation looks very nice. Tests are very nice. Comments are a bit sparse.
  • You can use Box<[T]> instead of Vec<T> when the buffer won't shrink or grow, and similarly Arc<[T]> instead of Arc<Vec<T>>. It saves a pointer's worth of memory but the real advantage in this case is limiting the surface area of the code that deals with Vec so you don't have to remember not to call any methods that might reallocate.
1 Like

Yes, but the question is why are you using raw pointers in the first place, instead of references or slices?

In the original C++ design, after all the job pools and queues are allocated the data never moves. When one allocates a job you get back a pointer to said job as a handle. The same handle is then pushed to some queue and processed. I stuck with the pointer since I need to be able to transition from read to write handle on demand.

If I passed a reference around I wouldn't be able to change it to a mutable reference on demand. I think I would also run into problems trying to get multiple mutable references from the job pool.

Finally, since I know the usage is safe (only on thread ever writes to a handle at any given time), I didn't wan to pay the additional run-time check by using a Cell/RefCell.

Hey @trentj , thanks for the extensive feedback, much appreciated :slight_smile:

In your case it looks like your desire to put a bound on the type itself led you to use an improper PhantomData<T> , at least in StorageView

This was the oly way I found to solve the problem of trying o use a mutable reference multiple times in a loop. E.g.:

let js = JobSystem::new()?;
let mut v:u32 = 0;
for _ in 100 {
    let x = &mut v;
    let handle =  js.create(|| {
        x += 10;
    })?;
   ...
}

If I don't add the PhatomData<T>, the above code will compile just fine. I'm open to sugguestions on how to improve this.

However, in the case of StorageView<T> that looks wrong because you're using it as a handle to a buffer ( JobStorage ) that actually does only hold one specific T at a time. ... On that topic, why is StorageViewTrait a thing? Seems like you could write those functions directly on StorageView itself. Maybe I'm missing something.

The JobStorage can hold any type T. I don't know ahead of time which type this is. I could have use a Box<dyn FnMut>(), but my main focus with this code is to provide a 0 allocation job system after setup. The JobStorageView<T> allows me to do the right thing via Type Erasure. In the end you could argue that I'm just reinventing the wheel, but I couldn't find any other way to store the type data, since you can't calculate a size for dyn FnMut().
The JobStorageView<T> will generate 2 static functions which are unique to the type T which I then store in the JobStorage . One of those functions do_drop is used to drop the data when the job finishes or the job pool is destroyed.

his is unsafe because the output lifetime is not bound to the input lifetime, so you can call as_trait() two times in a row and get aliased mutable references ... created / satisfied?

While what you said is true, I can guarantee that this is only happens once and in one place, therefore making it safe. As you said, extra documentation wouldn't hurt to illustrate this point.

Again, maybe it's not exposed to the world, but I can't easily confirm that because the way you use unsafe isn't accurately flagging all the places where special soundness constraints need to be upheld
Are you suggesting I mark these things as unsafe to better get my point across/make it more obvious that things should be handled with care?

TLS_THREAD_DATA_INDEX could easily be a Cell instead of RefCell

Noted

JobThread::new doesn't seem like it should exist. You only use it to populate a Vec and then overwrite the contents immediately. You can replace...

Noted.

Similarly, with JobSystem::new , you start by creating the JobSystem with dummy values, a

Noted

ThreadDataWrapper and ThreadDataAccessor don't, in my opinion, seem to make access to ThreadData ...

This was the only way I could find to have multiple write access for the thread queues. Each Job thread needs to able to steal jobs from another thread's queue. I looked at how Rust's mutex worked under the hood and I mimicked the pattern. RefCell wasn't an option since it's multiple threads will request mutable borrows.
The design is still safe, but is not something that you can easily express in Rust.

You've got a lot of &mut s for a lock-free queue. Atomics

Even though the atomics don't require mut references, I still need to mutate the underlying queue. I don't see any way around this.

What happens when there are multiple JobSystem s? The JobHandle ...

That's a valid point. Do you have any suggestions on how to best achieve this branding? Other than this issue, the only other reason for concern would be to have multiple job systems created on the same thread. I shall address this. Thanks for pointing this out.

You can use Box<[T]> instead of Vec<T> when the buffer won't shrink or grow, and similarly Arc<[T]> instead of

Ah, nice. Thanks for the tip.

Ah nice, don't know how I missed that the first time around. Thanks!

Not just to make it more obvious, but maybe also to make it sound. It may be counterintuitive, but using unsafe more can actually be safer than using unsafe less, because the unsafe ways of doing things are more or less unrestricted – basically what you would use in C++ – whereas the safe ways have more rules that are easier to break.

&mut T is related to, but more limiting than, T* restrict in C. I wrote a Stack Overflow question about what I thought would be the C equivalent; I was wrong. The nearest Rust equivalent to a plain T* would be &UnsafeCell<T>. By translating T* "safely" as &mut T you've imposed some requirements on those references that I'm pretty sure you didn't mean.

Here's a much simpler example. This code has undefined behavior:

#[derive(Default)]
struct Counter {
    n: AtomicI32,
}

impl Counter {
    fn next(&mut self) -> i32 {
        self.n.fetch_add(1, Ordering::Relaxed)
    }
}

#[derive(Default)]
struct UnsafeCounter {
    q: UnsafeCell<Counter>,
}

impl UnsafeCounter {
    fn new() -> Self {
        Default::default()
    }
    
    fn next(&self) -> i32 {
        unsafe { (*self.q.get()).next() }
    }
}

unsafe impl Sync for UnsafeCounter {}

fn main() {
    let data = Arc::new(UnsafeCounter::new());
    let data_clone = Arc::clone(&data);
    thread::spawn(move || 
        data_clone.next()
    );
    let _y = data.next();
}

If both threads are in Counter::next at the same time, that means that two independent &muts exist referring to the same Counter, which is UB. Miri flags it too (click Tools → Miri). It looks like your ThreadData/Queue system has the same bug. This one's trivial to fix because you can just change &mut self to &self, but...

I'm thinking the original sin is using UnsafeCell in ThreadData instead of in (Lockless)WorkStealingJobQueue. It's the atomic accesses that "guard" mutable access to the contents of the queue, so the unsafe code that turns & into &mut should go next to them, inside push and pop and steal. If those methods take &self then I don't think you even need any unsafe in thread, and the unsafe impl Sync can go on the queue, which makes sense to me because that's the bit with the atomic shenanigans inside it.

Perhaps something like this:

pub(crate) struct LocklessWorkStealingJobQueue {
    bottom: AtomicI32,
    top: AtomicI32,
    jobs: Vec<UnsafeCell<JobHandlePrivate>>,  // or Box<[...]>
}

In the implementation, only upgrade a &UnsafeCell<JobHandlePrivate> to a real &mut JobHandlePrivate when you provably have exclusive access to it and are ready to mutate.

Eh, not really. "Branding" usually means attaching a type parameter to all your handles and the job system itself so that the compiler raises a type error if you mix them up. But runtime checks would probably work too – for example, maybe you could add an owner field to Job so that a JobHandle can check whether or not it belongs to a particular JobSystem.

Another possibility would be to store all the Job data in the JobSystem itself, and reduce JobHandle to a simple key, like an integer index instead of a pointer. That way if you use a handle with the wrong system, you might get logic bugs like accidentally running job 12 on system A when you meant to run job 12 on system B, but no undefined behavior.

1 Like

Oh, one more thing I noticed as I was going through earlier, but forgot to mention: pointers to JobStorage aren't necessarily properly aligned to contain anything of higher alignment than u8. UB is at least here, here, and here. You can either align JobStorage to some maximum alignment and panic when trying to store something of higher alignment (which will probably work virtually all the time), or switch to read_unaligned/write_unaligned (not sure how you could do that with as_trait; you might have to find a different way of approaching that).

1 Like

Good catch!, In my enthusiasm of getting this to run I completely forgot about that, doh!

Thanks for all the feedback. I'll definitely check out that Miri tool you mentioned.
I'll be back once I've incorporated all the changes.

Cheers.

I have updated the repo and crates.io with a new version that address most of the issues in this review.

I would love me some more feedback, if you guys are still up for it :).

I'm mention that there's no runtime cost to using Cell. It basically just prevents you from taking references.

Okay, if the data is Large there may be a runtime cost, the cost of copying the data.

A &Cell<T> has no runtime cost over other types of references. In fact, besides raw pointers, it is the closest that Rust has to the kind of pointers you find in C.

1 Like