My generic heap with keys that can be modified efficiently

So I thought I would try making my Heap code generic. This is the result. The part that is a bit messy is converting U (which will typically be instantiated as either u32 or usize) to usize and vice-versa, I have functions z and u to do this, which I have to use a lot... I was wondering if there was a cooler way to do this ( I don't think so... but there is so much I don't know ). Doc for this code is here: GHeap in rustdb::pstore - Rust

#[derive(Default)]
/// Heap Node.
pub struct HeapNode<K, T, U> {
    /// Index of node from heap position.
    pub x: U,
    /// Heap position of this node.
    pub pos: U,
    /// Node id.
    pub id: T,
    /// Node key.
    pub key: K,
}

/// Heap for tracking least used page.
pub struct GHeap<K, T, U> {
    /// Number of heap nodes, not including free nodes.
    pub n: usize,
    /// 1 + Index of start of free list.
    pub free: usize,
    /// Vector of heap nodes.
    pub v: Vec<HeapNode<K, T, U>>,
}

impl<K, T, U> Default for GHeap<K, T, U>
where
    K: Default,
    T: Default,
{
    fn default() -> Self {
        Self {
            v: Vec::new(),
            n: 0,
            free: 0,
        }
    }
}

impl<K, T, U> GHeap<K, T, U>
where
    K: Default + Ord + Copy,
    T: Default + Copy,
    U: Copy + Default + TryFrom<usize>,
    usize: TryFrom<U>,
{
    /// Insert id into heap with specified key (usage). Result is index of heap node.
    pub fn insert(&mut self, id: T, key: K) -> U {
        let pos = self.n;
        self.n += 1;
        let x = self.alloc(pos);
        self.v[x].id = id;
        self.v[x].key = key;
        self.move_up(pos, x, key);
        Self::z(x)
    }

    /// Modify key of specified heap node.
    pub fn modify(&mut self, x: U, newkey: K) {
        let x = Self::u(x);
        let pos = Self::u(self.v[x].pos);
        let oldkey = self.v[x].key;
        self.v[x].key = newkey;

        match newkey.cmp(&oldkey) {
            Ordering::Greater => self.move_down(pos, x, newkey),
            Ordering::Less => self.move_up(pos, x, newkey),
            Ordering::Equal => (),
        }
    }

    /// Remove heap node with smallest key, returning the associated id.
    /// Note: index of heap node is no longer valid.
    pub fn pop(&mut self) -> T {
        assert!(self.n > 0);
        self.n -= 1;
        let xmin = Self::u(self.v[0].x); // Node with smallest key.
        let xlast = Self::u(self.v[self.n].x); // Last node in heap.
        self.v[xlast].pos = Self::z(0); // Make last node first.
        self.v[0].x = Self::z(xlast);
        self.move_down(0, xlast, self.v[xlast].key);

        // De-allocate popped node
        self.v[xmin].pos = Self::z(self.free);
        self.free = xmin + 1;

        self.v[xmin].id
    }

    fn move_up(&mut self, mut c: usize, cx: usize, ck: K) {
        while c > 0 {
            let p = (c - 1) / 2;
            let px = Self::u(self.v[p].x);
            if ck >= self.v[px].key {
                return;
            }
            // Swap parent(p) and child(c).
            self.v[p].x = Self::z(cx);
            self.v[c].x = Self::z(px);
            self.v[px].pos = Self::z(c);
            self.v[cx].pos = Self::z(p);
            c = p;
        }
    }

    fn move_down(&mut self, mut p: usize, px: usize, pk: K) {
        loop {
            let mut c = p * 2 + 1;
            if c >= self.n {
                return;
            }
            let mut cx = Self::u(self.v[c].x);
            let mut ck = self.v[cx].key;
            let c2 = c + 1;
            if c2 < self.n {
                let cx2 = Self::u(self.v[c2].x);
                let ck2 = self.v[cx2].key;
                if ck2 < ck {
                    c = c2;
                    cx = cx2;
                    ck = ck2;
                }
            }
            if ck >= pk {
                return;
            }
            // Swap parent(p) and child(c).
            self.v[p].x = Self::z(cx);
            self.v[c].x = Self::z(px);
            self.v[px].pos = Self::z(c);
            self.v[cx].pos = Self::z(p);
            p = c;
        }
    }

    fn alloc(&mut self, pos: usize) -> usize {
        let x = if self.free == 0 {
            self.v.push(HeapNode::default());
            self.v.len() - 1
        } else {
            let x = self.free - 1;
            self.free = Self::u(self.v[x].pos);
            x
        };
        self.v[pos].x = Self::z(pos);
        self.v[x].pos = Self::z(x);
        x
    }

    fn z(x: usize) -> U {
        match U::try_from(x) {
            Ok(y) => y,
            Err(_) => panic!(),
        }
    }

    fn u(x: U) -> usize {
        match usize::try_from(x) {
            Ok(y) => y,
            Err(_) => panic!(),
        }
    }
}

Those explicit matches could be replaced with .unwrap() or .expect() for a more informative error message instead of the completely baffling "expicit panic" that an argument-less panic prints.

Other than that, just don't do this. Keep the index as usize. People expect to index collections with usize indices, because that's what usize is for. If they want to use a different type, the burden of conversion should be on them.

The idea is to keep the Vec of HeapNodes as small as possible so it has a better chance of being in the processor cache.

Although I tried it more just to see how messy it would be, and to see if I could even do it as an exercise.

People using this don't index anything really, they just get a value back which is a parameter for the Modify function, and they can choose to use usize if they want, but in reality I am using u32, as that is sufficient. I mean this isn't really public ( although it is declared as public for now, although I will probably gate it with a non-default feature at some point ), I just wanted the possibility of tracing what is going on from a program to check what is going on, it is really an internal struct.

[ I should say I am toying with the idea of publishing this as a distinct crate, which is why I thought I would try making it generic. ]

1 Like

In that case, create a wrapper around Vec or [T] that can be indexed by all of the primitive integer types, and hides the necessary conversions to usize. That would make the code a lot cleaner.

2 Likes

A tried the wrapper approach, as below. Whether it is cleaner I will leave you to judge. I am still using usize (rather than U) to do arithmetic on index values. It is possible the client might want to use a type that doesn't have arithmetic ( such as [u16;3] say ), where they only need to provide conversions to and from usize, so maybe this is the way to go anyway. I also eliminated the Copy constraint on K and T.

#[derive(Default)]
/// Heap Node.
pub struct HeapNode<K, T, U> {
    /// Index of node from heap position.
    pub x: U,
    /// Heap position of this node.
    pub pos: U,
    /// Node id.
    pub id: T,
    /// Node key.
    pub key: K,
}

use std::ops::{Index, IndexMut};

#[derive(Default)]
/// Vector of HeapNodes indexed by U.
pub struct HeapVec<K, T, U>(Vec<HeapNode<K, T, U>>);

impl<K, T, U> Index<U> for HeapVec<K, T, U>
where
    U: Default + Copy + TryFrom<usize>,
    usize: TryFrom<U>,
{
    type Output = HeapNode<K, T, U>;
    fn index(&self, x: U) -> &Self::Output {
        &self.0[usize::try_from(x).ok().expect("HeapVec overflow")]
    }
}

impl<K, T, U> IndexMut<U> for HeapVec<K, T, U>
where
    U: Default + Copy + TryFrom<usize>,
    usize: TryFrom<U>,
{
    fn index_mut(&mut self, x: U) -> &mut Self::Output {
        &mut self.0[usize::try_from(x).ok().expect("HeapVec overflow")]
    }
}

#[derive(Default)]
/// Heap for tracking least used page.
pub struct GHeap<K, T, U> {
    /// Number of heap nodes, not including free nodes.
    pub n: usize,
    /// 1 + Index of start of free list.
    pub free: usize,
    /// Vector of heap nodes.
    pub v: HeapVec<K, T, U>,
}

impl<K, T, U> GHeap<K, T, U>
where
    K: Default + Ord,
    T: Default,
    U: Default + Copy + TryFrom<usize>,
    usize: TryFrom<U>,
{
    /// Insert id into heap with specified key (usage). Result is index of heap node.
    pub fn insert(&mut self, id: T, key: K) -> U {
        let pos = Self::z(self.n);
        self.n += 1;
        let x = self.alloc(pos);
        self.v[x].id = id;
        self.v[x].key = key;
        self.move_up(pos, x);
        x
    }

    /// Modify key of specified heap node.
    pub fn modify(&mut self, x: U, newkey: K) {
        let pos = self.v[x].pos;
        let cf = newkey.cmp(&self.v[x].key);
        self.v[x].key = newkey;

        match cf {
            Ordering::Greater => self.move_down(pos, x),
            Ordering::Less => self.move_up(pos, x),
            Ordering::Equal => (),
        }
    }

    /// Remove heap node with smallest key, returning the associated id.
    /// Note: index of heap node is no longer valid.
    pub fn pop(&mut self) -> T {
        assert!(self.n > 0);
        self.n -= 1;
        let xmin = self.v.0[0].x; // Node with smallest key.
        let xlast = self.v.0[self.n].x; // Last node in heap.
        self.v[xlast].pos = Self::z(0); // Make last node first.
        self.v.0[0].x = xlast;
        self.move_down(Self::z(0), xlast);

        // De-allocate popped node
        self.v[xmin].pos = Self::z(self.free);
        self.free = Self::u(xmin) + 1;

        std::mem::take(&mut self.v[xmin].id)
    }

    fn move_up(&mut self, mut c: U, cx: U) {
        while Self::u(c) > 0 {
            let p = Self::z((Self::u(c) - 1) / 2);
            let px = self.v[p].x;
            if self.v[cx].key >= self.v[px].key {
                return;
            }
            
            // Swap parent(p) and child(c).
            self.v[p].x = cx;
            self.v[c].x = px;
            self.v[px].pos = c;
            self.v[cx].pos = p;
            c = p;
        }
    }

    fn move_down(&mut self, mut p: U, px: U) {
        loop {
            let mut c = Self::u(p) * 2 + 1;
            if c >= self.n {
                return;
            }
            let mut cx = self.v.0[c].x;
            let mut ck = &self.v[cx].key;
            let c2 = c + 1;
            if c2 < self.n {
                let cx2 = self.v.0[c2].x;
                let ck2 = &self.v[cx2].key;
                if ck2 < ck {
                    c = c2;
                    cx = cx2;
                    ck = ck2;
                }
            }
            if ck >= &self.v[px].key {
                return;
            }
            let c = Self::z(c);
            // Swap parent(p) and child(c).
            self.v[p].x = cx;
            self.v[c].x = px;
            self.v[px].pos = c;
            self.v[cx].pos = p;
            p = c;
        }
    }

    fn alloc(&mut self, pos: U) -> U {
        let x = if self.free == 0 {
            self.v.0.push(HeapNode::default());
            Self::z(self.v.0.len() - 1)
        } else {
            let x = Self::z(self.free - 1);
            self.free = Self::u(self.v[x].pos);
            x
        };
        self.v[pos].x = pos;
        self.v[x].pos = x;
        x
    }

    fn z(x: usize) -> U {
        U::try_from(x).ok().expect("GHeap overflow")
    }

    fn u(x: U) -> usize {
        usize::try_from(x).ok().expect("GHeap overflow")
    }
}

Why are you calling .ok() on the Results before .expect()? You are losing all the information in the error message…

Well there isn't any information, overflow means the heap overflowed the indexing capacity - the largest usize that can be represented by U, or the largest usize - but I think the latter would result in a Vec panic from the push onto the underlying Vec.

Here is a version with all the arithmetic done in U, which is quite neat ( I think ):

#[derive(Default)]
/// Heap Node.
pub struct HeapNode<K, T, U> {
    /// Index of node from heap position.
    pub x: U,
    /// Heap position of this node.
    pub pos: U,
    /// Node id.
    pub id: T,
    /// Node key.
    pub key: K,
}

use std::ops::{Index, IndexMut};

#[derive(Default)]
/// Vector of HeapNodes indexed by U.
pub struct HeapVec<K, T, U>(Vec<HeapNode<K, T, U>>);

impl<K, T, U> Index<U> for HeapVec<K, T, U>
where
    usize: TryFrom<U>,
{
    type Output = HeapNode<K, T, U>;
    fn index(&self, x: U) -> &Self::Output {
        &self.0[usize::try_from(x).ok().expect("HeapVec overflow")]
    }
}

impl<K, T, U> IndexMut<U> for HeapVec<K, T, U>
where
    usize: TryFrom<U>,
{
    fn index_mut(&mut self, x: U) -> &mut Self::Output {
        &mut self.0[usize::try_from(x).ok().expect("HeapVec overflow")]
    }
}

#[derive(Default)]
/// Heap for tracking least used page.
pub struct GHeap<K, T, U> {
    /// Number of heap nodes, not including free nodes.
    pub n: U,
    /// 1 + Index of start of free list.
    pub free: U,
    /// Vector of heap nodes.
    pub v: HeapVec<K, T, U>,
}

impl<K, T, U> GHeap<K, T, U>
where
    K: Default + Ord,
    T: Default,
    U: Default
        + Copy
        + From<u8>
        + TryFrom<usize>
        + std::cmp::PartialOrd
        + std::ops::AddAssign
        + std::ops::Add<Output = U>
        + std::ops::Sub<Output = U>
        + std::ops::SubAssign
        + std::ops::Mul<Output = U>
        + std::ops::Div<Output = U>,
    usize: TryFrom<U>,
{
    /// Insert id into heap with specified key (usage). Result is index of heap node.
    pub fn insert(&mut self, id: T, key: K) -> U {
        let pos = self.n;
        if pos * 2.into() + 2.into() <= pos {
            panic!("GHeap overflow");
        }
        self.n += 1.into();
        let x = self.alloc(pos);
        self.v[x].id = id;
        self.v[x].key = key;
        self.move_up(pos, x);
        x
    }

    /// Modify key of specified heap node.
    pub fn modify(&mut self, x: U, newkey: K) {
        let pos = self.v[x].pos;
        let cf = newkey.cmp(&self.v[x].key);
        self.v[x].key = newkey;

        match cf {
            Ordering::Greater => self.move_down(pos, x),
            Ordering::Less => self.move_up(pos, x),
            Ordering::Equal => (),
        }
    }

    /// Remove heap node with smallest key, returning the associated id.
    /// Note: index of heap node is no longer valid.
    pub fn pop(&mut self) -> T {
        let zero = 0.into();
        let one = 1.into();
        assert!(self.n > zero);
        self.n -= one;
        let xmin = self.v[zero].x; // Node with smallest key.
        let xlast = self.v[self.n].x; // Last node in heap.
        self.v[xlast].pos = zero; // Make last node first.
        self.v[zero].x = xlast;
        self.move_down(zero, xlast);

        // De-allocate popped node
        self.v[xmin].pos = self.free;
        self.free = xmin + one;

        std::mem::take(&mut self.v[xmin].id)
    }

    fn move_up(&mut self, mut c: U, cx: U) {
        while c > 0.into() {
            let p = (c - 1.into()) / 2.into();
            let px = self.v[p].x;
            if self.v[cx].key >= self.v[px].key {
                return;
            }

            // Swap parent(p) and child(c).
            self.v[p].x = cx;
            self.v[c].x = px;
            self.v[px].pos = c;
            self.v[cx].pos = p;
            c = p;
        }
    }

    fn move_down(&mut self, mut p: U, px: U) {
        loop {
            let mut c = p * 2.into() + 1.into();
            if c >= self.n {
                return;
            }
            let mut cx = self.v[c].x;
            let mut ck = &self.v[cx].key;
            let c2 = c + 1.into();
            if c2 < self.n {
                let cx2 = self.v[c2].x;
                let ck2 = &self.v[cx2].key;
                if ck2 < ck {
                    c = c2;
                    cx = cx2;
                    ck = ck2;
                }
            }
            if ck >= &self.v[px].key {
                return;
            }

            // Swap parent(p) and child(c).
            self.v[p].x = cx;
            self.v[c].x = px;
            self.v[px].pos = c;
            self.v[cx].pos = p;
            p = c;
        }
    }

    fn alloc(&mut self, pos: U) -> U {
        let x = if self.free == 0.into() {
            self.v.0.push(HeapNode::default());
            let x = self.v.0.len() - 1;
            U::try_from(x).ok().unwrap()
        } else {
            let x = self.free - 1.into();
            self.free = self.v[x].pos;
            x
        };
        self.v[pos].x = pos;
        self.v[x].pos = x;
        x
    }
}

Your trait bounds can be simplified by quite a bit using the Integer trait in the num crate. Furthermore, the derived Default bounds on many of your types cause overly eager bounds (e.g. you don't need any of the types to impl Default in order to create an empty heap). Here's an improved version.

2 Likes

Using num pulls in quite a number of crates, so I think I will pass on that ( if you are already using num it would make sense ).

It is true that K doesn't need a Default, and by extension HeapNode doesn't need to implement Default either, but T does for std::mem::take to work. Clone is not needed (although T could be Clone instead of Default, but I think Default is more appropriate). I don't understand why you were manually implementing default?

My revised code here:

Edit: the TryFrom<usize> constraint on U is no longer needed, due to my revisions to how insert works.

That's not what I'm saying. What I'm saying is that the Default impls generated by the derive macro include the Default bound for all type parameters, even when it's unnecessary. Thus your structs won't implement Default for every concrete substitution that would allow it given the manual, less constrained impl.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.