Consuming a vector to replace placeholders in another; some style questions

I was wondering if there's a more idiomatic way to implement the following code. It's part of a compression library, and interleaves bits from one vector into another wherever placeholders are found. The use of an 'if' to select one of two prebuilt Bit()s, and the use of 'q' to index into arith_bits, both seem quite clumsy to me. Also, arith_bits is never used again, so perhaps I should be moving it out of self then draining it?

Not sure whether the map-{if-let}-before-match I used to avoid duplicating the Bit() processing logic is the sanest approach either. Any thoughts?

enum StreamSymbol {
    Byte(u8),
    Bit(u8),
    ArithPlaceholder,
}
struct Encoder {
    output_stream: Vec<StreamSymbol>,
    arith_bits: Vec<u8>,
    denom: u8,
}
impl Encoder {
    fn collapse_stream(&mut self) -> Vec<u8> {
        let mut nbits = 8;
        let mut bits_index = 0;

        {
            // propagate carries in numerator accumulator
            let mut acc = 0;
            for e in self.arith_bits.iter_mut().rev() {
                assert!(255 - *e >= acc);
                acc += *e;
                *e = acc & 1;
                acc >>= 1;
            }
            self.arith_bits[0] += acc * 2;
        }

        let mut res: Vec<u8> = vec![self.arith_bits[0]];
        let mut q = 0;
        for i in self.output_stream.iter().map(|i| {
            if let StreamSymbol::ArithPlaceholder = i {
                q = q + 1;
                if self.arith_bits[q] == 0 {
                    &StreamSymbol::Bit(0)
                } else {
                    &StreamSymbol::Bit(1)
                }
            } else {
                i
            }
        }) {
            match i {
                StreamSymbol::Byte(i) => res.push(*i),
                StreamSymbol::Bit(b) => {
                    if nbits == 8 {
                        nbits = 0;
                        bits_index = res.len();
                        res.push(0);
                    }
                    res[bits_index] += b << (7 - nbits);
                    nbits += 1;
                }
                StreamSymbol::ArithPlaceholder => {
                    assert!(false);
                }
            }
        }
        res
    }
}

(Playground)

This looks a bit better, I think..

    fn collapse_stream(&mut self) -> Vec<u8> {
        let mut nbits = 8;
        let mut bits_index = 0;
        let mut arith_normalised = vec![];
        let mut acc = 0;

        for e in self.arith_bits.iter_mut().rev() {
            acc += *e;
            arith_normalised.push(StreamSymbol::Bit(acc & 1));
            acc >>= 1;
        }

        let mut res: Vec<u8> = vec![acc];
        for i in self.output_stream.drain(..).map(|i| {
            if let StreamSymbol::ArithPlaceholder = i {
                arith_normalised.pop().unwrap()
            } else {
                i
            }
        }) {
            match i {
                StreamSymbol::Byte(i) => res.push(i),
                StreamSymbol::Bit(b) => {
                    if nbits == 8 {
                        nbits = 0;
                        bits_index = res.len();
                        res.push(0);
                    }
                    res[bits_index] += b << (7 - nbits);
                    nbits += 1;
                }
                StreamSymbol::ArithPlaceholder => {
                    assert!(false);
                }
            }
        }
        res
    }

assert!(false) should probably be panic!() :smile:

I would suggest to use a enum Bit for your StreamSymbol::Bit where you either have one of the two possible values.

You can "simplify" your acc/arith_normalised creation:

let (acc, mut arith_normalised) = self.arith_bits.iter().rev().fold((0, vec![]), |(mut a, mut v), &e| {
    a += e;
    v.push(StreamSymbol::Bit(a & 1));
    a >>= 1;
    (a, v)
});

I would convert your for each map into something like

self.output_stream
    .drain(..)
    .map(|i| {
        if let StreamSymbol::ArithPlaceholder = i {
            arith_normalised.pop().unwrap()
        } else {
            i
        }
    })
    .fold(vec![acc], |mut res, i| {
        match i {
            StreamSymbol::Byte(i) => res.push(i),
            StreamSymbol::Bit(b) => {
                if nbits == 8 {
                    nbits = 0;
                    bits_index = res.len();
                    res.push(0);
                }
                res[bits_index] += b << (7 - nbits);
                nbits += 1;
            }
            StreamSymbol::ArithPlaceholder => {
                panic!();
            }
        }
        res
    })

This way you don't have this awkward long if iteration thing, but instead two nice closures.

In general I try to reduce the amount of mut variables as much as possible.

Or even better: unreachable!()

3 Likes

So, basically closures closures closures :slight_smile:

Ok, I will continue down that path; cheers!

Thanks for the reminder about unreachable!(), Yandros - I think I had its existence niggling at the back of my brain somewhere.

2 Likes

Hmm, I'd quite like to use filter_map or similar for the final closure to remove the push()s, but there are few obstacles to inserting the last seven bits of each group of eight that way.

Starting from @hellow nice suggestions, I'd like to suggest some additional changes though, based on a personal preference:

  • you either move/copy immutables things around

    fn example (s: &'str) -> String
    {
        format!("{} !!")
    }
    
  • or you mutate things around:

    fn example (s: &mut String) /* -> &mut String */ // optionally
    {
         s.push_str("!!");
    }
    

That's why I don't like that much generating Vectors with fold (it both moves and mutates it).

Thus,

let (acc, mut arith_normalised) = self.arith_bits.iter().rev().fold((0, vec![]), |(mut a, mut v), &e| {
    a += e;
    v.push(StreamSymbol::Bit(a & 1));
    a >>= 1;
    (a, v)
});

becomes

let mut arith_normalised = Vec::<bool>::
    with_capacity(self.arith_bits.len()) // preallocate the right amount for performance
;
let acc = Iterator::fold( // function syntax for readability
    self.arith_bits.iter().rev(),
    0, // acc0
    |acc, &e| {
        let acc = acc + e;
        arith_normalised.push(acc & 1 != 0);
        acc >> 1
    },
);

As you can see, generating acc did not need mutation, hence being indeed a good candidate for fold ; arith_normalised, on the other hand, is mutated with .push anyways, so let's set it up through an &mut arith_normalised implicitily captured by the closure.

Aside

If there was no acc returned here, and the loop was used just to generate a vec, we could have "folded acc=()" through the more idiomatic for_each:

let mut arith_normalised = Vec::<bool>::
    with_capacity(self.arith_bits.len()) // preallocate the right amount for performance
;
self.arith_bits
    .iter()
    .rev()
    .for_each(|&e| {
        arith_normalised.push(e & 1 != 0);
    });

at which point we could generate the Vec without mutation with a classic call to collect() (or its twin FromIterator::from_iter):

use ::std::iter::FromIterator; // sadly not in scope by default :(

// no `mut`, no `with_capacity`; same performance!
let arith_normalised = Vec::
    from_iter(
        self.arith_bits
            .iter()
            .rev()
            .map(|&e| e & 1 != 0)
    )
;

End of Aside

.

You may have noticed that I have replaced the u8 type with a bool type, since a bool is guaranteed to be 0b0000_0001 or 0b0000_0000.

  • given x: u8, if we store b: u8 as x & 1, we know that the layout of b is the one just above, but it is not enforced at the type level; hence, when transforming it back to a u8 (when doing bit operations), we either trust ourselves that b has indeed the good layout, and use b: u8 as is (which is error prone in the long run), or we mask with & 1 again as a safeguard (not zero-cost).

  • On the other hand, if we store b: bool as x & 1 != 0, the comparison should be replaced by a zero-cost cast; and now we have a value which expresses our "either 1 or 0" invariant at the type level, thus allowing to safely use it in bit operations through a zero-cost cast back: b as u8;

In the same vein, if we remember the (in)famous assert(false) -> panic!() -> unreachable!(), I have gone one step further and safely removed that branch: How? by expressing our invariant at the type level:

enum StreamSymbol {
    Byte(u8),
    Bit(bool),
    ArithPlaceholder,
}

enum BitOrByte {
    Byte(u8),
    Bit(bool),
}

Since we were already performing a map to "map" ArithPlaceholder to the Bit variant, by defining a new enum and making that map slightly more explicit, now the type system is on our side for asserting the exhaustiveness of the match.

Lastly, and this is just really something I am experimenting with as of late (not (yet) recommended in real code), I have replaced the following kind of closures

|x| match x {
    PatternA => "A",
    PatternB => "B",
}

with OCaml's style function keyword:

function!{
    PatternA => "A",
    PatternB => "B",
}
3 Likes