Faster alternative to BitVec

Hi everyone,

I am new to Rust, and I am rewriting Open vSwitch's flow extraction logic in Rust to take advantage of Rust's memory safety. The flow extraction logic runs in the fast-path packet processing and parses the packet headers. The Rust implementation is built as a shared library and linked with the rest of Open vSwitch's C implementation through FFI.

When I compared the performance of the original and Rust implementations, I found that my Rust implementation is much slower at packet forwarding. I then profiled it with the Linux perf tools. From the perf report below, it looks like most of the time is spent in BitVec processing, which is used to mark whether a particular field in the packet header is set.

Is there a faster bit vector crate for this use case? I found a couple on crates.io, such as Vector of Bits (Vob) and bitvector, but I am not sure which one would be the best fit. I would appreciate any comments and feedback.

Thanks!

16.23%  pmd-c06/id:6  libovsflowrust.so   [.] bitvec::pointer::BitPtr<T>::from_bitslice
5.10%  pmd-c06/id:6  libovsflowrust.so   [.] bitvec::store::BitIdx::offset
3.40%  pmd-c06/id:6  libovsflowrust.so   [.] bitvec::slice::BitSlice<C,T>::bitptr
2.84%  pmd-c06/id:6  libovsflowrust.so   [.] <T as core::convert::Into<U>>::into
2.79%  pmd-c06/id:6  libovsflowrust.so   [.] bitvec::slice::BitSlice<C,T>::get_unchecked
2.70%  pmd-c06/id:6  libovsflowrust.so   [.] core::num::<impl isize>::overflowing_add
2.69%  pmd-c06/id:6  libovsflowrust.so   [.] core::ptr::<impl *mut T>::is_null
2.29%  pmd-c06/id:6  libovsflowrust.so   [.] bitvec::slice::BitSlice<C,T>::len
2.16%  pmd-c06/id:6  libovsflowrust.so   [.] bitvec::store::BitStore::get_at
1.84%  pmd-c06/id:6  libovsflowrust.so   [.] bitvec::pointer::BitPtr<T>::head
1.69%  pmd-c06/id:6  libovsflowrust.so   [.] core::ptr::non_null::NonNull<T>::new_unchecked
1.58%  pmd-c06/id:6  libovsflowrust.so   [.] <bitvec::store::BitIdx as core::convert::From<u8>>::from
1.56%  pmd-c06/id:6  libovsflowrust.so   [.] ovsflowrust::miniflow::mf_ctx::assert_bv_map_not_set
1.56%  pmd-c06/id:6  libovsflowrust.so   [.] <bitvec::cursor::LittleEndian as bitvec::cursor::Cursor>::at
1.49%  pmd-c06/id:6  libovsflowrust.so   [.] bitvec::slice::BitSlice<C,T>::as_ptr
1.48%  pmd-c06/id:6  libovsflowrust.so   [.] <bitvec::slice::BitSlice<C,T> as core::ops::index::Index<usize>>::index
1.48%  pmd-c06/id:6  libovsflowrust.so   [.] bitvec::slice::BitSlice<C,T>::set_unchecked
1.46%  pmd-c06/id:6  libovsflowrust.so   [.] core::slice::<impl [T]>::len
1.45%  pmd-c06/id:6  libovsflowrust.so   [.] <bitvec::pointer::Pointer<T> as core::convert::From<*const T>>::from
1.41%  pmd-c06/id:6  libovsflowrust.so   [.] bitvec::pointer::Pointer<T>::w
1.39%  pmd-c06/id:6  libovsflowrust.so   [.] bitvec::store::BitIdx::is_valid
1.28%  pmd-c06/id:6  libovsflowrust.so   [.] bitvec::pointer::BitPtr<T>::pointer
1.22%  pmd-c06/id:6  libovsflowrust.so   [.] core::slice::<impl [T]>::as_ptr
1.11%  pmd-c06/id:6  libovsflowrust.so   [.] <bitvec::store::BitIdx as core::ops::deref::Deref>::deref

Are you building in release mode (i.e. with cargo run --release/cargo build --release)?

Thanks for your feedback. I was not building in release mode. With release mode, the packet forwarding rate increased quite a lot :smile:

BitVec still takes up quite some time in the new perf profile.

 20.38%     0.00%  pmd-c06/id:6  [unknown]           [k] 0000000000000000
 16.50%    13.11%  pmd-c06/id:6  libovsflowrust.so   [.] ovsflowrust::miniflow::mf_ctx::miniflow_assert_in_map
 14.59%     8.70%  pmd-c06/id:6  libovsflowrust.so   [.] <bitvec::store::BitPos as core::ops::deref::Deref>::deref
 13.82%     8.01%  pmd-c06/id:6  libovsflowrust.so   [.] bitvec::store::BitStore::get
 13.77%    10.53%  pmd-c06/id:6  libovsflowrust.so   [.] bitvec::store::BitIdx::offset
 10.42%     8.67%  pmd-c06/id:6  libovsflowrust.so   [.] bitvec::slice::BitSlice<C,T>::set
 10.07%     0.00%  pmd-c06/id:6  [unknown]           [.] 0x0000000000040401
  9.32%     0.08%  pmd-c06/id:6  ovs-vswitchd        [.] dp_netdev_process_rxq_port
  8.56%     5.45%  pmd-c06/id:6  libovsflowrust.so   [.] <bitvec::store::BitPos as core::convert::From<u8>>::from
  4.63%     4.60%  pmd-c06/id:6  ovs-vswitchd        [.] miniflow_hash_5tuple
  3.95%     3.47%  pmd-c06/id:6  libovsflowrust.so   [.] ovsflowrust::miniflow::mf_ctx::miniflow_push_uint16_
  3.76%     2.95%  pmd-c06/id:6  libovsflowrust.so   [.] bitvec::store::BitStore::set_at
  3.41%     3.40%  pmd-c06/id:6  ovs-vswitchd        [.] __netdev_afxdp_batch_send
  3.23%     3.14%  pmd-c06/id:6  ovs-vswitchd        [.] dp_netdev_input__
  3.00%     0.00%  pmd-c06/id:6  [unknown]           [.] 0x0000000000000401
  3.00%     2.93%  pmd-c06/id:6  ovs-vswitchd        [.] netdev_afxdp_rxq_recv
  2.89%     2.88%  pmd-c06/id:6  libovsflowrust.so   [.] ovsflowrust::parser::parse_l3
  2.77%     2.75%  pmd-c06/id:6  ovs-vswitchd        [.] dp_packet_use_afxdp
  2.67%     2.53%  pmd-c06/id:6  libovsflowrust.so   [.] rust_miniflow_extract
  2.66%     2.32%  pmd-c06/id:6  libovsflowrust.so   [.] ovsflowrust::miniflow::mf_ctx::miniflow_push_uint32_
  2.40%     2.23%  pmd-c06/id:6  libovsflowrust.so   [.] ovsflowrust::parser::parse_l2
  1.72%     1.71%  pmd-c06/id:6  libovsflowrust.so   [.] ovsflowrust::parser::parse_metadata
  1.39%     1.33%  pmd-c06/id:6  libovsflowrust.so   [.] ovsflowrust::parser::parse_l4
  1.27%     1.13%  pmd-c06/id:6  libc-2.23.so        [.] __memcpy_avx_unaligned
  1.17%     0.97%  pmd-c06/id:6  libovsflowrust.so   [.] ovsflowrust::miniflow::mf_ctx::miniflow_pad_to_64_
  1.16%     1.15%  pmd-c06/id:6  libovsflowrust.so   [.] ovsflowrust::miniflow::mf_ctx::miniflow_push_macs_
  1.09%     0.95%  pmd-c06/id:6  libc-2.23.so        [.] __memcmp_sse4_1
  1.07%     0.92%  pmd-c06/id:6  libovsflowrust.so   [.] bitvec::pointer::BitPtr<T>::new
  0.71%     0.66%  pmd-c06/id:6  libovsflowrust.so   [.] bitvec::store::BitIdx::span

Are those methods big? It might be that they are actually quite small, yet they do not get inlined, i.e. there might be optimization potential in marking them as #[inline] or #[inline(always)]. The overhead of calling a function is usually not that big, but if the methods don't perform much work and are called often, it becomes noticeable, and inlining can make a huge difference.
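
To make the inlining point concrete, here is a minimal sketch of what small, inlinable bit accessors can look like. It is not taken from bitvec or from your code: `FieldMap` and its methods are hypothetical names, and it assumes the number of header fields you track fits in 128 bits, so the whole map is two `u64` words with no heap allocation.

```rust
/// Hypothetical fixed-size bit set for up to 128 header fields.
/// Illustration only; not part of bitvec or Open vSwitch.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct FieldMap {
    words: [u64; 2],
}

impl FieldMap {
    /// Number of bits the map can hold.
    pub const BITS: usize = 128;

    /// Mark field `idx` as present.
    /// `#[inline]` is a hint that lets the body be inlined at call
    /// sites, including call sites in other crates.
    #[inline]
    pub fn set(&mut self, idx: usize) {
        debug_assert!(idx < Self::BITS);
        self.words[idx / 64] |= 1u64 << (idx % 64);
    }

    /// Return true if field `idx` has been marked as present.
    #[inline]
    pub fn is_set(&self, idx: usize) -> bool {
        debug_assert!(idx < Self::BITS);
        (self.words[idx / 64] >> (idx % 64)) & 1 == 1
    }

    /// Number of fields currently marked as present.
    #[inline]
    pub fn count(&self) -> u32 {
        self.words.iter().map(|w| w.count_ones()).sum()
    }
}
```

Each accessor is a shift and a mask, so a call is likely to cost more than the work itself if it is not inlined. Keep in mind that `#[inline]` is only a hint, and that in release builds the optimizer can already inline generic functions and functions within the same crate on its own, so it is worth re-checking the perf profile after any change rather than assuming a gain.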