Optimizing my module created for old dictionaries

Hello, could anyone advise me on how to optimize my module? I use it to deserialize old dictionaries from a scripting language. What I don't like is that I first have to collect the output of split into a Vec just to be able to iterate over it two items at a time.

use {ahash::AHashMap, std::iter::once};
pub enum Value {
Int(i32),
Float(f64),
Vector(Vec<u8>),
Bool(bool)
}

pub fn generate(data: &[u8]) -> AHashMap<Vec<u8>, Value> {
let map = data
.split(|&x| x == 0)
.collect::<Vec<&[u8]>>()
.windows(2)
.step_by(2)
//.chunks_exact(2)
.map(|x| {
let key = x[0];
let value = x[1];
let mut key_buf = Vec::new();
key_buf.resize(key.len(), 0);
let bytes_written = base64::decode_engine_slice(key, &mut key_buf, &base64::engine::DEFAULT_ENGINE);
key_buf.resize(bytes_written.unwrap(), 0);
let temp_value = &value[1..];
if value[0] == 1 {
(
key_buf,
Value::Int(lexical_core::parse(temp_value).unwrap())
)
} else if value[0] == 2 {
(
key_buf,
Value::Float(lexical_core::parse(temp_value).unwrap())
)
} else if value[0] == 3 {
let mut value_buf = Vec::new();
value_buf.resize(temp_value.len(), 0);
let bytes_written2 =
base64::decode_engine_slice(temp_value, &mut value_buf, &base64::engine::DEFAULT_ENGINE);
value_buf.resize(bytes_written2.unwrap() + 1, 0);
(key_buf, Value::Vector(value_buf))
} else {
(key_buf, Value::Bool(temp_value[0] != 48))
}
})
.collect();
map
}

pub unsafe fn get_dict_size(data: &[u8]) -> usize {
if data[data.len() - 1] == 0 {
data[0..data.len() - 1].split(|&x| x == 0).count() / 2
} else {
0
}
}

pub fn dict_get_keys_double_values(data: &[u8], sort: i32) -> Vec<u8> {
let mut list = data
.split(|&x| x == 0)
.collect::<Vec<&[u8]>>()
.windows(2)
.step_by(2)
//.array_chunks::<2>()
.filter_map(|x| {
let key = x[0];
let value = x[1];
let mut key_buf = Vec::new();
key_buf.resize(key.len(), 0);
let bytes_written = base64::decode_engine_slice(key, &mut key_buf, &base64::engine::DEFAULT_ENGINE);
key_buf.resize(bytes_written.unwrap(), 0);
let temp_value = &value[1..];
(value[0] == 2).then(|| (key_buf, lexical_core::parse(temp_value).unwrap()))
})
.collect::<Vec<(Vec<u8>, f64)>>();
if sort == 1 {
list.sort_unstable_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
} else if sort == 2 {
list.sort_unstable_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
} else {
}
list
.into_iter()
.flat_map(|x| {
x
.0
.into_iter()
.chain(once(0))
.chain(x.1.to_string().as_bytes().iter().copied())
.chain(once(0))
.collect::<Vec<u8>>()
})
.collect::<Vec<u8>>()
}

(Playground)

Please format your code. This is completely unreadable. The Playground contains the same thing, too – I hope you are not actually writing code without any indentation. (A good start would be to press Tools > Rustfmt in the Playground.)

I cleaned up the code significantly (removed useless collects, a useless unsafe, etc.), but there is still a lot of room for improvement (e.g. everything is unwrap()ped, and there is no proper error handling).
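For illustration, here is a minimal sketch of how the collect could be dropped and the unwraps replaced by a Result. It is not the cleaned-up code mentioned above. To stay self-contained it assumes only std: keys are copied as raw bytes instead of being base64-decoded, and std's parse stands in for lexical_core. The pairs helper, the ParseError type and the Bytes variant name are made up for this example.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum Value {
    Int(i32),
    Float(f64),
    Bytes(Vec<u8>),
    Bool(bool),
}

// Hypothetical error type, not part of the original module.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    /// A value field was empty, so it had no tag byte.
    EmptyValue,
    /// The payload after the tag byte could not be parsed.
    BadPayload,
}

/// Yields (key, value) fields straight from `split`, two at a time,
/// without an intermediate Vec. A dangling last field is ignored,
/// which matches what `windows(2).step_by(2)` did.
fn pairs<'a>(data: &'a [u8]) -> impl Iterator<Item = (&'a [u8], &'a [u8])> {
    let mut fields = data.split(|&b| b == 0);
    std::iter::from_fn(move || Some((fields.next()?, fields.next()?)))
}

/// Decodes one value field: a tag byte followed by the payload.
fn parse_value(field: &[u8]) -> Result<Value, ParseError> {
    let (&tag, payload) = field.split_first().ok_or(ParseError::EmptyValue)?;
    // Numbers are parsed from text; assumption: the payload is ASCII.
    let text = || std::str::from_utf8(payload).map_err(|_| ParseError::BadPayload);
    match tag {
        1 => text()?
            .parse::<i32>()
            .map(Value::Int)
            .map_err(|_| ParseError::BadPayload),
        2 => text()?
            .parse::<f64>()
            .map(Value::Float)
            .map_err(|_| ParseError::BadPayload),
        // The original base64-decodes this payload; here it is copied as-is.
        3 => Ok(Value::Bytes(payload.to_vec())),
        // Any other tag is treated as a bool, as in the original: b'0' is false.
        _ => payload
            .first()
            .map(|&b| Value::Bool(b != b'0'))
            .ok_or(ParseError::BadPayload),
    }
}

pub fn generate(data: &[u8]) -> Result<HashMap<Vec<u8>, Value>, ParseError> {
    pairs(data)
        .map(|(key, value)| parse_value(value).map(|v| (key.to_vec(), v)))
        .collect()
}
```

Calling generate(b"a\0\x0142\0") would give a map with the single entry "a" mapped to Int(42), while malformed input comes back as an Err instead of a panic. The same pairs helper could also serve get_dict_size, as pairs(data).count(), without any unsafe.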

I don't know if this works, since your code is not self-contained – there are quite a few type/function definitions missing, so it didn't even compile in the first place.


I'm sorry, but I'm a blind user, and my code is formatted without any
spaces at the beginning of lines so that I don't have to navigate
around them. Error handling will be added once I manage to optimize
the code. My code should compile, just not on the Playground, because
it doesn't carry all the crates.

2022-12-07 16:31 GMT+01:00, H2CO3 via The Rust Programming Language
Forum notifications@rust-lang.discoursemail.com:


I edited your code, as far as I was able to understand it, just so
that it compiles, and I also formatted it so that sighted people can
check it more easily.

No, I am sorry, I didn't know you were blind. Completely understandable in this case.

That's fine; according to the rules I should have done that and given
you formatted code. I also format it with these settings:
unstable_features = true
brace_style= "PreferSameLine"
edition = "2021"
imports_granularity = "One"
tab_spaces = 0
trailing_comma = "Never"
But now that the code compiles, I would like to ask: do you think the
processing could be sped up somehow? I also tried itertools before,
but it seems to be slower, even though it needs no collect.
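One place that might be worth measuring is the final step of dict_get_keys_double_values, where every pair is collected into its own temporary Vec and every number goes through to_string. The following is only a sketch of writing everything into a single output buffer instead; whether it is actually faster has to be measured. The encode_pairs name and the Order enum (standing in for the sort: i32 flag) are made up for this example.

```rust
use std::io::Write;

// Hypothetical replacement for the `sort: i32` flag (1 = descending, 2 = ascending).
#[derive(Clone, Copy)]
pub enum Order {
    Unsorted,
    Descending,
    Ascending,
}

/// Sorts the (key, number) pairs and serializes them as
/// key, 0, number as text, 0, all into one buffer.
pub fn encode_pairs(mut list: Vec<(Vec<u8>, f64)>, order: Order) -> Vec<u8> {
    match order {
        // total_cmp gives a total order on f64, so a NaN cannot cause a panic here.
        Order::Descending => list.sort_unstable_by(|a, b| b.1.total_cmp(&a.1)),
        Order::Ascending => list.sort_unstable_by(|a, b| a.1.total_cmp(&b.1)),
        Order::Unsorted => {}
    }
    // Rough size guess (16 bytes per number plus separators) to limit reallocations.
    let capacity = list.iter().map(|(key, _)| key.len() + 16).sum::<usize>();
    let mut out = Vec::with_capacity(capacity);
    for (key, value) in &list {
        out.extend_from_slice(key);
        out.push(0);
        // Formats the number directly into the buffer, with no temporary String.
        // Writing into a Vec<u8> does not produce I/O errors.
        write!(out, "{value}").unwrap();
        out.push(0);
    }
    out
}
```

The text written for each number is the same as what to_string produces, since both go through Display.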


Are you measuring/running in release (optimized) mode?

Yes, a release build with the latest Rust and the 2021 edition.
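For comparing variants in a release build, a very small timing helper can already give a rough idea. This is only a sketch using std; the time_it name is made up for this example, and a dedicated benchmarking crate would give more reliable numbers.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` on `input` `iterations` times and returns the total elapsed time.
/// black_box keeps the optimizer from removing the work being measured.
pub fn time_it<T: ?Sized, R>(
    input: &T,
    iterations: u32,
    mut f: impl FnMut(&T) -> R,
) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        black_box(f(black_box(input)));
    }
    start.elapsed()
}
```

It could be used like time_it(&data[..], 10_000, |d: &[u8]| d.split(|&b| b == 0).count()), once for each variant, run with cargo run --release, comparing the returned durations.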

