What is the Rust equivalent of numpy.where in the following code? I want to find the indices for which 4278.0 < Xall <= 4286.0. Here Xall is a NumPy array, but in my Rust code I am using a vector instead.
I tried writing the logic by hand with a for loop, but it is not fast enough. So does anybody know the Rust equivalent of numpy.where for this?

Also: what is the performance of the Python code, and how did you measure it? Did you run 'cargo run' or 'cargo run --release' (the former only runs a debug build of Rust, which is usually orders of magnitude slower than the release version)? And how did you time the Rust code?

I assume you’re working with 1-D data, since you say you’re using a vector. It might help if you also shared your previous attempt and how you determined that it’s too slow. It would also help to link to the documentation of the NumPy method in question and to provide a type signature of what you’d like to do in Rust. Anyway, looking through the docs:

When only condition is provided, this function is a shorthand for np.asarray(condition).nonzero().

numpy.nonzero(a)

Return the indices of the elements that are non-zero.

without fully understanding what you’re after, it seems like this should be straightforward with iterators. E.g.

fn indices_between_4278_and_4286(x: &[i32]) -> Vec<usize> {
    x.iter()
        .enumerate()
        .filter_map(|(index, &value)| (4278 <= value && value <= 4286).then(|| index))
        .collect()
}

Thanks. This is the kind of approach I am looking for. I replaced the integers with floats, but at the position of the numbers in the expression it now says: error[E0308]: mismatched types. expected &&f64, found floating-point number.

The filter method borrows whatever item type the iterator yields; if the iterator is already over &f64, the filter closure will receive an &&f64.
You can either dereference the parameter (by writing **x <= ...) or reference the constant twice (&&100.0).
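For illustration, here is a sketch of the earlier snippet adapted to f64 with the bounds from the original question (4278.0 < x <= 4286.0); the function name is mine. Destructuring with &value in the closure pattern is a third way to sidestep the double-reference issue, since filter_map receives the (usize, &f64) pair by value:

```rust
// Sketch: the same iterator approach with f64 values and the
// asker's bounds (4278.0 < x <= 4286.0). Destructuring `&value`
// in the pattern copies the f64, so the comparison works on
// plain f64 rather than &&f64.
fn indices_in_range(x: &[f64]) -> Vec<usize> {
    x.iter()
        .enumerate()
        .filter_map(|(index, &value)| (4278.0 < value && value <= 4286.0).then(|| index))
        .collect()
}

fn main() {
    let xall = vec![4270.0, 4278.0, 4280.5, 4286.0, 4290.0];
    // 4278.0 itself is excluded by the strict lower bound.
    assert_eq!(indices_in_range(&xall), vec![2, 3]);
}
```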

I am doing this for a large amount of data in parallel, so I am already seeing very high swap usage and low CPU usage; it is an I/O bottleneck. So I am trying to avoid unnecessarily cloning data. Is it possible to just find the limit indices, so I can reference the range from the vector, like below (but faster, using something like numpy.where() in Python)?

// start at 1 so Xall[j - 1] does not underflow at j == 0
for j in 1..Xall.len() {
    if Xall[j] >= 4278.0 && Xall[j - 1] < 4278.0 {
        Xlow_left = Xall[j];
        Xlow_left_index = j;
    }
    if Xall[j] > 4286.0 && Xall[j - 1] <= 4286.0 {
        Xhigh_left = Xall[j];
        Xhigh_left_index = j;
    }
}
let Xdata_left = &Xall[Xlow_left_index..=Xhigh_left_index];

Otherwise, could you write the code for this suggestion for me?

@exoplanet_hunter It looks like you can save some time by breaking out of the loop once you find Xhigh_left_index. This looks like a search in sorted data, is it? If so, it might also be worth trying a binary search, as long as the vector is long enough (I'd bother with it if the Xall sequence were longer than about 100 elements).

I'd probably use partition_point instead of the usual binary_search method (both do similar jobs): it makes it easier both to express the predicate and to work with floats. It looks like you want the partition point of elt <= 4286.0 for the high bound?
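To make that concrete, here is a sketch of both bounds via slice::partition_point, assuming Xall is sorted in ascending order (function and variable names are mine, not from the thread):

```rust
// Sketch, assuming the data is sorted ascending.
// `partition_point` binary-searches for the first index at which
// the predicate turns false, so both bounds are found in O(log n)
// without scanning every element.
fn range_bounds(xall: &[f64]) -> (usize, usize) {
    // first index whose value is > 4278.0
    let low = xall.partition_point(|&x| x <= 4278.0);
    // first index whose value is > 4286.0
    let high = xall.partition_point(|&x| x <= 4286.0);
    (low, high)
}

fn main() {
    let xall = vec![4270.0, 4275.0, 4278.0, 4280.0, 4283.0, 4286.0, 4290.0];
    let (low, high) = range_bounds(&xall);
    // Borrow the matching window directly from the vector, no cloning.
    let xdata = &xall[low..high];
    assert_eq!(xdata, &[4280.0, 4283.0, 4286.0]);
}
```

Slicing with low..high then borrows exactly the 4278.0 < x <= 4286.0 window from the existing allocation, which should address the concern about avoiding copies.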