I'm trying to sort words by their letters as alphabetic, but something is really wrong about lifetime,
references and borrowing.
use std::env::args;
fn serialize_input(input: &str) -> &[&[u8]] {
let mut bytes: Vec<u8> = Vec::new();
let mut words: Vec<&[u8]> = Vec::new();
for i in input.as_bytes() {
bytes.push(*i);
if *i == '\n' as u8 {
words.push(&bytes);
}
}
&words
}
fn merge(left: &[&[u8]], right: &[&[u8]]) -> &[&[u8]] {
let mut i: usize = 0;
let mut j = 0;
let mut merged: Vec<&[u8]> = Vec::new();
while i < left.len() && j < right.len() {
if &left[i][0] < &right[j][0] {
merged.push(&left[i]);
i += 1;
} else {
merged.push(&right[j]);
j += 1;
}
}
if i < left.len() {
while i < left.len() {
merged.push(&left[i]);
i += 1;
}
}
if i < left.len() {
while j < left.len() {
merged.push(&right[j]);
j += 1;
}
}
&merged
}
fn sort_words(input: &[&[u8]]) -> &str {
if input.len() < 2 {
let output = String::new();
for i in input {
for j in *i {
output.push(*j as char);
}
output.push('\n');
}
&output
} else {
let mid = input.len()/2;
let left = sort_words(&input[0..mid]);
let right = sort_words(&input[mid..]);
let mut merged = merge(left as &[&[u8]], &right as &[&[u8]]);
merged
}
}
fn main() {
let args: Vec<String> = args().collect();
let filename = &args[1];
let books = match std::fs::read_to_string(filename) {
Ok(text) => text,
Err(_) => panic!("Whoops... Silence in the library...")
};
let sorted = sort_words(serialize_input(&books));
}
Output:
Compiling library v0.1.0 (/home/alimkoca/Softwares/library)
error[E0106]: missing lifetime specifiers
--> src/main.rs:17:46
|
17 | fn merge(left: &[&[u8]], right: &[&[u8]]) -> &[&[u8]] {
| -------- -------- ^ ^ expected named lifetime parameter
| |
| expected named lifetime parameter
|
= help: this function's return type contains a borrowed value, but the signature does not say whether it is borrowed from one of `left`'s 2 lifetimes or one of `right`'s 2 lifetimes
help: consider introducing a named lifetime parameter
|
17 | fn merge<'a>(left: &'a [&'a [u8]], right: &'a [&'a [u8]]) -> &'a [&'a [u8]] {
| ++++ ++ ++ ++ ++ ++ ++
error[E0106]: missing lifetime specifier
--> src/main.rs:49:35
|
49 | fn sort_words(input: &[&[u8]]) -> &str {
| -------- ^ expected named lifetime parameter
|
= help: this function's return type contains a borrowed value, but the signature does not say which one of `input`'s 2 lifetimes it is borrowed from
help: consider introducing a named lifetime parameter
|
49 | fn sort_words<'a>(input: &'a [&'a [u8]]) -> &'a str {
| ++++ ++ ++ ++
error[E0308]: mismatched types
--> src/main.rs:65:9
|
49 | fn sort_words(input: &[&[u8]]) -> &str {
| ---- expected `&'static str` because of return type
...
65 | merged
| ^^^^^^ expected `str`, found slice `[&[u8]]`
|
= note: expected reference `&'static str`
found reference `&[&[u8]]`
error[E0605]: non-primitive cast: `&str` as `&[&[u8]]`
--> src/main.rs:64:32
|
64 | let mut merged = merge(left as &[&[u8]], &right as &[&[u8]]);
| ^^^^^^^^^^^^^^^^ an `as` expression can only be used to convert between primitive types or to coerce to a specific trait object
error[E0605]: non-primitive cast: `&&str` as `&[&[u8]]`
--> src/main.rs:64:50
|
64 | let mut merged = merge(left as &[&[u8]], &right as &[&[u8]]);
| ^^^^^^^^^^^^^^^^^^ an `as` expression can only be used to convert between primitive types or to coerce to a specific trait object
Some errors have detailed explanations: E0106, E0308, E0605.
For more information about an error, try `rustc --explain E0106`.
error: could not compile `library` due to 5 previous errors
There are a lot of misconceptions and errors in your code.
First, you are trying to return the addresses of local variables. That's not going to ever work, because it's uncondtionally creating a dangling pointer. If you are creating a Vec inside a function, then return it by value.
Second, if you are pushing a reference to the bytes vector into words, then you won't be able to push any more to the inner vector – no other borrow can co-exist with a mutable borrow, because shared mutability leads to memory management errors (what if the vector needs to reallocate while you are still looking at it?).
I'd advise you to start reading the Book from the beginning, and complete some entry-level exercises before trying to do something more complex, because it is very apparent that you have an incorrect mental model of "returning something by reference", possibly influenced by another language, which you'll need to unlearn.
Following the principles that @H2CO3 mentioned, I was able to clean up your serialize_input and merge with only minimal changes (see below), but I couldn't figure out how you meant sort_words to work. Note that this is highly un-idiomatic Rust, and there are much cleaner ways to do the same thing.
fn serialize_input(input: &str) -> Vec<&[u8]> {
let bytes = input.as_bytes();
let mut words: Vec<&[u8]> = Vec::new();
let mut start:Option<usize> = None;
for (i,c) in bytes.iter().enumerate() {
if *c == '\n' as u8 {
if let Some(start) = start.take() {
words.push(&bytes[start..i]);
}
} else {
if start.is_none() {
start = Some(i);
}
}
}
words
}
fn merge<'a>(left: &[&'a[u8]], right: &[&'a[u8]]) -> Vec<&'a[u8]> {
let mut i: usize = 0;
let mut j = 0;
let mut merged: Vec<&[u8]> = Vec::new();
while i < left.len() && j < right.len() {
if &left[i][0] < &right[j][0] {
merged.push(&left[i]);
i += 1;
} else {
merged.push(&right[j]);
j += 1;
}
}
if i < left.len() {
while i < left.len() {
merged.push(&left[i]);
i += 1;
}
}
if i < left.len() {
while j < left.len() {
merged.push(&right[j]);
j += 1;
}
}
merged
}
Sorry if this is too much of a tangent, but I find your solution super interesting, since I've never seen a "lazy" merge sort before. I tried to work through the steps on a sheet of paper, and as I understand it it's building a binary tree of closures/boxes/iterators and populating the tree using the caching done by "peekable". So in a sense, even though the code looks like a merge sort, it's actually a heap sort!
The "heap" is allocated on the first call to merge_sort, filled on the first call to .next(), then emptied with each following call to .next(). Here's a plot of "init" time, "first .next() time" and "second .next() time", all plotted vs n*ln(n):
Log-log plotting broke libreoffice's trend line, but you can still see the R^2s for "init" time and "first .next()" time. The fact that the first element is the longest to grab and that the time to grab it grows with n*ln(n)seems to confirm my understanding.
Here's another plot with the time to iterate over the remaining elements, the total time to sort, and comparison with naive "eager" versions of heap and merge sort:
[the forum blocked this one. I'll post it in a followup]
The eager sorts are faster, but the majority cost of the lazy sort isn't paid all up front, only while pulling items from it, which is pretty cool!
I hadn't thought much about the performance characteristics, as I was just aiming for something that corresponded to the structure @alimkoca laid out. Your post made me realize that there's a relatively easy change that should vastly improve the runtime and memory usage for mostly-sorted lists: Instead of making the leaf nodes individual elements, use the already-sorted runs from the original list.
fn split_runs<T:Ord>(input: &[T])->Vec<&[T]> {
let mut result = vec![];
let mut start = 0;
for i in 1..input.len() {
if input[i] < input[i-1] {
result.push(&input[start..i]);
start = i;
}
}
result.push(&input[start..]);
result
}
fn _merge_sort<'a, T:Ord>(input: &[&'a [T]])->Box<dyn 'a+Iterator<Item=&'a T>> {
match input {
&[] => Box::new(std::iter::empty()),
&[r] => Box::new(r.into_iter()),
_ => {
let mid = input.len()/2;
let left = _merge_sort(& input[0..mid]);
let right = _merge_sort(& input[mid..]);
Box::new(merge(left, right))
}
}
}
fn merge_sort<T:Ord>(input: &[T])->Box<dyn '_+Iterator<Item=&T>> {
_merge_sort(&split_runs(input))
}
Alternatively, a non-recursive version:
fn split_runs<T:Ord>(mut input: &[T])->impl '_ + Iterator<Item = impl '_+Iterator<Item=&T>> {
std::iter::from_fn(move || match input.len() {
0 => return None,
_ => {
for i in 1..input.len() {
if input[i] < input[i-1] {
let result = Some(input[..i].iter());
input = &input[i..];
return result
}
}
let result = Some(input.iter());
input = &[];
return result
}
})
}
fn merge_sort<'a, T:Ord>(input: &'a [T])->Box<dyn 'a + Iterator<Item=&'a T>> {
let mut lists:VecDeque<_> = split_runs(input)
.map(|x| Box::new(x) as _)
.collect();
while lists.len() > 1 {
let a = lists.pop_front().unwrap();
let b = lists.pop_front().unwrap();
lists.push_back(Box::new(merge(a,b)) as _);
}
lists.pop_front().unwrap_or(Box::new(std::iter::empty()))
}