How to use From?

use std::fs::File;
use std::io::{BufRead, BufReader, Error, ErrorKind, Read};

pub fn read_to_int64_vec(path: &str,approx_cap:usize) -> Result<Vec<i64> ,Error>{

		let mut v:Vec<i64> = Vec::with_capacity(approx_cap);
		let file_handle = File::open(path).expect("file open error");
		let iterator_file_handle = BufReader::new(file_handle);

		for line in iterator_file_handle.lines() {
			let line =  line ?;
			let n = line.trim().parse().expect("some failure during conversion of data"); 
			v.push(n);
		}
		Ok(v)
}

pub fn read_to_string_vec_as_whole_sentences(path: &str, approx_cap:usize) -> Result<Vec<String>,Error>{

		let mut v:Vec<String> = Vec::with_capacity(approx_cap);
		let file_handle = File::open(path).expect("file open error");
		let iterator_file_handle = BufReader::new(file_handle);

		for line in iterator_file_handle.lines() {

			let line = line?;
			 v.push(line);
			}
		Ok(v)
}

pub fn read_to_string_vec_word_by_word(path: &str,approx_cap:usize) -> Result<Vec<String>,Error> {

	let sentence_vector : Vec<String> = read_to_string_vec_as_whole_sentences(path,approx_cap)?;
	let mut word_vector = Vec::new();

	for sentence in sentence_vector.iter() {

		let sentence_as_bytes =  sentence.as_bytes();

		for (i,&bytes_of_sentence) in sentence_as_bytes.iter().enumerate() {
			if bytes_of_sentence == b' '|| bytes_of_sentence ==  b',' {
				let f = i ;
				let j = 0;
				let temp_word = &sentence[j..i] ;
				let temp_word = temp_word as String;
				word_vector.push(temp_word);
				j = i ;
			}
		}
	}
	Ok(word_vector)

}

The error is:

C:\Users\Dell\Desktop\work_space\rust\lib\dbLibs>cargo build
   Compiling dbLibs v0.1.0 (file:///C:/Users/Dell/Desktop/work_space/rust/lib/dbLibs)
warning: unused imports: `ErrorKind`, `Read`
 --> src\fxx.rs:2:42
  |
2 | use std::io::{BufRead, BufReader, Error, ErrorKind, Read};
  |                                          ^^^^^^^^^  ^^^^
  |
  = note: #[warn(unused_imports)] on by default

error[E0605]: non-primitive cast: `&str` as `std::string::String`
  --> src\fxx.rs:46:21
   |
46 |                 let temp_word = temp_word as String;
   |                                 ^^^^^^^^^^^^^^^^^^^
   |
   = note: an `as` expression can only be used to convert between primitive types. Consider using the `From` trait

error: aborting due to previous error

For more information about this error, try `rustc --explain E0605`.
error: Could not compile `dbLibs`.

To learn more, run the command again with --verbose.

Replace this:

let temp_word = temp_word as String;
word_vector.push(temp_word);

With either this:

word_vector.push(String::from(temp_word));

Or this, from the very closely related Into trait:

word_vector.push(temp_word.into());
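
As a minimal sketch of the two options above (the helper name owned_both_ways is made up for illustration and is not part of the original code), both calls turn a borrowed &str into an owned String:

```rust
// Hypothetical helper, for illustration only: converts a borrowed
// string slice into an owned String in the two ways suggested above.
fn owned_both_ways(slice: &str) -> (String, String) {
    // Explicit conversion through the From<&str> impl on String.
    let via_from = String::from(slice);
    // Same conversion through Into; the target type comes from the annotation.
    let via_into: String = slice.into();
    (via_from, via_into)
}

fn main() {
    let (a, b) = owned_both_ways("hello");
    println!("{} {}", a, b);
}
```

With into() the compiler needs to know the target type; in word_vector.push(temp_word.into()) it gets that from the element type of the vector.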

Here's a good blog post on the From and Into traits - they're extremely handy :slight_smile:

(An aside: I've been using String::from for ages without clocking that it was an implementation of From :man_facepalming:)


In addition to what @17cupsofcoffee says, I see two problems here, and solving the bigger one will make the smaller one go away :slight_smile:

The smaller one, the one you asked about, is temp_word as String;

Since &sentence[j..i] slices the String itself, it is already a valid &str, so String::from (or into()) is all you need there. A checked conversion only comes into play if you build the String from "arbitrary" bytes (like your sentence_as_bytes) instead: Rust Strings guarantee that they are valid UTF-8, and raw bytes might be anything, so you would go through String::from_utf8, which takes an owned Vec<u8>:

String::from_utf8(sentence_as_bytes[j..i].to_vec())

Note that this can fail if the bytes are not valid UTF-8 (plain ASCII is always valid UTF-8 too; that is no accident, UTF-8 was designed to be backwards compatible with ASCII).
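
A small sketch of what a checked byte-to-String conversion looks like, assuming a hypothetical helper named bytes_to_string (not from the original code):

```rust
use std::string::FromUtf8Error;

// Hypothetical helper: checked conversion from raw bytes to an owned String.
fn bytes_to_string(bytes: &[u8]) -> Result<String, FromUtf8Error> {
    // String::from_utf8 takes ownership of a Vec<u8>, so the slice is copied first.
    String::from_utf8(bytes.to_vec())
}

fn main() {
    // Plain ASCII is valid UTF-8, so this succeeds.
    println!("{:?}", bytes_to_string(b"hello"));
    // The byte 0xFF never appears in valid UTF-8, so this is an Err.
    println!("{:?}", bytes_to_string(&[0x66, 0x6f, 0xff]));
}
```

The caller has to decide what to do with the Err case, which is exactly the extra work you avoid by staying with &str and String.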


The bigger problem I see is the way you iterate. You start with sentence_vector: Vec<String>, then throw away the information that each element is a valid String by converting it to raw bytes with as_bytes(), and then check whether each byte matches one of two characters-as-bytes.
This is more work for you, and "loses" the information that you are working with text, which you then have to add back in.

I'd suggest avoiding the byte conversion completely and using the built-in str::split method (available on String as well) instead.
It does exactly what you want, with less coding for you, and is probably better optimised!
You can also split by expressions/closures, to check for multiple separators; from the linked documentation:
"abc1defXghi".split(|c| c == '1' || c == 'X').collect()

If you do this, you shouldn't need as_bytes anymore, and thus also not String::from(), at all! :smile:
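
Here is a rough sketch of that idea on a single line of text. The helper name split_words and the filter step are my own additions for illustration: splitting "one, two" on both ',' and ' ' produces an empty piece between the two separators, and the filter drops it.

```rust
// Hypothetical helper: split one line on spaces and commas.
fn split_words(line: &str) -> Vec<&str> {
    line.split(|c: char| c == ' ' || c == ',')
        // Adjacent separators (", ") yield empty pieces; skip them.
        // This filter is an assumption about what is wanted, not part of the thread.
        .filter(|piece| !piece.is_empty())
        .collect()
}

fn main() {
    println!("{:?}", split_words("one, two three"));
}
```

The returned &str values borrow from line, which is fine here because the caller owns the text that line points at.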


Let me try. Thanks for the tip, it might help me get a better grasp of functional programming.
I'll get back after trying.

Good luck!

Yeah, Rust can be confusing that way :smiley:

It took me ages to realise that all the "interesting" methods were hidden under the "trait implementations" header in the docs, which I used to ignore, rather than on the structs directly.
A decade of Javadoc taught me that those "random interfaces" are usually not what you're looking for. Rust sure made me unlearn that!
Everything seems to be handled generically, and all the useful tidbits seem to hide under either impl Iterator, impl From or impl Into, which leads to such beautiful abstractions!


I did try your idea, because it's really cool, but I might not be good enough to implement it.

Here is the error:

error[E0597]: `sentence` does not live long enough
  --> src\fxx.rs:62:26
   |
62 |                 let tmp :Vec<&str> = sentence.split( |c| c == ' ' || c == ',').collect();
   |                                      ^^^^^^^^ borrowed value does not live long enough
...
66 |             }
   |             - borrowed value only lives until here
   |
note: borrowed value must be valid for the anonymous lifetime #1 defined on the function body at 56:1... 

The code is:
pub fn read_to_string_vec_word_by_word_string_split(path: &str,approx_cap:usize) -> Result<Vec<&str>,Error> {

	let sentence_vector : Vec<String> = read_to_string_vec_as_whole_sentences(path,approx_cap)?;
	let mut final_vector :Vec<&str> = Vec::with_capacity(10000);
			for sentence in sentence_vector {
	
				let tmp :Vec<&str> = sentence.split( |c| c == ' ' || c == ',').collect();
					for input in tmp {
						final_vector.push(input);
					}
			}

	Ok(final_vector)
}

You’re close. Let me offer you a more functional approach since you mentioned you wanted to learn that:

fn read_to_string_vec_word_by_word_string_split(
    path: &str,
    approx_cap: usize,
) -> Result<Vec<String>, Error> {
    let sentence_vector: Vec<String> = read_to_string_vec_as_whole_sentences(path, approx_cap)?;
    Ok(sentence_vector
        // iterate over `&String` borrowed from the vec
        .iter()
        // for each &String, split it, flatten the resulting
        // iterator of `&str`, and convert the &str
        // to a String using its Into::into impl
        .flat_map(|sentence| sentence.split(|c| c == ' ' || c == ',').map(Into::into))
        // collect these Strings into the resulting Vec
        .collect())
}

Note that the resulting Vec contains String values, not &str - there’s nowhere to borrow from here such that it lives beyond the function, so we need owned values (String).
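
To see the same iterator chain without touching the file system, here is a sketch on in-memory data. The helper name words_from_sentences is hypothetical, and it uses String::from in place of Into::into purely to make the target type visible:

```rust
// Hypothetical helper: the same chain as above, applied to in-memory sentences.
fn words_from_sentences(sentences: &[String]) -> Vec<String> {
    sentences
        // iterate over `&String` borrowed from the slice
        .iter()
        // split each sentence and turn every `&str` piece into an owned String;
        // flat_map merges the pieces of all sentences into one stream
        .flat_map(|s| s.split(|c: char| c == ' ' || c == ',').map(String::from))
        .collect()
}

fn main() {
    let sentences = vec!["a b".to_string(), "c,d".to_string()];
    println!("{:?}", words_from_sentences(&sentences));
}
```

Because flat_map flattens across the outer iterator, the result holds the words of every sentence in order, not just those of one sentence.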


I somewhat understand; I'll need to dive into map/flat_map.
I've seen them used quite a few times, but this is really concise code, a more Rusty way I'd say.
But it will return the words of only one sentence at a time, right?

You forgot the "yet"! :slight_smile:
It's ok, every Rust programmer on these forums, even the experienced ones, runs into lifetime errors. It's Rust's very own rite of passage!

Vitaly already gave an excellent example of a working solution, so I won't repeat that.

This is a good opportunity to learn how to read these errors, because, like everyone else who has ever learned Rust, you'll see them a couple of hundred times. This is normal!
I'll try to talk you through it.

62 | let tmp :Vec<&str> = sentence.split( |c| c == ' ' || c == ',').collect();
   |                      ^^^^^^^^ borrowed value does not live long enough

Here, the borrow checker tells us which value does not live long enough: our sentence, which is a String.
So, we look at where sentence is created and dropped.
Created: as the loop variable of the for loop on line 60;
Dropped: at the end of each loop iteration, on line 66 (as the error helpfully mentions), because for sentence in sentence_vector moves each String out of the vector and into the loop body.

Then, why is it complaining?
We have to follow our usage of 'sentence' through the code to figure that out.

It is important to notice the type of tmp:

62 | let tmp :Vec<&str> = sentence.some_stuff_that_creates_BORROWED_str()

I recommend reading the ampersand '&' as 'borrow' (like "borrowing money from a friend") so tmp: Vec<&str> means "tmp is a vector of borrowed str-slices".
A borrow points at, or references, or locks the thing it borrows from, so tmp is now dependent on sentence.

The 'mistake' is here, which isn't actually pointed out by the error (sadly, the compiler cannot read our minds, so it cannot figure out what our goal is):
for input in tmp { final_vector.push(input); }
We take our borrows-of-sentence, and add them to final_vector of type Vec<&str>, so final_vector now also holds borrows-of-sentence.

Then, the for-sentence loop ends, so sentence is dropped, but final_vector (which holds borrowed references to sentence) still lives, because we want to return it!
In C, this would have been a dangling pointer problem, but Rust saves us just in time!
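
For contrast, here is a sketch of a variant where returning &str is fine. This is not what the thread's function can do (it reads the file itself, so it owns the sentences); it assumes the caller already owns the sentences and passes them in by reference. The name words_borrowed is hypothetical.

```rust
// Hypothetical variant: the caller owns the sentences, so the returned
// slices can borrow from them for the lifetime 'a.
fn words_borrowed<'a>(sentences: &'a [String]) -> Vec<&'a str> {
    let mut out = Vec::new();
    // Iterating over a reference yields `&'a String`; no String is moved or dropped here.
    for sentence in sentences {
        for word in sentence.split(|c: char| c == ' ' || c == ',') {
            out.push(word);
        }
    }
    out
}

fn main() {
    let sentences = vec!["x y".to_string(), "z".to_string()];
    // `sentences` outlives `words`, so the borrows stay valid.
    let words = words_borrowed(&sentences);
    println!("{:?}", words);
}
```

The difference from the failing code is only who owns the Strings: here they live in the caller, so nothing the returned Vec points at is dropped when the function returns.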


The important trick that Vitaly used was to change the type of final_vector, and to use Into.
final_vector is now Vec<String>, so it owns its contents, rather than pointing at/borrowing someone else's strings.
If you own it, you can give it away to someone else, in our case: return it to the code outside our little function.

The use of Into::into is a bit magical, and very powerful. It says "compiler, here I have something from my iterator (please figure out the type yourself), and I want to collect it as a Vec<String>. Please look for any conversion that is defined on whatever my iterator returns, that turns it into a String".
In our case, this is what changes the borrowed &str of split() into an owned String (thanks to impl From<&str> for String in the standard library; every From impl automatically provides the matching Into, here Into<String> for &str. This impl simply makes a copy.)
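
The same mechanism works for your own types. As a sketch, assuming a made-up wrapper type Word (not something from the thread): implementing From<&str> is enough for map(Into::into) to find the conversion.

```rust
// Hypothetical wrapper type, for illustration only.
#[derive(Debug, PartialEq)]
struct Word(String);

// Implementing From<&str> for Word also gives &str an Into<Word> impl,
// through the blanket impl in the standard library.
impl From<&str> for Word {
    fn from(s: &str) -> Self {
        Word(s.to_owned())
    }
}

// The return type tells the compiler which conversion Into::into must pick.
fn collect_words(line: &str) -> Vec<Word> {
    line.split(' ').map(Into::into).collect()
}

fn main() {
    println!("{:?}", collect_words("a b"));
}
```

Changing the return type to Vec<String> would make the very same map(Into::into) call produce Strings instead, which is the "please figure out the type yourself" part described above.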

More on the awesome power of From/Into in this blog: Convenient and idiomatic conversions in Rust
