Beginner projects that can make me comfortable with Rust

Hi,
I am new to Rust.
I need ideas for beginner projects.
They should be easy, help me get into Rust, and make me comfortable with it.
I just want some CLI applications that I can learn new things from.
If you can, I would also like some projects that will prepare me for IoT and embedded systems with Rust.

Thanks in Advance~ :slight_smile:

Writing a program to produce a KWIC index could be a reasonable exercise to get familiar with how to manipulate text in Rust.

Hmm, I looked it up on the internet, but I couldn't understand it well.

Is it an algorithm that refines a sentence by removing stop words?

Can you list the stop words? Otherwise I think I would go with: the, is, a, an, they, and

No, it’s a generator for something like a book index. Think of the file as a set of section titles, one per line.

The idea is to make a list of these titles so that the ones which contain any given word (“Rust”, for instance) appear together. This will require each title to appear in the list multiple times, once for each word it contains.

It’s your project; use whatever stopword list you want.

So let's have a text first:

The moon has stars
but I know the police is there
lol it was a flower

And then we have a stop words array: the, has, but, i, know, the, is, there, it, a, flower
We have to print where the title occurs in each line of the text,
so in our case the output will be:

the moon has stars
but I know the police is here
lol it was a flower

What I will do is convert the whole text to lowercase and print the titles in CAPS.

Am I correct?

For that text, the output would be something like this:

  lol it was a FLOWER               (3)
               LOL it was a flower  (3)
           the MOON has stars       (1)
but I know the POLICE is here       (2)
  the moon has STARS                (1)

Notice that the keywords are all lined up vertically and in alphabetical order. The lines that have multiple keywords are duplicated, and the original line number is also reported.

This lets you quickly find (by hand!) where that mention of “police” was in the original file.
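A minimal sketch of such a generator, assuming whitespace-separated words and a stop-word list supplied by the caller. The names `kwic` and `Entry` are made up for this illustration, and punctuation is not stripped:

```rust
use std::collections::HashSet;

// One index entry. All names here are hypothetical, chosen for this sketch.
struct Entry {
    key: String,    // lowercased keyword, used only for sorting
    left: String,   // the words before the keyword
    right: String,  // the keyword in capitals, then the words after it
    line_no: usize, // 1-based line number in the original text
}

fn kwic(text: &str, stop_words: &[&str]) -> Vec<String> {
    let stops: HashSet<String> = stop_words.iter().map(|w| w.to_lowercase()).collect();
    let mut entries = Vec::new();

    for (i, line) in text.lines().enumerate() {
        let words: Vec<&str> = line.split_whitespace().collect();
        for (j, word) in words.iter().enumerate() {
            let key = word.to_lowercase();
            if stops.contains(&key) {
                continue; // stop words never become keywords
            }
            let mut right = vec![word.to_uppercase()];
            right.extend(words[j + 1..].iter().map(|w| w.to_string()));
            entries.push(Entry {
                key,
                left: words[..j].join(" "),
                right: right.join(" "),
                line_no: i + 1,
            });
        }
    }

    // Alphabetical order by keyword; the sort is stable, so ties keep line order.
    entries.sort_by(|a, b| a.key.cmp(&b.key));

    // Column widths, so that keywords and line numbers line up vertically.
    let left_width = entries.iter().map(|e| e.left.chars().count()).max().unwrap_or(0);
    let right_width = entries.iter().map(|e| e.right.chars().count()).max().unwrap_or(0);

    entries
        .iter()
        .map(|e| {
            format!(
                "{:>lw$} {:<rw$} ({})",
                e.left,
                e.right,
                e.line_no,
                lw = left_width,
                rw = right_width
            )
        })
        .collect()
}
```

Calling `kwic` with the three-line text and a stop-word list such as the, has, but, i, know, is, here, it, was, a returns one formatted line per keyword, which can then be printed with a simple loop.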

Thank You very much for explaining this new concept to me

I will let you know when I am done with the solution.
I will try this in Python 3 and also in Rust.

And if you have any more ideas, load up my to-do list.

Actually, I just want to make myself comfortable with Rust.

Thanks :slight_smile:

I finished it, but I didn't do the aligning part or the alphabetical order.

That's a pretty good first attempt. Solving this problem is a lot easier if you use something more expressive than Vec<String> to store the intermediate results. You could, for instance, keep the words from individual lines separate by storing them in a vector of vectors (Vec<Vec<String>>). You then wouldn't need to re-detect the end of the line later.
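As a rough illustration of that vector-of-vectors idea (the function name `tokenize` is hypothetical, and words are assumed to be separated by whitespace):

```rust
// Split the text into lines, and each line into its words.
// The outer Vec has one element per line; each inner Vec holds that line's words.
fn tokenize(text: &str) -> Vec<Vec<String>> {
    text.lines()
        .map(|line| {
            line.split_whitespace()
                .map(String::from)
                .collect::<Vec<String>>()
        })
        .collect()
}
```

With this shape, `lines[i]` is the word list of line `i + 1`, so the line boundaries never have to be detected a second time.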

For comparison, here's my solution.

One of my first exercises in Rust was to find all the anagrams in the "british-english-insane" word list Debian package.

As a Rust neophyte my code looked like C written in Rust. I should really go back and oxidize it properly.

How did you do that? Finding anagrams is so hard in real life, and you did that in Rust.

I suppose the algorithm uses nested loops, turns each word into an array or Vec, sorts the letters, and then checks whether they are the same. I will do that, but I need some words or data for it.
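That nested-loop idea could be sketched roughly like this. The function names are made up for illustration, and comparing every pair of words is quadratic, so this is only practical for small word lists:

```rust
// Return the letters of `word` in sorted order.
fn sorted_letters(word: &str) -> Vec<char> {
    let mut letters: Vec<char> = word.chars().collect();
    letters.sort_unstable();
    letters
}

// Compare every pair of words with two nested loops and collect the pairs
// whose sorted letters are identical.
fn anagram_pairs(words: &[&str]) -> Vec<(String, String)> {
    let mut pairs = Vec::new();
    for i in 0..words.len() {
        for j in (i + 1)..words.len() {
            if sorted_letters(words[i]) == sorted_letters(words[j]) {
                pairs.push((words[i].to_string(), words[j].to_string()));
            }
        }
    }
    pairs
}
```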

Ha! Like I said, it's written in Rust but it uses Rust as if it were C with a bit different syntax. Being new to Rust and not having read The Book properly yet I even created a structure to represent 'slices' rather than using actual Rust slices. Call it "idiotomatic" rather than "idiomatic".

Sorting the letters of the words like that is the traditional approach. I tried something different.

From each word in the dictionary I create what I call a 'prime hash': a 64-bit integer that represents the word. That integer is created as follows:

  1. There is a map from each letter of the alphabet to a prime number, so that 'a' => 2, 'b' => 3, 'c' => 5 ... 'z' => 101. This is just a static array.

  2. Starting with a prime hash of 1, for each letter of a word multiply the hash by the prime number the letter maps to.

  3. When all the letters are done, the prime hash is a number shared by every arrangement of those letters and by no other set of letters. This is true because of the "Fundamental Theorem of Arithmetic", which states:

Any integer greater than 1 is either a prime number, or can be written as a unique product of prime numbers (ignoring the order).

That sounds very grand but you have likely come across it in highschool maths.

Now, to find anagrams becomes easy. All we have to do is to try and store the words of the input into a HashMap using their "prime hash" as the key and the word as the value. If the HashMap already has that key we know we have found an anagram.
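The three steps and the HashMap lookup above could be sketched as follows. This is not the linked code: the names `PRIMES`, `prime_hash` and `group_anagrams` are invented for this sketch, it assumes lowercase ASCII words only, it collects each group into a Vec instead of just detecting a repeated key, and the `checked_mul` overflow guard is an addition of the sketch:

```rust
use std::collections::HashMap;

// Step 1: the first 26 primes, indexed by letter ('a' => 2 ... 'z' => 101).
const PRIMES: [u64; 26] = [
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89,
    97, 101,
];

// Steps 2 and 3: multiply the primes of all letters together.
// Returns None for anything other than 'a'..='z', or if the product
// no longer fits in a u64.
fn prime_hash(word: &str) -> Option<u64> {
    let mut hash: u64 = 1;
    for b in word.bytes() {
        if !b.is_ascii_lowercase() {
            return None;
        }
        let prime = PRIMES[(b - b'a') as usize];
        hash = hash.checked_mul(prime)?;
    }
    Some(hash)
}

// Group words by prime hash; any group with more than one word is a set of anagrams.
fn group_anagrams<'a>(words: &[&'a str]) -> HashMap<u64, Vec<&'a str>> {
    let mut groups: HashMap<u64, Vec<&'a str>> = HashMap::new();
    for &word in words {
        if let Some(hash) = prime_hash(word) {
            groups.entry(hash).or_default().push(word);
        }
    }
    groups
}
```

Words that cannot be hashed are simply skipped here; a real program would probably want to report them or fall back to sorting their letters.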

To avoid copying data around, the entire word list is held in a byte array, which is scanned byte by byte. Any line ending found indicates the end of a word, so the word can be remembered just by storing its start and end index in that array. There is only a single loop in my algorithm.
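A small sketch of that index-based scanning, assuming Unix line endings (`\n`) and using a hypothetical function name `word_spans`. Idiomatic Rust would more likely hand out `&[u8]` or `&str` slices, but plain index pairs mirror the description:

```rust
// Scan the buffer once and return the (start, end) byte index of every
// non-empty line. `end` is exclusive, so `&data[start..end]` is the word.
fn word_spans(data: &[u8]) -> Vec<(usize, usize)> {
    let mut spans = Vec::new();
    let mut start = 0;
    for (i, &byte) in data.iter().enumerate() {
        if byte == b'\n' {
            if i > start {
                spans.push((start, i));
            }
            start = i + 1; // the next word begins after the line ending
        }
    }
    // The last word may not be followed by a line ending.
    if start < data.len() {
        spans.push((start, data.len()));
    }
    spans
}
```

No word is copied: each one is just a pair of indices into the original buffer.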

Code is here: insane-british-anagram-rust/main.rs at master · ZiCog/insane-british-anagram-rust · GitHub. If you dare to look. It's pretty simple but really bad Rust style.

If you can't get the Debian insane word list there are word lists on the net of course:
https://www.curlewcommunications.uk/wordlist.html

@ZiCog hahahaha please give me a simple idea mate.
I looked at your code and believe me I don't know many things from your code
I am just a beginner
And asking for projects

An anagram finder doesn't have to be as complicated as @ZiCog's. This also works, though probably isn't as fast:

use std::collections::BTreeMap;

fn anagrams(words: impl Iterator<Item = String>) -> Vec<Vec<String>> {
    let mut groups: BTreeMap<Vec<char>, Vec<String>> = BTreeMap::new();
    for word in words {
        let mut letters: Vec<char> = word.chars().collect();
        letters.sort();
        groups.entry(letters).or_default().push(word);
    }
    groups.values().cloned().collect()
}

(Playground)

Hey @2e71828, thanks for sharing, but can you explain the BTreeMap and the whole anagram function? Just add comments written in easy words throughout the program; it will help me learn more.

And thanks for your time.

A BTreeMap is a lot like a Python dictionary: It stores key-value pairs, and makes it easy to find the value for any particular key. In this case, the key is the sorted letters (as a Vec<char>) and the value is a vector of the words made of those letters.

use std::collections::BTreeMap;

fn anagrams(words: impl Iterator<Item = String>) -> Vec<Vec<String>> {
    // Create temporary storage.
    //    Key: sorted letters
    //    Value: words that contain those letters
    let mut groups: BTreeMap<Vec<char>, Vec<String>> = BTreeMap::new();

    for word in words {
        // Get the letters from word as a vector, so that they can be sorted
        let mut letters: Vec<char> = word.chars().collect();
        // Sort the letters vector in-place
        letters.sort();
        // Find the dictionary entry for this list of letters
        groups.entry(letters)
              // If it doesn't exist yet, put an empty word list there
              .or_default()
              // Add the word to the end of the list
              .push(word);
    }

    // Get an iterator that visits a reference to each value
    groups.values()
          // Turn those references into owned objects (via clone)
          .cloned()
          // Put all of the items from the iterator into a new vector
          .collect()
          // No semicolon here, so the vector is returned from the function
}

Check https://exercism.io/

Sorry. I warned you. That is really bad Rust. The idea is simple enough, but your original word sorting idea is probably better. What I have is limited to ASCII rather than Unicode input. It will also break down if the words get too long, because the product of primes eventually overflows the 64-bit integer.

Quite so.

When I wrote that it was a little challenge from some C/C++ guys. So it had to at least match their performance. Which it did. I'll have to pit it against your Rustic solution when I get a moment.

Note that my solution above is optimized for clarity and not speed; I didn't want to complicate matters with things like closures and lifetime annotations.
