Validate User Input against predefined "list"

Yandros · April 21, 2019, 2:10pm

Ok, time to explain this sentence:

Linear search

You start your post with:

More precisely,

we define a sequence of valid commands (you mention a list, but since mutation of the list does not seem to be required, we can use Python's immutable sequence, i.e., a tuple):
```
valid_services_sequence = (
     "YES", "NO", "TRUE", "FALSE", "ON", "OFF", "0", "1",
)
```

and we can then test the validity of a command with

command in valid_services_sequence

Python's in operator is sugar for a call to the __contains__ method:
```
valid_services_sequence.__contains__(command)
```

in the case of a sequence like list or tuple, this in turn becomes:

any(command == valid_command for valid_command in valid_services_sequence)

and the equivalent Rust code is:

/// `static` is more performant than `let`
static valid_services_sequence: &[&'static str] = &[
    "YES", "NO", "TRUE", "FALSE", "ON", "OFF", "0", "1",
];

followed by

either:

valid_services_sequence.contains(&command);

or, equivalently:

valid_services_sequence.iter().any(|&valid_command| command == valid_command);

which unrolls to:

match &command {
    | "YES" | "NO" | "TRUE" | "FALSE" | "ON" | "OFF" | "0" | "1" => true,
    | _ => false,
}

This is called a linear search: the (average) cost of the search evolves linearly with the amount of valid commands. In other words, if we double the amount of valid commands, it takes (on average) twice as long to know whether a given command is valid or not.

Now, given your example, with a sample size of only 8 valid commands, this is completely fine and actually the fastest way.

Using a `Set`

However, for people having a need similar to yours, but where their amout of "valid answers" is gigantic,
there is a then faster solution: using a Set. This is a collection where its elements are laid out in memory so that this very test (.contains(&element)) is much faster:

when implemented as an Ordered tree, its (axiomatic) cost is logarithmic w.r.t. the amount of valid commands: to double the processing time of a thousand elements the set would need to contain a million elements (vs two thousands in the linear case);
- this is almost equivalent to having a sorted sequence to begin with, and using a binary search to test the presence of an element;
when implemented as a HashSet, the cost does not depend on the number of elements, except when hash collisions are involved.
- In Python to create a (hash) set all you need to do is use braces instead of brackets or parenthesis:
```
valid_services_set = {
    "YES", "NO", "TRUE", "FALSE", "ON", "OFF", "0", "1",
}
```

To fix this issue with collisions, that prevents the cost of a contains(&element) test from truly ignoring the number of elements in the set, there exist perfect hash sets (and maps) based on perfect hash functions. These are only usable when the elements to be put in the set are known beforehand. Which is the case here, hence my suggesting:

Yandros:

For those looking for a more scalable solution ( i.e. , avoiding a linear search), you can use a Set using perfect hash functions , thanks to the ::phf crate:
#![feature(proc_macro_hygiene)] // currently requires nightly, should be fixed soon

fn is_valid_command (s: &'_ str) -> bool
{
    static VALID_COMMANDS: ::phf::Set<&'static str> = ::phf::phf_set! {
        "YES", "NO", "TRUE", "FALSE", "ON", "OFF", "0", "1",
    };

    VALID_COMMANDS.contains(s)
}

by the way, a pull request is on its way to make phf_set! work on stable

Topic		Replies	Views
Match case review, for new user code review	9	466	December 29, 2022
A crate that makes getting command line input easy announcements	6	2216	January 12, 2023
Checking Parts of Strings code review	8	500	September 1, 2022
Check symmetry in list help	3	449	September 26, 2020
Do two things in Result matching code review	4	499	June 19, 2020

Validate User Input against predefined "list"

Linear search

Using a Set

Related Topics

Using a `Set`