Validate User Input against predefined "list"

Ok, time to explain this sentence:

Linear search

You start your post with:

More precisely,

  1. we define a sequence of valid commands (you mention a list, but since mutation of the list does not seem to be required, we can use Python's immutable sequence, i.e., a tuple):

    valid_services_sequence = (
         "YES", "NO", "TRUE", "FALSE", "ON", "OFF", "0", "1",
    )
    
  2. and we can then test the validity of a command with

    command in valid_services_sequence
    
    • Python's in operator is sugar for a call to the __contains__ method:

      valid_services_sequence.__contains__(command)
      
    • in the case of a sequence like list or tuple, this in turn becomes:

      any(command == valid_command for valid_command in valid_services_sequence)
      
    • and the equivalent Rust code is:

      /// `static` is more performant than `let`
      static valid_services_sequence: &[&'static str] = &[
          "YES", "NO", "TRUE", "FALSE", "ON", "OFF", "0", "1",
      ];
      

      followed by

      • either:

        valid_services_sequence.contains(&command);
        
      • or, equivalently:

        valid_services_sequence.iter().any(|&valid_command| command == valid_command);
        
      • which unrolls to:

        match &command {
            | "YES" | "NO" | "TRUE" | "FALSE" | "ON" | "OFF" | "0" | "1" => true,
            | _ => false,
        }
        

This is called a linear search: the (average) cost of the search evolves linearly with the amount of valid commands. In other words, if we double the amount of valid commands, it takes (on average) twice as long to know whether a given command is valid or not.

Now, given your example, with a sample size of only 8 valid commands, this is completely fine and actually the fastest way.

Using a Set

However, for people having a need similar to yours, but where their amout of "valid answers" is gigantic,
there is a then faster solution: using a Set. This is a collection where its elements are laid out in memory so that this very test (.contains(&element)) is much faster:

  • when implemented as an Ordered tree, its (axiomatic) cost is logarithmic w.r.t. the amount of valid commands: to double the processing time of a thousand elements the set would need to contain a million elements (vs two thousands in the linear case);

    • this is almost equivalent to having a sorted sequence to begin with, and using a binary search to test the presence of an element;
  • when implemented as a HashSet, the cost does not depend on the number of elements, except when hash collisions are involved.

    • In Python to create a (hash) set all you need to do is use braces instead of brackets or parenthesis:
      valid_services_set = {
          "YES", "NO", "TRUE", "FALSE", "ON", "OFF", "0", "1",
      }
      

To fix this issue with collisions, that prevents the cost of a contains(&element) test from truly ignoring the number of elements in the set, there exist perfect hash sets (and maps) based on perfect hash functions. These are only usable when the elements to be put in the set are known beforehand. Which is the case here, hence my suggesting:

6 Likes