Efficiency: Multiple loops vs. multiple SQL-queries?

Hello

I have a website with multiple restaurants. Each restaurant can have multiple categories like "pizza", "kebab", "vegan",...

pub struct Restaurant {
    name: String,
    //...
    categories: std::vec::Vec<Category>,
}

pub enum Category {
    Pizza,
    Kebab,
    Vegan,
}

Now I have a webpage showing all the existing categories and the restaurants in that category.

E.g:

<h2>Pizza</h2>
<div class="restaurant">
    <span>Restaurant A</span>
</div>
<div class="restaurant">
    <span>Restaurant B</span>
</div>

<h2>Vegan</h2>
<div class="restaurant">
    <span>Restaurant A</span>
</div>
<div class="restaurant">
    <span>Restaurant C</span>
</div>

Of course the HTML is generated.

The problem I have is that it is a expensive calculation to display all the restaurants in the correct category. Because you first have a loop for displaying all the categories, and in that loop you need to loop over all the restaurants to check if restaurant.categories.contains(current_category_of_loop).

Currently I do the following:

  1. Get all restaurants from database in a vec.
  2. Loop over each category.
    a. In this loop, loop over all restaurants.
    b. if restaurant.category.contains(category_of_first_loop) -> display

I also tried the following:

  1. Create a hashmap HashMap<Category, std::vec::Vec<Restaurant>
  2. Loop over all categories.
    a. SQL Query: SELECT * FROM restaurants WHERE category=category_of_loop.
  3. Now I have a HashMap containing all the restaurants sorted by their category as key.
  4. In the Askama/HTML-template:
for (category, restaurants) in my_hashmap {
    <h2>{{ category }}</h2>

    for restaurant in restaurants {
        <div class="restaurant">
            <span>{{ restaurant.name }}</span>
        </div>
    }
}

I do not know what is the fastest:

  • Requesting all the restaurants once from the database and then looping over all those restaurants for each new category.
  • Or making a SQL-query for each category to get the restaurants in that category, so I only have to loop over the categories but not over all the restaurants.

Questions:

  1. What is the most efficient way?
  2. Is there a better solution?
  3. Would the algorithm change if each restaurant could have only one category?
  4. Does the best solution depend on the amount of restaurants? Now I have 20, later maybe 100?

Thanks!

Why don't you benchmark it, then?


If I had to guess: likely making multiple SQL queries would be slower. Especially if you are using a client-server database (which everyone and their grandmother seems to do nowadays, whether or not it's warranted), a single query can take multiple milliseconds.

Almost certainly. Databases are built for filtering and aggregating data efficiently. If you are post-processing the results of an SQL query using hashmaps, then you are likely doing something wrong.

Currently, your pseudo-SQL does suggest that there is only one category (there's a single restaurants.category column which you equate with a category). In this case, you could just ORDER BY category and display the results one-by-one, and they would naturally be grouped together by category. You would then traverse the result set with a single loop and emit a category header whenever the category changes.

If you do have multiple categories, then you could still proceed as above, but you'd have to make sure that every restaurant is listed multiple times, once for each category. If you have designed your database correctly, then this will involve a JOIN (likely a left or right outer join) of the restaurants table with a categories table.

4 Likes

As @H2CO3 said, the easiest way to find out is to benchmark it. On the other hand, you might be interested in learning about Big O notation. It will help you to better understand the performance implications of your algorithms and the data structures that you use.

If you have designed your database correctly, then this will involve a JOIN (likely a left or right outer join) of the restaurants table with a categories table.

Actually, even if there's only one category per restaurant you would have a join to a categories table in a correctly designed database.

The category may be nothing more than a simple enum, in which case it's atomic and it can just be a single column. That's not possible within a single table (without denormalizing Restaurant) when multiple categories may correspond to a given restaurant, even if the category is an atomic value.

We're putting OP on the right track though. This is definitely a job for SQL:

With a "normal" (as I understand it, it's a complicated topic!) database where a mapping table is used for the many-many relationship:

SELECT restaurant.*, category.name as category FROM restaurant JOIN restaurant_category_map map ON restaurant.id = map.restaurant_id JOIN category ON category.id = map.category_id

Once you've got that as a Vec of some RestaurantCategoriesRow struct,
Then it should be easy to get the HashMap you want:

let results = db.execute(/* above query*/)
    .map(|row| (row.category, row.restaurant...)
    .collect();

That's the gist anyway!

If you don't have a mapping table it gets trickier.

If it's fully your website and early days it should be worth migrating the database schema to fix that. If so, I'm curious what the DB schema is like currently: a comma separated list of categories in the category column? A separate row for each category, restaurant pair!?

I'd do it like this with a reasonably normalized database and a single query that gets all data, where you can also easily detect where to insert category headers.

Wow, sorry to derail but what magic is this?

Just listing tables like they're fields... Is it doing some sort of implicit join? Why am I getting the feeling I've been writing SQL like a dinosaur? :smiley:

Yes, you can do implicit cross joins, depending on your rdbms you're using. Personally though, I much prefer the explicit join clause. Makes it a lot harder to accidently forget to specify foreign key relationships.

2 Likes

It's a full cross join (cartesian products). I find the relationship between NULL's and the different kind of joins confusing.

Every other join is just a subset of the cartesian product anyway.