Empowering E-commerce With Efficient Product Search

Building scalable product search with Algolia

Posted by Bishal Sarker on 22/12/2024

Search is one of the most important features of an e-commerce website because it is how customers find the products they want. The more accurate the results, the better the customer experience, and that has a major impact on the business as well. The implementation of product search varies from site to site because not every e-commerce website needs complex search functionality. It depends on how many product catalogs there are and how much traffic the site handles. Today I'll take you through how I built a search module for a major e-commerce website that has 5000+ product catalogs and gets thousands of visits per day.


Building The Search API

Let's think of a basic search feature where our backend provides an API. For example: https://api.mydomain.com/search?q={"keyword": "cool jackets", "categories":["women"], "styles":["half-sleeve"]}

We need to design this API so that clients can get the product list by different criteria, for example:

  1. Keyword (cool jackets, half sleeve shirts etc.)
  2. Sorting (new arrival, best sellers, price etc.)
  3. Filtering by
     a. Category (shoes, shirts, pants etc.)
     b. Brand (nike, puma etc.)
     c. Size (30, 40, XL, XXL etc.)
     d. Gender (male, female, unisex etc.)
     e. Color (red, blue, green etc.)
     f. Style (backpack, formal shoes, high heels etc.)
     g. Price range
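One way to picture these criteria is as a single request object that the client serializes into the `q` parameter. The sketch below is only an illustration: `keyword`, `categories` and `styles` come from the example URL above, while `sort`, `priceRange` and the helpers `buildSearchUrl` and `parseSearchQuery` are assumed names, not the actual API contract.

```javascript
// Hypothetical search request. Only keyword, categories and styles appear
// in the example URL; sort and priceRange are assumed field names.
const criteria = {
  keyword: "cool jackets",
  sort: "new_arrivals",
  categories: ["women"],
  styles: ["half-sleeve"],
  priceRange: { min: 20, max: 80 },
};

// Serialize the criteria as JSON and percent-encode the result so that
// quotes, spaces and brackets survive inside a query string.
function buildSearchUrl(baseUrl, searchCriteria) {
  const q = encodeURIComponent(JSON.stringify(searchCriteria));
  return `${baseUrl}/search?q=${q}`;
}

// The backend reverses the process to get the criteria object back.
function parseSearchQuery(url) {
  const q = new URL(url).searchParams.get("q");
  return JSON.parse(q);
}

const url = buildSearchUrl("https://api.mydomain.com", criteria);
```

Encoding the JSON matters here because raw quotes and spaces are not valid in a URL, so the readable form shown earlier is what the parameter looks like after decoding.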

Suppose we receive a search request. Our backend runs a query with these criteria against our main database, prepares the response from the query results and sends it back to the client. For example, here is a MongoDB query on the "Products" collection:

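// $text requires a text index on the Products collection.
db.Products.find({
  $text: { $search: "cool jackets" },
  // Match products that belong to any of the given categories.
  categoryIds: { $in: ["1424255363"] },
  // Match products that have any of the given styles.
  styleIds: { $in: ["9928276641"] }
}).limit(20);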
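In practice the filter has to be assembled from whatever criteria the request carries. The following is a minimal sketch of that step, assuming the criteria already hold ids; `categoryIds` and `styleIds` come from the query above, while `brandIds`, `colorIds`, `price` and the function name `buildProductFilter` are hypothetical.

```javascript
// Map from request criteria names to product fields. categoryIds and
// styleIds come from the query above; the other field names are assumed.
const FIELD_BY_CRITERION = {
  categories: "categoryIds",
  styles: "styleIds",
  brands: "brandIds",
  colors: "colorIds",
};

// Build a MongoDB filter document from whichever criteria are present.
function buildProductFilter(criteria) {
  const filter = {};
  if (criteria.keyword) {
    filter.$text = { $search: criteria.keyword };
  }
  for (const [criterion, field] of Object.entries(FIELD_BY_CRITERION)) {
    const ids = criteria[criterion];
    if (Array.isArray(ids) && ids.length > 0) {
      filter[field] = { $in: ids };
    }
  }
  if (criteria.priceRange) {
    const { min, max } = criteria.priceRange;
    const price = {};
    if (min !== undefined) price.$gte = min;
    if (max !== undefined) price.$lte = max;
    // Only add the price condition when at least one bound was given.
    if (Object.keys(price).length > 0) filter.price = price;
  }
  return filter;
}

const filter = buildProductFilter({
  keyword: "cool jackets",
  categories: ["1424255363"],
  styles: ["9928276641"],
});
// filter can then be passed to db.Products.find(filter).limit(20)
```

Every new criterion adds another branch to a function like this, which is where the growing complexity comes from.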

This is the easiest way to filter products by the query criteria. It is perfectly fine for websites with few products, catalogs and criteria and with low traffic. For larger sites, however, it creates issues like:

  1. A large number of queries hit the main database, which may slow down the whole system.
  2. With a large number of products, search results take a significant amount of time, which affects the customer experience.
  3. Depending on the number of criteria, the database query can get very complex, which raises the likelihood of bugs.
  4. Extra database queries may be needed to resolve product and criteria information.
  5. And the list goes on...

To overcome these issues, one feasible solution is to create a separate search index database where products are organized by search criteria.

So, what's a search index? If you’ve used reference books such as encyclopedias, you’re familiar with the concept of an index. To find information in an encyclopedia, you typically start by flipping to the index at the back, where the topics are listed in alphabetical order with their page numbers. A search index database works in a similar way. It makes searching easier and faster.

You don't have to build it from scratch. There are many tools out there. We picked the Algolia search engine for our solution. It's simple to use and, most importantly, it's a cloud-hosted, managed solution, so we don't have to worry about scalability either. There are a few more reasons why we chose a search engine over querying a traditional database:

  1. Term and field weights
  2. Text normalization and processing
  3. Faceting
  4. Highlighting
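To make the encyclopedia analogy concrete, here is a toy inverted index with basic text normalization. It is only meant to illustrate the idea, not how Algolia works internally; the sample products and all function names are made up for the example.

```javascript
// Basic text normalization: lowercase and split on non-alphanumerics.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Toy inverted index: maps each normalized word to the set of product ids
// whose name contains it, like topics pointing to page numbers.
function buildInvertedIndex(products) {
  const index = new Map();
  for (const product of products) {
    for (const term of tokenize(product.name)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(product.id);
    }
  }
  return index;
}

// Return the ids of products that contain every term of the query.
function search(index, query) {
  const terms = tokenize(query);
  if (terms.length === 0) return [];
  const postings = terms.map((term) => index.get(term) ?? new Set());
  const [first, ...rest] = postings;
  return [...first].filter((id) => rest.every((ids) => ids.has(id)));
}

const sampleProducts = [
  { id: "p1", name: "Cool Leather Jacket" },
  { id: "p2", name: "Cool Running Shoes" },
  { id: "p3", name: "Denim Jacket" },
];
const index = buildInvertedIndex(sampleProducts);
const hits = search(index, "cool jacket"); // ["p1"]
```

A lookup touches only the entries for the query terms instead of scanning every product, which is why an index stays fast as the catalog grows. A real search engine adds ranking, typo tolerance, faceting and highlighting on top of this idea.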


Designing An Optimized Search API

We need to create product index models from the "Products" collection in our main database and save them to the index database in Algolia. Then we need to tell Algolia which product properties will be used for querying and faceting. You can think of a facet as a product criterion or property. After that, all we have to do is run a search query in Algolia with the keyword, filters, facets, pagination etc. Our search system will have two major responsibilities:

  1. Syncing between main database products and product index models (Indexing): Whenever we change any information about a product in our main database, we need to re-index that product in the Algolia index database as well. Otherwise, customers won't get the updated product information. We can maintain this by identifying the product triggers, for example product creation, update, deletion, product category update, product brand update etc. These triggers send an event to our search module to notify it that a particular product has changed. Our search module then fetches the updated product from the main database, prepares the updated product index model and re-indexes it in Algolia.
  2. Querying the index database and resolving product information: When a client sends a search request through the API, our search module fetches the results from the Algolia index database and resolves the necessary product and criteria information from our main database. Afterwards, it returns the product results through the API as the response.
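The indexing responsibility could be sketched roughly as follows. `objectID`, `searchableAttributes` and `attributesForFaceting` are Algolia's own names; everything else is an assumption made for illustration: the product fields, the event shape and the injected functions `loadProduct`, `saveRecord` and `deleteRecord`, which stand in for the real database and Algolia client calls.

```javascript
// Index settings using Algolia's setting names. The attribute lists are
// assumptions based on the criteria discussed earlier.
const indexSettings = {
  searchableAttributes: ["name", "description"],
  attributesForFaceting: ["categoryIds", "styleIds", "brandId"],
};

// Flatten a main-database product into the record stored in the index.
// Algolia identifies records by objectID; the other fields are assumed.
function toIndexModel(product) {
  return {
    objectID: String(product._id),
    name: product.name,
    description: product.description ?? "",
    price: product.price,
    categoryIds: product.categoryIds ?? [],
    styleIds: product.styleIds ?? [],
    brandId: product.brandId ?? null,
  };
}

// React to a product event. Dependencies are injected so the sketch does
// not depend on a particular database driver or Algolia client version:
//   loadProduct(id)   -> resolves to the product or null (hypothetical)
//   saveRecord(model) -> stores the record in the index (hypothetical)
//   deleteRecord(id)  -> removes the record from the index (hypothetical)
async function handleProductEvent(event, deps) {
  const { loadProduct, saveRecord, deleteRecord } = deps;
  const objectID = String(event.productId);
  if (event.type === "product.deleted") {
    await deleteRecord(objectID);
    return "deleted";
  }
  // Always re-read the product so the index reflects the latest state.
  const product = await loadProduct(event.productId);
  if (!product) {
    await deleteRecord(objectID);
    return "deleted";
  }
  await saveRecord(toIndexModel(product));
  return "indexed";
}
```

Re-reading the product from the main database instead of trusting the event payload keeps the index consistent even when events arrive late or out of order.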


Handling Sorting

An Algolia index is always kept in one sort order. As we have multiple sorting criteria, we can use Algolia replicas. There will be multiple replicas of our main search index, for example new_arrivals_search_index, best_sellers_search_index etc. Algolia automatically keeps these replicas in sync, so when the main index is updated the replicas are updated as well, each ordered by its own sorting key. When we get a request that has a sorting criterion, we just need to tell Algolia which index to search.
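Choosing the index and preparing the query could look like the sketch below. The two replica names come from the text, and `query`, `filters`, `page` and `hitsPerPage` are Algolia search parameters; the primary index name `search_index`, the sort keys and the helper names are assumptions.

```javascript
// Replica names are taken from the text; the primary index name and the
// sort keys used in requests are assumed.
const PRIMARY_INDEX = "search_index";
const REPLICA_BY_SORT = new Map([
  ["new_arrivals", "new_arrivals_search_index"],
  ["best_sellers", "best_sellers_search_index"],
]);

// Fall back to the primary index when no known sort is requested.
function resolveIndexName(sort) {
  return REPLICA_BY_SORT.get(sort) ?? PRIMARY_INDEX;
}

// Build an Algolia `filters` string: values of one facet are OR-ed,
// different facets are AND-ed.
function buildFilters(facets) {
  return Object.entries(facets)
    .filter(([, values]) => values.length > 0)
    .map(([attribute, values]) =>
      "(" + values.map((value) => `${attribute}:"${value}"`).join(" OR ") + ")")
    .join(" AND ");
}

// Combine everything into the index name plus search parameters.
// Algolia pages are zero-based.
function buildSearchRequest({ keyword = "", sort, facets = {}, page = 0, pageSize = 20 }) {
  return {
    indexName: resolveIndexName(sort),
    params: {
      query: keyword,
      filters: buildFilters(facets),
      page,
      hitsPerPage: pageSize,
    },
  };
}

const request = buildSearchRequest({
  keyword: "cool jackets",
  sort: "best_sellers",
  facets: { categoryIds: ["1424255363"], styleIds: ["9928276641"] },
});
```

The same parameters are sent regardless of the sort order; only the index name changes, which keeps the sorting logic out of the query itself.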


Engineering Behind Search Optimization

Our search infrastructure is ready, but there is still room for improvement, because we still have issues like:

  1. A large number of database calls per request to resolve index models
  2. Slow response times caused by accessing the database multiple times

What are we resolving, and why? We store the facet references for category, brand, color and style in the index database so that we can filter on them correctly. But the results we get back from Algolia contain only the facet ids or unique names, which are not meaningful to show to customers. We need to resolve their names and other information from the main database before sending the response back to the client. So, what about keeping these already resolved? Our idea is to create a new collection in our main database where all this information is resolved up front, while we are indexing products in Algolia. We then make just a single call to the database to get all the references and map them onto the index results on the fly. As these resolved references don't change frequently, we can also cache them, which avoids even that single database call on most requests and improves performance further.
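A minimal sketch of that idea, assuming an in-memory cache with a time-to-live: `loadAll` is a hypothetical function that reads the whole resolved-references collection in one query, and the reference shape (`id`, `name`) and hit fields are assumed as well.

```javascript
// Small in-memory cache with a time-to-live. loadAll is a hypothetical
// async function returning every resolved reference in a single query.
// The clock is injectable so the expiry logic can be tested.
function createReferenceCache(loadAll, ttlMs = 5 * 60 * 1000, now = Date.now) {
  let byId = null;
  let loadedAt = 0;
  return async function getReferences() {
    if (byId === null || now() - loadedAt > ttlMs) {
      const references = await loadAll();
      byId = new Map(references.map((ref) => [ref.id, ref]));
      loadedAt = now();
    }
    return byId;
  };
}

// Replace the ids on each hit with the resolved reference documents.
// Unknown ids are kept with a null name instead of being dropped.
function resolveHits(hits, referencesById) {
  const lookup = (id) => referencesById.get(id) ?? { id, name: null };
  return hits.map((hit) => ({
    ...hit,
    categories: (hit.categoryIds ?? []).map(lookup),
    styles: (hit.styleIds ?? []).map(lookup),
  }));
}
```

With this in place, a search request costs one Algolia query plus, at most once per TTL window, one database read. In a multi-instance deployment a shared cache would be a natural replacement for the in-memory map.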




Moreover, using an API gateway (e.g. AWS API Gateway) will help us balance the traffic load. In the end, we have a Search API that is scalable, managed and fast, and that doesn't make our customers wait too long for the results they want. Let me know your thoughts on this.