When building a scalable project such as a website or an app, at some point you will need a search engine to find and retrieve information from a large amount of data. There are two ways to do this: integrate a pre-built search engine into your project, or build your own from scratch. As you can guess, the latter is quite difficult, so it is usually better to use an existing search engine that specializes in indexing large amounts of data.
Examples include Algolia, Elasticsearch, and Typesense.
In this article, we’ll be exploring Typesense with Python.
What is Typesense
Typesense is a “modern, privacy-friendly, open-source” search engine. It is typo-tolerant, provides a ‘search-as-you-type’ experience with fast search results, and offers a ‘batteries-included’ developer experience.
It requires minimal setup and returns relevant search results out of the box, hence the term ‘batteries-included’.
Typesense is an open-source, easy-to-use alternative to tools such as Algolia and Elasticsearch; in some ways, it can be thought of as a combination of the two. For an in-depth comparison of Typesense with other search engines, check out ‘Typesense vs Algolia vs Elasticsearch vs Meilisearch Comparison’.
Typesense provides client libraries for JavaScript, Python, PHP, Ruby, C#, Dart, Java, Go, Rust, and more; some are official and some are maintained by the community. Throughout this article, we’ll use the official Python library to explore various features of Typesense.
Installing Typesense
There are two main ways to install and run Typesense.
1. Typesense Cloud
The first and easiest way is to use the Typesense Cloud.
→ Sign in with your GitHub account.
→ Choose your desired configuration and click ‘launch’. After a few minutes, your cluster will be ready.
→ Then click on ‘Generate API Key’ and use the provided hostnames and API keys in your code.
2. Local Machine/Self-hosting
Typesense can also run on your local machine or be self-hosted on your own server.
You can download DEB and RPM packages, as well as pre-built binaries for Linux (x86_64) and macOS, here.
Official Docker images for Typesense are available on Docker Hub.
Download & Install
I will be downloading the pre-built binary for macOS, so the commands below only work on a Mac. For other systems, refer to the official Typesense documentation.
Open a terminal and run the following commands:
```shell
curl -O https://dl.typesense.org/releases/0.22.2/typesense-server-0.22.2-darwin-amd64.tar.gz
tar -xzf typesense-server-0.22.2-darwin-amd64.tar.gz
```
Starting Typesense
After downloading the pre-built binary for macOS, you can start Typesense like this:
```shell
export TYPESENSE_API_KEY=xyz
mkdir /tmp/typesense-data
./typesense-server --data-dir=/tmp/typesense-data --api-key=$TYPESENSE_API_KEY --enable-cors
```
→ Here, our API key is ‘xyz’
Run this command to verify that the server is up and ready to accept requests:
```shell
curl http://localhost:8108/health
```
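If you would rather check the server from Python, here is a minimal sketch of the same health check using only the standard library. It assumes the server listens on localhost:8108 and that /health answers with a JSON body such as {"ok": true}; parse_health_response and is_typesense_healthy are hypothetical helper names, not part of the typesense package.

```python
import json
import urllib.request


def parse_health_response(body: str) -> bool:
    """Return True only if a /health response body reports {"ok": true}."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    # Anything other than a dict with "ok": true is treated as unhealthy
    return isinstance(payload, dict) and payload.get("ok") is True


def is_typesense_healthy(url: str = "http://localhost:8108/health",
                         timeout: float = 2.0) -> bool:
    """Query the health endpoint; connection errors count as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return parse_health_response(body)
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses
        return False
```

Calling is_typesense_healthy() should return True once the server has started. Network errors and timeouts simply count as unhealthy, which keeps the check safe to call from a startup script.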
Since we’ll be working with Python, we also need to install the Python client library:
```shell
pip install typesense
```
Exploring Typesense
Now that we’ve installed and set up Typesense, we can start exploring its features.
For our collection, we’ll use the books dataset provided in the official documentation. You can download it by clicking here.
Initializing client
First, we need to initialize the client by pointing it to a Typesense node.
Open your editor and enter the following code. If you're using Typesense Cloud, click the ‘Generate API Key’ button on the cluster page; this gives you a set of hostnames and API keys to use instead.
```python
import typesense

client = typesense.Client({
    'nodes': [{
        'host': 'localhost',  # For Typesense Cloud use xxx.a1.typesense.net
        'port': '8108',       # For Typesense Cloud use 443
        'protocol': 'http'    # For Typesense Cloud use https
    }],
    'api_key': '<API_KEY>',
    'connection_timeout_seconds': 2
})
```
→ Here, replace <API_KEY> with the API key we used to start the server (i.e., xyz).
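Hard-coding the key works for a quick experiment, but since the server was started with the key in the TYPESENSE_API_KEY environment variable, you could read it from there instead. The sketch below is one way to build the same configuration dictionary; build_client_config is a hypothetical helper, and it assumes the variable is set in the shell that runs your script.

```python
import os


def build_client_config(host: str = "localhost",
                        port: str = "8108",
                        protocol: str = "http",
                        timeout_seconds: int = 2) -> dict:
    """Build the dict passed to typesense.Client, reading the key from the environment."""
    api_key = os.environ.get("TYPESENSE_API_KEY")
    if not api_key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError("Set the TYPESENSE_API_KEY environment variable first")
    return {
        "nodes": [{"host": host, "port": port, "protocol": protocol}],
        "api_key": api_key,
        "connection_timeout_seconds": timeout_seconds,
    }
```

You would then create the client with typesense.Client(build_client_config()), or, for Typesense Cloud, pass your cluster's hostname, '443', and 'https'.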
Creating and adding to a collection
A collection in Typesense is similar to a table in a relational database.
We can create a collection like this:
```python
books_schema = {
    'name': 'books',
    'fields': [
        {'name': 'title', 'type': 'string'},
        {'name': 'authors', 'type': 'string[]', 'facet': True},
        {'name': 'publication_year', 'type': 'int32', 'facet': True},
        {'name': 'ratings_count', 'type': 'int32'},
        {'name': 'average_rating', 'type': 'float'}
    ],
    'default_sorting_field': 'ratings_count'
}

client.collections.create(books_schema)
```
Here, books_schema describes the collection: we name it ‘books’ and list the fields it will include. These fields are indexed whenever a document is added to the collection.
Each field has a ‘name’ and a ‘type’, and can optionally be marked as a ‘facet’. Marking a field as a facet lets Typesense group the search results by that field's values and count how many results fall into each group.
We also set default_sorting_field to ratings_count. Typesense uses this field to rank the search results when a search does not provide a sort_by clause.
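Because mistakes in the schema only surface when the create call fails, it can help to sanity-check the dictionary first. The sketch below is a hypothetical check_schema helper, not a Typesense API. It assumes that default_sorting_field must name one of the declared numeric fields (int32, int64, or float), which matches the Typesense documentation for the version used here; check the docs for your version, since the rule may differ.

```python
# Field types this sketch accepts for default_sorting_field (an assumption, see above)
NUMERIC_TYPES = {"int32", "int64", "float"}


def check_schema(schema: dict) -> list:
    """Return a list of problems found in a collection schema (empty if none)."""
    problems = []
    fields = {}
    for field in schema.get("fields", []):
        if "name" not in field or "type" not in field:
            problems.append(f"field missing name or type: {field}")
            continue
        fields[field["name"]] = field

    sort_field = schema.get("default_sorting_field")
    if sort_field is not None:
        if sort_field not in fields:
            problems.append(f"default_sorting_field '{sort_field}' is not a declared field")
        elif fields[sort_field]["type"] not in NUMERIC_TYPES:
            problems.append(f"default_sorting_field '{sort_field}' must be numeric")
    return problems
```

For example, check_schema(books_schema) returns an empty list, while changing default_sorting_field to 'title' would report that the field must be numeric.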
Now it’s time to add books to the collection we just created. We can do this by opening the ‘books.jsonl’ file we downloaded earlier and importing it into our collection:
```python
# Read the JSONL file (one book per line) and import it in a single request
with open('books.jsonl') as books_dataset:
    client.collections['books'].documents.import_(books_dataset.read().encode('utf-8'))
```
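An import reports a status for each document rather than stopping at the first bad one, so it is worth checking the response. Assuming the client returns the raw JSONL response text when given a string, with one line per document such as {"success": true} or {"success": false, "error": "..."}, a hypothetical summarize_import helper could count the failures like this:

```python
import json


def summarize_import(response_text: str) -> dict:
    """Count successes and collect error messages from a JSONL import response."""
    summary = {"succeeded": 0, "failed": 0, "errors": []}
    for line in response_text.splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        status = json.loads(line)
        if status.get("success"):
            summary["succeeded"] += 1
        else:
            summary["failed"] += 1
            summary["errors"].append(status.get("error", "unknown error"))
    return summary
```

To use it, assign the return value of import_ to a variable and pass it to summarize_import; if your client version returns bytes rather than a string, decode it first.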
Our collection of books is ready. Now we can try searching for books and testing different features of Typesense.
Searching the collection
Now let’s write a simple Python program to run searches on our collection.
```python
# Ask the user for a query (a book title)
query = input("Search for a book: ")
print()

search_parameters = {
    'q': query,
    'query_by': 'title',
    'sort_by': 'publication_year:asc'
}
```
In the above code, we define the parameters for the search operation: q is the query (a book title) to search for, query_by names the field to search in (title, authors, etc.), and sort_by sorts the search results by publication_year in ascending order.
We can change query_by to ‘authors’ if we want to search for a book by its authors’ names, and similarly for other string fields.
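If you plan to try several of these variations, a small helper can assemble the parameter dictionary and leave out the clauses you don't use. This is a hypothetical convenience function, not part of the typesense client; it only uses the parameter names that appear in this article (q, query_by, sort_by, filter_by, facet_by).

```python
from typing import Optional


def build_search_parameters(query: str,
                            query_by: str = "title",
                            sort_by: Optional[str] = None,
                            filter_by: Optional[str] = None,
                            facet_by: Optional[str] = None) -> dict:
    """Assemble a Typesense search parameter dict, leaving out unused clauses."""
    params = {"q": query, "query_by": query_by}
    # Only include the optional clauses that were actually given
    optional = {"sort_by": sort_by, "filter_by": filter_by, "facet_by": facet_by}
    params.update({key: value for key, value in optional.items() if value})
    return params
```

For instance, build_search_parameters(query, query_by='authors', sort_by='publication_year:asc') produces a dictionary you can pass straight to documents.search().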
Store the results in a variable:
```python
result = client.collections['books'].documents.search(search_parameters)
```
...and print them:
```python
print(result)
```
Running the program prints the entire search response as a nested Python dictionary.
But this looks messy, and the relevant information is quite difficult to read.
To solve this issue, we can print only the documents in the results:
```python
# Print each matching document, one field per line
for hit in result['hits']:
    for field, value in hit['document'].items():
        print(f"{field}: {value}")
    print("\n")
```
The code above is plain Python: it iterates over every hit in the response and prints the fields of each matching document in a much more readable format.
Cool! We got all the matching books from the collection.
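If you want to reuse this formatting, or test it without a running server, you could move it into a function that returns a string. format_hits below is a hypothetical helper that assumes the response shape used above (a 'hits' list whose items each carry a 'document' dictionary), and it also copes with a search that finds nothing.

```python
def format_hits(result: dict) -> str:
    """Render each hit's document as 'field: value' lines, separated by blank lines."""
    blocks = []
    for hit in result.get("hits", []):
        lines = [f"{field}: {value}" for field, value in hit["document"].items()]
        blocks.append("\n".join(lines))
    # An empty hits list gets a friendly message instead of an empty string
    return "\n\n".join(blocks) if blocks else "No results found."
```

Calling print(format_hits(result)) gives the same readable listing, and the function can be checked against a hand-written dictionary.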
Now let’s filter these results to only the books published before 1999, using the filter_by clause.
Let’s also intentionally make a typo in the search query to see Typesense's typo tolerance in action.
```python
search_parameters = {
    'q': query,
    'query_by': 'title',
    'filter_by': 'publication_year:<1999',
    'sort_by': 'publication_year:asc'
}
```
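To make the filter's meaning concrete, here is a local emulation on an in-memory list of documents: keep only those whose publication_year is strictly below 1999. apply_numeric_filter is a hypothetical helper that understands only a single field:<op>value comparison; Typesense's own filter_by syntax supports far more, and the real filtering happens on the server.

```python
import operator
import re

# Comparison operators this sketch understands
OPERATORS = {
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
    "=": operator.eq,
}

# field:<op><number>, trying two-character operators before one-character ones
FILTER_PATTERN = re.compile(r"^(\w+):(<=|>=|<|>|=)(-?\d+(?:\.\d+)?)$")


def apply_numeric_filter(documents: list, expression: str) -> list:
    """Keep the documents whose numeric field satisfies the comparison."""
    match = FILTER_PATTERN.match(expression.replace(" ", ""))
    if match is None:
        raise ValueError(f"unsupported filter expression: {expression}")
    field, op, raw_value = match.groups()
    compare = OPERATORS[op]
    value = float(raw_value)
    return [doc for doc in documents if compare(doc[field], value)]
```

Sorting the filtered list with sorted(docs, key=lambda d: d['publication_year']) then mirrors the publication_year:asc clause.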
In the output, you can see how well Typesense handles the typo: it still returns the search results for ‘Harry Potter’.
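Typo tolerance boils down to accepting words that are a small number of edits away from what the user typed. The toy sketch below illustrates that idea with the Levenshtein edit distance, matching each query word against the words of a title. It is not Typesense's actual algorithm, which works on its index and is controlled by search parameters such as num_typos (2 by default, according to the Typesense documentation); edit_distance and tokens_match are hypothetical helpers.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def tokens_match(query: str, title: str, max_typos: int = 2) -> bool:
    """True if every query word is within max_typos edits of some title word."""
    title_tokens = title.lower().split()
    return all(
        any(edit_distance(q, t) <= max_typos for t in title_tokens)
        for q in query.lower().split()
    )
```

Here tokens_match("Hary Poter", "Harry Potter") is True because each misspelled word is one edit away from the correct one.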
Let’s try one more feature: faceting.
```python
search_parameters = {
    'q': query,
    'query_by': 'title',
    'facet_by': 'authors',
    'filter_by': 'publication_year:<1999',
    'sort_by': 'publication_year:asc'
}
```
Here, the facet_by clause returns, for each author, the number of books in the search results written by that author. Since the raw facet results are also hard to read, we print them alongside each book like this:
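```python
# Run the search again with the faceting parameters
result = client.collections['books'].documents.search(search_parameters)

# Facet counts for the 'authors' field: a list of {'value', 'count', ...} entries
author_counts = result['facet_counts'][0]['counts']

for hit in result['hits']:
    document = hit['document']
    for field, value in document.items():
        print(f"{field}: {value}")
    # Show the facet count for each author of this book
    for k in author_counts:
        if k['value'] in document['authors']:
            print(f"Books by {k['value']} in these results: {k['count']}")
    print("\n")
```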
In the output, each book now also shows how many of the matching books were written by each of its authors.
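Conceptually, facet_by: 'authors' tallies, over the documents that match the query and filter, how many of them list each author. The sketch below reproduces that tally locally with collections.Counter; count_facet_values is a hypothetical helper working on an in-memory list, whereas Typesense computes facet counts on the server (and may return only the most frequent values).

```python
from collections import Counter


def count_facet_values(documents: list, field: str) -> list:
    """Count how many documents carry each value of a (possibly array) field."""
    counts = Counter()
    for doc in documents:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]  # treat scalar fields as one-element arrays
        counts.update(set(values))  # count each value at most once per document
    # Shape each entry like the 'value'/'count' pairs in result['facet_counts']
    return [{"value": value, "count": count} for value, count in counts.most_common()]
```

Because authors is a string[] field, a book with two authors adds one to each author's count, which is why the helper flattens array values.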
Reference implementations
This article covers only part of what Typesense is capable of. To see the Typesense search engine in action, you can check out live implementations of it here.
Conclusion
This concludes our introduction to Typesense. This tutorial aims to familiarize you with the basic features of Typesense to get you started. For more advanced features and detailed documentation, refer to the official Typesense resources.
Give it a try, and share your thoughts about Typesense in the comments.
For more, follow me on Twitter.
I hope you found this article helpful. See you in the next one!