Elasticsearch is a powerful and flexible open source, distributed, near-real-time search and analytics engine, widely regarded as a leading search technology. It's built on top of the Apache Lucene library and provides a RESTful HTTP interface for indexing and querying large volumes of data. Right out of the box, it provides efficient, scalable search. A main characteristic of Elasticsearch is that it's distributed at its core, meaning you can scale it horizontally for redundancy or performance.
Elasticsearch can also be used as a data store, but it has some disadvantages:
In Elasticsearch, data is available in "near real time." This means that when you submit a comment to a post and then refresh the page, the comment may not show up yet, because newly indexed documents only become searchable after the next index refresh (once per second by default).
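Elasticsearch has historically shipped without built-in access control or internal security, so older versions need a firewall to protect ES from external access. Basic security features are included for free from versions 6.8 and 7.1 onward, but keeping the cluster off the public network remains good practice.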
There is also limited support for advanced computation on the database side.
Backups are not as much of a priority as in other data store solutions, even though ES is distributed and relatively stable. If Elasticsearch is your primary data store, you might want to reconsider.
What you need to know about Elasticsearch:
Elasticsearch is document oriented. It stores entire objects as documents, and these documents are indexed to make them searchable. In older versions, a document belongs to a type and a type belongs to an index; mapping types were deprecated in 7.0 and removed in 8.0, so in current versions a document belongs directly to an index. Every ES index can be split into multiple pieces known as shards. This is done when the required storage volume exceeds the capacity of a single node or server. Sharding can also increase the performance of your ES cluster by parallelizing operations across shards.
The number of primary shards is fixed the moment an index is created. This effectively caps the amount of data the index can hold, since each node must have enough CPU, RAM, and I/O capacity to host at least one shard. After the index has been created, you can change only the number of replica shards, and this affects just the throughput of your cluster, not its actual data storage capacity.
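To see why the primary shard count cannot simply be changed later, it helps to look at how a document is assigned to a shard. The sketch below is a simplified model, not Elasticsearch's actual code: it assumes routing of the form hash(id) modulo the number of primary shards, and it uses CRC32 from Ruby's standard library as a stand-in for the Murmur3 hash Elasticsearch uses. The shard_for helper and the document IDs are made up for illustration.

```ruby
require 'zlib'

# Simplified model of document routing. Elasticsearch hashes the routing
# value (the document ID by default) with Murmur3; CRC32 stands in here
# only because it ships with Ruby's standard library.
def shard_for(doc_id, primary_shards)
  Zlib.crc32(doc_id.to_s) % primary_shards
end

doc_ids = (1..1000).map { |n| "doc-#{n}" }

with_five = doc_ids.map { |id| shard_for(id, 5) }
with_six  = doc_ids.map { |id| shard_for(id, 6) }

# Count documents that would be looked up on a different shard if the
# primary shard count changed from 5 to 6 without reindexing.
moved = with_five.zip(with_six).count { |a, b| a != b }

puts "shards used with 5 primaries: #{with_five.uniq.sort.inspect}"
puts "documents routed elsewhere with 6 primaries: #{moved} of #{doc_ids.size}"
```

Because the shard number depends on the shard count, changing that count would send lookups for existing documents to the wrong shard, which is why the data has to be reindexed instead.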
Do you really need Elasticsearch?
Often, the problem you're trying to solve can be solved much more easily with some advanced SQL queries and perhaps a few new indexes. There are generally two cases where using Elasticsearch makes sense.
The first is full-text search, which is what Elasticsearch was built for. The second is denormalizing complex data. More often than not, we normalize our data to match our models, but querying across the normalized tables can run into performance issues. With a search technology like Elasticsearch, you store that data denormalized, which makes retrieving it much faster.
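As a rough illustration of what denormalizing means here, the sketch below flattens a post, its author, and its comments into one document of the kind you would index. The Author, Comment, and Post structs and the denormalize helper are hypothetical stand-ins for rows in three SQL tables; they are not part of any gem.

```ruby
require 'json'

# Hypothetical normalized records, standing in for rows in three SQL tables.
Author  = Struct.new(:id, :name)
Comment = Struct.new(:id, :post_id, :body)
Post    = Struct.new(:id, :author_id, :title, :body)

authors  = [Author.new(1, 'Ada')]
comments = [Comment.new(10, 100, 'Great read'), Comment.new(11, 100, 'Thanks!')]
posts    = [Post.new(100, 1, 'Intro to search', 'Full-text search basics')]

# Flatten one post and its related rows into a single searchable document,
# so a search never has to join tables at query time.
def denormalize(post, authors, comments)
  author = authors.find { |a| a.id == post.author_id }
  {
    id: post.id,
    title: post.title,
    body: post.body,
    author_name: author&.name,
    comments: comments.select { |c| c.post_id == post.id }.map(&:body)
  }
end

document = denormalize(posts.first, authors, comments)
puts JSON.pretty_generate(document)
```

The trade-off is that the document has to be rebuilt whenever any of the underlying rows change.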
If you do need Elasticsearch, the instructions below should help you get started:
Follow the installation instructions provided by Elasticsearch.
If this happens to be the first Elasticsearch index you are adding to the project, you'll have to install the elasticsearch-rails and elasticsearch-model gems.
To configure the gems, you'll need an initializer file that sets up the Elasticsearch client the models use.
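A minimal initializer might look like the sketch below. It assumes the Gemfile already lists the elasticsearch-model and elasticsearch-rails gems. The file name, the ELASTICSEARCH_URL environment variable, and the localhost fallback are assumptions for illustration, so adjust them to your environment.

```ruby
# config/initializers/elasticsearch.rb (file name is an assumption)
require 'elasticsearch/model'

# ELASTICSEARCH_URL is an assumed environment variable; the fallback
# points at a local single-node cluster on the default port.
Elasticsearch::Model.client = Elasticsearch::Client.new(
  url: ENV.fetch('ELASTICSEARCH_URL', 'http://localhost:9200'),
  log: false
)
```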
Once the Rails project is ready, set up the index on the model.
The obvious first step when setting up the index is to configure it directly on the model you want to search. However, the recommended approach is to encapsulate the indexing logic in an ActiveSupport::Concern that can be included in your model. Index only the fields and relations you actually need to search on. The more fields you index, the bigger the index will be in Elasticsearch; a bigger index needs more powerful hardware and makes search queries slower. Next, decide how many shards will hold the data. One primary shard is recommended if you don't expect a huge amount of data.
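A concern along these lines could look like the following sketch. The module name Searchable and the title and body fields are assumptions. The settings, mappings, indexes, and as_indexed_json hooks come from the elasticsearch-model gem, but this is an illustration rather than the exact code we used.

```ruby
require 'active_support/concern'
require 'elasticsearch/model'

# Hypothetical concern; include it in any ActiveRecord model you want to search.
module Searchable
  extend ActiveSupport::Concern

  included do
    include Elasticsearch::Model

    # One primary shard, as suggested for small data sets.
    settings index: { number_of_shards: 1 } do
      # dynamic: 'false' keeps fields that are not mapped here from being indexed.
      mappings dynamic: 'false' do
        indexes :title, type: 'text', analyzer: 'english'
        indexes :body,  type: 'text', analyzer: 'english'
      end
    end
  end

  # Serialize only the fields declared above (title and body are assumed columns).
  def as_indexed_json(_options = {})
    as_json(only: %i[title body])
  end
end
```

A model such as a hypothetical Article class would then only need a single include Searchable line.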
Once you have the index defined, it's time to start querying. Remember that you need to index some data before you can query it. The elasticsearch-model gem provides a powerful search interface to the RESTful Elasticsearch API. You can query with any of the search options described in the Elasticsearch documentation.
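Since the gem's search method accepts a plain hash in Elasticsearch's query DSL, one convenient pattern is to build that hash in a small helper. The sketch below assumes a hypothetical build_search_query helper and made-up field names. The multi_match query itself is standard query DSL.

```ruby
require 'json'

# Build a query-DSL body for a multi-field full-text search. The field
# names and the title boost are assumptions for illustration.
def build_search_query(term, fields: ['title^2', 'body'], size: 10)
  {
    query: {
      multi_match: { query: term, fields: fields }
    },
    size: size
  }
end

query = build_search_query('distributed search')
puts JSON.pretty_generate(query)

# In a Rails app with the gem set up, the same hash could be passed along
# (Article is a hypothetical model):
#   Article.search(query).records  # ActiveRecord objects
#   Article.search(query).results  # raw hits from Elasticsearch
```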
Automatic callbacks can be set up just by including Elasticsearch::Model::Callbacks. Custom callbacks are also easy to define, which lets you control exactly when the indexes are updated.
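As a sketch of the custom route, the concern below indexes a record only after the database transaction commits, and only when the record is published. The module name IndexOnCommit and the published? method are assumptions. The index_document, update_document, and delete_document calls are provided by the elasticsearch-model gem, which the including model is assumed to have set up already.

```ruby
require 'active_support/concern'

# Hypothetical concern with hand-written callbacks, used instead of
# Elasticsearch::Model::Callbacks. It assumes the including model already
# includes Elasticsearch::Model and responds to `published?`.
module IndexOnCommit
  extend ActiveSupport::Concern

  included do
    # Index new records only once they are published.
    after_commit on: :create do
      __elasticsearch__.index_document if published?
    end

    # Push changes for records that are in the index.
    after_commit on: :update do
      __elasticsearch__.update_document if published?
    end

    # Remove the document when the record is deleted.
    after_commit on: :destroy do
      __elasticsearch__.delete_document if published?
    end
  end
end
```

Note that this simple version does not handle a record changing from published to unpublished. That case would need its own callback.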
Lastly, take the time to write a faster import. The built-in import is not very efficient.
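The details of a custom import depend on your data, so the following is only one possible starting point and not the import we wrote. It assumes records are already loaded as plain hashes and builds newline-delimited request bodies for Elasticsearch's Bulk API in batches. The bulk_bodies helper, the index name, and the batch size are made up for illustration.

```ruby
require 'json'

# Build newline-delimited JSON bodies for the Elasticsearch Bulk API,
# one body per batch. Index name and batch size are illustrative.
def bulk_bodies(records, index:, batch_size: 500)
  records.each_slice(batch_size).map do |batch|
    lines = batch.flat_map do |record|
      [
        { index: { _index: index, _id: record[:id] } }.to_json, # action line
        record.reject { |key, _| key == :id }.to_json            # document source
      ]
    end
    lines.join("\n") + "\n" # the Bulk API requires a trailing newline
  end
end

records = (1..5).map { |n| { id: n, title: "Post #{n}" } }
bodies  = bulk_bodies(records, index: 'posts', batch_size: 2)

puts "#{bodies.size} bulk requests"
puts bodies.first
```

Each body could then be sent with the client's bulk method. Preparing the batches yourself lets you control how records are loaded and serialized, which is usually where import time goes.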
This is not the only way to use the elasticsearch-rails gem with your project, but this method worked out pretty well for us.
An alternative to the elasticsearch-model gem is the Chewy gem. It builds on the elasticsearch-ruby client, making it more powerful and giving us tighter integration with Rails. It's relatively easy to use, and its ActiveRecord-like API is pleasant to work with. We went with the elasticsearch-model gem because it gave us direct access to "low level" plain Elasticsearch queries, so we could easily customize our queries.