Lightning-fast search with ElasticSearch

App Development LA

It’s no secret that tools are chosen to fit the task at hand, and our project was no exception. Since the requirements were clearly defined before development began, the backend architecture was chosen to meet them.

The technology stack was fairly standard: Ruby with the Rails framework, a PostgreSQL database and Redis as a NoSQL cache on the backend, and AngularJS on the frontend. Since the application also had to provide search, we decided to use ElasticSearch for it. The solution was deployed on Amazon using the corresponding services.

As you can see, the stack consists of time-proven tools with nothing exotic. However, some of them ended up being used in a less standard way. To explain why, let us first describe the problem.

We needed to search data from a PostgreSQL database and rank the results by data that changes rapidly, which is why that data is stored in Redis. ElasticSearch supports flexible ranking by adjusting each document’s relevance score (‘score’) in a custom script (the ‘script_score’ function); more information can be found in the official documentation.
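As a rough illustration of what such a scoring function does, here is a simplified Java model (Java being the language used for the plugin later in this article) of ranking hits by data that lives outside the index. It is not ElasticSearch’s API: `ExternalScoreRanking`, `Hit` and `rank` are hypothetical names, and a plain `Map` stands in for Redis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Simplified model, not the ElasticSearch API: each matched document gets a
 * score looked up in an external store, and hits are sorted best-first.
 */
final class ExternalScoreRanking {

    /** A search hit: the document id and the score assigned to it. */
    record Hit(long id, float score) {}

    /**
     * Scores every matched id with its value from externalStore (a Map standing
     * in for Redis); ids missing from the store score 0. Ties keep input order.
     */
    static List<Hit> rank(List<Long> matchedIds, Map<Long, Float> externalStore) {
        List<Hit> hits = new ArrayList<>();
        for (long id : matchedIds) {
            hits.add(new Hit(id, externalStore.getOrDefault(id, 0f)));
        }
        hits.sort((a, b) -> Float.compare(b.score(), a.score())); // descending score
        return hits;
    }
}
```

Because the store is read at query time, a value that changed a moment ago already affects the order, which is why the scoring code needs fast access to it.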

What complicated the task was that the data for the ‘score’ calculation had to be pulled from Redis. Redis runs on one Amazon server, Postgres on another, and the application on a third (with the corresponding Amazon services linked together). Therefore, for the ElasticSearch script (‘script_score’) to read Redis data quickly, we needed a persistent connection to the Redis server.
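The cost of not keeping a connection open can be shown with a toy model. In the Java sketch below, `ConnectionReuse` and `FakeConnection` are hypothetical and nothing goes over the network; the counter simply stands in for the connection setup (TCP handshake, possibly authentication) that would otherwise be repeated for every query.

```java
/**
 * Toy model with no real network I/O: compares opening a connection per
 * query with reusing one long-lived connection. The static counter is not
 * thread-safe and is meant for a single-threaded demonstration only.
 */
final class ConnectionReuse {

    /** Stand-in for a client connection to a remote Redis server. */
    static final class FakeConnection {
        static int opened = 0;

        FakeConnection() {
            opened++; // where a real client would do the TCP handshake (and auth)
        }

        String get(String key) {
            return "value-of-" + key; // placeholder for a real read
        }
    }

    /** A new connection for every query: N queries pay for N connection setups. */
    static int perQuery(int queries) {
        FakeConnection.opened = 0;
        for (int i = 0; i < queries; i++) {
            new FakeConnection().get("doc:" + i);
        }
        return FakeConnection.opened;
    }

    /** One connection opened up front and reused: N queries pay for one setup. */
    static int persistent(int queries) {
        FakeConnection.opened = 0;
        FakeConnection shared = new FakeConnection();
        for (int i = 0; i < queries; i++) {
            shared.get("doc:" + i);
        }
        return FakeConnection.opened;
    }
}
```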

To keep a persistent connection, ‘script_score’ has to manage its connections itself instead of opening a new one for each search query (which is slow). According to the official documentation, the scoring script can be placed in a separate file written in one of the supported languages (for example, Groovy, as below):


"functions": [
  {
    "script_score": {
      "script": {
        "lang": "groovy",
        "file": "calculate-score",
        "params": {
          "my_modifier": 8
        }
      }
    }
  }
]

Moving the script body into a separate file makes the code cleaner and better structured, but it does not allow persistent connections to be maintained. So the task was not only to run the script but also to keep it resident in memory, running for as long as the ElasticSearch engine does.

According to advice on the official forum, this is achievable but not trivial: you need to write an ElasticSearch plugin that runs the required script. The advantage of a plugin is that it is initialized when ElasticSearch starts and keeps running alongside the search engine.

Below is an example of plugin initialization in Java: the plugin extends ElasticSearch’s Plugin class and registers the required script (‘discovery_script’).


import org.elasticsearch.plugins.Plugin;
import org.elasticsearch.script.ScriptModule;
...
public class SearchPlugin extends Plugin {

    @Override
    public String name() {
        return "discovery-script";
    }

    @Override
    public String description() {
        return "Native script examples";
    }

    // Invoked by ElasticSearch while loading plugins at node startup
    public void onModule(ScriptModule module) {
        // Register each native script defined in this plugin under the name used in queries
        module.registerScript("discovery_script", DiscoveryScriptFactory.class);
    }
}
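Conceptually, registration gives the engine a name-to-factory mapping that is filled once, when the plugin loads, and consulted whenever a query asks for a native script by name. The plain-Java model below illustrates only that idea; `NativeScriptRegistry`, `Script`, `register` and `newScript` are hypothetical and are not ElasticSearch’s `ScriptModule` API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical, simplified model of native script registration: a
 * name -> factory map filled at startup and looked up per query.
 */
final class NativeScriptRegistry {

    /** A script instance: turns a document id into a score. */
    interface Script {
        float score(long docId);
    }

    private final Map<String, Function<Map<String, Object>, Script>> factories = new HashMap<>();

    /** Called once, while the plugin is being loaded. */
    void register(String name, Function<Map<String, Object>, Script> factory) {
        if (factories.putIfAbsent(name, factory) != null) {
            throw new IllegalArgumentException("script already registered: " + name);
        }
    }

    /** Called per query: builds a script from the query's params via the registered factory. */
    Script newScript(String name, Map<String, Object> params) {
        Function<Map<String, Object>, Script> factory = factories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("unknown native script: " + name);
        }
        return factory.apply(params);
    }
}
```

In the real plugin, `DiscoveryScriptFactory` plays the factory’s role and receives the `params` given in the query (such as `period: 6`).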

The next example shows a connection pool factory, which manages Redis connections by keeping them in a pool.


 
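import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

public class JedisFactory {

    // Hypothetical: the original does not show where the Redis host comes from
    public static final String REDIS_HOST = System.getenv("REDIS_HOST");
    public static final int MAX_CONNECTIONS = 1000;
    public static final int MIN_IDLE_CONNECTIONS = 100;

    private static JedisFactory instance = null;
    private final JedisPool jedisPool;

    // Private: the pool is created only once, through getInstance()
    private JedisFactory() {
        JedisPoolConfig poolConfig = new JedisPoolConfig();
        … // pool config settings
        jedisPool = new JedisPool(poolConfig, REDIS_HOST);
    }

    public JedisPool getJedisPool() {
        return jedisPool;
    }

    // synchronized: search threads may call this concurrently, and only one pool must be created
    public static synchronized JedisFactory getInstance() {
        if (instance == null) {
            instance = new JedisFactory();
        }
        return instance;
    }
}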
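Because ElasticSearch serves searches on several threads at once, two threads could both see `instance == null` in a plain lazy `getInstance()` and build two pools. Declaring `getInstance()` `synchronized` prevents that; another common Java option is the initialization-on-demand holder idiom, sketched below with a hypothetical `PooledResource` class in place of `JedisFactory`.

```java
/**
 * Lazily initialized, thread-safe singleton (initialization-on-demand holder).
 * The JVM initializes Holder exactly once, on the first getInstance() call,
 * so no explicit locking is needed.
 */
final class PooledResource {

    static int constructed = 0; // demonstration only: counts constructor calls

    private PooledResource() {
        constructed++; // the expensive pool setup would happen here
    }

    private static final class Holder {
        static final PooledResource INSTANCE = new PooledResource();
    }

    static PooledResource getInstance() {
        return Holder.INSTANCE;
    }
}
```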
 

Finally, the script that calculates the ‘score’ extends ElasticSearch’s AbstractFloatSearchScript class.


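import org.elasticsearch.index.fielddata.ScriptDocValues;
import org.elasticsearch.script.AbstractFloatSearchScript;
import redis.clients.jedis.exceptions.JedisException;
...
public class DiscoveryScript extends AbstractFloatSearchScript {
    …

    @Override
    public float runAsFloat() {
        … // Security checks (elided); the `jedis` connection used below is obtained in this part
        try {
            // Disable the socket read timeout for this connection
            jedis.getClient().setTimeoutInfinite();
            // Values of the "id" field for the document currently being scored
            ScriptDocValues docValue = (ScriptDocValues) doc().get("id");
            … // getting data from Redis
        } catch (JedisException e) {
            logger.error("Redis read/connection exception", e);
        } catch (Exception e) {
            logger.error("Unknown read/parse/calculate exception", e);
        } finally {
            if (jedis != null) {
                jedis.close(); // for a pooled Jedis instance, returns it to the pool
            }
        }

        return calculateScore();
    }
    …
}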
 

As you can see, the key role is played by the runAsFloat method, which calculates the ‘score’. Now we can install the plugin and call the script directly from the ElasticSearch function_score query:


scope = posts.min_score(0.001).query(
  function_score: {
    filter: {
      not: {
        terms: { id: Post.popular.ids }
      }
    },
    query: {
      constant_score: {
        filter: {
          range: {
            created: { gte: 'now-1d' }
          }
        }
      }
    },
    functions: [
      {
        script_score: {
          script: 'discovery_script',
          params: {
            period: 6
          },
          lang: 'native'
        }
      }
    ]
  }
)
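A rough model of how this query filters posts, assuming function_score’s default `boost_mode` of `multiply`, a single scoring function, and a `constant_score` part that contributes exactly 1.0: the final score is then just the output of `discovery_script`, and `min_score(0.001)` drops every post whose script returned less than 0.001. `FunctionScoreModel` and `kept` are hypothetical, and a `Map` of precomputed values stands in for the script.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Simplified model of the query above, not ElasticSearch code. Assumes the
 * function_score default boost_mode "multiply", one scoring function, and a
 * constant_score part that contributes exactly 1.0 to each matching post.
 */
final class FunctionScoreModel {

    static final float CONSTANT_SCORE = 1.0f; // assumed score of the constant_score query
    static final float MIN_SCORE = 0.001f;    // from posts.min_score(0.001)

    /**
     * candidates: ids that already passed the filters (not popular, created in the last day).
     * scriptScores: stand-in for what discovery_script would return per id.
     * Returns the ids that survive min_score, in candidate order.
     */
    static List<Long> kept(List<Long> candidates, Map<Long, Float> scriptScores) {
        List<Long> result = new ArrayList<>();
        for (long id : candidates) {
            // boost_mode "multiply": query score times function score
            float finalScore = CONSTANT_SCORE * scriptScores.getOrDefault(id, 0f);
            if (finalScore >= MIN_SCORE) { // hits scoring below min_score are excluded
                result.add(id);
            }
        }
        return result;
    }
}
```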
 

At startup, ElasticSearch loads the installed plugins, including ours, and keeps them in memory for as long as it runs. When the application server sends a search request to ElasticSearch, the plugin’s script is executed. Because the script does not have to open a new connection to Redis for each request, search becomes significantly faster.
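The reuse rests on a borrow/return cycle: connections live in a pool created once (see `JedisFactory`), and the `finally` block in `runAsFloat` calls `jedis.close()`, which for a pooled Jedis instance returns the connection to its `JedisPool` instead of disconnecting. The minimal pool below, built on `ArrayBlockingQueue`, illustrates that cycle; `SimplePool` and `Lease` are hypothetical, integer ids stand in for real connections, and the real `JedisPool` (built on Apache Commons Pool) also handles validation, eviction and timeouts.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Minimal bounded pool for illustration: all "connections" are created once,
 * borrow() takes an idle one (waiting if none is free), and closing the lease
 * puts it back instead of disconnecting.
 */
final class SimplePool {

    private final BlockingQueue<Integer> idle;

    SimplePool(int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.offer(i); // "open" each connection once, up front
        }
    }

    /** A borrowed connection; use it in try-with-resources so it is always returned. */
    final class Lease implements AutoCloseable {
        final int connectionId;
        private boolean returned = false;

        private Lease(int connectionId) {
            this.connectionId = connectionId;
        }

        @Override
        public void close() {
            if (!returned) { // guard against returning the same connection twice
                returned = true;
                idle.offer(connectionId);
            }
        }
    }

    /** Blocks until a connection is idle, then hands it out. */
    Lease borrow() throws InterruptedException {
        return new Lease(idle.take());
    }

    int idleCount() {
        return idle.size();
    }
}
```

Using the lease in try-with-resources plays the same role as the `finally` block in `runAsFloat`: the connection goes back to the pool even if scoring throws.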