Lightning fast search with ElasticSearch


“Make it faster.” Developers hear these words all the time. In this incredibly fast-moving world, users expect speed from the technology they use. They have no time or patience for slow-moving applications. That’s why, during the app development process, Distillery focuses on finding and customizing technology solutions that increase speed and optimize efficiency.

Here at Distillery, we love sharing information about the solutions we create. After all, software development isn’t about reinventing the wheel — it’s about finding ways to use all available wheels as effectively as possible, and building new wheels only when necessary. Blogs that showcase this commitment include using SignalR to expedite client-server communication, building architecture in Django based on finite state machines, using PubNub for real-time messaging, and using Spark and Spark Streaming for data wrangling. In the same spirit, the blog below — written by Daniil Kolotev, one of Distillery’s developers, and originally published in early 2017 — explains how we used ElasticSearch to significantly increase the search speed within one application we built.

It’s no secret that tools are selected for each specific task, and our project was no exception. Since the requirements had been clearly defined before the beginning of the app development process, the backend architecture was chosen to meet the defined requirements.

One might say we chose standard solutions for the technology stack: the Ruby language, the Rails framework, a PostgreSQL database, and a NoSQL Redis cache for the backend, with AngularJS for the frontend. Since the requirements called for search functionality, we decided to use ElasticSearch for it. The solution was deployed on Amazon, using the respective services.

As you can see, the stack consists of time-proven solutions with nothing exotic. However, the way we used some of these tools turned out to be less standard. To explain why, let us first describe the problem.

We needed to search data held in a PostgreSQL database and rank the results using data that changes rapidly, which is why that data is stored in Redis. ElasticSearch enables flexible sorting by changing the weight (‘score’) of each document in a special script (the ‘script_score’ function). More information about this can be found in the official documentation.

Our task was complicated by having to pull the data for the ‘score’ calculation out of Redis. There is one Amazon server for Redis, another for Postgres, and a third for the application (the corresponding Amazon services are connected). Therefore, for the ElasticSearch ‘script_score’ script to get fast access to the data in Redis, we needed to ensure a permanent connection to the Redis server.

To implement such a scheme, the ‘script_score’ script has to manage its connections itself rather than create a new one for each search query (which is slow). According to the official documentation, the script that computes the ‘score’ can be placed in a separate file written in one of the supported languages (e.g., Groovy, as in the example below):

"functions": [
  {
    "script_score": {
      "script": { 
        "lang": "groovy",
        "file": "calculate-score",
        "params": {
          "my_modifier": 8
        }
      }
    }
  }
]

Putting the script body in a separate file makes the code cleaner and better structured, but it doesn’t allow the script to maintain permanent connections. The task was therefore not only to run the script, but also to keep it in memory, running alongside the ElasticSearch engine the whole time.

Based on advice from the official forum, such functionality is achievable but not trivial: you need to write an ElasticSearch plugin that runs the required script. The good thing about a plugin is that it is initialized when ElasticSearch starts and keeps running alongside the search engine.

Below is an example of initializing the plugin in Java. It registers the required script (‘discovery_script’) with ElasticSearch by extending the Plugin class from ElasticSearch.

import org.elasticsearch.plugins.Plugin;
...
public class SearchPlugin extends Plugin {
  @Override
  public String name() {
    return "discovery-script";
  }

  @Override
  public String description() {
    return "Native script examples";
  }

  public void onModule(ScriptModule module) {
    // Register each script defined in the plugin
    module.registerScript("discovery_script", DiscoveryScriptFactory.class);
  }
}
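
To make the lifecycle concrete, here is a minimal, self-contained sketch of the idea rather than the plugin’s real code. The ‘Factory’ and ‘Script’ types are hypothetical stand-ins, not ElasticSearch classes, and the scoring formula is a placeholder. The sketch assumes the engine creates the registered factory once and asks it for a new script object per query, so any expensive setup done in the factory happens only once.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

final class LifecycleDemo {
  /** Hypothetical stand-in for a per-query script object. */
  interface Script {
    float run();
  }

  /** Hypothetical stand-in for the factory that a plugin registers by name. */
  static final class Factory {
    static final AtomicInteger SETUPS = new AtomicInteger();

    Factory() {
      // One-time, expensive work (such as opening a connection pool) belongs here.
      SETUPS.incrementAndGet();
    }

    Script newScript(Map<String, Object> params) {
      int period = (Integer) params.get("period");
      return () -> 1.0f / period; // placeholder formula, not the real score
    }
  }

  /** Serves the given number of queries and returns how many times setup ran. */
  static int setupsAfter(int queries) {
    Factory.SETUPS.set(0);
    Factory factory = new Factory(); // created once, when the engine starts
    Map<String, Object> params = Collections.<String, Object>singletonMap("period", 6);
    for (int i = 0; i < queries; i++) {
      factory.newScript(params).run(); // a cheap new script for every query
    }
    return Factory.SETUPS.get();
  }

  public static void main(String[] args) {
    System.out.println(setupsAfter(1000)); // prints 1
  }
}
```

However many queries are served, the setup counter stays at one, which is the property we were after when moving the script into a plugin.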

The next example shows the connection pool factory, which manages connections by keeping them in a pool.

public class JedisFactory { 
  public final String REDIS_HOST;
  public static final int MAX_CONNECTIONS = 1000;
  public static final int MIN_IDLE_CONNECTIONS = 100;

  private static JedisFactory instance = null;
  private static JedisPool jedisPool;

  public JedisFactory() {
    JedisPoolConfig poolConfig = new JedisPoolConfig();
    … // pool config settings
    jedisPool = new JedisPool(poolConfig, REDIS_HOST);
  }

  public JedisPool getJedisPool() {
    return jedisPool;
  }

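  // synchronized: search threads run concurrently, and the pool must be created only once
  public static synchronized JedisFactory getInstance() {
    if (instance == null) {
      instance = new JedisFactory();
    }
    return instance;
  }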
}
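
As a rough illustration of why pooling helps, the sketch below uses only the Java standard library. ‘FakeConnection’ and ‘SimplePool’ are hypothetical stand-ins, not Jedis classes, and the pool size of 4 is arbitrary. The sketch counts how many connections are opened while serving many requests, borrowing and returning a connection in a try/finally block in the same spirit as the script shown later.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

final class PoolDemo {
  /** Hypothetical stand-in for a Redis connection; counts how many were opened. */
  static final class FakeConnection {
    static final AtomicInteger OPENED = new AtomicInteger();

    FakeConnection() {
      OPENED.incrementAndGet();
    }

    String get(String key) {
      return "value-of-" + key;
    }
  }

  /** Fixed-size pool: connections are opened up front and then reused. */
  static final class SimplePool {
    private final BlockingQueue<FakeConnection> idle;

    SimplePool(int size) {
      idle = new ArrayBlockingQueue<>(size);
      for (int i = 0; i < size; i++) {
        idle.add(new FakeConnection());
      }
    }

    FakeConnection borrow() {
      FakeConnection c = idle.poll();
      if (c == null) {
        throw new IllegalStateException("pool exhausted");
      }
      return c;
    }

    void giveBack(FakeConnection c) {
      idle.add(c);
    }
  }

  /** Serves the requests through one pool and returns how many connections were opened. */
  static int connectionsOpenedFor(int requests, int poolSize) {
    FakeConnection.OPENED.set(0);
    SimplePool pool = new SimplePool(poolSize);
    for (int i = 0; i < requests; i++) {
      FakeConnection c = pool.borrow();
      try {
        c.get("post:" + i);
      } finally {
        pool.giveBack(c); // always return the connection, even on failure
      }
    }
    return FakeConnection.OPENED.get();
  }

  public static void main(String[] args) {
    System.out.println(connectionsOpenedFor(1000, 4)); // prints 4
  }
}
```

A thousand requests open four connections in total. Without the pool, each request would open its own.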
 

And the script for calculating the ‘score’ parameter extends the special AbstractFloatSearchScript class of ElasticSearch.

import org.elasticsearch.script.AbstractFloatSearchScript;
...
public class DiscoveryScript extends AbstractFloatSearchScript {
  …

  @Override
  public float runAsFloat() {
    … // Security checks
    });
    try {
      jedis.getClient().setTimeoutInfinite();
      ScriptDocValues docValue = (ScriptDocValues) doc().get("id");
      … // getting data from Redis
      }
    } catch (JedisException e) {
      logger.error("Redis read/connection exception", e);
    } catch (Exception e) {
      logger.error("Unknown read/parse/calculate exception", e);
    } finally {
      jedis.close();
    }

    return calculateScore();
  }
  …
}
 

As you can see, the key role is played by the runAsFloat method, which calculates the ‘score’ parameter. With the plugin installed, we reference the script directly in the ElasticSearch function_score query:

scope = posts.min_score(0.001).query(
  function_score: {
    filter: {
      not: {
        terms: { id: Post.popular.ids }
      }
    },
    query: {
      constant_score: {
        filter: {
          range: {
            created: { gte: 'now-1d' }
          }
        }
      }
    },
    functions: [
      {
        script_score: {
          script: 'discovery_script',
          params: {
            period: 6
          },
          lang: 'native'
        }
      }
    ]
  }
)
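
To see how a query like this turns script results into a ranking, here is a small stand-alone sketch. It is a simulation, not ElasticSearch code. It assumes the function_score defaults: the query score and the function score are multiplied (boost_mode ‘multiply’), the constant_score query contributes 1.0, and min_score drops documents whose final score falls below the threshold. The post ids and script scores are made up for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ScoreDemo {
  static final float CONSTANT_SCORE = 1.0f; // what constant_score yields by default
  static final float MIN_SCORE = 0.001f;    // same threshold as in the query above

  /** Returns the ids that survive min_score, ordered by descending final score. */
  static List<String> rank(Map<String, Float> scriptScores) {
    List<Map.Entry<String, Float>> kept = new ArrayList<>();
    for (Map.Entry<String, Float> e : scriptScores.entrySet()) {
      float finalScore = CONSTANT_SCORE * e.getValue(); // boost_mode "multiply"
      if (finalScore >= MIN_SCORE) {
        kept.add(new AbstractMap.SimpleEntry<>(e.getKey(), finalScore));
      }
    }
    kept.sort((a, b) -> Float.compare(b.getValue(), a.getValue()));
    List<String> ids = new ArrayList<>();
    for (Map.Entry<String, Float> e : kept) {
      ids.add(e.getKey());
    }
    return ids;
  }

  public static void main(String[] args) {
    Map<String, Float> scores = new LinkedHashMap<>();
    scores.put("post-1", 0.4f); // hypothetical script_score results
    scores.put("post-2", 0.0f);
    scores.put("post-3", 2.5f);
    System.out.println(rank(scores)); // prints [post-3, post-1]
  }
}
```

Because the query part always scores 1.0, the ordering is decided entirely by the script, and a script score of zero removes a post from the results.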

At startup, ElasticSearch loads the installed plugins, including our custom one, and keeps them in memory for as long as it runs. When the application server queries ElasticSearch for results, the script from the plugin is run. It does not need to open a new connection to Redis for each request, which significantly increases the search speed.

Want to learn more about how Distillery’s developers find solutions that help us streamline the product development process? Let us know!
