Skip to content

Nixiesearch: batteries included search engine

logo with name

CI Status License: Apache 2 Last commit Last release Join our slack Visit demo

What is Nixiesearch?

Nixiesearch is a modern search engine that runs on S3-compatible storage. We built it after dealing with the headaches of running large Elastic/OpenSearch clusters (here's the blog post full of pain), and here’s why it’s awesome:

NS design diagram

Search is never easy, but Nixiesearch has your back. It takes care of the toughest parts—like reindexing, capacity planning, and maintenance—so you can save time (and your sanity).

Note

Want to learn more? Go straight to the quickstart and check out the live demo.

What Nixiesearch is not?

  • Nixiesearch is not a database, and was never meant to be. Nixiesearch is a search index for consumer-facing apps to find top-N most relevant documents for a query. For analytical cases consider using good old SQL with Clickhouse or Snowflake.
  • Not a tool to search for logs. Log search is about throughput, and Nixiesearch is about relevance. If you plan to use Nixiesearch as a log storage system, please don't: consider ELK or Quickwit as better alternatives.

The difference

Our elasticsearch cluster has been a pain in the ass since day one with the main fix always "just double the size of the server" to the point where our ES cluster ended up costing more than our entire AWS bill pre-ES [HN source]

When your search cluster is red again when you accidentally send a wrong JSON to a wrong REST endpoint, you can just write your own S3-based search engine like big guys do:

Nixiesearch was inspired by these search engines, but is open-source. Decoupling search and storage makes ops simpler. Making your search configuration immutable makes it even more simple.

immutable config diagram

How it's different from popular search engines?

Try it out

Get the sample MSRD: Movie Search Ranking Dataset dataset:

curl -o movies.jsonl.gz https://nixiesearch.ai/data/movies.jsonl
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   162  100   162    0     0   3636      0 --:--:-- --:--:-- --:--:--  3681
100 32085  100 32085    0     0   226k      0 --:--:-- --:--:-- --:--:--  226k

Create an index mapping for movies index in a file config.yml:

inference:
  embedding:
    e5-small: # 
      model: intfloat/e5-small-v2 # 
schema:
  movies: # index name
    fields:
      title: # field name
        type: text
        search: 
          type: hybrid
          model: e5-small
        language: en # language is needed for lexical search
        suggest: true
      overview:
        type: text
        search: 
          type: hybrid
          model: e5-small
        language: en

Run the Nixiesearch docker container:

docker run -itp 8080:8080 -v .:/data nixiesearch/nixiesearch:latest standalone -c /data/config.yml
a.nixiesearch.index.sync.LocalIndex$ - Local index movies opened
ai.nixiesearch.index.Searcher$ - opening index movies
a.n.main.subcommands.StandaloneMode$ - ███╗   ██╗██╗██╗  ██╗██╗███████╗███████╗███████╗ █████╗ ██████╗  ██████╗██╗  ██╗
a.n.main.subcommands.StandaloneMode$ - ████╗  ██║██║╚██╗██╔╝██║██╔════╝██╔════╝██╔════╝██╔══██╗██╔══██╗██╔════╝██║  ██║
a.n.main.subcommands.StandaloneMode$ - ██╔██╗ ██║██║ ╚███╔╝ ██║█████╗  ███████╗█████╗  ███████║██████╔╝██║     ███████║
a.n.main.subcommands.StandaloneMode$ - ██║╚██╗██║██║ ██╔██╗ ██║██╔══╝  ╚════██║██╔══╝  ██╔══██║██╔══██╗██║     ██╔══██║
a.n.main.subcommands.StandaloneMode$ - ██║ ╚████║██║██╔╝ ██╗██║███████╗███████║███████╗██║  ██║██║  ██║╚██████╗██║  ██║
a.n.main.subcommands.StandaloneMode$ - ╚═╝  ╚═══╝╚═╝╚═╝  ╚═╝╚═╝╚══════╝╚══════╝╚══════╝╚═╝  ╚═╝╚═╝  ╚═╝ ╚═════╝╚═╝  ╚═╝
a.n.main.subcommands.StandaloneMode$ -                                                                                
o.h.ember.server.EmberServerBuilder - Ember-Server service bound to address: [::]:8080

Build an index for a hybrid search:

curl -XPUT -d @movies.jsonl http://localhost:8080/movies/_index
{"result":"created","took":8256}

Send the search query:

curl -XPOST -d '{"query": {"match": {"title":"matrix"}},"fields": ["title"], "size":3}'\
   http://localhost:8080/movies/_search
{
  "took": 1,
  "hits": [
    {
      "_id": "605",
      "title": "The Matrix Revolutions",
      "_score": 0.016666668
    },
    {
      "_id": "604",
      "title": "The Matrix Reloaded",
      "_score": 0.016393442
    },
    {
      "_id": "624860",
      "title": "The Matrix Resurrections",
      "_score": 0.016129032
    }
  ],
  "aggs": {},
  "ts": 1722441735886
}

You can also open http://localhost:8080/_ui in your web browser for a basic web UI:

web ui

For more details, see a complete Quickstart guide.

License

This project is released under the Apache 2.0 license, as specified in the License file.