Embedding cache

Computing embeddings, whether with a local SBERT-compatible model or an external API, is a latency-heavy task. Nixiesearch can cache embeddings for hot queries in an in-memory cache and in a remote Redis-based cache. This approach:

  • minimises end-to-end search latency, as query embedding can take up to 300ms for API-based embedding providers.
  • reduces costs, as most frequent queries are never re-embedded.

[Diagram: embedding cache]

In-memory cache

In-memory caching is the default option. It is configured per model in the inference section of the config file.

inference:
  embedding:
    # Used for semantic retrieval
    e5-small:
      model: nixiesearch/e5-small-v2-onnx
      cache:
        memory:
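          # max number of cached embeddings (assumed to be an entry count, not bytes)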
          max_size: 32768

The cache can also be disabled altogether:

inference:
  embedding:
    # Used for semantic retrieval
    e5-small:
      model: nixiesearch/e5-small-v2-onnx
      cache: false

The cache is local to each node and ephemeral: it is not persisted between restarts. If Nixiesearch runs in standalone mode (that is, when the indexing and search tiers are co-located in a single process), the cache is shared between the indexing and search threads.
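
To make this concrete, here is a minimal sketch of what a per-model, node-local embedding cache amounts to: a bounded map from query text to embedding vector with LRU eviction. The class, names, and eviction policy below are illustrative assumptions for this sketch, not Nixiesearch's actual implementation.

import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Illustrative sketch only: a bounded query -> embedding map with
// LRU eviction, one instance per embedding model.
final class EmbeddingCache(maxSize: Int, embed: String => Array[Float]) {
  // accessOrder = true keeps iteration order least-recently-used first;
  // removeEldestEntry evicts the LRU entry once the map exceeds maxSize.
  private val entries: JMap[String, Array[Float]] =
    new JLinkedHashMap[String, Array[Float]](16, 0.75f, true) {
      override def removeEldestEntry(e: JMap.Entry[String, Array[Float]]): Boolean =
        size() > maxSize
    }

  // Return the cached vector for a query, embedding it only on a miss.
  def get(query: String): Array[Float] = entries.synchronized {
    entries.computeIfAbsent(query, q => embed(q))
  }
}

Because such a cache lives inside a single process, separate search nodes never share entries, and a restart always begins with an empty cache; that is the gap a shared Redis cache is meant to fill.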

Redis cache

Planned in Nixiesearch 0.6.