Config file¶

Core config¶

Main server-related settings are stored here:

core:
  host: 0.0.0.0 # optional, default=0.0.0.0
  port: 8080 # optional, default=8080
  loglevel: info # optional, default info
  telemetry: true # optional, default true
  cache:
    dir: ./cache # optional, default=./cache

Environment variables overrides¶

Core config settings can be overridden with env variables:

NIXIESEARCH_CORE_HOST: overrides core.host
NIXIESEARCH_CORE_PORT: overrides core.port
NIXIESEARCH_CORE_LOGLEVEL: overrides core.loglevel
NIXIESEARCH_CORE_TELEMETRY: overrides core.telemetry
NIXIESEARCH_CORE_CACHE_DIR: overrides core.cache.dir

Loglevel can also be set from the command-line flags. Env overrides always have higher priority than config values.

Telemetry configuration¶

You can opt-out of anonymous usage telemetry collection by setting the core.telemetry: false option:

core:
  telemetry: false

This is possible to also have a fine-grained control over telemetry parts from a config file (like error stack traces and performance metrics). But currently we only collect usage telemetry:

core:
  telemetry:
    usage: false

Searcher config¶

The searcher section is currently reserved for future searcher-specific settings:

searcher: {}

This section is currently empty but part of the configuration schema for future extensibility.

Index mapping¶

You can define each index in the schema block of the configuration:

schema:
  <your-index-name>:
    alias: <list of aliases>
    config:
      <index configuration>
    store:
      <store configuration>
    fields:
      <field definitions>

!! note The index name is immutable, so choose it wisely. But you can always add an alias to address it using a new name.

Index configuration¶

An example of index configuration:

schema:
  index-name:
    config:
      indexer:
        ram_buffer_size: 512mb
        flush:
          interval: 5s # how frequently new segments are created
      hnsw:
        m: 16 # max number of node-node links in HNSW graph
        efc: 100 # beam width used while building the index
        workers: 8 # how many concurrent workers used for HNSW merge ops

Fields:

indexer.flush.interval: optional, duration, default 5s. Index writer will periodically produce flush index segments (if there are new documents) with this interval.
indexer.ram_buffer_size: optional, size, default 512mb. RAM buffer size for new segments.
hnsw.m: optional, int, default 16. How many links should HNSW index have? Larger value means better recall, but higher memory usage and bigger index. Common values are within 16-128 range.
hnsw.efc: optional, int, default 100. How many neighbors in the HNSW graph are explored during indexing. Bigger the value, better the recall, but slower the indexing speed.
hnsw.workers: optional, int, default = number of CPU cores. How many concurrent workers to use for index merges.
indexer.merge_policy: optional, merge policy config, default tiered. Controls how Lucene merges index segments. See Merge Policies section below for details.

Merge Policies¶

Nixiesearch allows you to configure Lucene merge policies to optimize indexing performance for your specific use case. The merge policy determines how and when index segments are merged together.

Tiered Merge Policy (default)¶

Best for most use cases, provides balanced performance between indexing speed and search performance.

schema:
  your-index:
    config:
      indexer:
        merge_policy: tiered
        # or with custom settings:
        merge_policy:
          tiered:
            segments_per_tier: 10
            max_merge_at_once: 10
            max_merged_segment_size: 5gb
            floor_segment_size: 16mb
            target_search_concurrency: 1

Parameters: * segments_per_tier: optional, int, default 10. Number of segments per tier in the merge hierarchy * max_merge_at_once: optional, int, default 10. Maximum number of segments merged at once * max_merged_segment_size: optional, size, default 5gb. Maximum size of merged segments * floor_segment_size: optional, size, default 16mb. Minimum segment size threshold * target_search_concurrency: optional, int, default 1. Target concurrency level for search operations

Byte Size Merge Policy¶

Good for maintaining consistent segment sizes, useful when you want predictable storage patterns.

schema:
  your-index:
    config:
      indexer:
        merge_policy: byte_size
        # or with custom settings:
        merge_policy:
          byte_size:
            max_merge_size: 5gb
            min_merge_size: 16mb
            min_merge_size_for_forced_merge: 5gb

Parameters: * max_merge_size: optional, size, default 5gb. Maximum size of segments to merge * min_merge_size: optional, size, default 16mb. Minimum size threshold for merging segments * min_merge_size_for_forced_merge: optional, size, default 5gb. Minimum size threshold for forced merges

Document Count Merge Policy¶

Good when documents are of uniform size and you want to control merging based on document count rather than byte size.

schema:
  your-index:
    config:
      indexer:
        merge_policy: doc_count
        # or with custom settings:
        merge_policy:
          doc_count:
            min_merge_docs: 10
            max_merge_docs: 2147483647

Parameters: * min_merge_docs: optional, int, default 10. Minimum number of documents in segments before merging * max_merge_docs: optional, int, default 2147483647. Maximum number of documents in merged segments

No Merge Policy¶

Disables automatic merging entirely. Useful for read-only indexes or when you want full control over merging.

schema:
  your-index:
    config:
      indexer:
        merge_policy: none

This policy has no additional configuration options.

Store configuration¶

TODO

Fields definitions¶

TODO

ML Inference¶

See ML Inference overview and RAG Search for an overview of use cases for inference models.

Embedding models¶

ONNX models¶

Example of a full configuration:

inference:
  embedding:
    your-model-name:
      provider: onnx
      model: nixiesearch/e5-small-v2-onnx
      file: model.onnx
      max_tokens: 512
      batch_size: 32
      pooling: mean
      normalize: true
      cache: false
      prompt:
        query: "query: "
        doc: "passage: "

Fields:

provider: optional, string, default onnx. As for v0.3.0, only the onnx provider is supported.
model: required, string. A Huggingface handle, or an HTTP/Local/S3 URL for the model. See model URL reference for more details on how to load your model.
prompt: optional. A document and query prefixes for asymmetrical models. Nixiesearch can guess the proper prompt format for the vast majority of embedding models. See the list of supported embedding models for more details.
file: optional, string, default is to pick a lexicographically first file. A file name of the model - useful when HF repo contains multiple versions of the same model.
max_tokens: optional, int, default 512. How many tokens from the input document to process. All tokens beyond the threshold are truncated.
batch_size: optional, int, default 32. Computing embeddings is a highly parallel task, and doing it in big chunks is much more effective than one by one. For CPUs there are usually no gains of batch sizes beyong 32, but on GPUs you can go up to 1024.
pooling: optional, cls/mean/last, default auto. Which pooling method use to compute sentence embeddings. This is model specific and Nixiesearch tries to guess it automatically. Options: cls (first token), mean (average pooling), last (last token, used by decoder models like Qwen3-Embedding). See the list of supported embeddings models to know if it can be detected automatically. If your model is not on the list, consult the model doc on its pooling method.
normalize: optional, bool, default true. Should embeddings be L2-normalized? With normalized embeddings it becomes possible to use a faster dot-product based aKNN search.
cache: optional, bool or CacheSettings. Default memory.max_size=32768. Cache top-N LRU embeddings in RAM. See Embedding caching for more details.

OpenAI models¶

Example of a full configuration:

inference:
  embedding:
    <model-name>:
      provider: openai
      model: text-embedding-3-small
      timeout: 2000ms
      endpoint: "https://api.openai.com/"
      dimensions: null
      batch_size: 32
      cache: false

Parameters:

timeout: optional, duration, default 2s. External APIs might be slow sometimes.
retry: optional, string, default "https://api.openai.com/". You can use alternative API or EU-specific endpoint.
dimensions: optional, int, default empty. For matryoshka models, how many dimensions to return.
batch_size: optional, int, default 32. Batch size for calls with many documents.
cache: optional, bool or CacheSettings. Default memory.max_size=32768. Cache top-N LRU embeddings in RAM. See Embedding caching for more details.

Cohere models¶

Example of a full configuration:

inference:
  embedding:
    <model-name>:
      provider: cohere
      model: embed-english-v3.0
      timeout: 2000ms
      endpoint: "https://api.cohere.com/"
      batch_size: 32
      cache: false

Parameters:

timeout: optional, duration, default 2s. External APIs might be slow sometimes.
retry: optional, string, default "https://api.cohere.com/". You can use alternative API or EU-specific endpoint.
batch_size: optional, int, default 32. Batch size for calls with many documents.
cache: optional, bool or CacheSettings. Default memory.max_size=32768. Cache top-N LRU embeddings in RAM. See Embedding caching for more details.

Embedding caching¶

Each embedding model has a cache section, which controls embedding caching.

inference:
  embedding:
    e5-small:
      model: nixiesearch/e5-small-v2-onnx
      cache:
        memory:
          max_size: 32768

Parameters:

cache, optional, bool or object. Default memory.max_size=32768. Which cache implementation to use.
cache.memory.max_size, optional, int, default 32768. How many string-embedding pairs to keep in the LRU cache.

Nixiesearch currently supports only memory embedding cache, Redis caching is planned.

Reranking models¶

Cross-encoder reranking models can be configured in the inference.ranker section:

inference:
  ranker:
    your-model-name:
      provider: onnx
      model: cross-encoder/ms-marco-MiniLM-L6-v2
      max_tokens: 512
      batch_size: 32
      device: cpu
      file: model.onnx

Fields:

provider: required, string. Currently only onnx is supported.
model: required, string. A Huggingface handle, or an HTTP/Local/S3 URL for the model. See model URL reference for more details on how to load your model.
max_tokens: optional, int, default 512. Maximum sequence length for query-document pairs.
batch_size: optional, int, default 32. Inference batch size for processing multiple query-document pairs.
device: optional, string, default cpu. Processing device (cpu or gpu).
file: optional, string. A file name of the model - useful when HF repo contains multiple versions of the same model.
padding_side: optional, left or right. Controls tokenizer padding direction. Auto-detected for known models (e.g., left for Qwen3-based rerankers).
prompt: optional, object. Custom prompt template configuration. Auto-detected for known models like Qwen3-Reranker which use template-based prompting with special system instructions.
logits_processor: optional, sigmoid or noop, default noop. Controls how raw model outputs are transformed to scores. Auto-detected as sigmoid for Qwen3-based models, noop for traditional cross-encoders.

Example with Qwen3-Reranker (auto-detection):

inference:
  ranker:
    qwen-reranker:
      provider: onnx
      model: zhiqing/Qwen3-Reranker-0.6B-seq-cls-ONNX
      # The following are auto-detected for Qwen3-based models:
      # padding_side: left
      # logits_processor: sigmoid
      # prompt: (model-specific template)
      max_tokens: 512
      batch_size: 32
      device: cpu

For detailed usage examples, see Cross-Encoder Reranking documentation.

LLM completion models¶

Example of a full configuration:

inference:
  completion:
    your-model-name:
      provider: llamacpp
      model: Qwen/Qwen2-0.5B-Instruct-GGUF
      file: qwen2-0_5b-instruct-q4_0.gguf
      system: "You are a helpful assistant, answer only in haiku."
      options:
        threads: 8
        gpu_layers: 100
        cont_batching: true
        flash_attn: true
        seed: 42

Fields:

provider: required, string. As for v0.3.0, only llamacpp is supported. Other SaaS providers like OpenAI, Cohere, mxb and Google are on the roadmap.
model: required, string. A Huggingface handle, or an HTTP/Local/S3 URL for the model. See model URL reference for more details on how to load your model.
file: optional, string. A file name for the model, if the target model has multiple. A typical case for quantized models.
system: optional, string, default empty. An optional system prompt to be prepended to all the user prompts.
options: optional, obj. A set of llama-cpp specific options. See llamacpp reference on options for more details.