
Sentence Transformers

Supported embedding models

Nixiesearch supports any sentence-transformers-compatible model in the ONNX format.

The models in the list below are tested and known to work well with Nixiesearch, meaning that:

  • there is an ONNX model provided in the repo (e.g. a model.onnx file),
  • input tensor shapes are supported,
  • Nixiesearch can correctly guess the query and document prompt format (like the E5 family of models, which requires query: and passage: prefixes),
  • the embedding pooling method is supported - CLS or mean (see the sketch after this list).
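
For reference, both pooling methods reduce the model's token-level ONNX output (a last_hidden_state tensor of shape batch x seq_len x hidden) to one vector per text. A minimal NumPy sketch of the two methods, for illustration only, not Nixiesearch internals:

import numpy as np

def cls_pooling(last_hidden_state: np.ndarray) -> np.ndarray:
    # CLS pooling: take the embedding of the first ([CLS]) token.
    return last_hidden_state[:, 0, :]

def mean_pooling(last_hidden_state: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    # Mean pooling: average token embeddings, skipping padding positions.
    mask = attention_mask[:, :, None].astype(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # guard against all-padding rows
    return summed / counts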

Note

Nixiesearch can automatically guess the proper prompt format and pooling method for every model in the supported models table below. You can override this behavior with the prompt and pooling parameters in the model configuration section.

List of supported models

Name Size Seqlen Dimensions Prompt Pooling
sentence-transformers/all-MiniLM-L6-v2 22M 512 384 not needed mean
sentence-transformers/all-MiniLM-L12-v2 33M 512 384 not needed mean
sentence-transformers/all-mpnet-base-v2 109M 384 768 not needed mean
intfloat/e5-small 33M 512 384 query+doc mean
intfloat/e5-base 109M 512 768 query+doc mean
intfloat/e5-large 335M 512 1024 query+doc mean
intfloat/e5-small-v2 33M 512 384 query+doc mean
intfloat/e5-base-v2 109M 512 768 query+doc mean
intfloat/e5-large-v2 335M 512 1024 query+doc mean
intfloat/multilingual-e5-small 118M 512 384 auto mean
intfloat/multilingual-e5-base 278M 512 768 auto mean
intfloat/multilingual-e5-large 560M 512 1024 auto mean
Alibaba-NLP/gte-base-en-v1.5 137M 8192 768 not needed CLS
Alibaba-NLP/gte-large-en-v1.5 434M 8192 1024 not needed CLS
Alibaba-NLP/gte-modernbert-base 149M 8192 768 not needed CLS
Snowflake/snowflake-arctic-embed-s 33M 512 384 query CLS
Snowflake/snowflake-arctic-embed-xs 22M 512 384 query CLS
Snowflake/snowflake-arctic-embed-m 109M 512 768 query CLS
Snowflake/snowflake-arctic-embed-m-v1.5 109M 512 768 query CLS
Snowflake/snowflake-arctic-embed-m-v2.0 305M 512 768 query CLS
Snowflake/snowflake-arctic-embed-l 335M 512 1024 query CLS
Snowflake/snowflake-arctic-embed-l-v2.0 568M 512 1024 query CLS
BAAI/bge-small-en-v1.5 33M 512 384 query mean
BAAI/bge-base-en-v1.5 109M 512 768 query mean
BAAI/bge-large-en-v1.5 335M 512 1024 query mean
BAAI/bge-small-zh-v1.5 33M 512 384 query mean
BAAI/bge-base-zh-v1.5 109M 512 768 query mean
BAAI/bge-large-zh-v1.5 326M 512 1024 query mean
BAAI/bge-m3 560M 8192 1024 not needed mean
WhereIsAI/UAE-Large-V1 335M 512 1024 query mean
mixedbread-ai/mxbai-embed-large-v1 335M 512 1024 query CLS
jinaai/jina-embeddings-v3 572M 8192 1024 query+doc mean
NovaSearch/stella_en_400M_v5 435M 4096 8192 query mean

If a model is not listed in this table but has an ONNX file available, it will most likely still work; you may need to set the prompt and pooling parameters yourself based on the model documentation. See the embedding model configuration section for more details.
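
For example, to pin both parameters explicitly for an unlisted model, the configuration could look like the sketch below. The handle intfloat/e5-base-v2 is only an illustration, and the exact shape of the prompt block (assumed here to take query and doc prefixes) is defined in the embedding model configuration section:

inference:
  embedding:
    your-model:
      model: intfloat/e5-base-v2
      prompt:
        query: "query: "
        doc: "passage: "
      pooling: mean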

Model handles

Nixiesearch supports loading models directly from Huggingface by handle (e.g. sentence-transformers/all-MiniLM-L6-v2) or from a local directory.

You can reference any HF model handle in the inference block, for example:

inference:
  embedding:
    minilm:
      model: sentence-transformers/all-MiniLM-L6-v2

It also works with local paths:

inference:
  embedding:
    your-model:
      model: /path/to/model/dir
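
The local directory is expected to contain the ONNX model together with its tokenizer files. A typical layout (an assumption here, based on what standard sentence-transformers ONNX exports produce):

/path/to/model/dir/
  model.onnx
  config.json
  tokenizer.json
  tokenizer_config.json
  special_tokens_map.json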

Optionally, you can specify which particular ONNX file to load, for example a QInt8-quantized one:

inference:
  embedding:
    # Used for semantic retrieval
    e5-small:
      model: nixiesearch/e5-small-v2-onnx
      file: model_opt2_QInt8.onnx
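
If you are not sure which ONNX files a repository ships, you can list them with the huggingface_hub Python client (a convenience sketch, not part of Nixiesearch itself):

from huggingface_hub import list_repo_files

# Print all ONNX files available in the model repository.
files = list_repo_files("nixiesearch/e5-small-v2-onnx")
print([f for f in files if f.endswith(".onnx")])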

Converting your own model

You can use the nixiesearch/onnx-convert tool to convert your own model:

python convert.py --model_id intfloat/multilingual-e5-base --optimize 2 --quantize QInt8

Conversion config: ConversionArguments(model_id='intfloat/multilingual-e5-base', quantize='QInt8', output_parent_dir='./models/', task='sentence-similarity', opset=None, device='cpu', skip_validation=False, per_channel=True, reduce_range=True, optimize=2)
Exporting model to ONNX
Framework not specified. Using pt to export to ONNX.
Using the export variant default. Available variants are:
    - default: The default ONNX variant.
Using framework PyTorch: 2.1.0+cu121
Overriding 1 configuration item(s)
        - use_cache -> False
Post-processing the exported models...
Deduplicating shared (tied) weights...
Validating ONNX model models/intfloat/multilingual-e5-base/model.onnx...
        -[✓] ONNX model output names match reference model (last_hidden_state)
        - Validating ONNX Model output "last_hidden_state":
                -[✓] (2, 16, 768) matches (2, 16, 768)
                -[✓] all values close (atol: 0.0001)
The ONNX export succeeded and the exported model was saved at: models/intfloat/multilingual-e5-base
Export done
Processing model file ./models/intfloat/multilingual-e5-base/model.onnx
ONNX model loaded
Optimizing model with level=2
Optimization done, quantizing to QInt8
See the nixiesearch/onnx-convert repo for more details and options.
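
Once the conversion finishes, the output directory can be referenced as a local model. A sketch assuming the converter wrote its output to ./models/intfloat/multilingual-e5-base (as in the log above) and that the quantized file follows the model_opt2_QInt8.onnx naming pattern used earlier:

inference:
  embedding:
    multilingual-e5:
      model: ./models/intfloat/multilingual-e5-base
      file: model_opt2_QInt8.onnx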