Sentence Transformers
Supported embedding models¶
Nixiesearch supports any sentence-transformers-compatible model in the ONNX format, as long as:

- there is an ONNX model provided in the repo (e.g. a `model.onnx` file),
- input tensor shapes are supported,
- Nixiesearch can correctly guess the query and document prompt format (like the E5 family of models, which requires `query:` and `passage:` prefixes),
- the embedding pooling method is supported: `CLS` or `mean`.

The models listed in the table below are tested to work well with Nixiesearch.
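The two supported pooling methods can be sketched in NumPy terms (an illustrative sketch, not Nixiesearch code): `CLS` takes the embedding of the first token, while `mean` averages the embeddings of all non-padding tokens.

```python
import numpy as np

# Sketch of the two pooling methods Nixiesearch supports, applied to the
# ONNX encoder output for a single text.
# token_embeddings: (seq_len, dim) last_hidden_state;
# attention_mask:   (seq_len,) with 1 for real tokens, 0 for padding.

def cls_pool(token_embeddings: np.ndarray) -> np.ndarray:
    # CLS pooling: the embedding of the first ([CLS]) token.
    return token_embeddings[0]

def mean_pool(token_embeddings: np.ndarray,
              attention_mask: np.ndarray) -> np.ndarray:
    # Mean pooling: average token embeddings, ignoring padding positions.
    mask = attention_mask[:, None].astype(token_embeddings.dtype)
    return (token_embeddings * mask).sum(axis=0) / mask.sum()
```

Which method a given model expects is fixed at training time, which is why using the wrong one silently degrades retrieval quality.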
Note

Nixiesearch can automatically guess the proper prompt format and pooling method for all the models in the supported list table below. You can override this behavior in the model configuration section with the `pooling` and `prompt` parameters.
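For example, an override for an unlisted model might look like the following hypothetical snippet (the exact keys and accepted values are described in the embedding model configuration section; the prefix strings shown are E5-style and are an assumption, not taken from this page):

```yaml
inference:
  embedding:
    my-model:
      model: /path/to/model/dir
      pooling: mean            # override the guessed pooling: CLS or mean
      prompt:
        query: "query: "       # prefix applied to search queries (assumed E5-style)
        doc: "passage: "       # prefix applied to indexed documents
```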
List of supported models¶
Name | Size (params) | Seqlen (tokens) | Dimensions | Prompt | Pooling |
---|---|---|---|---|---|
sentence-transformers/all-MiniLM-L6-v2 | 22M | 512 | 384 | not needed | mean |
sentence-transformers/all-MiniLM-L12-v2 | 33M | 512 | 384 | not needed | mean |
sentence-transformers/all-mpnet-base-v2 | 109M | 384 | 384 | not needed | mean |
intfloat/e5-small | 33M | 512 | 384 | query+doc | mean |
intfloat/e5-base | 109M | 512 | 768 | query+doc | mean |
intfloat/e5-large | 335M | 512 | 1024 | query+doc | mean |
intfloat/e5-small-v2 | 33M | 512 | 384 | query+doc | mean |
intfloat/e5-base-v2 | 109M | 512 | 768 | query+doc | mean |
intfloat/e5-large-v2 | 335M | 512 | 1024 | query+doc | mean |
intfloat/multilingual-e5-small | 118M | 512 | 384 | auto | mean |
intfloat/multilingual-e5-base | 278M | 512 | 768 | auto | mean |
intfloat/multilingual-e5-large | 560M | 512 | 1024 | auto | mean |
Alibaba-NLP/gte-base-en-v1.5 | 137M | 8192 | 768 | not needed | CLS |
Alibaba-NLP/gte-large-en-v1.5 | 434M | 8192 | 1024 | not needed | CLS |
Alibaba-NLP/gte-modernbert-base | 149M | 8192 | 768 | not needed | CLS |
Snowflake/snowflake-arctic-embed-s | 33M | 512 | 384 | query | CLS |
Snowflake/snowflake-arctic-embed-xs | 22M | 512 | 384 | query | CLS |
Snowflake/snowflake-arctic-embed-m | 109M | 512 | 768 | query | CLS |
Snowflake/snowflake-arctic-embed-m-v1.5 | 109M | 512 | 768 | query | CLS |
Snowflake/snowflake-arctic-embed-m-v2.0 | 109M | 512 | 768 | query | CLS |
Snowflake/snowflake-arctic-embed-l | 335M | 512 | 1024 | query | CLS |
Snowflake/snowflake-arctic-embed-l-v2.0 | 568M | 512 | 1024 | query | CLS |
BAAI/bge-small-en-v1.5 | 33M | 512 | 384 | query | mean |
BAAI/bge-base-en-v1.5 | 109M | 512 | 768 | query | mean |
BAAI/bge-large-en-v1.5 | 335M | 512 | 1024 | query | mean |
BAAI/bge-small-zh-v1.5 | 33M | 512 | 384 | query | mean |
BAAI/bge-base-zh-v1.5 | 109M | 512 | 768 | query | mean |
BAAI/bge-large-zh-v1.5 | 326M | 512 | 1024 | query | mean |
BAAI/bge-m3 | 560M | 8192 | 1024 | not needed | mean |
WhereIsAI/UAE-Large-V1 | 335M | 512 | 1024 | query | mean |
mixedbread-ai/mxbai-embed-large-v1 | 335M | 512 | 1024 | query | CLS |
jinaai/jina-embeddings-v3 | 572M | 8192 | 1024 | query+doc | mean |
NovaSearch/stella_en_400M_v5 | 435M | 4096 | 8192 | query | mean |
If a model is not listed in this table but has an ONNX file available, it will most likely still work, but you may need to set the `prompt` and `pooling` parameters based on the model's documentation. See the embedding model configuration section for more details.
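The Prompt column above can be read as in the following plain-Python sketch of the assumed behavior. The prefix strings are the E5-style ones and are an assumption; other families (e.g. BGE, Arctic) use their own query instructions, and the `auto` mode (where Nixiesearch picks the format itself) is omitted here.

```python
# Sketch of how the "Prompt" column maps to text prefixes applied before
# embedding. Prefix strings are assumed E5-style, for illustration only.
PREFIXES = {
    "not needed": {"query": "", "doc": ""},                   # symmetric models
    "query":      {"query": "query: ", "doc": ""},            # queries only
    "query+doc":  {"query": "query: ", "doc": "passage: "},   # both sides
}

def format_text(text: str, role: str, prompt_mode: str) -> str:
    """role is 'query' or 'doc'; prompt_mode is a value from the Prompt column."""
    return PREFIXES[prompt_mode][role] + text
```

The point of the prefixes is that asymmetric models embed queries and documents into different regions of the same space, so feeding unprefixed text to such a model degrades retrieval quality.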
Model handles¶
Nixiesearch supports loading models directly from Huggingface by their handle (e.g. `sentence-transformers/all-MiniLM-L6-v2`) and from a local directory.

You can reference any HF model handle in the `inference` block, for example:

```yaml
inference:
  embedding:
    e5-small:
      model: sentence-transformers/all-MiniLM-L6-v2
```

To load a model from a local directory, pass its path instead:

```yaml
inference:
  embedding:
    your-model:
      model: /path/to/model/dir
```
Optionally, you can define which particular ONNX file to load, for example the QInt8-quantized one:

```yaml
inference:
  embedding:
    # Used for semantic retrieval
    e5-small:
      model: nixiesearch/e5-small-v2-onnx
      file: model_opt2_QInt8.onnx
```
Converting your own model¶
You can use the nixiesearch/onnx-convert tool to convert your own model:

```shell
python convert.py --model_id intfloat/multilingual-e5-base --optimize 2 --quantize QInt8
```
The conversion produces output like this:

```text
Conversion config: ConversionArguments(model_id='intfloat/multilingual-e5-base', quantize='QInt8', output_parent_dir='./models/', task='sentence-similarity', opset=None, device='cpu', skip_validation=False, per_channel=True, reduce_range=True, optimize=2)
Exporting model to ONNX
Framework not specified. Using pt to export to ONNX.
Using the export variant default. Available variants are:
    - default: The default ONNX variant.
Using framework PyTorch: 2.1.0+cu121
Overriding 1 configuration item(s)
    - use_cache -> False
Post-processing the exported models...
Deduplicating shared (tied) weights...
Validating ONNX model models/intfloat/multilingual-e5-base/model.onnx...
    -[✓] ONNX model output names match reference model (last_hidden_state)
    - Validating ONNX Model output "last_hidden_state":
        -[✓] (2, 16, 768) matches (2, 16, 768)
        -[✓] all values close (atol: 0.0001)
The ONNX export succeeded and the exported model was saved at: models/intfloat/multilingual-e5-base
Export done
Processing model file ./models/intfloat/multilingual-e5-base/model.onnx
ONNX model loaded
Optimizing model with level=2
Optimization done, quantizing to QInt8
```