# Dense retrieval models

Dense retrieval models take an input such as text or an image and return a fixed-size array (an embedding). This representation is then indexed and made searchable using approximate nearest neighbour algorithms together with a similarity measure such as cosine similarity or L2 distance.
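As a minimal illustration of the retrieval step (using NumPy directly; Marqo handles this internally, and the vectors here are toy values, not real model outputs):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy fixed-size embeddings; real models output e.g. 384 or 768 dimensions.
query = np.array([0.1, 0.3, 0.5])
doc_a = np.array([0.1, 0.3, 0.5])   # same direction as the query
doc_b = np.array([0.5, -0.3, 0.1])

# Rank documents by similarity to the query (brute force; ANN indexes
# approximate this ranking at scale).
scores = {name: cosine_similarity(query, d)
          for name, d in [("doc_a", doc_a), ("doc_b", doc_b)]}
```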
## Text

The following models are supported by default. They are primarily based on the excellent sentence-transformers (SBERT) and Hugging Face libraries and models.
- sentence-transformers/all-MiniLM-L6-v1
- sentence-transformers/all-MiniLM-L6-v2
- sentence-transformers/all-mpnet-base-v1
- sentence-transformers/all-mpnet-base-v2
- sentence-transformers/stsb-xlm-r-multilingual
- flax-sentence-embeddings/all_datasets_v3_MiniLM-L12
- flax-sentence-embeddings/all_datasets_v3_MiniLM-L6
- flax-sentence-embeddings/all_datasets_v4_MiniLM-L12
- flax-sentence-embeddings/all_datasets_v4_MiniLM-L6
- flax-sentence-embeddings/all_datasets_v3_mpnet-base
- flax-sentence-embeddings/all_datasets_v4_mpnet-base
These models can be selected when creating the index, as illustrated in the example below:

```python
import marqo

# Assumes a Marqo instance running locally on the default port.
mq = marqo.Client(url="http://localhost:8882")

settings = {
    "index_defaults": {
        "treat_urls_and_pointers_as_images": False,
        "model": "flax-sentence-embeddings/all_datasets_v4_MiniLM-L6",
        "normalize_embeddings": True,
    },
}
response = mq.create_index("my-index", settings_dict=settings)
```
The `model` field selects the model to use. Note that once an index has been created and a model has been selected, the model cannot be changed; a new index would need to be created with the alternative model.

The model is applied to all relevant fields. Field-specific settings, which would allow different models for different fields, are not currently supported but are coming soon (and contributions are always welcome).

Although the best choice is use-case specific, a good starting point is `flax-sentence-embeddings/all_datasets_v4_MiniLM-L6`. It provides a good compromise between speed and relevancy. The model `flax-sentence-embeddings/all_datasets_v4_mpnet-base` provides the best relevancy (in general).
## ONNX

ONNX versions of the above models can also be used. ONNX is an open model format designed to allow interoperability across frameworks. Other benefits include faster inference (model and use-case specific, but roughly 2x) and lower memory usage. The ONNX conversion of the above models happens on the fly. To use one of the above models as an ONNX version, simply replace the text preceding the first '/' with 'onnx'. For example:
- onnx/all-MiniLM-L6-v1
- onnx/all-MiniLM-L6-v2
- onnx/all_datasets_v3_MiniLM-L12
- onnx/all_datasets_v3_MiniLM-L6
- onnx/all_datasets_v4_MiniLM-L12
- onnx/all_datasets_v4_MiniLM-L6
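The renaming rule above can be sketched as a small helper (a hypothetical illustration for clarity, not part of the Marqo client):

```python
def to_onnx_name(model: str) -> str:
    """Replace the text preceding the first '/' with 'onnx'."""
    return "onnx/" + model.split("/", 1)[1]

print(to_onnx_name("sentence-transformers/all-MiniLM-L6-v2"))
# -> onnx/all-MiniLM-L6-v2
```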
The 'mpnet'-based models are not currently supported by the ONNX conversion but will be added soon. The example below shows how to use an ONNX model:

```python
settings = {
    "index_defaults": {
        "treat_urls_and_pointers_as_images": False,
        "model": "onnx/all_datasets_v4_MiniLM-L6",
        "normalize_embeddings": True,
    },
}
response = mq.create_index("my-index", settings_dict=settings)
```
## Images

The models used for tensorizing images come from CLIP. Two implementations are supported: one from OpenAI and an open-source implementation called OpenCLIP. The following models are supported:
### OpenAI
- RN50
- RN101
- RN50x4
- RN50x16
- RN50x64
- ViT-B/32
- ViT-B/16
- ViT-L/14
- ViT-L/14@336px
Although the best choice is use-case specific, a good starting point is `ViT-B/16`. It provides a good compromise between speed and relevancy. The models `ViT-L/14` and `ViT-L/14@336px` provide the best relevancy (in general) but are typically slower.

```python
settings = {
    "index_defaults": {
        "treat_urls_and_pointers_as_images": True,
        "model": "ViT-L/14",
        "normalize_embeddings": True,
    },
}
response = mq.create_index("my-index", settings_dict=settings)
```
### OpenCLIP
- open_clip/RN50/openai
- open_clip/RN50/yfcc15m
- open_clip/RN50/cc12m
- open_clip/RN50-quickgelu/openai
- open_clip/RN50-quickgelu/yfcc15m
- open_clip/RN50-quickgelu/cc12m
- open_clip/RN101/openai
- open_clip/RN101/yfcc15m
- open_clip/RN101-quickgelu/openai
- open_clip/RN101-quickgelu/yfcc15m
- open_clip/RN50x4/openai
- open_clip/RN50x16/openai
- open_clip/RN50x64/openai
- open_clip/ViT-B-32/openai
- open_clip/ViT-B-32/laion400m_e31
- open_clip/ViT-B-32/laion400m_e32
- open_clip/ViT-B-32/laion2b_e16
- open_clip/ViT-B-32/laion2b_s34b_b79k
- open_clip/ViT-B-32-quickgelu/openai
- open_clip/ViT-B-32-quickgelu/laion400m_e31
- open_clip/ViT-B-32-quickgelu/laion400m_e32
- open_clip/ViT-B-16/openai
- open_clip/ViT-B-16/laion400m_e31
- open_clip/ViT-B-16/laion400m_e32
- open_clip/ViT-B-16-plus-240/laion400m_e31
- open_clip/ViT-B-16-plus-240/laion400m_e32
- open_clip/ViT-L-14/openai
- open_clip/ViT-L-14/laion400m_e31
- open_clip/ViT-L-14/laion400m_e32
- open_clip/ViT-L-14/laion2b_s32b_b82k
- open_clip/ViT-L-14-336/openai
- open_clip/ViT-H-14/laion2b_s32b_b79k
- open_clip/ViT-g-14/laion2b_s12b_b42k
Like the OpenAI models, the larger ViT-based models typically perform better. For example, `open_clip/ViT-H-14/laion2b_s32b_b79k` is the best model for relevancy (in general) and surpasses even the best models from OpenAI.

The names of the OpenCLIP models follow the format `implementation source/model name/pretraining dataset`. The detailed configurations of the models can be found here.
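The naming scheme can be unpacked with a small helper (a hypothetical sketch for illustration, not part of the Marqo client):

```python
def parse_open_clip_name(model: str) -> dict:
    """Split an OpenCLIP model name into its three documented parts:
    implementation source / model name / pretraining dataset."""
    source, name, pretrained = model.split("/")
    return {"source": source, "model_name": name, "pretrained": pretrained}

print(parse_open_clip_name("open_clip/ViT-H-14/laion2b_s32b_b79k"))
# -> {'source': 'open_clip', 'model_name': 'ViT-H-14', 'pretrained': 'laion2b_s32b_b79k'}
```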
```python
settings = {
    "index_defaults": {
        "treat_urls_and_pointers_as_images": True,
        "model": "open_clip/ViT-H-14/laion2b_s32b_b79k",
        "normalize_embeddings": True,
    },
}
response = mq.create_index("my-index", settings_dict=settings)
```
### ONNX CLIP

ONNX versions of the CLIP models are also available. Currently, Marqo supports two ONNX models:

- onnx32/openai/ViT-L/14
- onnx16/openai/ViT-L/14
The `onnx32/openai/ViT-L/14` model should give the same results as the PyTorch version (`ViT-L/14`) with faster speed. In our tests it reduced the indexing time from 80 ms to 60 ms per image. We encourage you to use this model if you need to index a large number of images with the best accuracy.

The `onnx16/openai/ViT-L/14` model is the `float16` version of the above model. It provides even faster inference, at 28 ms per image. However, its search accuracy is not as good as the `float32` version. If you care most about indexing speed and are less sensitive to accuracy, this might be your choice.
Similarly, choose these models by setting your index as:

```python
settings = {
    "index_defaults": {
        "treat_urls_and_pointers_as_images": True,
        "model": "onnx32/openai/ViT-L/14",
        "normalize_embeddings": True,
    },
}
response = mq.create_index("my-index", settings_dict=settings)
```
## Generic Models

You can also use models that are not supported by default.

```python
settings = {
    "index_defaults": {
        "treat_urls_and_pointers_as_images": False,
        "model": "unique-model-alias",
        "model_properties": {
            "name": "sentence-transformers/multi-qa-MiniLM-L6-cos-v1",
            "dimensions": 384,
            "tokens": 128,
            "type": "sbert",
        },
        "normalize_embeddings": True,
    },
}
response = mq.create_index("my-generic-model-index", settings_dict=settings)
```
The `model` field is required and acts as an identifying alias for the model specified through `model_properties`. If a default model name is used in the `name` field, `model_properties` will override the default model settings.

Currently, models hosted on the Hugging Face Model Hub are supported. These models need to output embeddings and conform to either the SBERT API or the Hugging Face API. More options for custom models will be added shortly, including inference endpoints.
### Required Keys for `model_properties`

| Name | Type | Description |
|---|---|---|
| `name` | String | Name of the model in the library |
| `dimensions` | Integer | Dimensions of the model |
### Optional Keys for `model_properties`

| Name | Type | Default value | Description |
|---|---|---|---|
| `tokens` | Integer | `128` | Number of tokens |
| `type` | String | `"sbert"` | Type of model loader |
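The keys above can be checked with a small validation helper before creating the index (a hypothetical sketch for illustration, not part of the Marqo client):

```python
# Required and optional keys as documented in the tables above.
REQUIRED_KEYS = {"name": str, "dimensions": int}
OPTIONAL_DEFAULTS = {"tokens": 128, "type": "sbert"}

def validate_model_properties(props: dict) -> dict:
    """Check required keys and fill in documented defaults for optional ones."""
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in props:
            raise ValueError(f"model_properties is missing required key: {key}")
        if not isinstance(props[key], expected_type):
            raise TypeError(f"{key} must be of type {expected_type.__name__}")
    # Merge so that explicitly supplied values override the defaults.
    return {**OPTIONAL_DEFAULTS, **props}

props = validate_model_properties({
    "name": "sentence-transformers/multi-qa-MiniLM-L6-cos-v1",
    "dimensions": 384,
})
```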
## Other media types

At the moment only text and images are supported. Other media types, including custom media types, will be supported soon.