Create Index
Index settings can be set when the index is created. Any setting you do not specify takes the default value listed in the tables below.
POST /indexes/{index_name}
Create an index with (optional) settings.
This endpoint accepts the `application/json` content type.
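As a rough sketch of the request itself, the helper below builds a `POST /indexes/{index_name}` request with Python's standard library. It assumes a Marqo instance at `http://localhost:8882` (the address used in the curl example later in this section). `build_create_index_request` is a hypothetical name, the request is only built here rather than sent, and omitting the body when no settings are given is an assumption about how "optional" settings are passed.

```python
import json
import urllib.parse
import urllib.request


def build_create_index_request(index_name, settings=None, base_url="http://localhost:8882"):
    """Build (but do not send) a POST /indexes/{index_name} request.

    Hypothetical helper: the base URL and the choice to omit the body when
    no settings are given are assumptions for illustration.
    """
    data = json.dumps(settings).encode("utf-8") if settings is not None else None
    return urllib.request.Request(
        url=f"{base_url}/indexes/{urllib.parse.quote(index_name)}",
        data=data,
        headers={"Content-Type": "application/json"},  # the endpoint expects JSON
        method="POST",
    )


req = build_create_index_request("my-first-index", {"number_of_shards": 3})
print(req.get_method(), req.full_url)  # POST http://localhost:8882/indexes/my-first-index

# Sending it requires a running Marqo instance:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```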
Path Parameters
Name | Type | Description |
---|---|---|
index_name | String | Name of the index |
Body Parameters
The request body contains the settings for the index, represented as a nested JSON object.
Name | Type | Default value | Description |
---|---|---|---|
index_defaults | Dictionary | See below | The index defaults object |
number_of_shards | Integer | 3 | The number of shards for the index |
number_of_replicas | Integer | 0 | The number of replicas for the index |
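The sketch below shows how a partial settings dictionary relates to the full nested structure: any key you leave out is filled from the default values listed in the tables and sample object on this page. `DEFAULT_SETTINGS` and `with_defaults` are hypothetical client-side names, and this recursive merge is an illustration of the nesting, not Marqo's own merging logic.

```python
import copy

# Default values as listed on this page (hypothetical constant for illustration).
DEFAULT_SETTINGS = {
    "index_defaults": {
        "treat_urls_and_pointers_as_images": False,
        "model": "hf/all_datasets_v4_MiniLM-L6",
        "normalize_embeddings": True,
        "text_preprocessing": {
            "split_length": 2,
            "split_overlap": 0,
            "split_method": "sentence",
        },
        "image_preprocessing": {"patch_method": None},
        "ann_parameters": {
            "space_type": "cosinesimil",
            "parameters": {"ef_construction": 128, "m": 16},
        },
    },
    "number_of_shards": 3,
    "number_of_replicas": 0,
}


def with_defaults(overrides, defaults=DEFAULT_SETTINGS):
    """Recursively fill any key missing from `overrides` with its default."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are nested objects: merge them key by key.
            merged[key] = with_defaults(value, merged[key])
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Only override the chunk length; everything else keeps its default.
settings = with_defaults({"index_defaults": {"text_preprocessing": {"split_length": 3}}})
print(settings["index_defaults"]["text_preprocessing"])
# {'split_length': 3, 'split_overlap': 0, 'split_method': 'sentence'}
```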
Index Defaults Object
The `index_defaults` object contains the default settings for the index. Its parameters are as follows:
Name | Type | Default value | Description |
---|---|---|---|
treat_urls_and_pointers_as_images | Boolean | false | Treat URLs and pointers as images, fetching the images they point to |
model | String | hf/all_datasets_v4_MiniLM-L6 | The model to use for the index |
model_properties | Dictionary | - | The model properties object (for custom models) |
normalize_embeddings | Boolean | true | Normalize the embeddings to have unit length |
text_preprocessing | Dictionary | See below | The text preprocessing object |
image_preprocessing | Dictionary | See below | The image preprocessing object |
ann_parameters | Dictionary | See below | The ANN algorithm parameters object |
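With `normalize_embeddings` set to true, each embedding is scaled to unit length. The arithmetic is sketched below in plain Python; `normalize` is a hypothetical helper, and Marqo performs this step inside its own pipeline. One useful consequence: for unit-length vectors, cosine similarity reduces to a plain dot product.

```python
import math


def normalize(vector):
    """Scale a vector to unit (L2) length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


v = normalize([3.0, 4.0])
print(v)  # [0.6, 0.8]
```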
Text Preprocessing Object
The `text_preprocessing` object specifies how the index preprocesses text. Its parameters are as follows:
Name | Type | Default value | Description |
---|---|---|---|
split_length | Integer | 2 | The length of each chunk, measured in units of `split_method` (for example, 2 sentences) |
split_overlap | Integer | 0 | The number of units shared by adjacent chunks |
split_method | String | sentence | The method by which text is chunked (`character`, `word`, `sentence`, or `passage`) |
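To make `split_length` and `split_overlap` concrete, the sketch below groups a list of already-split units (for example the words of a text, as with `split_method` set to `word`) into chunks. `chunk_units` is a hypothetical helper and Marqo's exact boundary handling may differ; the point is only how chunk length and overlap interact.

```python
def chunk_units(units, split_length=2, split_overlap=0):
    """Group a list of units (words, sentences, ...) into overlapping chunks.

    Each chunk holds up to `split_length` units; consecutive chunks share
    `split_overlap` units. Illustrative approximation only.
    """
    if split_length < 1 or not 0 <= split_overlap < split_length:
        raise ValueError("need split_length >= 1 and 0 <= split_overlap < split_length")
    step = split_length - split_overlap
    chunks = []
    for start in range(0, len(units), step):
        chunks.append(units[start:start + split_length])
        if start + split_length >= len(units):
            break  # the last chunk already reaches the end of the text
    return chunks


words = "the quick brown fox jumps over".split()
print(chunk_units(words, split_length=3, split_overlap=1))
# [['the', 'quick', 'brown'], ['brown', 'fox', 'jumps'], ['jumps', 'over']]
```

With the defaults (`split_length` 2, `split_overlap` 0), three sentences become one chunk of two sentences followed by a chunk holding the remaining one.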
Image Preprocessing Object
The `image_preprocessing` object specifies how the index preprocesses images. Its parameters are as follows:
Name | Type | Default value | Description |
---|---|---|---|
patch_method | String | null | The method by which images are chunked (`simple` or `frcnn`); `null` means images are not chunked |
ANN Algorithm Parameters Object
The `ann_parameters` object contains hyperparameters for the approximate nearest neighbour (ANN) algorithm used for tensor storage within Marqo. Its parameters are as follows:
Name | Type | Default value | Description |
---|---|---|---|
space_type | String | cosinesimil | The function used to measure the distance between two points in ANN (`l1`, `l2`, `linf`, or `cosinesimil`) |
parameters | Dictionary | See below | The hyperparameters for the ANN method (which is always `hnsw` for Marqo) |
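For reference, the four `space_type` values correspond to these standard measures between two vectors. The functions below are plain-Python textbook definitions, not Marqo's code; the underlying ANN engine may, for example, work with squared L2 or convert these raw values into relevance scores. Note that cosine similarity grows as vectors get closer, whereas the other three are distances that shrink.

```python
import math


# Textbook definitions of the four space types (illustrative only).
def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def linf(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


a, b = [1.0, 0.0], [0.0, 1.0]
print(l1(a, b), l2(a, b), linf(a, b), cosine_similarity(a, b))
# 2.0 1.4142135623730951 1.0 0.0
```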
HNSW Method Parameters Object
The `parameters` object can contain the following keys:
Name | Type | Default value | Description |
---|---|---|---|
ef_construction | Integer | 128 | The size of the dynamic list used during k-NN graph creation. Higher values produce a more accurate graph but slow down indexing. Keep this between 2 and 800 (the maximum is 4096) |
m | Integer | 16 | The number of bidirectional links created for each new element. Changing this value can significantly affect memory consumption. Keep this between 2 and 100 |
Model Properties Object
`model_properties` is a flexible object used to set up models that aren't available in Marqo by default (models available by default are listed here). The structure of `model_properties` varies depending on the model.
For Open CLIP models, see here for the `model_properties` format and example usage.
For generic SBERT models, see here for the `model_properties` format and example usage.
Below is a sample index settings JSON object. When using the Python client, pass this dictionary as the `settings_dict` parameter of the `create_index` method.
{
"index_defaults": {
"treat_urls_and_pointers_as_images": false,
"model": "hf/all_datasets_v4_MiniLM-L6",
"normalize_embeddings": true,
"text_preprocessing": {
"split_length": 2,
"split_overlap": 0,
"split_method": "sentence"
},
"image_preprocessing": {
"patch_method": null
},
"ann_parameters" : {
"space_type": "cosinesimil",
"parameters": {
"ef_construction": 128,
"m": 16
}
}
},
"number_of_shards": 3,
"number_of_replicas": 0
}
Example
curl -XPOST 'http://localhost:8882/indexes/my-first-index' -H 'Content-type:application/json' -d '
{
"index_defaults": {
"treat_urls_and_pointers_as_images": false,
"model": "hf/all_datasets_v4_MiniLM-L6",
"normalize_embeddings": true,
"text_preprocessing": {
"split_length": 2,
"split_overlap": 0,
"split_method": "sentence"
},
"image_preprocessing": {
"patch_method": null
},
"ann_parameters" : {
"space_type": "cosinesimil",
"parameters": {
"ef_construction": 128,
"m": 16
}
}
},
"number_of_shards": 3,
"number_of_replicas": 0
}'
import marqo

mq = marqo.Client(url="http://localhost:8882")

index_settings = {
"index_defaults": {
"treat_urls_and_pointers_as_images": False,
"model": "hf/all_datasets_v4_MiniLM-L6",
"normalize_embeddings": True,
"text_preprocessing": {
"split_length": 2,
"split_overlap": 0,
"split_method": "sentence"
},
"image_preprocessing": {
"patch_method": None
},
"ann_parameters" : {
"space_type": "cosinesimil",
"parameters": {
"ef_construction": 128,
"m": 16
}
}
},
"number_of_shards": 3,
"number_of_replicas": 0
}
mq.create_index("my-first-index", settings_dict=index_settings)
Response: 200 OK
{"acknowledged":true, "shards_acknowledged":true, "index":"my-first-index"}
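If you call the endpoint directly rather than through the Python client, you may want to confirm that the response reports success. The sketch below parses the response body shown above; `is_acknowledged` is a hypothetical helper, and treating both `acknowledged` and `shards_acknowledged` as required is an assumption rather than documented behavior.

```python
import json


def is_acknowledged(body):
    """True if a create-index response acknowledges both the index and its shards."""
    response = json.loads(body)
    return bool(response.get("acknowledged")) and bool(response.get("shards_acknowledged"))


body = '{"acknowledged":true, "shards_acknowledged":true, "index":"my-first-index"}'
print(is_acknowledged(body))  # True
```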
No Model
You may want to use Marqo to store and search vectors that you have already generated. In this case, you can create your index with no model. To do this, set `model` to the string `"no_model"` and define `model_properties` with only the `dimensions` key, set to the length of the vectors you intend to use for this index.
Note that a `no_model` index cannot vectorise raw text documents or search queries. To add documents, use the `custom_vector` object field; to search, use the `context` parameter without defining `q`.
Example
import marqo

mq = marqo.Client(url="http://localhost:8882")

index_settings = {
"index_defaults": {
"model": "no_model",
"model_properties": {
"dimensions": 123 # Put your custom vector size here!
}
},
}
mq.create_index("my-no-model-index", settings_dict=index_settings)
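Because a `no_model` index cannot vectorise anything itself, every vector you add must match its `dimensions` value. The sketch below builds the settings shown above for a given size and checks a vector's length on the client side before it is sent. Both helpers are hypothetical, and the exact structure of the `custom_vector` field is not reproduced here.

```python
def make_no_model_settings(dimensions):
    """Build settings for an index that only stores pre-computed vectors."""
    if isinstance(dimensions, bool) or not isinstance(dimensions, int) or dimensions < 1:
        raise ValueError("dimensions must be a positive integer")
    return {
        "index_defaults": {
            "model": "no_model",
            "model_properties": {"dimensions": dimensions},
        }
    }


def check_vector(vector, settings):
    """Raise if a custom vector's length differs from the index's dimensions."""
    expected = settings["index_defaults"]["model_properties"]["dimensions"]
    if len(vector) != expected:
        raise ValueError(f"vector has {len(vector)} dimensions, index expects {expected}")
    return vector


settings = make_no_model_settings(3)
check_vector([0.1, 0.2, 0.3], settings)  # passes; a 2-element vector would raise ValueError
```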