Create Index

Index settings are supplied when the index is created. Any setting you omit falls back to the default value listed in the tables below.

POST /indexes/{index_name}

Create index with (optional) settings. This endpoint accepts the application/json content type.

Index creation and deletion cannot be called concurrently. If you try to create an index while another index is being created or deleted, the request fails with a 409 OperationConflictError.
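
As a rough illustration of the request described above, the following Python sketch builds the POST request with the standard library and retries when the server answers 409. The base URL http://localhost:8882 is borrowed from the curl example later on this page. The helper names build_create_index_request and create_index are hypothetical, and retrying on 409 is an assumption of this sketch rather than documented behaviour.

```python
import json
import time
import urllib.error
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:8882"  # assumed local Marqo endpoint


def build_create_index_request(index_name, settings=None, base_url=BASE_URL):
    """Build (but do not send) a POST /indexes/{index_name} request."""
    body = json.dumps(settings or {}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/indexes/{quote(index_name, safe='')}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_index(index_name, settings=None, retries=3, wait_seconds=2.0):
    """Send the request, retrying while the server answers 409.

    The retry policy is an assumption: the documentation only states that
    a concurrent creation or deletion makes the request fail with 409.
    """
    for attempt in range(retries):
        request = build_create_index_request(index_name, settings)
        try:
            with urllib.request.urlopen(request) as response:
                return json.loads(response.read().decode("utf-8"))
        except urllib.error.HTTPError as error:
            # Re-raise anything that is not a conflict, or the last conflict.
            if error.code != 409 or attempt == retries - 1:
                raise
            time.sleep(wait_seconds)
```

Calling create_index("my-first-index", {"type": "unstructured"}) would send the request to a running server. build_create_index_request can be inspected on its own without any network access.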

Path parameters

Name Type Description
index_name String Name of the index

Body Parameters

The settings for the index are represented as a nested JSON object that contains the default settings for the index. The parameters are as follows:

Name Type Default value Description
treatUrlsAndPointersAsImages Boolean "" Fetch images from pointers
model String hf/e5-base-v2 The model to use to vectorise doc content in add_documents() calls for the index
modelProperties Dictionary "" The model properties object corresponding to model (for custom models)
normalizeEmbeddings Boolean true Normalize the embeddings to have unit length
textPreprocessing Dictionary "" The text preprocessing object
imagePreprocessing Dictionary "" The image preprocessing object
annParameters Dictionary "" The ANN algorithm parameter object
type String unstructured Type of the index
vectorNumericType String float Numeric type for vector encoding
filterStringMaxLength Int 20 Specifies the maximum character length allowed for strings used in filtering queries within unstructured indexes. This means that any string field you intend to use as a filter in these indexes should not exceed 20 characters in length.
textChunkPrefix String "" or model default The prefix added to indexed text document chunks when embedding.
textQueryPrefix String "" or model default The prefix added to text queries when embedding.
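
To make the shape of the request body concrete, here is a minimal Python sketch that merges caller overrides into a subset of the defaults from the table above and rejects keys the table does not list. The helper build_settings is hypothetical, and client-side key checking is an assumption of this sketch; the server performs its own validation.

```python
import json

# A subset of the top-level defaults from the table above. Entries whose
# default is listed as "" are left out so the server applies its own default.
DEFAULT_SETTINGS = {
    "type": "unstructured",
    "model": "hf/e5-base-v2",
    "vectorNumericType": "float",
    "filterStringMaxLength": 20,
}

# Other top-level keys named in the table that callers may set explicitly.
OPTIONAL_KEYS = {
    "treatUrlsAndPointersAsImages",
    "modelProperties",
    "textPreprocessing",
    "imagePreprocessing",
    "annParameters",
    "textChunkPrefix",
    "textQueryPrefix",
}


def build_settings(**overrides):
    """Return a settings dict: defaults first, then caller overrides."""
    unknown = set(overrides) - set(DEFAULT_SETTINGS) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"Unknown index settings: {sorted(unknown)}")
    return {**DEFAULT_SETTINGS, **overrides}


if __name__ == "__main__":
    settings = build_settings(
        model="open_clip/ViT-B-32/laion2b_s34b_b79k",
        treatUrlsAndPointersAsImages=True,
    )
    # The serialised dict is what would be sent as the JSON request body.
    print(json.dumps(settings, indent=2))
```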

Text Preprocessing Object

The textPreprocessing object contains the specifics of how you want the index to preprocess text. The parameters are as follows:

Name Type Default value Description
splitLength Integer 2 The length of the chunks after splitting by splitMethod
splitOverlap Integer 0 The length of overlap between adjacent chunks
splitMethod String sentence The method by which text is chunked (character, word, sentence, or passage)
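
The interaction between chunk length and overlap is easier to see with a small example. The sketch below chunks text by word, using a chunk length of split_length words and an overlap of split_overlap words between neighbouring chunks. It is an illustration of the idea only: the function chunk_words is hypothetical, and Marqo's own splitter may handle sentences, passages, and edge cases differently.

```python
def chunk_words(text, split_length=2, split_overlap=0):
    """Split text into chunks of split_length words.

    Adjacent chunks share split_overlap words. This mirrors the meaning of
    the settings in the table above, not Marqo's actual implementation.
    """
    if split_length < 1:
        raise ValueError("split_length must be at least 1")
    if not 0 <= split_overlap < split_length:
        raise ValueError("split_overlap must be in [0, split_length)")

    words = text.split()
    step = split_length - split_overlap  # how far each chunk start advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + split_length]))
        # Stop once a chunk has reached the end of the text, so the tail
        # is not emitted again as a shorter duplicate chunk.
        if start + split_length >= len(words):
            break
    return chunks


if __name__ == "__main__":
    print(chunk_words("a b c d e", split_length=2, split_overlap=0))
    print(chunk_words("a b c d e", split_length=3, split_overlap=1))
```

With no overlap, each word lands in exactly one chunk. With an overlap of one, the last word of a chunk is repeated as the first word of the next.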

Image Preprocessing Object

The imagePreprocessing object contains the specifics of how you want the index to preprocess images. The parameters are as follows:

Name Type Default value Description
patchMethod String null The method by which images are chunked (simple or frcnn)

ANN Algorithm Parameter object

The annParameters object contains hyperparameters for the approximate nearest neighbour algorithm used for tensor storage within Marqo. The parameters are as follows:

Name Type Default value Description
spaceType String prenormalized-angular The function used to measure the distance between two points in ANN (angular, euclidean, dotproduct, geodegrees, hamming, or prenormalized-angular).
parameters Dict "" The hyperparameters for the ANN method (which is always hnsw for Marqo).

HNSW Method Parameters Object

parameters can have the following values:

Name Type Default value Description
efConstruction int 512 The size of the dynamic list used during k-NN graph creation. Higher values lead to a more accurate graph but slower indexing speed. It is recommended to keep this between 2 and 800 (maximum is 4096).
m int 16 The number of bidirectional links that the plugin creates for each new element. Increasing and decreasing this value can have a large impact on memory consumption. Keep this value between 2 and 100.
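
The nesting of the ANN object inside the index settings can be sketched as follows. The key names spaceType and efConstruction are the names this sketch uses for the distance function and the dynamic list size, so treat them as assumptions if your Marqo version documents them differently. The helper build_ann_parameters is hypothetical, and enforcing the stated ranges on the client side is a choice of this sketch.

```python
import json

SPACE_TYPES = {
    "angular",
    "euclidean",
    "dotproduct",
    "geodegrees",
    "hamming",
    "prenormalized-angular",
}


def build_ann_parameters(space_type="prenormalized-angular",
                         ef_construction=512, m=16):
    """Build an annParameters object using the defaults from the tables above."""
    if space_type not in SPACE_TYPES:
        raise ValueError(f"Unsupported space type: {space_type!r}")
    # 4096 is the stated maximum; values up to 800 are the recommended range.
    if not 2 <= ef_construction <= 4096:
        raise ValueError("ef_construction must be between 2 and 4096")
    if not 2 <= m <= 100:
        raise ValueError("m must be between 2 and 100")
    return {
        "spaceType": space_type,  # assumed key name
        "parameters": {
            "efConstruction": ef_construction,  # assumed key name
            "m": m,
        },
    }


if __name__ == "__main__":
    settings = {
        "type": "unstructured",
        "annParameters": build_ann_parameters(ef_construction=256),
    }
    print(json.dumps(settings, indent=2))
```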

Model Properties Object

This flexible object, passed as modelProperties, sets up models that aren't available in Marqo by default (models available by default are listed here). The structure of this object varies depending on the model.

For Open CLIP models, see here for modelProperties format and example usage.

For Generic SBERT models, see here for modelProperties format and example usage.

Prefixes in Index Settings

Parameters: textChunkPrefix, textQueryPrefix

Expected value: A string.

Default value: ""

These fields override the model's default prefixes for text documents and queries. URLs pointing to images are not affected by these prefixes. If these fields are left undefined, Marqo will use the model's default prefixes. Currently, only the e5 series models have default prefixes defined.

Indexes built on Marqo 2.5 and below will not have prefixes added to any new documents, embeddings, or queries when read with Marqo 2.6 and above, even if the index’s model has default prefixes set.

Marqo currently adds prefixes by default for e5 models because these models are trained on prefixed data, so adding the prefixes to text chunks before embedding improves the quality of the embeddings. For more information, refer to the model card here.
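
The override rule described in this section can be sketched in a few lines of Python. The lookup table MODEL_DEFAULT_PREFIXES and the function resolve_prefix are hypothetical. The "passage: " and "query: " values follow the e5 convention and are assumed here rather than read from Marqo. Treating an explicitly set empty string as an override is also an assumption of this sketch.

```python
# Assumed per-model defaults. Only an e5 model is listed, matching the
# statement above that only the e5 series defines default prefixes.
MODEL_DEFAULT_PREFIXES = {
    "hf/e5-base-v2": {
        "textChunkPrefix": "passage: ",
        "textQueryPrefix": "query: ",
    },
}


def resolve_prefix(settings, key):
    """Pick the prefix for key ("textChunkPrefix" or "textQueryPrefix").

    An explicitly set value wins; otherwise the model default is used;
    otherwise the prefix is empty.
    """
    explicit = settings.get(key)
    if explicit is not None:
        return explicit
    model_defaults = MODEL_DEFAULT_PREFIXES.get(settings.get("model"), {})
    return model_defaults.get(key, "")


def embed_input(text, settings, key):
    """Return the string that would be embedded: prefix followed by text."""
    return resolve_prefix(settings, key) + text


if __name__ == "__main__":
    e5 = {"model": "hf/e5-base-v2"}
    print(embed_input("what is marqo?", e5, "textQueryPrefix"))
    custom = {"model": "hf/e5-base-v2", "textQueryPrefix": "override query: "}
    print(embed_input("what is marqo?", custom, "textQueryPrefix"))
```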

Example: Setting text chunk and query prefixes during index creation

curl -X POST 'http://localhost:8882/indexes/my-first-index' \
-H "Content-Type: application/json" \
-d '{
    "textChunkPrefix": "passage: ",
    "textQueryPrefix": "query: ",
    "model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
    "type": "unstructured"
}'

The prefixes can also be overridden with the keyword arguments text_query_prefix and text_chunk_prefix, for example:

text_query_prefix="override query: "
text_chunk_prefix="override passage: "