Create Index

Settings can be supplied when the index is created. Any setting you omit takes its default value.

To create your index:

POST /indexes/{index_name}

Creates an index with optional settings. This endpoint accepts the application/json content type.

Index creation and deletion cannot run concurrently. If you try to create an index while another index is being created or deleted, the request fails with a 409 OperationConflictError.
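Because a concurrent creation or deletion results in a 409, a client may want to wait briefly and retry. The following is a minimal sketch under stated assumptions, not part of the Marqo client: `create_with_retry` is a hypothetical helper, and `send_request` stands for any callable that issues the POST request and returns the HTTP status code.

```python
import time
from typing import Callable


def create_with_retry(
    send_request: Callable[[], int],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send_request until it returns a status other than 409.

    send_request is a hypothetical callable that issues
    POST /indexes/{index_name} and returns the HTTP status code.
    """
    status = send_request()
    for attempt in range(1, max_attempts):
        if status != 409:
            break
        # Another index creation or deletion is in progress:
        # wait with exponential backoff, then try again.
        sleep(base_delay_s * 2 ** (attempt - 1))
        status = send_request()
    return status
```

The `sleep` parameter is injectable so the backoff can be tested without waiting. The last status is returned even if it is still 409, so the caller decides how to handle a persistent conflict.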

Marqo Cloud creates dedicated infrastructure for each index. With the create index endpoint, you can specify the storage type (storageClass) and the inference type (inferenceType) for the index. The number of storage instances is set by numberOfShards, the number of replicas by numberOfReplicas, and the number of Marqo inference nodes by numberOfInferences. These parameters are supported only on Marqo Cloud, not Marqo Open Source.

Example


This is an example of creating an index with Marqo Open Source:

cURL:

curl -X POST 'http://localhost:8882/indexes/my-first-index' \
-H "Content-Type: application/json" \
-d '{
    "model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
    "type": "unstructured"
}'

Python:

import marqo
mq = marqo.Client("http://localhost:8882", api_key=None)
index_settings = {
    "treatUrlsAndPointersAsImages": False,
    "model": "hf/e5-base-v2",
}
mq.create_index("my-first-index", settings_dict=index_settings)

Response: `200 OK`

{"acknowledged":true, "index":"my-first-index"}

This is an example of creating an index with Marqo Cloud:

cURL:

curl -X POST 'https://api.marqo.ai/api/v2/indexes/my-first-index' \
-H 'x-api-key: XXXXXXXXXXXXXXX' \
-H 'Content-type:application/json' \
-d '
{
"treatUrlsAndPointersAsImages": false,
"model": "hf/e5-base-v2",
"numberOfShards": 1,
"numberOfReplicas": 0,
"inferenceType": "marqo.CPU.large",
"storageClass": "marqo.basic",
"numberOfInferences": 1
}'

Python:

import marqo
mq = marqo.Client("https://api.marqo.ai", api_key="XXXXXXXXXXXXXXX")
index_settings = {
    "treatUrlsAndPointersAsImages": False,
    "model": "hf/e5-base-v2",
    "numberOfShards": 1,
    "numberOfReplicas": 0,
    "inferenceType": "marqo.CPU.large",
    "storageClass": "marqo.basic",
    "numberOfInferences": 1
}
mq.create_index("my-first-index", settings_dict=index_settings)

Response: `200 OK`

{"acknowledged":true, "shards_acknowledged":true, "index":"my-first-index"}

Path parameters

Name	Type	Description
`index_name`	String	Name of the index

Body Parameters

The settings for the index are represented as a nested JSON object that contains the default settings for the index. The parameters are as follows:

Name	Type	Default value	Description
`treatUrlsAndPointersAsImages`	Boolean	`""`	Fetch images from pointers
`treatUrlsAndPointersAsMedia`*	Boolean	`""`	Fetch images, videos, and audio from pointers
`model`	String	`hf/e5-base-v2`	The model to use to vectorise doc content in `add_documents()` calls for the index
`modelProperties`	Dictionary	`""`	The model properties object corresponding to `model` (for custom models)
`normalizeEmbeddings`	Boolean	`true`	Normalize the embeddings to have unit length
`textPreprocessing`	Dictionary	`""`	The text preprocessing object
`imagePreprocessing`	Dictionary	`""`	The image preprocessing object
`videoPreprocessing`	Dictionary	`""`	The video preprocessing object
`audioPreprocessing`	Dictionary	`""`	The audio preprocessing object
`annParameters`	Dictionary	`""`	The ANN algorithm parameter object
`type`	String	`unstructured`	Type of the index
`vectorNumericType`	String	`float`	Numeric type for vector encoding
`filterStringMaxLength`	Int	`50`	Specifies the maximum character length allowed for strings used in filtering queries within unstructured indexes. This means that any string field you intend to use as a filter in these indexes should not exceed 50 characters in length.
`textChunkPrefix`	String	`""` or model default	The prefix added to indexed text document chunks when embedding.
`textQueryPrefix`	String	`""` or model default	The prefix added to text queries when embedding.

* treatUrlsAndPointersAsMedia is a parameter introduced in Marqo 2.12 to support the video and audio modalities. It interacts with treatUrlsAndPointersAsImages as follows:

Both False:
- All content is processed as text only
treatUrlsAndPointersAsImages True, treatUrlsAndPointersAsMedia False:
- Processes URLs and pointers as images
- Does not process other media types (video, audio)
treatUrlsAndPointersAsImages False, treatUrlsAndPointersAsMedia True:
- Invalid state, because the two settings conflict
Both True:
- Processes URLs and pointers as various media types (images, videos, audio)
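
The four combinations can be written as a small decision function. This is only an illustrative sketch: `media_mode` is a hypothetical helper, not a Marqo API, and raising `ValueError` for the invalid combination is an assumption about how a client might choose to reject it.

```python
def media_mode(treat_as_images: bool, treat_as_media: bool) -> str:
    """Summarise which content types a flag combination enables.

    treat_as_images mirrors treatUrlsAndPointersAsImages and
    treat_as_media mirrors treatUrlsAndPointersAsMedia.
    """
    if treat_as_media and not treat_as_images:
        # The combination described above as an invalid state.
        raise ValueError(
            "treatUrlsAndPointersAsMedia=True conflicts with "
            "treatUrlsAndPointersAsImages=False"
        )
    if treat_as_media:
        return "images, videos, audio"
    if treat_as_images:
        return "images"
    return "text only"
```

Checking the combination on the client side before calling the endpoint avoids sending settings that the text above describes as conflicting.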

Note: these body parameters are used in both Marqo Open Source and Marqo Cloud. Marqo Cloud also has additional body parameters. Let's take a look at those now.

Additional Marqo Cloud Body Parameters

The following parameters configure the dedicated infrastructure that Marqo Cloud creates for each index. They are supported only on Marqo Cloud, not Marqo Open Source.

Name	Type	Default value	Description	Open Source	Cloud
`inferenceType`	String	`marqo.CPU.small`	Type of inference for the index. Options are "marqo.CPU.small" (deprecated), "marqo.CPU.large", "marqo.GPU".	❌	✅
`storageClass`	String	`marqo.basic`	Type of storage for the index. Options are "marqo.basic", "marqo.balanced", "marqo.performance".	❌	✅
`numberOfShards`	Integer	`1`	The number of shards for the index.	❌	✅
`numberOfReplicas`	Integer	`0`	The number of replicas for the index.	❌	✅
`numberOfInferences`	Integer	`1`	The number of inference nodes for the index.	❌	✅

Text Preprocessing Object

The textPreprocessing object contains the specifics of how you want the index to preprocess text. The parameters are as follows:

Name	Type	Default value	Description
`splitLength`	Integer	`2`	The length of the chunks after splitting by `splitMethod`
`splitOverlap`	Integer	`0`	The length of overlap between adjacent chunks
`splitMethod`	String	`sentence`	The method by which text is chunked (`character`, `word`, `sentence`, or `passage`)
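
To illustrate how splitLength and splitOverlap interact, here is a minimal sketch that assumes the `word` split method and plain whitespace tokenisation. `chunk_words` is a hypothetical helper, not Marqo's implementation, and Marqo's actual splitting may differ in detail.

```python
from typing import List


def chunk_words(text: str, split_length: int = 2, split_overlap: int = 0) -> List[str]:
    """Split text into chunks of split_length words that overlap by split_overlap words."""
    if split_length <= 0 or not 0 <= split_overlap < split_length:
        raise ValueError("need split_length > 0 and 0 <= split_overlap < split_length")
    words = text.split()
    # Each new chunk starts this many words after the previous one.
    step = split_length - split_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + split_length]))
        if start + split_length >= len(words):
            break  # the remaining words are already covered by this chunk
    return chunks
```

For example, under these assumptions `chunk_words("a b c d e", split_length=3, split_overlap=1)` yields two chunks that share the word `c`.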

Image Preprocessing Object

The imagePreprocessing object contains the specifics of how you want the index to preprocess images. The parameters are as follows:

Name	Type	Default value	Description
`patchMethod`	String	`null`	The method by which images are chunked (`simple` or `frcnn`)

Video Preprocessing Object

The videoPreprocessing object contains the specifics of how you want the index to preprocess videos. The last chunk in the video file will have a start time of the total length of the video file minus the split length.

The parameters are as follows:

Name	Type	Default value	Description
`splitLength`	Integer	`20`	The length of the video chunks in seconds
`splitOverlap`	Integer	`3`	The length of overlap in seconds between adjacent chunks
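
The rule for the last chunk can be sketched as follows. Only the final start time (total length minus split length) comes from the description above. The step between earlier chunks (splitLength minus splitOverlap) and the handling of files shorter than one chunk are assumptions made for illustration, and `chunk_start_times` is a hypothetical helper rather than Marqo's code.

```python
from typing import List


def chunk_start_times(
    duration_s: float, split_length: float = 20, split_overlap: float = 3
) -> List[float]:
    """Return chunk start times in seconds for a media file of duration_s seconds."""
    if split_length <= 0 or not 0 <= split_overlap < split_length:
        raise ValueError("need split_length > 0 and 0 <= split_overlap < split_length")
    if duration_s <= split_length:
        return [0.0]  # assumption: a short file becomes a single chunk
    step = split_length - split_overlap  # assumption: stride between chunk starts
    # Documented rule: the last chunk starts at total length minus split length.
    last_start = float(duration_s - split_length)
    starts = []
    t = 0.0
    while t < last_start:
        starts.append(t)
        t += step
    starts.append(last_start)
    return starts
```

With the defaults and a 60-second file, this sketch places the final chunk at 40 seconds so that it ends exactly at the end of the file.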

Audio Preprocessing Object

The audioPreprocessing object contains the specifics of how you want the index to preprocess audio. The last chunk in the audio file will have a start time of the total length of the audio file minus the split length.

The parameters are as follows:

Name	Type	Default value	Description
`splitLength`	Integer	`20`	The length of the audio chunks in seconds
`splitOverlap`	Integer	`3`	The length of overlap in seconds between adjacent chunks

ANN Algorithm Parameter object

The annParameters object contains hyperparameters for the approximate nearest neighbour algorithm used for tensor storage within Marqo. The parameters are as follows:

Name	Type	Default value	Description
`spaceType`	String	`prenormalized-angular`	The function used to measure the distance between two points in ANN (`angular`, `euclidean`, `dotproduct`, `geodegrees`, `hamming`, or `prenormalized-angular`).
`parameters`	Dict	`""`	The hyperparameters for the ANN method (which is always `hnsw` for Marqo).

HNSW Method Parameters Object

parameters can have the following values:

Name	Type	Default value	Description
`efConstruction`	int	`512`	The size of the dynamic list used during k-NN graph creation. Higher values lead to a more accurate graph but slower indexing speed. It is recommended to keep this between 2 and 800 (maximum is 4096).
`m`	int	`16`	The number of bidirectional links that the plugin creates for each new element. Increasing and decreasing this value can have a large impact on memory consumption. Keep this value between 2 and 100.
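
The limits in the table above can be checked before sending a request. This is a hypothetical client-side helper (`check_hnsw_parameters` is not part of Marqo). It assumes that exceeding the stated maximum of 4096 should be treated as an error, while values outside the recommended ranges produce only a warning.

```python
from typing import List


def check_hnsw_parameters(ef_construction: int = 512, m: int = 16) -> List[str]:
    """Return warnings for values outside the ranges given in the table above."""
    if ef_construction > 4096:
        # Assumption: treat the documented maximum as a hard limit.
        raise ValueError("efConstruction must not exceed 4096")
    warnings = []
    if not 2 <= ef_construction <= 800:
        warnings.append("efConstruction is outside the recommended range 2 to 800")
    if not 2 <= m <= 100:
        warnings.append("m is outside the recommended range 2 to 100")
    return warnings
```

An empty list means both values fall inside the recommended ranges, which is the case for the defaults.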

Model Properties Object

This flexible object, used by modelProperties, sets up models that aren't available in Marqo by default (models available by default are listed here). The structure of this object varies depending on the model.

For Open CLIP models, see here for modelProperties format and example usage.

For Generic SBERT models, see here for modelProperties format and example usage.

Below is a sample index settings JSON object. When using the Python client, pass this dictionary as the settings_dict parameter for the create_index method.

{
  "type": "unstructured",
  "vectorNumericType": "float",
  "treatUrlsAndPointersAsImages": true,
  "model": "open_clip/ViT-L-14/laion2b_s32b_b82k",
  "normalizeEmbeddings": true,
  "textPreprocessing": {
    "splitLength": 2,
    "splitOverlap": 0,
    "splitMethod": "sentence"
  },
  "imagePreprocessing": {
    "patchMethod": null
  },
  "annParameters": {
    "spaceType": "prenormalized-angular",
    "parameters": {
      "efConstruction": 512,
      "m": 16
    }
  },
  "filterStringMaxLength": 20
}

Prefixes in Index Settings

Parameters: textChunkPrefix, textQueryPrefix

Expected value: A string.

Default value: ""

These fields override the model's default prefixes for text documents and queries. URLs pointing to images are not affected by these prefixes. If these fields are left undefined, Marqo will use the model's default prefixes. Currently, only the e5 series models have default prefixes defined.

Indexes built on Marqo 2.5 and below will not have prefixes added to any new documents, embeddings, or queries when read with Marqo 2.6 and above, even if the index’s model has default prefixes set.

Currently, Marqo adds prefixes by default for e5 models because these models are trained on data with prefixes, so adding them to text chunks before embedding improves the quality of the embeddings. For more information, refer to the model card here.
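
The override behaviour described above can be sketched as a simple lookup. Everything here is illustrative: `MODEL_DEFAULT_PREFIXES`, `resolve_prefix`, and `apply_prefix` are hypothetical names, the table holds a single assumed entry for `hf/e5-base-v2`, and treating an explicitly set empty string as an override is an assumption rather than documented behaviour.

```python
from typing import Dict, Optional

# Hypothetical table of model defaults, for illustration only.
MODEL_DEFAULT_PREFIXES: Dict[str, Dict[str, str]] = {
    "hf/e5-base-v2": {"chunk": "passage: ", "query": "query: "},
}


def resolve_prefix(override: Optional[str], model: str, kind: str) -> str:
    """Pick the prefix for kind ('chunk' or 'query').

    An index-level override (textChunkPrefix or textQueryPrefix) wins;
    otherwise the model default is used, falling back to an empty string.
    """
    if override is not None:
        return override
    return MODEL_DEFAULT_PREFIXES.get(model, {}).get(kind, "")


def apply_prefix(text: str, prefix: str) -> str:
    """Prepend the prefix to a text chunk or query before embedding."""
    return prefix + text
```

Under these assumptions, an index using `hf/e5-base-v2` with no prefix settings embeds the chunk `"hello"` as `"passage: hello"`, while a model without defaults leaves the text unchanged.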

Example: Setting text chunk and query prefixes during index creation

This is an example with Marqo Open Source:

cURL:

curl -X POST 'http://localhost:8882/indexes/my-first-index' \
-H "Content-Type: application/json" \
-d '{
    "textChunkPrefix": "passage: ",
    "textQueryPrefix": "query: ",
    "model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
    "type": "unstructured"
}'

Python:

import marqo

mq = marqo.Client("http://localhost:8882", api_key=None)
mq.create_index(
    index_name="my-first-index",
    model="open_clip/ViT-B-32/laion2b_s34b_b79k",
    text_query_prefix="override query: ",
    text_chunk_prefix="override passage: ",
)

This is an example with Marqo Cloud:

cURL:

curl -X POST 'https://api.marqo.ai/api/v2/indexes/my-first-index' \
-H 'x-api-key: XXXXXXXXXXXXXXX' \
-H "Content-Type: application/json" \
-d '{
    "textChunkPrefix": "passage: ",
    "textQueryPrefix": "query: ",
    "model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
    "type": "unstructured",
    "numberOfShards": 1,
    "numberOfReplicas": 0,
    "inferenceType": "marqo.CPU.large",
    "storageClass": "marqo.basic",
    "numberOfInferences": 1
}'

Python:

import marqo

mq = marqo.Client("https://api.marqo.ai", api_key="XXXXXXXXXXXXXXX")
mq.create_index(
    index_name="my-first-index",
    model="open_clip/ViT-B-32/laion2b_s34b_b79k",
    text_query_prefix="override query: ",
    text_chunk_prefix="override passage: ",
)
