Create Structured Index

Structured indexes in Marqo are tailored for datasets with a defined schema and are particularly effective for complex queries like sorting, grouping, and filtering. They are designed for fast, in-memory operations.

The default settings for a structured index are listed in the tables below. Settings are supplied when the index is created.

POST /indexes/{index_name}

Creates an index with optional settings. This endpoint accepts the application/json content type.

Path parameters

Name	Type	Description
`index_name`	String	Name of the index
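
As a rough illustration, the request can be assembled with Python's standard library. This is a sketch, not the official client: the helper name `build_create_index_request` and the `title` field are hypothetical, the base URL is the one used in the sample at the end of this page, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_create_index_request(base_url, index_name, settings):
    """Build (but do not send) a POST /indexes/{index_name} request."""
    body = json.dumps(settings).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/indexes/{index_name}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical minimal settings: one text field that is also a tensor field.
request = build_create_index_request(
    "http://localhost:8882",
    "my-first-structured-index",
    {
        "type": "structured",
        "allFields": [
            {"name": "title", "type": "text", "features": ["lexical_search"]}
        ],
        "tensorFields": ["title"],
    },
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it to a running Marqo instance.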

Body Parameters

The index settings are passed as a nested JSON object. Any parameter that is omitted takes its default value. The parameters are as follows:

Name	Type	Default value	Description
`allFields`	List	`-`	List of fields that might be indexed or queried. Valid only if `type` is `structured`
`tensorFields`	List	`[]`	List of fields that are treated as tensors
`model`	String	`hf/e5-base-v2`	The model to use to vectorise doc content in `add_documents()` calls for the index
`modelProperties`	Dictionary	`""`	The model properties object corresponding to `model` (for custom models)
`normalizeEmbeddings`	Boolean	`true`	Normalize the embeddings to have unit length
`textPreprocessing`	Dictionary	`""`	The text preprocessing object
`imagePreprocessing`	Dictionary	`""`	The image preprocessing object
`annParameters`	Dictionary	`""`	The ANN algorithm parameter object
`type`	String	`unstructured`	Type of the index. Must be set to `structured` to create a structured index
`vectorNumericType`	String	`float`	Numeric type for vector encoding
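
To make the defaults concrete, the sketch below overlays caller-supplied settings on the scalar defaults from the table above. It only mirrors the table for illustration: `DEFAULTS` and `with_defaults` are hypothetical names, not part of the Marqo client, and the nested objects are left out.

```python
# Scalar defaults copied from the table above (nested objects omitted).
DEFAULTS = {
    "type": "unstructured",
    "model": "hf/e5-base-v2",
    "normalizeEmbeddings": True,
    "vectorNumericType": "float",
    "tensorFields": [],
}


def with_defaults(settings):
    """Return a new dict with the table defaults filled in for missing keys."""
    # Copy list values so callers cannot mutate the shared defaults.
    merged = {
        key: (list(value) if isinstance(value, list) else value)
        for key, value in DEFAULTS.items()
    }
    merged.update(settings)
    return merged


effective = with_defaults(
    {
        "type": "structured",
        "allFields": [{"name": "title", "type": "text", "features": ["filter"]}],
    }
)
print(effective["type"], effective["model"])
```

Note that `type` has to be overridden explicitly, because its default is `unstructured`.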

Fields

The allFields list contains the fields that might be indexed or queried. Each entry in the list has the following parameters:

Name	Type	Default value	Description
`name`	String	`-`	Name of the field
`type`	String	`-`	Type of the field
`features`	List	`[]`	List of features that the field supports

Available types are:

Field Type	Description	Supported Features
`text`	Text field	`lexical_search`, `filter`
`int`	32-bit integer	`filter`, `score_modifier`
`float`	32-bit float	`filter`, `score_modifier`
`long`	64-bit integer	`filter`
`double`	64-bit float	`filter`
`array<text>`	Array of text	`lexical_search`, `filter`
`array<int>`	Array of 32-bit integers	`filter`
`array<float>`	Array of 32-bit floats	`filter`
`array<long>`	Array of 64-bit integers	`filter`
`array<double>`	Array of 64-bit floats	`filter`
`bool`	Boolean	`filter`
`multimodal_combination`	Multimodal combination field	None
`image_pointer`	Image URL. Must only be used with a multimodal model such as CLIP	None
`custom_vector`	Custom vector, with optional text for lexical/filtering	`lexical_search`, `filter`
`map<text, int>`	Map of text to integers	`score_modifier`
`map<text, long>`	Map of text to longs	`score_modifier`
`map<text, float>`	Map of text to floats	`score_modifier`
`map<text, double>`	Map of text to doubles	`score_modifier`

Available features are:

lexical_search: The field can be used for lexical search
filter: The field can be used for exact and range (numerical fields) filtering
score_modifier: The field can be used to modify the score of the document
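
The type and feature tables above can be turned into a simple pre-flight check. The following is a minimal sketch: the mapping is transcribed from the table on this page, while `SUPPORTED_FEATURES` and `unsupported_features` are hypothetical names and not part of Marqo.

```python
# Supported features per field type, transcribed from the table above.
SUPPORTED_FEATURES = {
    "text": {"lexical_search", "filter"},
    "int": {"filter", "score_modifier"},
    "float": {"filter", "score_modifier"},
    "long": {"filter"},
    "double": {"filter"},
    "array<text>": {"lexical_search", "filter"},
    "array<int>": {"filter"},
    "array<float>": {"filter"},
    "array<long>": {"filter"},
    "array<double>": {"filter"},
    "bool": {"filter"},
    "multimodal_combination": set(),
    "image_pointer": set(),
    "custom_vector": {"lexical_search", "filter"},
    "map<text, int>": {"score_modifier"},
    "map<text, long>": {"score_modifier"},
    "map<text, float>": {"score_modifier"},
    "map<text, double>": {"score_modifier"},
}


def unsupported_features(field):
    """Return the features of `field` that its type does not support, sorted."""
    field_type = field["type"]
    if field_type not in SUPPORTED_FEATURES:
        raise ValueError(f"unknown field type: {field_type!r}")
    requested = set(field.get("features", []))
    return sorted(requested - SUPPORTED_FEATURES[field_type])


ok = unsupported_features(
    {"name": "caption", "type": "text", "features": ["lexical_search", "filter"]}
)
bad = unsupported_features(
    {"name": "tags", "type": "array<text>", "features": ["filter", "score_modifier"]}
)
print(ok)
print(bad)
```

Running a check like this before calling the create endpoint surfaces a mismatched feature early, without a round trip to the server.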

A multimodal_combination field requires a dependentFields object, which defines the weights for the combination. The dependentFields object is a dictionary whose keys are the names of the fields being combined and whose values are the weights of those fields. Each key must refer to a field defined in allFields. See the sample settings at the end of this page for an example.
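
The two rules for dependentFields (it is required, and every key must name a field in allFields) can be checked locally. This is an illustrative sketch only; `check_dependent_fields` is a hypothetical helper, and the field definitions are borrowed from the sample settings on this page.

```python
def check_dependent_fields(all_fields):
    """Return a list of problems found in multimodal_combination fields."""
    defined = {field["name"] for field in all_fields}
    problems = []
    for field in all_fields:
        if field["type"] != "multimodal_combination":
            continue
        weights = field.get("dependentFields")
        if not weights:
            problems.append(f"{field['name']}: dependentFields is required")
            continue
        for name in weights:
            if name not in defined:
                problems.append(
                    f"{field['name']}: {name!r} is not defined in allFields"
                )
    return problems


all_fields = [
    {"name": "text_field", "type": "text", "features": ["lexical_search"]},
    {"name": "image_field", "type": "image_pointer"},
    {
        "name": "multimodal_field",
        "type": "multimodal_combination",
        "dependentFields": {"image_field": 0.9, "text_field": 0.1},
    },
]
problems = check_dependent_fields(all_fields)
print(problems)
```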

Text Preprocessing Object

The textPreprocessing object contains the specifics of how you want the index to preprocess text. The parameters are as follows:

Name	Type	Default value	Description
`splitLength`	Integer	`2`	The length of each chunk, in units of `splitMethod`
`splitOverlap`	Integer	`0`	The length of overlap between adjacent chunks
`splitMethod`	String	`sentence`	The method by which text is chunked (`character`, `word`, `sentence`, or `passage`)
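
To show how `splitLength` and `splitOverlap` interact, here is a minimal sliding-window sketch. It is an assumption about the general technique, not Marqo's actual splitter: the units are produced by naive whitespace splitting (roughly the `word` method), the window advances by `split_length - split_overlap` units, and `chunk_units` is a hypothetical name.

```python
def chunk_units(units, split_length=2, split_overlap=0):
    """Group units into chunks of split_length that share split_overlap units."""
    if split_length <= 0:
        raise ValueError("split_length must be positive")
    if not 0 <= split_overlap < split_length:
        raise ValueError("split_overlap must be in [0, split_length)")
    step = split_length - split_overlap
    chunks = []
    for start in range(0, len(units), step):
        chunks.append(units[start:start + split_length])
        # Stop once a chunk reaches the end, to avoid a redundant tail chunk.
        if start + split_length >= len(units):
            break
    return chunks


# Assumed unit boundary: whitespace-separated words.
words = "the quick brown fox jumps".split()
default_chunks = chunk_units(words)
overlapping_chunks = chunk_units(words, split_length=3, split_overlap=1)
print(default_chunks)
print(overlapping_chunks)
```

With the defaults, the five words fall into three non-overlapping chunks; with a length of 3 and an overlap of 1, the two chunks share the word "brown".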

Image Preprocessing Object

The imagePreprocessing object contains the specifics of how you want the index to preprocess images. The parameters are as follows:

Name	Type	Default value	Description
`patchMethod`	String	`null`	The method by which images are chunked (`simple` or `frcnn`)

ANN Algorithm Parameter Object

The annParameters object contains hyperparameters for the approximate nearest neighbour algorithm used for tensor storage within Marqo. The parameters are as follows:

Name	Type	Default value	Description
`spaceType`	String	`prenormalized-angular`	The function used to measure the distance between two points in ANN (`angular`, `euclidean`, `dotproduct`, `geodegrees`, `hamming`, or `prenormalized-angular`).
`parameters`	Dictionary	`""`	The hyperparameters for the ANN method (which is always `hnsw` for Marqo).

HNSW Method Parameters Object

The parameters object can contain the following values:

Name	Type	Default value	Description
`efConstruction`	Integer	`512`	The size of the dynamic list used during k-NN graph creation. Higher values lead to a more accurate graph but slower indexing speed. It is recommended to keep this between 2 and 800 (maximum is 4096).
`m`	Integer	`16`	The number of bidirectional links that the plugin creates for each new element. Increasing and decreasing this value can have a large impact on memory consumption. Keep this value between 2 and 100.
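
The ranges given above can be checked before the settings are submitted. This is a small sketch under the assumption that omitted values take the defaults in the table; `check_hnsw_parameters` is a hypothetical helper, not a Marqo function.

```python
def check_hnsw_parameters(parameters):
    """Return warnings for HNSW values outside the ranges given above."""
    warnings = []
    # Missing keys fall back to the defaults listed in the table.
    ef_construction = parameters.get("efConstruction", 512)
    m = parameters.get("m", 16)
    if ef_construction > 4096:
        warnings.append("efConstruction exceeds the maximum of 4096")
    elif not 2 <= ef_construction <= 800:
        warnings.append("efConstruction is outside the recommended range 2 to 800")
    if not 2 <= m <= 100:
        warnings.append("m is outside the recommended range 2 to 100")
    return warnings


default_warnings = check_hnsw_parameters({"efConstruction": 512, "m": 16})
large_warnings = check_hnsw_parameters({"efConstruction": 1024, "m": 128})
print(default_warnings)
print(large_warnings)
```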

Model Properties Object

This flexible object, used by modelProperties, sets up models that aren't available in Marqo by default (models available by default are listed here). The structure of this object varies depending on the model.

For Open CLIP models, see here for modelProperties format and example usage.

For Generic SBERT models, see here for modelProperties format and example usage.

Sample structured index settings

Here are sample settings for a structured index, created using the Marqo Python client:

import marqo

settings = {
    "type": "structured",
    "vectorNumericType": "float",
    "model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
    "normalizeEmbeddings": True,
    "textPreprocessing": {
        "splitLength": 2,
        "splitOverlap": 0,
        "splitMethod": "sentence",
    },
    "imagePreprocessing": {"patchMethod": None},
    "allFields": [
        {"name": "text_field", "type": "text", "features": ["lexical_search"]},
        {"name": "caption", "type": "text", "features": ["lexical_search", "filter"]},
        {"name": "tags", "type": "array<text>", "features": ["filter"]},
        {"name": "image_field", "type": "image_pointer"},
        {"name": "my_int", "type": "int", "features": ["score_modifier"]},
        {
            "name": "multimodal_field",
            "type": "multimodal_combination",
            "dependentFields": {"image_field": 0.9, "text_field": 0.1},
        },
    ],
    "tensorFields": ["multimodal_field"],
    "annParameters": {
        "spaceType": "prenormalized-angular",
        "parameters": {"efConstruction": 512, "m": 16},
    },
}

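# Connect to a Marqo instance (adjust the URL for your deployment).
mq = marqo.Client(url="http://localhost:8882")

# Create the structured index with the settings defined above.
mq.create_index("my-first-structured-index", settings_dict=settings)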