Multimodal Unstructured Indexes

For general documentation on unstructured indexes, see the unstructured index documentation.

In an unstructured index, multimodal combination fields are defined at indexing time, in the add documents call. The multimodal combination field mappings can differ from document to document, though for most use cases it is best to keep them consistent.
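To give some intuition for what a multimodal combination field represents, here is a minimal sketch of merging per-field vectors into a single vector using weights. It assumes a weighted sum followed by L2 normalisation; the exact formula Marqo applies is not described on this page, so treat this as an illustration rather than Marqo's implementation. The function name combine_vectors, the field names, and the toy 2-dimensional vectors are all hypothetical.

```python
import math


def combine_vectors(vectors, weights):
    """Weighted sum of per-field vectors, scaled to unit length (illustrative only)."""
    dim = len(next(iter(vectors.values())))
    combined = [0.0] * dim
    for field, weight in weights.items():
        for i, value in enumerate(vectors[field]):
            combined[i] += weight * value
    # Assumption: normalise the result so it has unit length.
    norm = math.sqrt(sum(x * x for x in combined))
    return [x / norm for x in combined] if norm else combined


# Toy 2-dimensional stand-ins for embeddings; real CLIP vectors are much larger.
field_vectors = {"text_field": [1.0, 0.0], "image_field": [0.0, 1.0]}
combined = combine_vectors(field_vectors, {"text_field": 0.1, "image_field": 0.9})
```

With these weights the combined vector points mostly in the direction of the image vector, which is the effect a higher image weight is intended to have.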

In this section we will go over how to create a simple unstructured index with a multimodal field to vectorise.

Minimal example of creating an unstructured multimodal index

For Marqo to recognise image URLs and download the images, the treatUrlsAndPointersAsImages setting must be set to True. Marqo will then automatically detect image URLs, fetch each image, and vectorise it with the CLIP model.

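import marqo

settings = {
    # A CLIP model, so that text and images share one embedding space.
    "model": "open_clip/ViT-L-14/laion2b_s32b_b82k",
    # Treat URL values as images to download and vectorise.
    "treatUrlsAndPointersAsImages": True,
}

# Connect to a Marqo instance running locally.
mq = marqo.Client(url="http://localhost:8882")

mq.create_index("my-mm-unstructured-index", settings_dict=settings)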

Index Settings: model

The model field specifies the model used for vectorisation. For multimodal search, this must be a CLIP model.

Example Add Documents Usage

Because this is an unstructured index, the tensor fields and the multimodal mappings are specified in the add documents call.

documents = [
    {
        "_id": "1",
        "text_field": "New York",
        "image_field": "https://example.com/image.jpg",
    },
    {
        "_id": "2",
        "text_field": "Los Angeles",
        "image_field": "https://example.com/image2.jpg",
    },
]

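mq.index("my-mm-unstructured-index").add_documents(
    documents,
    mappings={
        # Combine text_field and image_field into one field named multimodal_field.
        "multimodal_field": {
            "type": "multimodal_combination",
            # The image is weighted far more heavily than the text.
            "weights": {"text_field": 0.1, "image_field": 0.9},
        }
    },
    # Only the combined field is vectorised.
    tensor_fields=["multimodal_field"],
)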
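Since mappings are supplied with each add documents call, it can be useful to check on the client side that every document contains the fields a mapping refers to. The sketch below is one possible way to do that in plain Python. The helper missing_subfields is hypothetical and not part of the Marqo client, and how Marqo itself treats a document that lacks one of the weighted fields is not covered here.

```python
def missing_subfields(documents, mappings):
    """Return {doc_id: [missing field names]} for multimodal_combination mappings."""
    problems = {}
    for doc in documents:
        missing = [
            field
            for mapping in mappings.values()
            if mapping.get("type") == "multimodal_combination"
            for field in mapping.get("weights", {})
            if field not in doc
        ]
        if missing:
            problems[doc.get("_id", "<no _id>")] = missing
    return problems


# Same mapping shape as the add documents example; document "2" lacks its image.
example_mappings = {
    "multimodal_field": {
        "type": "multimodal_combination",
        "weights": {"text_field": 0.1, "image_field": 0.9},
    }
}
example_documents = [
    {"_id": "1", "text_field": "New York", "image_field": "https://example.com/image.jpg"},
    {"_id": "2", "text_field": "Los Angeles"},
]
problems = missing_subfields(example_documents, example_mappings)
```

Running a check like this before add_documents makes it easier to keep mappings and documents consistent across batches.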

Example Search Usage

With this index, multimodal_field can be searched using the TENSOR search method, and text_field can be searched using the LEXICAL search method.

Tensor Search (search_method="TENSOR" is the default):

results = mq.index("my-mm-unstructured-index").search(q="New York")

Lexical Search:

results = mq.index("my-mm-unstructured-index").search(
    q="New York", search_method="LEXICAL"
)
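Both calls return a response that can be inspected in the same way. The sketch below assumes the response is a dictionary with a "hits" list whose entries carry "_id" and "_score" keys. The helper summarise_hits is hypothetical, and sample_results is a hand-written stand-in with made-up scores so the snippet runs without a Marqo server.

```python
def summarise_hits(results, limit=5):
    """Return (_id, _score) pairs for the first `limit` hits of a search response."""
    return [(hit["_id"], hit["_score"]) for hit in results.get("hits", [])[:limit]]


# Stand-in for a real search response; the scores are invented for illustration.
sample_results = {
    "hits": [
        {"_id": "1", "text_field": "New York", "_score": 0.83},
        {"_id": "2", "text_field": "Los Angeles", "_score": 0.41},
    ]
}
top = summarise_hits(sample_results, limit=1)
```

In real use, the results variable from either search call above would be passed in place of sample_results.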