
Mappings

The mappings object is a parameter (mappings) of the add_documents call. It gives granular control over how individual fields are processed. Mappings are currently supported for the multimodal_combination, custom_vector, and text_field field types.

When creating a structured index, you define the weights of a multimodal_combination field in its dependent fields (dependentFields). Mappings are therefore optional when adding documents to a structured index; they are only needed to override the default multimodal weights defined at index creation time.
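
As a minimal sketch of the two places weights can live, the dictionaries below show a structured index whose multimodal_combination field declares default weights under dependentFields, plus an optional mappings object that overrides those weights for a single add_documents call. The field names (image, caption, image_caption), the weight values, and the assumption that image and caption are declared as image_pointer and text fields are all hypothetical.

```python
# Hypothetical field names and weights, for illustration only.

# Index creation: default weights for the combination field live in dependentFields.
structured_settings = {
    "type": "structured",
    "model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
    "allFields": [
        {"name": "image", "type": "image_pointer"},
        {"name": "caption", "type": "text"},
        {
            "name": "image_caption",
            "type": "multimodal_combination",
            "dependentFields": {"image": 0.8, "caption": 0.2},
        },
    ],
    "tensorFields": ["image_caption"],
}

# add_documents: an optional override of those default weights for one call.
override_mappings = {
    "image_caption": {
        "type": "multimodal_combination",
        "weights": {"image": 0.5, "caption": 0.5},
    }
}
```

You would pass structured_settings to mq.create_index(..., settings_dict=structured_settings) and, only when you want different weights, override_mappings to add_documents(..., mappings=override_mappings). Without the override, the dependentFields weights apply.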

Mappings are used to define custom_vector fields for unstructured indexes only. For structured indexes, do not include custom_vector fields in mappings; instead, declare them as fields during index creation.

Language and stemming mappings are only supported for unstructured indexes created with Marqo 2.16 or later.


Mappings object

Multimodal Combination Mappings

Defining the mapping for multimodal_combination fields:

my_mappings = {
    "my_combination_field": {
        "type": "multimodal_combination",
        "weights": {"My_image": 0.5, "Some_text": 0.5},
    },
    "my_2nd_combination_field": {
        "type": "multimodal_combination",
        "weights": {"Title": -2.5, "Description": 0.3},
    },
}
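
To build intuition for what the weights do, here is an illustrative sketch of a weighted combination of child-field vectors followed by L2 normalization. The function combine_weighted is hypothetical and is not Marqo's code; Marqo's internal implementation may differ (for example, it may average rather than sum, which gives the same direction after normalization, and normalization may depend on the model). The sketch shows why a negative weight, such as -2.5 on Title above, pushes the combined vector away from that field's embedding.

```python
import math


def combine_weighted(vectors, weights, normalize=True):
    """Weighted sum of child-field vectors, optionally L2-normalized.

    Illustrative only: `vectors` maps field name -> embedding, and
    `weights` maps field name -> weight, mirroring a mappings entry.
    """
    dim = len(next(iter(vectors.values())))
    combined = [0.0] * dim
    for field, weight in weights.items():
        vec = vectors[field]
        if len(vec) != dim:
            raise ValueError(f"{field!r} has dimension {len(vec)}, expected {dim}")
        for i, value in enumerate(vec):
            combined[i] += weight * value
    if normalize:
        norm = math.sqrt(sum(x * x for x in combined))
        if norm > 0:
            combined = [x / norm for x in combined]
    return combined


# Toy 2-dimensional embeddings, not real model outputs.
equal_mix = combine_weighted(
    {"My_image": [1.0, 0.0], "Some_text": [0.0, 1.0]},
    {"My_image": 0.5, "Some_text": 0.5},
)
pushed_away = combine_weighted(
    {"Title": [1.0, 0.0], "Description": [0.0, 1.0]},
    {"Title": -2.5, "Description": 0.3},
)
```

With equal weights the result points midway between the two toy embeddings; with the negative Title weight, the first component becomes negative.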

Custom Vector Mappings

Defining the mapping for custom_vector fields (in an unstructured index):

my_mappings = {
    "my_custom_audio_vector_1": {"type": "custom_vector"},
    "my_custom_audio_vector_2": {"type": "custom_vector"},
}
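
When there are many custom vector fields, it can help to generate the mapping object and each document entry programmatically. The helpers below (custom_vector_mappings and custom_vector_entry) are hypothetical and not part of the marqo client; expected_dim=512 assumes the open_clip/ViT-B-32 model used in the examples, whose embeddings have 512 dimensions. Checking the length before calling add_documents surfaces a wrong-sized vector locally instead of as a failed document.

```python
# Hypothetical helpers, not part of the marqo client.


def custom_vector_mappings(field_names):
    """Map each field name to {"type": "custom_vector"}."""
    return {name: {"type": "custom_vector"} for name in field_names}


def custom_vector_entry(vector, content, expected_dim=512):
    """Build the {"vector": ..., "content": ...} value for one custom_vector field.

    expected_dim=512 assumes open_clip/ViT-B-32 embeddings.
    """
    if len(vector) != expected_dim:
        raise ValueError(f"vector has {len(vector)} dimensions, expected {expected_dim}")
    return {"vector": [float(x) for x in vector], "content": content}


# Usage: equivalent to the my_mappings object and doc1 in the examples.
my_mappings = custom_vector_mappings(["my_custom_audio_vector_1", "my_custom_audio_vector_2"])
doc = {
    "_id": "doc1",
    "my_custom_audio_vector_1": custom_vector_entry([0.0] * 512, "Singing audio file"),
}
```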

Adding custom vector documents using that mapping object:

Unstructured Index

import marqo

# Placeholder vectors for example purposes. Replace these with your own.
example_vector_1 = [i for i in range(512)]
example_vector_2 = [1 / (i + 1) for i in range(512)]

# Create the unstructured index
mq = marqo.Client("http://localhost:8882", api_key=None)
settings = {
    "type": "unstructured",
    "model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
}
mq.create_index("my-custom-vector-index", settings_dict=settings)

# Add the custom vectors
mq.index("my-custom-vector-index").add_documents(
    documents=[
        {
            "_id": "doc1",
            "my_custom_audio_vector_1": {
                # Put your own vector (of correct length) here.
                "vector": example_vector_1,
                "content": "Singing audio file",
            },
        },
        {
            "_id": "doc2",
            "my_custom_audio_vector_2": {
                # Put your own vector (of correct length) here.
                "vector": example_vector_2,
                "content": "Podcast audio file",
            },
        },
    ],
    tensor_fields=["my_custom_audio_vector_1", "my_custom_audio_vector_2"],
    mappings=my_mappings,
)

For Marqo Cloud, the client connects to https://api.marqo.ai, and you need to replace XXXXXXXXXXXXXXX with your API key. To obtain this key, visit Find Your API Key. To look up the endpoint of your index, visit Find Your Endpoint.

import marqo

# Placeholder vectors for example purposes. Replace these with your own.
example_vector_1 = [i for i in range(512)]
example_vector_2 = [1 / (i + 1) for i in range(512)]

# Create the unstructured index
mq = marqo.Client("https://api.marqo.ai", api_key="XXXXXXXXXXXXXXX")
settings = {
    "type": "unstructured",
    "model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
}
mq.create_index("my-custom-vector-index", settings_dict=settings)

# Add the custom vectors
mq.index("my-custom-vector-index").add_documents(
    documents=[
        {
            "_id": "doc1",
            "my_custom_audio_vector_1": {
                # Put your own vector (of correct length) here.
                "vector": example_vector_1,
                "content": "Singing audio file",
            },
        },
        {
            "_id": "doc2",
            "my_custom_audio_vector_2": {
                # Put your own vector (of correct length) here.
                "vector": example_vector_2,
                "content": "Podcast audio file",
            },
        },
    ],
    tensor_fields=["my_custom_audio_vector_1", "my_custom_audio_vector_2"],
    mappings=my_mappings,
)

Structured Index

import marqo

# Placeholder vectors for example purposes. Replace these with your own.
example_vector_1 = [i for i in range(512)]
example_vector_2 = [1 / (i + 1) for i in range(512)]

# Create the structured index
mq = marqo.Client("http://localhost:8882", api_key=None)
settings = {
    "type": "structured",
    "model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
    "allFields": [
        {"name": "my_custom_audio_vector_1", "type": "custom_vector"},
        {"name": "my_custom_audio_vector_2", "type": "custom_vector"},
    ],
    "tensorFields": ["my_custom_audio_vector_1", "my_custom_audio_vector_2"],
}
mq.create_index("my-structured-custom-vector-index", settings_dict=settings)

# Add the custom vectors
mq.index("my-structured-custom-vector-index").add_documents(
    documents=[
        {
            "_id": "doc1",
            "my_custom_audio_vector_1": {
                # Put your own vector (of correct length) here.
                "vector": example_vector_1,
                "content": "Singing audio file",
            },
        },
        {
            "_id": "doc2",
            "my_custom_audio_vector_2": {
                # Put your own vector (of correct length) here.
                "vector": example_vector_2,
                "content": "Podcast audio file",
            },
        },
    ]
)

For Marqo Cloud, the client connects to https://api.marqo.ai, and you need to replace XXXXXXXXXXXXXXX with your API key. To obtain this key, visit Find Your API Key. To look up the endpoint of your index, visit Find Your Endpoint.

import marqo

# Placeholder vectors for example purposes. Replace these with your own.
example_vector_1 = [i for i in range(512)]
example_vector_2 = [1 / (i + 1) for i in range(512)]

# Create the structured index
mq = marqo.Client("https://api.marqo.ai", api_key="XXXXXXXXXXXXXXX")
settings = {
    "type": "structured",
    "model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
    "allFields": [
        {"name": "my_custom_audio_vector_1", "type": "custom_vector"},
        {"name": "my_custom_audio_vector_2", "type": "custom_vector"},
    ],
    "tensorFields": ["my_custom_audio_vector_1", "my_custom_audio_vector_2"],
}
mq.create_index("my-structured-custom-vector-index", settings_dict=settings)

# Add the custom vectors
mq.index("my-structured-custom-vector-index").add_documents(
    documents=[
        {
            "_id": "doc1",
            "my_custom_audio_vector_1": {
                # Put your own vector (of correct length) here.
                "vector": example_vector_1,
                "content": "Singing audio file",
            },
        },
        {
            "_id": "doc2",
            "my_custom_audio_vector_2": {
                # Put your own vector (of correct length) here.
                "vector": example_vector_2,
                "content": "Podcast audio file",
            },
        },
    ]
)
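
Because a structured index declares custom_vector fields in its schema rather than in mappings, the same field names must appear in both allFields and tensorFields. A small hypothetical helper, structured_custom_vector_settings (not part of the marqo client), can generate the settings dictionary used in the structured examples from a list of field names, keeping the two lists in sync.

```python
# Hypothetical helper, not part of the marqo client.


def structured_custom_vector_settings(model, custom_vector_fields):
    """Settings for a structured index whose custom_vector fields are declared in the schema."""
    fields = list(custom_vector_fields)
    return {
        "type": "structured",
        "model": model,
        # Each custom vector field is declared once in the schema...
        "allFields": [{"name": name, "type": "custom_vector"} for name in fields],
        # ...and listed as a tensor field so it is searchable.
        "tensorFields": list(fields),
    }


settings = structured_custom_vector_settings(
    "open_clip/ViT-B-32/laion2b_s34b_b79k",
    ["my_custom_audio_vector_1", "my_custom_audio_vector_2"],
)
```

The resulting settings can be passed to mq.create_index("my-structured-custom-vector-index", settings_dict=settings), after which add_documents needs no mappings for these fields.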

Text Field Language and Stemming Mappings

For unstructured indexes created with Marqo 2.16 or later, you can specify the language and stemming method of each text field to control lexical search behavior:

my_mappings = {
    "title": {"type": "text_field", "language": "fr", "stemming": "multiple"},
    "description": {"type": "text_field", "language": "es", "stemming": "none"},
    "content": {"type": "text_field", "language": "en"},
}

Language mappings affect how text is processed for lexical search operations. The specified language is used for:

  • Tokenization
  • Stemming
  • Stop word removal
  • Other language-specific text processing

Stemming mappings control whether and how text is stemmed during indexing and search. If no stemming method is specified, best is used. The valid stemming methods are:

  • none: Index words without stemming
  • best: Choose the best stem
  • shortest: Choose the shortest stem
  • multiple: Retain all possible stems
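
A typo in a stemming value is easy to make, so one option is to build text_field entries through a small validating helper. The function text_field_mapping below is hypothetical (not part of the marqo client); it only accepts the four stemming methods listed above and leaves out any key you do not set, so Marqo applies its default or the field's existing setting.

```python
# Hypothetical helper, not part of the marqo client.
VALID_STEMMING_METHODS = ("none", "best", "shortest", "multiple")


def text_field_mapping(language=None, stemming=None):
    """Build one text_field mapping entry, omitting keys that are not set."""
    mapping = {"type": "text_field"}
    if language is not None:
        mapping["language"] = language
    if stemming is not None:
        if stemming not in VALID_STEMMING_METHODS:
            raise ValueError(
                f"unknown stemming method {stemming!r}; expected one of {VALID_STEMMING_METHODS}"
            )
        mapping["stemming"] = stemming
    return mapping


# Rebuilds the my_mappings example from this section.
my_mappings = {
    "title": text_field_mapping("fr", "multiple"),
    "description": text_field_mapping("es", "none"),
    "content": text_field_mapping("en"),
}
```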

Notes

  • A field's language and stemming method are set the first time it is indexed. Attempting to index the same field with a different language or stemming method will result in a 400 error.
  • If no language is specified for a new field, that field will use automatic language detection.
  • If language or stemming method is omitted in subsequent mappings, the previously set language or stemming method is used.
  • When searching indexes with more than one stemming method, always specify lexical searchable attributes. This ensures the query is expanded and stemmed for each text field, providing the best recall.
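
The first three notes describe a "set once, then inherit" rule. The class below is a hypothetical client-side model of that rule, useful for checking a batch of mappings before sending it; Marqo enforces the real rule on the server. Two assumptions are labeled in the code: the string "auto" stands for automatic language detection, and a ValueError stands in for Marqo's 400 error. Whether Marqo treats an explicit language on a previously auto-detected field as a conflict is not stated above, and this sketch simply treats it as one.

```python
# Hypothetical client-side model of the notes above; not Marqo's implementation.


class TextFieldSettingsTracker:
    DEFAULT_STEMMING = "best"
    AUTO_LANGUAGE = "auto"  # hypothetical label for automatic language detection

    def __init__(self):
        self._settings = {}  # field name -> (language, stemming)

    def apply(self, field, language=None, stemming=None):
        """Record one text_field mapping and return the settings in effect."""
        if field not in self._settings:
            # First time the field is indexed: its settings become fixed.
            self._settings[field] = (
                language or self.AUTO_LANGUAGE,
                stemming or self.DEFAULT_STEMMING,
            )
            return self._settings[field]
        current_language, current_stemming = self._settings[field]
        # Omitted values inherit the existing ones; a different value is
        # rejected (ValueError stands in for Marqo's 400 error).
        if language is not None and language != current_language:
            raise ValueError(f"{field!r}: language already set to {current_language!r}")
        if stemming is not None and stemming != current_stemming:
            raise ValueError(f"{field!r}: stemming already set to {current_stemming!r}")
        return self._settings[field]
```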

Supported Languages

The following language codes are supported:

  • Arabic (ar)
  • Catalan (ca)
  • Danish (da)
  • Dutch (nl)
  • English (en)
  • Finnish (fi)
  • French (fr)
  • German (de)
  • Greek (el)
  • Hungarian (hu)
  • Indonesian (id)
  • Irish (ga)
  • Italian (it)
  • Norwegian (nb)
  • Portuguese (pt)
  • Romanian (ro)
  • Russian (ru)
  • Spanish (es)
  • Swedish (sv)
  • Turkish (tr)
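
Before adding documents, you might check a mappings object against the codes above. The dictionary and function below are a hypothetical sketch (not part of the marqo client); the codes are copied from the list, and the function reports any text_field whose language code is not in it, such as "ja".

```python
# Language codes from the list above, keyed by code.
SUPPORTED_LANGUAGES = {
    "ar": "Arabic", "ca": "Catalan", "da": "Danish", "nl": "Dutch",
    "en": "English", "fi": "Finnish", "fr": "French", "de": "German",
    "el": "Greek", "hu": "Hungarian", "id": "Indonesian", "ga": "Irish",
    "it": "Italian", "nb": "Norwegian", "pt": "Portuguese", "ro": "Romanian",
    "ru": "Russian", "es": "Spanish", "sv": "Swedish", "tr": "Turkish",
}


def unsupported_language_fields(mappings):
    """Return {field: code} for text_field mappings whose language is not listed."""
    return {
        field: spec["language"]
        for field, spec in mappings.items()
        if spec.get("type") == "text_field"
        and "language" in spec
        and spec["language"] not in SUPPORTED_LANGUAGES
    }


problems = unsupported_language_fields({
    "title": {"type": "text_field", "language": "fr"},
    "body": {"type": "text_field", "language": "ja"},
    "audio": {"type": "custom_vector"},
})
```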

Example: Adding Multilingual Documents

import marqo

mq = marqo.Client("http://localhost:8882", api_key=None)

# Create unstructured index
mq.create_index(
    index_name="multilingual-index", type="unstructured", model="hf/e5-base-v2"
)

# Define language mappings
language_mappings = {
    "title_en": {"type": "text_field", "language": "en"},
    "title_fr": {"type": "text_field", "language": "fr"},
    "title_es": {"type": "text_field", "language": "es"},
}

# Add documents with language mappings
mq.index("multilingual-index").add_documents(
    documents=[
        {
            "_id": "doc1",
            "title_en": "Brown shoes for men",
            "title_fr": "Chaussures marron pour hommes",
            "title_es": "Zapatos marrones para hombres",
        }
    ],
    tensor_fields=["title_en"],
    mappings=language_mappings,
)
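
For Marqo Cloud, replace XXXXXXXXXXXXXXX with your API key. To obtain this key, visit Find Your API Key.

import marqo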

mq = marqo.Client("https://api.marqo.ai", api_key="XXXXXXXXXXXXXXX")

# Create unstructured index
mq.create_index(
    index_name="multilingual-index", type="unstructured", model="hf/e5-base-v2"
)

# Define language mappings
language_mappings = {
    "title_en": {"type": "text_field", "language": "en"},
    "title_fr": {"type": "text_field", "language": "fr"},
    "title_es": {"type": "text_field", "language": "es"},
}

# Add documents with language mappings
mq.index("multilingual-index").add_documents(
    documents=[
        {
            "_id": "doc1",
            "title_en": "Brown shoes for men",
            "title_fr": "Chaussures marron pour hommes",
            "title_es": "Zapatos marrones para hombres",
        }
    ],
    tensor_fields=["title_en"],
    mappings=language_mappings,
)
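
If your fields follow a naming convention like title_en and title_fr, the language mappings can be derived from the field names. The helper language_mappings_from_suffix is hypothetical and relies on the assumption that the last underscore-separated part of a field name is a language code. Note the pitfall this exposes: "id" is the code for Indonesian, so a field such as product_id would be mapped to Indonesian. The sketch skips names that start with an underscore (such as _id), but other ambiguous names must be excluded by you.

```python
# Hypothetical helper based on a field-naming convention; not part of the marqo client.
SUPPORTED_CODES = frozenset({
    "ar", "ca", "da", "nl", "en", "fi", "fr", "de", "el", "hu",
    "id", "ga", "it", "nb", "pt", "ro", "ru", "es", "sv", "tr",
})


def language_mappings_from_suffix(field_names, supported=SUPPORTED_CODES):
    mappings = {}
    for name in field_names:
        if name.startswith("_"):
            continue  # skip reserved fields such as _id
        _, sep, suffix = name.rpartition("_")
        if sep and suffix in supported:
            mappings[name] = {"type": "text_field", "language": suffix}
    return mappings


language_mappings = language_mappings_from_suffix(["_id", "title_en", "title_fr", "title_es", "sku"])
```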

Example: Searching with Language Mappings

Once you've indexed documents with language mappings, you can search them using the language parameter:

# Search in French
results = mq.index("multilingual-index").search(
    q="chaussures marron",
    search_method="LEXICAL",
    language="fr",
    searchable_attributes=["title_fr"],
)

# Hybrid search combining tensor and language-specific lexical search
results = mq.index("multilingual-index").search(
    q="brown shoes",
    search_method="HYBRID",
    language="en",
    hybrid_parameters={"retrievalMethod": "disjunction", "rankingMethod": "rrf"},
)
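
Following the note about always specifying searchable attributes, one way to keep a lexical query aligned with the fields indexed in that language is to derive searchable_attributes from the mappings. The helper lexical_search_kwargs is a hypothetical sketch, not part of the marqo client; it only builds the keyword arguments shown in the French example.

```python
# Hypothetical helper, not part of the marqo client.


def lexical_search_kwargs(query, language, mappings):
    """Keyword arguments for a LEXICAL search over the text fields mapped to `language`."""
    fields = [
        field
        for field, spec in mappings.items()
        if spec.get("type") == "text_field" and spec.get("language") == language
    ]
    if not fields:
        raise ValueError(f"no text_field is mapped to language {language!r}")
    return {
        "q": query,
        "search_method": "LEXICAL",
        "language": language,
        "searchable_attributes": fields,
    }


language_mappings = {
    "title_en": {"type": "text_field", "language": "en"},
    "title_fr": {"type": "text_field", "language": "fr"},
    "title_es": {"type": "text_field", "language": "es"},
}
kwargs = lexical_search_kwargs("chaussures marron", "fr", language_mappings)
```

The result would be passed as mq.index("multilingual-index").search(**kwargs), which matches the French search example.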