Mappings
The mappings object is a parameter (mappings) for an add_documents call. Mappings can be used for granular control over a field. Currently, it is supported for the multimodal_combination, custom_vector, and text_field field types.
When creating a structured index, you define weights for a multimodal field under dependent fields. When adding documents, mappings is optional with structured indexes and is only needed to override the default multimodal weights defined at index creation time.
Mappings is used to define custom_vector fields for unstructured indexes only. For structured indexes, do not include custom_vector fields in mappings. Instead, declare them as fields during index creation.
Language mappings are only supported for unstructured indexes created with Marqo 2.16 or later.
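As a minimal sketch of the rule that custom_vector entries belong in mappings only for unstructured indexes, the check below flags offending entries before a request is sent. The function name find_disallowed_custom_vectors is hypothetical and not part of the Marqo client. It only inspects a plain dictionary on the client side.

```python
from typing import Any, Dict, List


def find_disallowed_custom_vectors(
    mappings: Dict[str, Dict[str, Any]], index_type: str
) -> List[str]:
    """Return the names of custom_vector entries that should not be in mappings.

    Hypothetical client-side helper: for structured indexes, custom_vector
    fields are declared at index creation, so any such mapping entry is reported.
    """
    if index_type != "structured":
        return []
    return sorted(
        name for name, spec in mappings.items() if spec.get("type") == "custom_vector"
    )


example_mappings = {
    "my_custom_audio_vector_1": {"type": "custom_vector"},
    "my_combination_field": {
        "type": "multimodal_combination",
        "weights": {"My_image": 0.5, "Some_text": 0.5},
    },
}
offending = find_disallowed_custom_vectors(example_mappings, "structured")
```

An empty result means the mappings object contains no custom_vector entries that conflict with a structured index.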
Mappings object
Multimodal Combination Mappings
Defining the mapping for multimodal_combination fields:
my_mappings = {
    "my_combination_field": {
        "type": "multimodal_combination",
        "weights": {"My_image": 0.5, "Some_text": 0.5},
    },
    "my_2nd_combination_field": {
        "type": "multimodal_combination",
        "weights": {"Title": -2.5, "Description": 0.3},
    },
}
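When many combination fields are needed, the entries can be generated rather than typed by hand. The sketch below assumes a hypothetical helper, multimodal_mapping, that is not part of the Marqo client. Its checks are client-side conveniences and may differ from Marqo's own validation. Negative weights are allowed, matching the example above.

```python
from typing import Dict


def multimodal_mapping(weights: Dict[str, float]) -> dict:
    """Build one multimodal_combination mapping entry from subfield weights.

    Hypothetical helper: checks only that at least one subfield is named
    and that every weight is a number.
    """
    if not weights:
        raise ValueError("weights must name at least one subfield")
    for name, weight in weights.items():
        if isinstance(weight, bool) or not isinstance(weight, (int, float)):
            raise TypeError(f"weight for {name!r} must be a number")
    return {"type": "multimodal_combination", "weights": dict(weights)}


generated_mappings = {
    "my_combination_field": multimodal_mapping({"My_image": 0.5, "Some_text": 0.5}),
    "my_2nd_combination_field": multimodal_mapping({"Title": -2.5, "Description": 0.3}),
}
```

The resulting dictionary has the same shape as the hand-written mappings object and can be passed as the mappings parameter of add_documents.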
Custom Vector Mappings
Defining the mapping for custom_vector fields (in an unstructured index):
my_mappings = {
    "my_custom_audio_vector_1": {"type": "custom_vector"},
    "my_custom_audio_vector_2": {"type": "custom_vector"},
}
Adding custom vector documents using that mapping object:
Unstructured Index
import marqo
# Random vectors for example purposes. Replace these with your own.
example_vector_1 = [i for i in range(512)]
example_vector_2 = [1 / (i + 1) for i in range(512)]
# Create the unstructured index
mq = marqo.Client("http://localhost:8882", api_key=None)
settings = {
    "type": "unstructured",
    "model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
}
mq.create_index("my-custom-vector-index", settings_dict=settings)
# Add the custom vectors (my_mappings is the custom_vector mappings object defined above)
mq.index("my-custom-vector-index").add_documents(
    documents=[
        {
            "_id": "doc1",
            "my_custom_audio_vector_1": {
                # Put your own vector (of correct length) here.
                "vector": example_vector_1,
                "content": "Singing audio file",
            },
        },
        {
            "_id": "doc2",
            "my_custom_audio_vector_2": {
                # Put your own vector (of correct length) here.
                "vector": example_vector_2,
                "content": "Podcast audio file",
            },
        },
    ],
    tensor_fields=["my_custom_audio_vector_1", "my_custom_audio_vector_2"],
    mappings=my_mappings,
)
For Marqo Cloud, you will need to access the endpoint of your index and replace your_endpoint with this. To do this, visit Find Your Endpoint. You will also need your API Key. To obtain this key, visit Find Your API Key.
import marqo
# Random vectors for example purposes. Replace these with your own.
example_vector_1 = [i for i in range(512)]
example_vector_2 = [1 / (i + 1) for i in range(512)]
# Create the unstructured index
mq = marqo.Client("https://api.marqo.ai", api_key="XXXXXXXXXXXXXXX")
settings = {
    "type": "unstructured",
    "model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
}
mq.create_index("my-custom-vector-index", settings_dict=settings)
# Add the custom vectors (my_mappings is the custom_vector mappings object defined above)
mq.index("my-custom-vector-index").add_documents(
    documents=[
        {
            "_id": "doc1",
            "my_custom_audio_vector_1": {
                # Put your own vector (of correct length) here.
                "vector": example_vector_1,
                "content": "Singing audio file",
            },
        },
        {
            "_id": "doc2",
            "my_custom_audio_vector_2": {
                # Put your own vector (of correct length) here.
                "vector": example_vector_2,
                "content": "Podcast audio file",
            },
        },
    ],
    tensor_fields=["my_custom_audio_vector_1", "my_custom_audio_vector_2"],
    mappings=my_mappings,
)
Structured Index
import marqo
# Random vectors for example purposes. Replace these with your own.
example_vector_1 = [i for i in range(512)]
example_vector_2 = [1 / (i + 1) for i in range(512)]
# Create the structured index, declaring the custom_vector fields in the settings
mq = marqo.Client("http://localhost:8882", api_key=None)
settings = {
    "type": "structured",
    "model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
    "allFields": [
        {"name": "my_custom_audio_vector_1", "type": "custom_vector"},
        {"name": "my_custom_audio_vector_2", "type": "custom_vector"},
    ],
    "tensorFields": ["my_custom_audio_vector_1", "my_custom_audio_vector_2"],
}
mq.create_index("my-structured-custom-vector-index", settings_dict=settings)
# Add the custom vectors (no mappings needed for a structured index)
mq.index("my-structured-custom-vector-index").add_documents(
    documents=[
        {
            "_id": "doc1",
            "my_custom_audio_vector_1": {
                # Put your own vector (of correct length) here.
                "vector": example_vector_1,
                "content": "Singing audio file",
            },
        },
        {
            "_id": "doc2",
            "my_custom_audio_vector_2": {
                # Put your own vector (of correct length) here.
                "vector": example_vector_2,
                "content": "Podcast audio file",
            },
        },
    ]
)
For Marqo Cloud, you will need to access the endpoint of your index and replace your_endpoint with this. To do this, visit Find Your Endpoint. You will also need your API Key. To obtain this key, visit Find Your API Key.
import marqo
# Random vectors for example purposes. Replace these with your own.
example_vector_1 = [i for i in range(512)]
example_vector_2 = [1 / (i + 1) for i in range(512)]
# Create the structured index, declaring the custom_vector fields in the settings
mq = marqo.Client("https://api.marqo.ai", api_key="XXXXXXXXXXXXXXX")
settings = {
    "type": "structured",
    "model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
    "allFields": [
        {"name": "my_custom_audio_vector_1", "type": "custom_vector"},
        {"name": "my_custom_audio_vector_2", "type": "custom_vector"},
    ],
    "tensorFields": ["my_custom_audio_vector_1", "my_custom_audio_vector_2"],
}
mq.create_index("my-structured-custom-vector-index", settings_dict=settings)
# Add the custom vectors (no mappings needed for a structured index)
mq.index("my-structured-custom-vector-index").add_documents(
    documents=[
        {
            "_id": "doc1",
            "my_custom_audio_vector_1": {
                # Put your own vector (of correct length) here.
                "vector": example_vector_1,
                "content": "Singing audio file",
            },
        },
        {
            "_id": "doc2",
            "my_custom_audio_vector_2": {
                # Put your own vector (of correct length) here.
                "vector": example_vector_2,
                "content": "Podcast audio file",
            },
        },
    ]
)
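The examples above ask for a vector "of correct length". A small client-side guard can catch a mismatch before the request is sent. This is a sketch only: custom_vector_field is a hypothetical helper, and the expected dimension of 512 simply mirrors the example vectors. The real value depends on the index's model or settings.

```python
from typing import Dict, Sequence


def custom_vector_field(
    vector: Sequence[float], content: str, expected_dim: int
) -> Dict[str, object]:
    """Build the {"vector": ..., "content": ...} value for a custom_vector field.

    Hypothetical helper: raises ValueError when the vector length does not
    match expected_dim, which the caller must supply for their own index.
    """
    if len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(vector)}")
    return {"vector": [float(x) for x in vector], "content": content}


checked_doc = {
    "_id": "doc1",
    "my_custom_audio_vector_1": custom_vector_field(
        [1 / (i + 1) for i in range(512)], "Singing audio file", expected_dim=512
    ),
}
```

The resulting document has the same shape as the documents passed to add_documents in the examples above.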
Text Field Language Mappings
For unstructured indexes created with Marqo 2.16 or later, you can specify a language for text fields to control lexical search behavior:
my_mappings = {
    "title": {"type": "text_field", "language": "fr"},
    "description": {"type": "text_field", "language": "es"},
    "content": {"type": "text_field", "language": "en"},
}
Language mappings affect how text is processed for lexical search operations. The specified language is used for:
- Tokenization
- Stemming
- Stop word removal
- Other language-specific text processing
Notes
- A field's language is set the first time it is indexed. Attempting to index the same field with a different language will result in a 400 error.
- If no language is specified for a new field, that field will use automatic language detection.
- Omitting the language in subsequent mappings uses the previously set language.
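The three notes above can be modelled as a small state machine. The sketch below is an illustration, not Marqo's implementation. resolve_language, LanguageConflict, and the AUTO marker are hypothetical names, and the conflict is modelled as a Python exception where Marqo returns a 400 error. The notes do not say what happens when a field first indexed without a language later receives an explicit one. This sketch treats that case as a conflict, which is an assumption.

```python
from typing import Dict, Optional

AUTO = "auto-detect"  # hypothetical marker for automatic language detection


class LanguageConflict(Exception):
    """Stands in for the 400 error described in the notes."""


def resolve_language(known: Dict[str, str], field: str, requested: Optional[str]) -> str:
    """Return the language a field ends up with, recording first-time choices in known."""
    if field not in known:
        # First time the field is indexed: fix its language (or auto-detection).
        known[field] = requested if requested is not None else AUTO
        return known[field]
    if requested is not None and requested != known[field]:
        # Assumption: any differing explicit language counts as a conflict.
        raise LanguageConflict(
            f"field {field!r} already uses {known[field]!r}, got {requested!r}"
        )
    # Language omitted, or same as before: the previously set language applies.
    return known[field]


state: Dict[str, str] = {}
first = resolve_language(state, "title", "fr")
later = resolve_language(state, "title", None)
undeclared = resolve_language(state, "body", None)
```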
Supported Languages
The following language codes are supported:
- Arabic (ar)
- Catalan (ca)
- Danish (da)
- Dutch (nl)
- English (en)
- Finnish (fi)
- French (fr)
- German (de)
- Greek (el)
- Hungarian (hu)
- Indonesian (id)
- Irish (ga)
- Italian (it)
- Norwegian (nb)
- Portuguese (pt)
- Romanian (ro)
- Russian (ru)
- Spanish (es)
- Swedish (sv)
- Turkish (tr)
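The list of codes above can be turned into a client-side check that runs before add_documents is called. This is a sketch under stated assumptions: SUPPORTED_LANGUAGES is copied from the list above, and unsupported_languages is a hypothetical helper, not part of the Marqo client.

```python
from typing import Any, Dict

# Language codes copied from the supported-languages list above.
SUPPORTED_LANGUAGES = {
    "ar": "Arabic", "ca": "Catalan", "da": "Danish", "nl": "Dutch",
    "en": "English", "fi": "Finnish", "fr": "French", "de": "German",
    "el": "Greek", "hu": "Hungarian", "id": "Indonesian", "ga": "Irish",
    "it": "Italian", "nb": "Norwegian", "pt": "Portuguese", "ro": "Romanian",
    "ru": "Russian", "es": "Spanish", "sv": "Swedish", "tr": "Turkish",
}


def unsupported_languages(mappings: Dict[str, Dict[str, Any]]) -> Dict[str, str]:
    """Map each text_field entry with an unlisted language code to that code."""
    return {
        field: spec["language"]
        for field, spec in mappings.items()
        if spec.get("type") == "text_field"
        and "language" in spec
        and spec["language"] not in SUPPORTED_LANGUAGES
    }


candidate_mappings = {
    "title": {"type": "text_field", "language": "fr"},
    "summary": {"type": "text_field", "language": "xx"},  # deliberately invalid code
    "notes": {"type": "text_field"},  # no language: left to automatic detection
}
problems = unsupported_languages(candidate_mappings)
```

Entries without a language key are skipped, since such fields fall back to automatic language detection.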
Example: Adding Multilingual Documents
import marqo
mq = marqo.Client("http://localhost:8882", api_key=None)
# Create unstructured index
mq.create_index(
    index_name="multilingual-index",
    type="unstructured",
    model="hf/e5-base-v2"
)
# Define language mappings
language_mappings = {
    "title_en": {"type": "text_field", "language": "en"},
    "title_fr": {"type": "text_field", "language": "fr"},
    "title_es": {"type": "text_field", "language": "es"},
}
# Add documents with language mappings
mq.index("multilingual-index").add_documents(
    documents=[
        {
            "_id": "doc1",
            "title_en": "Brown shoes for men",
            "title_fr": "Chaussures marron pour hommes",
            "title_es": "Zapatos marrones para hombres"
        }
    ],
    tensor_fields=["title_en"],
    mappings=language_mappings
)
For Marqo Cloud, use the cloud URL and your API key:
import marqo
mq = marqo.Client("https://api.marqo.ai", api_key="XXXXXXXXXXXXXXX")
# Create unstructured index
mq.create_index(
    index_name="multilingual-index",
    type="unstructured",
    model="hf/e5-base-v2"
)
# Define language mappings
language_mappings = {
    "title_en": {"type": "text_field", "language": "en"},
    "title_fr": {"type": "text_field", "language": "fr"},
    "title_es": {"type": "text_field", "language": "es"},
}
# Add documents with language mappings
mq.index("multilingual-index").add_documents(
    documents=[
        {
            "_id": "doc1",
            "title_en": "Brown shoes for men",
            "title_fr": "Chaussures marron pour hommes",
            "title_es": "Zapatos marrones para hombres"
        }
    ],
    tensor_fields=["title_en"],
    mappings=language_mappings
)
Example: Searching with Language Mappings
Once you've indexed documents with language mappings, you can search them using the language parameter:
# Search in French
results = mq.index("multilingual-index").search(
    q="chaussures marron",
    search_method="LEXICAL",
    language="fr",
    searchable_attributes=["title_fr"]
)
# Hybrid search combining tensor and language-specific lexical search
results = mq.index("multilingual-index").search(
    q="brown shoes",
    search_method="HYBRID",
    language="en",
    hybrid_parameters={
        "retrievalMethod": "disjunction",
        "rankingMethod": "rrf"
    }
)