Document field types
Strings
These are vectorised if the field is specified in tensor_fields at index time.
Floats
These aren't vectorised, but can be used to filter search results.
Bools
These aren't vectorised, but can be used to filter search results.
Ints
These aren't vectorised, but can be used to filter search results.
Array
Currently, only arrays of strings are supported.
An array field must not be specified in tensor_fields at index time; otherwise, an error is thrown.
This type of field can be used to filter search results and for lexical search.
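The rules above can be checked on the client before indexing. The helper below, `check_fields`, is hypothetical (not part of the Marqo client): it classifies each top-level field as one of the basic types listed above and rejects arrays that contain non-strings or that appear in tensor_fields. Object fields such as multimodal combination and custom vector fields are out of scope for this sketch, and Marqo still performs its own validation at index time.

```python
from typing import Any, Dict, List


def check_fields(document: Dict[str, Any], tensor_fields: List[str]) -> Dict[str, str]:
    """Classify each top-level field using the basic types listed above.

    Hypothetical client-side pre-check; Marqo still does its own validation.
    Object fields (multimodal combination, custom vector) are not handled here.
    """
    kinds: Dict[str, str] = {}
    for name, value in document.items():
        if name == "_id":
            continue  # skip the reserved document ID
        # bool is checked before int because bool is a subclass of int in Python
        if isinstance(value, bool):
            kinds[name] = "bool"
        elif isinstance(value, int):
            kinds[name] = "int"
        elif isinstance(value, float):
            kinds[name] = "float"
        elif isinstance(value, str):
            # strings are only vectorised when listed in tensor_fields
            kinds[name] = "string (vectorised)" if name in tensor_fields else "string"
        elif isinstance(value, list):
            if not all(isinstance(item, str) for item in value):
                raise ValueError(f"{name}: only arrays of strings are supported")
            if name in tensor_fields:
                raise ValueError(f"{name}: array fields must not be tensor fields")
            kinds[name] = "array"
        else:
            raise ValueError(f"{name}: unsupported type {type(value).__name__}")
    return kinds
```

For instance, `check_fields({"_id": "1234", "Title": "Cool summer t-shirt", "my_tags": ["summer", "yellow"]}, ["Title"])` classifies Title as a vectorised string and my_tags as an array, while passing `["my_tags"]` as the tensor fields raises a `ValueError`.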
Example
import marqo

mq = marqo.Client(url="http://localhost:8882")

# index an array field called "my_tags", making sure it is not a tensor field
mq.index("my-index").add_documents(
    documents=[
        {"Title": "Cool summer t-shirt", "_id": "1234", "my_tags": ["summer", "yellow"]}
    ],
    tensor_fields=["Title"],
)

# do a search request that filters based on the tags
mq.index("my-index").search(
    q="Something to wear in warm weather",
    filter_string="(my_tags:yellow) AND (my_tags:summer)",
)
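When the tags come from a list, the filter string in the example above can be assembled programmatically. `tags_filter` is a hypothetical helper that emits one `(field:tag)` clause per tag and joins them with AND (document must have every tag) or OR (any tag). It assumes the tag values need no escaping, i.e. they contain no spaces or filter-syntax characters such as parentheses or colons.

```python
from typing import List


def tags_filter(field: str, tags: List[str], operator: str = "AND") -> str:
    """Join one "(field:tag)" clause per tag with AND or OR.

    Hypothetical helper. Assumes tag values contain no characters that would
    need escaping in the filter syntax (spaces, parentheses, colons, ...).
    """
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    if not tags:
        raise ValueError("at least one tag is required")
    return f" {operator} ".join(f"({field}:{tag})" for tag in tags)


print(tags_filter("my_tags", ["yellow", "summer"]))
```

The result can be passed directly as `filter_string` in the search call above; with the tags `["yellow", "summer"]` it reproduces the filter used there.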
Multimodal combination object
The multimodal combination object works with mappings.
This field can consist of multiple child fields. The contents of these child fields are vectorised and combined into a single tensor using a weighted-sum approach. The weights are specified in mappings.
Each child field must have an assigned weight.
The combined tensor is used for tensor search.
The multimodal combination field must be in tensor_fields.
Child fields can be used for lexical search or tensor search with filtering. All child fields and their contents must be strings (str).
Note that only a single vector is generated for a multimodal combination object per document. No chunking is applied.
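The weighted-sum idea can be illustrated with plain lists. In this sketch, toy 3-dimensional vectors stand in for the model's text and image embeddings, and the combined vector is the sum of each child vector scaled by its weight. Whether Marqo normalises the child vectors or the combined vector is not described here, so this is an illustration of the arithmetic only, not Marqo's internal implementation.

```python
from typing import Dict, List


def weighted_sum(child_vectors: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Combine child-field vectors into one vector: the sum of weight * vector.

    Illustration only; any normalisation Marqo may apply is not modelled.
    """
    # every child field must have an assigned weight
    missing = [name for name in child_vectors if name not in weights]
    if missing:
        raise ValueError(f"no weight assigned for child fields: {missing}")
    dims = {len(vec) for vec in child_vectors.values()}
    if len(dims) != 1:
        raise ValueError("child vectors must exist and share one dimension")
    combined = [0.0] * dims.pop()
    for name, vec in child_vectors.items():
        for i, x in enumerate(vec):
            combined[i] += weights[name] * x
    return combined


# Toy embeddings standing in for the real text and image vectors
vectors = {
    "my_text_attribute_1": [1.0, 0.0, 2.0],
    "my_image_attribute_1": [0.0, 1.0, 2.0],
}
print(weighted_sum(vectors, {"my_text_attribute_1": 0.5, "my_image_attribute_1": 0.5}))
```

With equal weights of 0.5, each component is the average of the two child vectors, so the result here is `[0.5, 0.5, 2.0]`.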
Example
import marqo

mq = marqo.Client(url="http://localhost:8882")

# Create an index with "open_clip/ViT-B-32/laion2b_s34b_b79k" that can vectorise both text and images.
settings = {
    "treat_urls_and_pointers_as_images": True,
    "model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
}
mq.create_index("my-index", **settings)

# Add the document to the index
mq.index("my-index").add_documents(
    documents=[
        {
            "_id": "111",
            "Title": "my document",
            "my_text_attribute_1": "Riding horse",
            "my_image_attribute_1": "https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image0.jpg",
        }
    ],
    tensor_fields=["combo_text_image"],
    mappings={
        "combo_text_image": {
            "type": "multimodal_combination",
            "weights": {"my_text_attribute_1": 0.5, "my_image_attribute_1": 0.5},
        }
    },
)

# tensor search
res = mq.index("my-index").search(q="Riding horse")

# lexical search
res = mq.index("my-index").search(
    q="Riding horse",
    search_method="LEXICAL",
)

# filter search
res = mq.index("my-index").search(
    q="Riding horse",
    filter_string="my_text_attribute_1:(Riding horse)",
)
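Before calling add_documents, the multimodal rules stated above (the combination field must be in tensor_fields, and every child field's content must be a str) can be checked locally. `check_multimodal` is a hypothetical helper, not a Marqo API; it treats the keys of the `weights` dictionary as the child fields, as in the mappings used above.

```python
from typing import Any, Dict, List


def check_multimodal(
    document: Dict[str, Any],
    mappings: Dict[str, Dict[str, Any]],
    tensor_fields: List[str],
) -> List[str]:
    """Return rule violations for multimodal_combination mappings (empty list if none).

    Hypothetical pre-check; Marqo performs its own validation when indexing.
    """
    problems: List[str] = []
    for field, mapping in mappings.items():
        if mapping.get("type") != "multimodal_combination":
            continue
        if field not in tensor_fields:
            problems.append(f"{field} must be in tensor_fields")
        # the child fields are the keys of the weights dictionary
        for child in mapping.get("weights", {}):
            if child in document and not isinstance(document[child], str):
                problems.append(f"{child} content must be a str")
    return problems


doc = {
    "_id": "111",
    "my_text_attribute_1": "Riding horse",
    "my_image_attribute_1": "https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image0.jpg",
}
mappings = {
    "combo_text_image": {
        "type": "multimodal_combination",
        "weights": {"my_text_attribute_1": 0.5, "my_image_attribute_1": 0.5},
    }
}
print(check_multimodal(doc, mappings, ["combo_text_image"]))
```

For the document and mappings in the example above this prints an empty list; leaving combo_text_image out of tensor_fields would produce one violation.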
Results
Here are the search results for the filter search:
{
    "hits": [
        {
            "Title": "my document",
            "_highlights": [
                {
                    "combo_text_image": "{'my_text_attribute_1': 'Riding horse', 'my_image_attribute_1': 'https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image0.jpg'}"
                }
            ],
            "_id": "111",
            "_score": 0.726511350523296,
            "my_image_attribute_1": "https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image0.jpg",
            "my_text_attribute_1": "Riding horse",
        }
    ],
    "limit": 10,
    "offset": 0,
    "processingTimeMs": 45,
    "query": "Riding horse",
}
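The response is a plain dictionary, so the matching document IDs and scores can be read straight from its `hits` list. The helper name `ids_and_scores` is hypothetical; the `res` dictionary below is an abbreviated copy of the response shown above.

```python
from typing import Any, Dict, List, Tuple


def ids_and_scores(res: Dict[str, Any]) -> List[Tuple[str, float]]:
    """Pull (_id, _score) pairs out of a search response, in the order returned."""
    return [(hit["_id"], hit["_score"]) for hit in res.get("hits", [])]


# Abbreviated copy of the filter-search response above
res = {
    "hits": [{"_id": "111", "_score": 0.726511350523296, "Title": "my document"}],
    "limit": 10,
    "offset": 0,
    "query": "Riding horse",
}
print(ids_and_scores(res))
```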
Custom vector object
The custom vector object allows you to insert your own vectors into a Marqo index. This is especially useful if you have vectors generated from another model that you want to use for search in Marqo.
For unstructured indexes, this field type requires mappings.
For structured indexes, this field type should only be declared upon index creation.
The mappings object for a custom vector should only have a type key, which should be set to custom_vector.
mappings = {"my_custom_vector": {"type": "custom_vector"}}
The document field content for a custom vector must be a dictionary with the keys vector (required) and content (optional). If content is left empty, it is set to the empty string "".
Name | Type | Default | Description |
---|---|---|---|
vector | List[Float] | none | Required. The list length must match the dimensions property of the model used in the index you are adding documents to. |
content | String | "" | Optional. Used for lexical search, filtering, and search result highlights. |
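A custom vector field value can be built and length-checked before it is sent. `custom_vector_field` is a hypothetical helper: `dimension` must be the dimensions property of the index's model (512 for "open_clip/ViT-B-32/laion2b_s34b_b79k", as noted in the example below), and it fills `content` with "" when none is given, mirroring the default in the table.

```python
from typing import Any, Dict, List, Optional


def custom_vector_field(
    vector: List[float], dimension: int, content: Optional[str] = None
) -> Dict[str, Any]:
    """Build the {"vector": ..., "content": ...} value for a custom vector field.

    Hypothetical helper; `dimension` must equal the model's dimensions property.
    """
    if len(vector) != dimension:
        raise ValueError(f"vector has length {len(vector)}, expected {dimension}")
    # An omitted content defaults to "" (see the table above); fill it explicitly here
    return {
        "vector": [float(x) for x in vector],
        "content": "" if content is None else content,
    }


field = custom_vector_field([0.1] * 512, 512, "Singing audio file")
print(len(field["vector"]), field["content"])
```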
The custom vector field must be in tensor_fields.
Note that only a single chunk (containing the given vector) is generated for a custom vector object per document. Marqo does not vectorise anything for this field.
Marqo does not support custom vector fields as dependent fields of multimodal combination fields.
No normalization is done on custom vectors. Because of this, using prenormalized-angular as your annParameters.spaceType will result in unexpected behavior when searching with custom vectors. Instead, set it to a different space type, such as angular or euclidean.
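The arithmetic behind this warning can be shown with two vectors that point in the same direction but have different lengths. Angular (cosine) similarity divides by both norms, so length does not matter; a raw dot product does depend on length. The assumption here, suggested by the name but not stated in this section, is that prenormalized-angular relies on inputs already being unit length so that a dot product equals cosine similarity.

```python
import math
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: List[float]) -> List[float]:
    # Scale the vector to unit length
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: List[float], b: List[float]) -> float:
    # Angular/cosine similarity divides by both norms, so vector length cancels out
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, twice the length
print(cosine(a, b))  # 1.0: identical direction
print(dot(a, b))  # 50.0: the raw dot product grows with vector length
print(dot(l2_normalize(a), l2_normalize(b)))  # approximately 1.0 once both are unit length
```

Because Marqo leaves custom vectors as they are, a vector like `[i for i in range(512)]` is far from unit length, which is why a space type that normalizes during comparison, such as angular, is the safer choice.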
Example
import marqo

mq = marqo.Client(url="http://localhost:8882")

# Create an index with a model whose dimensions match your custom vectors. For example: "open_clip/ViT-B-32/laion2b_s34b_b79k" (dimension is 512).
# Only the model dimension matters, as we are not vectorising anything when using custom vector fields.
# Space type CANNOT be 'prenormalized-angular' for custom vectors, as they are not normalized.
settings = {
    "treat_urls_and_pointers_as_images": True,
    "model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
    "ann_parameters": {
        "spaceType": "angular",
        "parameters": {"efConstruction": 512, "m": 16},
    },
}
mq.create_index("my-first-index", **settings)

# Placeholder vectors for example purposes. Replace these with your own.
example_vector_1 = [float(i) for i in range(512)]
example_vector_2 = [1 / (i + 1) for i in range(512)]

# Add the custom vector documents to the index (with mappings)
res = mq.index("my-first-index").add_documents(
    documents=[
        {
            "_id": "doc1",
            "my_custom_vector": {
                # Put your own vector (of correct length) here.
                "vector": example_vector_1,
                "content": "Singing audio file",
            },
        },
        {
            "_id": "doc2",
            "my_custom_vector": {
                # Put your own vector (of correct length) here.
                "vector": example_vector_2,
                "content": "Podcast audio file",
            },
        },
    ],
    mappings={"my_custom_vector": {"type": "custom_vector"}},
    tensor_fields=["my_custom_vector"],
)

# Tensor Search
# Use the `context` search parameter to search with your own vectors.
res = mq.index("my-first-index").search(
    q={"dummy text": 0},
    context={
        "tensor": [{"vector": example_vector_1, "weight": 1}]  # custom vector from doc1
    },
)
print(res)

# Lexical Search
# You can search for the text in the `content` field.
res = mq.index("my-first-index").search(q="Podcast audio file", search_method="lexical")
print(res)

# Filter search
res = mq.index("my-first-index").search(
    q="A rider is riding a horse jumping over the barrier.",
    filter_string="my_custom_vector:(Singing audio file)",
)
print(res)
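The `context` argument in the tensor search above is a dictionary with a `tensor` list of `{"vector": ..., "weight": ...}` entries. `tensor_context` is a hypothetical helper that only assembles that structure and checks each vector's length against the model dimension; how Marqo scores several weighted entries is not modelled here.

```python
from typing import Any, Dict, List, Sequence, Tuple


def tensor_context(
    weighted_vectors: Sequence[Tuple[List[float], float]], dimension: int
) -> Dict[str, Any]:
    """Build a `context` argument shaped like the one in the tensor search example.

    Hypothetical helper: assembles {"tensor": [{"vector": ..., "weight": ...}, ...]}
    and validates vector lengths; it does not reproduce Marqo's scoring.
    """
    entries = []
    for vector, weight in weighted_vectors:
        if len(vector) != dimension:
            raise ValueError(f"vector has length {len(vector)}, expected {dimension}")
        entries.append({"vector": list(vector), "weight": weight})
    return {"tensor": entries}


example_vector_1 = [float(i) for i in range(512)]
context = tensor_context([(example_vector_1, 1)], dimension=512)
print(len(context["tensor"]), context["tensor"][0]["weight"])
```

The result can then be passed as `context=context` alongside `q={"dummy text": 0}`, exactly as in the search call above.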
# For structured indexes, the custom vector field should be declared upon index creation (with type `custom_vector`).
# Create an index with a model whose dimensions match your custom vectors. For example: "open_clip/ViT-B-32/laion2b_s34b_b79k" (dimension is 512).
# Only the model dimension matters, as we are not vectorising anything when using custom vector fields.
# Space type CANNOT be 'prenormalized-angular' for custom vectors, as they are not normalized.
settings = {
    "type": "structured",
    "model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
    "all_fields": [
        {
            "name": "my_custom_vector",
            "type": "custom_vector",
            "features": ["lexical_search", "filter"],
        }
    ],
    "tensor_fields": ["my_custom_vector"],
    "ann_parameters": {
        "spaceType": "angular",
        "parameters": {"efConstruction": 512, "m": 16},
    },
}
mq.create_index("my-first-structured-index", **settings)

# Placeholder vectors for example purposes. Replace these with your own.
example_vector_1 = [float(i) for i in range(512)]
example_vector_2 = [1 / (i + 1) for i in range(512)]

# Add the custom vector documents to the structured index.
# We do NOT use mappings for custom vectors here.
res = mq.index("my-first-structured-index").add_documents(
    documents=[
        {
            "_id": "doc1",
            "my_custom_vector": {
                # Put your own vector (of correct length) here.
                "vector": example_vector_1,
                "content": "Singing audio file",
            },
        },
        {
            "_id": "doc2",
            "my_custom_vector": {
                # Put your own vector (of correct length) here.
                "vector": example_vector_2,
                "content": "Podcast audio file",
            },
        },
    ]
)

# Tensor Search
# Use the `context` search parameter to search with your own vectors.
res = mq.index("my-first-structured-index").search(
    q={"dummy text": 0},
    context={
        "tensor": [{"vector": example_vector_1, "weight": 1}]  # custom vector from doc1
    },
)
print(res)

# Lexical Search
# You can search for the text in the `content` field.
res = mq.index("my-first-structured-index").search(
    q="Podcast audio file", search_method="lexical"
)
print(res)

# Filter search
res = mq.index("my-first-structured-index").search(
    q="A rider is riding a horse jumping over the barrier.",
    filter_string="my_custom_vector:(Singing audio file)",
)
print(res)
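If several structured indexes need a custom vector field, the settings dictionary from the structured example can be generated rather than copied. `structured_custom_vector_settings` is a hypothetical helper: with its defaults it returns the same dictionary as the structured example above, and it refuses prenormalized-angular because custom vectors are not normalized. The efConstruction and m values are simply the ones used in that example.

```python
from typing import Any, Dict, Sequence


def structured_custom_vector_settings(
    field: str,
    model: str,
    space_type: str = "angular",
    features: Sequence[str] = ("lexical_search", "filter"),
) -> Dict[str, Any]:
    """Return create_index settings for a structured index with one custom vector field.

    Hypothetical helper mirroring the structured example; not a Marqo API.
    """
    # Custom vectors are not normalized, so reject the space type that assumes they are
    if space_type == "prenormalized-angular":
        raise ValueError("use a space type such as 'angular' or 'euclidean' for custom vectors")
    return {
        "type": "structured",
        "model": model,
        "all_fields": [{"name": field, "type": "custom_vector", "features": list(features)}],
        "tensor_fields": [field],
        "ann_parameters": {
            "spaceType": space_type,
            "parameters": {"efConstruction": 512, "m": 16},
        },
    }


settings = structured_custom_vector_settings(
    "my_custom_vector", "open_clip/ViT-B-32/laion2b_s34b_b79k"
)
print(settings["all_fields"][0])
```

The dictionary is then unpacked into `mq.create_index(...)` with `**settings`, as in the example above.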