Optimising Search

Minimising tensor fields

Tensor embeddings are used to power Marqo's tensor search.

There is a natural tradeoff between tensor search and storage size, because tensor search requires storing full embeddings for each tensor field of each document. For certain fields, tensor search may not be worth that cost. For example, categorical fields such as a song's genre or a book's category may be strings, but they are mainly useful in keyword/lexical search or as conditions for pre-filtering tensor search.
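
To get a feel for the size of this tradeoff, the sketch below gives a rough back-of-the-envelope estimate. It is not a Marqo API: `estimate_embedding_bytes` is a hypothetical helper, and the numbers (one million documents, 384-dimensional float32 embeddings, one chunk per field) are assumptions chosen purely for illustration. Real storage also depends on how text is chunked and on index overhead.

```python
def estimate_embedding_bytes(num_docs, num_tensor_fields, embedding_dim,
                             chunks_per_field=1, bytes_per_value=4):
    """Rough size of the raw embedding data only, ignoring index overhead.

    Hypothetical helper for illustration; all default values are assumptions.
    """
    return (num_docs * num_tensor_fields * chunks_per_field
            * embedding_dim * bytes_per_value)


# Assumed workload: 1M documents, 384-dimensional float32 embeddings.
with_genre = estimate_embedding_bytes(1_000_000, 3, 384)     # Title, Description, Genre
without_genre = estimate_embedding_bytes(1_000_000, 2, 384)  # Title, Description

saved_mib = (with_genre - without_genre) / 2**20
print(f"Approximate raw embedding data saved: {saved_mib:.0f} MiB")
```

Under these assumptions, each extra tensor field adds the same fixed amount of embedding data per document, so dropping a field that gains little from tensor search is usually the cheapest saving available.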

Marqo lets you tune this tradeoff. When adding documents, we recommend that you avoid adding fields with categorical data as tensor fields. Fields left out of tensor_fields cannot be used in tensor search, but omitting them reduces the storage size. For example:

import marqo

mq = marqo.Client(url='http://localhost:8882')

mq.create_index("my-first-index")

mq.index("my-first-index").add_documents([
    {
        "Title": "The Travels of Marco Polo",
        "Description": "A 13th-century travelogue describing Polo's travels",
        "Genre": "History"
    }, 
    {
        "Title": "Extravehicular Mobility Unit (EMU)",
        "Description": "The EMU is a spacesuit that provides environmental protection, "
                        "mobility, life support, and communications for astronauts",
        "Genre": "Science"
    }], tensor_fields=["Title", "Description"]
)
The example above does not store tensors for the "Genre" field, but the field can still be used for lexical search and filtering. For example:
# Use lexical search to find documents that match a specific genre
lexical_results = mq.index("my-first-index").search(
    q="Science", searchable_attributes=["Genre"], search_method="LEXICAL"
)

# Filter out search results that aren't from our target genre
filtered_results = mq.index("my-first-index").search(
    q="Worms", filter_string="Genre:Biology"
)
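
When documents have many fields, it can be easier to derive the tensor field list by excluding the categorical fields than to write it out by hand. The following is a minimal sketch of that idea. `select_tensor_fields` is a hypothetical helper, not part of the Marqo client, and it assumes that every field other than `_id` and the listed categorical fields should be embedded.

```python
def select_tensor_fields(documents, categorical_fields):
    """Return the field names to embed, in first-seen order.

    Hypothetical helper: skips the given categorical fields and the
    "_id" field, and keeps every other field found in the documents.
    """
    excluded = set(categorical_fields) | {"_id"}
    selected = []
    for doc in documents:
        for name in doc:
            if name not in excluded and name not in selected:
                selected.append(name)
    return selected


docs = [
    {"Title": "The Travels of Marco Polo",
     "Description": "A 13th-century travelogue describing Polo's travels",
     "Genre": "History"},
    {"_id": "emu",
     "Title": "Extravehicular Mobility Unit (EMU)",
     "Genre": "Science"},
]

tensor_fields = select_tensor_fields(docs, categorical_fields=["Genre"])
print(tensor_fields)
```

The resulting list could then be passed as the tensor_fields argument of add_documents, as in the earlier example.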