
Getting started with Marqo

Here you'll find everything you need to get started building your first end-to-end vector search application.

This guide walks you through setting up Marqo, creating an index, adding documents, running your first search, and other basic operations.


Setup and installation

We'll start by downloading and installing Marqo.

  1. Marqo requires Docker. To install Docker, go to the Docker Docs.
  2. Use docker to run Marqo:

    docker pull marqoai/marqo:latest
    docker rm -f marqo
    docker run --name marqo -it -p 8882:8882 marqoai/marqo:latest
    
  3. Start indexing and searching! Let's look at a simple example below:

pip install marqo

import marqo

mq = marqo.Client(url="http://localhost:8882")

mq.create_index("my-first-index", model="hf/e5-base-v2")

mq.index("my-first-index").add_documents(
    [
        {
            "Title": "The Travels of Marco Polo",
            "Description": "A 13th-century travelogue describing Polo's travels",
        },
        {
            "Title": "Extravehicular Mobility Unit (EMU)",
            "Description": "The EMU is a spacesuit that provides environmental protection, "
            "mobility, life support, and communications for astronauts",
            "_id": "article_591",
        },
    ],
    tensor_fields=["Description"],
)

results = mq.index("my-first-index").search(
    q="What is the best outfit to wear on the moon?"
)

The same steps using cURL:

curl -X POST -H 'Content-type: application/json' http://localhost:8882/indexes/my-first-index -d '{
        "model": "hf/e5-base-v2"
    }'

curl -X POST -H 'Content-type: application/json' http://localhost:8882/indexes/my-first-index/documents -d '{
    "documents":[
        {
            "Title": "The Travels of Marco Polo",
            "Description": "A 13th-century travelogue describing Polo'\''s travels"
        },
        {
            "Title": "Extravehicular Mobility Unit (EMU)",
            "Description": "The EMU is a spacesuit that provides environmental protection, mobility, life support, and communications for astronauts",
            "_id": "article_591"
        }
    ],
    "tensorFields": ["Description"]
}'

curl -X POST -H 'Content-type: application/json' http://localhost:8882/indexes/my-first-index/search  -d '{
  "q":"What is the best outfit to wear on the moon?"
}'

  • mq is the client that wraps the Marqo API.
  • create_index() creates a new index with default settings. Here we explicitly set the model to hf/e5-base-v2, which is also the default model. Experimenting with different models is often necessary to get the best retrieval for your use case, and different models trade off inference speed against relevancy. See the Marqo documentation for the full list of models.
  • add_documents() takes a list of documents, represented as Python dicts, for indexing.
  • The tensor_fields parameter specifies which fields are indexed as tensor fields and are therefore searchable with vector search.
  • You can optionally set a document's ID with the special _id field. Otherwise, Marqo will generate one.


Let's have a look at the results:

# let's print out the results:
import pprint

pprint.pprint(results)

The result:

{
    "hits": [
        {
            "Title": "Extravehicular Mobility Unit (EMU)",
            "Description": "The EMU is a spacesuit that provides environmental protection, mobility, life support, and "
                           "communications for astronauts",
            "_highlights": [{
                "Description": "The EMU is a spacesuit that provides environmental protection, "
                               "mobility, life support, and communications for astronauts"
            }],
            "_id": "article_591",
            "_score": 1.2387788
        }, 
        {   
            "Title": "The Travels of Marco Polo",
            "Description": "A 13th-century travelogue describing Polo's travels",
            "_highlights": [{"Title": "The Travels of Marco Polo"}],
            "_id": "e00d1a8d-894c-41a1-8e3b-d8b2a8fce12a",
            "_score": 1.2047464
        }
    ],
    "limit": 10,
    "processingTimeMs": 49,
    "query": "What is the best outfit to wear on the moon?"
}

  • Each hit corresponds to a document that matched the search query.
  • Hits are ordered from best match to worst match.
  • limit is the maximum number of hits to return. It can be set as a parameter during search.
  • Each hit has a _highlights field. This is the part of the document that matched the query best.
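
As a minimal sketch of working with this response, the snippet below walks over the hits and pulls out the ID, score, and highlighted field names for each one. The results dict is a trimmed, hand-copied version of the response above, and summarize_hits is a hypothetical helper name, not part of the Marqo client.

```python
# A trimmed copy of the search response shown above (values copied by hand).
results = {
    "hits": [
        {
            "Title": "Extravehicular Mobility Unit (EMU)",
            "_highlights": [
                {
                    "Description": "The EMU is a spacesuit that provides environmental protection, "
                    "mobility, life support, and communications for astronauts"
                }
            ],
            "_id": "article_591",
            "_score": 1.2387788,
        },
        {
            "Title": "The Travels of Marco Polo",
            "_highlights": [{"Title": "The Travels of Marco Polo"}],
            "_id": "e00d1a8d-894c-41a1-8e3b-d8b2a8fce12a",
            "_score": 1.2047464,
        },
    ],
    "limit": 10,
}


def summarize_hits(response):
    """Return (id, score, highlighted field names) for each hit, in ranked order."""
    summary = []
    for hit in response.get("hits", []):
        # _highlights is a list of {field_name: matching_text} dicts in the response above.
        fields = [name for highlight in hit.get("_highlights", []) for name in highlight]
        summary.append((hit["_id"], hit["_score"], fields))
    return summary


for doc_id, score, fields in summarize_hits(results):
    print(f"{doc_id}  score={score:.4f}  highlighted={fields}")
```

Because the hits arrive already ranked, the helper preserves their order instead of sorting again.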

Other basic operations

Get document

Retrieve a document by ID.

result = mq.index("my-first-index").get_document(document_id="article_591")

Note that adding a document with add_documents again using the same _id will update the existing document.
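
Since re-adding a document under the same _id updates it, one way to avoid accidental duplicates is to derive the _id from the document's own content before indexing. The sketch below is one possible approach, not something Marqo requires: with_stable_id is a hypothetical helper, and hashing the Title field is an assumption about which field identifies a document.

```python
import hashlib


def with_stable_id(doc, key_field="Title"):
    """Return a copy of doc with a deterministic _id derived from key_field.

    A document that already has an _id is copied unchanged.
    """
    if "_id" in doc:
        return dict(doc)
    digest = hashlib.sha1(str(doc[key_field]).encode("utf-8")).hexdigest()
    return {**doc, "_id": f"doc_{digest[:16]}"}


docs = [
    {"Title": "The Travels of Marco Polo", "Description": "A 13th-century travelogue"},
    {"Title": "Extravehicular Mobility Unit (EMU)", "_id": "article_591"},
]
prepared = [with_stable_id(d) for d in docs]

# `prepared` could then be passed to add_documents(). Preparing the same Title
# again yields the same _id, so a second add would target the same document.
for d in prepared:
    print(d["_id"], "-", d["Title"])
```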

Get index stats

Get information about an index.

results = mq.index("my-first-index").get_stats()

Lexical search

Perform a keyword search. This uses BM25 to rank the results.

result = mq.index("my-first-index").search("marco polo", search_method="LEXICAL")
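
To give a feel for what BM25 ranking does, here is a small, self-contained sketch of the textbook BM25 formula in plain Python. It is not Marqo's implementation: the whitespace tokenizer, the k1 and b defaults, and the function name bm25_scores are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with the textbook BM25 formula."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            # Document frequency: how many documents contain the term at all.
            df = sum(1 for other in tokenized if term in other)
            if df == 0:
                continue
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            tf = counts[term]
            # Length normalisation: long documents need more matches to score as well.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "The Travels of Marco Polo",
    "Extravehicular Mobility Unit (EMU)",
]
scores = bm25_scores("marco polo", docs)
for doc, score in zip(docs, scores):
    print(f"{score:.3f}  {doc}")
```

Only documents that contain the query terms receive a non-zero score, which is the key difference from the vector search used earlier.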

Multimodal and cross-modal search

To power image and text search, Marqo lets you plug and play with CLIP models from HuggingFace. Note that if you do not configure multimodal search, image URLs are treated as strings. To start indexing and searching with images, first create an index with a CLIP configuration, as below:

settings = {
    "treat_urls_and_pointers_as_images": True,  # allows us to find an image file and index it
    "model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
}
response = mq.create_index("my-multimodal-index", **settings)

Images can then be added within documents as follows. You can use URLs from the internet (for example, S3) or files on the machine's disk:

response = mq.index("my-multimodal-index").add_documents(
    [
        {
            "My_Image": "https://upload.wikimedia.org/wikipedia/commons/thumb/b/b3/Hipop%C3%B3tamo_%28Hippopotamus_amphibius%29%2C_parque_nacional_de_Chobe%2C_Botsuana%2C_2018-07-28%2C_DD_82.jpg/640px-Hipop%C3%B3tamo_%28Hippopotamus_amphibius%29%2C_parque_nacional_de_Chobe%2C_Botsuana%2C_2018-07-28%2C_DD_82.jpg",
            "Description": "The hippopotamus, also called the common hippopotamus or river hippopotamus, is a large semiaquatic mammal native to sub-Saharan Africa",
            "_id": "hippo-facts",
        }
    ],
    tensor_fields=["My_Image"],
)

You can then search using text as usual.

results = mq.index("my-multimodal-index").search("animal")

Searching using an image

To search using an image, provide the image URL as the query.

results = mq.index("my-multimodal-index").search(
    "https://docs.marqo.ai/2.0.0/Examples/marqo.jpg"
)

Searching using weights in queries

Queries can also be provided as dictionaries, where each key is a query and the corresponding value is its weight. This allows more advanced queries made up of multiple components, each weighted towards or against. A negative weight negates a component.

The example below shows the application of this to a scenario where a user may want to ask a question but also negate results that match a certain semantic criterion.

import marqo
import pprint

mq = marqo.Client(url="http://localhost:8882")
mq.create_index("my-weighted-query-index")

mq.index("my-weighted-query-index").add_documents(
    [
        {
            "Title": "Smartphone",
            "Description": "A smartphone is a portable computer device that combines mobile telephone "
            "functions and computing functions into one unit.",
        },
        {
            "Title": "Telephone",
            "Description": "A telephone is a telecommunications device that permits two or more users to "
            "conduct a conversation when they are too far apart to be easily heard directly.",
        },
        {
            "Title": "Thylacine",
            "Description": "The thylacine, also commonly known as the Tasmanian tiger or Tasmanian wolf, "
            "is an extinct carnivorous marsupial. "
            "The last known of its species died in 1936.",
        },
    ],
    tensor_fields=["Description"],
)

# initially we ask for a type of communications device which is popular in the 21st century
query = {
    # a weighting of 1.1 gives this query slightly more importance
    "I need to buy a communications device, what should I get?": 1.1,
    # a weighting of 1 gives this query a neutral importance
    "Technology that became prevalent in the 21st century": 1.0,
}

results = mq.index("my-weighted-query-index").search(q=query)

print("Query 1:")
pprint.pprint(results)

# now we ask for a type of communications which predates the 21st century
query = {
    # a weighting of 1 gives this query a neutral importance
    "I need to buy a communications device, what should I get?": 1.0,
    # a weighting of -1 gives this query a negation effect
    "Technology that became prevalent in the 21st century": -1.0,
}

results = mq.index("my-weighted-query-index").search(q=query)

print("\nQuery 2:")
pprint.pprint(results)
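
To build intuition for why a negative weight pushes results away from a concept, the sketch below combines toy query vectors with a weighted sum and compares the result to toy document vectors using cosine similarity. This is an assumption about how weighted queries could be combined, not a description of Marqo's internals. The 2-D vectors are invented for illustration and the helpers combine and cosine are hypothetical.

```python
import math


def combine(weighted_vectors):
    """Weighted sum of (vector, weight) pairs, scaled to unit length."""
    dim = len(weighted_vectors[0][0])
    total = [0.0] * dim
    for vec, weight in weighted_vectors:
        for i, x in enumerate(vec):
            total[i] += weight * x
    norm = math.sqrt(sum(x * x for x in total))
    return [x / norm for x in total] if norm else total


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Invented 2-D "embeddings": axis 0 ~ "communications device", axis 1 ~ "21st century".
device_query = [1.0, 0.0]
modern_query = [0.0, 1.0]
smartphone = [0.8, 0.6]
telephone = [0.9, -0.1]

boosted = combine([(device_query, 1.1), (modern_query, 1.0)])
negated = combine([(device_query, 1.0), (modern_query, -1.0)])

for name, query in [("boosted", boosted), ("negated", negated)]:
    print(name, "smartphone:", round(cosine(query, smartphone), 3),
          "telephone:", round(cosine(query, telephone), 3))
```

With these toy vectors the positively weighted query favours the smartphone, while the negatively weighted one favours the telephone, mirroring the intent of Query 1 and Query 2 above.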

Creating and searching indexes with multimodal combination fields

Marqo lets you create indexes with multimodal combination fields. A multimodal combination field combines text and images into one field, so documents are scored across the combined text and image content together. It also produces a single vector representation instead of several, which saves storage. The relative weighting of each component can be set per document.

The example below demonstrates this with retrieval of caption and image pairs using multiple types of queries.

import marqo
import pprint

mq = marqo.Client(url="http://localhost:8882")

settings = {
    "treat_urls_and_pointers_as_images": True,
    "model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
}

mq.create_index("my-first-multimodal-index", **settings)

mq.index("my-first-multimodal-index").add_documents(
    [
        {
            "Title": "Flying Plane",
            "caption": "An image of a passenger plane flying in front of the moon.",
            "image": "https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image2.jpg",
        },
        {
            "Title": "Red Bus",
            "caption": "A red double decker London bus traveling to Aldwych",
            "image": "https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image4.jpg",
        },
        {
            "Title": "Horse Jumping",
            "caption": "A person riding a horse over a jump in a competition.",
            "image": "https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image1.jpg",
        },
    ],
    # note that captioned_image must be a tensor field
    tensor_fields=["captioned_image"],
    # Create the mappings, here we define our captioned_image mapping
    # which weights the image more heavily than the caption - these pairs
    # will be represented by a single vector in the index
    mappings={
        "captioned_image": {
            "type": "multimodal_combination",
            "weights": {"caption": 0.3, "image": 0.7},
        }
    },
)

# Search this index with a simple text query
results = mq.index("my-first-multimodal-index").search(
    q="Give me some images of vehicles and modes of transport. I am especially interested in air travel and commercial aeroplanes."
)

print("Query 1:")
pprint.pprint(results)

# search the index with a query that uses weighted components
results = mq.index("my-first-multimodal-index").search(
    q={
        "What are some vehicles and modes of transport?": 1.0,
        "Aeroplanes and other things that fly": -1.0,
    }
)
print("\nQuery 2:")
pprint.pprint(results)

results = mq.index("my-first-multimodal-index").search(
    q={"Animals of the Perissodactyla order": -1.0}
)
print("\nQuery 3:")
pprint.pprint(results)
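
The example above relies on two details that are easy to get wrong: the combination field must appear in tensor_fields, and the sub-fields named in weights should exist in the documents. A minimal local sanity check, run before calling add_documents, could look like the sketch below. Both helpers are hypothetical and only inspect plain Python dicts. They do not reproduce any validation Marqo itself performs.

```python
def combination_mapping(field_name, weights):
    """Build a mappings entry for one multimodal combination field."""
    return {field_name: {"type": "multimodal_combination", "weights": dict(weights)}}


def check_mappings(docs, mappings, tensor_fields):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    for name, spec in mappings.items():
        if name not in tensor_fields:
            problems.append(f"{name} is not listed in tensor_fields")
        for sub_field in spec["weights"]:
            # Collect the positions of documents that lack this sub-field.
            missing = [i for i, doc in enumerate(docs) if sub_field not in doc]
            if missing:
                problems.append(f"{sub_field} missing from documents {missing}")
    return problems


docs = [
    {"Title": "Flying Plane", "caption": "A passenger plane.", "image": "image2.jpg"},
    {"Title": "Red Bus", "caption": "A red double decker London bus."},
]
mappings = combination_mapping("captioned_image", {"caption": 0.3, "image": 0.7})

for problem in check_mappings(docs, mappings, tensor_fields=["captioned_image"]):
    print("warning:", problem)
```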

Delete documents

Delete documents by ID.

results = mq.index("my-first-index").delete_documents(
    ids=["article_591", "article_602"]
)

Delete index

Delete an index.

results = mq.index("my-first-index").delete()

Support

  • Join our Slack community and chat with other community members about ideas.
  • Marqo community meetings (coming soon!)