Getting started with Marqo

Here you'll find everything you need to get started building your first tensor search application.

This guide walks you through setting up Marqo, creating indexes, adding documents, performing your first search, and other basic operations.

Setup and installation

We'll start by downloading and installing Marqo.

  1. Marqo requires Docker. To install Docker, go to the Docker Docs.
  2. Use Docker to run Marqo (users with M1-based Macs will need to go here):

    docker pull marqoai/marqo:0.0.13
    docker rm -f marqo
    docker run --name marqo -it --privileged -p 8882:8882 --add-host host.docker.internal:host-gateway marqoai/marqo:0.0.13
    
  3. Start indexing and searching! Let's look at a simple example below:

pip install marqo

import marqo

mq = marqo.Client(url='http://localhost:8882')

mq.index("my-first-index").add_documents([
    {
        "Title": "The Travels of Marco Polo",
        "Description": "A 13th-century travelogue describing Polo's travels"
    }, 
    {
        "Title": "Extravehicular Mobility Unit (EMU)",
        "Description": "The EMU is a spacesuit that provides environmental protection, "
                       "mobility, life support, and communications for astronauts",
        "_id": "article_591"
    }]
)

results = mq.index("my-first-index").search(
    q="What is the best outfit to wear on the moon?"
)

curl -XPOST -H 'Content-type: application/json' http://localhost:8882/indexes/my-first-index/documents -d '[
    {
        "Title": "The Travels of Marco Polo",
        "Description": "A 13th-century travelogue describing Polo'\''s travels"
    },
    {
        "Title": "Extravehicular Mobility Unit (EMU)",
        "Description": "The EMU is a spacesuit that provides environmental protection, mobility, life support, and communications for astronauts",
        "_id": "article_591"
    }
]'

curl -XPOST -H 'Content-type: application/json' http://localhost:8882/indexes/my-first-index/search -d '{
  "q":"What is the best outfit to wear on the moon?"
}'

npm install marqo --save

var Marqo = require('marqo');
var mq = new Marqo.DefaultApi();
mq.addDocuments([{
        Title: "The Travels of Marco Polo",
        Description: "A 13th-century travelogue describing Polo's travels"
    }, {
        Title: "Extravehicular Mobility Unit (EMU)",
        Description: "The EMU is a spacesuit that provides environmental protection, mobility, life support, and communications for astronauts",
        _id: "article_591"
    }],
    "my-first-index"
).then((res) => console.log(res));
mq.search("What is the best outfit to wear on the moon?").then((res) => console.log(res));

  • mq is the client that wraps the Marqo API.
  • add_documents() takes a list of documents, represented as Python dicts, for indexing.
  • If the index does not already exist, add_documents() creates it with default settings. If it exists, the documents are added to it.
  • You can optionally set a document's ID with the special _id field. Otherwise, Marqo will generate one.


Let's have a look at the results:

# let's print out the results:
import pprint
pprint.pprint(results)

The result:

{
    'hits': [
        {   
            'Title': 'Extravehicular Mobility Unit (EMU)',
            'Description': 'The EMU is a spacesuit that provides environmental protection, mobility, life support, and '
                           'communications for astronauts',
            '_highlights': {
                'Description': 'The EMU is a spacesuit that provides environmental protection, '
                               'mobility, life support, and communications for astronauts'
            },
            '_id': 'article_591',
            '_score': 1.2387788
        }, 
        {   
            'Title': 'The Travels of Marco Polo',
            'Description': "A 13th-century travelogue describing Polo's travels",
            '_highlights': {'Title': 'The Travels of Marco Polo'},
            '_id': 'e00d1a8d-894c-41a1-8e3b-d8b2a8fce12a',
            '_score': 1.2047464
        }
    ],
    'limit': 10,
    'processingTimeMs': 49,
    'query': 'What is the best outfit to wear on the moon?'
}

  • Each hit corresponds to a document that matched the search query.
  • Hits are ordered from best to worst match.
  • limit is the maximum number of hits to be returned. This can be set as a parameter during search.
  • Each hit has a _highlights field. This is the part of the document that best matched the query.
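
As a minimal sketch of working with this structure, the snippet below walks the hits of a result dict shaped like the one above and collects each document's ID, score, and best-matching field. The summarize_hits helper is a hypothetical name used for illustration, not part of the Marqo client, and the sample data is shortened from the output above.

```python
def summarize_hits(results):
    """Return one (id, score, highlighted_field) tuple per hit, in ranked order."""
    summary = []
    for hit in results["hits"]:
        # _highlights maps the best-matching field name to the matching text.
        highlights = hit.get("_highlights", {})
        field = next(iter(highlights), None)
        summary.append((hit["_id"], hit["_score"], field))
    return summary


# Shortened copy of the search result shown above (illustration only).
sample_results = {
    "hits": [
        {
            "Title": "Extravehicular Mobility Unit (EMU)",
            "_highlights": {"Description": "The EMU is a spacesuit"},
            "_id": "article_591",
            "_score": 1.2387788,
        },
        {
            "Title": "The Travels of Marco Polo",
            "_highlights": {"Title": "The Travels of Marco Polo"},
            "_id": "e00d1a8d-894c-41a1-8e3b-d8b2a8fce12a",
            "_score": 1.2047464,
        },
    ],
    "limit": 10,
}

for doc_id, score, field in summarize_hits(sample_results):
    print(f"{doc_id}  score={score:.4f}  matched on: {field}")
```

The same helper should work on the results variable returned by search(), assuming the response keeps the shape shown above.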

Other basic operations

Get document

Retrieve a document by ID.

result = mq.index("my-first-index").get_document(document_id="article_591")

Note that calling add_documents again with the same _id updates the existing document.
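
Because a repeated _id triggers an update, it can help to derive IDs deterministically so that re-running an indexing script does not create duplicates. The sketch below is one possible approach, not something Marqo requires: the with_stable_ids helper, the NAMESPACE value, and the choice of Title as the key field are all assumptions made for illustration.

```python
import uuid

# Hypothetical namespace for this illustration; any fixed UUID would do.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "my-first-index")


def with_stable_ids(documents, key_field="Title"):
    """Return copies of the documents, adding a deterministic _id where missing."""
    prepared = []
    for doc in documents:
        doc = dict(doc)  # shallow copy so the caller's dicts are left untouched
        if "_id" not in doc:
            # The same key_field value always produces the same _id.
            doc["_id"] = str(uuid.uuid5(NAMESPACE, doc[key_field]))
        prepared.append(doc)
    return prepared


docs = with_stable_ids([
    {
        "Title": "The Travels of Marco Polo",
        "Description": "A 13th-century travelogue describing Polo's travels",
    },
    {
        "Title": "Extravehicular Mobility Unit (EMU)",
        "_id": "article_591",  # an explicit _id is kept as-is
    },
])
print([doc["_id"] for doc in docs])
```

The prepared list can then be passed to add_documents in place of the raw documents, so indexing the same titles twice would update them rather than add new entries.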

Get index stats

Get information about an index.

results = mq.index("my-first-index").get_stats()

Lexical search

Perform a keyword search. This uses BM25 for retrieval ranking.

result = mq.index("my-first-index").search('marco polo', search_method="LEXICAL")
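
For intuition about how BM25 ranks documents, here is a simplified, self-contained sketch of the Okapi BM25 formula in plain Python. It is not Marqo's implementation: the whitespace tokenizer, the parameter values k1=1.2 and b=0.75, and the bm25_scores helper name are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each document against the query with a basic Okapi BM25."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Smoothed IDF: rarer terms contribute more, and the value stays positive.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            # Term frequency saturates via k1; b normalizes for document length.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


titles = ["The Travels of Marco Polo", "Extravehicular Mobility Unit (EMU)"]
print(bm25_scores("marco polo", titles))
```

Only the first title contains the query terms, so it receives a positive score while the second scores zero. Unlike tensor search, a lexical method like this matches on exact tokens only.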

Search specific fields

Restrict a search to specific fields. This example uses the default tensor search method:

result = mq.index("my-first-index").search('adventure', searchable_attributes=['Title'])

Multi modal search

To power image and text search, Marqo allows users to plug and play with CLIP models from HuggingFace. Note that if you do not configure multi modal search, image URLs will be treated as strings. To start indexing and searching with images, first create an index with a CLIP configuration, as below:

settings = {
    "treat_urls_and_pointers_as_images": True,  # allows us to find an image file and index it
    "model": "ViT-L/14"
}
response = mq.create_index("my-multimodal-index", **settings)

Images can then be added within documents as follows. You can use URLs from the internet (for example, S3) or images from the disk of the machine:

response = mq.index("my-multimodal-index").add_documents([{
    "My Image": "https://upload.wikimedia.org/wikipedia/commons/thumb/b/b3/Hipop%C3%B3tamo_%28Hippopotamus_amphibius%29%2C_parque_nacional_de_Chobe%2C_Botsuana%2C_2018-07-28%2C_DD_82.jpg/640px-Hipop%C3%B3tamo_%28Hippopotamus_amphibius%29%2C_parque_nacional_de_Chobe%2C_Botsuana%2C_2018-07-28%2C_DD_82.jpg",
    "Description": "The hippopotamus, also called the common hippopotamus or river hippopotamus, is a large semiaquatic mammal native to sub-Saharan Africa",
    "_id": "hippo-facts"
}])

You can then search using text as usual. Both text and image fields will be searched:

results = mq.index("my-multimodal-index").search('animal')

Setting searchable_attributes to the image field ['My Image'] ensures only images are searched in this index:

results = mq.index("my-multimodal-index").search('animal', searchable_attributes=['My Image'])

Searching using an image

To search using an image, provide the image link as the query.

results = mq.index("my-multimodal-index").search('https://upload.wikimedia.org/wikipedia/commons/thumb/9/96/Standing_Hippopotamus_MET_DP248993.jpg/1920px-Standing_Hippopotamus_MET_DP248993.jpg')

Delete index

Delete an index.

results = mq.index("my-first-index").delete()

Delete documents

Delete documents by ID.

results = mq.index("my-first-index").delete_documents(ids=["article_591", "article_602"])

Support

  • Join our Slack community and chat with other community members about ideas.
  • Marqo community meetings (coming soon!)