Documents

Add or replace documents

POST /indexes/{index_name}/documents

Add an array of documents or replace them if they already exist. If the provided index does not exist, it will be created.

If you send a document with an _id that corresponds to an existing document, the new document will overwrite the existing document.

This endpoint accepts the application/json content type.

Path parameters

| Name | Type | Description |
| --- | --- | --- |
| index_name | String | Name of the index |

Query parameters

| Query Parameter | Type | Default Value | Description |
| --- | --- | --- | --- |
| refresh | Boolean | true | Forces a refresh after adding documents, which makes them available for searching. If you can wait for the system to refresh on its own, set this to false for better performance. |
| batch_size | Integer | 0 | If greater than 0, documents are added in batches of this size. This reduces the number of internal IO operations and speeds up indexing, which is useful when indexing a large volume of documents. When using the Python client, use client_batch_size to set the size of batches sent from client to server, and server_batch_size to set the maximum batch size processed by the server. For a large volume of large documents, client_batch_size = 20 is a good default. |
| processes | Integer | 1 | The number of processes Marqo uses to index the documents. Increase this number to speed up indexing, at the cost of using more server resources. |
| device | String | null | The device used to index the documents. This lets you use CUDA GPUs to speed up indexing, if available. Defaults to the default device set on Marqo. Options include cpu, cuda, cuda1, cuda2, and so on. The cuda option tells Marqo to use all available CUDA devices. |
| non_tensor_fields | Array of Strings | [] | Fields in these documents for which no tensors are created. Tensor search cannot be performed on these fields in these documents; pre-filtering and lexical search still work. |
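
As a rough illustration of how these query parameters end up in the request URL, the sketch below builds the URL with the Python standard library. The helper name build_documents_url is hypothetical, and writing booleans in lowercase is an assumption based on the query strings shown elsewhere on this page. Array parameters such as non_tensor_fields are repeated once per value.

```python
from urllib.parse import quote, urlencode


def build_documents_url(base_url, index_name, **query):
    """Build the documents URL; list values become repeated query keys."""
    params = {}
    for key, value in query.items():
        if isinstance(value, bool):
            # Assumption: booleans are written in lowercase (true / false)
            params[key] = "true" if value else "false"
        else:
            params[key] = value
    url = f"{base_url}/indexes/{quote(index_name, safe='')}/documents"
    # doseq=True expands a list into key=a&key=b
    query_string = urlencode(params, doseq=True)
    return f"{url}?{query_string}" if query_string else url


print(build_documents_url(
    "http://localhost:8882",
    "my-first-index",
    refresh=False,
    non_tensor_fields=["Title", "Genre"],
))
```

The printed URL can be passed to any HTTP client together with the JSON body described in the next section.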

Body

An array of documents. Each document is represented as a JSON object.

You can optionally set a document's ID with the special _id field, which must be a string. If you omit _id, Marqo generates one.

[
  {
    "Title": "The Travels of Marco Polo",
    "Description": "A 13th-century travelogue describing Polo's travels"
  }, 
  {
    "Title": "Extravehicular Mobility Unit (EMU)",
    "Description": "The EMU is a spacesuit that provides environmental protection",
    "_id": "article_591"
  }
]
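
Because _id must be a string, it can help to check a batch on the client before sending it. The following is only a sketch of such a check; find_invalid_ids is a hypothetical helper and is not part of any Marqo client.

```python
def find_invalid_ids(documents):
    """Return the positions of documents whose _id is present but not a string."""
    return [
        position
        for position, doc in enumerate(documents)
        if "_id" in doc and not isinstance(doc["_id"], str)
    ]


documents = [
    # No _id: Marqo generates one
    {"Title": "The Travels of Marco Polo"},
    # String _id: accepted
    {"Title": "Extravehicular Mobility Unit (EMU)", "_id": "article_591"},
    # Integer _id: not a string, so it is reported (made-up document)
    {"Title": "Hypothetical bad document", "_id": 591},
]

print(find_invalid_ids(documents))  # prints [2]
```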

Example

cURL

curl -XPOST 'http://localhost:8882/indexes/my-first-index/documents?non_tensor_fields=Title&non_tensor_fields=Genre' \
-H 'Content-type:application/json' -d '
[
    {
        "Title": "The Travels of Marco Polo",
        "Description": "A 13th-century travelogue describing the travels of Polo",
        "Genre": "History"
    },
    {
        "Title": "Extravehicular Mobility Unit (EMU)",
        "Description": "The EMU is a spacesuit that provides environmental protection",
        "_id": "article_591",
        "Genre": "Science"
    }
]'

Python

import marqo

# Connect to the same local Marqo instance used in the cURL example
mq = marqo.Client(url="http://localhost:8882")

mq.index("my-first-index").add_documents([
    {
        "Title": "The Travels of Marco Polo",
        "Description": "A 13th-century travelogue describing the travels of Polo",
        "Genre": "History"
    },
    {
        "Title": "Extravehicular Mobility Unit (EMU)",
        "Description": "The EMU is a spacesuit that provides environmental protection",
        "_id": "article_591",
        "Genre": "Science"
    }], non_tensor_fields=["Title", "Genre"]
)

JS

mq.addDocuments([{
        Title: "The Travels of Marco Polo",
        Description: "A 13th-century travelogue describing the travels of Polo"
    }, {
        Title: "Extravehicular Mobility Unit (EMU)",
        Description: "The EMU is a spacesuit that provides environmental protection",
        _id: "article_591"
    }],
    "my-first-index"
)

Response: 200 OK

{
   "errors":false,
   "items":[
      {
         "_id":"5aed93eb-3878-4f12-bc92-0fda01c7d23d",
         "result":"created",
         "status":201
      },
      {
         "_id":"article_591",
         "result":"updated",
         "status":200
      }
   ],
   "processingTimeMs":6,
   "index_name":"my-first-index"
}

The first document had its _id generated by Marqo. A document with the _id article_591 already existed in the index, so it was updated rather than created. In both the cURL and Python examples, no tensors are created for the Title and Genre fields of these documents, so those fields cannot be searched with tensor search. The JS client does not currently support non_tensor_fields.
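
To act on a response like this one, a client can group the returned IDs by their result value. This is a minimal sketch that assumes the response has already been parsed into a Python dictionary with the shape shown above; summarize_items is a hypothetical helper.

```python
def summarize_items(response):
    """Group the _id values in an add-documents response by their result."""
    summary = {}
    for item in response["items"]:
        summary.setdefault(item["result"], []).append(item["_id"])
    return summary


# The example response from this section, as a Python dictionary
response = {
    "errors": False,
    "items": [
        {"_id": "5aed93eb-3878-4f12-bc92-0fda01c7d23d", "result": "created", "status": 201},
        {"_id": "article_591", "result": "updated", "status": 200},
    ],
    "processingTimeMs": 6,
    "index_name": "my-first-index",
}

print(summarize_items(response))
```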

Add or update documents

PUT /indexes/{index_name}/documents

Add an array of documents or update them if they already exist. If the provided index does not exist, it will be created.

If you send a document with an _id that corresponds to an existing document, the existing document will be partially updated with the content in the new document. Otherwise, a new document will be created.

If you are using this endpoint to update existing documents, we recommend including only the fields that need to change in each document. This avoids repeating expensive indexing operations on fields that have not changed.
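
The difference between replacing (POST) and partially updating (PUT) a document can be modelled with plain dictionaries. The sketch below is a simplified, client-side model rather than Marqo's implementation; all three function names are hypothetical, and the Genre value is only an illustration. It also shows one way to build an update payload that carries just the changed fields.

```python
def replace_document(existing, incoming):
    """Model of POST: the incoming document overwrites the stored one."""
    return dict(incoming)


def partially_update_document(existing, incoming):
    """Model of PUT: incoming fields are merged into the stored document."""
    return {**existing, **incoming}


def minimal_update(existing, desired):
    """Keep _id plus only the fields that are new or have changed."""
    changed = {
        field: value
        for field, value in desired.items()
        if field != "_id" and (field not in existing or existing[field] != value)
    }
    return {"_id": existing["_id"], **changed}


stored = {
    "_id": "article_591",
    "Title": "Extravehicular Mobility Unit (EMU)",
    "Description": "The EMU is a spacesuit that provides environmental protection",
}
desired = {**stored, "Genre": "Science"}

payload = minimal_update(stored, desired)
print(payload)  # prints {'_id': 'article_591', 'Genre': 'Science'}

# PUT-style merge keeps Title and Description; POST-style replace drops them
print(partially_update_document(stored, payload) == desired)  # prints True
print(replace_document(stored, payload) == payload)           # prints True
```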

This endpoint accepts the application/json content type.

Path parameters

| Name | Type | Description |
| --- | --- | --- |
| index_name | String | Name of the index |

Query parameters

| Query Parameter | Type | Default Value | Description |
| --- | --- | --- | --- |
| refresh | Boolean | true | Forces a refresh after adding documents, which makes them available for searching. If you can wait for the system to refresh on its own, set this to false for better performance. |
| batch_size | Integer | 0 | If greater than 0, documents are added in batches of this size. This reduces the number of internal IO operations and speeds up indexing, which is useful when indexing a large volume of documents. |
| processes | Integer | 1 | The number of processes Marqo uses to index the documents. Increase this number to speed up indexing, at the cost of using more server resources. |
| device | String | null | The device used to index the documents. This lets you use CUDA GPUs to speed up indexing, if available. Defaults to the default device set on Marqo. Options include cpu, cuda, cuda1, cuda2, and so on. The cuda option tells Marqo to use all available CUDA devices. |
| non_tensor_fields | Array of Strings | [] | Fields in these documents for which no tensors are created. Tensor search cannot be performed on these fields in these documents; pre-filtering and lexical search still work. |
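
To make the batch_size behaviour concrete, the sketch below splits a list of documents into consecutive batches. It only illustrates the idea and is not how Marqo batches internally; split_into_batches is a hypothetical helper, and treating 0 as "no batching" follows the default described in the table.

```python
def split_into_batches(documents, batch_size):
    """Yield consecutive slices of at most batch_size documents."""
    if batch_size <= 0:
        # Mirrors the documented default: 0 means the documents are not batched
        yield list(documents)
        return
    for start in range(0, len(documents), batch_size):
        yield documents[start:start + batch_size]


# 45 made-up documents, split into batches of 20
docs = [{"_id": f"article_{n}"} for n in range(45)]
print([len(batch) for batch in split_into_batches(docs, 20)])  # prints [20, 20, 5]
```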

Body

An array of documents. Each document is represented as a JSON object.

You can optionally set a document's ID with the special _id field, which must be a string. If you omit _id, Marqo generates one.

[
  {
    "Title": "The Travels of Marco Polo",
    "Description": "A 13th-century travelogue describing Polo's travels"
  }, 
  {
    "Title": "Extravehicular Mobility Unit (EMU)",
    "Description": "The EMU is a spacesuit that provides environmental protection",
    "_id": "article_591"
  }
]

Example

cURL

curl -XPUT 'http://localhost:8882/indexes/my-first-index/documents' -H 'Content-type:application/json' -d '
[
    {
        "Title": "The Travels of Marco Polo",
        "Description": "A 13th-century travelogue describing the travels of Polo"
    },
    {
        "Title": "Extravehicular Mobility Unit (EMU)",
        "Description": "The EMU is a spacesuit that provides environmental protection",
        "_id": "article_591"
    }
]'

Python

import marqo

# Connect to the same local Marqo instance used in the cURL example
mq = marqo.Client(url="http://localhost:8882")

mq.index("my-first-index").update_documents([
    {
        "Title": "The Travels of Marco Polo",
        "Description": "A 13th-century travelogue describing the travels of Polo"
    },
    {
        "Title": "Extravehicular Mobility Unit (EMU)",
        "Description": "The EMU is a spacesuit that provides environmental protection",
        "_id": "article_591"
    }]
)

JS

mq.addDocuments([{
        Title: "The Travels of Marco Polo",
        Description: "A 13th-century travelogue describing the travels of Polo"
    }, {
        Title: "Extravehicular Mobility Unit (EMU)",
        Description: "The EMU is a spacesuit that provides environmental protection",
        _id: "article_591"
    }],
    "my-first-index"
)

Response: 200 OK

{
   "errors":false,
   "items":[
      {
         "_id":"5aed93eb-3878-4f12-bc92-0fda01c7d23d",
         "result":"created",
         "status":201
      },
      {
         "_id":"article_591",
         "result":"updated",
         "status":200
      }
   ],
   "processingTimeMs":6,
   "index_name":"my-first-index"
}

The first document had its _id generated by Marqo. A document with the _id article_591 already existed in the index, so it was updated rather than created.

Get one document

GET /indexes/{index_name}/documents/{document_id}

Gets a document using its ID.

Path parameters

| Name | Type | Description |
| --- | --- | --- |
| index_name | String | Name of the index |
| document_id | String | ID of the document |

Query parameters

| Query Parameter | Type | Default Value | Description |
| --- | --- | --- | --- |
| expose_facets | Boolean | false | If true, the document's tensor facets are returned as a list of objects. Each facet object contains the data and its embedding (in the facet's _embedding field). |

Example

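cURL

curl -XGET 'http://localhost:8882/indexes/my-first-index/documents/article_152?expose_facets=true'

Python

import marqo

# Connect to the same local Marqo instance used in the cURL example
mq = marqo.Client(url="http://localhost:8882")

mq.index("my-first-index").get_document(
    document_id="article_152",
    expose_facets=True
)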

Response

{'Blurb': 'A rocket car is a car powered by a rocket engine. This treatise '
          'proposes that rocket cars are the inevitable future of land-based '
          'transport.',
 'Title': 'Treatise on the viability of rocket cars',
 '_id': 'article_152',
 '_tensor_facets': [{'Title': 'Treatise on the viability of rocket cars',
                     '_embedding': [-0.10393160581588745,
                                    0.0465407557785511,
                                    -0.01760256476700306,
                                    ...]},
                    {'Blurb': 'A rocket car is a car powered by a rocket '
                              'engine. This treatise proposes that rocket cars '
                              'are the inevitable future of land-based '
                              'transport.',
                     '_embedding': [-0.045681700110435486,
                                    0.056278493255376816,
                                    0.022254955023527145,
                                    ...]}]
}

In this example, the GET document request was sent with the expose_facets parameter set to true. The _tensor_facets field is returned as a result. Within each facet, there is a key-value pair that holds the content of the facet, and an _embedding field, which is the content's vector representation.
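
Given a document returned with expose_facets set to true, the embeddings can be collected per field as sketched below. The helper embeddings_by_field is hypothetical. It collects a list per field on the assumption that a field may appear in more than one facet. The embedding values are the first three numbers from the response above, which is itself truncated.

```python
def embeddings_by_field(document):
    """Map each facet's content field name to a list of its embedding vectors."""
    result = {}
    for facet in document.get("_tensor_facets", []):
        embedding = facet["_embedding"]
        for field in facet:
            # Every key other than _embedding names the field the facet came from
            if field != "_embedding":
                result.setdefault(field, []).append(embedding)
    return result


document = {
    "_id": "article_152",
    "Title": "Treatise on the viability of rocket cars",
    "_tensor_facets": [
        {
            "Title": "Treatise on the viability of rocket cars",
            "_embedding": [-0.10393160581588745, 0.0465407557785511, -0.01760256476700306],
        },
        {
            "Blurb": "A rocket car is a car powered by a rocket engine.",
            "_embedding": [-0.045681700110435486, 0.056278493255376816, 0.022254955023527145],
        },
    ],
}

for field, vectors in embeddings_by_field(document).items():
    print(field, len(vectors), len(vectors[0]))
```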

Get multiple documents

GET /indexes/{index_name}/documents

Gets a selection of documents based on their IDs.

This endpoint accepts the application/json content type.

Path parameters

| Name | Type | Description |
| --- | --- | --- |
| index_name | String | Name of the index |

Query parameters

| Query Parameter | Type | Default Value | Description |
| --- | --- | --- | --- |
| expose_facets | Boolean | false | If true, each document's tensor facets are returned as a list of objects. Each facet object contains the data and its embedding (in the facet's _embedding field). |

Body

An array of IDs. Each ID is a string.

["article_152", "article_490", "article_985"]
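
This endpoint is a GET request that carries a JSON body, which some HTTP clients do not do by default. The sketch below prepares such a request with the Python standard library without sending it. The helper build_get_documents_request is hypothetical, and the host and index name are the ones used in the examples on this page.

```python
import json
from urllib.request import Request


def build_get_documents_request(base_url, index_name, document_ids):
    """Prepare (but do not send) a GET request carrying a JSON array of IDs."""
    return Request(
        url=f"{base_url}/indexes/{index_name}/documents",
        data=json.dumps(document_ids).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        # Without an explicit method, urllib switches to POST when data is set
        method="GET",
    )


request = build_get_documents_request(
    "http://localhost:8882",
    "my-first-index",
    ["article_152", "article_490", "article_985"],
)
print(request.get_method(), request.full_url)
```

Sending the prepared request, for example with urllib.request.urlopen(request), requires a running Marqo instance.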

Example

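cURL

curl -XGET http://localhost:8882/indexes/my-first-index/documents -H 'Content-Type: application/json' -d '
    ["article_152", "article_490", "article_985"]
'

Python

import marqo

# Connect to the same local Marqo instance used in the cURL example
mq = marqo.Client(url="http://localhost:8882")

mq.index("my-first-index").get_documents(
    document_ids=["article_152", "article_490", "article_985"]
)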

Response

{'results': [{'Blurb': 'A rocket car is a car powered by a rocket engine. This '
                       'treatise proposes that rocket cars are the inevitable '
                       'future of land-based transport.',
              'Title': 'Treatise on the viability of rocket cars',
              '_found': true,
              '_id': 'article_152'},
             {'_found': false, '_id': 'article_490'},
             {'Blurb': "One must maintain one's space suit. It is, after all, "
                       'the tool that will help you explore distant galaxies.',
              'Title': 'Your space suit and you',
              '_found': true,
              '_id': 'article_985'}]}

In this response, the index has no document with an ID of article_490. As a result, its _found field is false.
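
A client will often want to separate the documents that were found from the IDs that were not. The following sketch assumes the response has been parsed into a Python dictionary with the shape shown above; split_found is a hypothetical helper.

```python
def split_found(response):
    """Return (found documents, IDs that were not found)."""
    found = [doc for doc in response["results"] if doc["_found"]]
    missing_ids = [doc["_id"] for doc in response["results"] if not doc["_found"]]
    return found, missing_ids


# A shortened version of the example response from this section
response = {
    "results": [
        {"Title": "Treatise on the viability of rocket cars", "_found": True, "_id": "article_152"},
        {"_found": False, "_id": "article_490"},
        {"Title": "Your space suit and you", "_found": True, "_id": "article_985"},
    ]
}

found, missing_ids = split_found(response)
print([doc["_id"] for doc in found])  # prints ['article_152', 'article_985']
print(missing_ids)                    # prints ['article_490']
```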

Delete documents

POST /indexes/{index_name}/documents/delete-batch

Deletes the documents identified by an array of their IDs.

Path parameters

| Name | Type | Description |
| --- | --- | --- |
| index_name | String | Name of the index |

Body

An array of the IDs of the documents to delete.

[ "article_591", "article_602" ]

Example

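cURL

curl -XPOST http://localhost:8882/indexes/my-first-index/documents/delete-batch -H 'Content-type:application/json' -d '[
  "article_591", "article_602"
]'

Python

import marqo

# Connect to the same local Marqo instance used in the cURL example
mq = marqo.Client(url="http://localhost:8882")

mq.index("my-first-index").delete_documents(ids=["article_591", "article_602"])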

Response

{
  "index_name":"my-first-index",
  "status":"succeeded",
  "type":"documentDeletion",
  "details":{
    "receivedDocumentIds":2,
    "deletedDocuments":1
  },
  "duration":"PT0.084367S",
  "startedAt":"2022-09-01T05:11:31.790986Z",
  "finishedAt":"2022-09-01T05:11:31.875353Z"
}

In this example, one of the two requested documents did not exist in the index, so only one document was deleted.
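
The gap between receivedDocumentIds and deletedDocuments, and the time the deletion took, can be read from the response as sketched below. The helper names are hypothetical, and the response is assumed to be already parsed into a Python dictionary with the fields shown above.

```python
from datetime import datetime


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp that ends in Z (UTC)."""
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def deletion_report(response):
    """Summarize how many IDs were not deleted and how long the deletion took."""
    details = response["details"]
    elapsed = parse_timestamp(response["finishedAt"]) - parse_timestamp(response["startedAt"])
    return {
        "not_deleted": details["receivedDocumentIds"] - details["deletedDocuments"],
        "elapsed_seconds": elapsed.total_seconds(),
    }


response = {
    "index_name": "my-first-index",
    "status": "succeeded",
    "type": "documentDeletion",
    "details": {"receivedDocumentIds": 2, "deletedDocuments": 1},
    "duration": "PT0.084367S",
    "startedAt": "2022-09-01T05:11:31.790986Z",
    "finishedAt": "2022-09-01T05:11:31.875353Z",
}

print(deletion_report(response))
```

The elapsed time computed from startedAt and finishedAt matches the 0.084367 seconds given in the duration field.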