Documents
Add or replace documents
POST /indexes/{index_name}/documents
Add an array of documents or replace them if they already exist. If the provided index does not exist, it will be created.
If you send a document with an _id that corresponds to an existing document, the new document will overwrite the existing document.
This endpoint accepts the application/json content type.
Path parameters
Name | Type | Description |
---|---|---|
index_name | String | Name of the index |
Query parameters
Query Parameter | Type | Default Value | Description |
---|---|---|---|
refresh | Boolean | true | Forces a refresh after adding documents. This makes the documents available for searching. If you can wait for the system to refresh on its own, set this to false for better performance. |
batch_size | Integer | 0 | If greater than 0, documents are added in batches of this size. This reduces the number of internal IO operations and speeds up indexing, which is useful when indexing a large volume of documents. When using the Python client, use client_batch_size to set the size of batches sent from client to server, and server_batch_size to set the maximum batch size processed by the server. For a large volume of large documents, client_batch_size = 20 is a good default. |
processes | Integer | 1 | The number of processes Marqo uses to index the documents. Increase this number to speed up indexing, at the cost of using more server resources. |
device | String | null | The device used to index the documents. This lets you use CUDA GPUs to speed up indexing, if available. Defaults to the default device set on Marqo. Options include cpu, cuda, cuda1, cuda2, etc. The cuda option tells Marqo to use all available CUDA devices. |
non_tensor_fields | Array of Strings | [] | The fields in these documents for which no tensors are created. Tensor search cannot be performed on these fields in these documents; pre-filtering and lexical search still work. |
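To illustrate how these query parameters might end up in the request URL, here is a minimal sketch using only Python's standard library. The helper name build_add_documents_url is hypothetical, and sending booleans as lowercase true/false is an assumption. Repeating non_tensor_fields once per field follows the cURL example in this section. The sketch only builds the URL; it does not send a request.

```python
from urllib.parse import quote, urlencode


def build_add_documents_url(base_url, index_name, refresh=None,
                            batch_size=None, processes=None,
                            device=None, non_tensor_fields=None):
    """Build the documents URL for an index (hypothetical helper)."""
    params = []
    if refresh is not None:
        # Assumption: booleans are sent as lowercase "true"/"false".
        params.append(("refresh", "true" if refresh else "false"))
    if batch_size is not None:
        params.append(("batch_size", batch_size))
    if processes is not None:
        params.append(("processes", processes))
    if device is not None:
        params.append(("device", device))
    # Array parameters are repeated once per value.
    for field in non_tensor_fields or []:
        params.append(("non_tensor_fields", field))
    url = f"{base_url}/indexes/{quote(index_name, safe='')}/documents"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


url = build_add_documents_url(
    "http://localhost:8882", "my-first-index",
    refresh=False, non_tensor_fields=["Title", "Genre"],
)
print(url)
```

Parameters left as None are omitted, so the server-side defaults from the table apply.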
Body
An array of documents. Each document is represented as a JSON object.
You can optionally set a document's ID with the special _id field. The _id must be a string. If no _id is provided, Marqo will generate one.
[
{
"Title": "The Travels of Marco Polo",
"Description": "A 13th-century travelogue describing Polo's travels"
},
{
"Title": "Extravehicular Mobility Unit (EMU)",
"Description": "The EMU is a spacesuit that provides environmental protection",
"_id": "article_591"
}
]
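Because the _id must be a string, it can be worth checking documents on the client before sending them. The following is a small sketch of such a check; the function name check_document_ids and the third, invalid document are hypothetical and exist only for illustration.

```python
def check_document_ids(documents):
    """Return positions of documents whose _id is present but not a string."""
    bad_positions = []
    for position, doc in enumerate(documents):
        if "_id" in doc and not isinstance(doc["_id"], str):
            bad_positions.append(position)
    return bad_positions


documents = [
    # No _id: Marqo generates one.
    {"Title": "The Travels of Marco Polo",
     "Description": "A 13th-century travelogue describing Polo's travels"},
    # String _id: valid.
    {"Title": "Extravehicular Mobility Unit (EMU)",
     "Description": "The EMU is a spacesuit that provides environmental protection",
     "_id": "article_591"},
    # Hypothetical invalid document: the _id is an integer, not a string.
    {"Title": "Invalid example", "_id": 591},
]

invalid_positions = check_document_ids(documents)
print(invalid_positions)
```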
Example
curl -XPOST 'http://localhost:8882/indexes/my-first-index/documents?non_tensor_fields=Title&non_tensor_fields=Genre' \
-H 'Content-type:application/json' -d '
[
{
"Title": "The Travels of Marco Polo",
"Description": "A 13th-century travelogue describing the travels of Polo",
"Genre": "History"
},
{
"Title": "Extravehicular Mobility Unit (EMU)",
"Description": "The EMU is a spacesuit that provides environmental protection",
"_id": "article_591",
"Genre": "Science"
}
]'
mq.index("my-first-index").add_documents([
{
"Title": "The Travels of Marco Polo",
"Description": "A 13th-century travelogue describing the travels of Polo",
"Genre": "History"
},
{
"Title": "Extravehicular Mobility Unit (EMU)",
"Description": "The EMU is a spacesuit that provides environmental protection",
"_id": "article_591",
"Genre": "Science"
}], non_tensor_fields=["Title", "Genre"]
)
mq.addDocuments([{
Title: "The Travels of Marco Polo",
Description: "A 13th-century travelogue describing the travels of Polo"
}, {
Title: "Extravehicular Mobility Unit (EMU)",
Description: "The EMU is a spacesuit that provides environmental protection",
_id: "article_591"
}],
"my-first-index"
)
Response: 200 OK
{
"errors":false,
"items":[
{
"_id":"5aed93eb-3878-4f12-bc92-0fda01c7d23d",
"result":"created",
"status":201
},
{
"_id":"article_591",
"result":"updated",
"status":200
}
],
"processingTimeMs":6,
"index_name":"my-first-index"
}
The first document had no _id in the request, so its _id was generated by Marqo.
In this example, there was already a document in Marqo with _id = article_591, so it was updated rather than created.
In both the cURL and Python examples, no tensors are created for the Title and Genre fields of these documents, so these fields cannot be searched with tensor search. The JavaScript client does not currently support non_tensor_fields.
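As a sketch of how a client might read this response, the snippet below parses the sample response shown above and separates created documents from updated ones. It uses only the fields that appear in the sample (errors, items, _id, result) and makes no claim about other fields a real response may contain.

```python
import json

# The sample response from this section, as a JSON string.
response_text = """
{
  "errors": false,
  "items": [
    {"_id": "5aed93eb-3878-4f12-bc92-0fda01c7d23d", "result": "created", "status": 201},
    {"_id": "article_591", "result": "updated", "status": 200}
  ],
  "processingTimeMs": 6,
  "index_name": "my-first-index"
}
"""

response = json.loads(response_text)

# Group document IDs by the per-item result.
created = [item["_id"] for item in response["items"] if item["result"] == "created"]
updated = [item["_id"] for item in response["items"] if item["result"] == "updated"]

print("errors:", response["errors"])
print("created:", created)
print("updated:", updated)
```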
Add or update documents
PUT /indexes/{index_name}/documents
Add an array of documents or update them if they already exist. If the provided index does not exist, it will be created.
If you send a document with an _id that corresponds to an existing document, the existing document will be partially updated with the content in the new document. Otherwise, a new document will be created.
If you are using this endpoint to update existing documents, we recommend including only the fields that need to be updated in each document. This avoids redoing expensive indexing operations on existing fields.
This endpoint accepts the application/json content type.
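The difference between a partial update (PUT) and a full replacement (POST) can be modelled locally with plain dictionaries. This is only an illustration of the semantics described above, not Marqo's implementation; the function names and the new Description text are hypothetical.

```python
def apply_partial_update(existing, update):
    """Model of PUT: fields in the update overwrite; other fields are kept."""
    merged = dict(existing)
    merged.update(update)
    return merged


def apply_replace(existing, new_document):
    """Model of POST: the new document overwrites the existing one entirely."""
    return dict(new_document)


existing = {
    "_id": "article_591",
    "Title": "Extravehicular Mobility Unit (EMU)",
    "Description": "The EMU is a spacesuit that provides environmental protection",
}
# Only the field that needs to change is sent (hypothetical new text).
update = {"_id": "article_591", "Description": "An updated description"}

after_put = apply_partial_update(existing, update)
after_post = apply_replace(existing, update)

print(sorted(after_put))   # Title is kept
print(sorted(after_post))  # Title is gone
```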
Path parameters
Name | Type | Description |
---|---|---|
index_name | String | Name of the index |
Query parameters
Query Parameter | Type | Default Value | Description |
---|---|---|---|
refresh | Boolean | true | Forces a refresh after adding documents. This makes the documents available for searching. If you can wait for the system to refresh on its own, set this to false for better performance. |
batch_size | Integer | 0 | If greater than 0, documents are added in batches of this size. This reduces the number of internal IO operations and speeds up indexing, which is useful when indexing a large volume of documents. |
processes | Integer | 1 | The number of processes Marqo uses to index the documents. Increase this number to speed up indexing, at the cost of using more server resources. |
device | String | null | The device used to index the documents. This lets you use CUDA GPUs to speed up indexing, if available. Defaults to the default device set on Marqo. Options include cpu, cuda, cuda1, cuda2, etc. The cuda option tells Marqo to use all available CUDA devices. |
non_tensor_fields | Array of Strings | [] | The fields in these documents for which no tensors are created. Tensor search cannot be performed on these fields in these documents; pre-filtering and lexical search still work. |
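To make the batch_size behaviour concrete, the sketch below splits a list of documents into batches on the client side. It only illustrates the idea of batching; how Marqo batches documents internally is not shown here, and the function name split_into_batches and the placeholder documents are hypothetical.

```python
def split_into_batches(documents, batch_size):
    """Split documents into lists of at most batch_size items.

    A batch_size of 0 (the default in the table above) means no batching,
    so all documents are returned as a single batch.
    """
    if batch_size <= 0:
        return [list(documents)]
    return [documents[start:start + batch_size]
            for start in range(0, len(documents), batch_size)]


# 45 placeholder documents, split into batches of 20.
documents = [{"_id": f"doc_{n}", "Title": f"Title {n}"} for n in range(45)]
batches = split_into_batches(documents, 20)
batch_sizes = [len(batch) for batch in batches]
print(batch_sizes)
```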
Body
An array of documents. Each document is represented as a JSON object.
You can optionally set a document's ID with the special _id field. The _id must be a string. If no _id is provided, Marqo will generate one.
[
{
"Title": "The Travels of Marco Polo",
"Description": "A 13th-century travelogue describing Polo's travels"
},
{
"Title": "Extravehicular Mobility Unit (EMU)",
"Description": "The EMU is a spacesuit that provides environmental protection",
"_id": "article_591"
}
]
Example
curl -XPUT 'http://localhost:8882/indexes/my-first-index/documents' -H 'Content-type:application/json' -d '
[
{
"Title": "The Travels of Marco Polo",
"Description": "A 13th-century travelogue describing the travels of Polo"
},
{
"Title": "Extravehicular Mobility Unit (EMU)",
"Description": "The EMU is a spacesuit that provides environmental protection",
"_id": "article_591"
}
]'
mq.index("my-first-index").add_documents([
{
"Title": "The Travels of Marco Polo",
"Description": "A 13th-century travelogue describing the travels of Polo"
},
{
"Title": "Extravehicular Mobility Unit (EMU)",
"Description": "The EMU is a spacesuit that provides environmental protection",
"_id": "article_591"
}]
)
mq.addDocuments([{
Title: "The Travels of Marco Polo",
Description: "A 13th-century travelogue describing the travels of Polo"
}, {
Title: "Extravehicular Mobility Unit (EMU)",
Description: "The EMU is a spacesuit that provides environmental protection",
_id: "article_591"
}],
"my-first-index"
)
Response: 200 OK
{
"errors":false,
"items":[
{
"_id":"5aed93eb-3878-4f12-bc92-0fda01c7d23d",
"result":"created",
"status":201
},
{
"_id":"article_591",
"result":"updated",
"status":200
}
],
"processingTimeMs":6,
"index_name":"my-first-index"
}
The first document had no _id in the request, so its _id was generated by Marqo.
In this example, there was already a document in Marqo with _id = article_591, so it was updated rather than created.
Get one document
GET /indexes/{index_name}/documents/{document_id}
Path parameters
Name | Type | Description |
---|---|---|
index_name | String | Name of the index |
document_id | String | ID of the document |
Query parameters
Query Parameter | Type | Default Value | Description |
---|---|---|---|
expose_facets | Boolean | false | If true, the document's tensor facets are returned. This is a list of objects. Each facet object contains the data and its embedding (found in the facet's _embedding field). |
Example
curl -XGET 'http://localhost:8882/indexes/my-first-index/documents/article_591?expose_facets=true'
mq.index("my-first-index").get_document(
document_id="article_591",
expose_facets=True
)
Response
{'Blurb': 'A rocket car is a car powered by a rocket engine. This treatise '
'proposes that rocket cars are the inevitable future of land-based '
'transport.',
'Title': 'Treatise on the viability of rocket cars',
'_id': 'article_152',
'_tensor_facets': [{'Title': 'Treatise on the viability of rocket cars',
'_embedding': [-0.10393160581588745,
0.0465407557785511,
-0.01760256476700306,
...]},
{'Blurb': 'A rocket car is a car powered by a rocket '
'engine. This treatise proposes that rocket cars '
'are the inevitable future of land-based '
'transport.',
'_embedding': [-0.045681700110435486,
0.056278493255376816,
0.022254955023527145,
...]}]
}
In this example, the GET document request was sent with the expose_facets parameter set to true.
The _tensor_facets field is returned as a result. Within each facet, there is a key-value pair that holds the content of the facet, and an _embedding field, which is the content's vector representation.
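The facet structure described above can be unpacked with a few lines of Python. This sketch works on a literal copy of the sample response, with each embedding truncated to the three values printed above; the function name list_facets is hypothetical.

```python
def list_facets(document):
    """Return (field, content, embedding) tuples from a document's facets."""
    facets = []
    for facet in document.get("_tensor_facets", []):
        embedding = facet["_embedding"]
        # Every key other than _embedding holds the facet's content.
        for field, content in facet.items():
            if field != "_embedding":
                facets.append((field, content, embedding))
    return facets


# Sample document; embeddings are truncated to three values for illustration.
document = {
    "Title": "Treatise on the viability of rocket cars",
    "_id": "article_152",
    "_tensor_facets": [
        {"Title": "Treatise on the viability of rocket cars",
         "_embedding": [-0.10393160581588745, 0.0465407557785511,
                        -0.01760256476700306]},
        {"Blurb": "A rocket car is a car powered by a rocket engine.",
         "_embedding": [-0.045681700110435486, 0.056278493255376816,
                        0.022254955023527145]},
    ],
}

facets = list_facets(document)
for field, content, embedding in facets:
    print(field, len(embedding))
```

Returning a list of tuples rather than a dictionary keyed by field name avoids losing data if one field has more than one facet.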
Get multiple documents
GET /indexes/{index_name}/documents
This endpoint accepts the application/json content type.
Path parameters
Name | Type | Description |
---|---|---|
index_name | String | Name of the index |
Query parameters
Query Parameter | Type | Default Value | Description |
---|---|---|---|
expose_facets | Boolean | false | If true, the document's tensor facets are returned. This is a list of objects. Each facet object contains the data and its embedding (found in the facet's _embedding field). |
Body
An array of IDs. Each ID is a string.
["article_152", "article_490", "article_985"]
Example
curl -XGET http://localhost:8882/indexes/my-first-index/documents -H 'Content-Type: application/json' -d '
["article_152", "article_490", "article_985"]
'
mq.index("my-first-index").get_documents(
document_ids=["article_152", "article_490", "article_985"]
)
Response
{'results': [{'Blurb': 'A rocket car is a car powered by a rocket engine. This '
'treatise proposes that rocket cars are the inevitable '
'future of land-based transport.',
'Title': 'Treatise on the viability of rocket cars',
'_found': true,
'_id': 'article_152'},
{'_found': false, '_id': 'article_490'},
{'Blurb': "One must maintain one's space suit. It is, after all, "
'the tool that will help you explore distant galaxies.',
'Title': 'Your space suit and you',
'_found': true,
'_id': 'article_985'}]}
In this example, no document was found with the ID article_490. As a result, its _found field is false.
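A client will usually want to separate the documents that were found from the IDs that were not. The sketch below does this for a Python copy of the sample response above, using only the _found and _id fields; the Blurb fields are omitted for brevity.

```python
# Python copy of the sample response (Blurb fields omitted for brevity).
response = {
    "results": [
        {"Title": "Treatise on the viability of rocket cars",
         "_found": True, "_id": "article_152"},
        {"_found": False, "_id": "article_490"},
        {"Title": "Your space suit and you",
         "_found": True, "_id": "article_985"},
    ]
}

# Keep the full document for hits, and only the ID for misses.
found_documents = [doc for doc in response["results"] if doc["_found"]]
missing_ids = [doc["_id"] for doc in response["results"] if not doc["_found"]]

print("found:", [doc["_id"] for doc in found_documents])
print("missing:", missing_ids)
```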
Delete documents
Delete documents identified by an array of their IDs.
POST /indexes/{index_name}/documents/delete-batch
Path parameters
Name | Type | Description |
---|---|---|
index_name | String | Name of the index |
Body
An array of the IDs of the documents to delete.
[ "article_591", "article_602" ]
Example
curl -XPOST http://localhost:8882/indexes/my-first-index/documents/delete-batch -H 'Content-type:application/json' -d '[
"article_591", "article_602"
]'
mq.index("my-first-index").delete_documents(ids=["article_591", "article_602"])
Response
{
"index_name":"my-first-index",
"status":"succeeded",
"type":"documentDeletion",
"details":{
"receivedDocumentIds":2,
"deletedDocuments":1
},
"duration":"PT0.084367S",
"startedAt":"2022-09-01T05:11:31.790986Z",
"finishedAt":"2022-09-01T05:11:31.875353Z"
}
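The deletion response can be checked on the client, for example to see whether every requested ID was actually deleted. The sketch below works on a copy of the sample response above. Treating the difference between receivedDocumentIds and deletedDocuments as the number of IDs that were not deleted is an assumption based on the field names.

```python
from datetime import datetime

# Copy of the sample response above.
response = {
    "index_name": "my-first-index",
    "status": "succeeded",
    "type": "documentDeletion",
    "details": {"receivedDocumentIds": 2, "deletedDocuments": 1},
    "duration": "PT0.084367S",
    "startedAt": "2022-09-01T05:11:31.790986Z",
    "finishedAt": "2022-09-01T05:11:31.875353Z",
}

details = response["details"]
# Assumption: IDs that were received but not deleted did not match a document.
not_deleted = details["receivedDocumentIds"] - details["deletedDocuments"]

# Recompute the elapsed time from the two timestamps.
timestamp_format = "%Y-%m-%dT%H:%M:%S.%fZ"
started = datetime.strptime(response["startedAt"], timestamp_format)
finished = datetime.strptime(response["finishedAt"], timestamp_format)
elapsed_seconds = (finished - started).total_seconds()

print("not deleted:", not_deleted)
print("elapsed seconds:", elapsed_seconds)
```

In the sample, the elapsed time computed from startedAt and finishedAt matches the duration field, PT0.084367S.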