
Using Datadog to monitor your Marqo Cloud indexes

Marqo Cloud can publish performance and error metrics about your indexes to your Datadog account. To set this up in the Marqo Cloud console, first navigate to the integrations page here. Then click the Edit button, paste your Datadog API key, and select the Datadog site you want to send the metrics to. Finally, click Apply Changes.

Datadog sites

Datadog has multiple sites. To set up the integration between Marqo Cloud and Datadog, you need to know which Datadog site your account is on. You can find this from your Datadog website URL. For instance, if your Datadog website URL is https://app.datadoghq.com, select the Datadog site datadoghq.com in the Marqo console when setting up your integration.

You can read more about Datadog sites and find the mapping between website URL and Datadog site here.
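The lookup described above can be sketched in a few lines of Python. The helper name `datadog_site_from_url` is hypothetical, and the rule it applies (drop a leading `app.` from the hostname, otherwise use the hostname unchanged) is an assumption that matches the example above; confirm the result against Datadog's own site mapping before relying on it.

```python
from urllib.parse import urlparse


def datadog_site_from_url(website_url: str) -> str:
    """Guess the Datadog site from the URL you use to open Datadog.

    Assumption: the site is the hostname with a leading "app." removed.
    Check the result against Datadog's site mapping table.
    """
    host = urlparse(website_url).hostname
    if host is None:
        raise ValueError(f"Not a valid URL: {website_url!r}")
    if host.startswith("app."):
        return host[len("app."):]
    return host


print(datadog_site_from_url("https://app.datadoghq.com"))  # datadoghq.com
```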

API Key

You will need a Datadog API key to set up your Marqo Cloud to Datadog integration. To generate the API key from your Datadog account, navigate to Profile > Organization settings > API Keys and click New Key, or copy one of the existing keys. An API key only allows Marqo Cloud to push metrics to your Datadog account; it does not allow Marqo Cloud to access or read any data from your Datadog account.

You can read more about Datadog API Keys here.
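Before pasting a key into the Marqo Cloud console, it can help to confirm that Datadog accepts it. The sketch below is not part of the Marqo setup; it assumes Datadog's key validation endpoint (`/api/v1/validate`, authenticated with the `DD-API-KEY` header) and assumes the API host is `api.` followed by your Datadog site. The function names are hypothetical.

```python
import json
import urllib.error
import urllib.request


def build_validate_request(api_key: str, site: str = "datadoghq.com") -> urllib.request.Request:
    """Build (but do not send) a request to Datadog's key validation endpoint.

    Assumption: the API host for a site is "api.<site>".
    """
    url = f"https://api.{site}/api/v1/validate"
    return urllib.request.Request(url, headers={"DD-API-KEY": api_key}, method="GET")


def is_api_key_valid(api_key: str, site: str = "datadoghq.com") -> bool:
    """Send the validation request; treat any HTTP error as 'not valid'."""
    request = build_validate_request(api_key, site)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return bool(json.load(response).get("valid", False))
    except urllib.error.HTTPError:
        return False


# Building the request makes no network call; only is_api_key_valid() does.
print(build_validate_request("<your-api-key>").full_url)
```

Call `is_api_key_valid("<your-api-key>", "<your-datadog-site>")` with your own values to perform the actual check.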

Metrics published to Datadog

Request Duration

Distribution metric that reports the time Marqo takes to process a request. A new datapoint is published at least once every 30 seconds if there is any traffic to the index. No datapoint is published for this metric in the absence of traffic.

Metric name: marqo.request.duration

Assigned Datadog tags

  • index_name: Name of the index that the request was made against.

  • request_path: API path of the request. This will be /indexes/{index_name}/documents for an add documents request and /indexes/{index_name}/search for a search request.

  • request_method: HTTP request method. This will be one of POST, GET, PUT, PATCH, DELETE.

  • resp_code: HTTP response code from Marqo. This will be a value between 200 and 599.

Percentile calculations

marqo.request.duration is published as a distribution metric in Datadog. If you need percentile calculations for this metric, you must enable them in Datadog: navigate to Metrics > Summary in the Datadog console and select marqo.request.duration. In the side tab that opens, toggle the switch for Enable percentiles and threshold queries, then click Save.

Example usage

To monitor average duration for all search requests with 200 response code against index A:

avg:marqo.request.duration{index_name:A,request_path:indexes/A/search,resp_code:200,request_method:post}

To monitor count of all search requests with 200 response code against index A:

count:marqo.request.duration{index_name:A,request_path:indexes/A/search,resp_code:200,request_method:post}.as_count()

To monitor average duration for all requests against index A:

avg:marqo.request.duration{index_name:A}
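If you generate these queries for many indexes, a small helper keeps the tag filters consistent. The following is a minimal sketch; `build_query` is a hypothetical name, and it only assembles the query string in the same shape as the examples above without contacting Datadog.

```python
def build_query(aggregator, metric, tags=None, as_count=False):
    """Assemble a Datadog metric query string such as avg:metric{tag:value}."""
    if tags:
        # Tags are joined with commas, which Datadog treats as AND.
        scope = ",".join(f"{key}:{value}" for key, value in tags.items())
    else:
        scope = "*"  # no filter: match every tag combination
    query = f"{aggregator}:{metric}{{{scope}}}"
    if as_count:
        query += ".as_count()"
    return query


search_tags = {
    "index_name": "A",
    "request_path": "indexes/A/search",
    "resp_code": 200,
    "request_method": "post",
}
print(build_query("avg", "marqo.request.duration", search_tags))
print(build_query("count", "marqo.request.duration", search_tags, as_count=True))
print(build_query("avg", "marqo.request.duration", {"index_name": "A"}))
```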

Vector Count

Number of vectors currently stored in the index. A new datapoint is published every 30 seconds.

Metric Name: marqo.vector_count

Assigned Datadog tags

  • index_name: Name of the index that the metric is for.

Example usage

To monitor number of vectors in index A:

avg:marqo.vector_count{index_name:A}

Doc Count

Number of documents currently stored in the index. A new datapoint is published every 30 seconds.

Metric Name: marqo.doc_count

Assigned Datadog tags

  • index_name: Name of the index that the metric is for.

Example usage

To monitor number of documents in index A:

avg:marqo.doc_count{index_name:A}

Disk Used Percentage

Every index is provisioned with a fixed disk size based on the number of shards and type of storage shard you configure. This metric reports the percentage of that disk currently in use. Aim to keep this number below 70% for optimal performance. A new datapoint is published once every 30 seconds.

Metric Name: marqo.disk_used_percentage

Assigned Datadog tags

  • index_name: Name of the index that the metric is for.

Example usage

To monitor the percentage of disk used by index A:

avg:marqo.disk_used_percent{index_name:A}

Memory Used Percentage

Every index is provisioned with a fixed amount of memory based on the number of shards and type of storage shard you configure. This metric reports the percentage of that memory currently in use. Aim to keep this number below 70% for optimal performance. A new datapoint is published once every 30 seconds.

Metric Name: marqo.memory_used_percentage

Assigned Datadog tags

  • index_name: Name of the index that the metric is for.

Example usage

To monitor the percentage of memory used by index A:

avg:marqo.memory_used_percent{index_name:A}

Datadog dashboard setup example

To get started with your first Datadog dashboard for Marqo Cloud, you can copy the example Datadog dashboard config from here. Then navigate to your Datadog website (https://{your_datadog_website}/dashboard/lists) and create a new dashboard by clicking the New Dashboard button in the top-right corner of the Dashboard page.

Once your dashboard is created and open, click the Configure button in the top-right corner of the dashboard and select Import dashboard JSON from the menu. When the pop-up dialog opens, paste the JSON that you copied from the link above.

You have now successfully set up a dashboard for all indexes in your Marqo Cloud account.

Datadog monitors setup

To be alerted on metrics reported by Marqo Cloud, you can set up monitors by navigating to Monitors > New Monitor > Metric Monitor on your Datadog website (https://{your_datadog_website}/monitors/create/metric) and configuring a monitor following the documentation here.

We recommend that you set up at least the following two monitors. For each one, copy the JSON configuration listed below, click New from JSON in the top-right corner of the screen, paste the copied JSON, and save.

Disk utilisation too high

{
    "name": "Marqo Cloud index disk utilization too high",
    "type": "query alert",
    "query": "max(last_5m):avg:marqo.disk_used_percent{*} > 80",
    "message": "Marqo Cloud index disk utilisation too high. @all",
    "tags": [],
    "options": {
        "thresholds": {
            "critical": 80,
            "warning": 70,
            "critical_recovery": 79,
            "warning_recovery": 69
        },
        "notify_audit": false,
        "include_tags": false,
        "timeout_h": 1,
        "renotify_interval": 0,
        "escalation_message": ""
    },
    "priority": 2
}

Memory utilisation too high

{
    "name": "Marqo Cloud index memory utilization too high",
    "type": "query alert",
    "query": "max(last_5m):avg:marqo.memory_used_percent{*} > 80",
    "message": "Marqo Cloud index memory utilisation too high. @all",
    "tags": [],
    "options": {
        "thresholds": {
            "critical": 80,
            "warning": 70,
            "critical_recovery": 79,
            "warning_recovery": 69
        },
        "notify_audit": false,
        "include_tags": false,
        "timeout_h": 1,
        "renotify_interval": 0,
        "escalation_message": ""
    },
    "priority": 2
}
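The two configurations above differ only in the resource name and the metric, so they can also be generated. The sketch below reproduces the same fields in Python and prints JSON that can be pasted into New from JSON. `build_usage_monitor` is a hypothetical helper, and setting each recovery threshold one point below its trigger threshold is an assumption taken from the values shown above.

```python
import json


def build_usage_monitor(resource, metric, critical=80, warning=70):
    """Return a monitor definition shaped like the examples above.

    Assumption: recovery thresholds sit one point below their triggers.
    """
    return {
        "name": f"Marqo Cloud index {resource} utilization too high",
        "type": "query alert",
        "query": f"max(last_5m):avg:{metric}{{*}} > {critical}",
        "message": f"Marqo Cloud index {resource} utilisation too high. @all",
        "tags": [],
        "options": {
            "thresholds": {
                "critical": critical,
                "warning": warning,
                "critical_recovery": critical - 1,
                "warning_recovery": warning - 1,
            },
            "notify_audit": False,
            "include_tags": False,
            "timeout_h": 1,
            "renotify_interval": 0,
            "escalation_message": "",
        },
        "priority": 2,
    }


disk_monitor = build_usage_monitor("disk", "marqo.disk_used_percent")
memory_monitor = build_usage_monitor("memory", "marqo.memory_used_percent")
print(json.dumps(disk_monitor, indent=4))
print(json.dumps(memory_monitor, indent=4))
```

Passing different `critical` and `warning` values adjusts the query and the thresholds together, which avoids the two drifting apart when edited by hand.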