Offline Evaluation
Offline evaluation lets you replay logs, perform counterfactual evaluation, and use relevance judgments to evaluate changes without affecting live traffic.
Overview
Offline evaluation allows you to test ranking strategies, recommendations, and other changes on historical data before deploying to production, reducing risk and enabling faster iteration. Historical data is automatically collected by the Marqo pixel, so you can replay past user interactions and queries without any manual data preparation.
Log Replay
Replay historical queries:
{
  "evaluation_method": "log_replay",
  "data_period": "30_days",
  "queries": "all",
  "metrics": [
    "ndcg@10",
    "mrr",
    "precision@10"
  ]
}
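To make the metric names above concrete, here is a minimal pure-Python sketch of how `ndcg@10`, `mrr`, and `precision@10` can be computed for a single replayed query. The product IDs and the use of historical clicks as binary relevance labels are illustrative assumptions, not Marqo's internal implementation.

```python
import math

def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def mrr(ranked, relevant):
    """Reciprocal rank of the first relevant result (0 if none appears)."""
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / i
    return 0.0

def ndcg_at_k(ranked, relevant, k=10):
    """Binary-gain NDCG@k: DCG of this ranking over the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 1)
              for i, doc in enumerate(ranked[:k], start=1)
              if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 1)
                for i in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0

# One replayed query: the candidate strategy's ranking, scored against
# the products the user actually clicked in the historical log.
ranked = ["prod_7", "prod_123", "prod_9", "prod_42"]
clicked = {"prod_123", "prod_42"}
print(precision_at_k(ranked, clicked, k=10))  # 0.2
print(mrr(ranked, clicked))                   # 0.5
print(ndcg_at_k(ranked, clicked, k=10))       # ≈ 0.651
```

Averaging these per-query scores over every replayed query in the data period yields the aggregate metrics reported by the evaluation.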
Counterfactual Evaluation
Estimate how a candidate ranking would have performed on logged traffic:
{
  "evaluation_method": "counterfactual",
  "baseline": "current_ranking",
  "variant": "new_ranking",
  "metrics": [
    "estimated_conversions",
    "estimated_revenue",
    "estimated_clicks"
  ]
}
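Counterfactual estimates like `estimated_clicks` are typically produced by reweighting logged outcomes. The sketch below shows one standard technique, inverse propensity scoring (IPS); the log schema (`context`, `action`, `reward`, `propensity`) and the toy policy are assumptions for illustration, and Marqo's estimator may differ.

```python
def ips_estimate(logs, new_policy):
    """Inverse propensity scoring: reweight each logged reward by how
    likely the logging policy was to take the action the new policy
    would also take. Unbiased under full logging support, but can have
    high variance when propensities are small."""
    total = 0.0
    for event in logs:
        if new_policy(event["context"]) == event["action"]:
            total += event["reward"] / event["propensity"]
    return total / len(logs)

# Toy historical log: which product was shown, with what probability,
# and whether it was clicked (reward 1.0) or not (0.0).
logs = [
    {"context": "running shoes", "action": "prod_1", "reward": 1.0, "propensity": 0.5},
    {"context": "running shoes", "action": "prod_2", "reward": 0.0, "propensity": 0.5},
    {"context": "hiking boots",  "action": "prod_3", "reward": 1.0, "propensity": 0.25},
]

# Hypothetical new ranking policy: deterministically picks one product per query.
def new_policy(context):
    return {"running shoes": "prod_1", "hiking boots": "prod_3"}[context]

print(ips_estimate(logs, new_policy))  # 2.0
```

The estimate for the variant is compared against the same computation for the baseline; the inflated value here (2.0 clicks per event) illustrates IPS variance on tiny logs, which is why large data periods and validation against online results matter.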
Relevance Judgments
Score rankings against human-assigned relevance judgments:
{
  "evaluation_method": "relevance_judgments",
  "judgments": [
    {
      "query": "running shoes",
      "product_id": "prod_123",
      "relevance": 4,
      "judge": "expert"
    }
  ],
  "metrics": [
    "ndcg",
    "map",
    "mrr"
  ]
}
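With graded judgments (the `relevance: 4` field above suggests a 0–4 scale), NDCG uses the graded score directly as the gain at each rank. A minimal sketch, assuming a 0–4 scale and illustrative product IDs:

```python
import math

def dcg(gains):
    """Discounted cumulative gain of a list of graded gains, by rank."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

def ndcg(gains):
    """Graded NDCG: DCG of the actual ranking over the DCG of the
    ideal (descending-gain) ordering of the same judgments."""
    ideal = sorted(gains, reverse=True)
    return dcg(gains) / dcg(ideal) if any(gains) else 0.0

# Judged relevance (0-4) for results returned for "running shoes".
judged = {"prod_123": 4, "prod_9": 2, "prod_7": 0, "prod_42": 3}

# The order the ranker actually produced; unjudged items default to 0.
ranking = ["prod_7", "prod_123", "prod_42", "prod_9"]
gains = [judged.get(p, 0) for p in ranking]
print(round(ndcg(gains), 3))  # ≈ 0.709
```

The score is below 1.0 because the most relevant product (`prod_123`, grade 4) is ranked second behind an irrelevant one; an ideal ordering of the same four items would score exactly 1.0.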
Best Practices
- Use representative data: Ensure evaluation data matches the production traffic distribution
- Validate offline metrics: Confirm that offline metrics correlate with online performance before relying on them
- Use multiple methods: Combine log replay, counterfactual evaluation, and relevance judgments; each has blind spots
- Evaluate regularly: Re-run offline evaluation on a schedule so results keep pace with changing data
- Document results: Keep records of offline evaluation outcomes so iterations can be compared over time
Related Topics
- A/B & Multivariate Testing - Validate offline results online
- Analytics & Reporting - Measure performance