Search Result Grouping with Collapse Fields
Collapse fields provide a powerful way to group search results and ensure diversity by displaying only the highest-scoring document from each group. This feature is particularly useful for e-commerce, content aggregation, and any scenario where you want to avoid overwhelming users with similar results.
Overview
When you search for products like "running shoes", you might get multiple variants of the same product (different colors, sizes). Collapse fields help you show diverse results by grouping documents based on specified field values and returning only the top-scoring document from each group.
Key Benefits:
- Result diversity: Ensure varied search results across different categories, brands, or content types
- Improved user experience: Reduce redundancy in search results
- Flexible grouping: Use any string field as a grouping mechanism
- Maintained relevance: Top-scoring document from each group is preserved
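The grouping behaviour described in the overview can be pictured with a small pure-Python sketch. This is only an illustration of the idea (keep the best-scoring hit per field value), not Marqo's implementation; the function name `collapse_hits` and the `_score` values below are made up for the example.

```python
from typing import Any, Dict, List


def collapse_hits(hits: List[Dict[str, Any]], field: str, limit: int) -> List[Dict[str, Any]]:
    """Keep only the highest-scoring hit for each distinct value of `field`."""
    best: Dict[str, Dict[str, Any]] = {}
    for hit in hits:
        key = hit[field]
        # Replace the stored hit only if this one scores strictly higher.
        if key not in best or hit["_score"] > best[key]["_score"]:
            best[key] = hit
    # Return the surviving hits, best first, capped at `limit`.
    ranked = sorted(best.values(), key=lambda h: h["_score"], reverse=True)
    return ranked[:limit]


# Illustrative hits; the scores are invented for this sketch.
hits = [
    {"_id": "nike-air-max-red-10", "product_family": "Air Max", "_score": 0.91},
    {"_id": "nike-air-max-blue-9", "product_family": "Air Max", "_score": 0.89},
    {"_id": "adidas-ultraboost-black-11", "product_family": "Ultraboost", "_score": 0.84},
    {"_id": "nike-pegasus-white-10", "product_family": "Pegasus", "_score": 0.80},
]

collapsed = collapse_hits(hits, "product_family", limit=10)
print([h["_id"] for h in collapsed])
# ['nike-air-max-red-10', 'adidas-ultraboost-black-11', 'nike-pegasus-white-10']
```

The two Air Max variants collapse into a single hit, so three product families remain out of four documents.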
Index Creation
Before you can use collapse fields in search, you must define them during index creation.
curl -X POST 'http://localhost:8882/indexes/ecommerce-products' \
-H "Content-Type: application/json" \
-d '{
"type": "unstructured",
"model": "hf/e5-base-v2",
"collapseFields": [{"name": "product_family"}]
}'
import marqo
mq = marqo.Client("http://localhost:8882")
# Create index with collapse fields
index_settings = {
"type": "unstructured",
"model": "hf/e5-base-v2",
"collapseFields": [{"name": "product_family"}],
}
mq.create_index("ecommerce-products", settings_dict=index_settings)
Adding Documents
When adding documents to an index with collapse fields, ensure all documents have valid values for the collapse fields.
# Example product documents
documents = [
{
"_id": "nike-air-max-red-10",
"title": "Nike Air Max Running Shoes",
"description": "Comfortable running shoes with air cushioning",
"brand": "Nike",
"category": "Running Shoes",
"product_family": "Air Max",
"color": "Red",
"size": "10",
"price": 120,
},
{
"_id": "nike-air-max-blue-9",
"title": "Nike Air Max Running Shoes",
"description": "Comfortable running shoes with air cushioning",
"brand": "Nike",
"category": "Running Shoes",
"product_family": "Air Max",
"color": "Blue",
"size": "9",
"price": 125,
},
{
"_id": "adidas-ultraboost-black-11",
"title": "Adidas Ultraboost Running Shoes",
"description": "Premium running shoes with boost technology",
"brand": "Adidas",
"category": "Running Shoes",
"product_family": "Ultraboost",
"color": "Black",
"size": "11",
"price": 180,
},
{
"_id": "nike-pegasus-white-10",
"title": "Nike Air Zoom Pegasus",
"description": "Versatile running shoes for daily training",
"brand": "Nike",
"category": "Running Shoes",
"product_family": "Pegasus",
"color": "White",
"size": "10",
"price": 100,
},
]
# Add documents to the index
mq.index("ecommerce-products").add_documents(documents)
curl -X POST 'http://localhost:8882/indexes/ecommerce-products/documents' \
-H 'Content-Type: application/json' \
-d '{
"documents": [
{
"_id": "nike-air-max-red-10",
"title": "Nike Air Max Running Shoes",
"description": "Comfortable running shoes with air cushioning",
"brand": "Nike",
"category": "Running Shoes",
"product_family": "Air Max",
"color": "Red",
"size": "10",
"price": 120
},
{
"_id": "adidas-ultraboost-black-11",
"title": "Adidas Ultraboost Running Shoes",
"description": "Premium running shoes with boost technology",
"brand": "Adidas",
"category": "Running Shoes",
"product_family": "Ultraboost",
"color": "Black",
"size": "11",
"price": 180
}
]
}'
Document Validation Rules:
- All documents must have a value for the collapse field (only one collapse field is currently supported)
- Collapse field values must be strings (not numbers or other types)
- Collapse field values cannot be null, empty strings, or whitespace-only
- Documents missing a collapse field value are rejected individually with a 400 error code
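These rules can also be checked on the client before sending a batch, which makes rejected documents easier to trace. The helper below is a hypothetical sketch (the name `collapse_value_problem` is not part of the Marqo client), and the server remains the authority on what is accepted.

```python
from typing import Any, Dict, Optional


def collapse_value_problem(doc: Dict[str, Any], field: str) -> Optional[str]:
    """Return a description of the problem, or None if the value looks valid."""
    if field not in doc or doc[field] is None:
        return f"missing or null '{field}'"
    value = doc[field]
    if not isinstance(value, str):
        return f"'{field}' must be a string, got {type(value).__name__}"
    if not value.strip():
        return f"'{field}' is empty or whitespace-only"
    return None


# Illustrative documents covering each rule.
candidates = [
    {"_id": "ok", "product_family": "Air Max"},
    {"_id": "no-field", "title": "Untitled"},
    {"_id": "number", "product_family": 42},
    {"_id": "blank", "product_family": "   "},
]

problems = {d["_id"]: collapse_value_problem(d, "product_family") for d in candidates}
valid_docs = [d for d in candidates if problems[d["_id"]] is None]

for doc_id, problem in problems.items():
    print(doc_id, "->", problem or "valid")
```

Only `valid_docs` would then be passed to `add_documents`; the rest can be logged or repaired first.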
Updating Documents
When updating documents, the same validation rules apply. You can modify collapse field values, but they must remain valid strings.
# Update a document's collapse field value
updated_document = {
"_id": "nike-air-max-red-10",
"title": "Nike Air Max Running Shoes - Updated",
"brand": "Nike Premium",
"category": "Running Shoes",
"product_family": "Nike Air Max", # Updated product_family value
"color": "Red",
"size": "10",
"price": 130,
}
mq.index("ecommerce-products").add_documents([updated_document])
Best Practices for Updates:
- Ensure updated collapse field values maintain logical groupings
- Consider the impact on search result diversity when changing collapse field values
- Use consistent naming conventions for collapse field values across your dataset
- Partial updates of a collapse field are not currently supported
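Inconsistent spellings of the same group (for example "Air Max" and "air max") split one logical group into several, since grouping is by field value. A minimal sketch for spotting such variants before indexing is shown below; the helper name `find_inconsistent_values` and the normalization rule (case and whitespace only) are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_inconsistent_values(values: Iterable[str]) -> Dict[str, List[str]]:
    """Group values that differ only by letter case or whitespace."""
    variants = defaultdict(set)
    for value in values:
        # Collapse runs of whitespace and ignore case when comparing.
        normalized = " ".join(value.split()).casefold()
        variants[normalized].add(value)
    # Report only the normalized keys that have more than one spelling.
    return {key: sorted(group) for key, group in variants.items() if len(group) > 1}


families = ["Air Max", "air max", "Air  Max", "Ultraboost", "Pegasus"]
inconsistent = find_inconsistent_values(families)
print(inconsistent)
```

Any key reported here points to values that would probably be better merged into a single canonical spelling.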
Search with Collapse Fields
Use collapse fields in search queries to group results and ensure diversity.
Basic Usage
# Search with grouping by product family
results = mq.index("ecommerce-products").search(
q="running shoes",
search_method="HYBRID",
collapse_fields=[{"name": "product_family"}],
limit=10,
)
# This will return the top-scoring product from each product family
for hit in results["hits"]:
print(f"Product Family: {hit['product_family']}, Product: {hit['title']}")
curl -X POST 'http://localhost:8882/indexes/ecommerce-products/search' \
-H 'Content-Type: application/json' \
-d '{
"q": "running shoes",
"searchMethod": "HYBRID",
"collapseFields": [{"name": "product_family"}],
"limit": 10
}'
Advanced Usage with Filtering and Faceting
# Collapse by product family with price filtering and category faceting
results = mq.index("ecommerce-products").search(
q="comfortable shoes",
search_method="HYBRID",
collapse_fields=[{"name": "product_family"}],
filter_string="price:[* TO 150]",  # $150 or less (range bounds are inclusive)
facets={"fields": {"brand": {"type": "string"}, "category": {"type": "string"}}},
limit=5,
)
print("Search Results:")
for hit in results["hits"]:
print(f"Family: {hit['product_family']}, Price: ${hit['price']}")
print("\nBrand Facets:")
for brand, count in results["facets"]["brand"].items():
print(f"{brand}: {count} products")
curl -X POST 'http://localhost:8882/indexes/ecommerce-products/search' \
-H 'Content-Type: application/json' \
-d '{
"q": "comfortable shoes",
"searchMethod": "HYBRID",
"collapseFields": [{"name": "product_family"}],
"filter": "price:[* TO 150]",
"facets": {
"fields": {
"brand": {"type": "string"},
"category": {"type": "string"}
}
},
"limit": 5
}'
Note that facet counts are computed per collapse group rather than per document. In the example above, each product family is counted only once in its brand or category bucket.
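One way to read this counting rule is that a facet bucket holds the number of distinct collapse-field values, not the number of documents. The sketch below illustrates that interpretation in plain Python; it is an assumption about the behaviour as described here, not Marqo's implementation, and it ignores the price filter for simplicity.

```python
from collections import defaultdict
from typing import Any, Dict, List


def collapsed_facet_counts(
    docs: List[Dict[str, Any]], facet_field: str, collapse_field: str
) -> Dict[str, int]:
    """Count distinct collapse-field values under each facet value."""
    groups = defaultdict(set)
    for doc in docs:
        groups[doc[facet_field]].add(doc[collapse_field])
    return {value: len(members) for value, members in groups.items()}


# The four example products, reduced to the two relevant fields.
docs = [
    {"brand": "Nike", "product_family": "Air Max"},       # red variant
    {"brand": "Nike", "product_family": "Air Max"},       # blue variant
    {"brand": "Adidas", "product_family": "Ultraboost"},
    {"brand": "Nike", "product_family": "Pegasus"},
]

brand_counts = collapsed_facet_counts(docs, "brand", "product_family")
print(brand_counts)  # {'Nike': 2, 'Adidas': 1}
```

Nike has three documents but only two product families, so under this reading its bucket shows 2.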
Use Cases & Examples
1. E-commerce Product Search
Scenario: Online store with multiple product variants (colors, sizes)
# Group by product family to show diverse products
results = mq.index("products").search(
q="winter jacket",
search_method="HYBRID",
collapse_fields=[{"name": "product_family"}],
limit=12,
)
Result: Instead of seeing 5 black jackets, 4 blue jackets, and 3 red jackets from the same product line, users see 12 different jacket styles.
2. Content Aggregation
Scenario: News website aggregating articles from multiple sources
# Group by category to ensure topic diversity
results = mq.index("articles").search(
q="artificial intelligence",
search_method="HYBRID",
collapse_fields=[{"name": "category"}],
limit=10,
)
Result: Users see articles from different categories (technology, business, science, etc.) rather than 10 similar tech articles.
3. Job Search Platform
Scenario: Job board with multiple positions at the same company
# Group by company to show diverse employers
results = mq.index("jobs").search(
q="software engineer",
search_method="HYBRID",
collapse_fields=[{"name": "company"}],
filter_string='location:"San Francisco"',
limit=20,
)
Result: Shows software engineering jobs from 20 different companies rather than multiple positions from the same few companies.
Performance Considerations
Search Performance
- Collapse fields add processing overhead during search
- Performance impact increases with the number of unique groups
- Consider using limit to control result set size
Best Practices
- Choose collapse fields that create meaningful groups (typically 100-10,000 unique values)
- Avoid fields with too few unique values (limited diversity) or too many (reduced grouping effect)
- Use descriptive field names that clearly indicate their grouping purpose
- Test different collapse fields to find optimal user experience
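Before committing to a collapse field, it can help to profile how its values are distributed in a sample of your data. The sketch below counts unique values and the share of documents held by the largest group; the helper name `profile_collapse_field` and the returned keys are hypothetical, and the sample is deliberately tiny.

```python
from collections import Counter
from typing import Any, Dict, List


def profile_collapse_field(docs: List[Dict[str, Any]], field: str) -> Dict[str, Any]:
    """Summarize how documents are spread across values of `field`."""
    counts = Counter(doc[field] for doc in docs)
    total = sum(counts.values())
    largest_value, largest_size = counts.most_common(1)[0]
    return {
        "unique_values": len(counts),
        "documents": total,
        "largest_group": largest_value,
        # A share close to 1.0 means one group dominates the field.
        "largest_group_share": largest_size / total,
    }


sample = [
    {"product_family": "Air Max"},
    {"product_family": "Air Max"},
    {"product_family": "Ultraboost"},
    {"product_family": "Pegasus"},
]

profile = profile_collapse_field(sample, "product_family")
print(profile)
```

A field with very few unique values, or one where a single group holds most documents, is likely to give limited diversity after collapsing.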
Limitations
Current Limitations:
- Only one collapse field is supported
- Only available for the HYBRID search method (not TENSOR or LEXICAL)
- Only works with unstructured indexes
- Requires Marqo v2.23.0 or later
- Collapse fields must be defined at index creation time
- Collapse fields are not lexically searchable
Field Requirements:
- Values must be strings
- Cannot be null, empty, or whitespace-only
- Must be present in all documents
- Cannot be changed after index creation
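Some of these limitations can be caught in application code before a request is sent. The following is a minimal pre-flight sketch that checks only the search method and the number of collapse fields; `collapse_request_problems` is a hypothetical helper, not part of the Marqo client, and it does not verify the index type or server version.

```python
from typing import Any, Dict, List


def collapse_request_problems(
    search_method: str, collapse_fields: List[Dict[str, Any]]
) -> List[str]:
    """Return a list of problems with a collapse search request (empty if none)."""
    problems = []
    if search_method.upper() != "HYBRID":
        problems.append(f"collapse requires HYBRID search, got {search_method}")
    if len(collapse_fields) != 1:
        problems.append(f"exactly one collapse field is supported, got {len(collapse_fields)}")
    for entry in collapse_fields:
        # Each entry is expected to look like {"name": "<field>"}.
        if not isinstance(entry, dict) or not isinstance(entry.get("name"), str):
            problems.append(f"malformed collapse field entry: {entry!r}")
    return problems


ok = collapse_request_problems("HYBRID", [{"name": "product_family"}])
bad = collapse_request_problems("TENSOR", [{"name": "brand"}, {"name": "category"}])
print(ok)
print(bad)
```

An empty list means the request passes these two checks; anything else can be surfaced to the caller before the search is attempted.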
Troubleshooting
Common Issues
Error: "Field 'collapseFields
during index creation
Wrong count in the first range bucket of a numeric field
- Possible Cause: A value for the numeric field is missing in one of the documents
- Solution: Add an extra [-inf, 0) bucket to capture null/missing values, and do not display it in the UI
Poor result diversity
- Solution: Choose a collapse field with a more balanced value distribution
- Consider using a different field that creates more meaningful groups
Missing expected results
- Remember: Only the top-scoring document from each group is returned
- Use faceting to understand the distribution of documents across groups