Then and now: what happened to search between Sitecore 8 and SitecoreAI
A long-time Sitecore developer's guide to what's changed, what's gone, and what finally works in the headless search era.
Back in my Sitecore 8 days, the answer to every search question was the same: spin up a Solr instance, configure your cores, and write some LINQ queries. We owned the infrastructure, we owned the content delivery servers, and we lived in the backend.
Stepping back into the ecosystem now with SitecoreAI is a different world entirely. Content Delivery servers are gone, the ContentSearch API is effectively retired in the composable stack, and the headless architecture demands a completely different approach. Let's break down what replaced it, and why Sitecore Search has become the definitive native choice.
In Sitecore 8 XP, Lucene was the default search provider, with Solr available as an alternative.
Depending on the implementation, Solr could handle both internal CM indexing and external CD website search, though front-end search was often delegated to a dedicated solution like Coveo.
As developers, we spent hours mapping Sitecore fields to Solr dynamic fields, managing Zookeeper clusters, and writing custom indexers. If you were around back then, this XML configuration should trigger some nostalgia (read: anxiety):
<fieldTypes hint="raw:AddFieldByFieldTypeName">
  <fieldType fieldTypeName="single-line text" returnType="text" />
</fieldTypes>
<typeMatches hint="raw:AddTypeMatch">
  <typeMatch typeName="text" type="System.String" fieldNameFormat="{0}_t" />
</typeMatches>
We constantly battled synchronous indexing bottlenecks that would drain server resources during large publish events, tweaking ContentSearch.SearchMaxResults just to keep memory allocation from crashing the cluster. While SitecoreAI still uses Solr under the hood for internal Content Management operations, you can no longer expose a local Solr index to your front-end via a CD server. CD servers don't exist in the SaaS model. If you want a front-end search experience today, you need a composable API-driven solution.
Nothing makes the architectural shift more concrete than seeing what the same basic task looks like in each era. Let's take a common real-world scenario: fetching published articles filtered by category, sorted by publish date.
This was the standard pattern. You'd define a typed search result model, resolve the index by name, open a search context, and compose a LINQ query that the ContentSearch provider would translate into a Solr query under the hood.
// SearchResultItem model — field names had to match your Solr dynamic field mappings exactly
public class ArticleSearchResultItem : SearchResultItem
{
    [IndexField("category_t")]
    public string Category { get; set; }

    [IndexField("publish_date_tdt")]
    public DateTime PublishDate { get; set; }

    [IndexField("author_t")]
    public string Author { get; set; }

    // "_latestversion" is the built-in index field; the base SearchResultItem
    // does not expose it, so it had to be mapped explicitly on your model
    [IndexField("_latestversion")]
    public bool IsLatestVersion { get; set; }
}

// In your search service
public IEnumerable<ArticleSearchResultItem> GetArticlesByCategory(string category)
{
    var index = ContentSearchManager.GetIndex("sitecore_web_index");
    using (var context = index.CreateSearchContext())
    {
        return context.GetQueryable<ArticleSearchResultItem>()
            .Where(item =>
                item.Category == category &&
                item.TemplateName == "Article" &&
                item.Language == "en" &&
                item.IsLatestVersion)
            .OrderByDescending(item => item.PublishDate)
            .Take(10)
            .GetResults()
            .Hits
            .Select(hit => hit.Document)
            .ToList();
    }
}
This worked. But notice the ceremony: the index name is a magic string, the field names are magic strings tied to your Solr schema, and one wrong fieldNameFormat in your config would silently return zero results with no useful error.
You also had to keep your field type mappings, index configuration patches, and Solr schema in sync manually, across environments.
Here is the equivalent query using the Sitecore Search React SDK. The infrastructure concern disappears entirely. You are declaring intent, not managing plumbing.
import {
  widget,
  WidgetDataType,
  useSearchResults,
  FilterEqual
} from '@sitecore-search/react';

const ArticleListWidget = () => {
  const {
    widgetRef,
    actions: { onSortChange },
    state: { sortType },
    queryResult: {
      data: {
        content: articles = [],
        sort: { choices: sortChoices = [] } = {}
      } = {}
    }
  } = useSearchResults({
    query: (query) => {
      query
        .getRequest()
        .setSearchFilter(new FilterEqual('category', 'digital-experience'));
    },
    state: {
      itemsPerPage: 10,
      sortType: 'publish_date_desc',
    }
  });

  return (
    <div ref={widgetRef}>
      <select value={sortType} onChange={(e) => onSortChange({ sortType: e.target.value })}>
        {sortChoices.map((sort) => (
          <option key={sort.name} value={sort.name}>
            {sort.label}
          </option>
        ))}
      </select>
      <ul>
        {articles.map((article) => (
          <li key={article.id}>
            <a href={article.url}>{article.title}</a>
            <span>{article.publish_date}</span>
          </li>
        ))}
      </ul>
    </div>
  );
};

export default widget(ArticleListWidget, WidgetDataType.SEARCH_RESULTS, 'content');
No index name. No field type mapping file. No Zookeeper.
The widget decorator binds the component into the Sitecore Search event system, meaning click tracking, analytics, and behavioral boosting come along for free.
What used to be a service class, a configuration patch file, and a Solr schema update is now roughly 30 lines of declarative JavaScript.
For returning developers, this is the moment it clicks: you are no longer managing a search engine. You are consuming a search product.
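One wiring detail worth noting: a widget like this only works when mounted inside a WidgetsProvider, which supplies the API credentials and connects every child widget to the events pipeline. A minimal configuration sketch follows; the env, customerKey, apiKey, and rfkId values are placeholders you would pull from the Developer Resources section of the CEC.

```javascript
// App-level wiring sketch, not a full application. All credential values
// below are placeholders from your own CEC Developer Resources page.
import { WidgetsProvider } from '@sitecore-search/react';

const App = () => (
  <WidgetsProvider
    env="prod"                 // target environment, e.g. 'prod'
    customerKey="123456-7890"  // placeholder customer key
    apiKey="01-xxxx-xxxx"      // placeholder API key
  >
    {/* rfkId binds the component to a widget configured in the CEC */}
    <ArticleListWidget rfkId="rfkid_7" />
  </WidgetsProvider>
);

export default App;
```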
If you spent real time on Sitecore XP, you didn't just use the search stack; you fought it. Here are the failure modes that were practically a rite of passage, and exactly why they no longer apply.
The scenario: a content editor triggers a full site publish, perhaps after a template change or a large content migration. The synchronous indexing pipeline tries to process thousands of items in real time, hammering the server CPU and I/O simultaneously. Publish times stretch from seconds to minutes. In the worst cases, the Solr indexing pipeline would deadlock, leaving the index in a half-rebuilt state and editors staring at a spinning publish dialog.
The fix was usually a combination of batching, async indexing configuration, and prayer.
In SitecoreAI: There is no web index to rebuild. Published content goes to Experience Edge. From there, you can configure advanced crawlers to re-index your rendered pages on a schedule, or wire up a custom webhook handler that pushes updated content into the Sitecore Search Ingestion API on publish. Either way, the indexing happens asynchronously, outside the CMS publish pipeline entirely. An author hitting Publish has zero awareness of the search indexing layer.
This was a lose-lose configuration. The default value for ContentSearch.SearchMaxResults was 1,000,000. That means every single ContentSearch API request told Solr to allocate memory for up to one million rows, whether you needed ten results or ten thousand. On a busy CD server with concurrent queries, this was a recipe for severe memory overallocation and performance degradation on the Solr cluster.
So teams would lower it. Set it to 500, maybe 10,000, something reasonable. Problem solved, right? Not quite. If your query matched more items than the cap, results were silently truncated without warning, without error, just quietly missing data. Facet counts would be wrong. "Show all" features would lie. You'd only find out when a stakeholder noticed that a category page was only showing 500 articles out of 1,200.
And then there were the teams that went the other direction entirely, patching the value to int.MaxValue to guarantee they'd never miss a result:
<!-- The config patch that lived in every Sitecore project -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <setting name="ContentSearch.SearchMaxResults">
        <patch:attribute name="value">2147483647</patch:attribute>
      </setting>
    </settings>
  </sitecore>
</configuration>
Which circled right back to the memory problem. There was no good middle ground without careful per-query tuning, and the ContentSearch API made that awkward at best.
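To make the silent-truncation failure mode concrete, here is a tiny standalone simulation (illustrative JavaScript, not Sitecore code) of what a row cap does to a query that matches more items than the cap allows:

```javascript
// Illustrative simulation of a results cap applied at the row level,
// mimicking how a lowered SearchMaxResults silently trimmed hits.
function queryWithCap(items, predicate, maxResults) {
  const matches = items.filter(predicate);
  return {
    hits: matches.slice(0, maxResults), // silently truncated, no error raised
    total: matches.length,              // the count the stakeholder expects
  };
}

// 1,200 "news" articles in a 1,500-item index, with the cap patched to 500:
const articles = Array.from({ length: 1500 }, (_, i) => ({
  id: i,
  category: i < 1200 ? 'news' : 'events',
}));
const result = queryWithCap(articles, (a) => a.category === 'news', 500);
// result.hits.length is 500 while result.total is 1200: the category page
// renders 500 articles out of 1,200, and nothing in the response says why.
```

The gap between hits.length and total is exactly the discrepancy a stakeholder would eventually notice, long after the query itself had been reporting "success".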
In Sitecore Search: Pagination, result limits, and performance throttling are managed at the SaaS platform level with transparent API response contracts. You define your page size in the query. There is no silent ceiling on results, no global setting to get wrong, and no heap to tune.
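As a sketch of what "page size in the query" means in practice, here is how paging maps onto a Sitecore Search REST request body, assuming the limit/offset convention from the public Search and Recommendation API documentation. The rfk_id, entity name, and filter shape are placeholders to verify against your own domain configuration in the CEC.

```javascript
// Hedged sketch of a Sitecore Search REST request body with explicit paging.
// 'rfkid_7', the entity name, and the filter shape are placeholder values;
// confirm them against your own domain configuration before relying on this.
function buildSearchRequest({ page = 1, pageSize = 10, category } = {}) {
  const search = {
    limit: pageSize,                // page size is declared per request
    offset: (page - 1) * pageSize,  // no hidden global ceiling to misconfigure
    content: {},
  };
  if (category) {
    search.filter = { type: 'eq', name: 'category', value: category };
  }
  return {
    widget: {
      items: [{ entity: 'content', rfk_id: 'rfkid_7', search }],
    },
  };
}

const body = buildSearchRequest({ page: 3, pageSize: 10, category: 'digital-experience' });
// body.widget.items[0].search.offset === 20, the third page of ten results
```

The limit and offset travel with every request, so there is nothing global to patch and nothing to tune per environment.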
For anyone running SolrCloud (which was the right call for any production Sitecore install at scale), Zookeeper was the distributed coordination layer that kept your Solr nodes in agreement.
If you lost a majority of Zookeeper nodes (quorum loss), Solr would refuse to accept new index writes and often refuse reads too. Your entire site search would go dark, and the recovery process involved carefully restarting nodes in the right order while hoping the ensemble would reform cleanly.
Most developers encountered this at the worst possible time: during a high-traffic event, or immediately after a cloud VM restart during a maintenance window.
In SitecoreAI / Sitecore Search: There is no Zookeeper. There is no Solr node ensemble to manage. There is no quorum to lose. The entire distributed coordination problem is Sitecore's operational responsibility.
When Solr's capabilities weren't enough, Coveo was historically the premium enterprise alternative. It brought machine learning and personalization to the Sitecore platform long before "AI" became the ubiquitous industry buzzword it is today.
Coveo shines in complex, enterprise-level search. If you are building an experience that requires unifying search results across an ERP, a legacy Sitecore instance, a separate Salesforce community, and an external knowledge base, Coveo's massive library of out-of-the-box connectors is unmatched.
However, for many standard SitecoreAI implementations, Coveo can be overkill. It comes with a premium price tag, a steep learning curve, and a heavier implementation footprint than a standard content site might need. If your primary goal is indexing and surfacing your published web content and digital assets with modern relevance, there is now a more streamlined, native path.
For those of us working deeply within the SitecoreAI ecosystem, Sitecore Search feels like the missing puzzle piece that bridges content creation and content discovery.
Here is why it wins in this new era:
Instead of building complex indexing pipelines in C#, Sitecore Search thrives on APIs and webhooks. You have the flexibility to set up advanced crawlers to index your rendered Next.js pages, or you can configure a Sitecore Search source to listen for Experience Edge webhook events, triggering near-real-time index updates whenever an author publishes. This keeps your search results tightly in sync with your live content without manual intervention.
A minimal webhook handler to push a published item into Sitecore Search might look something like this:
// /api/search-index-update.js — Next.js API route
// Triggered by a SitecoreAI Experience Edge webhook on item publish

// Sitecore language codes (e.g. 'en', 'fr-FR') must be mapped to the
// underscore-delimited locale format expected by the Sitecore Search Ingestion API.
// Extend this map to cover all languages active in your Sitecore instance.
const localeMap = {
  'en': 'en_us',
  'fr-FR': 'fr_fr',
};

export default async function handler(req, res) {
  if (req.method !== 'POST') return res.status(405).end();

  const { item_id, language, fields } = req.body;
  const locale = localeMap[language] ?? 'en_us';

  const indexDocument = {
    id: item_id,
    fields: {
      title: fields?.PageTitle?.value,
      description: fields?.MetaDescription?.value,
      url: fields?.Url?.value,
      category: fields?.Category?.value,
      publish_date: new Date().toISOString()
    }
  };

  const {
    SEARCH_BASE_URL,
    SEARCH_DOMAIN_ID,
    SEARCH_SOURCE_ID,
    SEARCH_ENTITY_ID,
    SEARCH_API_KEY
  } = process.env;

  const response = await fetch(
    `${SEARCH_BASE_URL}/ingestion/v1/domains/${SEARCH_DOMAIN_ID}/sources/${SEARCH_SOURCE_ID}/entities/${SEARCH_ENTITY_ID}/documents?locale=${locale}`,
    {
      method: 'POST',
      headers: {
        'Authorization': SEARCH_API_KEY,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ indexDocument })
    }
  );

  // Surface ingestion failures to the webhook caller instead of
  // reporting success for a document that never reached the index
  if (!response.ok) {
    return res.status(502).json({ error: `Ingestion API returned ${response.status}` });
  }

  return res.status(200).json({ indexed: item_id });
}
This replaces what used to be a custom IIndexUpdateHandler, a pipeline processor registration in a config patch, and a deployment. It is a single API route.
Sitecore Search goes beyond keyword matching by leveraging machine learning to understand user intent. It applies semantic understanding and supports synonym management; marketers can curate and expand synonym rules through the Customer Engagement Console (CEC). Combined with real-time predictive search and behavioral boosting, it automatically surfaces the most relevant content and reduces the need for manual relevance tuning. It fundamentally aligns with the SitecoreAI vision, where content, personalization, and search form a unified ecosystem.
In the Solr days, setting up a promotional boost for a specific campaign required a developer to deploy a code change or update an XML config. Sitecore Search provides a robust, business-friendly CEC. Marketers can log in, pin specific results, configure synonyms, manage facets, and set up personalization rules without ever submitting a Jira ticket to the dev team.
No more Solr nodes to scale for your front-end search experience. No more index rebuilds taking down production performance. Sitecore Search is a fully managed SaaS product, aligning perfectly with the cloud-native philosophy of SitecoreAI.
In C# and the ContentSearch API, you had granular control. You could write a custom IIndexableBuilderFactory, override how specific field types were computed, inject pipeline processors that transformed data mid-index, and introspect the index state programmatically.
In Sitecore Search, the core relevance and indexing pipeline is a black box. You control what you push in, but the AI-driven ranking internals are not exposed. You do have a set of override tools (boost/bury/pin/blacklist rules at the widget and API level), but these operate on top of the black-box relevance engine rather than replacing it. For standard content indexing this is fine. For complex custom relevance logic that needs to deeply influence scoring (rather than just nudge or override it), you will feel the constraint.
When a Solr query returned wrong results, you could pop open the Solr admin UI, run the raw query, inspect the scores, tweak the boost factors, and iterate in minutes. The feedback loop was tight and entirely within your control.
When Sitecore Search returns unexpected results, the feedback loop runs through the CEC analytics, behavioral data you may not yet have, and ML signals that take time to build. There is no equivalent of the Solr debug query. Relevance problems in a SaaS ML system require a different diagnostic approach. It's less about fixing the query, and more about reviewing the signals and waiting for the model to adjust.
Transitioning from the monolithic, backend-heavy .NET architecture of Sitecore 8 to the headless, SaaS-first world of SitecoreAI can be a lot to take in. But we no longer have to be infrastructure managers and Solr administrators. By adopting Sitecore Search, we can leverage real machine learning capabilities to deliver hyper-relevant, personalized experiences powered by content published through Experience Edge, and spend our craft on the experience rather than the plumbing.
It's also worth keeping an eye on Sitecore Search Experiences, a newer addition to the SitecoreAI platform that aims to take search even further. Search Experiences builds on top of Sitecore Search to provide a more streamlined, out-of-the-box way to create and manage search-driven pages and components, reducing the custom widget code developers need to write while giving marketers even more direct control over how search results are presented. If the trajectory holds, this could further close the gap between content strategy and content discovery, making search feel less like a standalone integration and more like a native extension of the authoring and delivery workflow. It's early days, but for anyone invested in the Sitecore Search ecosystem, it's a space worth watching closely.