Then and now: what happened to search between Sitecore 8 and SitecoreAI
A long-time Sitecore developer's guide to what's changed, what's gone, and what finally works in the headless search era.
Back in my Sitecore 8 days, the answer to every search question was the same: spin up a Solr instance, configure your cores, and write some LINQ queries. We owned the infrastructure, we owned the content delivery servers, and we lived in the backend.
Stepping back into the ecosystem now with SitecoreAI is a different world entirely. Content Delivery servers are gone, the ContentSearch API is effectively retired in the composable stack, and the headless architecture demands a completely different approach. Let's break down what replaced it, and why Sitecore Search has become the definitive native choice.
In Sitecore 8 XP, Lucene was the default search provider, with Solr available as an alternative.
Depending on the implementation, Solr could handle both internal CM indexing and external CD website search, though front-end search was often delegated to a dedicated solution like Coveo.
As developers, we spent hours mapping Sitecore fields to Solr dynamic fields, managing Zookeeper clusters, and writing custom indexers. If you were around back then, this XML configuration should trigger some nostalgia (read: anxiety):
<fieldTypes hint="raw:AddFieldByFieldTypeName">
  <fieldType fieldTypeName="single-line text" returnType="text" />
</fieldTypes>
<typeMatches hint="raw:AddTypeMatch">
  <typeMatch typeName="text" type="System.String" fieldNameFormat="{0}_t" />
</typeMatches>
We constantly battled synchronous indexing bottlenecks that would drain server resources during large publish events, tweaking ContentSearch.SearchMaxResults just to keep memory allocation from crashing the cluster. While SitecoreAI still uses Solr under the hood for internal Content Management operations, you can no longer expose a local Solr index to your front-end via a CD server. CD servers don't exist in the SaaS model. If you want a front-end search experience today, you need a composable API-driven solution.
Nothing makes the architectural shift more concrete than seeing what the same basic task looks like in each era. Let's take a common real-world scenario: fetching published articles filtered by category, sorted by publish date.
This was the standard pattern. You'd define a typed search result model, resolve the index by name, open a search context, and compose a LINQ query that the ContentSearch provider would translate into a Solr query under the hood.
// SearchResultItem model — field names had to match your Solr dynamic field mappings exactly
public class ArticleSearchResultItem : SearchResultItem
{
    [IndexField("category_t")]
    public string Category { get; set; }

    [IndexField("publish_date_tdt")]
    public DateTime PublishDate { get; set; }

    [IndexField("author_t")]
    public string Author { get; set; }

    // "_latestversion" is the built-in index field; the base SearchResultItem
    // does not expose it, so it had to be mapped explicitly on your model
    [IndexField("_latestversion")]
    public bool IsLatestVersion { get; set; }
}

// In your search service
public IEnumerable<ArticleSearchResultItem> GetArticlesByCategory(string category)
{
    var index = ContentSearchManager.GetIndex("sitecore_web_index");
    using (var context = index.CreateSearchContext())
    {
        return context.GetQueryable<ArticleSearchResultItem>()
            .Where(item =>
                item.Category == category &&
                item.TemplateName == "Article" &&
                item.Language == "en" &&
                item.IsLatestVersion)
            .OrderByDescending(item => item.PublishDate)
            .Take(10)
            .GetResults()
            .Hits
            .Select(hit => hit.Document)
            .ToList();
    }
}
This worked. But notice the ceremony: the index name is a magic string, the field names are magic strings tied to your Solr schema, and one wrong fieldNameFormat in your config would silently return zero results with no useful error.
You also had to keep your field type mappings, index configuration patches, and Solr schema in sync manually, across environments.
Here is the equivalent query using the Sitecore Search React SDK. The infrastructure concern disappears entirely. You are declaring intent, not managing plumbing.
import {
  widget,
  WidgetDataType,
  useSearchResults,
  FilterEqual
} from '@sitecore-search/react';

const ArticleListWidget = () => {
  const {
    widgetRef,
    actions: { onSortChange },
    state: { sortType },
    queryResult: {
      data: {
        content: articles = [],
        sort: { choices: sortChoices = [] } = {}
      } = {}
    }
  } = useSearchResults({
    query: (query) => {
      query
        .getRequest()
        .setSearchFilter(new FilterEqual('category', 'digital-experience'));
    },
    state: {
      itemsPerPage: 10,
      sortType: 'publish_date_desc',
    }
  });

  return (
    <div ref={widgetRef}>
      <select value={sortType} onChange={(e) => onSortChange({ sortType: e.target.value })}>
        {sortChoices.map((sort) => (
          <option key={sort.name} value={sort.name}>
            {sort.label}
          </option>
        ))}
      </select>
      <ul>
        {articles.map((article) => (
          <li key={article.id}>
            <a href={article.url}>{article.title}</a>
            <span>{article.publish_date}</span>
          </li>
        ))}
      </ul>
    </div>
  );
};

export default widget(ArticleListWidget, WidgetDataType.SEARCH_RESULTS, 'content');
No index name. No field type mapping file. No Zookeeper.
The widget decorator binds the component into the Sitecore Search event system, meaning click tracking, analytics, and behavioral boosting come along for free.
What used to be a service class, a configuration patch file, and a Solr schema update is now roughly 30 lines of declarative JavaScript.
For returning developers, this is the moment it clicks: you are no longer managing a search engine. You are consuming a search product.
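One wiring detail worth noting: a widget like this only works when mounted inside a WidgetsProvider, which supplies the API credentials and connects every child widget to the events pipeline. A minimal configuration sketch follows; the env, customerKey, apiKey, and rfkId values are placeholders you would pull from the Developer Resources section of the CEC.

```javascript
// App-level wiring sketch, not a full application. All credential values
// below are placeholders from your own CEC Developer Resources page.
import { WidgetsProvider } from '@sitecore-search/react';

const App = () => (
  <WidgetsProvider
    env="prod"                 // target environment, e.g. 'prod'
    customerKey="123456-7890"  // placeholder customer key
    apiKey="01-xxxx-xxxx"      // placeholder API key
  >
    {/* rfkId binds the component to a widget configured in the CEC */}
    <ArticleListWidget rfkId="rfkid_7" />
  </WidgetsProvider>
);

export default App;
```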
If you spent real time on Sitecore XP, you didn't just use the search stack; you fought it. Here are the failure modes that were practically a rite of passage, and exactly why they no longer apply.
The scenario: a content editor triggers a full site publish, perhaps after a template change or a large content migration. The synchronous indexing pipeline tries to process thousands of items in real time, hammering the server CPU and I/O simultaneously. Publish times stretch from seconds to minutes. In the worst cases, the Solr indexing pipeline would deadlock, leaving the index in a half-rebuilt state and editors staring at a spinning publish dialog.
The fix was usually a combination of batching, async indexing configuration, and prayer.
In SitecoreAI: There is no web index to rebuild. Published content goes to Experience Edge. From there, you can configure advanced crawlers to re-index your rendered pages on a schedule, or wire up a custom webhook handler that pushes updated content into the Sitecore Search Ingestion API on publish. Either way, the indexing happens asynchronously, outside the CMS publish pipeline entirely. An author hitting Publish has zero awareness of the search indexing layer.
This was a lose-lose configuration. The default value for ContentSearch.SearchMaxResults was 1,000,000. That means every single ContentSearch API request told Solr to allocate memory for up to one million rows, whether you needed ten results or ten thousand. On a busy CD server with concurrent queries, this was a recipe for severe memory overallocation and performance degradation on the Solr cluster.
So teams would lower it. Set it to 500, maybe 10,000, something reasonable. Problem solved, right? Not quite. If your query matched more items than the cap, results were silently truncated without warning, without error, just quietly missing data. Facet counts would be wrong. "Show all" features would lie. You'd only find out when a stakeholder noticed that a category page was only showing 500 articles out of 1,200.
And then there were the teams that went the other direction entirely, patching the value to int.MaxValue to guarantee they'd never miss a result:
<!-- The config patch that lived in every Sitecore project -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <setting name="ContentSearch.SearchMaxResults">
        <patch:attribute name="value">2147483647</patch:attribute>
      </setting>
    </settings>
  </sitecore>
</configuration>
Which circled right back to the memory problem. There was no good middle ground without careful per-query tuning, and the ContentSearch API made that awkward at best.
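To make the silent-truncation failure mode concrete, here is a tiny standalone simulation (illustrative JavaScript, not Sitecore code) of what a row cap does to a query that matches more items than the cap allows:

```javascript
// Illustrative simulation of a results cap applied at the row level,
// mimicking how a lowered SearchMaxResults silently trimmed hits.
function queryWithCap(items, predicate, maxResults) {
  const matches = items.filter(predicate);
  return {
    hits: matches.slice(0, maxResults), // silently truncated, no error raised
    total: matches.length,              // the count the stakeholder expects
  };
}

// 1,200 "news" articles in a 1,500-item index, with the cap patched to 500:
const articles = Array.from({ length: 1500 }, (_, i) => ({
  id: i,
  category: i < 1200 ? 'news' : 'events',
}));
const result = queryWithCap(articles, (a) => a.category === 'news', 500);
// result.hits.length is 500 while result.total is 1200: the category page
// renders 500 articles out of 1,200, and nothing in the response says why.
```

The gap between hits.length and total is exactly the discrepancy a stakeholder would eventually notice, long after the query itself had been reporting "success".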
In Sitecore Search: Pagination, result limits, and performance throttling are managed at the SaaS platform level with transparent API response contracts. You define your page size in the query. There is no silent ceiling on results, no global setting to get wrong, and no heap to tune.
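As a sketch of what "page size in the query" means in practice, here is how paging maps onto a Sitecore Search REST request body, assuming the limit/offset convention from the public Search and Recommendation API documentation. The rfk_id, entity name, and filter shape are placeholders to verify against your own domain configuration in the CEC.

```javascript
// Hedged sketch of a Sitecore Search REST request body with explicit paging.
// 'rfkid_7', the entity name, and the filter shape are placeholder values;
// confirm them against your own domain configuration before relying on this.
function buildSearchRequest({ page = 1, pageSize = 10, category } = {}) {
  const search = {
    limit: pageSize,                // page size is declared per request
    offset: (page - 1) * pageSize,  // no hidden global ceiling to misconfigure
    content: {},
  };
  if (category) {
    search.filter = { type: 'eq', name: 'category', value: category };
  }
  return {
    widget: {
      items: [{ entity: 'content', rfk_id: 'rfkid_7', search }],
    },
  };
}

const body = buildSearchRequest({ page: 3, pageSize: 10, category: 'digital-experience' });
// body.widget.items[0].search.offset === 20, the third page of ten results
```

The limit and offset travel with every request, so there is nothing global to patch and nothing to tune per environment.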
For anyone running SolrCloud (which was the right call for any production Sitecore install at scale), Zookeeper was the distributed coordination layer that kept your Solr nodes in agreement.
If you lost a majority of Zookeeper nodes (quorum loss), Solr would refuse to accept new index writes and often refuse reads too. Your entire site search would go dark, and the recovery process involved carefully restarting nodes in the right order while hoping the ensemble would reform cleanly.
Most developers encountered this at the worst possible time: during a high-traffic event, or immediately after a cloud VM restart during a maintenance window.
In SitecoreAI / Sitecore Search: There is no Zookeeper. There is no Solr node ensemble to manage. There is no quorum to lose. The entire distributed coordination problem is Sitecore's operational responsibility.
When Solr's capabilities weren't enough, Coveo was historically the premium enterprise alternative. It brought machine learning and personalization to the Sitecore platform long before "AI" became the ubiquitous industry buzzword it is today.
Coveo shines in complex, enterprise-level search. If you are building an experience that requires unifying search results across an ERP, a legacy Sitecore instance, a separate Salesforce community, and an external knowledge base, Coveo's massive library of out-of-the-box connectors is unmatched.
However, for many standard SitecoreAI implementations, Coveo can be overkill. It comes with a premium price tag, a steep learning curve, and a heavier implementation footprint than a standard content site might need. If your primary goal is indexing and surfacing your published web content and digital assets with modern relevance, there is now a more streamlined, native path.
For those of us working deeply within the SitecoreAI ecosystem, Sitecore Search feels like the missing puzzle piece that bridges content creation and content discovery.
Here is why it wins in this new era:
Instead of building complex indexing pipelines in C#, Sitecore Search thrives on APIs and webhooks. You have the flexibility to set up advanced crawlers to index your rendered Next.js pages, or you can configure a Sitecore Search source to listen for Experience Edge webhook events, triggering near-real-time index updates whenever an author publishes. This keeps your search results tightly in sync with your live content without manual intervention.
A minimal webhook handler to push a published item into Sitecore Search might look something like this:
// /api/search-index-update.js — Next.js API route
// Triggered by a SitecoreAI Experience Edge webhook on item publish

// Sitecore language codes (e.g. 'en', 'fr-FR') must be mapped to the
// underscore-delimited locale format expected by the Sitecore Search Ingestion API.
// Extend this map to cover all languages active in your Sitecore instance.
const localeMap = {
  'en': 'en_us',
  'fr-FR': 'fr_fr',
};

export default async function handler(req, res) {
  if (req.method !== 'POST') return res.status(405).end();

  const { item_id, language, fields } = req.body;
  const locale = localeMap[language] ?? 'en_us';

  const indexDocument = {
    id: item_id,
    fields: {
      title: fields?.PageTitle?.value,
      description: fields?.MetaDescription?.value,
      url: fields?.Url?.value,
      category: fields?.Category?.value,
      publish_date: new Date().toISOString()
    }
  };

  const {
    SEARCH_BASE_URL,
    SEARCH_DOMAIN_ID,
    SEARCH_SOURCE_ID,
    SEARCH_ENTITY_ID,
    SEARCH_API_KEY
  } = process.env;

  const response = await fetch(
    `${SEARCH_BASE_URL}/ingestion/v1/domains/${SEARCH_DOMAIN_ID}/sources/${SEARCH_SOURCE_ID}/entities/${SEARCH_ENTITY_ID}/documents?locale=${locale}`,
    {
      method: 'POST',
      headers: {
        'Authorization': SEARCH_API_KEY,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ indexDocument })
    }
  );

  // Surface ingestion failures to the webhook caller instead of
  // reporting success for a document that never reached the index
  if (!response.ok) {
    return res.status(502).json({ error: `Ingestion API returned ${response.status}` });
  }

  return res.status(200).json({ indexed: item_id });
}
This replaces what used to be a custom IIndexUpdateHandler, a pipeline processor registration in a config patch, and a deployment. It is a single API route.
Sitecore Search goes beyond keyword matching by leveraging machine learning to understand user intent. It applies semantic understanding and supports synonym management; marketers can curate and expand synonym rules through the Customer Engagement Console (CEC). Combined with real-time predictive search and behavioral boosting, it automatically surfaces the most relevant content and reduces the need for manual relevance tuning. It fundamentally aligns with the SitecoreAI vision, where content, personalization, and search form a unified ecosystem.
In the Solr days, setting up a promotional boost for a specific campaign required a developer to deploy a code change or update an XML config. Sitecore Search provides a robust, business-friendly CEC. Marketers can log in, pin specific results, configure synonyms, manage facets, and set up personalization rules without ever submitting a Jira ticket to the dev team.
No more Solr nodes to scale for your front-end search experience. No more index rebuilds taking down production performance. Sitecore Search is a fully managed SaaS product, aligning perfectly with the cloud-native philosophy of SitecoreAI.
In C# and the ContentSearch API, you had granular control. You could write a custom IIndexableBuilderFactory, override how specific field types were computed, inject pipeline processors that transformed data mid-index, and introspect the index state programmatically.
In Sitecore Search, the core relevance and indexing pipeline is a black box. You control what you push in, but the AI-driven ranking internals are not exposed. You do have a set of override tools (boost/bury/pin/blacklist rules at the widget and API level), but these operate on top of the black-box relevance engine rather than replacing it. For standard content indexing this is fine. For complex custom relevance logic that needs to deeply influence scoring (rather than just nudge or override it), you will feel the constraint.
When a Solr query returned wrong results, you could pop open the Solr admin UI, run the raw query, inspect the scores, tweak the boost factors, and iterate in minutes. The feedback loop was tight and entirely within your control.
When Sitecore Search returns unexpected results, the feedback loop runs through the CEC analytics, behavioral data you may not yet have, and ML signals that take time to build. There is no equivalent of the Solr debug query. Relevance problems in a SaaS ML system require a different diagnostic approach. It's less about fixing the query, and more about reviewing the signals and waiting for the model to adjust.
Transitioning from the monolithic, backend-heavy .NET architecture of Sitecore 8 to the headless, SaaS-first world of SitecoreAI can be a lot to take in. But we no longer have to be infrastructure managers and Solr administrators. By adopting Sitecore Search, we can leverage real machine learning capabilities to deliver hyper-relevant, personalized experiences powered by content published through Experience Edge, and spend our craft on the experience rather than the plumbing.
It's also worth keeping an eye on Sitecore Search Experiences, a newer addition to the SitecoreAI platform that aims to take search even further. Search Experiences builds on top of Sitecore Search to provide a more streamlined, out-of-the-box way to create and manage search-driven pages and components, reducing the custom widget code developers need to write while giving marketers even more direct control over how search results are presented. If the trajectory holds, this could further close the gap between content strategy and content discovery, making search feel less like a standalone integration and more like a native extension of the authoring and delivery workflow. It's early days, but for anyone invested in the Sitecore Search ecosystem, it's a space worth watching closely.