How We Built Search Over All of Wikipedia in 30 minutes with 34% Better Relevance

July 10, 2024

Team Objective

Imagine you were tasked with building a search system for wikipedia.org in 2024. Before ChatGPT changed the industry’s perceptions of what’s possible with AI, you would likely have reached for Elasticsearch first. Lucene-based search engines like Elasticsearch, OpenSearch, and Solr have been the primary workhorses of the search industry for a long time. In fact, wikipedia.org itself uses Elasticsearch to power its own search experience. But in the last few years, innovation and developer tooling around semantic search have exploded. Developers now have many options, from embedding models (OpenAI, together.ai, etc.) to vector databases (Pinecone, Qdrant, etc.), with further choices at every other layer of the stack. The downside is that every developer now has to cobble a solution together after answering a mountain of questions:

How do I pick the right embedding model?
How do I keep the search experience up-to-date?
How do I evaluate the search experience?
Where do I deploy the orchestration code for both indexing and search?
How do I improve quality when things don’t work?
How do I do all of this at scale with high throughput and reliability, low latency, etc.?

Many search problems show characteristics that are similar to Wikipedia search: millions of articles, many queries per second, information-dense, long-form text, etc. If your search problem has any of these characteristics, you probably will have to answer all of the questions above, and more. If doing all this work sounds like fun to you, see you in a few months (and we recommend buying a good pager)! However, if you want to ship something today instead of Q4, keep on reading.

There is a better way!

We used the Objective API to build a Wikipedia search system that outperforms the Lucene-based solution you use today to search on wikipedia.org by a wide margin. You can try it out and compare the two side by side. Objective Search allows anyone to build a highly scalable, high-quality search experience. Here’s how we built Wikipedia search.

Grab a Wikipedia dump
We download a Wikipedia dump from dumps.wikimedia.org and then use the Objective SDK to upload the objects to our Object Store.


 # Stream the dump, keep only records that have a title, and project the fields we need
 curl https://dumps.wikimedia.org/other/cirrussearch/20240610/enwiki-20240610-cirrussearch-content.json.gz \
   | zcat \
   | jq -cr 'select(.title) | {title, opening_text, language, wiki, wikibase_item, version, timestamp, popularity_score, page_id, category}' \
   > enwiki-latest-concise.jsonl
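Once the JSONL file exists, it can be read in fixed-size batches before being handed to an upload call. The sketch below shows one way to do that in Python. Note that `upload_batch` is a hypothetical placeholder that you would replace with the real Objective SDK call, and both the function names and the batch size of 500 are our own assumptions rather than SDK requirements.

```python
import json
from typing import Callable, Iterable, Iterator


def read_batches(lines: Iterable[str], batch_size: int = 500) -> Iterator[list[dict]]:
    """Yield lists of parsed JSON objects, at most batch_size per list."""
    batch: list[dict] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        batch.append(json.loads(line))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


def upload_all(
    lines: Iterable[str],
    upload_batch: Callable[[list[dict]], None],
    batch_size: int = 500,
) -> int:
    """Send every batch to upload_batch and return the number of objects sent.

    upload_batch is a hypothetical callable standing in for the real SDK call.
    """
    total = 0
    for batch in read_batches(lines, batch_size):
        upload_batch(batch)
        total += len(batch)
    return total


# Example usage (assumes the file produced by the command above):
# with open("enwiki-latest-concise.jsonl", encoding="utf-8") as fh:
#     upload_all(fh, upload_batch=my_upload_function)
```

Reading line by line keeps memory use flat even though the dump contains millions of articles.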

Upload objects and create an index
We then create a text index and set the searchable fields to title and opening_text (i.e. the title and the first few paragraphs of each Wikipedia article) since these fields contain the majority of the data we’re interested in. You can do this via the API or directly in the app.

Automatically generate queries
Next, we use Objective to optimize the quality of the search results for our use case. All we need for this step is a list of queries we want to optimize for. If you don’t have any queries handy, you can simply choose to have Objective generate queries for you based on the contents of your Object Store.

Automatically finetune your Index
Objective will walk you through the steps to finetune your index. All you need are the queries generated in the previous step and Objective will figure out the rest.

Optional step: Add a ranking expression
Wikipedia has additional signals that can help with search! In our case, we add a ranking expression that takes an article’s popularity into account. Popularity is a number between 0 and 1, computed from the number of times a Wikipedia page has been viewed. For the sake of illustration, our example ranking expression raises the popularity score to the power of 0.01 and multiplies the result by the relevance score. As a result, the expression outputs a slightly reduced overall score when Wikipedia pages are less popular, but does so without overly distorting the relevance score even when popularity is very high or very low.

uncalibrated_relevance * (object.popularity_score ^ 0.01)
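To get a feel for how gently this expression dampens the score, here is a small Python sketch that mirrors it, using the exponent from the expression above. The function name `ranked_score` is our own, and the popularity values are made up for illustration.

```python
def ranked_score(relevance: float, popularity: float, exponent: float = 0.01) -> float:
    """Mirror of the ranking expression: relevance * popularity ** exponent."""
    if not 0.0 <= popularity <= 1.0:
        raise ValueError("popularity is expected to lie between 0 and 1")
    return relevance * popularity ** exponent


# Illustrative popularity values (not taken from the Wikipedia dump).
for popularity in (1.0, 0.5, 0.01, 0.0001):
    multiplier = ranked_score(1.0, popularity)
    print(f"popularity={popularity:<7} multiplier={multiplier:.4f}")
```

With this exponent, a page whose popularity is 0.0001 still keeps roughly 91% of its relevance score, so relevance remains the dominant signal. One edge case to keep in mind: a popularity of exactly 0 produces a multiplier of 0 and removes the page from contention regardless of its relevance.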

Final step: Look like a genius!

Comparing against Wikipedia.org’s Search

Let’s check out the results! Our steps above created a production-ready, fast, and easy-to-use search index that handily outperforms Wikipedia.org’s Elasticsearch on industry-standard search metrics. Let’s dive a little deeper.

Metric	wikipedia.org Search	Objective Auto-Finetuned Index	Improvement
NDCG@10	54.97	69.33	+26.12%
MAP@10	41.40	55.63	+34.37%
Recall@10	54.56	68.14	+24.89%

Table 1: Objective Auto-finetuned Index shows relevance improvements as measured by NDCG, MAP, and Recall at 10 on our evaluation dataset.
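The Improvement column is the relative change over the wikipedia.org baseline, i.e. (new − old) / old. The short Python check below recomputes it from the values in Table 1; the function and variable names are our own.

```python
# Metric values copied from Table 1: (wikipedia.org Search, Objective Auto-Finetuned Index)
results = {
    "NDCG@10": (54.97, 69.33),
    "MAP@10": (41.40, 55.63),
    "Recall@10": (54.56, 68.14),
}


def relative_improvement(baseline: float, candidate: float) -> float:
    """Relative change of candidate over baseline, in percent."""
    return (candidate - baseline) / baseline * 100


for metric, (baseline, candidate) in results.items():
    print(f"{metric}: {relative_improvement(baseline, candidate):+.2f}%")
```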

To compare our new search index against Wikipedia.org’s search, we first generated 100 new, unseen queries and then ran them against both Wikipedia’s API and our newly created Objective Search index. Each search result was rated as relevant or irrelevant to the query, a process known as search relevance rating. Rating was done blind: raters did not know which system had produced a given result. The ratings are then used to calculate industry-standard retrieval metrics that describe the quality of the search experience.
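One common way to set up blind rating is to pool the results of both systems per query, drop duplicates, and shuffle the pool so that nothing reveals where a result came from. The Python sketch below illustrates the idea under our own assumptions: the function name, the data layout, and the example document IDs are illustrative and are not the exact tooling used for this evaluation.

```python
import random


def build_rating_pool(
    results_by_system: dict[str, dict[str, list[str]]],
    seed: int = 0,
) -> list[tuple[str, str]]:
    """Merge results from all systems into a shuffled list of (query, doc_id) pairs.

    results_by_system maps system name -> query -> ranked list of document IDs.
    The returned pairs carry no system label, so raters cannot tell which
    system returned a given document.
    """
    pool: set[tuple[str, str]] = set()
    for per_query in results_by_system.values():
        for query, doc_ids in per_query.items():
            for doc_id in doc_ids:
                pool.add((query, doc_id))  # the set removes duplicates across systems
    items = sorted(pool)  # fixed order first, so the shuffle is reproducible
    random.Random(seed).shuffle(items)
    return items


# Example with made-up document IDs:
example = {
    "system_a": {"alan turing": ["doc1", "doc2"]},
    "system_b": {"alan turing": ["doc2", "doc3"]},
}
rating_pool = build_rating_pool(example)
```

Because each (query, document) pair is rated once, a document returned by both systems receives the same judgment for both, which keeps the comparison fair.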

NDCG evaluates the quality of the ranking (i.e. are great results consistently listed above bad results?), MAP evaluates how accurate the results are while also taking their order into account, and Recall measures how many of the known good results are successfully returned by the search system. Evaluation is performed using trec_eval, and we are sharing our evaluation code and data (available on GitHub). The results show that Objective’s out-of-the-box search results are of higher quality, with a significant jump in precision and an additional gain in recall.
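For readers who want to see what these three metrics compute, here is a minimal Python sketch for a single query with binary (relevant or irrelevant) ratings. It is an illustration only: the function names are ours, the ranked list is assumed to contain no duplicate IDs, and the normalization choices (dividing average precision and recall by the total number of known relevant documents) are assumptions that may differ in detail from trec_eval. MAP is the mean of the per-query average precision over all queries.

```python
import math


def recall_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of known relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)


def average_precision_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    """Sum of precision at each relevant rank in the top k, over all relevant docs."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    """Binary-gain NDCG: discounted gain of the ranking divided by the ideal."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Made-up example: two of three relevant documents are retrieved, at ranks 1 and 3.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
print(recall_at_k(ranked, relevant), average_precision_at_k(ranked, relevant), ndcg_at_k(ranked, relevant))
```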

To accomplish what we did here, you won't have to learn Elasticsearch's query language, or tinker with hundreds of parameters in order to optimize results. Objective gets us a better starting point so that we can use our engineering effort on the highest value tasks. Here, it enabled us to quickly serve Wikipedia’s scale of millions of long text documents. And importantly, we can also scale it out to serve Wikipedia.org levels of traffic with the click of a button! Try it out yourself here!
