ESS - 3. Tools and Technologies involved in modern search architecture - Part I

How does search work?

It's amazing how much processing is done behind the scenes of a single search. Here are four crucial tools that form the core of search processing:

Web Crawler (Spiders)
Indexing (Inverted Index)
Relevance score
SERP (Search Engine Results Page)

When you run a search on Google, Yahoo, Bing or any other popular search engine, the process doesn't involve running a marathon through every document on the World Wide Web. That would be far too expensive for each search, and it would only get worse as the data grows over time. Instead, every search engine maintains an index, which is simply a lookup database for finding the relevant information quickly. And when they say quick, they mean it: Google claims its search results are delivered in less than half a second 💪

Web Crawler:

- Web crawlers are those little software spiders that go out and collect content for indexing by following links from one document to another. They do their job continuously, picking up new content as it gets added and any changes made to existing content.
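To make the link-following idea concrete, here is a minimal sketch of a breadth-first crawler. It is only an illustration: `TOY_WEB`, `crawl` and `get_links` are hypothetical names, and the "web" is a small in-memory dictionary rather than real pages fetched over HTTP.

```python
from collections import deque

# Hypothetical in-memory "web": each page maps to the links found on it.
# A real crawler would fetch pages over HTTP and parse links out of the HTML.
TOY_WEB = {
    "home": ["autobots", "decepticons"],
    "autobots": ["home", "optimus"],
    "decepticons": ["home"],
    "optimus": [],
}


def crawl(start, get_links):
    """Visit every page reachable from `start`, breadth-first, each page once."""
    visited = []           # pages in the order they were visited
    seen = {start}         # pages already queued, so we never revisit them
    queue = deque([start])
    while queue:
        page = queue.popleft()
        visited.append(page)  # a real crawler would hand the page to the indexer here
        for link in get_links(page):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


order = crawl("home", lambda page: TOY_WEB.get(page, []))
print(order)  # ['home', 'autobots', 'decepticons', 'optimus']
```

The `seen` set is what keeps the spider from looping forever when pages link back to each other, which "autobots" and "home" do here.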

Indexing:

- Indexing speeds up data retrieval by trading some extra storage space and up-front processing time. The types of index include:

Forward Index:

A forward index maps documents to their content.
Consider these two documents:

AUTOBOTS
A gang of good transformers

DECEPTICONS
gang of evil transformers

Document	Words
AUTOBOTS	a,gang,of,good,transformers
DECEPTICONS	gang,of,evil,transformers

Observe how the words are normalized by making them all lower case. This is just one rule; real-world search engines apply many more rules when indexing, such as adding anagrams, removing punctuation, handling phrases and synonyms, etc.
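As a rough sketch, the forward index above can be built in a few lines of Python. The `normalize` helper is a hypothetical name and applies only two of the rules mentioned (lower-casing and punctuation removal); splitting on whitespace is an assumption made for this example.

```python
import string

documents = {
    "AUTOBOTS": "A gang of good transformers",
    "DECEPTICONS": "gang of evil transformers",
}


def normalize(text):
    """Lower-case the text, strip ASCII punctuation, and split on whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


# Forward index: document -> list of normalized words, in document order.
forward_index = {name: normalize(text) for name, text in documents.items()}

for name, words in forward_index.items():
    print(name, ",".join(words))
# AUTOBOTS a,gang,of,good,transformers
# DECEPTICONS gang,of,evil,transformers
```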

Inverted Index:

An inverted index is the opposite of a forward index: the mapping goes from content (words) to documents. In general, it is a table listing each word along with the documents in which it appears.

WORDS	Frequency	Documents
a	1	AUTOBOTS
gang	2	AUTOBOTS, DECEPTICONS
of	2	AUTOBOTS, DECEPTICONS
good	1	AUTOBOTS
transformers	2	AUTOBOTS, DECEPTICONS
evil	1	DECEPTICONS

The above index is usually kept sorted by word to make lookups easy.
Using the inverted index, you return the documents that are mapped to the words in the user's search query.
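Here is a minimal sketch of building that inverted index from the forward index and using it to answer a query. `build_inverted_index` and `search` are hypothetical helpers, and requiring every query word to be present (AND semantics) is an assumption for this example; real engines combine and rank matches in more elaborate ways.

```python
forward_index = {
    "AUTOBOTS": ["a", "gang", "of", "good", "transformers"],
    "DECEPTICONS": ["gang", "of", "evil", "transformers"],
}


def build_inverted_index(forward):
    """Flip document -> words into word -> set of documents."""
    inverted = {}
    for doc, words in forward.items():
        for word in words:
            inverted.setdefault(word, set()).add(doc)
    return inverted


def search(inverted, query):
    """Return the documents containing every word of the query (assumed AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = inverted.get(terms[0], set()).copy()
    for term in terms[1:]:
        results &= inverted.get(term, set())
    return results


inverted_index = build_inverted_index(forward_index)

# Print the table sorted by word: word, frequency, documents.
for word in sorted(inverted_index):
    docs = sorted(inverted_index[word])
    print(word, len(docs), ", ".join(docs))

print(search(inverted_index, "evil transformers"))  # {'DECEPTICONS'}
```

Notice that a query never touches the documents themselves, only the small per-word entries, which is why the lookup stays fast as the collection grows.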
Inverted indexes are further divided into two kinds: the record-level index, which we just saw, and the word-level index, which also stores where each word occurs within the document.
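A word-level index can be sketched the same way, storing positions next to each document. The function name `build_word_level_index` and the choice of zero-based word offsets are assumptions for illustration only.

```python
documents = {
    "AUTOBOTS": "A gang of good transformers",
    "DECEPTICONS": "gang of evil transformers",
}


def build_word_level_index(docs):
    """Map word -> {document -> list of zero-based word positions}."""
    index = {}
    for name, text in docs.items():
        for position, word in enumerate(text.lower().split()):
            index.setdefault(word, {}).setdefault(name, []).append(position)
    return index


word_level_index = build_word_level_index(documents)
print(word_level_index["gang"])  # {'AUTOBOTS': [1], 'DECEPTICONS': [0]}
```

Keeping positions costs extra space, but it lets a search engine check whether query words appear next to each other, which is what phrase searches need.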

The crawler and inverted index discussed above are a must for any modern search engine. To keep this post short enough and not bore you 😇, I'll share the remaining details about Relevance Score and SERP in my next post.

Happy Learning ✌
