How Does Search Work?
It's amazing how much processing is done behind the scenes of a single search. Here are four crucial tools that form the core of search processing:
- Web Crawler (Spiders)
- Indexing (Inverted Index)
- Relevance score
- SERP (Search Engine Results Page)
When you search on Google, Yahoo, Bing, or any other popular search engine, the process doesn't involve running a marathon through every document on the World Wide Web. That would be far too expensive to do for every search, and it would only get worse as the amount of data grows. Instead, every search engine keeps an index, which is simply a lookup database for quickly finding the relevant information. And when they say quick, they mean it: Google claims its search results are delivered in less than half a second 💪
Web Crawler:
- Web crawlers are those little software spiders that go out and collect content for indexing, following links from one document to another. They do their job continuously, picking up new content as it gets added and any changes made to existing content.
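To make the link-following idea concrete, here is a minimal sketch of a crawler in Python. It is only an illustration under stated assumptions: `FAKE_WEB` is a made-up in-memory stand-in for the web, and the `.example` addresses and the `crawl` function are hypothetical names. A real crawler would fetch pages over HTTP, parse links out of the HTML, and respect politeness rules.

```python
from collections import deque

# Hypothetical in-memory "web": each address maps to (page text, outgoing links).
# This stands in for real HTTP fetching and HTML parsing.
FAKE_WEB = {
    "a.example": ("A gang of good transformers", ["b.example", "c.example"]),
    "b.example": ("gang of evil transformers", ["a.example"]),
    "c.example": ("more about transformers", ["b.example", "missing.example"]),
}

def crawl(seed, web):
    """Breadth-first crawl from `seed`; returns {address: text} for each page reached."""
    seen = {seed}          # addresses already queued, so no page is visited twice
    queue = deque([seed])  # frontier of pages still to visit
    pages = {}
    while queue:
        url = queue.popleft()
        if url not in web:  # dead link: nothing to fetch
            continue
        text, links = web[url]
        pages[url] = text   # hand the content over for indexing
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

pages = crawl("a.example", FAKE_WEB)
print(list(pages))  # ['a.example', 'b.example', 'c.example']
```

The `seen` set is what keeps the spider from looping forever when pages link back to each other, as `a.example` and `b.example` do here.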
Indexing:
- Indexing makes data retrieval fast by trading some extra storage space and up-front processing time. The different types of index include:
Forward Index:
- Here the index maps each document to its content.
- Consider these two documents:
AUTOBOTS |
---|
A gang of good transformers |
DECEPTICONS |
---|
gang of evil transformers |
The forward index for these documents would look something like this; it is built while the documents are parsed. Notice that the words are normalized to lower case. That is just one rule: real search engines apply many more while indexing, such as adding anagrams, removing punctuation, handling phrases and synonyms, etc.
Document | Words |
---|---|
AUTOBOTS | a,gang,of,good,transformers |
DECEPTICONS | gang,of,evil,transformers |
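The forward index above can be reproduced with a few lines of Python. This is a rough sketch, not how any particular search engine does it: the `normalize` and `build_forward_index` names are made up for illustration, and the only normalization rules applied are lower-casing and punctuation removal.

```python
import string

DOCUMENTS = {
    "AUTOBOTS": "A gang of good transformers",
    "DECEPTICONS": "gang of evil transformers",
}

def normalize(text):
    """Lower-case the text, strip ASCII punctuation, and split it into words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()

def build_forward_index(documents):
    """Map each document name to the list of normalized words it contains."""
    return {name: normalize(text) for name, text in documents.items()}

forward_index = build_forward_index(DOCUMENTS)
for name, words in forward_index.items():
    print(name, "|", ",".join(words))
# AUTOBOTS | a,gang,of,good,transformers
# DECEPTICONS | gang,of,evil,transformers
```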
Inverted Index:
- An inverted index is the opposite of a forward index: the mapping goes from content (words) to documents. In general, it is a table listing each word together with the documents in which it appears.
- The index is usually sorted by word to make lookups easy.
- Using the inverted index, you return the documents that are mapped to the words the user searched for.
- Inverted indexes come in two flavors: the record-level index, shown in the table below, and the word-level index, which also stores where each word occurs within the document.
Word | Frequency | Documents |
---|---|---|
a | 1 | AUTOBOTS |
gang | 2 | AUTOBOTS, DECEPTICONS |
of | 2 | AUTOBOTS, DECEPTICONS |
good | 1 | AUTOBOTS |
transformers | 2 | AUTOBOTS, DECEPTICONS |
evil | 1 | DECEPTICONS |
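Here is a small Python sketch of the same idea. It builds a word-level index (word to documents to positions), from which the record-level table above falls out by ignoring the positions. The function names are hypothetical, and requiring every query word to match (AND semantics) in `search` is an assumption made for the example; real engines combine matches in more elaborate ways.

```python
from collections import defaultdict

DOCUMENTS = {
    "AUTOBOTS": "A gang of good transformers",
    "DECEPTICONS": "gang of evil transformers",
}

def build_inverted_index(documents):
    """Word-level inverted index: word -> {document name: [word positions]}."""
    index = defaultdict(dict)
    for name, text in documents.items():
        for position, word in enumerate(text.lower().split()):
            index[word].setdefault(name, []).append(position)
    return dict(sorted(index.items()))  # keep the index sorted by word

def search(index, query):
    """Return the documents that contain every word of the query (assumed AND semantics)."""
    words = query.lower().split()
    if not words:
        return []
    results = set(index.get(words[0], {}))
    for word in words[1:]:
        results &= set(index.get(word, {}))
    return sorted(results)

index = build_inverted_index(DOCUMENTS)
for word, postings in index.items():
    # word, number of documents containing it, and the documents themselves
    print(word, "|", len(postings), "|", ", ".join(postings))

print(index["gang"])               # {'AUTOBOTS': [1], 'DECEPTICONS': [0]}
print(search(index, "good gang"))  # ['AUTOBOTS']
```

Note that a search never touches the documents themselves: it only looks up each query word in the index and intersects the document sets, which is what makes it fast.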
The crawler and the inverted index discussed above are a must for any modern search engine. To keep this post short enough and not bore you 😇, I'll share the remaining details about Relevance Score and SERP in my next post.
Happy Learning ✌