
ESS - 3. Tools and Technologies Involved in Modern Search Architecture - Part I


How does search work?

It's amazing how much processing happens behind the scenes of a single search. Here are four crucial tools that form the core of search processing:
  1. Web Crawler (Spiders)
  2. Indexing (Inverted Index)
  3. Relevance score
  4. SERP (Search Engine Results Page)
When you search on Google, Yahoo, Bing or any other popular search engine, the engine doesn't run a marathon through every document on the World Wide Web. Doing that for every search would be far too expensive, and it would only get worse as the amount of data grows over time. Instead, every search engine keeps an index, which is simply a lookup database for quickly finding the relevant information. And when they say quick, they mean it: Google claims its search results are delivered in less than half a second 💪

Web Crawler:
- Web crawlers are those little software spiders that go out and index all the content, following links from one document to another. They do their job continuously, picking up new content as it gets added and any changes made to existing content.
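
The link-following loop can be sketched roughly as below. This is only a minimal illustration under stated assumptions, not how any real search engine crawls: the `FAKE_WEB` dictionary, its `example.test` URLs, and the `crawl` function are made up for this example, and pages are read from memory instead of being fetched over the network.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical in-memory "web": URL -> HTML. A real crawler would fetch over HTTP.
FAKE_WEB = {
    "http://example.test/": (
        '<a href="http://example.test/autobots">A</a> '
        '<a href="http://example.test/decepticons">D</a>'
    ),
    "http://example.test/autobots": (
        '<p>A gang of good transformers</p> <a href="http://example.test/">home</a>'
    ),
    "http://example.test/decepticons": (
        '<p>gang of evil transformers</p> '
        '<a href="http://example.test/autobots">rivals</a>'
    ),
}


def crawl(seed, web):
    """Breadth-first crawl from seed; returns URLs in the order they were visited."""
    visited = []
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        html = web.get(url)
        if html is None:  # dead link: nothing to fetch
            continue
        visited.append(url)  # a real crawler would hand the page to the indexer here
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return visited


print(crawl("http://example.test/", FAKE_WEB))
```

The `seen` set is what keeps the spider from looping forever when pages link back to each other, which the made-up pages above do on purpose.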



Inverted Index:
- An index makes data retrieval fast at the cost of some extra storage space and up-front processing time. The different types of index include:

   Forward Index:
  • A forward index maps documents to their content.
  • Consider these two documents:

    AUTOBOTS
     A gang of good transformers

    DECEPTICONS
     gang of evil transformers

  • The forward index for the above documents would look something like this. It is built while parsing the documents.

    Document      Words
    AUTOBOTS      a, gang, of, good, transformers
    DECEPTICONS   gang, of, evil, transformers

  • Observe how the words are normalized by making them all lower case. This is just one rule; in practice, search engines apply many rules for indexing, such as adding anagrams, removing punctuation, phrasing, handling synonyms, etc.
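
As a rough sketch, the forward index for the two example documents could be built like this. The names `DOCUMENTS`, `normalize` and `build_forward_index` are made up for illustration, and the only normalization rules applied are lower-casing and stripping ASCII punctuation; real engines apply many more.

```python
import string

# The two example documents from the post: name -> raw text.
DOCUMENTS = {
    "AUTOBOTS": "A gang of good transformers",
    "DECEPTICONS": "gang of evil transformers",
}


def normalize(text):
    """Lower-case the text, strip ASCII punctuation, and split on whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


def build_forward_index(documents):
    """Map each document name to the list of normalized words it contains."""
    return {name: normalize(text) for name, text in documents.items()}


forward_index = build_forward_index(DOCUMENTS)
for name, words in forward_index.items():
    print(name, ",".join(words))
```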
   Inverted Index:
  • An inverted index is the opposite of a forward index: the mapping goes from content (words) to documents. In general, it is a table listing each word and the documents in which it appears.

    Word           Frequency   Documents
    a              1           AUTOBOTS
    gang           2           AUTOBOTS, DECEPTICONS
    of             2           AUTOBOTS, DECEPTICONS
    good           1           AUTOBOTS
    transformers   2           AUTOBOTS, DECEPTICONS
    evil           1           DECEPTICONS

  • The index is usually kept sorted by word to make lookups easy.
  • Using the inverted index, you return the documents that are mapped to the words in the user's search query.
  • Inverted indexes are further divided into the record-level index, which we just saw, and the word-level index, which also stores where each word appears within the document.
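
Both flavours of inverted index, plus a simple lookup, can be sketched as follows. This is a minimal illustration, not a production design: all function names are made up for this example, and the `search` function assumes that a multi-word query should match only documents containing every word (AND semantics), which the post itself does not specify.

```python
from collections import defaultdict

# Forward index of the two example documents: document -> normalized words.
FORWARD_INDEX = {
    "AUTOBOTS": ["a", "gang", "of", "good", "transformers"],
    "DECEPTICONS": ["gang", "of", "evil", "transformers"],
}


def build_record_level_index(forward_index):
    """word -> sorted list of documents containing the word, keyed in sorted order."""
    index = defaultdict(set)
    for doc, words in forward_index.items():
        for word in words:
            index[word].add(doc)
    return {word: sorted(docs) for word, docs in sorted(index.items())}


def build_word_level_index(forward_index):
    """word -> {document: [positions of the word within that document]}."""
    index = defaultdict(dict)
    for doc, words in forward_index.items():
        for position, word in enumerate(words):
            index[word].setdefault(doc, []).append(position)
    return dict(index)


def search(index, query):
    """Return documents containing every query word (assumed AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    matches = set(index.get(terms[0], []))
    for term in terms[1:]:
        matches &= set(index.get(term, []))
    return sorted(matches)


record_index = build_record_level_index(FORWARD_INDEX)
word_index = build_word_level_index(FORWARD_INDEX)

print(record_index["gang"])                       # documents containing "gang"
print(len(record_index["gang"]))                  # the Frequency column of the table
print(search(record_index, "good transformers"))  # documents matching both words
print(word_index["gang"])                         # positions of "gang" per document
```

The lookup never touches the document text: it only intersects the document lists stored in the index, which is why searching stays fast as the collection grows.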

The crawler and inverted index discussed above are a must for any modern search engine. To keep this post short enough and not bore you 😇, I'll share the remaining details about relevance score and SERP in my next post.

Happy Learning ✌



