Have you ever wondered how search engines actually work?
If you go to Google, and conduct a search for ‘backlinks,’ you will get a search engine results page (SERP) with a couple of paid ads, a featured snippet, and the top 10 organic results for that keyword.
But if you look closer, just below the search bar, you’ll see that those results were pulled from a total of 36,900,000 possible results for backlinks, and were delivered to you in 0.92 seconds.
That’s almost 37 million possibilities, sorted, ranked, and delivered to your web browser in precisely ninety-two one-hundredths of one second.
That’s pretty impressive, even for a computer, right?
There’s a lot that goes on behind the scenes (that most people never even think about) every time you perform a Google search.
In this article, we’re going to cover the ‘behind the curtain’ aspects of a search engine, and explain how Google can sort through that much information on a topic and present you with the most relevant results in less than a second.
What is a search engine?
Search engines like Google consist of three basic components:
- the crawler
- the algorithm
- the index
How search engines crawl websites
The part of a search engine that actually touches a website is the crawler. Crawlers are also called search engine spiders; both terms refer to the same thing.
Before the crawlers go to a webpage, the search engine creates a list of URLs to be crawled and passes that list to a scheduler.
Scheduling the crawlers
The scheduler is not a person; it’s a piece of the search engine itself.
The scheduler decides when each URL (webpage) should be crawled, based on the relative importance of both new and known (previously crawled) URLs.
Then the scheduler sends the search engine spiders to crawl the page at the assigned time.
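To make the scheduling idea concrete, here is a minimal sketch that keeps URLs in a priority queue, so the most important ones are handed to the crawler first. The CrawlScheduler class, the numeric importance score, and the example.com URLs are all assumptions made up for this illustration; how a real search engine scores importance is not covered here.

```python
import heapq
import itertools


class CrawlScheduler:
    """Toy scheduler: hands out URLs in order of importance (highest first)."""

    def __init__(self):
        self._queue = []                 # heap of (-importance, order, url)
        self._order = itertools.count()  # tie-breaker keeps insertion order
        self._queued = set()             # URLs currently waiting in the queue

    def add(self, url, importance):
        """Queue a URL; `importance` is a made-up score, higher means sooner."""
        if url in self._queued:
            return  # already waiting to be crawled
        self._queued.add(url)
        # heapq is a min-heap, so negate the score to pop the highest first.
        heapq.heappush(self._queue, (-importance, next(self._order), url))

    def next_url(self):
        """Return the next URL to crawl, or None when nothing is queued."""
        if not self._queue:
            return None
        _, _, url = heapq.heappop(self._queue)
        self._queued.discard(url)
        return url


scheduler = CrawlScheduler()
scheduler.add("https://example.com/about", importance=0.2)
scheduler.add("https://example.com/blog/", importance=0.9)          # known page
scheduler.add("https://example.com/blog/new-post", importance=0.5)  # new page

crawl_order = []
url = scheduler.next_url()
while url is not None:
    crawl_order.append(url)
    url = scheduler.next_url()
print(crawl_order)
```

In this sketch the blog page is crawled first, then the new post, then the about page, no matter what order they were added in.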
Send in the spiders (crawling)
A search engine spider (or crawler) is essentially a computer program designed to download the content of webpages.
Crawlers are how search engines ‘discover’ new content on the web: they re-crawl known URLs to see if new links have been added to the page.
For example, every time we publish a new article to our blog, the title of the new article, along with the featured image and the URL for the webpage, appears in the first (top-left) position on our main blog page.
Then whenever the search engines re-crawl and download the content of our main blog page, they find the link to our new blog article.
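The discovery step in that example can be sketched as a simple comparison: take the links found on the re-crawled page and keep only the ones the search engine has not seen before. The find_new_links function and the example.com URLs are hypothetical and only illustrate the idea.

```python
def find_new_links(links_on_page, known_urls):
    """Return links not seen before, in page order and without duplicates."""
    seen = set(known_urls)
    new_links = []
    for link in links_on_page:
        if link not in seen:
            seen.add(link)
            new_links.append(link)
    return new_links


# URLs the search engine already knows about (hypothetical).
known_urls = {
    "https://example.com/blog/",
    "https://example.com/blog/old-post",
}

# Links found on the main blog page during the latest re-crawl (hypothetical).
links_on_blog_page = [
    "https://example.com/blog/new-post",  # just published, top-left position
    "https://example.com/blog/old-post",
    "https://example.com/blog/new-post",  # linked again from the featured image
]

discovered = find_new_links(links_on_blog_page, known_urls)
print(discovered)
```

Only the new article’s URL comes back, and it is returned once even though the page links to it twice.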
After the crawler has downloaded the content of a page (URL), the contents of the page are then passed to what is known as the parser.
The search engine parser’s job is to extract the links, as well as other important information, from the downloaded page.
The parser will then send the list of URLs it extracted from the pages that have been crawled (and downloaded) to the scheduler.
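As a rough sketch of what a parser does, the example below uses Python’s standard html.parser module to pull the links and the page title out of a downloaded page. The LinkAndTitleParser class, the sample HTML, and the choice of the title as the "other important information" are assumptions for illustration only; a real search engine extracts far more than this.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTitleParser(HTMLParser):
    """Collects absolute link URLs and the page title from an HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the URL of the crawled page.
                self.links.append(urljoin(self.base_url, href))
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical downloaded content of a main blog page.
downloaded_html = (
    "<html><head><title>Our Blog</title></head><body>"
    '<a href="/blog/new-post">New post</a>'
    '<a href="https://example.com/about">About</a>'
    "</body></html>"
)

parser = LinkAndTitleParser(base_url="https://example.com/blog/")
parser.feed(downloaded_html)
print(parser.title)
print(parser.links)
```

The extracted links are what would be handed to the scheduler, while the title stands in for the information that goes on to be indexed.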
The links and other important information that the crawlers downloaded from the page are then indexed by the search engine.
Indexing, also referred to as indexation, is the step in the process where the parsed information from the URLs that were crawled is added to the search engine’s database.
In technical terms, Google’s database is referred to as a search index.
The search index is like a giant library, a digital library, that contains the information from billions of webpages that have been published on the internet.
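One common way to build this kind of lookup structure is an inverted index, which maps each word to the pages that contain it. The sketch below is a simplified assumption about how such an index could work, not a description of Google’s actual database; the function names, URLs, and page text are made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every word to the set of URLs whose text contains that word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical parsed page text, keyed by URL.
pages = {
    "https://example.com/backlinks": "What are backlinks and why do they matter",
    "https://example.com/crawlers": "How search engine crawlers find backlinks",
    "https://example.com/index": "The search index is like a library",
}

index = build_index(pages)
print(sorted(search(index, "backlinks")))
print(sorted(search(index, "search backlinks")))
```

Because the index is built ahead of time, answering a query only requires looking up a few words rather than re-reading every page, which is part of why results can come back so quickly.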