Lesson 2 - How Googlebot Crawls The Web

Transcript

Before your pages can appear in Google search results, Google first needs to find them. This is the process we spoke about in the previous video, which is known as crawling and is the foundation of everything else in SEO.

If Googlebot can’t access your pages, they will not show up when your potential customers are searching for what you offer. It doesn’t matter how good your content is. If it can’t be crawled, it might as well not exist.

Now, a good way to think about this is like a shop that doesn’t have any roads or any pathways leading to it. The products can be great, but if no one can get there, nobody’s going to buy anything. That’s why understanding how Googlebot works is so important.

And once you understand crawling, you’ll be able to make better decisions about your website without ever having to guess.

Now, by the time you finish watching this video, you’re going to clearly understand what crawling is and how Googlebot discovers your pages.

Googlebot is Google’s web crawler. It’s an automated program that explores the internet by visiting websites and collecting information about their pages. Its job is actually very simple. It just finds pages, reads them, and then sends that information back to Google so it can decide how those pages should appear in the search results.

This process of discovering pages is what we call crawling. Googlebot moves around the web by following links much like a person clicking from page to page. As it visits a page, it looks at things like the text, the images, and the underlying code, and then passes that data back to Google for further analysis.
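To make the idea of following links a bit more concrete, here is a minimal sketch in Python. It is not how Googlebot is actually implemented. It assumes a made-up site stored as a dictionary (the `LINKS` map and every URL in it are hypothetical) and simply visits pages in the order that links reveal them.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
LINKS = {
    "/": ["/shop", "/blog"],
    "/shop": ["/shop/espresso-machine"],
    "/blog": ["/blog/brewing-guide", "/shop"],
    "/shop/espresso-machine": [],
    "/blog/brewing-guide": ["/shop/espresso-machine"],
}


def crawl(start, links):
    """Return pages in the order they are discovered by following links."""
    discovered = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        # Look at every link on the current page.
        for target in links.get(page, []):
            if target not in seen:  # skip pages we already know about
                seen.add(target)
                discovered.append(target)
                queue.append(target)
    return discovered


crawl_order = crawl("/", LINKS)
print(crawl_order)
```

Starting from the homepage, the sketch reaches every page because each one is linked from somewhere. A real crawler would also fetch and read each page, which is left out here.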

Today, most websites are crawled using Googlebot Smartphone. This means Google mainly looks at the mobile version of your site and not the desktop version. And that’s why mobile usability is no longer optional.

If your site doesn’t work very well on a mobile phone, it can directly affect how Google understands and crawls it.

There are a few different ways that Googlebot discovers pages. And in this section of the video, we’re going to go through some of the most common ones.

Now, the first is by following links. Just as you might click a link to move from one page to another, Googlebot follows links to find new content. These could be internal links, which connect one page of your website to another, or they could be external backlinks, which come from a totally different website that links through to yours.

An important takeaway here is that if you have pages on your website that aren’t linked to from anywhere else, which are known as orphan pages, Googlebot might never find them. And that’s why we’ll take a close look at internal links later on in this course.
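If you want to see what a page with no links pointing to it looks like in practice, here is a rough sketch in Python. It assumes you already have a map of which pages link to which (the `INTERNAL_LINKS` data and its URLs are invented for illustration) and lists every page, other than the homepage, that no other page links to. Pages like this are often called orphan pages.

```python
# Hypothetical internal link map: page -> pages it links to.
INTERNAL_LINKS = {
    "/": ["/shop", "/blog"],
    "/shop": ["/shop/espresso-machine"],
    "/blog": ["/shop"],
    "/shop/espresso-machine": [],
    "/old-landing-page": ["/shop"],  # links out, but nothing links in
}


def find_orphan_pages(links, homepage="/"):
    """Return pages that no other page links to, excluding the homepage."""
    linked_to = set()
    for source, targets in links.items():
        for target in targets:
            if target != source:  # a page linking to itself does not count
                linked_to.add(target)
    return sorted(
        page for page in links
        if page not in linked_to and page != homepage
    )


orphans = find_orphan_pages(INTERNAL_LINKS)
print(orphans)
```

In this made-up site, only `/old-landing-page` comes back, because it is the one page nothing else points to.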

The second way that Googlebot finds pages is through an XML sitemap, which is exactly what it sounds like. It’s basically a list of all the different pages on your website that you want Google to know about.

Once you’ve got your sitemap, you can submit it through a tool called Google Search Console, which we’re also going to take a deeper look at later on in this course. Now, most websites, especially those built using a content management system like WordPress, Joomla, or Magento, have an automated sitemap, which means that pages get automatically added to and removed from it as you make changes to your website.

This is really good as it keeps Google up to date with which pages you have available on your website right now.
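For anyone curious what a sitemap actually contains, here is a small sketch in Python that builds one by hand. The page URLs are placeholders on example.com, and the `build_sitemap` helper is a name made up for this example. Your CMS will normally generate this file for you, so treat it as an illustration of the format rather than something you need to run.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML sitemap listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # <loc> holds the full address of one page.
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Placeholder URLs, assumed for illustration only.
sitemap_xml = build_sitemap([
    "https://example.com/",
    "https://example.com/best-coffee-machines-for-offices",
])
print(sitemap_xml)
```

Each page you want Google to know about gets its own `<url>` entry with a `<loc>` inside it, all wrapped in a single `<urlset>`.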

And then the third way is by manually prompting Google to crawl a page using the URL Inspection tool that’s also inside of Google Search Console. You simply paste in the page URL and then you click Request Indexing.

Now this is extremely useful when you have launched a new page or if you’ve made some important updates to an existing page and you want Google to check it sooner rather than later.

Internal links do more than just help Googlebot to find pages. They also help Google understand which pages matter most on your website. You can think of your internal links as signals of importance. When you link to a page multiple times across your site, you’re effectively telling Google that this page is worth paying attention to.

For example, imagine a website that sells coffee machines. Now, if the blog posts, the category pages, and the other content regularly link to a page called “Best Coffee Machines for Offices”, Google can reasonably assume that that page is an important one.
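One rough way to check this on your own site is to count how many different pages link to each page. The sketch below does that in Python for an invented coffee machine site (every URL in `INTERNAL_LINKS` is hypothetical). It is only a simple proxy and is not how Google actually weighs links.

```python
from collections import Counter

# Hypothetical internal link map: page -> pages it links to.
INTERNAL_LINKS = {
    "/blog/office-coffee-tips": ["/best-coffee-machines-for-offices", "/shop"],
    "/category/espresso": ["/best-coffee-machines-for-offices"],
    "/category/filter": ["/best-coffee-machines-for-offices", "/shop"],
    "/shop": ["/category/espresso", "/category/filter"],
}


def count_inbound_links(links):
    """Count how many distinct pages link to each page."""
    counts = Counter()
    for source, targets in links.items():
        # set() counts each linking page once, even if it links twice.
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts


inbound = count_inbound_links(INTERNAL_LINKS)
most_linked = inbound.most_common(1)[0]
print(most_linked)
```

Here the office coffee machines page comes out on top with three pages linking to it, which matches the kind of signal described above.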

That said, internal links should always be used in a way that makes sense for users, too. Adding links purely for SEO without any real relevance usually does more harm than good. Internal links should actually help people to navigate your website and understand it more easily.

Crawl budget refers to how many pages Googlebot will crawl on your website within a certain period of time. It exists because Google doesn’t have unlimited resources, so it needs to decide how much attention it’s going to give to each website.

For most small to medium-sized websites, crawl budget is not something you’re going to need to worry about. If your site’s got fewer than around 10,000 pages, Google is usually very good at crawling it efficiently.

If your site does have more than 10,000 pages, you may want to look into crawl budget, but that falls outside the scope of this course, as it’s not going to apply to the vast majority of people who are watching.

In fact, at this point, you already understand more about crawling and how Google finds pages than the vast majority of people who work in SEO. And in the next video, we’re going to learn about indexation and how Google decides whether a page is eligible to appear in the SERPs.

Meet Dan M. Jones