If you want your website to appear among the first results on search pages, you should know what crawling a URL with Google means and the best way to get it done. That way, you make sure the search engine correctly indexes all the content of your website.
What is URL crawling?
To understand what it means to crawl a URL, you must be clear about what spiders or web crawlers are.
Spiders, also called crawlers, are programs that travel the web looking for new content. When they find a page that was not in the index, they navigate through it, following links, running scripts and gathering information. Then they send the results to the search engine so that it can index the new content.
This process is what we call crawling a URL. And it is not specific to Google, as all search engines launch their crawlers in search of “updates” on the Web.
How to crawl a web page?
There are 3 recommended methods to help Google crawl your website, and you should use all of them together if you want optimal results.
Meta canonical
The canonical tag is a powerful resource to help Google crawl your web pages. It is a <link> element with rel='canonical' that tells the search engine which URL you prefer to have indexed when several URLs show the same content.
For example, let’s say your site is accessible through several URLs:
www.tusitio.com
tusitio.com
www.tusitio.com/index.php
tusitio.com/index.php
As you can see, it is the same site but referenced differently each time. The search engine may assume that there is duplicate content, as it has detected several URLs that lead to the same content. To avoid this, declare the preferred URL between the <head> and </head> tags of your page’s HTML:
<link rel='canonical' href='https://www.tusitio.com/'/>
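As a rough sketch of where the tag sits, the markup below shows a complete page head for the index.php variant pointing at the preferred URL. The https scheme, the www host, the page title and the lang value are assumptions made for illustration; use whichever variant you actually want indexed, written as an absolute URL.

```html
<!DOCTYPE html>
<html lang="es">
<head>
  <meta charset="utf-8">
  <title>Tu sitio</title>
  <!-- Assumed preferred URL: absolute, with scheme and host -->
  <link rel='canonical' href='https://www.tusitio.com/'>
</head>
<body>
  <!-- The same content is served at all four URL variants -->
</body>
</html>
```

Placing the same tag on every variant signals that they are all one page.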
Hreflang attributes
The hreflang attribute is added to a <link> tag to tell the search engine which language an alternate version of the page is written in. It only applies, of course, if your site has versions in several languages. For example:
<link rel='alternate' hreflang='es' href='http://www.tusitio.es'>
Note that the URL does not have to follow this pattern. A subdomain such as en.tusitio.com or a subdirectory such as www.tusitio.com/es works as well, among other variants.
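To illustrate, here is a minimal sketch for a site with a Spanish and an English version. The /es/ and /en/ paths and the https scheme are assumptions for the example; the x-default line is optional and names a fallback for visitors whose language matches neither version.

```html
<head>
  <!-- Hypothetical URLs: one entry per language version -->
  <link rel='alternate' hreflang='es' href='https://www.tusitio.com/es/'>
  <link rel='alternate' hreflang='en' href='https://www.tusitio.com/en/'>
  <!-- Optional fallback when no language matches -->
  <link rel='alternate' hreflang='x-default' href='https://www.tusitio.com/'>
</head>
```

Each language version should carry the same set of tags, including the one that points to itself, so that the references are reciprocal.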
Robot directives
Robot directives tell Google how to index your web pages without hindering users’ navigation. Among other things, they allow you to:
avoid duplicate content problems
keep the parts of the site that you prefer to hide out of the index
To do this, simply include the robots meta tag in the <head> of the page whose crawling you want to control:
<meta name="robots" content="directive1, directive2, ..." />
The content attribute accepts the following directives:
noindex: prevents Google from indexing the page (its opposite is index, which you don’t have to indicate, as the search engine assumes it by default)
nofollow: prevents crawlers from following the links on the page (its opposite is follow, which works the same way as index)
notranslate: tells the search engine not to offer translations of the page
noarchive: do not show a cached copy of the page in Google
nosnippet: do not display a snippet for the page in the search results
unavailable_after: the page will no longer appear in the search results after the date you specify
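A few combinations, sketched as separate examples. Each tag belongs in the <head> of a different hypothetical page, and the date is an arbitrary placeholder.

```html
<!-- Page A: keep it out of the index and do not follow its links -->
<meta name="robots" content="noindex, nofollow" />

<!-- Page B: indexable, but with no cached copy and no snippet -->
<meta name="robots" content="noarchive, nosnippet" />

<!-- Page C: drop it from the results after an assumed date -->
<meta name="robots" content="unavailable_after: 2030-12-31" />
```

Directives are separated by commas, and any directive you leave out falls back to its default (index, follow).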
How to re-crawl a URL?