Google Webmaster Tools provides suggestions to improve your site's ranking. The suggestions generally involve making it easier for Google's crawlers to find your content. One such suggestion is adding a sitemap, which can increase your visibility on the internet.
A sitemap is a listing of the pages within your site. Web crawlers often work by starting at the root of your website (like https://example.com/) and then navigating to each of the links on that page. The crawler repeats the process until it can't find any more links. For a well-linked site, this process works well. Unfortunately, if you have a page that is not linked from any other page, the crawler will never know that it exists. The benefit of a sitemap is that you can inform the crawler about all of your pages, whether they are well linked or not.
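To make the link-following behavior concrete, here is a minimal crawler sketch in Python. The site, its pages, and their links are made up for illustration: the crawler starts at the root, follows every link it finds, and repeats until no new pages appear. A page with no inbound links, like /orphan below, is never discovered.

```python
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """Collect the href of every anchor tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

# A toy "site" mapping paths to HTML. /orphan has no inbound links.
SITE = {
    "/": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "/about": '<a href="/">Home</a>',
    "/blog": '<a href="/">Home</a> <a href="/about">About</a>',
    "/orphan": "No page links here, so a crawler never finds this one.",
}

def crawl(start="/"):
    """Follow links breadth-first until no new pages turn up."""
    seen = {start}
    queue = [start]
    while queue:
        path = queue.pop(0)
        parser = LinkParser()
        parser.feed(SITE[path])
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen

print(sorted(crawl()))  # /orphan is missing from the result
```

A sitemap closes exactly this gap: it hands the crawler the full list of pages up front instead of relying on link discovery.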
handroll 3.1 includes a sitemap extension that will generate a sitemap for you automatically. To use it, add the following to your handroll.conf:

[site]
with_sitemap = True

That's it! From now on, all of your HTML files will be included in a sitemap.txt file.
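The plain-text sitemap format is simply one full URL per line. Assuming a hypothetical domain of https://example.com with a few pages, the generated sitemap.txt would look something like:

```
https://example.com/index.html
https://example.com/about.html
https://example.com/blog/sitemaps.html
```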
Once you have a sitemap, you should inform web crawlers of its location. Conventionally, websites "communicate" with web crawlers via a robots.txt file. This file gives instructions about what a crawler should or should not crawl. It also happens to be the place where you can specify the location of a sitemap file. robots.txt wants the full URL to the sitemap file, so I used handroll's new Jinja 2 template composer to generate my file without hardcoding my domain. The whole file, named robots.txt.j2, looks like:
User-agent: *
Disallow:
Sitemap: {{ config.domain }}/sitemap.txt
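Assuming, for illustration, that domain is set to https://example.com in the configuration file, the rendered robots.txt would come out as:

```
User-agent: *
Disallow:
Sitemap: https://example.com/sitemap.txt
```

The empty Disallow line tells crawlers that nothing on the site is off limits.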
With one additional line in my configuration file and three lines in a template file, I made it easier for web crawlers to find everything I care about on my website.