Creating Google sitemaps with Perch
I wrote this post a while back. The content can still be relevant but the information I've linked to may not be available.
I've been using a third-party service to generate a search engine (Google) sitemap for some of my Perch sites. It works well, but it's not as convenient as having my own method for creating the sitemap file. So, I'm now creating Google sitemaps for Perch sites with a new method. Here's how:
The Perch documentation is great and there's a nifty explanation about how to create a simple Google sitemap. I searched the Perch forum as well and found this thread that describes a more extensive method. That was my starting point.
For the CVW Web Design site, I want a sitemap file that lists my pages, blog posts and blog categories. And for some sites, including this one, I want multiple sitemaps for the different sections. I'll explain that later.
More about XML Sitemaps: The Sitemaps protocol.
A single sitemap
Contrary to what many people think, my sitemap file doesn't need to have an .xml extension, so I don't need to configure the server to parse .xml files as PHP. However, its output does need to be valid XML, sent with an XML content type. So, I need a sitemap.php page (in the root of my site) and appropriate Perch templates.
<?php
    // Send the output as XML rather than HTML
    header('Content-type: application/xml');
    include('perch/runtime.php');

    echo '<?xml version="1.0" encoding="UTF-8"?>';
    echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';

    // Main pages
    perch_pages_navigation(array(
        'template'           => 'sitemap.html',
        'add-trailing-slash' => true,
        'flat'               => true,
        'hide-extensions'    => true
    ));

    // Secondary pages (hidden from the main navigation)
    perch_pages_navigation(array(
        'navgroup'           => 'secondary-links',
        'template'           => 'secondary-links-sitemap.html',
        'add-trailing-slash' => true,
        'flat'               => true,
        'hide-extensions'    => true
    ));

    // Blog posts, newest first
    perch_blog_custom(array(
        'template'   => 'sitemap-blog.html',
        'sort'       => 'postDateTime',
        'sort-order' => 'DESC',
        'count'      => 3000
    ));

    // Blog categories, A-Z by slug
    perch_blog_categories(array(
        'sort'       => 'catSlug',
        'sort-order' => 'ASC',
        'template'   => 'sitemap-category.html'
    ));

    echo '</urlset>';
?>
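The code above builds the XML by echoing strings around Perch's template output. For comparison, here's a minimal sketch of the same urlset structure built with PHP's XMLWriter, which also escapes characters such as & in URLs. It doesn't use Perch at all: render_urlset() and the shape of the $entries array are hypothetical stand-ins for the data the Perch templates output.

```php
<?php
// Sketch only: render_urlset() is a hypothetical helper, not part of Perch.
// Each entry is assumed to be ['loc' => ..., 'changefreq' => ..., 'priority' => '1.00'].
function render_urlset(array $entries): string
{
    $xml = new XMLWriter();
    $xml->openMemory();
    $xml->setIndent(true);
    $xml->startDocument('1.0', 'UTF-8');   // <?xml version="1.0" encoding="UTF-8"?>
    $xml->startElement('urlset');
    $xml->writeAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');
    foreach ($entries as $entry) {
        $xml->startElement('url');
        // writeElement() escapes special characters such as "&" for us.
        $xml->writeElement('loc', $entry['loc']);
        $xml->writeElement('changefreq', $entry['changefreq']);
        $xml->writeElement('priority', (string) $entry['priority']);
        $xml->endElement(); // </url>
    }
    $xml->endElement(); // </urlset>
    $xml->endDocument();
    return $xml->outputMemory();
}

// In a page like sitemap.php the result would be sent with an XML content type:
// header('Content-Type: application/xml; charset=UTF-8');
// echo render_urlset($entries);
```

Building the document with XMLWriter means the declaration, namespace and closing tags can't get out of step, at the cost of not being able to use Perch templates directly.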
The main pages
I use perch_pages_navigation to get my pages and output the URLs with a sitemap.html template in /perch/templates/navigation. My site's rewrite rules mean that I need to hide the .php extension and add a trailing slash. Your mileage here may vary. The sitemap.html template:
<url>
    <loc>http://www.cvwdesign.co.uk<perch:pages id="pagePath" /></loc>
    <changefreq>monthly</changefreq>
    <priority>1.00</priority>
</url>
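To make the hide-extensions and add-trailing-slash options more concrete, here's a rough sketch, in plain PHP, of the kind of URL transformation they produce for this site's rewrite rules. sitemap_page_url() is a hypothetical helper, not part of Perch, and treating index pages as their directory is an assumption about how the rewrite rules map them.

```php
<?php
// Sketch only: sitemap_page_url() is hypothetical and does not reproduce Perch's code.
function sitemap_page_url(string $baseUrl, string $pagePath): string
{
    // Drop a trailing ".php" (the effect 'hide-extensions' has here).
    $path = preg_replace('/\.php$/', '', $pagePath);
    // Assumption: "/index" and "/section/index" map to the directory itself.
    $path = preg_replace('#(^|/)index$#', '$1', $path);
    // Ensure a trailing slash (the effect 'add-trailing-slash' has here).
    if ($path === '' || substr($path, -1) !== '/') {
        $path .= '/';
    }
    return rtrim($baseUrl, '/') . $path;
}
```

For example, a page path of /about.php would become http://www.example.com/about/ with a base URL of http://www.example.com.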
The secondary pages
On my site, I have several secondary pages that are hidden from the main navigation but should be included in the sitemap. The site also has a few test pages that I don't want in the sitemap. So, I've created a Perch navigation group that contains the secondary pages, and I include these in the sitemap with a second call to perch_pages_navigation using the 'navgroup' => 'secondary-links' option. The secondary-links-sitemap.html template in /perch/templates/navigation is the same as my sitemap.html template but with a different priority value (0.60). I'm not sure how much attention search engines pay to the priority value, but I don't want all pages to have the same value.
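The Sitemaps protocol limits priority to a value between 0.0 and 1.0, and changefreq to a fixed set of words (always, hourly, daily, weekly, monthly, yearly, never). If these values ever became editable rather than hard-coded in templates, a small check like this hypothetical sitemap_url_entry() function could keep each <url> entry valid. It's a sketch, not something Perch provides.

```php
<?php
// Sketch only: sitemap_url_entry() is a hypothetical helper, not part of Perch.
// The allowed words and the 0.0-1.0 range come from the Sitemaps protocol.
const SITEMAP_CHANGEFREQS = ['always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never'];

function sitemap_url_entry(string $loc, string $changefreq, float $priority): string
{
    if (!in_array($changefreq, SITEMAP_CHANGEFREQS, true)) {
        throw new InvalidArgumentException("Invalid changefreq: $changefreq");
    }
    // Clamp to the protocol's range and format like the templates (e.g. 0.60).
    $priority = number_format(max(0.0, min(1.0, $priority)), 2, '.', '');
    // Escape the URL for use as XML text content.
    $loc = htmlspecialchars($loc, ENT_XML1 | ENT_QUOTES, 'UTF-8');

    return "<url><loc>$loc</loc><changefreq>$changefreq</changefreq>"
        . "<priority>$priority</priority></url>";
}
```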
The blog posts
To get the blog post URLs, I use perch_blog_custom and a sitemap-blog.html template, which I keep in /perch/templates/blog. I use a high count value to retrieve all posts, and I've also sorted the posts by date; there's no requirement to do this for the sitemap, but it's easier for me to check the URLs when they are in date order.
The sitemap-blog.html template:
<url>
    <loc>http://www.cvwdesign.co.uk<perch:blog id="postURL" /></loc>
    <changefreq>monthly</changefreq>
    <priority>0.80</priority>
</url>
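Outside Perch, the same ordering is easy to reproduce. This sketch takes a hypothetical array of posts, keyed by the postURL and postDateTime field IDs used above with dates assumed to be in Y-m-d H:i:s format, and returns absolute URLs newest first, much as the perch_blog_custom call does.

```php
<?php
// Sketch only: blog_post_urls() and the $posts array shape are hypothetical.
// Dates are assumed to be "Y-m-d H:i:s" strings, which sort correctly as text.
function blog_post_urls(array $posts, string $baseUrl): array
{
    // Newest first, like 'sort' => 'postDateTime', 'sort-order' => 'DESC'.
    usort($posts, function (array $a, array $b): int {
        return strcmp($b['postDateTime'], $a['postDateTime']);
    });

    return array_map(function (array $post) use ($baseUrl): string {
        return rtrim($baseUrl, '/') . $post['postURL'];
    }, $posts);
}
```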
The blog categories
Finally, for the blog category URLs, I've used perch_blog_categories and a sitemap-category.html template:
<url>
    <loc>http://www.cvwdesign.co.uk/news/category/<perch:category id="catSlug" /></loc>
    <changefreq>monthly</changefreq>
    <priority>0.60</priority>
</url>
The URL path is based on my rewrite rules. Again, I've sorted the category URLs but that's only for my benefit.
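Here's a similar plain-PHP sketch for the category URLs. The /news/category/ prefix comes from the template above; category_urls() and its array of slugs are hypothetical, and the rawurlencode() call is a defensive extra rather than something the template does.

```php
<?php
// Sketch only: category_urls() is hypothetical; "/news/category/" mirrors the template.
function category_urls(array $catSlugs, string $baseUrl): array
{
    // Alphabetical by slug, like 'sort' => 'catSlug', 'sort-order' => 'ASC'.
    sort($catSlugs, SORT_STRING);

    $urls = [];
    foreach ($catSlugs as $slug) {
        // Slugs are normally URL-safe already; rawurlencode() is a safeguard.
        $urls[] = rtrim($baseUrl, '/') . '/news/category/' . rawurlencode($slug);
    }
    return $urls;
}
```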
So, that's my sitemap method for CVW Web Design. There may be ways that I can improve this but it's working great at the moment.
Multiple sitemaps
I've recently moved this site from Textpattern to Perch and have redirected old URLs to a new site structure. I want to see how Google indexes the new URLs, so I've used the same sitemap method but with separate sitemap files for site pages, blog posts and categories. That's perfectly acceptable, but you should also provide a sitemap_index.xml file that lists the separate sitemaps.
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap>
        <loc>http://www.cvwdesign.com/sitemap.php</loc>
    </sitemap>
    <sitemap>
        <loc>http://www.cvwdesign.com/sitemap-blog.php</loc>
    </sitemap>
    <sitemap>
        <loc>http://www.cvwdesign.com/sitemap-categories.php</loc>
    </sitemap>
</sitemapindex>
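If the number of separate sitemaps grows, the index file could be generated rather than edited by hand. This is a minimal sketch using PHP's XMLWriter; render_sitemap_index() is a hypothetical name, and where the output is saved or served is left open.

```php
<?php
// Sketch only: render_sitemap_index() is hypothetical; nothing here is Perch-specific.
function render_sitemap_index(array $sitemapUrls): string
{
    $xml = new XMLWriter();
    $xml->openMemory();
    $xml->setIndent(true);
    $xml->startDocument('1.0', 'UTF-8');
    $xml->startElement('sitemapindex');
    $xml->writeAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');
    foreach ($sitemapUrls as $url) {
        $xml->startElement('sitemap');
        $xml->writeElement('loc', $url);
        $xml->endElement(); // </sitemap>
    }
    $xml->endElement(); // </sitemapindex>
    $xml->endDocument();
    return $xml->outputMemory();
}
```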
It's early days, but I can see that Google has started to index the new blog URLs, although it hasn't done much with the category URLs so far. Separate sitemaps are a good approach if you want to identify problems with site indexing.
This sitemap method is working well for me and I'll be using it on future Perch CMS sites. Hopefully, I'll improve and adapt the code as I do more with it; for example, I might see if I can move the priority and change frequency values into Perch page attributes so that site owners can change them.
Anyway, if you have suggestions or other ways of creating sitemaps with Perch, let me know. It would be good to compare methods.