JavaScript: Generate & Validate a Sitemap to Improve SEO
Generate and validate sitemap.xml with JavaScript
Hi everyone, I hope you are healthy and well.
It has been a while. In this article, I want to share how to generate sitemap.xml for a website with JavaScript.
The sitemap.xml file matters for SEO because it tells search engines such as Google which pages exist on your website. That helps them crawl and index those pages, so the pages can show up in search results.
Intro
We will discuss sitemap.xml for websites built as SPAs (Single Page Applications) with frameworks like Angular, React, Vue, etc. As you may know, many online tools can generate a sitemap for a website that is already hosted. Just paste the link and voilà, the sitemap.xml file is done. You only need to download the file and upload it to your hosting.
But what if we have our own API that produces pages dynamically, such as one page for every book, song, or other item?
Don’t worry, we can create our sitemap.xml with a small script. Run it once and it generates a sitemap.xml file with all of the detail links, in the form name-of-website/link-detail-from-api.
Also, if you want the script to run automatically once a month, you can create a cron job on the server so that sitemap.xml is updated every month.
Setup
First, we need to install this library to convert JSON to XML and vice versa.
npm i --save-dev xml-js
We also need to install node-fetch to fetch data in the Node environment.
npm i --save-dev node-fetch@2.6.1
We pin this specific version so that we can still use require. From version 3 onward, node-fetch is an ESM-only package, so loading it with require fails with Error [ERR_REQUIRE_ESM]: require()
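On newer runtimes the pinned dependency may not be needed at all. The following is a minimal sketch, assuming Node.js 18 or later, where a global fetch is available. The helper name resolveFetch is hypothetical and not part of the script in this article.

```javascript
// Hypothetical helper (not part of the original script): prefer the global
// fetch that Node.js 18+ provides, and only fall back to node-fetch v2.
const resolveFetch = () => {
  if (typeof globalThis.fetch === 'function') {
    return globalThis.fetch;
  }
  // Reached only on older Node.js versions; requires node-fetch@2 to be installed.
  return require('node-fetch');
};

const fetchImpl = resolveFetch();
console.log(typeof fetchImpl); // "function"
```

With this approach the rest of the script can call fetchImpl(url) in the same way it calls fetch(url).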
Next, create the file index.js and define these variables.
const fs = require('fs');
const convert = require('xml-js');
const fetch = require('node-fetch');
const hostBaseURL = 'http://website.com/detail';
const untrackedUrlsList = [];
const options = { compact: true, ignoreComment: true, spaces: 4 };
let totalData = 0;
Information:
- fs is the built-in Node module for reading and writing files
- convert is the xml-js library for converting JSON to XML and vice versa
- fetch is the library for fetching data in the Node environment
- hostBaseURL is the base URL of the detail pages
- untrackedUrlsList is an array that stores the URLs built from the fetched data
- options is the default config that xml-js uses when reading or writing XML
- totalData is the total amount of data, which drives the pagination when fetching data.
Then, we need to create a template file sitemap.xml like this.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
        <loc>http://website.com/</loc>
        <priority>1.00</priority>
    </url>
    <url>
        <loc>http://website.com/detail/delectus-aut-autem?id=1</loc>
        <priority>0.80</priority>
    </url>
</urlset>
In this article, we use the open API from jsonplaceholder.
Let’s continue writing the code in index.js.
Create an async arrow function fetchData with the parameters limit and start to call the API with pagination.
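const fetchData = async (limit = 10, start = 0) => {
  try {
    // _limit is the page size, _start is the offset of the first item
    const url = `https://jsonplaceholder.typicode.com/todos?_limit=${limit}&_start=${start}`;
    const response = await fetch(url);
    // await here so that a JSON parsing error is also caught below
    return await response.json();
  } catch (error) {
    console.log(error);
  }
}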
Next, create an arrow function generateSitemap like this
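const generateSitemap = async () => {
  try {
    const limit = 10;
    const callAPI = [];
    // The first call also tells us how much data there is in total
    const initAPI = await fetchData(limit, 0);
    totalData = initAPI?.totalPage || 200;
    // _start is an offset, so move forward by one page (limit) per request
    for (let start = limit; start < totalData; start += limit) {
      callAPI.push(fetchData(limit, start));
    }
    const data = await Promise.all(callAPI);
    // Include the first page as well, so no item is left out
    [initAPI, ...data].forEach(item => {
      collectSitemap(item);
    });
    filterUniqueURLs();
  } catch (error) {
    console.log(error);
  }
}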
We wrap the fetching in try-catch. First, we call the API once to find out how much data it has in total. An API usually exposes this through a key such as totalPage, but jsonplaceholder does not, so I fall back to 200, which is the number of todos this API provides.
After that, a for loop queues the paginated requests that cover the data from start 0 up to 200, and the promise returned by each call is stored in the variable callAPI.
Next, await Promise.all waits for all of those requests and returns their results as an array with one page of data per request.
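Because _start on jsonplaceholder is an item offset rather than a page number, it can help to compute the offsets explicitly. The following is a small sketch of one way to cover all items with non-overlapping pages. The helper buildOffsets is hypothetical, and the total of 200 and the limit of 10 are the values assumed in this article.

```javascript
// Hypothetical helper: list the _start offsets needed to cover `total` items
// with pages of `limit` items each, without overlapping pages.
const buildOffsets = (total, limit) => {
  const offsets = [];
  for (let start = 0; start < total; start += limit) {
    offsets.push(start);
  }
  return offsets;
};

const offsets = buildOffsets(200, 10);
console.log(offsets.length); // 20
console.log(offsets[0]);     // 0
console.log(offsets[19]);    // 190
```

Each offset can then be passed to fetchData(limit, start), which means 20 requests are enough to cover 200 items.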
The API response looks like this.
https://jsonplaceholder.typicode.com/todos?_start=0&_limit=5
Response:
[
  {
    "userId": 1,
    "id": 1,
    "title": "delectus aut autem",
    "completed": false
  },
  {
    "userId": 1,
    "id": 2,
    "title": "quis ut nam facilis et officia qui",
    "completed": false
  },
  ...
]
After that, we need to create the function collectSitemap.
const collectSitemap = (data = []) => {
  if (data.length > 0) {
    data.forEach(item => {
      // Turn the title into a slug and append the id as a query string
      const modifiedURL = item.title.replace(/[@\s!^\/\\#,+()$~%.'":*?<>{}]/g, '-').toLowerCase() + `?id=${item.id}`;
      const encodingURL = encodeURI(modifiedURL);
      untrackedUrlsList.push(`${hostBaseURL}/${encodingURL}`);
    });
  }
}
This function builds a dynamic URL with the format domain/detail/title?id=number. Because each result from Promise.all is an array, we iterate over its items, build the URL for each one, and save it into untrackedUrlsList.
The regex replaces every special character in the set [@\s!^\/\\#,+()$~%.’”:*?<>{}], including whitespace, with a hyphen, and encodeURI then percent-encodes anything that is still not allowed in a URL. The resulting URL looks like this: http://website.com/detail/ipsa-repellendus-fugit-nisi?id=12
After collectSitemap has run for every page, we need to create the function filterUniqueURLs.
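To see what the regex and encodeURI do to a title, here is a small standalone sketch. It reuses the regex and base URL from this article; the wrapper buildDetailURL and the second sample title are hypothetical and only there for illustration.

```javascript
// Same regex and base URL as in the article; buildDetailURL is a hypothetical wrapper.
const hostBaseURL = 'http://website.com/detail';

const buildDetailURL = (item) => {
  // Replace whitespace and special characters with hyphens, then lowercase
  const slug = item.title.replace(/[@\s!^\/\\#,+()$~%.'":*?<>{}]/g, '-').toLowerCase();
  return `${hostBaseURL}/${encodeURI(`${slug}?id=${item.id}`)}`;
};

console.log(buildDetailURL({ id: 1, title: 'delectus aut autem' }));
// http://website.com/detail/delectus-aut-autem?id=1
console.log(buildDetailURL({ id: 2, title: 'Hello, World!' }));
// http://website.com/detail/hello--world-?id=2
```

Note that consecutive special characters produce consecutive hyphens, because every matched character is replaced one by one.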
const filterUniqueURLs = () => {
  const newDate = new Date();
  // Today's date as YYYY-MM-DD, used for lastmod
  const date = [newDate.getFullYear(), ('0' + (newDate.getMonth() + 1)).slice(-2), ('0' + newDate.getDate()).slice(-2)].join('-');
  fs.readFile('sitemap.xml', 'utf8', (err, data) => {
    if (err) {
      return console.log(err);
    }
    const existingSitemapList = JSON.parse(convert.xml2json(data, options));
    // In compact mode a single <url> is parsed as an object, not an array, so normalize it
    existingSitemapList.urlset.url = [].concat(existingSitemapList.urlset.url || []);
    const existingSitemapURLStringList = existingSitemapList.urlset.url.map(ele => ele.loc._text);
    const removeDuplicate = [...new Set(untrackedUrlsList)];
    removeDuplicate.forEach(ele => {
      // Only add URLs that are not in the sitemap yet
      if (!existingSitemapURLStringList.includes(ele)) {
        existingSitemapList.urlset.url.push({
          loc: {
            _text: ele,
          },
          priority: {
            _text: 0.8
          },
          changefreq: {
            _text: 'monthly'
          },
          lastmod: {
            _text: date
          }
        });
      }
    });
    createAndSaveSitemapFile(existingSitemapList);
  });
}
This function reads the URLs that already exist in sitemap.xml. If a URL already exists, it is skipped; if it is new, it is added with the fields loc, priority, changefreq, and lastmod.
A sitemap entry can carry several tags, such as loc, priority, changefreq, and lastmod.
As mentioned, we read sitemap.xml with xml-js, and we remove duplicate URLs with new Set(array).
In the XML file, each entry is a url element that contains loc, priority, changefreq, and lastmod.
In the object that xml-js produces, each of those tags becomes a key whose value is wrapped as { _text: value }.
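The de-duplication step can be tried on plain arrays without touching any file. The following is a minimal sketch under that assumption. The helpers buildEntry and mergeNewUrls and the sample date are hypothetical, while the entry shape follows the { _text: value } format used in this article.

```javascript
// Hypothetical helper: build one sitemap entry in the { _text: value } shape.
const buildEntry = (loc, date) => ({
  loc: { _text: loc },
  priority: { _text: 0.8 },
  changefreq: { _text: 'monthly' },
  lastmod: { _text: date },
});

// Hypothetical helper: append only the candidate URLs that are not known yet.
const mergeNewUrls = (existingEntries, candidateUrls, date) => {
  const known = new Set(existingEntries.map((entry) => entry.loc._text));
  const merged = [...existingEntries];
  // new Set(candidateUrls) drops duplicates among the candidates themselves
  for (const url of new Set(candidateUrls)) {
    if (!known.has(url)) {
      merged.push(buildEntry(url, date));
    }
  }
  return merged;
};

const existing = [{ loc: { _text: 'http://website.com/' }, priority: { _text: '1.00' } }];
const merged = mergeNewUrls(
  existing,
  ['http://website.com/detail/a?id=1', 'http://website.com/detail/a?id=1', 'http://website.com/'],
  '2024-01-01'
);
console.log(merged.length);       // 2
console.log(merged[1].loc._text); // http://website.com/detail/a?id=1
```

Using a Set for the known URLs keeps each lookup cheap compared with searching an array for every candidate.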
After that, we need to create the function createAndSaveSitemapFile
const createAndSaveSitemapFile = (list) => {
  const finalXML = convert.json2xml(list, options);
  fs.writeFile('sitemap.xml', finalXML, (err) => {
    if (err) {
      return console.log(err);
    }
    console.log("Success generate sitemap.xml!");
  });
}
This function simply converts existingSitemapList to XML and overwrites the file sitemap.xml.
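To make the conversion step less of a black box, here is a dependency-free sketch of what serializing entries to sitemap XML involves. It is not how xml-js works internally. The helpers escapeXml and toSitemapXml are hypothetical, and the sample URL with an extra query parameter is made up to show the escaping that the sitemap protocol requires for characters such as &.

```javascript
// Hypothetical helper: entity-escape the five characters that XML reserves.
const escapeXml = (value) =>
  String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');

// Hypothetical helper: serialize simple { loc, priority } entries by hand.
const toSitemapXml = (entries) => {
  const urls = entries.map(({ loc, priority }) =>
    [
      '    <url>',
      `        <loc>${escapeXml(loc)}</loc>`,
      `        <priority>${escapeXml(priority)}</priority>`,
      '    </url>',
    ].join('\n')
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    '</urlset>',
  ].join('\n');
};

const xml = toSitemapXml([{ loc: 'http://website.com/detail/a?id=1&lang=en', priority: '0.80' }]);
console.log(xml.includes('<loc>http://website.com/detail/a?id=1&amp;lang=en</loc>')); // true
```

For anything beyond flat entries like these, a library such as xml-js remains the more practical choice.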
After generating the new sitemap.xml, we want to check whether each URL is valid with these functions.
const urlStatus = [];
function readFileSitemap() {
  fs.readFile('sitemap.xml', 'utf8', (err, data) => {
    if (err) {
      return console.log(err);
    }
    const { urlset: { url } } = JSON.parse(convert.xml2json(data, options));
    url.forEach((item) => {
      const isValid = isValidUrl(item.loc._text);
      // Split "<base>/<title>?id=<id>" into its title and id parts
      const splitUrl = item.loc._text.split('?id=');
      urlStatus.push({
        url: item.loc._text,
        id: splitUrl[1],
        title: splitUrl[0].split(`${hostBaseURL}/`)[1],
        statusUrl: isValid
      });
    });
    writeLogSitemap();
  });
}
function writeLogSitemap() {
  const data = JSON.stringify(urlStatus, null, 2);
  fs.writeFile('log-sitemap.json', data, (err) => {
    if (err) throw err;
    console.log('Success create log');
  });
}
function isValidUrl(urlString) {
  const urlPattern = new RegExp('^(https?:\\/\\/)?' + // validate protocol
    '((([a-z\\d]([a-z\\d-]*[a-z\\d])*)\\.)+[a-z]{2,}|' + // validate domain name
    '((\\d{1,3}\\.){3}\\d{1,3}))' + // validate OR ip (v4) address
    '(\\:\\d+)?(\\/[-a-z\\d%_.~+]*)*' + // validate port and path
    '(\\?[;&a-z\\d%_.~+=-]*)?' + // validate query string
    '(\\#[-a-z\\d_]*)?$', 'i'); // validate fragment locator
  return urlPattern.test(urlString);
}
Information
- The function readFileSitemap reads the data from the file sitemap.xml. Inside it, we take only the list of URLs, iterate over it, and check each URL with the function isValidUrl.
- The function isValidUrl checks the URL against a regex. It returns true if the URL is valid and false otherwise.
- Inside the iteration, we push an object with the format { url, id, title, statusUrl } to the variable urlStatus. After the iteration finishes, we call writeLogSitemap.
- The function writeLogSitemap writes the file log-sitemap.json from the data in the variable urlStatus.
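As an alternative to the regex, the validation can lean on the URL class that is built into Node. The following is a minimal sketch under that assumption. The helpers isValidHttpUrl and describeUrl are hypothetical and not part of the script above, and unlike the regex they reject URLs that have no protocol.

```javascript
// Hypothetical alternative to the regex: let the built-in URL parser decide.
const isValidHttpUrl = (urlString) => {
  try {
    const { protocol } = new URL(urlString);
    return protocol === 'http:' || protocol === 'https:';
  } catch (error) {
    return false; // new URL() throws on input it cannot parse
  }
};

// Hypothetical helper: pull the id and the title slug back out of a generated URL.
const describeUrl = (urlString) => {
  const parsed = new URL(urlString);
  return {
    id: parsed.searchParams.get('id'),
    title: parsed.pathname.split('/').pop(),
  };
};

console.log(isValidHttpUrl('http://website.com/detail/ipsa-repellendus-fugit-nisi?id=12')); // true
console.log(isValidHttpUrl('not a url')); // false
console.log(describeUrl('http://website.com/detail/ipsa-repellendus-fugit-nisi?id=12'));
// { id: '12', title: 'ipsa-repellendus-fugit-nisi' }
```

Reading the id through searchParams also avoids depending on ?id= being the only query parameter.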
The last step is to execute the function generateSitemap and validate the URLs. Just add this call at the bottom of the file.
generateSitemap();
Don’t forget to call the function readFileSitemap() inside the function createAndSaveSitemapFile, like this:
const createAndSaveSitemapFile = (list) => {
  const finalXML = convert.json2xml(list, options);
  fs.writeFile('sitemap.xml', finalXML, (err) => {
    if (err) {
      return console.log(err);
    }
    console.log("Success generate sitemap.xml!");
    readFileSitemap();
  });
}
Result
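After the generation finishes, sitemap.xml contains the entries from the template plus one url entry for every URL built from the API, each with loc, priority, changefreq, and lastmod.
The file log-sitemap.json contains an array with one object per URL, holding its url, id, title, and statusUrl.
If you are curious about the full code, it is available in the repository for this article.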
Conclusions
After this tutorial, we know that it is not too hard to create our own sitemap.xml, especially if we have an API and need to improve SEO.
This is only one of many ways to improve SEO. We can also improve the meta title, meta description, and meta image, optimize images and their alt text, and many other things.
Hopefully this article is useful for you. Thank you for reading.