
Bash tips, tricks and code snippets

Generate a simple .txt sitemap

The following bash snippet crawls a website and generates a .txt sitemap file (one URL per line), placing it in the /tmp directory.

The script can then be run on a schedule, e.g. from a cron job.

Bash script

#!/usr/bin/env bash

### Script to generate .txt sitemap for a URL
### If called with 2>/dev/null 
### will only output path to generated file
###
### NOTE: 'assets/css' links are ignored

usage() {
        if (( $# < 1 )); then
                >&2 echo -e "Script to generate .txt sitemap for a URL [ 2>/dev/null ]\n"
                >&2 echo "Usage: $0 URL"
                exit 2 # exit status 2: missing URL argument
        fi
        URL=$1
        >&2 echo "URL: $URL"

}

random() {
        # Build a short unique suffix: epoch in millisecs (base 36) plus $RANDOM
        local BASE36=({0..9} {A..Z})
        local STR="" i
        local ts=$(( $(date +%s%N) / 1000000 )) # epoch in millisecs (needs GNU date)
        # bc prints each base-36 digit as a two-digit decimal number
        for i in $(bc <<< "obase=36; $ts"); do STR+=${BASE36[$(( 10#$i ))]}; done
        RND="${STR}_${RANDOM}"
}

usage "$@"

random

TMP="/tmp/linklist_${RND}.txt"
SITEMAP="/tmp/sitemap_$(date +%Y%m%d_%Hh%M).txt"

>&2 echo -n "Spidering over $URL ... "
# wget returns a non-zero status if any link fails, so report completion either way
wget --spider --recursive --level=999 --no-verbose --output-file="$TMP" "$URL"
>&2 echo "done."
grep -i 'URL' "$TMP" | grep -v '/assets/css/' | awk -F 'URL:' '{print $2}' | awk '{print $1}' > "$SITEMAP"

# -s rather than -f: the redirect above creates the file even when no URLs were found
if [[ -s "$SITEMAP" ]]; then
        >&2 echo "Sitemap in .txt form generated at $SITEMAP"
        echo "$SITEMAP"
else
        >&2 echo "Sitemap generation failed :-("
        exit 1
fi
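
To see what the grep/awk pipeline extracts, here is a minimal sketch that runs the same filters over a few sample log lines. The log lines are made up for illustration: they assume that wget's --no-verbose spider output has the form "timestamp URL: address 200 OK", which may differ between wget versions. The function names extract_urls and sample_log are hypothetical and not part of the script above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: the same filtering as in the script, reading a wget log on stdin
extract_urls() {
    grep -i 'URL' \
        | grep -v '/assets/css/' \
        | awk -F 'URL:' '{print $2}' \
        | awk '{print $1}'
}

# Illustrative log lines (assumed format, not captured from a real run)
sample_log() {
    cat <<'EOF'
2024-01-01 12:00:00 URL: https://example.com/ 200 OK
2024-01-01 12:00:01 URL: https://example.com/assets/css/main.css 200 OK
2024-01-01 12:00:02 URL: https://example.com/posts/bash-tips.html 200 OK
EOF
}

# Prints the two page URLs; the stylesheet line is filtered out
sample_log | extract_urls
```

The first awk keeps everything after "URL:", the second keeps only the first whitespace-separated field, which drops the trailing status text.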

Cron

A crontab entry, here running at a quarter past every third hour, could then look something like this:

15 */3 * * * (FILE=$($HOME/scripts/sitemap_generate.sh https://techblog.marcgreyling.com 2>/dev/null) && [ -s "$FILE" ] && mv --backup=numbered "$FILE" "$HOME/workspace/marcg1968.github.io/sitemap.txt")
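
The move step can also live in a small helper, which keeps the crontab entry short and leaves the published sitemap alone when no new file was produced. This is only a minimal sketch, assuming GNU mv (for --backup=numbered); the function name publish_sitemap and its two arguments are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: move a freshly generated sitemap into place,
# keeping numbered backups of earlier versions (GNU mv only).
# Usage: publish_sitemap SOURCE_FILE TARGET_FILE
publish_sitemap() {
    local src=$1 dest=$2
    # Refuse to replace the published sitemap with a missing or empty file
    if [[ ! -s "$src" ]]; then
        >&2 echo "No sitemap at '$src', keeping existing $dest"
        return 1
    fi
    mv --backup=numbered -- "$src" "$dest"
}
```

A wrapper script called from cron would pass the path printed by the generator script as the first argument and the sitemap.txt path in the site repository as the second.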