Bash tips, tricks and code snippets
Generate a simple .txt sitemap
The following bash code snippet generates a .txt sitemap file for a given URL and places it in the /tmp directory.
The script can then be run on a schedule, for example from a cron job.
Bash script
#!/usr/bin/env bash
### Script to generate .txt sitemap for a URL
### If called with 2>/dev/null
### will only output path to generated file
###
### NOTE: 'assets/css' links are ignored
usage() {
    if (( $# < 1 )); then
        >&2 echo -e "Script to generate .txt sitemap for a URL [ 2>/dev/null ]\n"
        >&2 echo "Usage: $0 URL"
        exit 2 # no URL given: abort with a non-zero status
    fi
    URL=$1
    >&2 echo "URL: $URL"
}
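random() {
    # Sets RND to a suffix for the temp file name: the epoch time in
    # milliseconds encoded in base 36, followed by a value from $RANDOM
    local BASE36=({0..9} {A..Z})
    local STR="" i
    local ts=$(( $(date +%s%N) / 1000000 )) # epoch in millisecs (needs GNU date)
    # for obase > 16, bc prints each digit as a space-separated decimal number
    for i in $(bc <<< "obase=36; $ts"); do STR+=${BASE36[$(( 10#$i ))]}; done
    RND="${STR}${RANDOM}"
}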
usage "$@"
random
TMP="/tmp/linklist_${RND}.txt"
SITEMAP="/tmp/sitemap_$(date +%Y%m%d_%Hh%M).txt"
>&2 echo -n "Spidering over $URL ... "
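# wget exits non-zero if any link returned an error, so do not chain "done." with &&
wget --spider --recursive --level=999 --no-verbose --output-file="$TMP" "$URL"
>&2 echo "done."
grep -i URL "$TMP" | grep -v '/assets/css/' | awk -F 'URL:' '{print $2}' | awk '{$1=$1};1' | awk '{print $1}' > "$SITEMAP"
# the redirect always creates the file, so test for non-empty (-s) rather than existence (-f)
if [[ -s "$SITEMAP" ]]; then
    >&2 echo "Sitemap in .txt form generated at $SITEMAP"
    echo "$SITEMAP"
else
    >&2 echo "Sitemap generation failed :-("
    exit 1
fi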
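To see what the grep/awk pipeline in the script does, it can help to run it by hand on a few log lines. The following is only a sketch: the function name `extract_urls` and the sample log lines are made up for illustration, and the exact log format may differ between wget versions. It uses the same filter stages as the script and additionally pipes through `sort -u` to drop duplicate URLs, which the script itself does not do.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the script's filter stages, reading a wget log on stdin
extract_urls() {
    grep -i URL \
        | grep -v '/assets/css/' \
        | awk -F 'URL:' '{print $2}' \
        | awk '{$1=$1};1' \
        | awk '{print $1}' \
        | sort -u   # extra step (not in the script): remove duplicate URLs
}

# Assumed sample of "wget --spider --no-verbose" log lines (format may differ)
sample_log='2024-01-01 12:00:00 URL: https://example.com/ 200 OK
2024-01-01 12:00:01 URL: https://example.com/assets/css/main.css 200 OK
2024-01-01 12:00:02 URL: https://example.com/about/ 200 OK
2024-01-01 12:00:03 URL: https://example.com/ 200 OK'

# Prints one URL per line; the CSS asset and the repeated URL are dropped
extract_urls <<< "$sample_log"
```

Each stage narrows the line down: the first awk keeps everything after `URL:`, the second trims whitespace, and the third keeps only the first field, i.e. the URL without the trailing status text.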
Cron
A crontab entry that runs the script at minute 15 of every third hour could then look something like this:
15 */3 * * * (FILE=$($HOME/scripts/sitemap_generate.sh https://techblog.marcgreyling.com 2>/dev/null) && mv --backup=numbered "$FILE" $HOME/workspace/marcg1968.github.io/sitemap.txt)
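If the one-liner in the crontab grows unwieldy, the move step could be pulled into a small wrapper. The following is a minimal sketch, not part of the original setup: the function name `install_sitemap` and the demo paths are hypothetical, and `--backup=numbered` assumes GNU mv. It refuses to overwrite the published sitemap with an empty or missing file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install a freshly generated sitemap, keeping numbered backups
install_sitemap() {
    local src=$1 dest=$2
    # -s is true only if the file exists and is non-empty
    if [[ ! -s "$src" ]]; then
        >&2 echo "Refusing to install empty or missing sitemap: $src"
        return 1
    fi
    # GNU mv: an existing $dest is kept as $dest.~1~, $dest.~2~, ...
    mv --backup=numbered "$src" "$dest"
}

# Demo in a throwaway directory (paths are for illustration only)
demo_dir=$(mktemp -d)
echo "https://example.com/old/" > "$demo_dir/sitemap.txt"
echo "https://example.com/new/" > "$demo_dir/sitemap_new.txt"
install_sitemap "$demo_dir/sitemap_new.txt" "$demo_dir/sitemap.txt"
ls -A "$demo_dir"
```

In a cron job, the wrapper would be called with the path printed by the generator script as its first argument and the published sitemap.txt as its second.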