Bringing it all together with Shell Scripts

Here is where we'll take the commands from the last section and bring them together into one script that does all of the tasks we require and tells us how often verified Googlebots have visited us.

We'll then save it in a file so we can run it at will, add a section that will email us the results, then set the system to run the script daily.

So without further ado let's get down and do it...

You'll need to download this zip file that contains 4 files.
 

  1. googlebotcheckscript.sh
  2. email.sh
  3. variables.sh
  4. crontab.file

Most of the work is done in googlebotcheckscript.sh, and it's pretty simple really.

It takes the log files for your web server, extracts every request that claims to be Googlebot, then checks each visitor's IP address to make sure it truly is a real Googlebot visiting.

email.sh takes those results and emails you, detailing the previous day's visits.

variables.sh contains all the variables, some of which you WILL need to change to make the system work on your server.

And finally there is crontab.file, which will help in getting the server to run these scripts daily in an automated way.
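To give you a feel for the check before you open the files, here is a minimal sketch of the idea. It is NOT the contents of googlebotcheckscript.sh; the function names are made up for this example. It assumes the client IP is the first field on each log line (as in the default Apache formats) and that the host command is installed. The verification is the reverse-then-forward DNS lookup: the IP must resolve to a name ending in googlebot.com or google.com, and that name must resolve back to the same IP.

```shell
#!/bin/sh
# Hypothetical sketch; the real googlebotcheckscript.sh may work differently.

# Print the unique client IPs of log lines that mention Googlebot.
# Assumes the IP is the first whitespace-separated field of each line.
googlebot_ips() {
    grep -i 'googlebot' "$1" | awk '{ print $1 }' | sort -u
}

# Succeed if a hostname ends in .googlebot.com or .google.com.
# "${1%.}" drops the trailing dot that host(1) prints after names.
is_google_hostname() {
    case "${1%.}" in
        *.googlebot.com|*.google.com) return 0 ;;
        *) return 1 ;;
    esac
}

# Reverse lookup the IP, check the name, then forward lookup the name
# and make sure it points back at the same IP.
verify_googlebot_ip() {
    ip=$1
    name=$(host "$ip" 2>/dev/null |
        awk '/domain name pointer/ { print $NF; exit }')
    [ -n "$name" ] || return 1
    is_google_hostname "$name" || return 1
    host "${name%.}" 2>/dev/null |
        awk '/has address|has IPv6 address/ { print $NF }' |
        grep -qxF "$ip"
}

# Usage (needs network access, so it is left commented out):
# for ip in $(googlebot_ips /var/log/apache2/access.log); do
#     verify_googlebot_ip "$ip" && echo "$ip is a real Googlebot"
# done
```

The name check alone is not enough, because anyone can set a reverse DNS record that claims to be googlebot.com. It is the forward lookup at the end that catches the fakes.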

N.B. You will need to sign up for a free MailGun.com account, because this script uses the MailGun service to send you the daily email. The reason for doing it this way, rather than having your server send the email directly on your behalf, is twofold.

Firstly, this is a system intended to increase your technical SEO knowledge, and having another application (in this instance cURL) used within the scripts is a good thing. Secondly, there is a chance that your server doesn't have its MTA, or mail transfer agent (the application that sends emails from your server), set up to send emails.

Normally this MTA would be Sendmail, Postfix or similar, and full configuration and setup of an MTA is way outside the scope of this training.
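So you know roughly what to expect, this is the general shape of sending a message through the MailGun HTTP API with cURL. Treat it as a sketch rather than a copy of email.sh: the variable names, the sender address and the example.com values are all placeholders I've made up here, and your real key and domain come from your MailGun account.

```shell
#!/bin/sh
# Illustrative only: these variable names are assumptions, not necessarily
# the ones used in variables.sh or email.sh.
MAILGUN_API_KEY=${MAILGUN_API_KEY:-your-api-key}
MAILGUN_DOMAIN=${MAILGUN_DOMAIN:-mg.example.com}
REPORT_TO=${REPORT_TO:-you@example.com}

# Build the messages endpoint for a sending domain.
# (Accounts in MailGun's EU region use api.eu.mailgun.net instead.)
mailgun_url() {
    printf 'https://api.mailgun.net/v3/%s/messages\n' "$1"
}

# Send a plain-text report: send_report "subject" "body"
# --form-string sends each value literally, so a body that happens to
# start with "@" or "<" is not treated as a file name by cURL.
send_report() {
    curl -s --user "api:$MAILGUN_API_KEY" \
        "$(mailgun_url "$MAILGUN_DOMAIN")" \
        --form-string from="Googlebot report <report@$MAILGUN_DOMAIN>" \
        --form-string to="$REPORT_TO" \
        --form-string subject="$1" \
        --form-string text="$2"
}

# Usage (makes a real API call, so it is left commented out):
# send_report "Googlebot visits yesterday" "$(cat results.txt)"
```

Keep the API key out of anything web accessible, which is one more reason the scripts should not live under your web root.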

You will also need to find where your web server keeps its log files and adapt the date format to the one your server uses. I've written it for the format my server uses, which is the default Apache setup, but yours may be different.

Thankfully there is lots of documentation out there to help you, e.g. the man page for date:

http://man7.org/linux/man-pages/man1/date.1.html
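As an example of what adapting the date format means in practice, the default Apache log timestamp looks like [10/Oct/2023:13:55:36 +0000], so picking out yesterday's lines needs a date in the form 09/Oct/2023. Here is a small sketch of that. The names LOG_FILE, yesterday_apache and hits_on_day are mine, not necessarily what variables.sh uses, and the log path is only a common default (Debian style; Red Hat style systems often use /var/log/httpd/access_log).

```shell
#!/bin/sh
# Hypothetical names; adjust to match your own variables.sh.
LOG_FILE=${LOG_FILE:-/var/log/apache2/access.log}

# Yesterday's date as it appears in default Apache logs, e.g. 09/Oct/2023.
# "-d yesterday" is a GNU date option; BSD/macOS date uses "-v-1d" instead.
# LC_ALL=C keeps the month abbreviation in English, as Apache writes it.
yesterday_apache() {
    LC_ALL=C date -d yesterday '+%d/%b/%Y'
}

# Count the lines in a log file stamped with a given day.
# Usage: hits_on_day DAY FILE
# -F makes grep treat "[" as a literal character, not a bracket expression.
hits_on_day() {
    grep -c -F "[$1:" "$2"
}

# Usage (left commented out because it reads your real log file):
# hits_on_day "$(yesterday_apache)" "$LOG_FILE"
```

If your server logs dates differently, only the format string after the + sign should need to change.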

Once you have edited the variables to work on your server you will need to upload the files. You can use SCP, SFTP or even old-fashioned FTP; which one is a decision for you. Here's an article that discusses the differences between the protocols.

http://unix.stackexchange.com/questions/8707/whats-the-difference-between-sftp-scp-and-fish-protocols

Once they are uploaded (NOT to a web accessible directory, but preferably to your home directory above the web root, or a subdirectory under it), please change the permissions on the scripts.

I'd suggest changing directory to the one you placed your scripts in and changing the permissions to rwxr-xr-x, or 755:

username$ chmod 755 *.sh

Once you've edited the variables.sh script and everything works the way it should, the next step is to set up the automated running.

We'll be using Crontab to do this.

Edit crontab.file so that the correct path to the scripts is set and, once done, type

username$ crontab crontab.file

at the command line.

There are many ways to add crontab entries, but placing them all in a file and importing it via the above method is my preference. You can verify they are in Crontab correctly by typing

username$ crontab -l
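If you haven't met the crontab format before, each entry is five time fields (minute, hour, day of month, month, day of week) followed by the command to run. The sketch below prints one such entry. The 06:15 run time, the log file name and the scripts directory are examples I've picked for illustration, so the entries in the supplied crontab.file may well look different.

```shell
#!/bin/sh
# Hypothetical location; point it at wherever you uploaded the scripts.
SCRIPT_DIR=${SCRIPT_DIR:-$HOME/scripts}

# Print a crontab entry that runs the check at 06:15 every day and
# appends anything the script prints (including errors) to a log file.
# Fields: minute hour day-of-month month day-of-week command
crontab_entry() {
    printf '15 6 * * * %s/googlebotcheckscript.sh >> %s/googlebotcheck.log 2>&1\n' "$1" "$1"
}

# Usage (left commented out so nothing is overwritten by accident):
# crontab_entry "$SCRIPT_DIR" > crontab.file
```

One thing to be aware of: crontab crontab.file replaces your whole existing crontab with the contents of the file, so if you already have entries, run crontab -l first and copy them into the file as well.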


So now that you have this working and verifying the real Googlebot sessions coming to your site, you may want to think about how you could expand the scripts to do the same checking for Bing, Yandex and other search engine bots.


https://www.bing.com/webmaster/help/how-to-verify-bingbot-3905dc26
https://yandex.com/support/webmaster/robot-workings/check-yandex-robots.xml

Don't forget Yahoo, DuckDuckGo or Baidu either...
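As a starting point for that, here is one possible way to generalise the hostname check once you have done the reverse DNS lookup. It is only a sketch with a made-up function name. The Bing and Yandex suffixes are the ones described on the two pages linked above; I've deliberately left the other engines out, so check each one's own documentation for how it wants its bot verified before adding it, as not all of them use reverse DNS.

```shell
#!/bin/sh
# Hypothetical helper: map a reverse-DNS hostname to the search engine
# it belongs to. Extend the case statement as you add engines.
bot_for_hostname() {
    case "${1%.}" in            # "${1%.}" drops a trailing dot, if any
        *.googlebot.com|*.google.com)          echo google ;;
        *.search.msn.com)                      echo bing ;;
        *.yandex.ru|*.yandex.net|*.yandex.com) echo yandex ;;
        *)                                     echo unknown ;;
    esac
}

# Usage:
# bot_for_hostname "msnbot-157-55-39-1.search.msn.com."
```

As with Googlebot, a matching name only counts once the forward lookup of that name returns the IP address you started with.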
 


N.B. I strongly suggest you spend the time to get it working for other search engines. You may be tested on it :)