The Screaming Frog SEO Spider is, without a doubt, a tool every SEO should have and know how to use. It is very powerful and very fast at what it does, which is crawling websites.
However, if you’ve been using it for a while, you will have noticed its drawbacks too. A couple of things that I don’t like when using the software are:
It’s a bandwidth hog
When I’m running a crawl and I configure it with a higher speed setting, it eats up my bandwidth and my normal browsing experience is greatly affected. So I can’t really work effectively on other things while the crawler is running.
It’s limited by the RAM my computer has
The number of URLs it can crawl on a website is directly limited by the amount of RAM your computer has. If you’re going to crawl a large site and you only have 8GB of RAM like I do, ~200,000 URLs is about all you can do, and usually even less if you’re also using your computer for other things at the same time. So obviously, you can’t crawl and study big sites like, say, NYTimes.
So to get rid of the above annoyances and also increase the power of Screaming Frog at the same time, the logical thing to do is to run it on a VPS.
I first considered running it on a Windows VPS. But a VPS costs much more if you want Windows on it, and running Windows itself already takes away a huge portion of the virtual machine’s resources.
On the other hand, GNU-Linux distributions don’t cost anything to use on your virtual machines, and they are generally lighter on system resources. The only problem is that they can be daunting to use at first.
Then there’s Google Cloud, which offers a 60-day free trial of its cloud services. They let you deploy Virtual Machines which you can use to run any app you want. This is perfect for running Screaming Frog. But when I first thought of this, the big problem I faced was how to go about doing it.
The only resource I found by googling was this outdated instructions post. It was from last year, and a few big changes to Google Cloud’s platform have since made its instructions obsolete and confusing (at least for me). Nevertheless, it was a great help and it pointed me in the right direction.
Here’s what I ended up doing:
Get a Google Cloud Account
Go to Google Cloud Platform’s page and sign up for a free trial. You have to sign in with your Google account and enter your credit card info.
They won’t charge you anything yet; the credit card info is just to make sure you’re a real person signing up and not a spammer. You also won’t be automatically charged at the end of the trial period. They made it quite clear that they will explicitly ask for your permission before actually charging your card. Your virtual machines will just be paused when the trial expires, and they won’t even be deleted until 30 days after.
Create a Project
Right after signing up for your trial, you should be taken to your developer console where you can create a new project.
Click on “Create Project” to create a new project to be used for this purpose.
You’ll have to give your project a name and a globally-unique project ID.
Enable Cloud Compute API
When the project has been created, you will be taken to the project dashboard. You will have to go to the APIs section and enable the Cloud Compute API.
Then click “Enable API” on the next page:
Create a VM Instance
From the left menu bar, go to Compute >> Compute Engine >> VM Instances. It’s going to take a few seconds to initialize the Compute Engine if it’s the first time. Then you’ll see this screen:
Click “Create Instance” to create your first Virtual Machine. You’ll then have to give it a name, choose a datacenter location (Zone), the machine type and the disk image to use.
For the machine type, I would pick n1-highmem-2 because it’s the machine type with the most RAM that is available to trial users. Screaming Frog is, again, heavy on RAM usage but not so much on CPU, so this machine type is just right for this purpose.
For the boot disk image, choose the debian-7-wheezy variant of GNU-Linux. This is the one I used for this guide.
For the Firewall settings, it probably doesn’t matter because we won’t be using this VM to serve webpages, but just tick both “Allow http traffic” and “Allow https traffic”.
For the additional options at the bottom, you can just leave them at their default values. Then click “Create” to deploy the new VM. You’ll have to wait for a few seconds as the new VM is being created and deployed. Afterwards, you should see something like this:
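If you would rather script this step than click through the console, the same choices can probably be expressed with the Google Cloud SDK. This is only a sketch under assumptions: it presumes the `gcloud` tool is installed and authenticated on your own computer, and the function name `create_crawler_vm` is mine. The instance name, zone and image name are placeholders you supply yourself; the exact name of the debian-7-wheezy image is not given in this guide, so look it up rather than guessing.

```shell
# Sketch only: NAME, ZONE and IMAGE are placeholders supplied by you.
# The two tags correspond to the "Allow http/https traffic" checkboxes.
create_crawler_vm() {
    name="$1"; zone="$2"; image="$3"
    gcloud compute instances create "$name" \
        --zone "$zone" \
        --machine-type n1-highmem-2 \
        --image "$image" \
        --image-project debian-cloud \
        --tags http-server,https-server
}
```

You would call it as `create_crawler_vm <name> <zone> <image>` once you have picked those three values.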
Install a VNC Server on the VM
In order to run Screaming Frog effectively and easily on this VM, we need a graphical user interface through which we can operate the VM remotely from our computer. There are other solutions to this, but in this guide I’ll use a VNC server called TightVNC.
To install VNC, you’ll have to SSH into your new VM and use the command line. There are also other ways to do this but the easiest way for now is by using the browser-based SSH console that Google Cloud provides:
This opens up an SSH console you can use to install software on your VM.
Now, to install TightVNC server, we need to switch to the root user first. Enter the following command:
Then update the software packages in the system before actually installing anything:
And here’s the command to actually install the TightVNC server and the other programs we need for it:
apt-get install tightvncserver xfce4 xfce4-goodies xdg-utils openjdk-6-jre software-properties-common python-software-properties
It may take a few minutes to install all of them. When prompted for a keyboard configuration, you can just choose the default. (use the Space bar on your keyboard to make a selection and Tab to navigate to the “OK” option and Enter to continue)
Afterwards, we need to add a new user named vnc. Execute the following command:
You will be prompted for a password; enter up to 8 characters for this. Then you can just skip the other values by hitting Enter. Lastly, type Y and press Enter to confirm.
Now you need to switch to the new user:
Then set a new password for the user. This will be what we will use when we connect to the VM later on. You can just use the same password you used when creating this user.
When prompted whether you want to enter a “view-only” password, you can just select no (N). When done, switch back to the root user by entering the following command:
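Several of the commands for the steps above are not shown in the text (switching to root, updating the package list, adding the user, setting the VNC password). My best guess is that they are the standard Debian ones, so here is a sketch under that assumption. The function names are mine and only serve to label each step; `su vnc -c vncpasswd` is a one-line stand-in for switching to the vnc user, running `vncpasswd`, and typing `exit`.

```shell
# Assumed commands for the steps whose text is missing above.
# Each function just names one step; the commands inside are standard Debian tools.
become_root()      { sudo su; }              # open a root shell
refresh_packages() { apt-get update; }       # refresh the package index (as root)
create_vnc_user()  { adduser vnc; }          # interactive: asks for password and details
set_vnc_password() { su vnc -c vncpasswd; }  # interactive: sets the VNC password as user vnc
```

Typed by hand, the order would be `sudo su`, `apt-get update`, the `apt-get install` line from the guide, `adduser vnc`, `su vnc`, `vncpasswd`, and finally `exit` to return to root.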
Set Up Startup Scripts
We need to set up startup scripts that will run the VNC server automatically every time the VM is powered on or restarted. Use the following command to download the first script:
wget http://filiwiese.com/files/vncserver -O /etc/init.d/vncserver
And then the second one:
wget http://filiwiese.com/files/xstartup -O /home/vnc/.vnc/xstartup
You need to apply some configuration settings to make the VNC server work properly. Enter the following commands:
chown -R vnc. /home/vnc/.vnc && chmod +x /home/vnc/.vnc/xstartup
sed -i 's/allowed_users.*/allowed_users=anybody/g' /etc/X11/Xwrapper.config
chmod +x /etc/init.d/vncserver
Set the size of the screen you want when connecting to your VM via VNC later on. Open the VNC Server script using the following command:
Then change the “GEOMETRY” value into the resolution you want:
Press CTRL + X on your keyboard when done, to close the file. Type “Y” to save the changes when prompted and hit Enter.
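The command that opens the script is not shown above; since the file is closed with CTRL + X, it was presumably `nano /etc/init.d/vncserver`. If you prefer not to edit by hand, the same change could be made with `sed`. This is a sketch that assumes the downloaded script contains a line of the form `GEOMETRY="1024x768"` at the start of a line; I have not confirmed the exact format, so check the file first. The function name `set_vnc_geometry` is hypothetical.

```shell
# Assumes the init script has a line like GEOMETRY="1024x768" (unverified format).
# Rewrites that line in place with the resolution you pass in.
set_vnc_geometry() {
    script="$1"; resolution="$2"
    sed -i "s/^GEOMETRY=.*/GEOMETRY=\"$resolution\"/" "$script"
}
```

For example, `set_vnc_geometry /etc/init.d/vncserver 1366x768` run as root would request a 1366x768 desktop.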
Then you’ll have to reboot the VM to apply the changes:
By doing this, you will lose your SSH connection to the VM instance. Just wait for a minute or two for the VM to restart and then reconnect via SSH by clicking the SSH button again:
Then switch to the root user again:
Then you can start the VNC service by executing the following commands:
update-rc.d vncserver defaults
service vncserver start
And that lets you connect to your VM instance using any client VNC software. (More on this below…)
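The reboot and root-switch commands for the steps above are not shown; they were presumably `reboot` and `sudo su`. Once the service has been started, it can be reassuring to confirm that something is actually listening on a VNC port before moving on. The sketch below assumes `netstat` (from net-tools) is available on the VM, and the function name is mine.

```shell
# Lists TCP ports in the 5900-5999 range that have a listener.
# Relies on the 4th column of `netstat -tln` being the local address.
vnc_listening_ports() {
    netstat -tln | awk '{print $4}' | grep -Eo ':59[0-9]{2}$' | tr -d ':'
}
```

Running `vnc_listening_ports` on the VM should print the port the server chose; if it prints nothing, the service did not start.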
Install Screaming Frog
Now, on to actually installing Screaming Frog. First, we need Java. Install Java on your VM using the following commands:
echo "deb http://ppa.launchpad.net/webupd8team/java/ubuntu trusty main" | tee /etc/apt/sources.list.d/webupd8team-java.list
echo "deb-src http://ppa.launchpad.net/webupd8team/java/ubuntu trusty main" | tee -a /etc/apt/sources.list.d/webupd8team-java.list
apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys EEA14886
apt-get update
apt-get install oracle-java8-installer
You will be prompted to accept Oracle’s License Agreement. Just accept it to proceed with the installation (use arrow keys to select “Yes”).
Then set Oracle-Java as the default Java library for your system:
apt-get install oracle-java8-set-default
We also need the “ttf-mscorefonts-installer” package before we can install the latest version of Screaming Frog. Use the following command:
add-apt-repository "deb http://http.debian.net/debian wheezy main contrib non-free" && apt-get update && apt-get install ttf-mscorefonts-installer
Now we can install Screaming Frog. First, you have to download the latest version from Screaming Frog’s website:
To check for the latest version of Screaming Frog, visit their download page. Once the file is on the VM, install it:
dpkg -i screamingfrogseospider_3.3_all.deb
An error may be thrown because of a missing dependency called “zenity”. You can fix that with the following command:
apt-get -f install
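The download command itself is not shown above, so here is a rough sketch of how the download, install, and dependency fix could be chained together. The URL is deliberately left as an argument: copy the real link to the `.deb` from Screaming Frog’s download page, since I am not going to guess it. The function name `install_screaming_frog` and the use of `/tmp` are my own choices.

```shell
# Sketch: pass the real .deb link from the download page as the only argument.
# If dpkg stops on a missing dependency (e.g. zenity), apt-get -f install repairs it.
install_screaming_frog() {
    deb_url="$1"
    deb_file="/tmp/$(basename "$deb_url")"
    wget "$deb_url" -O "$deb_file" || return 1
    dpkg -i "$deb_file" || apt-get -f install
}
```

Run it as root. The `||` means the dependency fix only kicks in when `dpkg` reports a problem.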
After that, Screaming Frog should now be installed on your VM!
Install VNC Client on your Computer
To connect to your VM and run Screaming Frog there, you need to install a VNC client on your own computer. You can use TightVNC for this as well. If you are using Windows, download the installer from their website. Be sure to download and install the appropriate version for your system: the 64-bit version if you’re using a 64-bit system, and likewise for 32-bit.
When you install TightVNC, you don’t have to install the server software, just the client part (TightVNC Viewer).
In my example here, I’ll be using TightVNC.
Allow Connection to the VM via the TightVNC Port
After installing a VNC client on your computer, you have to open a port on your VM where the VNC server is listening. The default port used by TightVNC is port tcp:5900.
To open it, go back to your Google Cloud developer console. Inside your project, go to Compute >> Compute Engine >> Networks. Under Networks, click on default as shown below:
On the next page, click “Add firewall rule”:
Give your new firewall rule a name. For the Source filter, select IP ranges, and in the Source IP Ranges field, enter “0.0.0.0/0”. In the Allowed protocols or ports field, enter “tcp:5900; tcp:5901; tcp:5902”.
Click Create and that should allow your VNC client software to connect to your Google Cloud VM!
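For anyone who prefers the command line, the same rule can likely be created with the Google Cloud SDK. This is a sketch that assumes `gcloud` is installed and authenticated on your own computer; the rule name `allow-vnc` and the function name are made up for illustration. I have made the source range an argument so you can pass your own public IP (as `x.x.x.x/32`) instead of opening the ports to everyone with `0.0.0.0/0`.

```shell
# Sketch: opens TCP 5900-5902 on the default network for one source range.
# "allow-vnc" is an arbitrary rule name chosen for this example.
allow_vnc() {
    source_range="$1"
    gcloud compute firewall-rules create allow-vnc \
        --network default \
        --allow tcp:5900-5902 \
        --source-ranges "$source_range"
}
```

`allow_vnc 0.0.0.0/0` reproduces the rule from the console steps; a narrower range is the safer choice when your IP address is stable.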
Connect to Your VM From Your Computer
To connect to your VM instance, you’ll need the external IP address of your VM. You’ll find it inside your Google Cloud developer console by going to Your Project >> Compute >> Compute Engine >> VM Instances:
Open the TightVNC Viewer that you have installed in the previous steps above and in the “Remote Host” field, enter the IP address of your VM followed by two colons and the tcp port 5901, like so:
Click “Connect”. It will then attempt to connect to your VM instance and, when it establishes a connection, it asks for the password. This is the password that you set in the steps above when you created the VNC user.
Then you should be taken to your XFCE homescreen on your VM! From there, you can start Screaming Frog. You can find it under Applications Menu >> Internet:
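In case the address format is unclear: by VNC convention, display number N listens on TCP port 5900 + N, which is why port 5901 corresponds to display :1 here. The small sketch below builds the “Remote Host” string from an IP address and a display number. The function name is mine, and 203.0.113.10 is a documentation-only placeholder address, not a real VM.

```shell
# Builds the host::port string for TightVNC Viewer.
# VNC display N maps to TCP port 5900 + N; display defaults to 1.
vnc_target() {
    host="$1"; display="${2:-1}"
    echo "${host}::$((5900 + display))"
}

vnc_target 203.0.113.10 1   # placeholder IP; use your VM's external IP
```

With your own external IP substituted, the printed value is what goes into the “Remote Host” field.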
Change Screaming Frog’s RAM Setting
After opening Screaming Frog for the first time via VNC, a configuration file is created inside the vnc user folder on your VM. It’s in this file where you can change the amount of RAM you want to let Screaming Frog use.
Go back to your SSH console and, while using the root user profile, open the Screaming Frog config file with the following command:
The VM that we’ve set up in this guide has 13GB of RAM. If you’re not going to run any other apps on this machine, you can allow Screaming Frog to use up to 11GB:
When you’re done, press CTRL + X on your keyboard to close the nano editor and type “Y” to save the changes. Then restart Screaming Frog via your VNC display to apply the new setting.
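Neither the file name nor the line to change is shown above, so treat the following as a sketch built on assumptions. I believe this version of Screaming Frog keeps a single JVM memory flag such as `-Xmx512M` in a file named `.screamingfrogseospider` in the user’s home folder (so `/home/vnc/.screamingfrogseospider` here), but please confirm that on your own VM. The function name and the 2GB default reserve are mine; the reserve simply reproduces the 13GB minus 2GB = 11GB arithmetic from the guide.

```shell
# Assumes the config file holds one JVM flag line such as -Xmx512M (unverified).
# Sets the heap to (total RAM - reserve) gigabytes; the reserve defaults to 2.
set_frog_heap() {
    config="$1"; total_gb="$2"; reserve_gb="${3:-2}"
    heap_gb=$((total_gb - reserve_gb))
    [ "$heap_gb" -gt 0 ] || return 1   # refuse a zero or negative heap
    sed -i "s/^-Xmx.*/-Xmx${heap_gb}g/" "$config"
}
```

On the VM from this guide that would be `set_frog_heap /home/vnc/.screamingfrogseospider 13`, which writes `-Xmx11g` if the file looks the way I assume.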
Multiply the Crawling Power by Deploying More Instances
You can clone your existing instance in order to have multiple instances with all of the same settings that we’ve configured above, without actually doing them again.
To do that, create a snapshot of your existing VM. Go to Compute Engine >> Snapshots and click Create a snapshot.
On the next screen, you’ll just have to name your snapshot and select the disk you want to create a snapshot of. This will, of course, be your existing VM’s disk.
After creating the snapshot, you can then select it as the boot disk image to use when creating new VM instances:
This way, you’ll be able to run multiple crawls simultaneously. Now that’s a lot more crawling power!
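If you plan to spin up several clones, scripting the snapshot and clone steps may save some clicking. The sketch below assumes the Google Cloud SDK (`gcloud`) is installed and authenticated on your own computer. The function names are mine, and the disk name, zone, snapshot name and new instance name are all placeholders you supply. It takes a slightly different route from the console: it first creates a new disk from the snapshot, then boots a new instance from that disk.

```shell
# Sketch: take one snapshot of the configured VM's disk.
snapshot_crawler() {
    disk="$1"; zone="$2"; snapshot="$3"
    gcloud compute disks snapshot "$disk" --zone "$zone" --snapshot-names "$snapshot"
}

# Sketch: create a disk from that snapshot and boot a new instance from it.
clone_from_snapshot() {
    snapshot="$1"; zone="$2"; new_name="$3"
    gcloud compute disks create "$new_name" --zone "$zone" --source-snapshot "$snapshot" &&
    gcloud compute instances create "$new_name" --zone "$zone" \
        --machine-type n1-highmem-2 --disk "name=$new_name,boot=yes"
}
```

You would run `snapshot_crawler` once, then `clone_from_snapshot` once per extra crawler, giving each a different name. Each clone gets its own external IP, so remember to look that up before connecting with VNC.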