
How To Run Screaming Frog SEO Spider on Google Cloud Platform for FREE


The Screaming Frog SEO Spider is, no doubt, a tool every SEO should have and know how to use. It is very powerful and very fast at what it does, which is crawling websites.

However, if you’ve been using it for a while, you will have noticed its drawbacks too. Two things I don’t like about the software are:

  1. It’s a bandwidth hog

    When I run a crawl at a higher speed setting, it eats up my bandwidth and noticeably slows my normal browsing, so I can’t really work effectively on other things while the crawler is running.

  2. It’s limited by the RAM my computer has

    The number of URLs it can crawl on a website is directly limited by the amount of RAM your computer has. If you’re going to crawl a large site and you only have 8GB of RAM like I do, ~200,000 URLs is about all you can do, and usually even less if you’re also using your computer for other things at the same time. So obviously, you can’t crawl and study big sites like, say, NYTimes.

So to get rid of the above annoyances and also increase the power of Screaming Frog at the same time, the logical thing to do is to run it on a VPS.

I first considered running it on a Windows VPS. The problems are that a VPS costs much more if you want Windows on it, and that Windows itself already takes up a large share of the virtual machine’s resources.

On the other hand, GNU/Linux distributions don’t cost anything to run on your virtual machines, and they are generally lighter on system resources. The only problem is that they can be daunting to use at first.

Then there’s Google Cloud, which offers a 60-day free trial of its cloud services. It lets you deploy virtual machines that you can use to run any app you want, which is perfect for running Screaming Frog. But when I first thought of this, the big problem I faced was how to go about doing it.

The only resource I found by googling was an outdated post with instructions. It was from last year, and a few big changes to Google Cloud’s platform since then have made its instructions obsolete and confusing (at least for me). Nevertheless, it was a great help and pointed me in the right direction.

Here’s what I ended up doing:

  1. Get a Google Cloud Account

    Go to Google Cloud Platform’s page and sign up for a free trial. You have to sign in with your Google account and enter your credit card info.

    They won’t charge you anything yet; the credit card info is just to verify that you’re a real person and not a spammer. You also won’t be charged automatically at the end of the trial period. They make it quite clear that they will explicitly ask for your permission before actually charging your card. Your virtual machines will just be paused when the trial expires, and they won’t be deleted until 30 days after that.

  2. Create a Project

    Right after signing up for your trial, you should be taken to your developer console where you can create a new project.

    Click on “Create Project” to create a new project to be used for this purpose.

    [Screenshot: Google developer console]

    You’ll have to give your project a name and a globally-unique project ID.

    [Screenshot: Google developer console, new project]

  3. Enable the Compute Engine API

    When the project has been created, you will be taken to the project dashboard. Go to the APIs section and enable the Compute Engine API.

    [Screenshot: enable Google Compute Engine]

    Then click “Enable API” on the next page:

    [Screenshot: Enable API]

  4. Create a VM Instance

    From the left menu bar, go to Compute >> Compute Engine >> VM Instances. If it’s your first time, Compute Engine will take a few seconds to initialize. Then you’ll see this screen:

    [Screenshot: VM instances]

    Click “Create Instance” to create your first virtual machine. You’ll then have to give it a name and choose a datacenter location (Zone), the machine type, and the disk image to use.

    [Screenshot: create VM instance]

    For the machine type, I would pick n1-highmem-2 because it’s the machine type with the most RAM that is available to trial users. Again, Screaming Frog is heavy on RAM usage but not so much on CPU, so this machine type is just right for this purpose.

    For the boot disk image, choose the debian-7-wheezy variant of GNU/Linux. This is the one I used for this guide.

    For the Firewall settings, it probably doesn’t matter because we won’t be using this VM to serve webpages, but just tick both “Allow http traffic” and “Allow https traffic”.

    For the additional options at the bottom, you can just leave them at their default values. Then click “Create” to deploy the new VM. You’ll have to wait a few seconds while the new VM is created and deployed. Afterwards, you should see something like this:

    [Screenshot: new VM instance deployed]
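
If you prefer the command line, the same instance can probably be created with the gcloud tool instead of the web console. The sketch below is an assumption and not something I did in the steps above: the function name create_crawler_vm is made up, the image name is left as a parameter because the available Debian images change over time, and the two tags are assumed to match the “Allow http traffic” and “Allow https traffic” checkboxes.

```shell
# Hypothetical wrapper around "gcloud compute instances create".
# Assumes the gcloud CLI is installed and authenticated for your project.
create_crawler_vm() {
  name="$1"   # instance name, e.g. crawler-1
  zone="$2"   # datacenter location, e.g. us-central1-a
  image="$3"  # a Debian image name from the debian-cloud project
  gcloud compute instances create "$name" \
    --zone="$zone" \
    --machine-type=n1-highmem-2 \
    --image="$image" \
    --image-project=debian-cloud \
    --tags=http-server,https-server
}
```

You would call it as create_crawler_vm crawler-1 us-central1-a IMAGE_NAME, replacing the three arguments with your own values.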

  5. Install a VNC Server on the VM

    In order to run Screaming Frog effectively and easily on this VM, we need a graphical user interface through which we can operate the VM remotely from our computer. There are other solutions for this, but in this guide I’ll use a VNC program called TightVNC.

    To install VNC, you’ll have to SSH into your new VM and use the command line. There are other ways to do this too, but the easiest for now is the browser-based SSH console that Google Cloud provides:

    [Screenshot: VM browser SSH]

    This opens up an SSH console you can use to install software on your VM.

    [Screenshot: browser SSH console]

    Now, to install TightVNC server, we need to switch to the root user first. Enter the following command:

    sudo -s

    Then update the software packages in the system before actually installing anything:

    apt-get update

    Then install the TightVNC server and the other programs we need for it:

    apt-get install tightvncserver xfce4 xfce4-goodies xdg-utils openjdk-6-jre software-properties-common python-software-properties

    It may take a few minutes to install all of them. When prompted for a keyboard configuration, you can just choose the default (use the Space bar to make a selection, Tab to move to the “OK” option, and Enter to continue).

    Afterwards, we need to add a new user named vnc. Execute the following command:

    adduser vnc

    You will be prompted for a password; enter up to 8 characters. Then you can skip the other values by hitting Enter. Lastly, type Y and press Enter to confirm.

    Now you need to switch to the new user:

    su vnc

    Then set a VNC password for the user. This is the password we will use when we connect to the VM later on. You can just use the same password you used when creating this user.

    vncpasswd

    When prompted whether you want to enter a “view-only” password, you can just select no (N). When done, switch back to the root user by entering the following command:

    exit

  6. Set Up Startup Scripts

    We need to set up startup scripts that will run the VNC server automatically every time the VM is powered on or restarted. Use the following command to download the first script:

    wget https://filiwiese.com/files/vncserver -O /etc/init.d/vncserver

    And then the second one:

    wget https://filiwiese.com/files/xstartup -O /home/vnc/.vnc/xstartup

    You need to apply some configuration settings to make the VNC server work properly. Enter the following commands:

    chown -R vnc: /home/vnc/.vnc && chmod +x /home/vnc/.vnc/xstartup
    sed -i 's/allowed_users.*/allowed_users=anybody/g' /etc/X11/Xwrapper.config
    chmod +x /etc/init.d/vncserver
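
The contents of the downloaded xstartup file aren’t shown in this post. For reference, a minimal xstartup for an XFCE session usually looks something like what the helper below writes. This is an assumption based on a common TightVNC and XFCE setup, not a copy of the downloaded file, and the function name write_xstartup is made up for this example.

```shell
# Hypothetical helper: writes a minimal xstartup that starts an XFCE session.
# Assumption: this mirrors a common TightVNC + XFCE setup, not the
# actual file downloaded in the step above.
write_xstartup() {
  target="$1"
  # Load X resources if the user has any, then start the XFCE desktop.
  printf '%s\n' \
    '#!/bin/sh' \
    '[ -r "$HOME/.Xresources" ] && xrdb "$HOME/.Xresources"' \
    'startxfce4 &' > "$target"
  # The VNC server only runs xstartup if it is executable.
  chmod +x "$target"
}
```

For example, write_xstartup /home/vnc/.vnc/xstartup would put the file where the VNC server looks for it.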

    Set the size of the screen you want when connecting to your VM via VNC later on. Open the VNC Server script using the following command:

    nano /etc/init.d/vncserver

    Then change the “GEOMETRY” value to the resolution you want:

    [Screenshot: VNC screen resolution]

    Press CTRL + X on your keyboard when done, to close the file. Type “Y” to save the changes when prompted and hit Enter.
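
If you’d rather not edit the file by hand, the same change could be scripted. This is only a sketch: it assumes the script stores the resolution on a line that starts with GEOMETRY= and puts the value in double quotes, which I can’t guarantee for every version of the script. The function name set_geometry is made up.

```shell
# Hypothetical helper: rewrites the GEOMETRY line of a VNC init script.
# Assumes the line has the form GEOMETRY="WIDTHxHEIGHT".
set_geometry() {
  file="$1"  # path to the init script
  size="$2"  # new resolution, e.g. 1920x1080
  tmp="$(mktemp)"
  # Replace the whole GEOMETRY line, then copy the result back over the
  # original so the file keeps its owner and permissions.
  sed "s/^GEOMETRY=.*/GEOMETRY=\"$size\"/" "$file" > "$tmp" &&
    cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

As root, you would run set_geometry /etc/init.d/vncserver 1920x1080.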

    Then you’ll have to reboot the VM to apply the changes:

    reboot

    By doing this, you will lose your SSH connection to the VM instance. Just wait for a minute or two for the VM to restart and then reconnect via SSH by clicking the SSH button again:

    [Screenshot: VM browser SSH]

    Then switch to the root user again:

    sudo -s

    Then register the VNC service to start at boot and start it right away by executing the following commands:

    update-rc.d vncserver defaults
    service vncserver start

    And that lets you connect to your VM instance using any VNC client software. (More on this below…)

  7. Install Screaming Frog

    Now, on to actually installing Screaming Frog. First, we need Java. Install Java on your VM using the following commands:

    echo "deb https://ppa.launchpad.net/webupd8team/java/ubuntu trusty main" | tee /etc/apt/sources.list.d/webupd8team-java.list
    echo "deb-src https://ppa.launchpad.net/webupd8team/java/ubuntu trusty main" | tee -a /etc/apt/sources.list.d/webupd8team-java.list
    apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys EEA14886
    apt-get update
    apt-get install oracle-java8-installer

    You will be prompted to accept Oracle’s License Agreement. Just accept it to proceed with the installation (use arrow keys to select “Yes”).

    Then set Oracle-Java as the default Java library for your system:

    apt-get install oracle-java8-set-default

    We also need the “ttf-mscorefonts-installer” package before we can install the latest version of Screaming Frog. Use the following command:

    add-apt-repository "deb https://http.debian.net/debian wheezy main contrib non-free" && apt-get update && apt-get install ttf-mscorefonts-installer

    Now we can install Screaming Frog. First, you have to download the latest version from Screaming Frog’s website:

    wget https://www.screamingfrog.co.uk/products/seo-spider/screamingfrogseospider_3.3_all.deb

    To check for the latest version of Screaming Frog, visit their download page.

    Then install:

    dpkg -i screamingfrogseospider_3.3_all.deb

    An error may be thrown because of a missing dependency called “zenity”. You can fix that with the following command:

    apt-get -f install

    After that, Screaming Frog should now be installed on your VM!

  8. Install VNC Client on your Computer

    To connect to your VM and run Screaming Frog there, you need to install a VNC client on your own computer. You can use TightVNC for this too. If you are using Windows, download the installer from their website. Be sure to download and install the appropriate version for your system: the 64-bit version if you’re using a 64-bit system, and likewise for 32-bit.

    When you install TightVNC, you don’t have to install the server software, just the client part (TightVNC Viewer).

    If you are using a GNU/Linux system, you can use RealVNC or Remmina. If you have a Mac, RealVNC has a version for you too.

    In my example here, I’ll be using TightVNC.

  9. Allow Connection to the VM via the TightVNC Port

    After installing a VNC client on your computer, you have to open the port on your VM where the VNC server is listening. TightVNC listens on TCP port 5900 plus the display number, so display :1, which is the one we will connect to later, uses port tcp:5901.

    To open it, go back to your Google Cloud developer console. Inside your project, go to Compute >> Compute Engine >> Networks. Under Networks, click on default as shown below:

    [Screenshot: network settings]

    On the next page, click “Add firewall rule”:

    [Screenshot: add firewall rule]

    Give your new firewall rule a name. For the Source filter, select IP ranges, and in the Source IP Ranges field, enter “0.0.0.0/0”. In the Allowed protocols or ports field, enter “tcp:5900; tcp:5901; tcp:5902”.

    [Screenshot: VNC firewall rule]

    Click Create and that should allow your VNC client software to connect to your Google Cloud VM!
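
The same rule can probably be created with the gcloud tool as well. This is a sketch under assumptions, not what I did above: the function name allow_vnc is made up, and it expects the gcloud CLI to be installed and pointed at your project.

```shell
# Hypothetical wrapper around "gcloud compute firewall-rules create".
# source_range is a CIDR range; 0.0.0.0/0 means "anyone", as in the step above.
allow_vnc() {
  rule_name="$1"
  source_range="$2"
  gcloud compute firewall-rules create "$rule_name" \
    --network=default \
    --allow=tcp:5900,tcp:5901,tcp:5902 \
    --source-ranges="$source_range"
}
```

Calling allow_vnc allow-vnc 0.0.0.0/0 matches the rule from the console. If you know your own public IP address, passing it with a /32 suffix instead should limit VNC access to your computer only.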

  10. Connect to Your VM From Your Computer

    To connect to your VM instance, you’ll need the external IP address of your VM. You’ll find it inside your Google Cloud developer console by going to Your Project >> Compute >> Compute Engine >> VM Instances:

    [Screenshot: VM external IP]

    Open the TightVNC Viewer that you installed in the previous steps and, in the “Remote Host” field, enter the IP address of your VM followed by two colons and the TCP port 5901, like so:

    [Screenshot: TightVNC Viewer]

    Click “Connect”. It will then attempt to connect to your VM instance and, once it establishes a connection, ask for the password. This is the VNC password that you set in the steps above for the vnc user.

    [Screenshot: VNC password]

    Then you should be taken to your XFCE home screen on your VM! From there, you can start Screaming Frog. You can find it under Applications Menu >> Internet:

    [Screenshot: Screaming Frog]
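
To make the “Remote Host” format concrete: a VNC display number N maps to TCP port 5900 + N, and TightVNC Viewer takes the port after two colons. The helper below just builds that string. The function name vnc_target is made up, and 203.0.113.10 is a placeholder address, not a real VM.

```shell
# Hypothetical helper: builds the "host::port" string for TightVNC Viewer.
# VNC display N listens on TCP port 5900 + N.
vnc_target() {
  ip="$1"       # external IP address of the VM
  display="$2"  # VNC display number, e.g. 1
  echo "${ip}::$((5900 + display))"
}

vnc_target 203.0.113.10 1   # prints 203.0.113.10::5901
```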

  11. Change Screaming Frog’s RAM Setting

    After you open Screaming Frog for the first time via VNC, a configuration file is created inside the vnc user’s home folder on your VM. This file is where you can change the amount of RAM that Screaming Frog is allowed to use.

    Go back to your SSH console and, as the root user, open the Screaming Frog config file with the following command:

    nano /home/vnc/.screamingfrogseospider

    The VM that we’ve set up in this guide has 13GB of RAM. If you’re not going to run any other apps on this machine, you can allow Screaming Frog to use up to 11GB:

    [Screenshot: Screaming Frog config]

    When you’re done, press CTRL + X on your keyboard to close the nano editor and type “Y” to save the changes. Then restart Screaming Frog via your VNC display to apply the new setting.
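
The 11GB figure is simply the VM’s RAM minus about 2GB left for the operating system and the desktop. I’m assuming here that the config file holds a standard Java maximum-heap option of the form -Xmx followed by a size; the exact contents of the file are not reproduced in this post, so check what your version actually contains. The function name heap_flag is made up.

```shell
# Hypothetical helper: prints a JVM max-heap flag that leaves some RAM for the OS.
heap_flag() {
  total_gb="$1"    # RAM of the VM in GB
  reserve_gb="$2"  # RAM to leave for the OS and the desktop
  echo "-Xmx$((total_gb - reserve_gb))g"
}

heap_flag 13 2   # prints -Xmx11g
```

On a VM with a different amount of RAM, only the first argument changes.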

Multiply the Crawling Power by Deploying More Instances

You can clone your existing instance to get multiple instances with all of the settings we’ve configured above, without going through the steps again.

To do that, create a snapshot of your existing VM. Go to Compute Engine >> Snapshots and click Create a snapshot.

[Screenshot: create snapshot]

On the next screen, you just have to name your snapshot and select the disk you want to create a snapshot of. This will, of course, be your existing VM’s disk.

After creating the snapshot, you can then select it as the boot disk image to use when creating new VM instances:

[Screenshot: snapshot boot disk]

This way, you’ll be able to run multiple crawls simultaneously. Now that’s a lot more crawling power!
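
The snapshot-and-clone steps could also be scripted with the gcloud tool. The sketch below is an assumption rather than something I ran for this guide: the function name clone_crawler_vm and the “-snap” naming are made up, and it takes the route of creating a new disk from the snapshot and then booting a new instance from that disk.

```shell
# Hypothetical wrapper: snapshot an existing boot disk and start a clone from it.
# Assumes the gcloud CLI is installed and authenticated for your project.
clone_crawler_vm() {
  source_disk="$1"  # disk of the existing VM
  new_name="$2"     # name for the new instance (also used for its disk)
  zone="$3"
  snapshot="${source_disk}-snap"
  # 1. Snapshot the existing disk.
  gcloud compute disks snapshot "$source_disk" \
    --snapshot-names="$snapshot" --zone="$zone" &&
  # 2. Create a new disk from that snapshot.
  gcloud compute disks create "$new_name" \
    --source-snapshot="$snapshot" --zone="$zone" &&
  # 3. Boot a new instance from the new disk.
  gcloud compute instances create "$new_name" \
    --zone="$zone" --machine-type=n1-highmem-2 \
    --disk="name=${new_name},boot=yes"
}
```

For example, clone_crawler_vm crawler-1 crawler-2 us-central1-a would give you a second crawler next to the first. To create more clones from the same snapshot, only the last two steps need to be repeated.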

Nimrod

People don't really know what they want until they learn about it. Marketing educates people about what they want. I'm a marketer, therefore I'm an educator.
  • matthew22929

    a nice tip, too advanced for me, but that is some serious skills.

    • nah, that’s just the product of a little research. :) but thanks.

      • Malin Civelek

        Hi Nimrod, hope you’re still running this blog :) I have a question regarding to this system. thanks

  • Leigh McKenzei

    Hey Nimrod, I got all the way into the VNC desktop but Screaming Frog wasn’t in the Internet menu. Any suggestions on where I can go back and get this fixed?

    Other than that — great list of instructions! Very much appreciated.

  • Malin Civelek

    Awesome guide, thanks a lot! Right now I did everything you showed in this post even though there were some slightly changes in interface it was so easy to set this whole thing up that quickly. Just have one question, didn’t get the last part. Do you mean we can create another instance like “instance2” if so do I need to go through the same steps in SSH to build that up? Thanks a lot

    • Malin Civelek

      ok figured it out, it’s extremely easy.thanks again!!

  • Malin Civelek

    Hi Nimrod, I built the entire system the only problem is I can’t copy the downloaded file to my computer. Have any suggestion on that? I’m not a tech person so couldn’t figure out by my own. Thanks for anyway!

    • What I do is I use the browser in the virtual server and go online to services like Google Drive or OneDrive and upload the files there. Then I can download them to my computer from the cloud storage.

  • Very helpful guide Nimrod! Thanks for sharing I was wondering, does this have any limitations in terms of its features since it’s free?

  • Do you have an updated guide for installing on Debian 8?
