This morning I was watching a politics programme on TV which mentioned a particular website and how information had seemingly been removed.
It made me wonder why they didn’t use a couple of basic tools to find an older version of that site.
Here’s a very basic guide to finding “deleted” content on the Internet.
Search Engine Cache
The big three search engines (Google, Yahoo!, Bing) – and probably some others – all store cached versions of pages.
It’s very easy to see if a cached page is available. Just look below the individual search result for either the “Cached” or “Cached page” link.
This is most useful when you know the page has changed in the past few days (or even hours).
It’s not easy to determine exactly when the last copy of the page was stored, as search engine spiders vary in how often they visit any given web site, but the cache is good to use when content has been static for a while and is then removed.
It’s also worth noting that generally only the text is cached. Images are pulled directly from the live web page if they are still available; otherwise they simply show as blank.
If an image is changed but its name remains the same, you will likely see the replacement image, so this method can’t be relied upon to view older images.
To search for the latest cached copy of pages in a web site (any pages):
In Google, Yahoo! and Bing, use the site: prefix (e.g. site:domain.com).
To search for the latest cached copy of a specific URL, you can simply paste that URL into the search box (e.g. domain.com/somepage.htm).
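If you find yourself doing these searches a lot, the query strings are easy to build in a few lines of Python. This is only a rough sketch: the endpoint addresses and query parameter names in SEARCH_ENDPOINTS are my assumptions about how each engine's search URL is laid out, and the function names are made up for illustration, so check them in a browser before relying on them.

```python
from urllib.parse import urlencode

# Assumed search endpoints and query parameter names (not taken from any
# official documentation); verify each one before depending on it.
SEARCH_ENDPOINTS = {
    "google": ("https://www.google.com/search", "q"),
    "bing": ("https://www.bing.com/search", "q"),
    "yahoo": ("https://search.yahoo.com/search", "p"),
}


def site_query(domain):
    """Return a query that restricts results to one site, e.g. site:domain.com."""
    return f"site:{domain}"


def search_url(engine, query):
    """Build a search URL for the given engine with the query percent-encoded."""
    base, param = SEARCH_ENDPOINTS[engine]
    return f"{base}?{urlencode({param: query})}"


if __name__ == "__main__":
    # All pages of a site, then one specific page.
    for engine in SEARCH_ENDPOINTS:
        print(search_url(engine, site_query("domain.com")))
        print(search_url(engine, "domain.com/somepage.htm"))
```

Open the printed address in a browser and look for the cached link under each result as described above. The script only builds addresses; it does not fetch anything itself.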
Wayback Machine
The Wayback Machine at archive.org lets you view various snapshots of web sites, though the snapshots on offer are at least six months to a year old.
This is a great resource to use when you are interested in seeing how a web page used to look, even if the site no longer exists.
There are a lot of advanced options, but often it’s enough just to enter the URL of a web site into the search box and see what dates come up.
The web site is a little flaky sometimes, and doesn’t always return results, but it’s a great way of seeing old pages that either no longer exist, or have had a makeover.
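You can also skip the search box and go straight to a snapshot by address. The sketch below builds three kinds of Wayback Machine address: a snapshot near a given date, the list of all captures for a page, and a query to the availability endpoint. The function names are hypothetical, and the address patterns are my understanding of how web.archive.org lays out its URLs rather than anything stated in this post, so treat them as assumptions.

```python
from datetime import date
from urllib.parse import urlencode


def wayback_snapshot_url(page_url, when):
    """Address of the capture closest to `when` (the archive redirects to the nearest one)."""
    stamp = when.strftime("%Y%m%d")
    return f"https://web.archive.org/web/{stamp}/{page_url}"


def wayback_calendar_url(page_url):
    """Address of the page listing every capture date for `page_url`."""
    return f"https://web.archive.org/web/*/{page_url}"


def wayback_availability_url(page_url, when=None):
    """Address of the availability lookup, optionally pinned near a date."""
    params = {"url": page_url}
    if when is not None:
        params["timestamp"] = when.strftime("%Y%m%d")
    return "https://archive.org/wayback/available?" + urlencode(params)


if __name__ == "__main__":
    page = "domain.com/somepage.htm"
    print(wayback_snapshot_url(page, date(2009, 1, 15)))
    print(wayback_calendar_url(page))
    print(wayback_availability_url(page, date(2009, 1, 15)))
```

As with the search box, a page that was never captured (or that blocks the archive) will simply come back empty, so it is worth trying a couple of different dates.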
So there you go. Two ways to find content that’s tried to shuffle off the web.
A lot of websites block the Wayback Machine. I do for all of my websites. I think there’s also a way to stop Google (and possibly the others) from showing cached page content, but I’m quite happy to allow that to happen. It means if my site is unreachable for some reason when someone’s done a search they can still see the cached version from the SERP.
I should have mentioned that I frequently use the Wayback Machine myself when I’m analyzing whether or not to buy an expired domain to see what was there previously.
Another place to look is http://whois.domaintools.com/ which maintains point-in-time screen captures of websites, although you have to subscribe with a monthly fee to get this service. (But you can do a free 10-day trial and get the screenshot that way.)
This also isn’t perfect and you can’t really read the content, but it does mean you can get an idea of what the site was about.
Can I find deleted cached pages that have been removed from Google?