This morning I was watching a politics programme on TV which mentioned a particular website and how information had seemingly been removed.
It made me wonder why they didn’t use a couple of basic tools to find an older version of that site.
Here’s a very basic guide to finding “deleted” content on the Internet.
Search Engine Cache
The big three search engines (Google, Yahoo!, Bing) – and probably some others – all store cached versions of pages.
It’s very easy to see if a cached page is available. Just look below the individual search result for either the “Cached” or “Cached page” link.
This is most useful when you know the page has changed in the past few days (or even hours).
It’s not easy to determine exactly when the last copy of the page was stored, as search engine spiders vary in how often they visit any given web site. Even so, it’s a good option when content has been static for a while and is then removed.
It’s also worth noting that generally only the text is cached. Images are pulled directly from the live web page if they are still available; otherwise they simply show as blank.
If an image is changed but its name remains the same, you will likely see the replacement image, so this method can’t be relied upon to view older images.
To search for the latest cached copy of pages in a web site (any pages):
In Google, Yahoo! and Bing, use the site: prefix (e.g. site:domain.com).
To search for the latest cached copy of a specific URL, you can simply paste that URL into the search box (e.g. domain.com/somepage.htm).
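If you find yourself doing this a lot, the site: search can be turned into a link you can open directly. Here is a rough sketch in Python. The search addresses and query parameter names are my own assumptions (they are not part of the steps above and may change), and site_search_url is just a made-up helper name.

```python
from urllib.parse import urlencode

# Assumed search endpoints and query parameter names for each engine.
# These are illustrative and could change at any time.
SEARCH_ENDPOINTS = {
    "google": ("https://www.google.com/search", "q"),
    "yahoo": ("https://search.yahoo.com/search", "p"),
    "bing": ("https://www.bing.com/search", "q"),
}


def site_search_url(engine, domain):
    """Build a search URL that restricts results to one domain via site:."""
    base, param = SEARCH_ENDPOINTS[engine]
    # urlencode percent-encodes the colon in "site:domain.com".
    return base + "?" + urlencode({param: "site:" + domain})


if __name__ == "__main__":
    for name in SEARCH_ENDPOINTS:
        print(site_search_url(name, "domain.com"))
```

Open any of the printed links in a browser and look for the cached link under each result, as described above.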
The Wayback Machine
The Wayback Machine at archive.org lets you view various snapshots of web sites that are at least six months to a year old.
This is a great resource to use when you are interested in seeing how a web page used to look, even if the site no longer exists.
There are a lot of advanced options, but often it’s enough just to enter the URL of a web site into the search box and see what dates come up.
The web site is a little flaky sometimes, and doesn’t always return results, but it’s a great way of seeing old pages that either no longer exist, or have had a makeover.
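You can also skip the search box and go straight to an address. The sketch below builds Wayback Machine links in Python, assuming the web.archive.org/web/ address pattern and the archive.org/wayback/available lookup keep working the way they did when I wrote this. The function names are my own, and nothing here actually contacts the site.

```python
from urllib.parse import urlencode

WAYBACK_BASE = "https://web.archive.org/web"  # assumed address pattern


def wayback_calendar_url(page_url):
    """Link to the overview of all snapshot dates for a page."""
    return f"{WAYBACK_BASE}/*/{page_url}"


def wayback_snapshot_url(page_url, yyyymmdd):
    """Link to the snapshot nearest the given date (format YYYYMMDD)."""
    return f"{WAYBACK_BASE}/{yyyymmdd}/{page_url}"


def wayback_availability_url(page_url, yyyymmdd):
    """Link to the JSON lookup that reports the closest snapshot, if any."""
    query = urlencode({"url": page_url, "timestamp": yyyymmdd})
    return "https://archive.org/wayback/available?" + query


if __name__ == "__main__":
    print(wayback_calendar_url("domain.com/somepage.htm"))
    print(wayback_snapshot_url("domain.com/somepage.htm", "20060101"))
    print(wayback_availability_url("domain.com/somepage.htm", "20060101"))
```

The first link shows which dates are available, the second jumps to whichever snapshot is nearest the date you asked for, and the third is handy if you want a script to check whether anything was archived at all.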
So there you go. Two ways to find content that’s tried to shuffle off the web.