You may (or may not) have noticed that my archives have really been fleshed out over the last few weeks. We lost all of the past entries, and when I came back online I had to start from scratch. I was disconcerted at first, but then I remembered that Google had been visiting my site (noted here) and that I had set my meta robots tag to index and follow links. I searched for my site (the old blog location) and found quite a few results (over 1,300), many of which were cached by Google (the same cache that was recently declared legal). All I had to do then was iterate through the results, visit the cached version of each one, and copy the entry into this new installation. It took a lot of work and a lot of time, but I think I managed to recover about 95% of the entries, which I’m pretty impressed with.

There are still quite a few broken links in those old posts because they point to other posts at the old blog location, so I’ll have to go back and update them at some point, but at least the base content is there. I’m currently importing my old LiveJournal entries so that all of my blog entries will be in one place. With WordPress 2.0 I can back up the blog at any time, so I’m going to start doing that in case something catastrophic goes wrong.

Sometimes it’s nice to have search engines crawling what you put online. I could also have used the Wayback Machine, but it doesn’t seem to have much from my site. That’s not surprising, considering how insignificant this site is in the grand scheme of things.
License
This work is published under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 Canada License.
Tags: archive.org, archives, blog, boing-boing, chang8ling, google, google-cache, livejournal, wayback-machine

Trackback link: http://www.theinflux.com/2006/02/01/thanks-to-the-cache/trackback/