To solve a problem I describe in detail below, I recently created a new site called expandUrl. The expandUrl™ service does a few things very well:
- Tells you the “real” URL of any shortened URL (regardless of what service provides it).
- Gives technical types the redirect codes involved in all the redirects for any particular link.
- Discovers both canonical and shorturl data for the final expanded URL.
- Offers a fast API for finding the expanded URL (and a slower API that returns the same detailed data available from the web interface).
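For the technically curious, here is a minimal Python sketch of the general idea behind the first two bullets: follow a link one hop at a time and record the status code at each step. This is not expandUrl's actual code. The names `head`, `expand`, `REDIRECT_CODES` and `max_hops` are my own for illustration, and the sketch assumes that HEAD requests are enough to see the redirects.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# Status codes treated as redirects in this sketch.
REDIRECT_CODES = {301, 302, 303, 307, 308}

def head(url):
    """Send one HEAD request without following redirects.

    Returns (status_code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()

def expand(url, fetch=head, max_hops=10):
    """Follow redirects hop by hop.

    Returns (final_url, hops), where hops is a list of (url, status_code).
    `fetch` is injectable so the logic can be tried without a network.
    """
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status))
        if status not in REDIRECT_CODES or not location:
            return url, hops
        url = urljoin(url, location)  # Location may be a relative path
    raise RuntimeError("too many redirects")
```

Calling `expand(some_short_url)` would return the final URL plus the list of hops with their status codes. The `max_hops` limit is there so that a redirect loop cannot run forever.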
Heard enough? Go check it out:
The expandUrl™ site was built to fix a specific problem I recently discovered on one of my other sites, so you can be confident that expandUrl™ has been tested on a sample set of a couple of thousand real-life URLs. However, the total number of hours I’ve spent building expandUrl™ can still be counted on one hand, so you are bound to discover some edge cases which I haven’t encountered.
Please post any bug reports below; I appreciate all of them. Security concerns should also be posted here, but I will pre-moderate those until they are fixed. (Not to hide them, but to give me the opportunity to fix them before they are exploited.)
Eating my own dog food
I run a site which aggregates RSS feeds from a few dozen different sources. One of the included sites accidentally let their domain name expire and had to move to a new domain name. This particular site also hosted their RSS feed with Feedburner.
When the site changed domains, all of the URLs I had stored broke. You may be thinking, “Well, duh. There’s nothing you can do to fix that.” In this scenario, however, the original site owner simply changed domain names. All of the old URLs would map perfectly to the new domain name.
If I had stored the actual URLs, I could simply have done a REPLACE in the database. However, since I had stored the Feedburner URLs instead, that was impossible, because Feedburner obfuscates the actual URL data:
Actual URL: http://example.com/blog/2007/04/post.html
Feedburner URL: http://feeds.feedburner.com/~r/exampleblog/epRL/~3/745213668/post.html
I quickly built expandUrl so that I could resolve any URL (be it a short URL, tracking script, or feed redirect) to the actual URL. I then ran an automated script to determine the relative link on the old domain, replaced the old domain with the new domain, and updated the database accordingly. An hour or so later, all those posts were fixed.
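The domain swap itself is the easy part once you have the real URL. Here is a rough Python sketch of that step, not the script I actually ran. The function name `move_to_domain` and the `example.org` domain are made up for illustration, and the sketch assumes the old URLs carry no port or login details in the host part.

```python
from urllib.parse import urlsplit, urlunsplit

def move_to_domain(url, old_host, new_host):
    """Rewrite `url` onto `new_host` if it currently points at `old_host`.

    The path, query string and fragment are kept unchanged.
    URLs on any other host are returned untouched.
    """
    parts = urlsplit(url)
    if parts.hostname != old_host:
        return url
    return urlunsplit(
        (parts.scheme, new_host, parts.path, parts.query, parts.fragment)
    )

# Example with a made-up new domain:
print(move_to_domain(
    "http://example.com/blog/2007/04/post.html",
    "example.com",
    "example.org",
))
```

Parsing the URL first, rather than doing a blind string replace, avoids touching a URL that merely mentions the old domain somewhere in its path or query string.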
I then ran this script against the entire database of URLs to resolve any other redirects, and I made an interesting discovery: many of the RSS feed services will generate multiple redirect URLs for the same blog post permalink. My database had a UNIQUE index on the URL field as a means of preventing the same post from showing up multiple times, but it turned out the feed services were inadvertently circumventing that “protection.”
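To illustrate how those duplicates surface, here is a small Python sketch that groups stored rows by the URL they resolve to. It is an assumption-laden illustration rather than my real cleanup script: the `(row_id, stored_url)` row shape, the `find_duplicates` name and the `resolve` callable (any function that maps a stored URL to its final URL) are all hypothetical.

```python
from collections import defaultdict

def find_duplicates(rows, resolve):
    """Group row ids by the URL each stored URL resolves to.

    `rows` is an iterable of (row_id, stored_url) pairs.
    `resolve` maps a stored URL to its final, expanded URL.
    Returns only the targets reached by more than one row.
    """
    by_target = defaultdict(list)
    for row_id, stored_url in rows:
        by_target[resolve(stored_url)].append(row_id)
    return {
        target: ids for target, ids in by_target.items() if len(ids) > 1
    }
```

Every group this returns is a set of rows that a UNIQUE index on the stored URL could not catch, because the stored strings differ even though they lead to the same post.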
In the end, integrating expandUrl has cleaned over 300 duplicate URLs from my database and helped improve the future maintainability of the link data the site has collected.
If you find expandUrl™ helpful, I’d love to hear about it! :)