Gregg Hilferding


301 Redirects With a Custom 404 Page

For anyone who cares about search engine rankings and has had to move a page or domain from one location to another, handling redirects correctly is very important. Use the wrong type of redirect, and any links pointing to the old location won't count toward the content's new location.

Most guides on "How To Set Up a 301 Redirect" involve delving into Apache .htaccess files or waving a dead rat over a .NET server while clucking like a chicken. In those odd shared-hosting cases where you can't modify .htaccess files, you're further screwed: stuck learning about meta refresh redirects and determining just how Yahoo! is going to interpret a 2-second delay versus a 3-second delay.

Here's a method for handling redirects that is simple to manage, requires only PHP and the ability to define a custom error page, and lets you set up your redirects before you move a page, so they take effect seamlessly once you do. Even with full control over every aspect of my Apache server, I still prefer this method.

How it works

When you move a page to a new location, the most important thing is to ensure that a user or search engine spider does not receive a "File Not Found" error (hereafter referred to as a "404"). It's important to understand how the server handles a request for a non-existent page. Here's a diagram I put together:

How Apache Handles a Request for a Non-Existent Page

If you didn't realize it already, this is a simplified diagram of the process. Where in this process do you intercede to prevent a 404 error? Because everything a server does takes up processing power, it is logical to conserve server resources and intercede as early as possible. For example, if you could go out and fix all your visitors' bookmarks and search engine listings to point to the new page, you should. Then no one would even have to worry about 404 pages or redirects! :)

So, logically, to conserve server resources, the next point at which we can intercede is when Apache checks the .htaccess file. (For simplicity's sake, I won't discuss editing httpd.conf to place the redirects in the VirtualHost section.) For example, we could put something like this in our .htaccess file:

RedirectMatch 301 /olddirectory/([^.]+)\.html$ http://www.example.com/newdirectory/$1.html
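To see exactly what a rule like that does with the captured `$1`, it can help to reproduce it in PHP and feed it a few paths. This is a minimal sketch, not anything Apache itself runs; the function name `preview_redirect()` is hypothetical, and www.example.com is the same placeholder domain used in the rule.

```php
<?php
// Sketch only: mirrors the RedirectMatch rule above so you can preview
// where an old URL-path would be sent. www.example.com is a placeholder.
function preview_redirect(string $path): ?string
{
    // Same regex as the .htaccess rule; '#' is used as the delimiter
    // so the slashes in the path don't need escaping.
    $pattern = '#/olddirectory/([^.]+)\.html$#';

    if (preg_match($pattern, $path, $m) !== 1) {
        return null; // the rule would not fire for this path
    }
    // Like Apache, build the whole target URL, substituting the first
    // captured group where the rule has $1.
    return 'http://www.example.com/newdirectory/' . $m[1] . '.html';
}

// preview_redirect('/olddirectory/about.html')
//   returns 'http://www.example.com/newdirectory/about.html'
```

Note that `[^.]+` also matches slashes, so `/olddirectory/sub/page.html` keeps its subdirectory in the new location, and because the pattern isn't anchored with `^`, a path like `/archive/olddirectory/page.html` would match too.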

And that RedirectMatch rule would work just as expected! But once you, your boss, and your clients learn about the wonder of the 301 redirect, some of the arguments against changing page locations willy-nilly will go away, and you'll find yourself with .htaccess files that are dozens of lines long. Keep in mind that Apache reads and parses the .htaccess file in every directory along the path to the requested file, on every single request. Suddenly the performance savings from having Apache handle redirects are being whittled away by an excess of directives.

The solution lies downstream

Scroll back up a bit and take a look at that diagram. When a page isn't found, the web server serves up an error page. It will even serve a custom error page if configured to do so. How do you configure it to do so? It's easy, and I'm going to skip ahead a bit and reveal that our custom error page will be a PHP script, so I can provide all the .htaccess code you'll need in this one line:

ErrorDocument 404 /404.php

Adding that line to the root .htaccess file for your entire site is the first and last time you'll need to edit .htaccess to deal with your 301 redirects. All the work happens in that PHP script.

PHP headers overrule Apache headers

Even though Apache normally sends a 404 status along with your custom 404 page, you can easily override it with a PHP header() call. So why not send a 301 header instead? Once we make this cognitive leap, it's dead simple to put all your redirect code into your custom 404 page and check each request against your URL patterns.
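Here's a minimal sketch of the headers that turn a 404 response into a redirect. It returns the header strings instead of calling header() directly so the logic can be checked on its own; the function name `redirect_headers()` and the choice of 301 as the default are assumptions for illustration.

```php
<?php
// Sketch: build the status line and Location header that override
// Apache's 404. Only 301 and 302 are supported in this example.
function redirect_headers(string $location, int $code = 301): array
{
    // HTTP/1.1 reason phrases for the two redirect types used here.
    $reasons = [301 => 'Moved Permanently', 302 => 'Found'];
    if (!isset($reasons[$code])) {
        throw new InvalidArgumentException("Unsupported redirect code: $code");
    }
    return [
        "HTTP/1.1 $code {$reasons[$code]}",
        "Location: $location",
    ];
}

// In the 404 script, send them before any output:
// foreach (redirect_headers('http://www.example.com/new/page.html') as $h) {
//     header($h);
// }
```

PHP's header() also accepts the status code as its third argument, so `header('Location: ' . $url, true, 301);` is a one-line alternative to sending the status line separately.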

Using a database structure like so:

CREATE TABLE `redirects` (
  `primary` bigint(20) unsigned NOT NULL auto_increment,
  `old_uri` varchar(254) NOT NULL default '',
  `new_uri` varchar(254) default NULL,
  `action_type` mediumint(9) default NULL,
  `domain_match` varchar(254) NOT NULL default '',
  PRIMARY KEY (`primary`),
  KEY `old_uri` (`old_uri`,`new_uri`,`action_type`,`domain_match`),
  FULLTEXT KEY `old_uri_2` (`old_uri`)
) ENGINE=MyISAM;

You can use the following PHP code at the beginning of your custom 404 page:

<?php
// Include your own DB connection info here (placeholder credentials and database name)
$link = mysqli_connect('localhost', 'mysql_user', 'mysql_password', 'database_name');

$redirects_found = array();
$headers_sent = false;

// Build a complete URL from server variables
$s = (!empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] !== 'off') ? 's' : '';
$complete_request = 'http'.$s.'://'.$_SERVER['HTTP_HOST'].$_SERVER['REQUEST_URI'];

// Parse out the URI
$parsed_request = parse_url($complete_request);

// Especially badly formed URLs make parse_url() return false; only continue if that is not the case
if ($link && $parsed_request !== false && isset($parsed_request['path'])) {
    // Check the DB for a specific entry
    $query = sprintf(
        "SELECT * FROM `redirects` WHERE `domain_match` LIKE '%s' AND `old_uri` LIKE '%s' LIMIT 1",
        mysqli_real_escape_string($link, $_SERVER['SERVER_NAME']),
        mysqli_real_escape_string($link, $parsed_request['path']).'%'
    );
    // print($query); // for debugging
    $result = mysqli_query($link, $query);
    if ($result) {
        while ($row = mysqli_fetch_assoc($result)) {
            $redirects_found[] = $row;
        }
    }
    // print_r($redirects_found); // for debugging

    // If there is a match, redirect the user
    if (count($redirects_found) == 1) {
        $location = 'http'.$s.'://'.$_SERVER['HTTP_HOST'].$redirects_found[0]['new_uri'];
        // Send the right type of redirect
        if ($redirects_found[0]['action_type'] == '302') {
            header('HTTP/1.1 302 Moved Temporarily');
        } else {
            header('HTTP/1.1 301 Moved Permanently');
        }
        header('Location: '.$location);
        $headers_sent = true;
        // You can add additional behaviors for other header codes here
    } // if (count($redirects_found) == 1)
} // if ($parsed_request !== false)

// Exit if we have sent headers
if ($headers_sent) {
    exit;
}
// Otherwise, include friendly 404 HTML below :)
?>

With this code in place, you can add entries to the redirects table ahead of removing/moving pages. As soon as your new pages are in place, just rename or delete the old files and all incoming requests (from both users and spiders) will automagically be redirected to the new location.
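If you'd like to check which requests your redirect entries will catch before touching the live site, the lookup in the 404 script can be mimicked against an in-memory array. This is a sketch under assumptions: `find_redirect()` is a hypothetical helper, rows are shaped like the `redirects` table, and the columns use a case-insensitive collation (MySQL's usual default), so the LIKE comparisons ignore case. It keeps the query's prefix behavior: `old_uri LIKE 'path%'` means a request for a short path such as `/old/` will also match a stored `/old/page.html`.

```php
<?php
// Sketch: in-memory version of the SQL lookup in the custom 404 page.
function find_redirect(array $rows, string $host, string $path): ?array
{
    foreach ($rows as $row) {
        // `domain_match` LIKE 'host': case-insensitive equality,
        // assuming the stored value contains no % or _ wildcards.
        if (strcasecmp($row['domain_match'], $host) !== 0) {
            continue;
        }
        // `old_uri` LIKE 'path%': old_uri starts with the requested path.
        if ($path !== '' && stripos($row['old_uri'], $path) === 0) {
            // LIMIT 1 without ORDER BY lets MySQL return any matching row;
            // here the first match in array order wins.
            return $row;
        }
    }
    return null;
}

$rows = [
    ['old_uri' => '/old/page.html', 'new_uri' => '/new/page.html', 'action_type' => '301', 'domain_match' => 'www.example.com'],
    ['old_uri' => '/temp.html', 'new_uri' => '/sale.html', 'action_type' => '302', 'domain_match' => 'www.example.com'],
];
```

The sample rows use placeholder paths and the example.com domain; swap in your own entries to see which incoming paths would be redirected and with which status.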

Published by Gregg Hilferding on November 24th, 2006 at 3:10 pm. Filed under PHP, SEO, Signal, Webmaster.

Testing MSN Live Search

Dave is doing a bit of testing on an issue with MSN Live Search. It seems only appropriate to link to Matt's post about URL canonicalization tips. :)

Published by Gregg Hilferding on November 24th, 2006 at 2:14 pm. Filed under MSN, Search.

Googifets

The next item on my intermission list:

33 make all the testimonial on the eite forver in a palce to keep the googifets

I'm not sure what's worse, the horrible spelling or the fact that I know exactly what my boss means by "googifets." :)

Perhaps I'll photoshop up a Google branded Boba Fett to put in the office. :)

Published by Gregg Hilferding on November 22nd, 2006 at 7:24 pm. Filed under Noise.

Link Development and Linking Optimization Wrap-Up

This really was one of the best sessions I attended. Rae Hoffman and Roger Montti, both longtime WebmasterWorld members and moderators, gave presentations that had more information than I could keep up with.

Rae Hoffman

Rae (who also provides SEO consulting services) was kind enough to publicly link to her presentation (PowerPoint file) on Delegating Link Development. That presentation is worth reading through a few times to absorb all the information. A couple of big highlights of her talk for me:

  • Training a link developer is not a light task. But given the right person with the right training, plus efficient management and monitoring, they will quickly outperform outsourced link development.
  • It is easier to train someone inexperienced with marketing who is familiar with the internet than it is to train someone experienced with marketing but lacking experience using the internet.

Her list of interview questions for a potential in-house link monkey was also quite helpful:

  1. What is your favorite search engine?
  2. What is a blog? A message board? A link?
  3. Three favorite websites?
  4. Do you use IM?
    (And Rae made it clear that IM use is a positive thing as an indicator of computer and internet familiarity.)

She had even more questions in her presentation (linked above) which I won't duplicate here. She also recommended having a computer handy and asking the applicant to perform a specific task on the internet, such as "Can you find me a {Brand} {Model} digital camera that I can actually buy?" to see if they can tell the difference between an e-commerce site and an affiliate/content site.

Joel Lesser

Joel Lesser of LinksManager.com spoke mostly about reciprocal linking, a topic that I felt (and he confirmed) is mostly taboo in the SEO community. He recognized that full-duplex linking schemes with no editorial discretion will cause problems with search engines, but he also made a case for limited, on-topic reciprocal linking.

Joel correctly points out that the nature of the internet allows reciprocal links to grow organically, and he argues that the search engines therefore must not be completely devaluing these links. Unfortunately, his proof that recips still have value is that "I can't tell you what the search engines do or don't do." This strikes me as dodging the burden of proof, a rhetorical tactic that sets off "scam alarms" in my head. The whole "no one can prove that recips are bad" argument struck me as a Russell's teapot approach to something that can actually be independently tested to a reasonable degree of satisfaction.

Ignoring that one aspect of his presentation, however, Joel did bring back to the table the non-search engine benefits of relevant reciprocal links:

  • Cost effective
  • Provides qualified traffic independent of search engines
  • Provides value to your users by connecting them to other relevant resources

Another interesting aspect of his presentation was the idea of alternate forms of publishing reciprocal links. Most of these alternatives blend the links as sidebar resources or as contextual links inside of your content.

Finally, he wrapped up with some link request etiquette.

  • Use link request forms whenever available
  • Don't send out link requests longer than 3 sentences
  • Don't require that links be placed on a page with x PageRank.

Roger Montti

Roger Montti, who (surprise, surprise) also provides link development services, gave a presentation full of alternative link-building ideas. If this blog entry weren't serving as my own personal notes, I would think twice about posting so many of his ideas. :)

He discussed things to look for when you are going to be buying a text link (or even a banner link) from a site:

  1. Relevance
  2. No mention of PageRank
  3. No ads for non-relevant sites
  4. Year-long contracts

Smaller magazines which are published offline often have an online presence that is poorly developed. So, they will be happy to sell a banner ad or other link for relatively low dollar amounts.

Buying websites is a good way to accumulate their links and direct that link popularity to your own site. Older sites which are inactive or under-performing are good candidates for a low dollar amount (less than $1,000) buy.

Site of the month/week/day/second sites (and also newsletters) are handy for getting some traffic and links. Even if they do not permanently archive the links, you may still see their readers re-publishing links to your site if it's any good. Ideally, you would find a "site of the whatever" site that is specifically focused on your niche.

Sponsorships of sites, groups, and events all provide opportunities for static links, often from .org or .edu domains. Check your competitors' backlinks for .org and .edu links to see how they are getting them.
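That competitor check is easy to script once you have a list of backlink URLs. A minimal PHP sketch, assuming you've already exported the competitor's backlinks from whatever tool you use as a plain array of URLs; the function name `org_edu_backlinks()` is hypothetical.

```php
<?php
// Sketch: keep only backlinks hosted on .org or .edu domains.
function org_edu_backlinks(array $urls): array
{
    $matches = [];
    foreach ($urls as $url) {
        $host = parse_url($url, PHP_URL_HOST);
        if (!is_string($host)) {
            continue; // skip malformed URLs or ones without a host
        }
        // Normalize case and drop a trailing dot (fully qualified form).
        $host = strtolower(rtrim($host, '.'));
        if (preg_match('/\.(org|edu)$/', $host) === 1) {
            $matches[] = $url;
        }
    }
    return $matches;
}
```

The filter looks only at the end of the host name, so a domain like orgexample.com is not mistaken for a .org site.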

Although he discussed creating satellite informational sites as a means of acquiring inbound links, and I agree that method still works, I have anxiety about its long-term value if you take any shortcuts on quality with those sites.

Published by Gregg Hilferding on November 20th, 2006 at 5:14 pm. Filed under SEO, Webmaster.