301 Redirects With A Custom 404 Page

For anyone who cares about search engine rankings and has had to move a page or domain from one location to another, the handling of redirects is very important. Use the wrong type of redirect and any links pointing to the old location won’t count towards the content’s new location.

Most of what you’ll read on “[How To Set Up a 301 Redirect](http://www.webmasterworld.com/forum92/82.htm)” involves delving into Apache .htaccess files or [waving a dead rat over a .NET server](http://www.webmasterworld.com/forum47/3287.htm) while clucking like a chicken. In those odd shared hosting cases where you have no access to modify .htaccess files, you may be further screwed into learning about meta redirects and determining just [how Yahoo! is going to interpret a 2 second delay versus a 3 second delay](http://help.yahoo.com/help/us/ysearch/slurp/slurp-11.html).

Here’s a method for handling redirects that is simple to manage, requires only PHP and the ability to define a custom error page, and lets you prepare your redirects *before* you move the page, so the redirect is seamless once you have. Even with the ability to control every aspect of my Apache server, I still prefer this method.

### How it works

When you move a page to a new location, the most important thing is to ensure a user or search engine spider does not receive a “File Not Found” error (hereafter referred to as a “404”). It’s important to understand how a request for a nonexistent page is handled by the server. Here’s a diagram I put together:

[Diagram: How Apache Handles a Request for a Nonexistent Page]

If you didn’t realize already, it’s a simplified diagram of the process. Where in this process do you intercede to prevent a 404 error? Because everything a server must do takes up processing power, it is logical to conserve server resources and intercede as soon as possible. For example, if you could go out and fix all your visitors’ bookmarks and search engine listings to point to the new page, you should. Then no one would even have to worry about 404 pages or dealing with redirects! :)

So, logically, to conserve server resources, the next step at which we can intercede is the checking of the .htaccess file. (For simplicity’s sake, I won’t discuss editing httpd.conf to place the redirects in the VirtualHost section.) For example, we could put something like this in our .htaccess file:

RedirectMatch 301 /olddirectory/([^.]+)\.html$ http://www.example.com/newdirectory/$1.html

And that code would work, just as expected! But once you _and your boss and clients_ learn about the wonder of the 301 redirect, some of the arguments against changing page locations willy-nilly will go away, and you’ll find yourself with .htaccess files that are dozens of lines long. That is when it starts to matter that *Apache parses all the directives it finds* for *every page request* it receives in *every directory in which that page may be found*. Suddenly the performance we saved by letting Apache handle redirects is being whittled away by an excess of directives.

### The solution lies downstream

Scroll back up a bit and take a look at that diagram. When a page isn’t found, the web server will serve up an error page. It will even serve a custom error page if configured to do so. How do you configure it to do so? It’s easy. I’m going to skip ahead a bit and reveal that our custom error page will be a PHP script, so I can give you all the .htaccess code you’ll need in this one line:

ErrorDocument 404 /404.php

That line, added to the main .htaccess file for your entire site, is the first and last edit you’ll need to make to .htaccess to deal with your 301 redirects. All the work occurs in that PHP script.

### PHP headers overrule Apache headers

Even though Apache will normally serve a 404 header along with your custom 404 page, you can easily overrule that with a [PHP header](http://php.net/header). So, why not send a 301 header? Once we make this cognitive leap, it’s dead simple to put all your redirect code into your custom 404 page and check the request against your URL patterns.
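
To make the override concrete, here is a minimal sketch of the idea. The helper name `redirect_headers()` and the example host and paths are my own inventions for illustration, not part of the script further down. It only builds the two header lines, so you can see exactly what would be sent before handing them to `header()`.

```php
<?php
// Hypothetical helper (an assumption for illustration): returns the status
// line and Location line for a redirect instead of sending them directly.
function redirect_headers($action_type, $scheme, $host, $new_uri)
{
    // Anything other than an explicit 302 is treated as a permanent move.
    $status = ((string) $action_type === '302')
        ? 'HTTP/1.1 302 Moved Temporarily'
        : 'HTTP/1.1 301 Moved Permanently';

    return array($status, 'Location: ' . $scheme . '://' . $host . $new_uri);
}

// Usage inside a custom 404 page (host and path are example values):
// foreach (redirect_headers(301, 'http', 'www.example.com', '/newdirectory/page.html') as $line) {
//     header($line);
// }
// exit;
```

The headers have to go out before any HTML is printed, which is why the redirect check belongs at the very top of the 404 page.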

Using a database structure like so:

CREATE TABLE `redirects` (
`primary` bigint(20) unsigned NOT NULL auto_increment,
`old_uri` varchar(254) NOT NULL default '',
`new_uri` varchar(254) default NULL,
`action_type` mediumint(9) default NULL,
`domain_match` varchar(254) NOT NULL default '',
PRIMARY KEY (`primary`),
KEY `old_uri` (`old_uri`,`new_uri`,`action_type`,`domain_match`),
FULLTEXT KEY `old_uri_2` (`old_uri`)
) ENGINE=MyISAM;

You can use the following PHP code at the beginning of your custom 404 page:

<?php

// include your own DB connection info here
$link = mysqli_connect('localhost', 'mysql_user', 'mysql_password', 'mysql_database');

// defaults, so the checks further down never touch an undefined variable
$redirects_found = array();
$headers_sent = false;

// 's' when the request arrived over HTTPS, otherwise an empty string
$s = (isset($_SERVER['HTTPS']) && $_SERVER['HTTPS'] == 'on') ? 's' : '';

// build a complete URL from server variables
$complete_request = 'http'.$s.'://'.$_SERVER['HTTP_HOST'].$_SERVER['REQUEST_URI'];

// Parse out the URI
$parsed_request = parse_url($complete_request);

// Especially badly formed URLs make parse_url return false; we only continue if that is not the case
if ($link && $parsed_request !== false && isset($parsed_request['path'])) {

    // Check the DB for a specific entry
    $query = sprintf("SELECT * FROM `redirects` WHERE `domain_match` LIKE '%s' AND `old_uri` LIKE '%s' LIMIT 1",
        mysqli_real_escape_string($link, $_SERVER['SERVER_NAME']),
        mysqli_real_escape_string($link, $parsed_request['path']).'%'
    );
    //print($query); // for debugging

    $result = mysqli_query($link, $query);
    if ($result) {
        while ($row = mysqli_fetch_assoc($result)) {
            $redirects_found[] = $row;
        }
    }
    // print_r($redirects_found); // for debugging

    // If there is a match, redirect the user
    if (count($redirects_found) == 1) {

        // send the right type of redirect
        if ($redirects_found[0]['action_type'] == '302') {
            header('HTTP/1.1 302 Moved Temporarily');
            header('Location: http'.$s.'://'.$_SERVER['HTTP_HOST'].$redirects_found[0]['new_uri']);
            $headers_sent = true;
        } else {
            header('HTTP/1.1 301 Moved Permanently');
            header('Location: http'.$s.'://'.$_SERVER['HTTP_HOST'].$redirects_found[0]['new_uri']);
            $headers_sent = true;
        }
        // You can add additional behaviors for other header codes here

    } // if (count($redirects_found) == 1) {

} // if ($link && $parsed_request !== false ...) {

// Exit if we have sent headers
if ($headers_sent) {
    die;
}

// Otherwise, include friendly 404 HTML below :)
?>

With this code in place, you can add entries to the `redirects` table ahead of removing/moving pages. As soon as your new pages are in place, just rename or delete the old files and all incoming requests (from both users and spiders) will automagically be redirected to the new location.
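
If you want to sanity-check your entries before moving anything, a rough sketch like the following can help. Everything in it is an assumption for illustration: the sample rows, the example.com host, and the `preview_redirect()` helper, which only approximates the SELECT above in plain PHP. It compares strings exactly, so it ignores MySQL’s collation rules and treats `%` and `_` in a path as literal characters, whereas LIKE treats them as wildcards.

```php
<?php
// Hypothetical sample rows, shaped like the `redirects` table.
$rows = array(
    array('old_uri' => '/olddirectory/about.html', 'new_uri' => '/newdirectory/about.html',
          'action_type' => 301, 'domain_match' => 'www.example.com'),
    array('old_uri' => '/sale.html', 'new_uri' => '/offers.html',
          'action_type' => 302, 'domain_match' => 'www.example.com'),
);

// Hypothetical helper: returns the first row whose domain matches and whose
// old_uri starts with the requested path, or null when nothing matches.
function preview_redirect(array $rows, $server_name, $path)
{
    foreach ($rows as $row) {
        $same_domain  = ($row['domain_match'] === $server_name);
        $prefix_match = (strncmp($row['old_uri'], $path, strlen($path)) === 0);
        if ($same_domain && $prefix_match) {
            return $row;
        }
    }
    return null;
}

$hit = preview_redirect($rows, 'www.example.com', '/sale.html');
echo $hit === null ? "no redirect\n" : $hit['action_type'] . ' -> ' . $hit['new_uri'] . "\n";
// prints "302 -> /offers.html"
```

Note that the query appends `%` to the requested path, so the match is a prefix match: in this sketch a request for `/olddirectory` would also pick up the first row. Keep that in mind when choosing `old_uri` values.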