URL Shortener and Base 62 Encoder / Decoder

I'm creating yet another URL shortener script. Why? Because I'm not keen on overpaying for simple scripts -- because, well, I'd be overpaying -- and I'm not too keen on using open source scripts -- because the more people use the same script, the more likely it is that someone will find exploits in it. So with all that in mind, I set out to create my own script.

One of the main components of a URL shortener script is the base 62 encoder / decoder. Base 62 is just another way of counting, which allows us to pack more numbers into a shorter string. In base 10, we count: 1,2,3,...,9,10,11, etc. That's good, everyone knows how to count like that, but the downside is that it's not very short. If we do URL shorteners like that, pretty soon we'll have long URLs, because 5 digits will only cover 99999 links. You might say, hey, 99999 links isn't too bad, but I say why end there? In base 62, we count like this: 1,2,3,...,7,8,9,A,B,C,...,X,Y,Z,a,b,c,...,x,y,z,10,11,12, etc. Notice that by the time we get to "10", we're already at 62? Now, with the same 5 digits, we're looking at 62^5 (916,132,832) instead of 10^5 (100,000) links.
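To put numbers on that comparison, here is a minimal sketch that counts how many distinct codes fit in a given number of digits. The capacity() helper is a name made up for this illustration and is not part of the class further down; it assumes the BCMath extension is loaded.

```php
<?php
// Hypothetical helper for illustration only: how many distinct codes
// fit in $digits digits when counting in base $base.
function capacity(int $base, int $digits): string
{
    // bcpow() takes numeric strings and returns a numeric string
    return bcpow((string) $base, (string) $digits);
}

echo capacity(10, 5), "\n"; // 100000
echo capacity(62, 5), "\n"; // 916132832
```

Both counts include the all-zero code, so they are one higher than the largest value you can write with 5 digits.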

Computers generally don't know how to count in fancy bases unless you teach them to do so, and teaching MySQL to count like that is way over my head -- hey, cut me some slack, I'm a PHP guru, not a MySQL guru -- so I'm letting MySQL count in its native auto increment on an unsigned integer. With that, I also get a fancy pool of numbers to work with... Unsigned BIGINT in MySQL goes up to 18446744073709551615 to be exact. That's eighteen quintillion, four hundred and forty six quadrillion, seven hundred and forty four trillion, seventy three billion, seven hundred and nine million, five hundred and fifty one thousand, and six hundred and fifteen. Or in short: quite-a-mouthful (see, wasn't that easier?).

Now, we obviously don't expect to house that many links in the URL shortener database, but it's always fun to have a small challenge. PHP can't count that high in integers (a PHP int is signed, 4 bytes on 32-bit builds and 8 bytes on 64-bit builds, and we'd need a full 8 unsigned bytes to match that), and floats aren't accurate enough at that size (a double only carries about 15 to 17 significant decimal digits), so we'll obviously need something better.
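To see the problem in action, here is a small sketch that takes two different IDs near the top of the unsigned BIGINT range and forces them into native PHP numbers. The variable names are made up for this illustration, and the float behaviour assumes the usual IEEE 754 doubles.

```php
<?php
// Two different IDs near the top of the unsigned BIGINT range, kept as strings.
$a = '18446744073709551615';
$b = '18446744073709551614';

// PHP ints are signed, so even a 64-bit build tops out well below these values.
var_dump(PHP_INT_MAX);

// Forcing the strings into numbers overflows into floats...
$floatA = (float) $a;
$floatB = (float) $b;

// ...and a double can no longer tell the two IDs apart.
var_dump($a === $b);           // bool(false)
var_dump($floatA === $floatB); // bool(true)
```

Two links ending up with the same number is exactly what a URL shortener can't afford, which is why the IDs need to stay as strings.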

In comes BCMath Arbitrary Precision Mathematics. BCMath allows us to work with arbitrarily large numbers, passed around as strings, at arbitrary precision through some nifty black magic API function calls. So anyways, after a few minutes of tinkering and editing, I present you a quick and simple base62 class for you to use in your application!
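Before the full class, here is a rough sketch of a single division step on the largest unsigned BIGINT value, just to show what working with BCMath looks like. It assumes the BCMath extension is loaded and that the bcmath.scale setting is at its default of 0; the variable names are made up for this illustration.

```php
<?php
// BCMath works on numeric strings, so the big number never becomes an int or float.
$n = '18446744073709551615';

// One step of repeated division by 62: split off the last base 62 digit.
$remainder = bcmod($n, '62');       // value of the last digit, as a string
$quotient  = bcdiv($n, '62', 0);    // scale 0 drops any fractional part

// Sanity check: quotient * 62 + remainder gives back the original number.
$check = bcadd(bcmul($quotient, '62', 0), $remainder, 0);

var_dump($remainder);     // string(2) "15"
var_dump($check === $n);  // bool(true)
```

A remainder of 15 maps to the letter F in the alphabet used by the class, which matches the last character of the "LygHa16AHYF" example further down.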

<?php
/**
 * Base 62 Encoder / Decoder Class
 * (c) Andy Huang, 2009; All rights reserved
 *
 * This code is not distributed under any specific license, 
 * as I do not believe in them, but it is distributed under
 * these terms outlined below:
 * - You may use these code as part of your application, even if it is a commercial product
 * - You may modify these code to suite your application, even if it is a commercial product
 * - You may sell your commercial product derived from these code
 * - You may donate to me if you are some how able to get a hold of me, but that's not required
 * - You may link back to the original article for reference, but do not hotlink the source file
 * - This line is intentionally added to differentiate from LGPL, or other similar licensing terms
 * - You must at all time retain this copyright message and terms in your code
 */
class base62 
{
    static $characters = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    static $base = 62;

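    // Encode a non-negative integer, given as a numeric string, into base 62.
    public static function encode($var)
    {
        $var = (string) $var;
        if (bccomp($var, '0') == 0)
        {
            return '0'; // the loop below would otherwise return an empty string
        }

        $stack = array();
        while (bccomp($var, '0') != 0)
        {
            $remainder = bcmod($var, (string) self::$base);
            $var = bcdiv(bcsub($var, $remainder), (string) self::$base, 0);

            array_push($stack, self::$characters[(int) $remainder]);
        }

        return implode('', array_reverse($stack));
    }

    // Decode a base 62 string back into a decimal numeric string.
    public static function decode($var)
    {
        $length = strlen($var);
        $result = '0';
        for ($i = 0; $i < $length; $i++)
        {
            // place value of this digit: base ^ (digits remaining to the right)
            $weight = bcpow((string) self::$base, (string) ($length - ($i + 1)));
            $result = bcadd($result, bcmul((string) self::get_digit($var[$i]), $weight));
        }

        return $result;
    }

    // Map one base 62 character to its numeric value (0 to 61).
    private static function get_digit($var)
    {
        // strpos() replaces the ereg() checks, since ereg() was removed in PHP 7
        $digit = strpos(self::$characters, $var);
        if ($digit === false)
        {
            // reject anything outside the alphabet instead of passing it through
            throw new InvalidArgumentException("Invalid base 62 character: $var");
        }

        return $digit;
    }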
}

Since it is a static class, you can invoke it directly without having to instantiate it. Simply do this:

require_once("class_base62.php");
base62::encode("100"); // returns "1c";
base62::decode("LygHa16AHYF"); // returns "18446744073709551615";

Happy coding! But please do let me know if you encounter any problems with the script, so I can fix it!
