How can I sanitize user input with PHP? – Dev

The best answers to the question “How can I sanitize user input with PHP?” in the category Dev.

QUESTION:

Is there a catchall function somewhere that works well for sanitizing user input for SQL injection and XSS attacks, while still allowing certain types of HTML tags?

ANSWER:

Do not try to prevent SQL injection by sanitizing input data.

Instead, do not allow data to be used in creating your SQL code. Use Prepared Statements (i.e. using parameters in a template query) that uses bound variables. It is the only way to be guaranteed against SQL injection.

Please see my website http://bobby-tables.com/ for more about preventing SQL injection.

ANSWER:

It’s a common misconception that user input can be filtered. PHP even has a (now deprecated) “feature”, called magic-quotes, that builds on this idea. It’s nonsense. Forget about filtering (or cleaning, or whatever people call it).

What you should do, to avoid problems, is quite simple: whenever you embed a a piece of data within a foreign code, you must treat it according to the formatting rules of that code. But you must understand that such rules could be too complicated to try to follow them all manually. For example, in SQL, rules for strings, numbers and identifiers are all different. For your convenience, in most cases there is a dedicated tool for such an embedding. For example, when you need to use a PHP variable in the SQL query, you have to use a prepared statement, that will take care of all the proper formatting/treatment.

Another example is HTML: If you embed strings within HTML markup, you must escape it with htmlspecialchars. This means that every single echo or print statement should use htmlspecialchars.

A third example could be shell commands: If you are going to embed strings (such as arguments) to external commands, and call them with exec, then you must use escapeshellcmd and escapeshellarg.

Also, a very compelling example is JSON. The rules are so numerous and complicated that you would never be able to follow them all manually. That’s why you should never ever create a JSON string manually, but always use a dedicated function, json_encode() that will correctly format every bit of data.

And so on and so forth …

The only case where you need to actively filter data, is if you’re accepting preformatted input. For example, if you let your users post HTML markup, that you plan to display on the site. However, you should be wise to avoid this at all cost, since no matter how well you filter it, it will always be a potential security hole.

ANSWER:

PHP has the new nice filter_input functions now, that for instance liberate you from finding ‘the ultimate e-mail regex’ now that there is a built-in FILTER_VALIDATE_EMAIL type


My own filter class (uses JavaScript to highlight faulty fields) can be initiated by either an ajax request or normal form post. (see the example below)
<?
/**
* Pork Formvalidator. validates fields by regexes and can sanitize them. Uses PHP filter_var built-in functions and extra regexes
* @package pork
*/

/**
 *  Pork.FormValidator
 *  Validates arrays or properties by setting up simple arrays. 
 *  Note that some of the regexes are for dutch input!
 *  Example:
 * 
 *  $validations = array('name' => 'anything','email' => 'email','alias' => 'anything','pwd'=>'anything','gsm' => 'phone','birthdate' => 'date');
 *  $required = array('name', 'email', 'alias', 'pwd');
 *  $sanitize = array('alias');
 *
 *  $validator = new FormValidator($validations, $required, $sanitize);
 *                  
 *  if($validator->validate($_POST))
 *  {
 *      $_POST = $validator->sanitize($_POST);
 *      // now do your saving, $_POST has been sanitized.
 *      die($validator->getScript()."<script type="text/javascript">alert('saved changes');</script>");
 *  }
 *  else
 *  {
 *      die($validator->getScript());
 *  }   
 *  
 * To validate just one element:
 * $validated = new FormValidator()->validate('[email protected]', 'email');
 * 
 * To sanitize just one element:
 * $sanitized = new FormValidator()->sanitize('<b>blah</b>', 'string');
 * 
 * @package pork
 * @author SchizoDuckie
 * @copyright SchizoDuckie 2008
 * @version 1.0
 * @access public
 */
class FormValidator
{
    public static $regexes = Array(
            'date' => "^[0-9]{1,2}[-/][0-9]{1,2}[-/][0-9]{4}\$",
            'amount' => "^[-]?[0-9]+\$",
            'number' => "^[-]?[0-9,]+\$",
            'alfanum' => "^[0-9a-zA-Z ,.-_\\s\?\!]+\$",
            'not_empty' => "[a-z0-9A-Z]+",
            'words' => "^[A-Za-z]+[A-Za-z \\s]*\$",
            'phone' => "^[0-9]{10,11}\$",
            'zipcode' => "^[1-9][0-9]{3}[a-zA-Z]{2}\$",
            'plate' => "^([0-9a-zA-Z]{2}[-]){2}[0-9a-zA-Z]{2}\$",
            'price' => "^[0-9.,]*(([.,][-])|([.,][0-9]{2}))?\$",
            '2digitopt' => "^\d+(\,\d{2})?\$",
            '2digitforce' => "^\d+\,\d\d\$",
            'anything' => "^[\d\D]{1,}\$"
    );
    private $validations, $sanatations, $mandatories, $errors, $corrects, $fields;
    

    public function __construct($validations=array(), $mandatories = array(), $sanatations = array())
    {
        $this->validations = $validations;
        $this->sanitations = $sanitations;
        $this->mandatories = $mandatories;
        $this->errors = array();
        $this->corrects = array();
    }

    /**
     * Validates an array of items (if needed) and returns true or false
     *
     */
    public function validate($items)
    {
        $this->fields = $items;
        $havefailures = false;
        foreach($items as $key=>$val)
        {
            if((strlen($val) == 0 || array_search($key, $this->validations) === false) && array_search($key, $this->mandatories) === false) 
            {
                $this->corrects[] = $key;
                continue;
            }
            $result = self::validateItem($val, $this->validations[$key]);
            if($result === false) {
                $havefailures = true;
                $this->addError($key, $this->validations[$key]);
            }
            else
            {
                $this->corrects[] = $key;
            }
        }
    
        return(!$havefailures);
    }

    /**
     *
     *  Adds unvalidated class to thos elements that are not validated. Removes them from classes that are.
     */
    public function getScript() {
        if(!empty($this->errors))
        {
            $errors = array();
            foreach($this->errors as $key=>$val) { $errors[] = "'INPUT[name={$key}]'"; }

            $output="$$(".implode(',', $errors).').addClass("unvalidated");'; 
            $output .= "new FormValidator().showMessage();";
        }
        if(!empty($this->corrects))
        {
            $corrects = array();
            foreach($this->corrects as $key) { $corrects[] = "'INPUT[name={$key}]'"; }
            $output .= '$$('.implode(',', $corrects).').removeClass("unvalidated");';   
        }
        $output = "<script type="text/javascript">{$output} </script>";
        return($output);
    }


    /**
     *
     * Sanitizes an array of items according to the $this->sanitations
     * sanitations will be standard of type string, but can also be specified.
     * For ease of use, this syntax is accepted:
     * $sanitations = array('fieldname', 'otherfieldname'=>'float');
     */
    public function sanitize($items)
    {
        foreach($items as $key=>$val)
        {
            if(array_search($key, $this->sanitations) === false && !array_key_exists($key, $this->sanitations)) continue;
            $items[$key] = self::sanitizeItem($val, $this->validations[$key]);
        }
        return($items);
    }


    /**
     *
     * Adds an error to the errors array.
     */ 
    private function addError($field, $type="string")
    {
        $this->errors[$field] = $type;
    }

    /**
     *
     * Sanitize a single var according to $type.
     * Allows for static calling to allow simple sanitization
     */
    public static function sanitizeItem($var, $type)
    {
        $flags = NULL;
        switch($type)
        {
            case 'url':
                $filter = FILTER_SANITIZE_URL;
            break;
            case 'int':
                $filter = FILTER_SANITIZE_NUMBER_INT;
            break;
            case 'float':
                $filter = FILTER_SANITIZE_NUMBER_FLOAT;
                $flags = FILTER_FLAG_ALLOW_FRACTION | FILTER_FLAG_ALLOW_THOUSAND;
            break;
            case 'email':
                $var = substr($var, 0, 254);
                $filter = FILTER_SANITIZE_EMAIL;
            break;
            case 'string':
            default:
                $filter = FILTER_SANITIZE_STRING;
                $flags = FILTER_FLAG_NO_ENCODE_QUOTES;
            break;
             
        }
        $output = filter_var($var, $filter, $flags);        
        return($output);
    }
    
    /** 
     *
     * Validates a single var according to $type.
     * Allows for static calling to allow simple validation.
     *
     */
    public static function validateItem($var, $type)
    {
        if(array_key_exists($type, self::$regexes))
        {
            $returnval =  filter_var($var, FILTER_VALIDATE_REGEXP, array("options"=> array("regexp"=>'!'.self::$regexes[$type].'!i'))) !== false;
            return($returnval);
        }
        $filter = false;
        switch($type)
        {
            case 'email':
                $var = substr($var, 0, 254);
                $filter = FILTER_VALIDATE_EMAIL;    
            break;
            case 'int':
                $filter = FILTER_VALIDATE_INT;
            break;
            case 'boolean':
                $filter = FILTER_VALIDATE_BOOLEAN;
            break;
            case 'ip':
                $filter = FILTER_VALIDATE_IP;
            break;
            case 'url':
                $filter = FILTER_VALIDATE_URL;
            break;
        }
        return ($filter === false) ? false : filter_var($var, $filter) !== false ? true : false;
    }       
    


}

Of course, keep in mind that you need to do your sql query escaping too depending on what type of db your are using (mysql_real_escape_string() is useless for an sql server for instance). You probably want to handle this automatically at your appropriate application layer like an ORM. Also, as mentioned above: for outputting to html use the other php dedicated functions like htmlspecialchars 😉

For really allowing HTML input with like stripped classes and/or tags depend on one of the dedicated xss validation packages. DO NOT WRITE YOUR OWN REGEXES TO PARSE HTML!

ANSWER:

No. You can’t generically filter data without any context of what it’s for. Sometimes you’d want to take a SQL query as input and sometimes you’d want to take HTML as input.

You need to filter input on a whitelist — ensure that the data matches some specification of what you expect. Then you need to escape it before you use it, depending on the context in which you are using it.

The process of escaping data for SQL – to prevent SQL injection – is very different from the process of escaping data for (X)HTML, to prevent XSS.