Wednesday, October 25. 2006
As you may know, PHP 5.2.0 will feature a very capable filtering extension that can be used to easily validate your input via a number of rules, which you can find here. What I am interested in hearing is whether there are any other common types of data collected by PHP forms that would be worthwhile adding filters for in the extension. My own suggestions would be phone (US/EU formats) and postal/zip code validators.
So let's hear what you have to say.
Brief Disclaimer: Consider this an RFC of sorts. The suggestions, even if widely supported, may not get integrated, and any additions will need to have the implicit agreement of all the filter extension developers before being added.
Something that I do not fully understand is why there are constants like INPUT_(POST|GET|ENV...) defined, as they are just another equivalent to the superglobals. Are they faster, or why else do they make sense?
It would also be useful to be able to filter http uploads. e.g. FILTER_UPLOAD_IMAGE_JPEG,
Determining the type of a file reliably is fairly tricky and is nearly impossible to do without something like the fileinfo (http://pecl.php.net/Fileinfo) extension.
The same thing goes for the executable flag, what makes the file "executable" in the upload context? #!/path/to/bin, .exe, .com, .dll extension?
I don't think phone or postal zip codes should have specific validation routines; the user should simply write a regex for them instead.
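As a rough sketch of the "just write a regex" approach, the extension's FILTER_VALIDATE_REGEXP filter can carry the pattern. The helper name validate_us_zip() and the pattern itself are illustrative assumptions (US ZIP or ZIP+4 only), not anything shipped with the extension.

```php
<?php
// Hypothetical helper: validates a US ZIP or ZIP+4 code.
// The pattern is an illustrative assumption; other countries need other patterns.
function validate_us_zip($value)
{
    $options = array('options' => array('regexp' => '/^\d{5}(-\d{4})?$/D'));
    // filter_var() returns the original string on a match, false otherwise.
    return filter_var($value, FILTER_VALIDATE_REGEXP, $options);
}
```

The caller keeps full control of the format, which is the point made in this comment.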
I second this. There are numerous non-standard ways to format a phone number, and usually no customer for whom I have written an application used exactly the same format as the one before.
Even zip codes can change - granted, this happened only twice in the past century in Germany, I guess, but still... imagine if PHP and the filter extension had been around back then and all applications were using it to filter zips... you would have had to make a new release specifically for Germany, and rely on all ISPs upgrading it in a timely fashion.... a nightmare
Both phone and zip filters are a bit tricky, as they would require country flags since different countries have different schemas.
The only reason I thought it may be a good idea is that, looking at my past applications, those two values were often collected and required some basic validation. So it only made sense to allow PHP to do the work for me.
Instead of providing any specialized filters it should be possible to add customized filters. This may be simple callbacks, and the callbacks written by the users must return a result which is defined by the filter extension...
The filter extension already includes support for a callback filter, where a user defined function can be used to validate the data.
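A minimal sketch of the callback filter, assuming a loosely formatted phone number as input. The function normalize_phone() and its acceptance rule (optional "+", then 7 to 15 digits) are hypothetical choices made for illustration.

```php
<?php
// Hypothetical callback: accepts loosely formatted phone numbers and returns
// a normalized digit string, or false when the input looks implausible.
function normalize_phone($value)
{
    // Drop common separators: whitespace, dots, dashes, parentheses.
    $digits = preg_replace('/[\s.\-()]+/', '', $value);
    // Assumed rule: optional leading "+", then 7 to 15 digits.
    if (!preg_match('/^\+?\d{7,15}$/D', $digits)) {
        return false;
    }
    return $digits;
}

// FILTER_CALLBACK hands the value to the user function and returns its result.
$phone = filter_var('(555) 123-4567', FILTER_CALLBACK, array('options' => 'normalize_phone'));
$bad   = filter_var('call me', FILTER_CALLBACK, array('options' => 'normalize_phone'));
```

Whatever the callback returns becomes the filter result, so the user decides what counts as a failure.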
I would suggest either putting them in the wiki (for the most popular) or opening an issue in the pecl.php.net bugs system.
It will then be easier to discuss each of them and associate some deadlines, patches or other docs.
Perhaps date/time format validation? - however there are a dozen ways of asking for that - just like zips and phone numbers.
Perhaps a filter for a string consisting of numerics + whitespace?
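A string of numerics plus whitespace can probably be covered with the existing regex filter already. The sketch below assumes "numerics" means ASCII digits; the helper name is made up for the example.

```php
<?php
// Hypothetical helper: accepts strings made only of ASCII digits and whitespace,
// such as a card number typed with spaces. Empty strings are rejected.
function validate_digits_and_spaces($value)
{
    $options = array('options' => array('regexp' => '/^[0-9\s]+$/D'));
    return filter_var($value, FILTER_VALIDATE_REGEXP, $options);
}
```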
This is an interesting idea that would not take much to implement if we define a valid date as:
A value that, when passed to the strtotime() function, returns a valid Unix timestamp.
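That definition can be sketched in userland as follows. The function name is_parsable_date() is hypothetical, and the check assumes PHP 5.1 or later, where strtotime() returns false on failure.

```php
<?php
// Hypothetical helper: a date is "valid" if strtotime() can parse it.
// Assumes PHP 5.1+, where strtotime() returns false on failure.
function is_parsable_date($value)
{
    return strtotime($value) !== false;
}
```

Note that this definition is deliberately loose: strtotime() also accepts relative expressions such as "tomorrow", which may or may not be wanted for form input.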
It would be great if it were possible to pass a date format string that uses the same parameters as date() does.
That would be very flexible and still easy to use.
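For illustration only, a format-driven check can be written in userland with DateTime::createFromFormat(), which arrived in later PHP versions (5.3+) and is not part of the filter extension. The helper name validate_date_format() is an assumption.

```php
<?php
// Hypothetical helper: checks a value against a date()-style format string.
// Uses DateTime::createFromFormat(), available in PHP 5.3 and later.
function validate_date_format($value, $format)
{
    $date = DateTime::createFromFormat($format, $value);
    // Re-formatting must reproduce the input; this rejects overflowed
    // values such as "2006-02-30", which would otherwise roll over into March.
    return $date !== false && $date->format($format) === $value;
}
```

Because DateTime is not tied to the Unix timestamp range, a sketch like this should also cope with historical dates.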
This is a bit too complex IMO for a filter and will require a fairly BIG filter. To avoid errors in a critical component such as filter you want to keep things as simple as possible.
might be a reason not to do that.
Let's say my app concerns itself with information of historical (pre UNIX epoch) data.
I would like to have some OWASP filters put in for input validation.
Check list here:
Many of those filters, such as the URL and e-mail filters, are already implemented, but without reliance on regex.
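A short sketch of the two filters mentioned here, using filter_var() on hard-coded sample values (the sample strings are made up for the example):

```php
<?php
// Both filters return the input string when it is valid and false otherwise.
$email    = filter_var('user@example.com', FILTER_VALIDATE_EMAIL);
$url      = filter_var('http://example.com/index.php', FILTER_VALIDATE_URL);
$badEmail = filter_var('user@@example', FILTER_VALIDATE_EMAIL);
$badUrl   = filter_var('not a url', FILTER_VALIDATE_URL);
```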
Their other filters don't make a whole lot of sense to me such as the password filter for example. I see no reason to deny users the ability to use non-alphanumeric characters in their passwords.
Maybe this is way too generalized, but what about basic word filtering? Maybe I'm not reading the livedocs closely enough, but it'd be nice to allow for filtering of bad content (profanity, etc). I know the filtering mechanism is really as much about security as anything else, but it seems like having basic word filters would be useful.
I realize creating a custom filter and using the callback is possible but the parsing and string replacement would be nicer under the speed of C and given how many CMS/Wiki'ish things there are in PHP I think this would be a welcome addition.
Just thinking out loud, here.
My opinion is that "bad word" filters are notoriously unreliable; there are so many ways to bypass them via the use of spaces, l33t speak, phonetic spelling and so on.
I hear you. I'm not suggesting they are reliable, but the reality is that people try to mimic this sort of stuff in code. So given the shortcomings, it'd be nice to have something to at least try to facilitate this.
Obviously not my call but food for thought. Thanks for the reply.
How about if the Validate URL filter had an option to compare the input URL against $_SERVER["SERVER_NAME"] to see if the address is local?
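Until such an option exists, the comparison could be sketched in userland along these lines. The helper is_local_url() is hypothetical; the server name is passed in as a parameter so the example does not depend on a live request.

```php
<?php
// Hypothetical helper: true when $url is a valid URL whose host matches
// $serverName (which would typically come from $_SERVER['SERVER_NAME']).
function is_local_url($url, $serverName)
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $host = parse_url($url, PHP_URL_HOST);
    // Host names are case-insensitive, so compare without regard to case.
    return is_string($host) && strcasecmp($host, $serverName) === 0;
}
```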
Maybe a check for a "." in the path to prevent XSS attacks like this
http://example.com/sample.php/xss%20crap. I don't know how good that will be since "."'s in directory names are valid.
Also if filter_data() could support other charsets like UTF-8, that'd be great. It looks like you intend to already, but that's what's preventing me from using the filter extension at the moment.
Filter will definitely support charsets, although a full blown implementation of this feature is unlikely until PHP6.
A short summary; some filters are already in the TODO:
trim (left/right/custom, using PHP's trim options): Planned for 0.12.0, patch ready
date: Planned for 0.12.0, patch ready
custom charset: planned, no deadline. Discussions about input charset detection and other funny things.
I will put these somewhere in Lukas' wiki ASAP (and in the filter TODO in CVS), so we (at least me) don't have to read this entry for the next 6 months.
Ilia, this extension is so hard to use! Even in VBScript it is easier to test an integer than this:
$foo = filter_input(INPUT_REQUEST, 'someInt', FILTER_VALIDATE_INT, 5, 10);
if ($foo === false)
    throw new Exception();
I want to write:
$foo = filter_get_int($_REQUEST['someInt'], 5, 10);
and that should throw the exception for me. If you don't streamline this process, very few people will want to use your extension.
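The wished-for helper can be sketched as a thin userland wrapper. Note that filter_get_int() below is the commenter's hypothetical name, not a function of the extension, and that the range is passed through the min_range and max_range options of FILTER_VALIDATE_INT.

```php
<?php
// Hypothetical wrapper matching the signature wished for above.
// It is not part of the filter extension.
function filter_get_int($value, $min, $max)
{
    $options = array('options' => array('min_range' => $min, 'max_range' => $max));
    $result = filter_var($value, FILTER_VALIDATE_INT, $options);
    if ($result === false) {
        // Covers non-integers, out-of-range values and missing (null) input.
        throw new InvalidArgumentException("Expected an integer between $min and $max");
    }
    return $result;
}
```

A caller could then write `$foo = filter_get_int(isset($_REQUEST['someInt']) ? $_REQUEST['someInt'] : null, 5, 10);` and let the exception propagate.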
- Something like FILTER_VALIDATE_ENUM or FILTER_VALIDATE_SET where you can check a value against a list of allowed values. Not sure how that would
- Is it possible to have the filter declare an input key as required (i.e. if it's missing, the complete input is invalid)?
I was thinking to add ENUM but a callback is more flexible, or simply:
if (array_diff($input, $set)) is just as fast. However, if many users like to have it, it is easy to restore (I should still have a patch somewhere
About requiring a key, you mean something like ENUM + required key? ENUM + required keyS? what's using in_array?
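The two userland alternatives mentioned here, in_array() for a single value and array_diff() for a list, might look like this. Both helper names are hypothetical.

```php
<?php
// Hypothetical helper: a single value must be one of the allowed values.
function validate_enum($value, array $allowed)
{
    // Strict comparison avoids loose-typing surprises between strings and numbers.
    return in_array($value, $allowed, true) ? $value : false;
}

// Hypothetical helper: every submitted value must appear in the allowed set.
function validate_set(array $values, array $allowed)
{
    // array_diff() returns the submitted values that are NOT in $allowed.
    return count(array_diff($values, $allowed)) === 0;
}
```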
> About requiring a key, you mean something like ENUM + required key? ENUM + required keyS? what's using in_array?
I may have completely misunderstood, so bear with me.
Let's say you're dealing with a GET request and there's a query variable like ?foo=123
In your filter spec (if I've got it right) you might use FILTER_VALIDATE_INT to make sure foo is an integer - something like:
$fspec = array('foo'=>FILTER_VALIDATE_INT);
But - if I've understood right - it doesn't guarantee you that 'foo' actually exists, just that if it does exist, it must be an INT.
Thinking it would be nice to have something like FILTER_VALIDATE_REQD to say the variable must exist in the input e.g.
$fspec = array(
I'd assumed filter_has_var() is currently intended for this purpose
If 'foo' does not exist, the return value or the array element (filter_input_array) will be set to NULL.
If 'foo' is not valid (not an integer), the return value or the array element (filter_input_array) will be set to FALSE.
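The NULL versus FALSE distinction can be tried out without a real request by using filter_var_array() on a sample array, which follows the same convention for array elements. The missing_required() helper is a hypothetical way to turn that convention into a "required key" check.

```php
<?php
// Sample input standing in for $_GET; 'baz' is deliberately absent.
$input = array('foo' => '123', 'bar' => 'abc');
$spec = array(
    'foo' => FILTER_VALIDATE_INT,
    'bar' => FILTER_VALIDATE_INT,
    'baz' => FILTER_VALIDATE_INT,
);
$result = filter_var_array($input, $spec);

// Hypothetical helper: lists the required keys that were not present at all.
function missing_required(array $result, array $required)
{
    $missing = array();
    foreach ($required as $key) {
        // NULL marks a missing variable; FALSE would mean present but invalid.
        if (!array_key_exists($key, $result) || $result[$key] === null) {
            $missing[] = $key;
        }
    }
    return $missing;
}
```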
What about a parameter with a default value, to be used if the value is missing?
$page = filter_input(INPUT_GET, 'page', FILTER_SANITIZE_SPECIAL_CHARS, $options);
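One way to sketch a default value is the "default" option accepted by the validation filters, combined with an explicit check for the missing key. The helper page_from_query() and the choice of an integer page number are assumptions made for the example; it takes the query array as a parameter instead of reading INPUT_GET so it can run standalone.

```php
<?php
// Hypothetical helper: reads a positive page number from a query array,
// falling back to $default when the key is missing or the value is invalid.
function page_from_query(array $query, $default = 1)
{
    if (!isset($query['page'])) {
        return $default;
    }
    $options = array('options' => array('default' => $default, 'min_range' => 1));
    // With the "default" option, a failed validation yields $default
    // instead of false.
    return filter_var($query['page'], FILTER_VALIDATE_INT, $options);
}
```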
Sanitize string/stripped filters do it. It does not remove the contents of the tags though, it could be a new option.
Something I always use myself is something that will validate a list:
- checking that a list is a list of ints only
- checking a list against a list of allowable values
"- checking that a list is a list of ints only"
Already supported; you can ask for or allow an array of values.
"- checking a list against a list of allowable values"
Checking against an enumeration can be handy; let's see if more people would like to have that.
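A sketch of the "list of ints" case, assuming the FILTER_REQUIRE_ARRAY flag is used to apply FILTER_VALIDATE_INT to each element. The wrapper validate_int_list() is hypothetical and treats any invalid element as a failure of the whole list.

```php
<?php
// Hypothetical helper: returns the list converted to integers,
// or false if the input is not an array or any element is not an integer.
function validate_int_list($values)
{
    $result = filter_var($values, FILTER_VALIDATE_INT, array('flags' => FILTER_REQUIRE_ARRAY));
    // A scalar input yields false; an invalid element yields false in its slot.
    if (!is_array($result) || in_array(false, $result, true)) {
        return false;
    }
    return $result;
}
```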
Similar to what was already suggested: an HTML filter which would allow the user to set which HTML tags and attributes are let through, after filtering them for any XSS. I think the string filters convert < to &lt;, so they can't be used in situations where HTML is wanted, like comments or message boards.
Already implemented, use FILTER_SANITIZE_ENCODED.
How about a flag for FILTER_VALIDATE_STRING that allows specifying the minimum and maximum lengths of the string, e.g.
array("filter"=>FILTER_VALIDATE_STRING, "flags"=>FILTER_NULL_ON_FAILURE, "options"=>array("min_length"=>2, "max_length"=>100))
This would mimic the functionality of two calls to strlen() in a callback. Just a thought.
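The "two calls to strlen() in a callback" version might be sketched as below. The factory length_between() is hypothetical, relies on closures (PHP 5.3 and later), and measures length in bytes, not characters.

```php
<?php
// Hypothetical factory: builds a callback that enforces a byte-length range.
// Requires closures (PHP 5.3+). strlen() counts bytes, not characters.
function length_between($min, $max)
{
    return function ($value) use ($min, $max) {
        $length = strlen($value);
        return ($length >= $min && $length <= $max) ? $value : false;
    };
}

$name     = filter_var('Al', FILTER_CALLBACK, array('options' => length_between(2, 100)));
$tooShort = filter_var('A', FILTER_CALLBACK, array('options' => length_between(2, 100)));
```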
How about a whitelist approach also, whereby coders can specify an array of characters they would like to include? I know this can be done with the regex filter, but why not something like a flag to ALLOW_WHITE_LIST=>array('&', '#')
This might be handy for those times when validating you need to do something like #[0-9] or [A-Za-z]*@, or whenever you need to include an odd character or two.
Just a thought
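As the comment notes, this can already be approximated with the regex filter. The sketch below assumes a base set of ASCII letters and digits plus a caller-supplied list of single extra characters; the helper name is made up.

```php
<?php
// Hypothetical helper: allows ASCII letters and digits plus the given
// extra characters (each entry is assumed to be a single character).
function validate_with_extra_chars($value, array $extra)
{
    // preg_quote() escapes characters that would be special in the pattern.
    $class = 'A-Za-z0-9' . preg_quote(implode('', $extra), '/');
    $options = array('options' => array('regexp' => '/^[' . $class . ']*$/D'));
    return filter_var($value, FILTER_VALIDATE_REGEXP, $options);
}
```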