Quicksearch
Calendar
Syndicate This Blog
|
Monday, December 20. 2004phpBB & unserialize bug
As most of you hopefully know, a few days ago PHP 4.3.10 and 5.0.3 were released in response to several vulnerabilities that were discovered. Two of those involved bugs in unserialize function that is used to re-create PHP variables based on an encoded string normally generated by serialize() function. This functionality allows storage & retrieval of PHP variables from outside PHP.
While these two problems are quite serious, they can normally only be exploited locally, meaning that you'd need an account with access to PHP on the server. However, several applications such as phpBB store serialized data inside cookies meaning that anyone accessing those applications will be able to supply their own serialized string. By tinkering with this string it is possible to make an exploit capable of doing things like theft of passwords. In response to this development phpBB developers decided to put the following statement out "This is not a phpBB exploit or problem, it's a PHP issue and thus can affect any PHP script which uses the noted functions". First of all this is not a correct analysis of the situation, the only applications vulnerable are the ones that expose serialized data to the user allowing them to modify it, like phpBB does. Even if the bug in serialized code did not exist, there are still issues with exposing serialized data to the user without validation. There is nothing to prevent someone from generating a very complex data structure that would take long time to parse and use it as a means of launching a resource depletion attack. It also means that by modifying the serialized string it is possible to inject all manner of data into the script which may lead to exploits due to uninitialized variables, etc... Ultimately it comes down to blindly trusting the user with your data and expecting not to get penalized for it. While unserialize is certainly a bug in PHP, the fact it is remotely exploitable is the fault of script writers who do not take the time to properly validate user input. Comments
Display comments as
(Linear | Threaded)
> Ultimately it comes down to blindly trusting the user with your data and
> expecting not to get penalized for it. Well said. ![]()
> Ultimately it comes down to blindly trusting the user with your data and
> expecting not to get penalized for it. Actually it dosen't. How do you validate a serialized string? When the problem is as simple as this: $serialized = 's:30000:"text";'; echo unserialize($serialized); The only way to validate a serialized string as far as I can tell is to unserialize it into an array, and then validate the components of the string (ie the actual data). The serialized format is a way of storing data. The system to unserialize it should then validate the data it is unserializing - which is why this is a PHP bug and has nothing to do with trusting use data. It is no different to a bug that caused addslashes() to break on a certain string.
Actually you can easily validate the data. Since you generated the data via the serialize() function you know what it is, so you can generate a sha1 or md5 hash of it. Before actually deserializing the data you can check that the hash of the data matches the stored hash, which can be found in a database or another non-user accessible location.
You are correct, that would be a way to validate the data - assuming it was necessarily created by PHP, on this server, but I still think it misses the point. There is a difference between validating data - which the PHP script should do, and validating the message format itself, which PHP should do.
Your solution for example might not always work. What if you wanted to serialize data inside javascript and send that to PHP? I believe that was something Harry Fuecks was considering for jspan for example.
Well, if you are using a database then the data can be created else where and you can still store it's hash. PHP cannot possibly validate the serialized string, since it does not know what can be in it. If you tell it to create a 100meg text string, it'll happily do it (unless memory_limit prevents it from doing so).
Javascript can generate md5 and sha1 just as well as PHP, the trick would be how to store the hash securely. Perhaps some intermidiate request passing the hash to the app or encryption of the data preventing tinkering.
While javascript can generate an md5, so can an attacker. They just create the md5 of whatever they want to serialize. The md5 technique only works if you know exactly what data it is you are getting, the point of the js is that you don't.
I understand your point about unserializing 100MB, but this seems to be a bit of a red herring to the question here, we have built in methods proventing that amount of data getting into the system. Despite that, I get the idea of doing a strlen() or something like that on the data - but that still wouldn't help here, a 30 character string can break it. I definitely understand the idea of validation, but it occurs to me that unserialize() is part of the process of that validation. Before now, my understanding is the function would return false on invalid input, just in the same way is_array() would. Perhaps we need an is_serialized() function?
Well, first of all Javascript generation of PHP data is not what phpBB or other apps outlined in the advisory are doing. The serialized data they generate is made by PHP for PHP within the same application.
If md5 does not work, you can use encryption and if serialized data cannot be safely generated via Javascript, pass the data as GET/POST params. Unserialize can only perform syntax validation, where by it checks that it can understand the format listed. That will not prevent you from resource exhaustion attacks where the input in perfectly valid.
Using a hash to check the integrity of message should be done where applicable.
Where this is possible, it would be far wiser to not protect the data in a complex way, like using a full serializer. Rather, encode the data as a simple string and know that its integrity can not be trusted. If you have, say foo=bar,bar=baz,etc. style string, at least checking what it contains poses no security problem because it is at least very easy to securely handle what it contains. The only real way to handle the main problem of storing data securely would be to have a persistent "session", say, database record, and you only hand user a long, say 128-bit, key to the data which you would use verbatim to retrieve the real data from the database server. If user tampers with this ID, the data is only lost. But no other malice may result.
I tend to prefix the serialized string with a HMAC-SHA1.
Thou I do try and avoid using serialize functions for anything serious.
> It also means that by modifying the serialized string it is > possible to inject all manner of data into the script which > may lead to exploits due to uninitialized variables, etc...
This is also not true and you don't differentiate between the data and the format of the data. A serialized string is a data format. The data inside the string needs validating of course - but you can't possibly validate it until you have unserialized it. Arguments like preventing users from submitting serialized data because of possible bad data is just wrong, of course that data needs validating. What it is *not* up to the PHP developer to do is vaidate the message format. A better analogy might be XML-RPC. If you use an XML-RPC function to decode a request, you expect that not to break on a badly formed XML-RPC request and you would not blame the PHP developer for the request being bad. You would however blame the PHP developer for not validating the information in the request. Hence the difference, data format v data.
Yawn, phpbb devs aren't bright with a mistake like that, there is simply no excuse.
It's a basic security flaw giving user modifiable data that is processed. The only thing a session should contain is an opaque key. It's obvious to anyone with 1/2 a brain. Of course it's possible to checksum serialized data. MD5 the string against a locally stored checksum before un-serializing it. It's all very obvious, while yes, the fault is in php. It's expolitability was allowed by security ignorant coders, for this mistake is so very basic it's funny.
Hackie,
Your missing the point. Serialization is a *data format* that needs decoding before it can be validated. The error did not occur because the data was then not validated, the error occred because the inbuild decoding system was broken. Just look at: http://ca.php.net/manual/en/function.serialize.php and the amount of uses there for serialize where the script does *not* know what data to expect. One of the main uses is to pass data as a hidden form variable without having to store it in the database.
Do you understand what you are saying? Do you understand checksumming!?
You checksum the serialised string. If you think you need to de-serlialize data before checksumming perhaps you're over your head in this discussion?
You have to perform some sort of validation before you unserialize(), the fact that a serialised object that implements __wakeup() would cause PHP code to execute, which can cause allsorts of problems if the serialised string has been tampered with.
As I mentioned earlier using a HMAC, you send both the serialised string and HMAC to the client. but when the its POSTed back, verify that the serialised post data still hashes to the same HMAC. Its fairly trivial to get a pure PHP HMAC implementation, so has no external dependancies. If you want to verify the serialised string elsewhere securely then you have to use digital signatures with public keys. But at this point I'd be tempted serialising to XML with XML-Signatures.
Think Ilia is right that an app like phpBB _can_ validate a serialized string _before_ unserializing it, simply by creating a hash from the serialized string and passing that along with the string to the client. On the next request the server can test whether the string was tampered with.
Another problem exists with remote, untrusted, systems which want to use PHP's serialization data format to send data. Why do that? Because, in principle, it's an easy way to talk to PHP, given that PHP already has built in parsers and generators. I put up some notes here about projects using PHP's serialized format (plus problems in using it) - http://jpspan.sourceforge.net/wiki/doku.php?id=php:serialization#implementations . It's also (optionally) used in JPSpan but have given up using it by default, as it's problematic when remotely encoding multibyte characters.
Harry, you can't pass the hash to the client because then an "attacker" could just send back a new hash that matches the new data structure they have sent. The has thing only works if you store that in a database, which partly defeats the point.
Ren, you are right though about _wakeup and that is the first good point re serialization and something I did not realise happened upon unserialization - thats the reason then to not run unserialize on untrusted data. Personally I think this is a major weakness in the function - as I understand it one of its major uses is to pass data from one page to another so you do *not* have to store it in a database for example. For this purpose the function is thus completly useless - and many of the uses on the php.net page are security problems. If this function (which I still consider a messaging format) is insecure like this, perhaps there should be note to that affect on the PHP function page? Hackie, I perfectly understand the idea of checksum - but having to checksum a serialized data string requires the script to know the data it is getting (exactly) which is massivly limits the use of the string. Its like being told to validate an XML-RPC request by doing a md5 on it - well the point of a messaging format is you don't know exactly what you are going to get, but once you decode it there are certain validations you are going to do.
In some situations validating using checksum maybe impossible and those situations require a different mechanism for passing data to avoid security issues.
However, this is NOT the case with most instances where serialized data is exposed to the user. I mean there is really no good reason to store them in cookies or even GET/POST strings instead of database or a file associated with the session.
'Harry, you can't pass the hash to the client because then an "attacker" could just send back a new hash that matches the new data structure they have sent. The has thing only works if you store that in a database, which partly defeats the point.'
Here's a simple way to do it; // Serialize the data structure (whatever it is) $serialized = serialize($dataHere); $uniqueValueOnlyKnownByServer = 'Fido'; $hash = md5($serialzied.$uniqueValueOnlyKnownByServer); You can now send the serialized data and the hash to the client (in say hidden form fields). On the next request they send you them back, and also recreate the hash e.g.; $hash = md5($_POST['serialized'].$uniqueValueOnlyKnownByServer); if ( $hash != $_POST['hash'] ) die('Data has been tampered with'); Because the attacker doesn't know the unique value, he can't create a valid hash.
http_build_query() & parse_str() are probably a better method of serialising, though you do lose the datatype information.
But then you're running parse_str() on user input, which is just as bad as running unserialize on user input, isn't it?
parse_str() happens on unvalidated strings all the time, for population of the $_GET & $_POST arrays, validation is then expected afterwards.
parse_str() doesnt contain any mechanisms for calling PHP code whereas unserialise() does via __wakeup() so in my mind is "better". another advantage is that is trivial to tack-on a checksum to the string built by html_build_query(), and check it before executing parse_str() $str = html_build_query($stuff); $str .= '&checksum='.sha1($secretKey.$str); and then something like... parse_str($str, $arr); isset($arr['checksum']) && $arr['checksum'] == sha1($secrectKey. ...$str-with-checksum-removed.. ) to check if been tampered with.
well I dont think its feasible to require all this md5'ing in order to be able to call unserialize() on an untrusted string. sounds to me like it would make more sense to have an unserialize() function that does work safely with untrusted data (potentially not allowing objects ..).
Yeah, it's good to cripple functions because some people are too goddamned lazy to check user input.
Derick, it is not necessarily about people being lazy - there are certain potential times when validation can't be done; although I agree this thread raises some good suggestions on ways it can be done in some situations.
I do however thing it should be made clear that serialize() is not about creating a data format - this is something the php.net site suggests it is. Clearly this is not the case.
err this is not about crippling functions. the question is how to verify this kind of stuff with minimal effort. it seems to be that validation in this case would be fairly complex since you would have to write a parser (same goes for a simplified eval() that could come in handy to be able to use php code instead of serialize(), but yeah you are going to beat me over the head even more for a var_import() I know). Eitherway it sounds to me like a common enough useage scenario for either unserialize() to support it directly or atleast providing a C method to check things (instead of a userland implementation).
No, you do not need to write a parser, you can use a checksum as writen about above.
Yup, we need an is_serialized() function and ability to prevent objects running _wakeup()
|
Categories
|
Picked up from Derick Rethans about PHP Fud which makes a better summary about this FUD about than I was typing. Everybody who thinks that the Santy.A worm uses one of the security problems addressed in PHP's latest bugfix releases...
Tracked: Dec 24, 15:47
Tracked: Dec 28, 05:04
I have been reading about PHP in the press and a comment from Jan Schneider. I'm thinking the reason why people are assuming that the phpBB exploit would affect projects like Horde is due to them not understanding that the...
Tracked: Dec 30, 16:52