Search engines have long been a handy tool for people trying to find sensitive information or vulnerabilities on the web. When you have a few billion documents indexed, it is inevitable that some things that should remain private inadvertently end up in public directories and get indexed; then it's just a matter of writing a sufficiently creative search query to find that data.
There are even sites that aggregate "interesting" search queries designed to quickly locate sensitive data, such as the Google Hacking Database from Johnny Long, which has queries to find everything from old vulnerable software to credit card numbers.
There have also been attempts to identify things like SQL injection and XSS by locating sites that collect common forms of input and then checking whether that input is validated. A good example can be found on Michael Sutton's blog, where he used Google to generate statistics on the frequency of SQL injection.
But this approach does not really show you the full extent of the exploit; it merely indicates the presence of SQL injection, which then has to be explored further, mostly through trial and error. Well, no more: thanks to Pierre, I've discovered a Google Labs project called "Code Search", which, as the name suggests, indexes publicly available source code. This means that not only can you now easily find exploitable code, you also get its full context, allowing for much nastier exploitation. Let's give it a shot.
To start things off, let's look at our common friend, XSS (Cross-Site Scripting):
The search above finds all instances of code where PHP input variables collected from untrusted sources such as GET/POST/COOKIE/REQUEST are output to the screen via echo or print. That's a fairly good trawl, about 17,500 results. Mind you, not every site is vulnerable, since the code could be filtering the data and storing the result back in the superglobal; however, a brief spot check shows that more than half of the code snippets found do no such thing.
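Code Search accepted regular expressions, so a similar trawl can also be run locally against your own code. The Python sketch below is a hypothetical approximation, not the query used above; the pattern and the find_xss_candidates() helper are illustrative names. It only flags echo/print statements whose output begins with a raw superglobal, so concatenated output is missed, and it cannot tell whether the value was filtered earlier.

```python
import re

# Hypothetical approximation of the XSS search (not the original query):
# echo or print whose output expression starts with a raw superglobal.
XSS_PATTERN = re.compile(
    r"\b(?:echo|print)\s*\(?\s*\$_(?:GET|POST|COOKIE|REQUEST)\b"
)


def find_xss_candidates(php_source: str) -> list[int]:
    """Return 1-based line numbers that echo/print a superglobal directly."""
    return [
        lineno
        for lineno, line in enumerate(php_source.splitlines(), start=1)
        if XSS_PATTERN.search(line)
    ]


sample = """<?php
echo $_GET['name'];
echo htmlspecialchars($_GET['name']);
print($_POST['msg']);
"""
print(find_xss_candidates(sample))  # [2, 4]
```

The escaped line is skipped because the pattern requires the superglobal to follow echo/print immediately.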
Next, let's take a glance at another old favorite, SQL injection:
This time we look for the same unsafe input sources being passed as parameters to SQL queries, which are often executed via functions whose names end in "query". Of course, you could search for something more specific, like mysql_query(), to focus on MySQL users. Another variation would be to take commonly used database wrappers such as PEAR DB and ADOdb and look for their query execution methods. However, even this simplistic search turns up about 3,000 results.
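The same idea applied to SQL injection might look like the sketch below. It assumes a superglobal that appears after a call to a function whose name ends in "query", before the next semicolon on the same line. Both patterns and the scan() helper are hypothetical; the second pattern is the narrower mysql_query()-only variant mentioned above.

```python
import re

# Hypothetical approximation of the SQL injection search: a call to any
# function whose name ends in "query" (mysql_query, pg_query, $db->query, ...)
# followed by a superglobal before the next semicolon on the same line.
SQLI_PATTERN = re.compile(
    r"query\s*\([^;]*\$_(?:GET|POST|COOKIE|REQUEST)\b", re.IGNORECASE
)
# Narrower variant that only targets the MySQL extension.
MYSQL_ONLY = re.compile(r"\bmysql_query\s*\([^;]*\$_(?:GET|POST|COOKIE|REQUEST)\b")


def scan(pattern: re.Pattern, lines: list[str]) -> list[int]:
    """Return 1-based indexes of the lines the pattern matches."""
    return [n for n, line in enumerate(lines, start=1) if pattern.search(line)]


sample = [
    'mysql_query("SELECT * FROM users WHERE id = " . $_GET["id"]);',
    '$db->query("SELECT * FROM t WHERE name = \'$_POST[name]\'");',
    '$id = (int) $_GET["id"]; mysql_query("SELECT * FROM users WHERE id = $id");',
]
print(scan(SQLI_PATTERN, sample))  # [1, 2]
print(scan(MYSQL_ONLY, sample))    # [1]
```

The third line is not flagged because the superglobal is cast to an integer before the query call, which is exactly the kind of context Code Search made visible.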
Perhaps one of the more dangerous security exploits is remote code execution; let's check it out:
Wow! I don't know whether to be scared or impressed by the fact that there are nearly 14,000 results for what amounts to a remote shell in dozens of pieces of software. I can only hope that people running this code have disabled allow_url_fopen; otherwise they had better do it quickly. The only silver lining is that far fewer people appear willing to trust eval() blindly: a search for eval([user_input]), lang:php \s+eval\s*\(\s*\$_(GET|POST|COOKIE|REQUEST), found only 4 entries.
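The eval() query quoted above can be reused almost verbatim as a Python regex, minus the lang:php filter. The include/require pattern is an assumption: the remote code execution query itself isn't reproduced here, and given the allow_url_fopen remark this sketch guesses that it targeted include/require statements fed directly from user input.

```python
import re

# The eval() query quoted above, minus the "lang:php" filter.
EVAL_QUERY = re.compile(r"\s+eval\s*\(\s*\$_(GET|POST|COOKIE|REQUEST)")

# Hypothetical include/require pattern (an assumption, not the original query):
# include/require(_once) whose argument starts with a superglobal.
INCLUDE_PATTERN = re.compile(
    r"\b(?:include|require)(?:_once)?\s*\(?\s*\$_(?:GET|POST|COOKIE|REQUEST)\b"
)

samples = {
    "rfi": "include($_GET['page']);",
    "rfi_once": "require_once $_REQUEST['mod'] . '.php';",
    "eval": "    eval($_POST['code']);",
    "safe": "include 'header.php';",
}
for name, code in samples.items():
    # Print the sample name, then whether each pattern matched it.
    print(name, bool(INCLUDE_PATTERN.search(code)), bool(EVAL_QUERY.search(code)))
```

The quoted query requires at least one whitespace character before eval. As a result, an unindented eval( at the very start of a line slips through.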
We could of course also search for preg_replace() calls exploitable through the /e modifier, but that would require a far trickier regex than I want to write.
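For the curious, a rough first cut at the preg_replace() case might look like this. The pattern is hypothetical and deliberately limited. It only recognizes a quoted, "/"-delimited pattern literal whose modifiers include e. It does not check whether user input reaches the subject, so every hit still needs a manual look.

```python
import re

# Rough, hypothetical first cut: preg_replace() whose first argument is a
# quoted "/"-delimited pattern literal with "e" among its modifiers.
# Patterns built from variables or using other delimiters are not caught.
PREG_E = re.compile(
    r"""\bpreg_replace\s*\(\s*(['"])/(?:[^/\\]|\\.)*/[a-zA-Z]*e[a-zA-Z]*\1"""
)

risky = "preg_replace('/(.*)/e', 'strtoupper(\"$1\")', $_GET['x']);"
harmless = 'preg_replace("/foo/i", "bar", $s);'
print(bool(PREG_E.search(risky)), bool(PREG_E.search(harmless)))  # True False
```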
(Additional "fun" queries)
One more common mistake is including user input inside header() calls, thereby allowing header injection, cache poisoning and other fun attacks; let's check how frequent those problems are. First, I searched for code where user input appears inside redirect headers, probably one of the most common places where injection is possible.
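A local approximation of that redirect search could look like the following. The pattern is hypothetical and is not the original query. It flags header() calls whose literal starts with "Location:" and that mention a superglobal later on the same line.

```python
import re

# Hypothetical approximation of the redirect search (not the original query):
# a header() call whose literal starts with "Location:" and that mentions a
# superglobal before the next semicolon on the same line.
REDIRECT_PATTERN = re.compile(
    r"""\bheader\s*\(\s*["']Location:[^;]*\$_(?:GET|POST|COOKIE|REQUEST)\b""",
    re.IGNORECASE,
)

lines = [
    "header(\"Location: \" . $_GET['url']);",
    "header('Location: ' . $_REQUEST['next']);",
    "header('Location: /index.php');",
]
flagged = [n for n, line in enumerate(lines, start=1) if REDIRECT_PATTERN.search(line)]
print(flagged)  # [1, 2]
```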
Not bad: 2,000 hits. Fortunately, sending \n through header() is no longer possible in newer versions of PHP. That does reduce the amount of damage that can be caused, because you can no longer inject arbitrary headers into most of the code found. However, there are plenty of gems such as this:
There are also plenty of other things we could search for, such as the use of $_SERVER values like PHP_SELF, PATH_INFO, HTTP_USER_AGENT, QUERY_STRING and many others. I am sure you can craft your own Google queries to find those. You can also search for old-style input sources such as HTTP_GET_VARS and the like, so the number of interesting queries is endless. And let's not forget about $_FILES, which can also be misused if passed around without proper validation.
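As a starting point for your own queries, here are hypothetical patterns for the other sources listed above, written as Python regexes. They only locate the values. Whether each use is actually dangerous still has to be judged by hand.

```python
import re

# Hypothetical patterns for the other input sources mentioned above.
SERVER_KEYS = ("PHP_SELF", "PATH_INFO", "HTTP_USER_AGENT", "QUERY_STRING")
EXTRA_SOURCES = {
    # $_SERVER['PHP_SELF'], $_SERVER["QUERY_STRING"], $_SERVER[PATH_INFO], ...
    "server": re.compile(
        r"""\$_SERVER\s*\[\s*['"]?(?:""" + "|".join(SERVER_KEYS) + r""")['"]?\s*\]"""
    ),
    # Old-style (pre-superglobal) arrays such as $HTTP_GET_VARS.
    "old_style": re.compile(
        r"\$HTTP_(?:(?:GET|POST|COOKIE|SERVER|ENV|SESSION)_VARS|POST_FILES)\b"
    ),
    # Any use of $_FILES; uploaded file data needs its own validation.
    "files": re.compile(r"\$_FILES\b"),
}


def classify(line: str) -> list[str]:
    """Return the names of every pattern that matches the given line."""
    return [name for name, rx in EXTRA_SOURCES.items() if rx.search(line)]


print(classify("echo $_SERVER['PHP_SELF'];"))  # ['server']
```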
So what does it all mean?
Well, for one, developers now have another tool (if grep is not to your liking) to examine their code for common mistakes, which hopefully translates into safer code for all. It also means that it is now easier to locate common mistakes in indexed code. You may want to think twice before putting your code online if you are unsure about its security, or at least before making it visible to search engines.
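If you'd rather not rely on a search engine, a short script can walk a source tree and apply the same kind of regexes. This amounts to grep with a built-in rule set. A minimal sketch follows. The scan_tree() helper and its two rules are hypothetical examples, not a complete checker.

```python
import re
from pathlib import Path

# Hypothetical rule set; extend it with whatever patterns you care about.
RULES = {
    "xss": re.compile(r"\b(?:echo|print)\s*\(?\s*\$_(?:GET|POST|COOKIE|REQUEST)\b"),
    "eval": re.compile(r"\beval\s*\(\s*\$_(?:GET|POST|COOKIE|REQUEST)\b"),
}


def scan_tree(root: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, rule name) for every hit in *.php files."""
    hits = []
    for path in sorted(Path(root).rglob("*.php")):
        # Undecodable bytes are replaced rather than aborting the scan.
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for name, rx in RULES.items():
                if rx.search(line):
                    hits.append((str(path), lineno, name))
    return hits
```

For example, scan_tree("path/to/project") returns (file, line, rule) tuples, file by file, that you can print or feed into a review checklist.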
I can only hope that the presence of this tool will make developers pay closer attention to the security of their applications, because with Google (and eventually probably Yahoo as well) on their case, there is nowhere to hide.
Ilia, what do you think about the code samples with bad programming style in the PHP documentation?
For example, see page http://www.php.net/manual/en/faq.html.php (Chapter 56. PHP and HTML).
In the last listing, the author outputs unverified data ($_GET['width'], $_GET['height'], $_SERVER['QUERY_STRING']) directly into the HTML; this is a source for those 17,500 results with "our common friend, XSS".
Please file documentation bug reports about those issues and someone will make the necessary corrections to the manual. The manual contains many old examples, written before security was such an important issue, and some less than ideal code samples are still present.