There are many instances where you may want to see what kind of PHP settings other people are using, and what better source of this information than the phpinfo() page?
The problem with finding a reliable pool of such pages is that a basic search often returns many blog, forum, bugs.php.net and similar entries, which are copy & paste outputs from users. This may be fine in some instances, but what if you just want the real phpinfo() pages? The answer is surprisingly simple.
To get the data you simply need to search for an element always present on the phpinfo() page, such as the "Zend Scripting Language Engine" string, and then for a user-agent containing the indexing bot of your favorite search engine. Among the data displayed by the phpinfo() page is a header containing the browser-provided User-Agent field, which is always populated by respectable crawlers such as the ones used by Google and Yahoo. The presence of this value guarantees that the page shown will be an actual page, rather than a copy and paste where the field will be populated by the user's own browser.
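As a rough illustration of the idea above, here is a minimal Python sketch that builds such a search phrase and checks whether a page's text carries a crawler's User-Agent. The function names are made up for this example, and the crawler tokens ("Googlebot" and "Yahoo! Slurp") are assumptions about what each bot sends; adjust them to whatever the crawler you care about actually reports.

```python
# Sketch only: names below are illustrative and not taken from the post.

PHPINFO_MARKER = "Zend Scripting Language Engine"  # string quoted in the text

# Assumed User-Agent substrings for the two crawlers mentioned above.
CRAWLER_TOKENS = {
    "google": "Googlebot",
    "yahoo": "Yahoo! Slurp",
}


def build_query(engine: str) -> str:
    """Return a query with both phrases quoted for exact matching."""
    return f'"{PHPINFO_MARKER}" "{CRAWLER_TOKENS[engine]}"'


def looks_like_live_phpinfo(page_text: str, engine: str) -> bool:
    """True if the text has the phpinfo() marker and the crawler's token."""
    return PHPINFO_MARKER in page_text and CRAWLER_TOKENS[engine] in page_text


if __name__ == "__main__":
    print(build_query("google"))
```

A pasted phpinfo() dump would normally show the poster's own browser in the User-Agent field, so the second check fails for it, while a page fetched by the crawler itself passes.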
Here are the sample search queries for Google and Yahoo!.
Both of the search engines display tens of thousands of results; however, Yahoo seems to find three times as many pages as Google (118,000 entries vs Google's mere 38,000). Random spot checks show that nearly every result is valid, with most invalid ones attributable to removed phpinfo() pages. Those often have a "cached" version, however, so the data is still around. Unfortunately, both search engines cap the result set at 1,000, so you'd need to add additional filters to get around this limit.
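One possible way to work around the 1,000-result cap is to split the search into several narrower queries, each with one extra filter term, and merge the results yourself. The sketch below only generates the query strings; the helper name and the example filter terms (PHP version headings) are hypothetical, and nothing here guarantees that each slice stays under the cap.

```python
# Sketch only: the filter terms are hypothetical examples, not a tested recipe.

def partition_queries(base_query: str, filters: list[str]) -> list[str]:
    """Append one extra filter term to the base query per slice."""
    return [f"{base_query} {term}" for term in filters]


if __name__ == "__main__":
    base = '"Zend Scripting Language Engine" "Googlebot"'
    # Assumed filters: phrases that differ between phpinfo() pages.
    for query in partition_queries(base, ['"PHP Version 4"', '"PHP Version 5"']):
        print(query)
```

Slices can overlap, so the URLs returned for each query would still need to be de-duplicated.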
Aside from the phpinfo() data you can also find some interesting data on the frequency of the crawler's visits. Both crawlers send the "If-Modified-Since" header, which lets the server report whether a previously fetched page has changed since the last visit so that unchanged pages need not be downloaded again. When you look at the cached phpinfo() output you can compare that value to the "retrieval date" shown at the top of all the cached pages.
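The comparison can be sketched in a few lines of Python. This assumes the "If-Modified-Since" value is a standard HTTP date and that you have already turned the cached page's retrieval date into a timezone-aware datetime yourself, since its display format varies by search engine. The function name is made up for this example, and the result is only an estimate of the gap between visits.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def revisit_interval(if_modified_since: str, retrieved_at: datetime) -> timedelta:
    """Time between the If-Modified-Since value and the cache retrieval date."""
    previous = parsedate_to_datetime(if_modified_since)
    if previous.tzinfo is None:
        # Treat a date without zone information as UTC.
        previous = previous.replace(tzinfo=timezone.utc)
    return retrieved_at - previous


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    header = "Tue, 01 Jan 2008 00:00:00 GMT"
    cached = datetime(2008, 1, 8, 12, 0, tzinfo=timezone.utc)
    print(revisit_interval(header, cached))
```

With the hypothetical values above, the gap works out to seven and a half days.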
While there are plenty of harmless uses of the data displayed, in many instances this can reveal far more information to outside sources than the user/hoster desires, and can be used for evil rather than good. The general advice has been that phpinfo() pages should not be left out in the open and should only be created temporarily for debugging purposes. This tip is especially important for users running older versions of PHP, where the output is vulnerable to XSS and CSRF.