Note: This article refers to an older version of Acunetix.
The next version of Acunetix Web Vulnerability Scanner (version 7) will contain a much improved HTTP stack. During development, we wanted to test the new HTTP stack on as many sites as possible to make sure we didn’t introduce any bugs.
Alexa, a web information company, maintains a CSV file containing the top 1,000,000 sites on the internet. We used this list to test the new HTTP stack. While testing, we also thought it might be useful to gather all the information and compile some statistics. After all, who doesn’t like statistics and graphs?
From this list, 951,990 records were valid and returned a response. The remaining records were removed because they were invalid (some entries in the Alexa list may be invalid), because the hostname could not be resolved, or because the site was unresponsive during testing. Some other entries were removed because multiple domains redirected to the same domain, which was already in the database. The results were stored in a Microsoft SQL Server 2008 database, including the complete HTTP headers and body of each response. The data was split into several tables (headers, links, metas and responses) so that it could be queried more easily, and we also enabled Full-Text Search on the database. In the end, the database was around 50 GB in size.
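The filtering and de-duplication described above can be sketched as follows. This is a minimal illustration, not our actual pipeline: it assumes each Alexa entry has already been fetched and reduced to a hypothetical `(rank, domain, final_host)` tuple, where `final_host` is the host reached after following redirects, or `None` if the entry was invalid, did not resolve, or did not respond. Entries are processed in rank order, so when several domains redirect to the same host, only the highest-ranked one is kept (also an assumption).

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical record shape: (Alexa rank, domain, host reached after
# following redirects, or None if invalid / unresolvable / unresponsive).
Record = Tuple[int, str, Optional[str]]


def filter_and_dedupe(records: Iterable[Record]) -> List[Record]:
    """Drop unreachable entries and entries that redirect to an already-kept host."""
    seen_hosts = set()
    kept: List[Record] = []
    # Walk the list in rank order, so the highest-ranked domain wins.
    for rank, domain, final_host in sorted(records, key=lambda r: r[0]):
        if final_host is None:
            continue  # invalid, unresolvable or unresponsive
        host = final_host.lower()
        if host in seen_hosts:
            continue  # another domain already redirected to this host
        seen_hosts.add(host)
        kept.append((rank, domain, final_host))
    return kept


sample = [
    (1, "example.com", "www.example.com"),
    (2, "example.net", "www.example.com"),  # redirects to a host we already have
    (3, "dead.example", None),              # did not resolve
    (4, "example.org", "example.org"),
]
for row in filter_and_dedupe(sample):
    print(row)
```

Only `example.com` and `example.org` survive: `example.net` is dropped as a duplicate redirect and `dead.example` as unreachable.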
Top Web Servers
To start off, I wanted to see which web servers are the most popular among the top 1M sites on the internet. I used the value of the ‘Server’ response header to compute this. Not all sites return a Server header; in fact, 11,100 sites (1.16%) didn’t return one. This was quite surprising, as I was expecting a much higher percentage of websites to hide the make and version of the web server they run.
| Web server | Sites | Share |
|---|---|---|
| Google Web Server | 25647 | 2.79% |
| Apache-Coyote (mostly Tomcat) | 5350 | 0.58% |
| YTS (Yahoo! Traffic Server) | 2662 | 0.29% |
| WebSphere Application Server | 140 | 0.02% |
| Sun GlassFish Enterprise Server | 96 | 0.01% |
As you can see from the table above, Apache (first place) and IIS (second place) together account for 88.5% of all web servers.
The nginx web server is a distant third, with only 4.63%, but it is quickly gaining popularity: I’ve seen it used a lot on high-traffic websites nowadays, typically as a reverse proxy server.
Google Web Server placed fourth, with 2.79%. I’ve combined the GFE and GWS Server strings: sites and blogs hosted on Blogspot return GFE, while Google’s own sites return GWS.
The LiteSpeed web server took fifth place.
I’ve also created a Wordle image to better represent this information. The image is not 100% accurate, since I had to scale down the numbers for Apache and IIS; otherwise, the image would contain only those two web servers and the other names would be unreadable.
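Turning raw Server strings into rows like the ones above, including the GFE/GWS merge, can be sketched as follows. This is an illustration rather than the query we actually ran: only the GFE/GWS merge comes from the text, while the rest of the prefix table is hypothetical. Missing headers are passed in as `None` and reported as their own row, and percentages are computed over all responses passed in.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Hypothetical prefix table mapping raw Server-header prefixes to family
# names. Order matters: more specific prefixes (such as "apache-coyote")
# must come before more general ones ("apache").
SERVER_FAMILIES = [
    ("gfe", "Google Web Server"),  # Blogspot-hosted sites
    ("gws", "Google Web Server"),  # Google's own sites
    ("apache-coyote", "Apache-Coyote (mostly Tomcat)"),
    ("yts", "YTS (Yahoo! Traffic Server)"),
    ("apache", "Apache"),
    ("microsoft-iis", "Microsoft IIS"),
    ("nginx", "nginx"),
    ("litespeed", "LiteSpeed"),
]

NO_HEADER = "(no Server header)"


def server_family(server: Optional[str]) -> str:
    """Map a raw Server header value (None if absent) to a family name."""
    if server is None or not server.strip():
        return NO_HEADER
    value = server.strip().lower()
    for prefix, family in SERVER_FAMILIES:
        if value.startswith(prefix):
            return family
    return "Other"


def family_shares(servers: Iterable[Optional[str]]) -> List[Tuple[str, int, float]]:
    """Return (family, count, percent of all responses), most common first."""
    counts = Counter(server_family(s) for s in servers)
    total = sum(counts.values())
    return [(f, c, round(100.0 * c / total, 2)) for f, c in counts.most_common()]
```

For example, `family_shares(["Apache/2.2", "Apache/1.3", "GFE/2.0", None])` puts Apache first with 2 sites (50.0%). Prefix matching keeps the rules easy to audit, at the cost of missing servers that put a custom string in front of the product name.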
Apache version distribution
Next, I wanted to see how many sites are still using the old Apache version 1. Although some websites still run Apache 1, most of them have already switched to version 2. According to Apache’s website, Apache 1 was last updated on 2008-01-19. The same situation applies to Apache 2.0.x.
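Bucketing sites by version means pulling the version out of strings such as `Apache/2.2.3 (CentOS)`. Here is a minimal sketch, assuming the `Product/version` token comes first in the header. That is common but not guaranteed, and Apache servers configured with `ServerTokens Prod` send only `Apache`, so no version can be recovered. The helper name `major_minor` is hypothetical.

```python
import re
from typing import Optional

# Matches a leading "Product/1.2.3"-style token in a Server header.
_VERSION_RE = re.compile(r"^\s*([A-Za-z][\w.-]*)/(\d+)(?:\.(\d+))?")


def major_minor(server: str, product: str) -> Optional[str]:
    """Return "major.minor" (or just "major") if `server` starts with `product`/version."""
    match = _VERSION_RE.match(server)
    if not match or match.group(1).lower() != product.lower():
        return None  # different product, or no version advertised
    major, minor = match.group(2), match.group(3)
    return f"{major}.{minor}" if minor is not None else major
```

The same helper works for IIS strings: `major_minor("Microsoft-IIS/6.0", "Microsoft-IIS")` returns `"6.0"`, while `Apache-Coyote/1.1` is not mistaken for Apache because the whole product token is compared.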
Microsoft IIS version distribution
Moving on to Microsoft web servers, IIS 6 is the most commonly used version; it ships with Windows Server 2003. IIS 7 (shipped with Windows Server 2008 and later) is starting to gain popularity as well, but slowly. Strangely enough, some websites still seem to run Microsoft-IIS version 4 (or are these fake Server headers? I remember that mod_security was at some point configured with SecServerSignature “Microsoft-IIS/4.0”).
Unix vs Windows
The next table shows the Unix vs Windows distribution. To count Windows servers, I added the number of IIS web servers to the number of servers that return Win32 or Windows in the Server header. Unix was more complicated: I had to add the servers returning ‘Unix’ to all the other Unix flavors, such as FreeBSD and Mac OS X/Darwin, and to all the popular Linux distributions. I ended up with a SQL query like this:
SELECT COUNT(value) cnt FROM headers WHERE name='server' and ( value like '%Unix%' or value like '%Red Hat%' or value like '%CentOS%' or value like '%OVH%' or value like '%Fedora%' or value like '%SUSE%' or value like '%Debian%' or value like '%Turbolinux%' or value like '%Ubuntu%' or value like '%Mandriva%' or value like '%Trustix%' or value like '%Gentoo%' or value like '%Slackware%' or value like '%Linux%' or value like '%SunOS%' or value like '%FreeBSD%' or value like '%Darwin%' or value like '%OpenBSD%' or value like '%Mac OS X%' or value like '%OS/2%' )
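To try the shape of this query without a SQL Server instance, here is a minimal sketch using Python’s built-in `sqlite3` module and a handful of made-up rows. The `headers` table with `name`/`value` columns mirrors the query above; the `response_id` column, the sample rows and the shortened token lists are hypothetical. SQLite’s `LIKE` is case-insensitive for ASCII letters, similar to SQL Server under a case-insensitive collation. The Windows rule from the text (IIS plus Win32/Windows) is expressed as one `OR` query, so a header matching several tokens is counted once.

```python
import sqlite3

# Hypothetical in-memory stand-in for the "headers" table: one row per
# response header, with lowercase header names, as the query above assumes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE headers (response_id INTEGER, name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO headers VALUES (?, ?, ?)",
    [
        (1, "server", "Apache/2.2.3 (CentOS)"),
        (2, "server", "Apache/2.2.9 (Unix)"),
        (3, "server", "Microsoft-IIS/6.0"),
        (4, "server", "Apache/2.2.11 (Win32) PHP/5.2.9"),
        (5, "server", "nginx"),
    ],
)

# Shortened versions of the token lists described in the text.
UNIX_TOKENS = ["Unix", "Red Hat", "CentOS", "Debian", "Ubuntu", "Linux",
               "FreeBSD", "Darwin", "Mac OS X"]
WINDOWS_TOKENS = ["IIS", "Win32", "Windows"]


def count_matching(tokens):
    """Count Server headers whose value contains any of the given tokens."""
    # Build "value LIKE ? OR value LIKE ? ..." with bound parameters
    # instead of pasting the tokens into the SQL string.
    clause = " OR ".join("value LIKE ?" for _ in tokens)
    sql = f"SELECT COUNT(value) FROM headers WHERE name = 'server' AND ({clause})"
    return conn.execute(sql, [f"%{t}%" for t in tokens]).fetchone()[0]


print("Unix:", count_matching(UNIX_TOKENS))        # Unix: 2
print("Windows:", count_matching(WINDOWS_TOKENS))  # Windows: 2
```

The bare `nginx` row matches neither list, which illustrates why many servers cannot be assigned to an operating system at all.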
Linux vs other operating systems
Again, this is hard to determine, since many operating systems return only ‘Unix’, or nothing more specific, in their Server header. However, based on what I can count, Linux is the clear winner. FreeBSD is not that popular anymore, although a few years ago it was very popular on high-traffic websites. Linux is in a better position nowadays, especially since the 2.6 kernel branch was released.
Next, I wanted to find the most popular Linux distributions. CentOS, Debian and Red Hat are quite close to each other, with CentOS currently in the lead. CentOS is derived from Red Hat sources and is run entirely by volunteers. Unfortunately, a few months ago the project had some problems: it seems that Lance Davis held sole control of the centos.org domain, with no deputy. That’s pretty scary for the most popular Linux distribution for web servers. The problem has since been solved.
Another Wordle visualization for Linux distributions.
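Detecting the distribution comes down to looking for a known name inside the Server header, typically in the parenthesized part such as `Apache/2.2.3 (CentOS)`. A minimal sketch, reusing the distribution names from the Unix query above (OVH, a hosting provider, is left out). Returning the first match in list order is an arbitrary choice, and headers that only say `Unix` yield no distribution.

```python
from collections import Counter
from typing import Iterable, Optional

# Distribution names from the Unix query above; matching is case-insensitive.
DISTROS = ["CentOS", "Red Hat", "Debian", "Ubuntu", "Fedora", "SUSE",
           "Gentoo", "Mandriva", "Slackware", "Turbolinux", "Trustix"]


def linux_distro(server: str) -> Optional[str]:
    """Return the first known distribution name found in a Server header."""
    lowered = server.lower()
    for distro in DISTROS:
        if distro.lower() in lowered:
            return distro
    return None


def distro_counts(server_values: Iterable[str]) -> Counter:
    """Tally distributions, ignoring headers that name none."""
    return Counter(d for d in map(linux_distro, server_values) if d)
```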
Web technology distribution
The next table shows the distribution of web technologies. PHP is the clear winner, with ASP.NET a distant second. Together, PHP and ASP.NET run on the majority of the web servers behind the top 1M sites. This information was determined using the X-Powered-By header. Ruby on Rails is getting a lot of media attention nowadays, but in practice it accounts for only around 0.48%.
| Technology | Sites | Share |
|---|---|---|
| Ruby on Rails | 2798 | 0.48% |
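A naive way to reduce X-Powered-By values to a technology name is sketched below, assuming the product name is whatever precedes the first `/` (as in `PHP/5.2.6`). HTTP allows repeated header fields to be combined with commas, so only the first comma-separated entry is used here. Grouping raw product names into the categories of the table (for example, deciding which values count as Ruby on Rails) would need extra rules that are not shown. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def powered_by_tech(value: Optional[str]) -> Optional[str]:
    """Reduce an X-Powered-By value such as "PHP/5.2.6-1+lenny3" to its product name."""
    if not value or not value.strip():
        return None
    # Several comma-separated entries may be present; use the first one.
    first = value.split(",")[0].strip()
    # Drop the "/version" suffix, if any.
    return first.split("/")[0].strip() or None


def tech_counts(values: Iterable[Optional[str]]) -> Counter:
    """Tally technologies, ignoring missing or empty headers."""
    return Counter(t for t in map(powered_by_tech, values) if t)
```

`tech_counts(["PHP/5.2.6", "ASP.NET", "PHP/4.4.9", None])` counts two PHP sites and one ASP.NET site; the version suffix stripped here is what a per-version breakdown like the PHP one below would keep instead.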
PHP version distribution
Which PHP versions are used most? PHP 5.2 takes first place, followed by 4.4. No surprises here.
ASP vs ASP.NET
This one is quite interesting: there are far more ASP websites than ASP.NET websites. This came as a big surprise to me. It seems that ASP.NET didn’t catch on as quickly as I, and many others, expected. ASP.NET is far superior, both in functionality and from a security point of view, and more people should make the switch. How was this information computed? We used the Set-Cookie header: ASP.NET creates a cookie named ASP.NET_SessionId, while classic ASP creates cookies named ASPSESSIONID*. I could be wrong about this, so if somebody has a better way to calculate the ASP/ASP.NET distribution, please contact me.
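The cookie-name rule described above can be sketched like this, assuming all Set-Cookie header values of one response are passed in as strings. If both cookie names appear, the sketch counts the site as ASP.NET; that tie-break is an arbitrary choice, not part of the original method. Sites that do not start a session on the fetched page are not detected at all, and the ASP.NET session cookie name can be changed in configuration (the `cookieName` attribute of `sessionState`), so this remains a heuristic.

```python
from typing import Iterable, Optional


def asp_flavor(set_cookie_values: Iterable[str]) -> Optional[str]:
    """Classify a response as "ASP.NET", "ASP" or None from its Set-Cookie headers."""
    # The cookie name is everything before the first "=".
    names = [h.split("=", 1)[0].strip() for h in set_cookie_values]
    if any(n.lower() == "asp.net_sessionid" for n in names):
        return "ASP.NET"
    # Classic ASP appends random letters: ASPSESSIONIDQACRBTDA, etc.
    if any(n.upper().startswith("ASPSESSIONID") for n in names):
        return "ASP"
    return None
```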
Top 50 TLDs (top-level domains)
The last statistic is the TLD (top-level domain) distribution. Unsurprisingly, .com is the clear winner. The surprise (at least for me) is that .de is in third place, ahead of .org. How come there are so many German websites in the top 1 million?
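Counting TLDs only needs the last label of each domain. A minimal sketch, assuming each entry is a bare domain name as in the Alexa CSV. Taking the last label means bbc.co.uk counts under .uk; telling apart second-level registries such as .co.uk would need a public-suffix list, which this sketch does not use.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def tld(domain: str) -> str:
    """Return the last label of a domain name, e.g. "bbc.co.uk" -> "uk"."""
    return domain.strip().rstrip(".").rsplit(".", 1)[-1].lower()


def top_tlds(domains: Iterable[str], n: int = 50) -> List[Tuple[str, int]]:
    """Count domains per TLD and return the n most common."""
    return Counter(tld(d) for d in domains).most_common(n)
```

For example, `top_tlds(["a.com", "b.com", "c.de", "d.org", "e.de", "f.com"], 2)` returns `[("com", 3), ("de", 2)]`.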
I am planning to publish other blog posts with more statistics from this database. If you would like to see particular statistics, or have some ideas, please post a comment.