Statistics from the top 1,000,000 websites

Note: This article refers to an older version of Acunetix.

The next version of Acunetix Web Vulnerability Scanner (version 7) will contain a much-improved HTTP stack. We wanted to test the new stack against as many sites as possible to make sure we didn’t introduce any bugs.

Alexa, a web information company, maintains a CSV file containing the top 1,000,000 sites on the internet. We used this list to test the new HTTP stack. While testing, we also thought that it might be useful to gather all the information and come up with some statistics. After all, who does not like statistics and graphs?

From this list, 951,990 records were valid and returned a response. The other records were removed because they were invalid (some entries from the Alexa list might be invalid), because the hostname could not be resolved, or because the site was unresponsive during the testing phase. Some entries were also removed because several domains redirected to a domain that was already in the database.

The results were stored in a Microsoft SQL Server 2008 database, including the complete HTTP headers and the body of each response. The information was split across several tables, such as headers, links, metas and responses, so that it could be queried more easily. We also enabled Full-Text Search on the database. In the end, the database size was around 50 GB.

Top Web Servers

To start off, I wanted to see which web servers are the most popular among the top 1M sites on the internet. I used the information returned in the ‘Server’ header to compute this. Not all sites return a Server header: 11,100 sites (1.16%) didn’t. This was quite surprising, as I was expecting a much higher percentage of websites to hide the make and version of their web server.

Name                              Count    Percentage
Apache                            627003   68.31%
Microsoft-IIS                     185678   20.23%
nginx                             42515    4.63%
Google Web Server                 25647    2.79%
LiteSpeed                         5981     0.65%
lighttpd                          5790     0.63%
Apache-Coyote (mostly Tomcat)     5350     0.58%
Phusion Passenger                 5271     0.57%
YTS (Yahoo! Traffic Server)       2662     0.29%
IBM_HTTP_Server                   2298     0.25%
Zeus                              1883     0.21%
Jetty                             1506     0.16%
Zope                              1161     0.13%
Resin                             1045     0.11%
Mongrel                           852      0.09%
Sun-ONE-Web-Server                838      0.09%
Oracle-Application-Server         815      0.09%
Lotus-Domino                      795      0.09%
Netscape-Enterprise               470      0.05%
WebSphere Application Server      140      0.02%
AOLserver                         130      0.01%
Sun GlassFish Enterprise Server   96       0.01%

As you can see from the table above, Apache (first place) and IIS (second place) together account for 88.5% of all web servers.

The nginx web server is a distant third, with only 4.63%, but it is quickly gaining popularity. I’ve seen it used a lot on high-traffic websites lately, typically as a reverse proxy server.

Google Web Server placed fourth, with 2.79%. I combined the GFE and GWS Server strings: sites and blogs hosted on blogspot return the GFE Server string, while the Google sites return GWS.
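
As a rough illustration of how such a tally could be produced outside the database, the sketch below reduces each Server header to its product name and merges the two Google strings. It is written in Python rather than SQL, and the function names, the `ALIASES` table and the sample header values are assumptions made for illustration, not the code used for this article.

```python
from collections import Counter

# Assumed aliases: the GFE and GWS strings are merged into one row.
ALIASES = {"gfe": "Google Web Server", "gws": "Google Web Server"}


def server_product(header_value):
    """Return the product part of a Server header, without version or OS details."""
    value = header_value.strip()
    if not value:
        return None
    # "Apache/2.2.3 (CentOS)" -> "Apache"; "nginx" -> "nginx"
    product = value.split("/", 1)[0].split(" (", 1)[0].strip()
    return ALIASES.get(product.lower(), product)


def tally(header_values):
    """Return (name, count, percentage) rows, most common product first."""
    counts = Counter(p for p in map(server_product, header_values) if p)
    total = sum(counts.values())
    return [(name, n, round(100.0 * n / total, 2)) for name, n in counts.most_common()]
```

Percentages here are relative to the headers that could be classified, so empty headers do not affect them.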

Fifth place went to the LiteSpeed web server, with 0.65%.

I’ve also created a Wordle image to better represent the information. The image is not 100% accurate, since I had to lower the numbers for Apache & IIS.  Otherwise the image would only contain those two web servers and the other names would be unreadable.

Apache version distribution

Next, I wanted to see how many sites are still using the old Apache version 1. Although some websites still run Apache version 1, most have already switched to version 2. According to Apache’s website, Apache version 1 was last updated on 2008-01-19. The same goes for the Apache 2.0.x branch.

Name                  Count    Percentage
Apache v2.x           356162   82.6%
Apache v1.x           74521    17.3%
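
A version split like this can be derived from the Server header with a small regular expression. The following Python sketch assumes header values of the form `Apache/2.2.3 (CentOS)`; the function names are hypothetical. Headers that contain only `Apache` carry no version number, which may explain why the two rows above add up to fewer sites than the Apache row in the first table.

```python
import re
from collections import Counter


def major_version(header_value, product="Apache"):
    """Return the major version as an int, or None if the header does not expose one."""
    pattern = re.escape(product) + r"/(\d+)\."
    match = re.search(pattern, header_value, re.IGNORECASE)
    return int(match.group(1)) if match else None


def version_distribution(header_values, product="Apache"):
    """Map 'v<major>.x' to (count, percentage of the headers that expose a version)."""
    versions = (major_version(v, product) for v in header_values)
    counts = Counter(v for v in versions if v is not None)
    total = sum(counts.values())
    return {f"v{v}.x": (n, round(100.0 * n / total, 2)) for v, n in counts.items()}
```

The same helper works for other products by passing a different name, for example `major_version("Microsoft-IIS/6.0", "Microsoft-IIS")`.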

Microsoft IIS version distribution

Moving on to Microsoft web servers, IIS version 6, which ships with Windows Server 2003, is the most commonly used. IIS version 7 (shipped with Windows Server 2008 or later) is starting to gain popularity as well, but pretty slowly. Strangely enough, there still seem to be websites running Microsoft-IIS version 4. Or are these fake Server headers? I remember that mod_security was configured at some point with SecServerSignature “Microsoft-IIS/4.0”.

Name                  Count    Percentage
Microsoft-IIS v6.x    158087   85.15%
Microsoft-IIS v7.x    15683    8.45%
Microsoft-IIS v5.x    11661    6.28%
Microsoft-IIS v4.x    216      0.11%

Unix vs Windows

The next table shows the Unix vs Windows distribution. To determine the number of Windows machines, I added the number of IIS web servers to the servers that return Win32 or Windows in the Server header. Unix was more complicated: I had to add the servers returning ‘Unix’ to those returning other Unix flavors, such as FreeBSD and Mac OS X/Darwin, and all the popular Linux distributions. I ended up with a SQL query like this:

-- Count the Server headers that mention a Unix-like operating system.
-- The parentheses keep name='server' applied to every LIKE test,
-- because AND binds more tightly than OR.
SELECT COUNT(value) cnt
FROM headers
WHERE name='server' and (
value like '%Unix%'
or value like '%Red Hat%'
or value like '%CentOS%'
or value like '%OVH%'
or value like '%Fedora%'
or value like '%SUSE%'
or value like '%Debian%'
or value like '%Turbolinux%'
or value like '%Ubuntu%'
or value like '%Mandriva%'
or value like '%Trustix%'
or value like '%Gentoo%'
or value like '%Slackware%'
or value like '%Linux%'
or value like '%SunOS%'
or value like '%FreeBSD%'
or value like '%Darwin%'
or value like '%OpenBSD%'
or value like '%Mac OS X%'
or value like '%OS/2%'
)


Not all sites return the operating system in their Server header, but you can still see that Unix is leading on the server front, and by a wide margin. Most of the Unknown entries are Unix systems returning only ‘Apache’ in the Server header.

Name                  Count    Percentage
Unix                  384490   40.38%
Windows               192403   20.21%
Unknown               375097   39.40%
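
The three buckets can also be expressed as a small classifier. This Python sketch mirrors the keyword list from the SQL query; checking the Windows markers before the Unix ones and matching case-insensitively are assumptions of the sketch, and the names `classify_os`, `UNIX_MARKERS` and `WINDOWS_MARKERS` are hypothetical.

```python
# Keywords taken from the SQL query above, lowercased for matching.
UNIX_MARKERS = (
    "unix", "red hat", "centos", "ovh", "fedora", "suse", "debian",
    "turbolinux", "ubuntu", "mandriva", "trustix", "gentoo", "slackware",
    "linux", "sunos", "freebsd", "darwin", "openbsd", "mac os x", "os/2",
)
# IIS servers plus headers that mention Win32 or Windows.
WINDOWS_MARKERS = ("microsoft-iis", "win32", "windows")


def classify_os(server_value):
    """Return 'Windows', 'Unix' or 'Unknown' for one Server header value."""
    value = server_value.lower()
    # Assumption: Windows markers win if both kinds appear in one header.
    if any(marker in value for marker in WINDOWS_MARKERS):
        return "Windows"
    if any(marker in value for marker in UNIX_MARKERS):
        return "Unix"
    return "Unknown"
```

A bare `Apache` or `nginx` header falls into the Unknown bucket, which matches the explanation given for the large Unknown row.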

Linux vs other operating systems

Again, this is pretty hard to determine, since many servers return only ‘Unix’, or nothing at all about the operating system, in their Server header. However, based on what I can count, Linux is the clear winner. FreeBSD is not that popular anymore; a few years ago it was very popular on high-traffic websites. Linux is in a better position nowadays, especially since the 2.6 kernel branch started.

Name                  Count    Percentage
Linux                 151277   94.05%
FreeBSD               7762     4.82%
SunOS                 1219     0.75%
Darwin                582      0.36%

Linux distros


Next, I wanted to find out which Linux distributions are the most popular. CentOS, Debian and Red Hat are pretty close to each other, with CentOS currently in the lead. CentOS is derived from Red Hat sources and is run entirely by volunteers. Unfortunately, a few months ago the project had some problems: it seemed that one person, Lance Davis, held sole control of the domain, with no deputy. That’s pretty scary when CentOS is the most popular Linux distribution for web servers. The problem has since been resolved.

Name                  Count    Percentage
CentOS                38257    26.38%
Debian                34168    23.56%
Red Hat               33154    22.86%
Fedora                16283    11.23%
Ubuntu                12789    8.82%
SUSE                  9285     6.40%
Gentoo                415      0.28%
Turbolinux            225      0.15%
Mandriva              203      0.14%
Trustix               201      0.13%

Another Wordle visualization for Linux distributions.

Web technologies distribution

The next table shows the distribution of web technologies, determined using the X-Powered-By header. PHP is the clear winner, with ASP.NET a distant second. Together, PHP and ASP.NET are installed on the majority of the web servers that run the top 1M sites. Ruby on Rails is getting a lot of media attention nowadays, but in real life it accounts for only around 0.48%.

Name                     Count    Percentage
PHP                      403188   69.27%
ASP.NET                  170202   29.24%
Java (Servlet+JSP+JSF)   5638     0.96%
Ruby on Rails            2798     0.48%
Python                   129      0.022%
ColdFusion               23       0.003%
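
Mapping an X-Powered-By value to one of these rows could look like the Python sketch below. The substrings in `RULES` are guesses at typical header values (for instance, treating `mod_rails` as a sign of Ruby on Rails), not the patterns actually used for the table, and rows such as Python and ColdFusion are left out because their header values are not described here.

```python
# (substring, technology) pairs, checked in order. All patterns are assumptions.
RULES = (
    ("php", "PHP"),
    ("asp.net", "ASP.NET"),
    ("servlet", "Java (Servlet+JSP+JSF)"),
    ("jsp", "Java (Servlet+JSP+JSF)"),
    ("jsf", "Java (Servlet+JSP+JSF)"),
    ("mod_rails", "Ruby on Rails"),
)


def technology(x_powered_by):
    """Return the technology named in an X-Powered-By value, or None if unrecognized."""
    value = x_powered_by.lower()
    for needle, name in RULES:
        if needle in value:
            return name
    return None
```

Values that match no rule return `None` and would simply be left out of the percentages.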

PHP Version distribution

Which versions of PHP are used the most? PHP 5.2 takes first place, followed by version 4.4. No surprises here.

Version               Count    Percentage
v5.2.x                108152   73.30%
v4.4.x                26008    17.62%
v4.3.x                7461     5.05%
v5.1.x                3690     2.50%
v5.3.x                1084     0.73%
v4.2.x                627      0.42%
v4.1.x                522      0.35%


ASP vs ASP.NET

This one is quite interesting. There are far more ASP websites than ASP.NET websites, which came as a big surprise to me. It seems that ASP.NET didn’t catch on as quickly as I, and many others, were expecting. ASP.NET is far superior, both in functionality and in security, and more people should make the switch. How was this information computed? We used the Set-Cookie header: ASP.NET creates cookies named ASP.NET_SessionId, while ASP creates cookies whose names start with ASPSESSIONID. I could be wrong about this. If somebody has a better way to calculate the ASP/ASP.NET distribution, please contact me.

Name                  Count    Percentage
ASP                   52591    63.10%
ASP.NET               30745    36.89%
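
The cookie rule described above can be sketched in a few lines of Python. The function name `session_technology` and the sample cookie values are hypothetical; only the two cookie name patterns come from the text.

```python
def session_technology(set_cookie_values):
    """Classify a response as 'ASP.NET', 'ASP' or None from its Set-Cookie headers."""
    # The cookie name is everything before the first '=' in each header value.
    names = [value.split("=", 1)[0].strip().lower() for value in set_cookie_values]
    if any(name == "asp.net_sessionid" for name in names):
        return "ASP.NET"
    if any(name.startswith("aspsessionid") for name in names):
        return "ASP"
    return None
```

One possible blind spot: a site that does not set a session cookie on the first response is not counted at all, so this rule may undercount either technology.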

Top 50 TLDs (top-level domains)

The last statistic is about the TLD (top-level domain) distribution. .COM is the clear winner, which is no surprise to anybody. The surprise (at least for me) is that .de is in third place, ahead of .org. How come there are so many German websites in the top 1 million?

TLD    Count    Percentage
com    533272   56.21%
net    62383    6.58%
de     45834    4.83%
org    41349    4.36%
ru     33414    3.52%
cn     21639    2.28%
uk     18741    1.98%
jp     17111    1.80%
info   14388    1.52%
it     11048    1.16%
nl     8780     0.93%
pl     8335     0.88%
br     7996     0.84%
fr     6790     0.72%
au     6221     0.66%
in     4671     0.49%
es     4359     0.46%
se     3713     0.39%
biz    3572     0.38%
cz     3570     0.38%
ca     3461     0.36%
ro     3411     0.36%
at     3325     0.35%
ua     3295     0.35%
tv     3274     0.35%
gr     3096     0.33%
ir     3088     0.33%
edu    3065     0.32%
eu     3061     0.32%
za     2917     0.31%
ch     2804     0.30%
dk     2644     0.28%
cc     2621     0.28%
us     2600     0.27%
hu     2593     0.27%
ar     2168     0.23%
be     2023     0.21%
no     1886     0.20%
tr     1876     0.20%
mx     1726     0.18%
kr     1640     0.17%
tw     1598     0.17%
fi     1566     0.17%
il     1449     0.15%
vn     1344     0.14%
sk     1230     0.13%
cl     1121     0.12%
ws     1085     0.11%
nz     1050     0.11%
id     1027     0.11%
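
A TLD count like the one above can be computed straight from the site list. The Python sketch below assumes a CSV with `rank,domain` rows and takes the last dot-separated label of each domain as its TLD, so `bbc.co.uk` counts as `uk`; the function names and sample data are hypothetical.

```python
import csv
import io
from collections import Counter


def tld(hostname):
    """Return the last label of a hostname, lowercased ('bbc.co.uk' -> 'uk')."""
    return hostname.strip().rstrip(".").lower().rsplit(".", 1)[-1]


def tld_distribution(csv_text, top=50):
    """Return (tld, count, percentage) rows for the most common TLDs."""
    reader = csv.reader(io.StringIO(csv_text))
    # Assumed layout: column 0 is the rank, column 1 is the domain.
    counts = Counter(tld(row[1]) for row in reader if len(row) >= 2)
    total = sum(counts.values())
    return [(name, n, round(100.0 * n / total, 2)) for name, n in counts.most_common(top)]
```

For example, `tld_distribution("1,google.com\n2,example.de\n")` yields one row each for `com` and `de`.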

I am planning to publish other blog posts with more statistics from this database. If you would like to see some particular statistics, or have some ideas, please post a comment.

  • If possible, I think it would be interesting to identify how many sites redirect to https, and of those, how many force the redirect (do not allow content to be downloaded via http). Good luck with the query string 🙂

  • I will see if we have this information in the database. We were following the redirects automatically and I’m not sure if we kept both the original domain and the redirected one.
