Testing A Scalable Infrastructure

Work Completed By: Sukhpal Shergill


Cloud service elasticity is achieved by using the internet as a mechanism to provide access to generic infrastructure resources such as storage and computing. This, combined with techniques such as virtualisation, enables providers to offer utility computing that can be designed to scale in line with demand under pay-as-you-go service options.
This project aims to demonstrate how a service hosted on a scalable infrastructure can automatically increase and decrease in line with demand. In doing so we can understand more about how this is achieved, and what the strengths and weaknesses of such a solution are.
The project will focus on hosting a simple web page on a server where the underlying infrastructure can be designed to provide elasticity as demand on the web page is increased and decreased.
To achieve this, the project needs to do the following (note: this is not meant to be a plan, just an overview of tasks that may be included):
• Develop a simple web page
• Create a web server to host the page
• Baseline performance under normal conditions
• Measure at what point the performance starts to drop
• Configure the hosting solution to scale to meet demand in order to maintain performance levels


Details of where code, web pages etc can be found.

The work was carried out in VMware Workstation on my laptop and can be demonstrated from it.


Describe how you approached this piece of work, any technologies, tools or techniques that you found useful or tried and discarded. Include any examples that you used for inspiration and any contacts you have made in carrying out the work.

The approach was to start with what I know by building a Windows Server 2008 VM using VMware Workstation and creating a few websites, then looking at software that could be used to emulate load. Once this had been done I looked at how we would deal with a server under load in our ESX environment and compared that with third-party server hosting solutions.

Two Windows Server 2008 VMs were built on VMware Workstation with Service Pack 2, the latest hotfixes and McAfee VirusScan 8.7 to simulate production servers. Both had IIS 7.0 installed. On one, a simple web page was created with a link to page 2, and page 2 linked back to page 1; on the other, Windows SharePoint Services 3.0 was installed to create a more complex website.

flickr:3851531581 flickr:3851531529

To measure at what point performance starts to drop I looked into web performance load tools. The tools I evaluated were:

  • Web Server Stress Tool
  • Web Capacity Analysis Tool
  • OpenSTA

The Web Server Stress Tool was by far the easiest to use with its simple GUI; the only limitation is that the trial version allows you to simulate just 5 concurrent users.

The Web Capacity Analysis Tool was more cumbersome to configure, and in the time available I was unable to get it working. However, it is a free offering from Microsoft designed to work with IIS websites, so it would be worth investigating further and getting it working.

The most useful tool was OpenSTA, which is GUI based. It enables you to create a script of actions, for example loading a web page and waiting a few seconds before selecting a link to another page. You then add performance monitors and specify how many virtual users you want to simulate. This open source tool works very well; its limitations are that it only supports IE 4, 5 and 6 or Netscape, and simulating more than 1665 virtual users crashes the application on the PC from which you run the tests. I conducted the tests from a Windows XP VM running on the ESX environment while the web server ran in a VM on my laptop; the virtual-user limit can be overcome by having several PCs run the test simultaneously against the website.

The results of the load test indicate that a VM running a simple website under a simulated load of a few clicks is not stressed very much: CPU load and the memory used by the W3WP.exe IIS worker process both rise only slightly. The web server running Windows SharePoint Services, however, is placed under more stress: a few clicks while navigating around a SharePoint site lead to a big sustained spike in CPU load, with the main processes being W3WP.exe and SQL Server.

Windows Server 2008 has improved performance management tools. The screenshots below, taken with Windows Performance Monitor on the simple website, show the server first without load and then with load:


The screenshot below shows that memory consumption increased, leading to paging, while load was placed on the web server.

The following screenshots show the OpenSTA application:
The screenshot above shows the 3 steps required: firstly creating a script of actions to perform on the website, secondly configuring performance collectors, and thirdly dragging the script and the collector to the test window and specifying the number of virtual users.

The screenshot above shows some of the charts the OpenSTA application produces.

The amount of memory assigned to the server was then increased from 682MB to 1126MB and the load test run again; this time there was no excessive paging.

Overall it is difficult to pinpoint at what point performance starts to drop, as it depends on the initial resources assigned to the web server, the complexity of the website and the actions the average user is expected to perform while visiting it. For the simple website, where users will on average hit a few pages, a basic-spec Windows Server 2008 server with 1GB of RAM and a single CPU running at 2GHz can quite easily handle 2000 simultaneous hits.

The next step was to investigate how to address the performance drop. A starting point was our ESX hosting environment: ESX uses a combination of resource pools and increases to the CPU, memory and disk space assigned to a VM.
At the moment we have three resource pools on the ESX production environment: High, Normal and Low priority. Resource pools allow you to think more about aggregate computing capacity and less about individual hosts. In addition, you do not need to set resources on each VM; instead you can control the aggregate allocation of resources to a set of virtual machines by changing settings on their enclosing resource pool. The Low pool has 2000 CPU shares, Normal has 4000 and High has 8000. This means that under CPU contention a VM in the Normal pool will get twice as many free CPU cycles as a VM in the Low priority pool, and a VM in the High priority pool will get four times as many. The same principle applies to memory, so one option for a VM that is running low on resources is to move it to a higher priority pool; however, this will only be beneficial if there is contention for resources.
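The share arithmetic described above can be illustrated with a small calculation. The 7000MHz of free CPU below is an assumed figure, and real ESX scheduling also honours per-VM reservations and limits:

```python
def cpu_entitlement(pools, free_mhz):
    """Split contended CPU between resource pools in proportion to their shares.

    'pools' maps pool name -> shares (the Low/Normal/High values above);
    the return value is each pool's slice of the free CPU in MHz.
    Illustrative only: shares matter solely when resources are contended.
    """
    total = sum(pools.values())
    return {name: free_mhz * shares / total for name, shares in pools.items()}

shares = {"Low": 2000, "Normal": 4000, "High": 8000}
entitlement = cpu_entitlement(shares, free_mhz=7000)
# The 1:2:4 ratio described above: Low 1000MHz, Normal 2000MHz, High 4000MHz.
```

The same proportional split applies to memory shares, which is why moving a starved VM into a higher-priority pool only helps when there is actual contention.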

The other option is to increase the memory assigned to a VM or add additional processors; this does, however, require downtime, even if only for a few minutes. If this is not enough to cope with demand, the remaining options are to add another VM and load balance, or to move the application onto a physical server.
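A minimal sketch of the load-balancing option, with hypothetical server names; real network load balancers also handle health checks and session affinity:

```python
from itertools import cycle

class RoundRobinBalancer:
    """Hand incoming requests to a pool of web servers in turn.

    A toy model of adding a second VM and load balancing: each call to
    route() returns the next server in the rotation.
    """
    def __init__(self, servers):
        self._rotation = cycle(list(servers))

    def route(self):
        return next(self._rotation)

lb = RoundRobinBalancer(["web01", "web02"])
assignments = [lb.route() for _ in range(4)]
# Requests alternate between web01 and web02.
```

The point of the pattern is that capacity can then be changed by adding or removing servers from the pool, rather than by resizing (and rebooting) a single VM.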

Cloud Computing

The next step was to look at how an external hosting solution would deal with this issue. One of the biggest hosting providers is Amazon EC2, which uses Amazon Machine Images (AMIs) deployed from Amazon's public images, including basic Windows Server images. There is a choice of small (default), large and extra large instances, ranging from 1.7GB to 15GB of RAM, one to four virtual CPU cores and 160GB to 1690GB of disk space. It is recommended that you deploy your web application on an instance, benchmark its performance, compare that with anticipated load and decide which instance is suitable. However, to benefit from the peaks and troughs it would be better to use two or more default instances in a network load balanced configuration; it is then possible to power up additional instances when demand rises and power some of the AMIs down when demand drops.
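The power-up/power-down behaviour described above amounts to a simple scaling rule. The sketch below is a hypothetical decision function: the CPU thresholds and instance limits are assumptions for illustration, not anything Amazon provides:

```python
def scaling_decision(running, cpu_percent, scale_up_at=70, scale_down_at=30,
                     min_instances=2, max_instances=8):
    """Decide how many instances should run given average CPU load.

    Thresholds and limits are illustrative. Keeping at least two instances
    preserves the load-balanced pair described above; the cap stops runaway
    (and costly) growth under sustained load.
    """
    if cpu_percent > scale_up_at and running < max_instances:
        return running + 1
    if cpu_percent < scale_down_at and running > min_instances:
        return running - 1
    return running

# Demand rises: add an instance; demand falls back: retire one; steady load
# between the thresholds leaves the pool unchanged.
```

Polling this decision periodically and starting or stopping instances accordingly is, in essence, how the elasticity of the pay-as-you-go model is realised.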


Describe the degree to which the work was successful in addressing the project description. Include reasons why or why not.

The project description was to understand how a scalable infrastructure is achieved, along with the strengths and weaknesses of such a solution. In the time period it was not possible to create an environment with an external hosting provider (this was done in another project on elasticity), but it has been possible to look at how providers aim to achieve this. Virtualisation makes it possible to deploy multiple VMs very quickly, as well as to increase and decrease the resources assigned to them with minimal downtime of minutes rather than the days or weeks it would take with physical servers. Other strengths of this approach are that we would not need to worry about power, UPS, cooling or the need to purchase more capacity. The weaknesses are the reliance on internet connectivity, server licensing issues and being locked into a vendor-specific image format; VMware is pushing the Open Virtualisation Format (OVF), which attempts to address this point.


What immediate impact could the output of this R&D work have on the organisation – could it provide benefits without compromising our strategic approach?

One immediate benefit of this R&D work has been exploring load testing tools, which we have not looked at in the past. For example, the recent implementation of the SharePoint intranet was launched with very little load testing; the testing involved getting approximately 20 users to try specific tasks at a set time. With OpenSTA it would have been possible to simulate a much larger load and more accurately predict whether we had assigned the correct resources to the servers.


How the work carried out fits with our strategic direction or how it should contribute to our strategic thinking.

It appears that a scalable infrastructure delivered by a public cloud would be most appealing to small businesses that have neither the expertise nor the resources to host their own website. WCC has a private cloud, and with the recent investment in expanding the ESX virtual infrastructure capacity the public cloud would be of use in only a few cases. One such case is the Performance Plus application used by Business Consultancy, which needed to be opened up for access by external organisations, meaning part of the application, the website, had to move to the DMZ. Issues regarding security led to this application being moved out of WCC and hosted externally; this application may have benefited from cloud computing.
On the other hand, another application we use, Snap Surveys, was hosted externally, and the supplier's poor performance led to this application being brought into the WCC private cloud, although it could be argued this was an issue with the particular provider and better negotiation of the SLA might have prevented it.

Overall the shift to virtualisation has proved successful and will only increase in the coming years, which means the use of elastic computing will continue to rise in the private cloud and is likely to take off in the public cloud as applications are written to take advantage of it. The market leader VMware's approach with its latest version, vSphere, is to work in partnership with suppliers to create clouds that organisations can span onto when running low on capacity in their own private clouds.


Website links:

Amazon EC2 FAQ:

Open Systems Testing Architecture:

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License