School District reduces data loss and increases performance with ISI Server Health
Several months ago a software development partner came to us for help with one of their largest clients. This client built a strong and growing business as a cloud services provider for school districts. The system was very well received at all levels of those school districts.
The platform consists of multiple web and database servers, hosted in a Tier 1 datacenter, each on its own very expensive physical server hardware. Each school district is assigned to access one of the web servers with their data residing on one of the database servers. The workload is distributed across these servers based on the team’s best judgment of the school district’s needs and time zone.
The client’s net revenues are largely determined by the cost of these rented servers and the density of school districts using each server. The client’s goals were:
Determine the hardware resources consumed by the current workload on each server
Get recommendations to improve overall performance
Develop resource requirements for future server builds
When the system was first created, the team could only estimate the resources that would be required to run it. As the number of school districts on the system increased, the only true measures of performance were direct feedback from users and ad hoc observations of the system during peak usage periods.
Using our industry-leading monitoring systems, ISI built a custom monitoring plan. For about three months we collected and analyzed many thousands of data points and created hundreds of graphical reports showing how the system resources were actually being used. SQL monitoring tools were used in conjunction with specialized monitor sets to capture the process blocking and wait times caused by inefficiently written queries and reports.
Our analysis showed several important facts and trends:
The Web Server hardware was significantly overpowered for the workload, so either fewer servers were needed or many more customers could be added without fear of degraded performance.
The Database Servers were generally provisioned with sufficient hardware resources, with the exception of hard drive space in one instance.
The overall performance, as perceived by the customers, was significantly and primarily impacted by several improperly written queries.
In addition, other problematic code was causing intermittent loss of data. This eroded trust in the system, which in turn led users to run many more queries to check that their data had been captured properly. These excess queries then caused more performance problems, more lost data and, once again, more queries: a death spiral.
After the problematic code was corrected, the performance of the system increased tremendously. The monitoring measurements were also invaluable for creating specifications for new servers, saving the client a great deal of capital.
Our monitoring system continues to generate valuable information on the health and wellbeing of the servers and enables proactive action to prevent downtime for the client’s customers.
Any Company…
Any company of size has investments in servers, whether they are on premises or in the cloud. And every server is a monstrosity of complexity. Consider that the number of choices of hardware, operating system and applications is astronomical. Then consider that the number of possible configurations within and between these components is nearly infinite, all of which provides the luxury of complete customizability. In short, every single server is unique. And then consider that these systems undergo constant change. Patches and updates are installed nearly daily. User activity creates logs, errors and configuration changes. So what tells us whether any of these changes is impacting the performance of those systems? That is where a Systems Health Disciplinarian comes in.
Critical business servers are often ignored once they are installed and configured. So long as customers or users experience performance that is reasonably tolerable, not much attention is paid to the cost of that performance, or what I call Performance Value. Performance Value (PV) is the business return that can be obtained from a given asset or investment. In the context of a specific server, think of PV as the number of user hours per dollar of server cost.
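To make the idea of Performance Value concrete, here is a minimal Python sketch of the calculation described above: user hours divided by server cost. The server names and all figures are hypothetical and chosen only for illustration; they are not measurements from the case study.

```python
def performance_value(user_hours: float, server_cost: float) -> float:
    """Return user hours delivered per dollar of server cost."""
    if server_cost <= 0:
        raise ValueError("server_cost must be positive")
    return user_hours / server_cost


# Hypothetical monthly figures, for illustration only.
servers = {
    "web-01": {"user_hours": 12000.0, "monthly_cost": 1500.0},
    "web-02": {"user_hours": 4500.0, "monthly_cost": 1500.0},
}

# PV for each server: same cost, very different return.
pv_by_server = {
    name: performance_value(s["user_hours"], s["monthly_cost"])
    for name, s in servers.items()
}

for name, pv in pv_by_server.items():
    print(f"{name}: {pv:.1f} user hours per dollar")
```

In this made-up example both servers cost the same, but the first delivers far more user hours per dollar, which is the kind of gap that suggests moving more customers onto the underused server.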
User Hours
The bottom line for any web-based application service offering is how many User-Hours can be had from the investment. User load may vary over the work day, but if an enterprise has users in multiple time zones, the workload may shift between time zones, keeping the resources of a system busy most if not all of the time.
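The time zone effect can be sketched in a few lines of Python. This is a rough illustration, assuming every user is active for each hour of a fixed 8:00 to 16:00 local work day; the district names, UTC offsets and user counts are hypothetical.

```python
def load_by_utc_hour(districts, start_local=8, end_local=16):
    """Return 24 counts: active users on the server for each UTC hour."""
    load = [0] * 24
    for d in districts:
        for local_hour in range(start_local, end_local):
            # Convert the local hour to UTC; wrap around midnight.
            utc_hour = (local_hour - d["utc_offset"]) % 24
            load[utc_hour] += d["users"]
    return load


# Hypothetical districts sharing one server, for illustration only.
districts = [
    {"name": "East", "utc_offset": -5, "users": 100},
    {"name": "West", "utc_offset": -8, "users": 50},
]

load = load_by_utc_hour(districts)
user_hours_per_day = sum(load)                 # total User-Hours delivered
busy_hours = sum(1 for users in load if users)  # hours with any activity

print(f"User-Hours per day: {user_hours_per_day}")
print(f"Hours with activity: {busy_hours} of 24")
```

With these assumed numbers, two districts three time zones apart keep the server busy for more hours of the day than either would alone, at no extra server cost.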