Despite what Andre Agassi said in all those TV commercials, image isn’t everything. When it comes to running a Web site, performance and reliability are king.
Without these two factors, your image never reaches the intended audience or, if it does, the image is at best one of incompetence and poor customer service.
With companies relying on the Internet to connect remote and mobile workers, exchange information with suppliers and get their messages out to the broad public, a lost or slow connection robs the company of customers and income.
Let’s take a look at some of the most common issues affecting Web performance and how to identify and catch those thieving scoundrels.
The Usual Suspects
It’s easy to tell when there is a problem within the company’s own network. Employees will be on the phone to IT as soon as their computers slow down.
Problems plaguing external users, however, are not so apparent. When someone can’t access your Web site, they aren’t going to e-mail you that there is a slowdown, especially when they can’t even log onto your site to obtain the contact information.
So, how does one determine there is a problem and then discover its source? Symptoms of network problems include slow response times, excessive database table scans, database deadlocks, pages not available, memory leaks and high CPU usage.
But none of that identifies the offending hardware, software or connection. When dealing with an Internet application, the problem can lie anywhere along the process, including the application design, incorrect database tuning, internal and external network bottlenecks, undersized or non-performing hardware or Web and application server configuration errors, not to mention errors at the client end.
To identify the most common types of problems, Mercury Interactive Corp. of Sunnyvale, Calif., analyzed the results of thousands of load tests conducted on B2B and B2C Web sites using the company’s ActiveTest software.
The root causes broke down into four main areas — databases, Web servers, application servers and the network — each resulting in 20% to 27% of the total number of problems.
Malfunctioning databases were the largest source of problems, being responsible for 27% of the errors found.
The most common were insufficient indexing, fragmented databases, out-of-date statistics and faulty application design.
Any of these wreaks havoc on sites that depend on database operations, such as e-commerce sites.
Depending on which type of database error is found on a particular site, the solution lies in tuning the index, compacting the database, updating the statistics or rewriting the application so that the database server, not the client or application, controls the query process.
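To make the indexing and statistics fixes concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in, since the Mercury findings don't name a database product, and the orders table, its columns and the index name are hypothetical. The sketch asks the database for its query plan before and after adding an index and refreshing statistics.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


# Hypothetical e-commerce table, held in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 500, i * 1.5) for i in range(10_000)],
)

lookup = "SELECT total FROM orders WHERE customer_id = ?"

# Without an index, the lookup has to scan every row in the table.
plan_before = query_plan(conn, lookup, (42,))

# Add the missing index, then refresh the optimizer's statistics.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
conn.execute("ANALYZE")
plan_after = query_plan(conn, lookup, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan varies between SQLite versions, but the first plan should report a scan of the table and the second a search using the new index. Other database servers expose the same information through their own plan and statistics commands.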
The network came in as the No. 2 wrongdoer at 25%. Undersized, misconfigured or incompatible routers, switches, firewalls and load balancers all lead to performance problems.
In addition, there is the need for adequate bandwidth on each of the hops along the way. Mercury found that in nearly one-quarter of the cases, the pipe to the Internet was too small to handle the desired load.
The application servers caused 23% of the problems discovered, and these problems can also impact internal operations.
The common errors were poor cache management that overworks the CPU, unoptimized database queries, incorrect software configuration and poor handling of concurrent client requests.
Finally, there is the Web server itself. Poor design algorithms, incorrect configurations, poorly written code, memory problems and an overloaded CPU all create bottlenecks at the Web server.
So, this narrows it down to a few dozen possible culprits. It gives you some places to start looking, but you still need tools to zero in on the actual source.
Companies Need Real-World Mirror
“The performance issues that most companies have with their Web sites on a day-to-day basis relates to issues that are on the Internet, typically things that are out of their control and that they have little visibility into,” says Jeb Bolding, senior analyst for Enterprise Management Associates in Boulder, Colo.
“Companies need to invest in pre-production environments that mirror their real-world environment as well as effective eBiz testing tools that can mirror the impact of users on their sites.”
The first testing needs to be done before going online. An application may run fine on the programmer’s workstation, but lock up under the stresses of real-world traffic.
ThinAirApps Inc., a wireless server developer in New York City, uses Rational Software Corp.’s testing tools to load test its software before shipping it to customers.
To do this, the company takes an actual wireless device such as a PDA, initiates a session with the server and executes different user scenarios. This is repeated using other types of devices.
All the traffic from each of the transactions is recorded in HTTP. The company then loads the resulting data onto a batch of workstations and executes the scenarios simultaneously.
“By doing this we can see what happens when a thousand people are doing it at the same time,” says Evan Simeone, senior product manager at ThinAirApps.
“We can find out at what point the system will break, whether you start dropping sessions or does the whole server crash?”
The Rational software records and reports such items as response times, CPU and memory utilization. ThinAirApps drills down into these results and finds the offending piece of code or hardware.
“This functional testing must occur to ensure quality in all the uses being run simultaneously by a number of users,” says Simeone. “You may find subtle bugs, such as a lock on a database causing other users to time out, which you won’t find unless you test the system under load.”
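The kind of bug Simeone describes can be reproduced in miniature. The sketch below is not the Rational tooling or ThinAirApps' setup. It is a hypothetical, in-process simulation in Python, where a single lock stands in for the contended database resource and each thread stands in for one recorded user scenario. The hold and timeout values are arbitrary assumptions chosen to make the effect visible.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a server-side resource guarded by a lock,
# such as a locked database row.
db_lock = threading.Lock()


def run_scenario(hold_seconds=0.01, timeout_seconds=0.05):
    """Simulate one user session; return (succeeded, elapsed_seconds)."""
    start = time.perf_counter()
    # The user gives up (times out) if the lock is not free soon enough.
    acquired = db_lock.acquire(timeout=timeout_seconds)
    if acquired:
        try:
            time.sleep(hold_seconds)  # simulated work while holding the lock
        finally:
            db_lock.release()
    return acquired, time.perf_counter() - start


def load_test(users):
    """Run the same scenario for many simulated users at the same time."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(run_scenario) for _ in range(users)]
        results = [future.result() for future in futures]
    failures = sum(1 for ok, _ in results if not ok)
    slowest = max(elapsed for _, elapsed in results)
    return {"users": users, "failures": failures, "slowest": slowest}


single = load_test(1)   # what the programmer sees on a workstation
crowd = load_test(50)   # what happens when everyone arrives at once

print(single)
print(crowd)
```

A single user never waits for the lock, so the scenario passes. With 50 simultaneous users, most sessions time out waiting for it, which is the failure that only shows up under load.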
Measure User Experience
But even after testing an application in the laboratory, it still needs to be tested on site and then continually monitored.
BenefitPoint, a San Francisco company that provides hosted workflow software to the employee benefits industry, uses Mercury’s Topaz software to monitor performance at the company’s collocation facility.
The company had a series of tools to monitor the network, servers and applications, but didn’t have anything that measured the user experience.
So it installed Topaz agents to measure how long a transaction was taking from the point a customer enters the firewall until the transaction exits the firewall again.
While this doesn’t directly measure customer response time, it does show how well BenefitPoint is performing its duties.
“Our SLAs with customers exclude travel time over the Internet,” explains Jim Alexander, the company’s CTO. “The Internet knows no SLA, so we can’t control it.”
The Topaz software conducts average screen response time tests about 8,000 times per week.
It pages staff whenever there are two consecutive failed transactions or when the site fails to respond after two minutes.
This allows the company to meet its SLA of 99.9% availability and performance of less than five seconds.
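The paging rule and the SLA targets are simple enough to express in a few lines. The following Python sketch is not Topaz's logic or API. The Probe record, the function names and the choice to apply the five-second target to the average response time of successful transactions are all assumptions made for illustration. Only the thresholds come from the figures above.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """Result of one synthetic transaction (hypothetical record shape)."""
    ok: bool
    seconds: float


# Thresholds taken from the article; everything else is an assumption.
MAX_CONSECUTIVE_FAILURES = 2
NO_RESPONSE_SECONDS = 120.0   # two minutes
SLA_AVAILABILITY = 0.999      # 99.9%
SLA_RESPONSE_SECONDS = 5.0


def should_page(history):
    """Page staff if the latest probe hung or the last two probes failed."""
    if not history:
        return False
    if history[-1].seconds >= NO_RESPONSE_SECONDS:
        return True
    recent = history[-MAX_CONSECUTIVE_FAILURES:]
    return (len(recent) == MAX_CONSECUTIVE_FAILURES
            and all(not probe.ok for probe in recent))


def sla_report(history):
    """Summarize availability and average response time for a period."""
    succeeded = [probe for probe in history if probe.ok]
    if not history or not succeeded:
        return {"availability": 0.0, "avg_seconds": None, "meets_sla": False}
    availability = len(succeeded) / len(history)
    avg_seconds = sum(probe.seconds for probe in succeeded) / len(succeeded)
    meets_sla = (availability >= SLA_AVAILABILITY
                 and avg_seconds < SLA_RESPONSE_SECONDS)
    return {"availability": availability,
            "avg_seconds": avg_seconds,
            "meets_sla": meets_sla}


# A made-up week of about 8,000 probes with four failed transactions.
week = [Probe(True, 0.4)] * 7996 + [Probe(False, 5.0)] * 4
report = sla_report(week)
print(report)
```

At roughly 8,000 tests per week, a 99.9% availability target leaves room for about eight failed transactions, which is why paging on the second consecutive failure makes sense.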
“The most significant finding when we first got Topaz up and running was that queries were taking greater than five seconds and we didn’t know if it was the Internet, customer’s internal networks, or a poor performing query in our application,” says Alexander. “By getting the objective, quantifiable feedback, it gives us the information we need to fix the query.”
As a result, the company has been able to bring down its average response times from 2-3 seconds to 0.4 seconds.
And, since BenefitPoint now knows when there is a problem before the customers call in to complain, it can reassure the customers who do call that the problem is being worked on.
Like cleaning up a crime-ridden neighborhood, speeding up Web performance isn’t a matter of solving all of your problems by finding a single perpetrator.
You take whichever one you find, get it off the street or out of the network, and then move on.
You may find any or all of the problems listed above, in addition to others that are unique to your own Web site. But by removing the performance thieves, bit by bit, performance and reliability improve.