Business Scenario-Based Performance Monitoring and Diagnosis
Internal
Overview
We're perfecting a methodology and a set of tools that allow us to quantify the performance of complex systems, establish correlations and ultimately point out where the performance problems are coming from.
We have applied the methodology to synchronous, request-based systems (web sites), and we're working on extending it to asynchronous message processing systems. At a very high level, the approach consists of defining business scenarios, applying load, producing data and analyzing that data. We use business scenarios because they represent the overall system's performance better than lower-level HTTP requests do. The analysis attempts to correlate scenario performance with the evolution over time of the "primary" resources of the system - CPU cycles, memory, open file descriptors - or even higher-level resources, such as application locks. A certain amount of statistical analysis and automated reasoning goes into this "analysis stage", and the results are higher-order synthetic data, such as graphs that make sense to humans, or even root cause suggestions.
One interesting aspect of the methodology is that it works with "synthetic load" - the type of load applied by load generators such as JMeter, NeoLoad or LoadRunner - or with production load itself.
High Level Methodology:
• Define business scenarios
• Apply load
• Measure how business scenarios perform
• Break them down into the "layer cake" (requests and in-requests).
   o TODO Break down into individual requests
   o Break down an individual request
      ▪ Look at Byteman.
• Find the spikes
• Correlate them with primary resource metrics.
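As an illustration of the last two steps (finding spikes and correlating them with primary resource metrics), here is a minimal sketch in Python. It is not the actual tooling. The function names, the "3 x median" spike rule, the use of a plain Pearson coefficient and all sample numbers are assumptions made for the example. It assumes scenario durations and resource metrics have already been sampled over the same time intervals.

```python
from statistics import mean, median

def find_spikes(durations_ms, factor=3.0):
    """Return indices of samples above factor x median (hypothetical spike rule)."""
    baseline = median(durations_ms)
    return [i for i, d in enumerate(durations_ms) if d > factor * baseline]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / (var_x * var_y) ** 0.5

def rank_resources(durations_ms, resources):
    """Sort resource names by absolute correlation with scenario duration."""
    scores = {name: pearson(durations_ms, series)
              for name, series in resources.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Invented sample data: one value per sampling interval.
scenario_ms = [120, 118, 125, 122, 480, 510, 130, 119]
resources = {
    "cpu_pct":  [35, 33, 36, 34, 38, 37, 35, 34],
    "open_fds": [200, 205, 210, 208, 950, 990, 215, 207],
}

spikes = find_spikes(scenario_ms)
ranking = rank_resources(scenario_ms, resources)
print("spike intervals:", spikes)
for name, r in ranking:
    print(f"{name}: r = {r:.3f}")
```

With this invented data the spikes fall in intervals 4 and 5, and open file descriptors rank ahead of CPU as the candidate to investigate first. A high correlation is only a suggestion of where to look, not proof of a root cause.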