Look at a combination of processor utilization and processor queuing.
(1) The primary indicator of processor utilization is the % Processor Time Counter in the Processor Object. Note that the _Total instance of the Processor Object is actually the average value over all processors. The System Object in NT 4.0 contains a Counter named % Total Processor Time, which is also the average value over all processors.
The thread is the unit of execution in Windows. Each process address space that is launched has at least one thread, and many applications, of course, are multithreaded. There is an operating system function in Windows that keeps track of how much CPU time each thread consumes using a sampling technique. Samples are normally taken approximately one or two hundred times per second, which suggests that this technique is probably accurate for measurement intervals of 30 seconds or more. These samples are used to maintain the Thread % Processor Time Counter: execution time, recorded in 100-nanosecond timer tick units, is normalized to a percentage of the measurement interval duration. Thread % Processor Time is also summarized at the Process level. The table below summarizes this overall measurement scheme:
|Thread||% Processor Time||Dispatcher timing mechanism|
|Process||% Processor Time||Σ Thread % Processor Time|
|Processor||% Processor Time||100% - Idle Thread % Processor Time|
|Processor (Win2K, XP)||Processor(_Total) % Processor Time||Σ Processor % Processor Time / # processors|
|System (NT)||% Total Processor Time||Σ Processor % Processor Time / # processors|
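The arithmetic in the table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the tick deltas are assumed to be the difference between two readings of an accumulated 100-nanosecond execution time value, and nothing here calls a real Windows API.

```python
# Illustrative sketch only: all names here are hypothetical, not a Windows API.
TICKS_PER_SECOND = 10_000_000  # number of 100 ns timer units in one second


def percent_busy(tick_delta, interval_seconds):
    """Normalize accumulated 100 ns ticks to a percentage of the interval."""
    return 100.0 * tick_delta / (interval_seconds * TICKS_PER_SECOND)


def process_percent(thread_tick_deltas, interval_seconds):
    """Process % Processor Time: the sum over the process's threads."""
    return sum(percent_busy(t, interval_seconds) for t in thread_tick_deltas)


def processor_percent(idle_tick_delta, interval_seconds):
    """Processor % Processor Time: 100% minus that processor's Idle thread time."""
    return 100.0 - percent_busy(idle_tick_delta, interval_seconds)


def total_percent(per_processor_percents):
    """_Total instance: the average over all processors."""
    return sum(per_processor_percents) / len(per_processor_percents)
```

For example, a thread that accumulates 150,000,000 ticks (15 seconds) during a 30-second interval is 50% busy, and a 2-way system with processors at 75% and 25% reports 50% for the _Total instance.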
Processor busy is measured using an Idle thread mechanism. The operating system dispatches an Idle thread whenever there are no ready threads to run. Whenever the processor accounting routine finds the Idle thread dispatched, processor time is accumulated for the Idle thread. Incidentally, the Idle thread is not an actual execution thread; it is a HAL function that fulfills this essentially bookkeeping role.
At the end of a measurement interval, % Processor Time is calculated by subtracting the amount of accumulated Idle thread time from 100%. On a multiprocessor, there is a dedicated Idle thread per processor so that reliable measurements are kept. % Processor Time at the processor level can be broken down into Privileged mode execution time, User mode execution time, execution time in Interrupt mode, and execution time in Deferred Procedure Calls (DPC), as illustrated in Figure 1 below. % Interrupt Time and % DPC Time are both subsets of % Privileged Time.
Figure 1. Processor utilization breakdown.
% Processor Time approaches 100% as an absolute upper limit on CPU capacity. A system that is running consistently at greater than 90% busy is clearly out of capacity. However, this is not a hard and fast rule. Some workloads show signs of significant CPU contention at lower levels of processor utilization. For example, Figure 1 shows an IIS web server at a large e-commerce site where processor utilization remains consistently below 85%, except for two peak processing intervals. However, we will see that this system suffers from a serious CPU capacity constraint. So start with % Processor Time, but do not stop there.
(2) The System Object contains an instantaneous Counter called Processor Queue Length. This Counter shows the number of threads that are currently in the Ready state, but are delayed waiting for a processor to be available. Maintaining a value of no more than five Ready threads per processor is the usual recommendation. More than ten Ready threads per processor normally indicates a CPU resource shortage.
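The rule of thumb above is easy to express as a check. The following is a minimal sketch, assuming the queue length and processor count have already been collected by some other means; the function name and the label strings are hypothetical, and the thresholds are simply the five and ten Ready threads per processor quoted above.

```python
# Illustrative sketch only: the function name and labels are hypothetical.
RECOMMENDED_MAX_PER_CPU = 5   # usual recommendation: no more than five per processor
SHORTAGE_PER_CPU = 10         # more than ten per processor suggests a CPU shortage


def classify_queue(queue_length, processors):
    """Classify a Processor Queue Length sample on a per-processor basis."""
    per_cpu = queue_length / processors
    if per_cpu > SHORTAGE_PER_CPU:
        return "shortage"
    if per_cpu > RECOMMENDED_MAX_PER_CPU:
        return "elevated"
    return "ok"
```

Because Processor Queue Length is an instantaneous value, a single sample classified as "shortage" is less telling than a run of such samples across consecutive measurement intervals.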
The Processor Queue Length Counter is often well correlated with % Processor Time, even though the former is an instantaneous value obtained at the time the last processor sample was collected, while the latter is based on continuous samples during the measurement interval. Figure 2 shows the overall processor utilization from the 4-way multiprocessor system in Figure 1 with an overlay of the Processor Queue Length for the same interval (charted against the right-hand y-axis). The expected correlation between processor utilization and the number of Ready and Waiting threads is apparent.
Figure 2. % Processor Time vs. the Processor Queue Length.
In this instance, we also saw a correlation between periods of poor Active Server Pages (ASP) response time and corresponding spikes in the size of the Processor Ready Queue. On this system, % Processor Time values consistently greater than 70% appeared to cause spikes in web site response time.
(3) At the Thread level, there is a Counter called Thread State. Threads waiting in the Ready Queue have a Thread State code of 1 (see the Explain text for the Counter). This Counter tells you precisely which threads are waiting for service at the processor. Since Windows uses priority queuing to order the Ready Queue, knowing which threads are delayed in the queue can be quite useful to help pinpoint the impact of CPU contention on specific applications that might be experiencing performance problems.
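As a rough illustration of how Thread State samples might be put to use, the sketch below tallies the Ready threads (Thread State code 1) for each process. It assumes the samples have already been gathered into simple tuples; the tuple layout, the function name, and the process names in the usage note are hypothetical and do not reflect any particular tool's data format.

```python
# Illustrative sketch only: the sample layout and names are hypothetical.
from collections import Counter

THREAD_STATE_READY = 1  # Thread State code for a thread waiting in the Ready Queue


def ready_threads_by_process(samples):
    """Count Ready threads per process.

    `samples` is an iterable of (process_name, thread_id, thread_state) tuples.
    Processes with no Ready threads are omitted from the result.
    """
    counts = Counter(
        process for process, _thread_id, state in samples
        if state == THREAD_STATE_READY
    )
    return dict(counts)
```

Given samples in which two threads of a process named `inetinfo` and one thread of a process named `sqlservr` are in state 1, the function returns a count of 2 and 1 respectively, which points to the application most affected by the contention.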
Because of the number of threads on a typical NT machine, collecting thread execution state data using tools like System Monitor is normally prohibitively expensive. Consequently, we designed Performance Sentry to allow efficient collection of this potentially useful information. The Ready Threads Counter that Performance Sentry provides at the process level shows the number of Ready and Waiting threads for that process at the end of each measurement interval.