In my SQL Server production environment, there are incidents of 95-100% CPU usage almost every day, each lasting 5 to 10 minutes. The server recovers on its own and does not require intervention. The cause is a particular stored procedure (SP) that consumes more than 90% of the CPU; I can see it at the top of the most frequent queries in our monitoring tool.
During an incident, the actual execution plan shows a very high number of IO reads (in the billions), which explains the high CPU usage.
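For context, this is roughly how I compare per-execution reads and CPU for the procedure from the plan cache (a sketch; `dbo.MyProc` is a placeholder for the real procedure name):

```sql
-- Per-execution logical reads and CPU for the cached plan of the SP.
-- A large gap between min_logical_reads and max_logical_reads for the
-- same plan is what I am observing.
SELECT
    OBJECT_NAME(ps.object_id, ps.database_id)     AS proc_name,
    ps.execution_count,
    ps.total_logical_reads / ps.execution_count   AS avg_logical_reads,
    ps.min_logical_reads,
    ps.max_logical_reads,
    ps.total_worker_time  / ps.execution_count    AS avg_cpu_microseconds
FROM sys.dm_exec_procedure_stats AS ps
WHERE ps.object_id = OBJECT_ID('dbo.MyProc');  -- placeholder name
```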
However, before or after a 5-10 minute incident, if I run the SP (with exactly the same parameters and SET options), the plan shows only a few thousand reads and normal CPU consumption.
I checked whether a new plan had been generated, but that is not the case.
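This is how I verified that the plan had not been recompiled (again a sketch with a placeholder procedure name): if `cached_time` and `plan_handle` are unchanged across the incident, no new plan was compiled.

```sql
-- If the SP had been recompiled, cached_time would change and a new
-- plan_handle would appear; both stayed constant across the incident.
SELECT
    OBJECT_NAME(ps.object_id, ps.database_id) AS proc_name,
    ps.cached_time,
    ps.last_execution_time,
    ps.plan_handle
FROM sys.dm_exec_procedure_stats AS ps
WHERE ps.object_id = OBJECT_ID('dbo.MyProc');  -- placeholder name
```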
Observations so far:
- Transactions/sec, batch requests/sec, forwarded records/sec, compilations/sec, recompilations/sec, and all the other metrics are at their usual levels.
- The timing of the incident is not fixed; it can occur at any point during users' activity hours, so we cannot correlate it with any scheduled job, etc.
Why, with the same parameters, execution plan, SET options, etc., is there such a large difference in IO and CPU for the SP?