Cause and effect
Sometimes i would like to be able to do the jedi mind trick. Like “This isn’t the performance issue you are searching for”. Because it would be easier to convince people in that case that they saw the effect, but not the cause. That said, I think the real root cause on what i will describe in this block of this problem is the separation of database admins and system admins. But that’s my personal opinion.. Imagine from one day to another, your storage goes mad. The analytic tools of your storage show long latencies, the led of the storage just show massive use of the storage. As you didn’t made any change your first assumption is “storage has a defect” or “i just hit a bug in the storage firmware”. At the end he or she thinks “No major release changes. Perhaps a few minor configuration changes. And none of them has to do with storage”.
And the first thought of a sys admin is often: “A storage problem. Let’s open a case”. However: The experienced database administrator knows that there is a beast inside all databases.I’m simplyfing stuff a lot at this place for us sysadmins, but: When you send a SQL query to your database it tries to figure out how to execute it and what is the most clever way to execute. Should i do full table scans, should i use indexes, how should i do the joins. Stuff like that. And the more sophisticated the database is, the more factors are used to find the most clever way to execute your database. I would say, they would even factor in the position of the moon, if they find out that this has effect on the spin of the hard disk. The database has statistics about your data to base the decision of the optimizer on your data and just not averages over all customers. Before a single database block has been read, the database has computed something called “execution plan”. However sometimes the statistics computed from time to time aren’t representing your data because of changes in your data. That doesn’t imply code changes. Perhaps it’s just a country wide marketing campaign “All Joe Millers get 50% off”. And now you have a lot of Joe Millers in your database. Or you have added a core to your database domain. Factors a database optimizer puts into the equation. You do requests after requests. Execution plans are computed. However the execution plans can change over time. The database system can come to the idea, that based on the current state of the database, historic data about the databases, a different execution plan delivers much better results and switch for example from using indexes to full table scans. The query execution plans are not always correct or best, especially if you don’t generate the statistics of your database from time to time. And based on your query and the age of statistics your execution plan for the same query can even change over time despite executing the same command. They have to. There is a lot of knowledge, experience, innovation as well as splintered teeth and broken noses in the sense of “good idea, bad idea” in this part of the development of a database. Or to say different: There are good, page-rich books about just the topic of the “cost based optimizer” of Oracle. Looks like an important topic ;) That said, a long story short: In case your storage just goes from normal load from high load, the first step isn’t to call your storage vendor but to ask your DBA about changes in the database. There are some question you should ask first:
- Are there any new SQL statements?
Even when you think "nah ... we hadn't a release upgrade look for new statements. I once saw performance degradation in a OLTP database because someone forgot to run a clean-up process and tried to make up for error by doing a massive deletion run with a extremely suboptimal sql statement with missing indexes taking all available resources. This never was a problem before when you had to delete a few hundred rows ... with a several hundred thousand rows it was a completely different story.
- Are there changed execution plans?
Look into your AWR report, look for the most resource intensive SQL statements. Look for the execution plans. Is there a change in the execution plan to the past?
- Where is the historic data?
However to be able to compare, you have to collect something as a baseline. When everything works well, create a AWR report of a timeslot with with pretty usual load. Gather the query execution plans of the most frequent and most resource hungry SQL plan.
This information and this questions are important to hunt down the cause of the increased load of the storage, which is almost always just an effect that slows down things. The cause is a change in the usage