The joy of solving performance problems - Part 1

It took me a little longer to get this first part ready, because I revised it quite a lot, and in recent evenings I didn’t want to spend the necessary time in front of the computer. In the end, it’s somewhat hard not to sound like a grey-bearded Unix guy explaining obvious things. That said, despite my efforts I probably still sound like a grey-bearded Unix guy explaining … well … obvious things. However, here we are: the first part is now in your feed reader or browser.

This part was initially much longer. But even after publishing it, I thought that the second half needed more refinement, so I removed it. Next Monday I will publish a cleaned-up version of that second half. Additionally, I had initially planned to keep each text to a reading time of 5 to 6 minutes, and the initial version had a reading time of 15 minutes. This whole series of blog entries about the joy of solving performance problems will probably end up at 20 parts or so … and not 8 as I initially planned.

This series starts with some thoughts about first steps, but the main point of this part is some thoughts about performance problems in general.

The first steps

Okay, there it is. A massive performance problem, all of a sudden. Everything grinds to a halt. Nothing works. It sounds strange, but these situations are often the easiest to work on.

Most often a single change, or a single set of dependent changes, is responsible for such a grinding-halt problem. The git pipeline just dropped a new production environment into existence. A component failed. A limit was reached. Furthermore, the grinding halt answers the question I will come to shortly, whether this is a problem at all, without many words: when everything stops and nothing works, the answer is obviously a resounding “yes”.

But often it’s not about “nothing works”. It starts with a simple remark like “We noticed that this part is slow” or “The users said that after clicking on this button, the system needs virtually forever to answer”.

Sometimes you even search for performance problems just because a line went red. You are asked to do performance tuning because a monitoring system tells you that certain parts of your system got slower, reached a threshold and now show up in your reports. And you may decide not to ignore that red line in the report but to do something about it.

So, where to start? You might expect a technical guy like me to start employing an elaborate set of tools now, and to describe how to use prstat, iostat or DTrace. When I help with performance problems, I often hear: “Nice that you are here. Can you write a nice DTrace script?”

But I think in such a situation we are not at that point yet. Not at all. At this point I don’t think about technology. I usually start with questions that are surprisingly untechnical. Searching for performance problems often starts with a pen and my notebook (the paper one), and with investing time to understand what is really happening at the “big picture” level, because you need an impression of the general “problemness” of the issue.

“Okay, it is slow. But is it a problem?”

This may sound like a snappy question at first. However, it isn’t. It is a very short summary of a lot of thoughts about quantifying the situation when you face a performance problem, because you have to make a decision, or enable your management to make an informed decision, about investing resources into it. I heard this question 20 years ago from Ulrich Gräf, a former colleague at Sun whose untimely death is a great loss to the world. This simple comment at a conference back at the start of the century summarised a lot of my thoughts at that time.

Well, in the end it’s even good advice for life: okay, you don’t like this or that or someone. But is it currently a problem? Could it become a problem in the future? If not, why bother at all? You have plenty of other tasks to fulfil.

It is a very good first question to ask yourself. You will probably invest a lot of time in searching for a performance problem. Or you will invest the time of people who are only available to you because you pay them consulting fees or a salary. Or you have to cash in favours you earned on your professional favour account by helping them in the past. Because you will commit a lot of time and effort, I think it’s important to ask how much of a problem a situation is, so you know how much time you should commit.

It’s less of an issue for me, because when I’m asked in my job, it’s always a problem. When friends ask me, I know they have carefully assessed the situation; otherwise they wouldn’t have asked.

But remember, this is not about my job; the idea of this article is broader. On the one hand, I want to keep this blog as general as possible. On the other hand, I still ask a variant of this question to learn more about the situation, even when I know it’s a problem.

Slow by itself is not a problem. Sometimes it’s just the way things are, for example because of the algorithm or because of how your application was developed 25 years ago. Or because there is inevitable latency in the process, for example due to the way users, systems and other components are located and connected.

It’s the question “Is this a problem?” that brings context into the situation! Does it hurt you in doing your business, in doing your job? When you can answer this with “Yes”, the business case for investing time and money is easy. If you answer “It could be a problem in the future, because we are closing in on the SLA limits of the process”, it’s an easy decision as well.

However, when you say, for example, “It’s the runtime of a report nobody really reads, but we have to run it because we print it for filing afterwards, and it’s slow”, it’s probably a better investment to spend the time shaving a few milliseconds or a few hundred microseconds off a core component supporting one of your main business processes instead of thinking about this slow but unimportant report. Perhaps you monitor the report a little more closely to ensure that you aren’t surprised by further developments. But that is it.

It’s practically the same at home. When your Mac needs 24 hours to render your vacation video and you think that’s slow, because you know there is a faster system that would do the final rendering in, let’s say, 2 hours, the question is: are 24 hours of rendering time really a problem? Do you have any deadlines? You don’t sit in front of the computer waiting for it. You have no customer who really cares whether your video is rendered tomorrow morning or this evening. And be honest with yourself: don’t use a perceived performance problem as an excuse to spend money on a new Mac to render your vacation video. Just accept that you want a new gadget, and buy it. Later you will realise anyway that you didn’t need it for the performance, so why start by lying to yourself?

If rendering video is your core business process, the equation looks different. There, the several thousand euros and the long tinkering with the rendering process are well spent. If you produce a daily videocast and need to complete the video within a very short timeframe, it’s well-spent money.

So “Okay, it is slow. But is it a problem?” is actually not a snappy question but simply the first thing you should think about at the start, especially when you are not blessed with an endless supply of people and money to commit to the problem.

“Okay, it is slow. But why is it a problem?”

Over time I adapted the question a little. Of course, I ask it a bit differently, with a few more words, to ensure that it doesn’t sound as if I wanted to dismiss the problem, but that’s the essence of it. I found integrating the “Why?” very useful, for two reasons.

The first one is obvious: you will always get an unequivocal and resounding “Yes” if the question allows it. In my opinion, when you want to know more about a problem, never ask a question that can be answered with yes or no.

The second: if you ask why, you often get a lot of additional information about the problem and the bigger picture. For example, you learn that we are talking about part of a very important business process, how the slowness impacts the situation, and what surrounds the problem. Okay … unless someone answers “Because I want it fast!”. But almost always you get really good answers to this question that help you understand the situation.

I tend to phrase this question as “How does the performance problem impact your business?”. That takes the snappiness out of the question and still gives me the information I want: whether it’s a problem and why it’s a problem. Because business impact is what transforms a performance characteristic into a performance problem.

Conclusion

I always try to answer both questions before even getting near a console. And even then I don’t touch the console, because there are additional untechnical questions to answer, most importantly about the endpoint of the search for a performance problem: when does the search stop? That is a surprisingly difficult question, and it will be the topic of the next part of this series next Monday.