I’ve told this anecdote before. Back then it was about how you get from couch potato to century rider. I like to tell it whenever the occasion fits, simply because in its simplicity it describes reality remarkably well.

Many, many moons ago, when years still started with 19 and my age started with 2, my employer at the time sent me to a training course: “Leading Without Hierarchical Power.”1 These soft-skill trainings usually include a certain amount of “group exercises.” One of them has stuck with me to this day.

Tennis balls

A whole bunch of tennis balls, along with a number of seemingly unrelated objects, lay in front of a group of about 8 people, and the task was: “All team members must touch all tennis balls. How do you do it fastest?” The approach of “everyone takes turns picking up each ball from the table” is pretty slow. Passing each tennis ball by hand along a chain is already quite fast. But it can be done much faster. You develop ideas, and each new idea makes it a little bit faster.

The trainer was never satisfied, though. In the end we went from, I think, almost a minute for the whole batch of tennis balls to 1–2 seconds.2 Had the trainer told us at the beginning that we needed to do it in such a short time, we would all have said “Forget it, that’s not possible.” But we got there through an iterative process. Step by step. A little faster each time. The early gains were large in absolute terms (from 20 to 15 seconds); eventually the absolute gains became small (from 4 to 2 seconds), but in relative terms those were considerably bigger leaps forward.

In case this group exercise is still part of a training today, I don't want to spoil the solution up front. But here is ours: one of the seemingly unrelated objects was a roll of duct tape. MacGyver the tennis balls together with the tape and then pull the whole chain past everyone's fingers. Done briskly, that takes under two seconds with 8 people.


The whole thing is of course a cheap psychological trick. But it works.

Performance tuning

I often think back to that training whenever I do performance tuning somewhere. As I did just a few days ago on my personal systems.

The mechanism that puts my blog on the web is essentially a Gitea workflow that runs a whole series of tasks in sequence, starting with launching the build container and ending with rsyncing the result and tearing down the build environment. The build wasn’t really slow before, either: 41 seconds minimum from merge to deployment on the website. But I wanted it faster. Lately I had increasingly noticed that I was sometimes pushing changes faster than the build process could keep up, especially in the draft environment.

I was quite successful at it, too. The whole process now takes 16 seconds, measured from merge into main in Gitea to completed deployment on the web server. Some things couldn’t be sped up; rsync simply takes its time. So the speedup had to come from a few specific steps.

For me, one of those was the mechanism I use to digitally sign the website.3 Even more optimisation potential lay in Jekyll.
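Footnote 3 describes the signing change: instead of signing every document individually, each file gets a hash and only the list of hashes is signed. The manifest side of that idea can be sketched in Ruby; the directory layout and file names below are my own illustration, not the actual build script:

```ruby
require 'digest'

# Build a manifest of SHA-256 hashes for every generated file, so that
# a single signature over the manifest covers the whole site instead of
# one gpg invocation per document. Paths and names are illustrative.
def write_hash_manifest(site_dir, manifest_path)
  files = Dir.glob(File.join(site_dir, '**', '*')).sort
             .select { |p| File.file?(p) }
  File.open(manifest_path, 'w') do |f|
    files.each do |path|
      f.puts "#{Digest::SHA256.file(path).hexdigest}  #{path}"
    end
  end
end

# Afterwards a single call replaces hundreds:
#   gpg --armor --detach-sign hashes.txt
```

One detached signature then vouches for every hash in the list; verifying an individual page means recomputing its hash and checking it against the signed manifest.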

Before optimisation, Jekyll needed over 20 seconds per run. The first jump came when I moved my Jekyll build to a system with somewhat higher single-thread performance. Jekyll is, as far as I can tell, strictly single-threaded, so that angle of attack was obvious. That brought me down to about 18 seconds.

Generating...
done in 18.578 seconds.
Auto-regeneration: disabled. Use --watch to enable.

I was hoping to maybe save another 5 seconds. It cost me several evenings to implement the corresponding optimisations.

The result: the site currently generates in 1.9 seconds.

Generating...
done in 1.906 seconds.
Auto-regeneration: disabled. Use --watch to enable.

The process was not unlike the tennis balls. At first, relatively large, quick gains that were low-hanging fruit. For instance, switching the signing process. Or stopping the nonsense of generating the entire CSS inline for every single document.

Later the gains became relatively small in absolute terms, such as pre-generating certain text fragments in Ruby instead of via the Liquid templating engine. Since Ruby is not my native programming language of choice4, those fruits were unfortunately much harder to reach.

From roughly 12 seconds onwards, I probably did entire build runs for every second shaved off.

Experiment

What makes the story interesting: I gave the same task to an LLM. As you’ve read in other articles, I’ve been experimenting with that lately.

Why? A comment that was once rightfully “thrown in my face” has stuck with me: my blog was my baby, and I was leaving it lying in the gutter. I tidied up my blog reasonably well at the time. But the word baby stayed in my head and eventually led me to a thought: if it’s a baby, I obviously need a nanny. Of course I couldn’t hire a nanny for a blog that was essentially no longer being developed. The idea resurfaced when my experiments with vibe coding actually worked quite well.5

Because the state the comment referred to wasn’t intentional — it was simply the consequence of a lack of time. Or of having an idea but lacking the time to implement it, leaving it incomplete.6

Motivating an LLM

What came of setting an LLM loose on the performance problem? I packed my blog minus the content into a tarball and mainly gave the prompt the output of Jekyll’s --profile option. The question was simple: “Want this faster. Make suggestions. Now.”

The LLM also spotted the major blunders in my modified theme (some of them were already in the original) and fixed them in roughly the same way I had fixed them manually before. This was hardly surprising, because the solution is obvious: generate once and load as a file. The alternative would have been to generate a fragment once and then insert it into all generated HTML pages via script; that’s how I handle the footer. The recent posts section, for instance, is relatively expensive to generate but identical on almost every page.
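The “generate once” idea can be sketched in plain Ruby. This is Jekyll-free and all names are hypothetical; in a real setup the caching would live in a Jekyll plugin. The point is simply that the expensive fragment is rendered on the first request and merely reused for every further page:

```ruby
# Memoize an expensive HTML fragment so it is rendered once per build,
# not once per page. Plain-Ruby sketch; class and key names are hypothetical.
class FragmentCache
  def initialize
    @cache = {}
  end

  # Build the fragment only on the first request; later pages reuse it.
  def fetch(key)
    @cache[key] ||= yield
  end
end

cache = FragmentCache.new
renders = 0

%w[index.html about.html archive.html].each do
  cache.fetch(:recent_posts) do
    renders += 1  # counts actual renders of the fragment
    "<ul><li>Post 1</li><li>Post 2</li></ul>"
  end
end
# renders is 1: three pages were generated, but the fragment was built once
```

With per-page rendering the cost grows with the number of pages; with the cache it is paid once per build, which is exactly why the recent-posts section was such a cheap win.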

After that, however, a somewhat puzzling effect quickly set in. Just as we as a team had kept saying “It can’t be done faster,” the LLM began telling me: “It can’t be done faster. This is already good. Diminishing returns. Everything else would require large architectural changes.” I think I was at 8 seconds runtime when those answers started. And each time I said what the trainer had told us back then: “Yes it can be done faster. Look more closely.”

At this point I was in the same position as the trainer many years ago. I had an advantage over the LLM: I knew at least where the endpoint was, because I had already optimised the system to that point before asking the LLM. Obviously I expected the LLM to find at least the same optimisations I had found. This was an experiment in which I knew how far it was at least possible to get.

Of the roughly 20 optimisations7 that were ultimately needed to reach 1.9 seconds, the LLM came up with 2 that I hadn’t thought of. It failed to find 3 more even with nudging; I ended up implementing those myself. Admittedly, the LLM was missing context that was in principle contained in the data provided but not easily recognisable: I knew there were dependencies that could be leveraged, I just hadn’t told the LLM about them. The remaining optimisations were found both by the LLM and by me. But after the 6th or 7th step, the LLM would say every time: “That’s it, no more. Doing more wouldn’t yield sufficient improvements.” With a little nudging, though, it always went further. Half a second here, 2 seconds there.

A few steps were found after a hint like “Look at this file for optimisations” or “Is there anything more you could cache/precompute?”. Whatever the mechanism, it worked.

Of course I incorporated the two additional points the LLM found. They were two relatively high-hanging fruits that I probably would have forgone had I needed to implement them myself; about one and a half seconds could be saved this way. The solutions consisted of moving further tasks, previously handled by the templates themselves, into plugins. Here I got the plugin delivered right alongside the idea, a ladder for the higher-hanging fruits, so to speak.

Perhaps the LLM was right: taken individually, those performance improvements are small. But in total they yielded quite a significant speedup. This is the incremental nature of improvements that the training demonstrated. While an LLM can most often give a correct answer on the first try, nudging it again and again is the equivalent of such an incremental optimisation process.

Lessons

What did I take away from this? The insight from that training day in the last millennium still holds: you rarely reach your goal in one big leap. Success is very often incremental. Long-distance cycling is no different from optimising a Jekyll blog. Or learning an instrument. Or any other skill.

The above does not apply, by the way, to performance problems caused by a bug. There, my experience is mostly that those big leaps do exist. Once you’ve found the bug, the system suddenly flips into a more performant state. Interestingly, I’ve often discovered further optimisation opportunities on such occasions. The performance problem served as a catalyst for optimisations that went beyond the bug fix itself.

That the LLM failed to find certain optimisation opportunities illustrates a further insight: in the absence of an implicit specification (the kind a developer has because she knows the business process, for example after working at a company for 20 years), the entire specification must be spelled out explicitly to achieve an optimal result. The effort saved in automated code generation must therefore be invested upfront in specification.8

You also have to “motivate” LLMs. It’s not real motivation, of course, but you may need to keep re-pointing the LLM at the problem. You have to nudge its thinking in the right direction, just as the trainer did back then with the group exercise. Above all, you must not trust an LLM when it says “That’s as good as it gets,” at least not the first few times, and especially not when the last step brought real savings. Even when the system insists “There is no more.”

That said, “LLM Motivator” was not on my bingo card of possible future jobs. I’m well aware that it’s an anthropomorphisation to speak of motivation when all I did was tell a machine to throw more compute at a detail. But then: what is it when you tell someone “Think about the problem more carefully” when their answer leaves you wanting?

I’m still sure that more is possible than what’s been achieved so far. I know of a few places where I could certainly squeeze out another 0.2–0.3 seconds. At the cost of higher complexity. And that raises the question: “Should I?”

Because even if it’s tempting to have an LLM implement the changes, in the end you have to maintain the code yourself. And that’s not worth the 0.3 seconds. My personal costs would not have arisen during development (those would have been practically zero thanks to the LLM) but afterwards, in keeping the code running. I suspect that the low cost of generating new code creates a certain tendency to generate more code than you can humanly maintain and operate afterwards.

Substantially more would only be possible if I switched from Jekyll to Hugo. But here too I was pragmatic in the end. What good does it do me if Hugo needs 0.3 seconds instead of 1.9 seconds for the build, when with the current setup the rsync alone takes 7 seconds9 and the time until the container actually starts generating the site has already consumed 4 seconds?

Is the nanny hired?

I’d like to answer with a “Yes, but…” Will I use the LLM as a nanny for my blog? Yes, certainly; it has proven itself here on a number of tasks. Do the societal consequences of LLMs worry me? Yes, certainly. Cum cura ardenti. And that is the dilemma I’m currently facing, one for which I don’t yet have an answer.

Postscriptum

Of course I asked the LLM to solve the tennis ball problem from that training:

Q: There is a desk with a hammer, some yarn, a roll of duct tape, a number of tacks
and 8 tennis balls. You can use all of the things on the desk. The task is: There are
8 people. Each of the 8 people has to touch each ball. What is the fastest way to do it?

A: Have all 8 people stand in a circle and pass all 8 balls around. Each ball goes
through every person's hands, and since all balls can be moving simultaneously, it
completes in just 8 passes — a few seconds total.
No tools from the desk are needed. The balls themselves are enough.

Q: Is there a faster way?

A: Yes — pile all 8 balls together on the desk and have all 8 people crowd their hands
in and touch all of them at once. With 16 hands on 8 tennis balls, everyone can make
contact simultaneously. Done in one instant instead of 8 passes.

Q: Are you sure all people have touched all balls?

A: You're right to push back. In a crowd of hands, there's no guarantee each person
contacts all 8 balls.
Better approach: use the duct tape to bind all 8 balls into one big cluster. Now it's
a single object. Each person grabs/touches the cluster and is guaranteed to be able to
touch all 8 balls. All 8 people can do this simultaneously by placing their hands on
different sides of the cluster.
That's the fastest — one moment, everyone touches at once, and every ball is accessible
because they're fixed together.

Q: It isn't sufficient that a person touches some of the balls. Each ball must be
touched by each person.

A: Good point — here's a better method:
Use the tacks and yarn to string all 8 balls together in a line (poke the yarn through
each ball with a tack to anchor it). Then have all 8 people form a line, each making a
loose fist around the yarn. One pull drags the entire string of balls through all 8
hands — every ball touches every person in a single motion.

Q: Which tool should I use to poke the yarn through the ball?

A: Use the hammer and a tack — drive the tack through the tennis ball with the hammer
to make a hole, then thread the yarn through.

Q: A tack is very small

A: You're right — forget poking holes. Just duct tape each ball to the yarn at
intervals along its length. Then pull the yarn-line of balls through everyone's hands
as before. No hole-poking needed.

I think we needed more iterations to come up with the solution back then, though we didn’t actually try the wrong ones; we discussed them briefly and discarded them quickly.


  1. Or as I called it: “Führen ohne Schulterklappen”, perhaps best translated to “Leading without the paygrade” 

  2. Interestingly, the small absolute gains at the end were the ones that required the biggest rethinking. 

  3. I switched from “I sign every document individually” to “Every file contains a hash and I sign the list of hashes.” This saved several hundred calls to gpg, which even when significantly parallelised still took a certain amount of time. 

  4. That would be Python. 

  5. Though I also have a very forgiving use case. 

  6. I have a tendency to write long articles. Having a roadmap for these articles always seemed sensible to me. So I wanted a floating TOC in my blog. Now I have one. 

  7. There were of course considerably more build script runs. Along the way I produced variously destroyed draft websites. The radio silence during those days was also due to the fact that the only fitting description of the build process would have been “FUBAR.” I couldn’t publish any article at all. 

  8. Which could lead to much better specified projects, which would have made the job of human developers much easier and faster. 

  9. The generated files change with every build due to the timestamp, so they need to be transferred again. 

Written by

Joerg Moellenkamp

Personal opinions, observations, and thoughts