Lego Tantrums, Query Scandals, and My Fiancée’s Side Eye: A DBA’s Tale

Picture this: you’re the new kid on the block (well, not entirely a kid anymore… truth be told, just a tad past my adolescent stage, but you get the 90s lingo, right?), eager to dive in. But the database? It’s locked down tighter than skinny jeans on a middle-aged man. You need access to investigate why queries are suddenly crawling slower than a snail in molasses. After navigating the multiple MFAs and password-control applications, which also require me to remember a password (weird, I know), my efforts are rewarded with access to the server, but monitoring remains a mirage… until, of course, our team’s kind of magic kicks in, and we pull together around the glare of our screens, the others feeding me stats like I’m cheating on a diet.


Great team? Check. 
Database woes? Double check. 

Off to the races we went, and when confronted with a query that proudly boasted sub-0.500-second runtimes, we attempted to make sense of the chaos of this world, or at least of this query and its impacts. Under load, this “zero-runtime” query ballooned to over 8 seconds per execution. For those who think that doesn’t sound too bad: it is an existential crisis for any high-traffic system.

The Problem: When “Fast” Isn’t Fast Enough 
This piece of code appeared as if from nowhere, only to disrupt an otherwise perfectly calm Wednesday morning. (I am grateful for this part, at least, because generally they like to appear on a Friday afternoon.) It did not just appear, it was generating 1,300 executions per hour, and it didn’t just slow down, it created an avalanche of blocking and contention that flatlined the system. Even with 18 extra vCores (bringing the total to 36), the chaos persisted, and both the clients and our team were on it like sweat on a gym towel.
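For the curious, this is roughly the kind of plan-cache check that surfaces a repeat offender like this one, a generic sketch rather than our actual monitoring query:

  -- Top cached statements by execution count, with average elapsed time,
  -- CPU, and logical reads per execution (times reported in milliseconds).
  SELECT TOP (20)
      qs.execution_count,
      qs.total_elapsed_time  / qs.execution_count / 1000.0 AS avg_elapsed_ms,
      qs.total_worker_time   / qs.execution_count / 1000.0 AS avg_cpu_ms,
      qs.total_logical_reads / qs.execution_count          AS avg_logical_reads,
      st.text AS batch_text
  FROM sys.dm_exec_query_stats AS qs
  CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
  ORDER BY qs.execution_count DESC;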

The moral of the story? A query’s runtime on paper means nothing if it chokes under real-world pressure.  

The Solution: Rewind, Rethink, Rebuild 

Step 1: Dump OPENJSON 
First, we revisited the codebase while simultaneously stopping the bleeding and putting a dead halt on the query’s execution. The query was heavily reliant on OPENJSON, which was like asking a toddler to find one specific LEGO piece in a pile and not expecting a frustrated, overstimulated, and volatile situation. It was swiftly axed. Instead, the client reverted to an older, saner version of the codebase, replacing the JSON parsing with classic WHERE clauses.

Boom. No more toddler tantrums over missing (or are those new pieces that don’t quite fit the picture?) Lego pieces. We watched the system stabilize as the last remnants of JSON left the queue, kicking and screaming as he went, but fading into a dull echo of whimpers and obscenities as he exited the building (system), while we could only stare on in bewilderment and pent-up frustration. Now everyone hates the Lego, and playtime is over. Operation Mop-Up on Isle Azure had begun, and there were some stains that were going to need deep scrubbing.
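For context, here is a minimal before-and-after sketch of the pattern, with hypothetical table and column names rather than the client’s actual code:

  -- Before (simplified): the filter arrives as a JSON document and is shredded
  -- with OPENJSON on every execution, hiding the predicates from the optimizer.
  DECLARE @Filter nvarchar(max) = N'{"CustomerId": 42, "Status": "Active"}';

  SELECT o.OrderId, o.OrderDate, o.TotalDue
  FROM dbo.Orders AS o
  CROSS APPLY OPENJSON(@Filter)
      WITH (CustomerId int         '$.CustomerId',
            [Status]   varchar(20) '$.Status') AS f
  WHERE o.CustomerId = f.CustomerId
    AND o.[Status]   = f.[Status];

  -- After (simplified): plain parameters and classic WHERE clauses that the
  -- optimizer can estimate, and that an index can actually serve.
  DECLARE @CustomerId int = 42, @Status varchar(20) = 'Active';

  SELECT o.OrderId, o.OrderDate, o.TotalDue
  FROM dbo.Orders AS o
  WHERE o.CustomerId = @CustomerId
    AND o.[Status]   = @Status;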

Step 2: “Stat” Attack
The tables feeding the query were harboring a dirty little secret: outdated and inadequate statistics, leaving them exposed to bad query plans and the scandalous exposure of entire data sets through full scans that would make airport security blush. Increasing sampling sizes and updating the stats provided swift sanctuary for a battered and beaten database, stabilizing runtimes for our new/old codebase.
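In practice, that boiled down to something along these lines, again with hypothetical table names; your own sampling needs will differ:

  -- Check how stale and how thinly sampled the existing statistics are.
  SELECT s.name, sp.last_updated, sp.rows, sp.rows_sampled, sp.modification_counter
  FROM sys.stats AS s
  CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
  WHERE s.object_id = OBJECT_ID(N'dbo.Orders');

  -- Refresh statistics on the worst offenders with a bigger sample (or a full
  -- scan) so the optimizer stops guessing row counts from stale histograms.
  UPDATE STATISTICS dbo.Orders WITH FULLSCAN;
  UPDATE STATISTICS dbo.OrderLines WITH SAMPLE 50 PERCENT;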

Step 3: Index Like a Pro 
Next up: indexing. We quickly identified candidates for indexing by deep-diving into query execution plans, WHERE clauses, and the dire need for restored mental health under stressful conditions. By targeting the high-cost columns and converting most full scans into lightning-fast index seeks, we were getting that warm, fuzzy, DBA-preferred calm-until-Friday feeling back. The remaining troublemakers? We “brutalized” them into compliance with some strategic indexing tweaks and filtering witchery, leaving them no room to maneuver except in the general direction of making haste.
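The indexes ended up roughly in this shape, a sketch with hypothetical names; the real candidates came straight out of the execution plans and WHERE clauses:

  -- Covering index on the high-cost predicate columns, with the selected
  -- columns INCLUDEd so the query never has to go back to the base table.
  CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_Status
      ON dbo.Orders (CustomerId, [Status], OrderDate)
      INCLUDE (TotalDue)
      WITH (ONLINE = ON);

  -- A filtered index for one of the remaining troublemakers: a predicate that
  -- only ever touches a small, well-defined slice of the table.
  CREATE NONCLUSTERED INDEX IX_Orders_Active_OrderDate
      ON dbo.Orders (OrderDate)
      INCLUDE (CustomerId, TotalDue)
      WHERE [Status] = 'Active';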

Suddenly, stability was in the air, and continued optimization became a question of how long a piece of string could be. 

Step 4: The Future is Brighter
Being the typical DBA, I was obviously not satisfied with mere stabilization, and I needed to see whether we could create some headroom in case a similar attempt was planned for a Friday in the not-too-distant future. Better to be prepared than to make plans and miss your braai, I always say. I started working towards those magical zero-second runtimes again, and my ADHD brain kicked into hyperfocus mode, leaving me open for a friendly discussion with Mandi later about what an odd place it is to start a conversation with, “You never listen when I speak”. (“Then how come I heard you say that?” is not the correct answer to this, in my experience.) Anyhow, a few minutes (in my humble opinion) past normal working hours, I finally leaned back and sighed. There she was. Like a sunset on a beach… a warm, comfortable, and consistent execution time of 0.275 seconds, even under load. I basked for a moment in the impactful reductions in CPU and IO operations, until I was swiftly brought back to reality. These changes will be pending review and, in all probability, will not be prioritized soon, a fact I am sure Mandi can live without knowing. In the meantime, targeted maintenance and index optimization got us to a consistent 0.450-second runtime, bringing us back to where we used to be, but without the stresses of OPENJSON and untested code.
 
For those of you who also like these kinds of sunsets, it took a combination of the tweaks below, plus some other, less successful attempts, to get that perfect sunglasses kind of glare (a rough sketch of two of them follows the list):

  • Standardizing date comparisons.
  • Using data-appropriate types, cleaning up those messy varchar(max) columns that the developer folk love to use so much.
  • Fine-tuning indexing for filtered conditions.
  • Removing redundant computations.
  • Implementing snapshot isolation for read-heavy operations.
  • Reordering WHERE clauses to filter out large datasets early on.
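As promised, here is a rough sketch of two of those tweaks, still against the hypothetical Orders table. The post doesn’t spell out the exact snapshot isolation route, so the one below is simply a common way to do it:

  -- Standardized, sargable date comparison: compare the column against a typed
  -- range instead of wrapping it in CONVERT/CAST, so an index on OrderDate can
  -- be seeked rather than scanned.
  DECLARE @DayStart datetime2(0) = '20241101';
  DECLARE @DayEnd   datetime2(0) = DATEADD(DAY, 1, @DayStart);

  SELECT o.OrderId, o.TotalDue
  FROM dbo.Orders AS o
  WHERE o.OrderDate >= @DayStart
    AND o.OrderDate <  @DayEnd;

  -- Snapshot isolation for read-heavy operations: allow it at database level,
  -- then opt the read-heavy sessions in, so readers stop blocking writers at
  -- the cost of some extra tempdb version-store usage.
  ALTER DATABASE CURRENT SET ALLOW_SNAPSHOT_ISOLATION ON;

  SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
  SELECT COUNT(*) AS ActiveOrders FROM dbo.Orders WHERE [Status] = 'Active';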
     

There were more, some that worked, some that didn’t, and in another post I will give some optimization tips on how to take names and kick some… uh, I mean, how to kick scalability and cost savings into gear.

From Sloth to Cheetah – The Impacts of Having an Irritable Fiancée

Time Savings 
The old system processed 1,300 queries in 3+ hours. Now? We’re crunching 3,000+ queries in just 24 minutes, an almost 90% reduction in wall-clock runtime, and that with more than double the query volume.

Financial Gains 
Remember those 18 extra vCores? No longer needed. Scaling back to 18 total vCores saves an estimated R1,308.96 daily, totaling R477,770.40 annually. (And some side glances mixed with mumbling for a day or two…)  

Compound Effects 
Less blocking. Fewer waiting threads. Happier CPUs. The result? A system that doesn’t just run faster, it scales and absorbs impacts better.  

A Recipe for Success 

Want to tame your own rogue queries? Here’s a 10-step (nice and predictable and stable, just the way a DBA likes it) recipe for success: 

  1. Start with the basics: Review the execution plan for obvious issues. 
  2. Update your stats: A strong foundation makes for a better optimizer. 
  3. Rethink your code: JSON parsing may sound modern, but simplicity is often timeless. 
  4. Index wisely: Focus on high-impact columns and include essential fields. 
  5. Iterate: Test changes incrementally and under real-world conditions (a minimal measurement sketch follows this list).
  6. Filter effectively: Place high-impact filters early in WHERE clauses. 
  7. Leverage your team: Collaboration leads to breakthroughs. 
  8. Document everything: Leave breadcrumbs for the next adventurer. 
  9. Keep optimizing: Stability isn’t the finish line, it’s a platform for growth. 
  10. Showcase results: Use time savings and cost reductions to build momentum for future improvements, creating a trusting relationship with your clients through consistency and pride in your work. 
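For step 5, here is a minimal sketch of how a candidate change can be measured before and after, using standard session settings and the same hypothetical table as earlier:

  -- Report per-statement CPU time, elapsed time, and logical reads, then run
  -- the old and new versions of the statement back to back and compare.
  SET STATISTICS TIME ON;
  SET STATISTICS IO ON;

  DECLARE @CustomerId int = 42, @Status varchar(20) = 'Active';

  SELECT o.OrderId, o.OrderDate, o.TotalDue
  FROM dbo.Orders AS o
  WHERE o.CustomerId = @CustomerId
    AND o.[Status]   = @Status;

  SET STATISTICS TIME OFF;
  SET STATISTICS IO OFF;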

Lessons Learned 

  1. Not all runtimes are equal: Sub-0.500 seconds is meaningless if the system melts under load. 
  2. The devil’s in the stats: Accurate data leads to smarter query plans. 
  3. Teamwork matters: A collaborative team can tackle even the trickiest challenges, at pace and with impact! 
  4. Humor helps: Performance problems are stressful, but a little laughter goes a long way. (Not necessarily at home, though; there, your performance should never be lacking, and it will not be a humorous moment.)

Conclusion 

What started as a nightmare ended as a triumph. We tamed the 8-second beast, restored sanity to the CPUs, and turned a resource hog into a lean, mean querying machine. 

Not all zero-second runtimes are equal, but with the right team, the right tools, and a sense of humour, anything is possible. 


PS:

Mandi isn’t really upset; in fact, she brought me coffee, snacks, and even a quick neck massage throughout. But if I dared suggest she couldn’t have been a little upset about that very tiny, little, almost nonexistent bit of time I went over business hours, I would be giving myself sideways glances… Thank you for having my back and supporting me in stressful times, my Angel. To show how grateful I am, you can make me a coffee tomorrow morning, which I will graciously accept and drink while working on the next optimization. I will now go get my sleeping bag and act like I don’t mind sleeping on the couch.

Amor fati
