A Tale of Garbage Collection
The Innocent Dropdown That Was Eating Our Garbage Collector
There’s a particular kind of bug I love. The kind that’s been sitting in production for months, where two or three good engineers have already had a swing at it, and the verdict is always the same shrug: “the database must be slow during work hours.”
This is the story of one of those bugs.
The symptom
Legacy MVC5 application. A dossier search form, used all day every day by the backoffice team. Open the page, get a coffee, come back. Painful, but only between roughly 9 a.m. and 6 p.m. — outside work hours, the page was fine.
The team had spent weeks on it. Every query plan had been inspected. Every
index reviewed. Every Include() audited for over-fetching. The DBA had
been consulted. Nothing.
When it landed on my desk, I refused to play the guessing game. Guessing is how you spend two weeks adding indexes to columns that were never the problem. I opened my profiler and hit record.
What the profiler actually said
The hot path wasn’t a query. It wasn’t EF. It wasn’t the network.
It was a dropdown.
One of those innocent little <select> controls at the top of the form,
populated with about 1,500 dossiers. The previous developer had built it
the most direct way imaginable: take a string template, do string.Replace
to inject the id, do another string.Replace to inject a custom color
based on the dossier’s kind, then += the result onto the growing HTML
string. Repeat 1,500 times. Per request.
It looked like this, give or take:
string html = "<select>";
foreach (var d in dossiers)
{
string option = template
.Replace("__ID__", d.Id.ToString())
.Replace("__COLOR__", ColorFor(d.Kind))
.Replace("__LABEL__", d.Label);
html += option;
}
html += "</select>";Every Replace returns a brand new string. Every += allocates a new
string and copies the old one. For 1,500 options that’s something like
6,000 throwaway strings per render, many of them large enough and
long-lived enough to survive Gen 0 and get promoted.
Off-hours, with a couple of users, the GC absorbed it. During work hours, with thirty users hammering the search form, the heap couldn’t drain fast enough. Allocations piled up, objects got promoted to Gen 2, and Gen 2 collections — which are the expensive ones — started firing constantly. The page wasn’t slow because of SQL. It was slow because the GC was spending most of its time mopping up after this one method.
When old people tell you not to construct large strings with concatenation, listen.
The fix
The minimum fix is boring: build into a StringBuilder instead of +=,
and write the values directly with Append instead of substituting them
with Replace. That alone kills the quadratic copy and the throwaway
intermediate strings — Gen 2 collections drop to nearly zero, and you
didn’t add a single dependency.
var sb = new StringBuilder(dossiers.Length * 80);
sb.Append("<select>");
foreach (var d in dossiers)
{
sb.Append("<option value=\"").Append(d.Id)
.Append("\" style=\"color:").Append(ColorFor(d.Kind))
.Append(";\">").Append(d.Label)
.Append("</option>");
}
sb.Append("</select>");I shipped a different fix. I rebuilt the dropdown with HtmlAgilityPack:
a real DOM, attributes set directly, the document serialized once. It
stops you worrying about a label that happens to contain a quote or an
angle bracket silently producing broken HTML — for a control rendered on
every page load by every user, the correctness was worth the dependency.
Here is what the repro prints on my laptop, 1500 options, 1000 iterations, 16 concurrent renders:
[SLOW: string concat + Replace in loop]
elapsed : 28914 ms
allocated : 203,627.4 MB
GC Gen0/1/2 : 18083 / 14659 / 14593
[FAST : StringBuilder, no Replace]
elapsed : 120 ms
allocated : 546.3 MB
GC Gen0/1/2 : 49 / 44 / 39
[OK: HtmlAgilityPack]
elapsed : 3138 ms
allocated : 2,470.4 MB
GC Gen0/1/2 : 390 / 198 / 6A few things worth pausing on. The pathological version allocates
200 GB over the run and triggers fourteen thousand Gen 2
collections — that’s the GC working full-time, which is exactly what you
felt as page slowness. StringBuilder is the runaway winner on raw
throughput: 220× faster and 370× less allocated than the original, and
faster than the DOM library too. HtmlAgilityPack allocates more bytes
but funnels them through short-lived DOM nodes that die in Gen 0, so its
Gen 2 count is the lowest of the three — only 6.
So which is “best” depends on which axis you care about. On every axis, both are an enormous improvement over the original.
Reproducing it
You can reproduce this on whatever you’ve got — Windows, macOS, Linux. The .NET diagnostic tools ship cross-platform and they’re all you need:
dotnet-counters— live view ofalloc-rate,gen-2-gc-count, and% time in GC. This alone would have caught the bug.dotnet-trace— sampling profiler. Export to speedscope, open the JSON at speedscope.app, andString.Replacelights up like a Christmas tree.dotnet-gcdump— heap snapshot, perfect for proving Gen 2 promotion.- PerfView (Windows) or BenchmarkDotNet (anywhere) for deeper analysis.
- And of course, a commercial profiler like dotTrace or ANTS if you prefer a GUI — they all show the same thing.
I put together a minimal repro in the companion repo: three versions of
the same dropdown render — the pathological original, the StringBuilder
fix, and the HtmlAgilityPack version — with allocation and GC counters
printed at the end. Run it with dotnet run -c Release. The gap between
the first row and the other two is impossible to miss; which of the two
fixes wins on which metric is its own little surprise.
The lesson, again
The team had been looking under the database streetlight because that’s where bugs usually are. They weren’t wrong to look there. They were wrong to keep looking there after the first day.
A profiler doesn’t have opinions. It doesn’t know which code is “supposed to” be slow. It just tells you where the time went. Spend the ten minutes to attach one, and it will save you the two weeks of guessing.
Don’t summon magic. Profile.