I watched DynamoDB change under my conformance suite
My DynamoDB conformance suite went red this week, and the target that failed was real AWS.
That needs explaining. The suite runs a few hundred tests against real DynamoDB first, records exactly what the real thing does, then runs the same tests against the emulators people reach for locally - DynamoDB Local, LocalStack, Dynalite, my own engine - and scores how closely each one matches. Real DynamoDB is the reference. It sits at 100% by definition, because it's the thing everything else is measured against. So when the reference itself fails a test, something's up: either my test is wrong, or the real thing has moved underneath it.
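As a rough illustration of that record-then-score flow, here is a minimal sketch. It is not the suite's actual code: the names (Target, record_reference, score) and the outcome strings are made up for the example, and a real target would call a service or emulator rather than look up a dictionary.

```python
from typing import Callable, Dict, List

# Hypothetical shape: a target runs one named test case and returns an outcome
# that can be compared for equality (a plain string here).
Target = Callable[[str], str]


def record_reference(cases: List[str], reference: Target) -> Dict[str, str]:
    """Run every case against the reference and keep what it returned."""
    return {case: reference(case) for case in cases}


def score(recorded: Dict[str, str], candidate: Target) -> float:
    """Percentage of recorded cases where the candidate gives the same outcome."""
    if not recorded:
        return 100.0
    matches = sum(
        1 for case, expected in recorded.items() if candidate(case) == expected
    )
    return 100.0 * matches / len(recorded)


# Stand-in targets with invented outcomes, only to show the flow.
REFERENCE_OUTCOMES = {"empty_table_name": "ValidationException", "put_item": "ok"}
EMULATOR_OUTCOMES = {"empty_table_name": "ValidationException", "put_item": "error"}

recorded = record_reference(list(REFERENCE_OUTCOMES), REFERENCE_OUTCOMES.__getitem__)
reference_score = score(recorded, REFERENCE_OUTCOMES.__getitem__)  # reference vs itself
emulator_score = score(recorded, EMULATOR_OUTCOMES.__getitem__)
```

In this sketch the reference only drops below 100% if it is scored against a recording it no longer matches, which is the situation described above.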
It was the second one. Real DynamoDB had changed how it words a chunk of its validation errors. The empty-table-name case used to come back as "Value '' at 'tableName' failed to satisfy constraint..."; now it's "Value at 'TableName'..." - the echoed value gone, the field name in PascalCase. A pile of other bespoke messages picked up a "1 validation error detected:" envelope they didn't have before. And PutItem with a { NULL: false } attribute, which used to be rejected outright, is now accepted and quietly stored as { NULL: true }.
None of that is written down anywhere. DynamoDB's exact error strings aren't part of any published contract. The only way to know what they are is to ask the service and record what it says, which is exactly what the suite does - so it noticed within a week.
Here's the part that caught me out. I'd assumed a change like this lands everywhere at once, so I fired the same inputs at four regions to be sure. It doesn't. eu-west-2 and eu-central-1 return the new wording. us-east-1 and ap-southeast-2 still return the old. Two regions on one side of the line, two on the other.
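The four-region check boils down to sending one input everywhere and grouping regions by what comes back. A minimal sketch, assuming a hypothetical probe function that takes a region name and returns the response as a string; in practice the probe might wrap a DynamoDB client created for that region, but here it is a stand-in that returns the labels "old" and "new" rather than real service output.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_regions_by_response(
    regions: Iterable[str], probe: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Send the same input to every region and bucket regions by the response."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for region in regions:
        groups[probe(region)].append(region)
    return dict(groups)


# Stand-in probe: labels instead of real error strings, mirroring the split
# described above. A real probe would call the service in each region.
OBSERVED = {
    "eu-west-2": "new",
    "eu-central-1": "new",
    "us-east-1": "old",
    "ap-southeast-2": "old",
}

groups = group_regions_by_response(OBSERVED, OBSERVED.__getitem__)
for wording, regions in groups.items():
    print(wording, regions)
```

More than one bucket means the regions disagree on that input; a single bucket means they agree.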
Most of that split is cosmetic - the wording moved, not the behaviour. The exception is { NULL: false }: two regions now accept a write the other two still reject. That one isn't the error text reading differently, it's the service taking different input depending on where you call it, and it's what makes "which region" change the answer rather than just the message.
So which is it - has AWS changed DynamoDB, or do these regions just behave differently? I think it's the first, and that the regional split is the change still in motion rather than a permanent fact of geography. Two things point that way. The two regions that moved did so in exactly the same way - same dropped value, same envelope, same { NULL: false } behaviour - which reads like one upstream change propagating outward, not two regions independently drifting apart. And AWS has no reason to deliberately fork European validation from everyone else's. The simplest story that fits the evidence is a staged rollout that's reached Europe and not yet reached Virginia or Sydney.
I can't prove that from the outside. I won't know the laggards have caught up until they do, and a regional fault that gets quietly reverted would look much the same from here. But "a change rolling out region by region" fits better than anything else, so that's how I'm reading it. Which makes the durable point not "DynamoDB differs by region" - if I'm right, that sorts itself out - but "DynamoDB's validation moves, it's undocumented, and the only way to know where it's got to is to ask each region directly."
That points at something I'd got wrong in the suite. I'd pinned the exact prose - the full error string, word for word - and the exact prose was never part of any contract AWS owes me. The upstream Smithy model says which constraints exist: this field is required, that one has a minimum length. It says nothing about how the resulting error reads. I was asserting on the one part AWS is free to change without telling anyone, and then acting surprised when it did.
So the direction I'm taking the suite is to assert the contract and stop pinning the wording. Check that the right exception type comes back, against the right field, for the right violated constraint - the things the model actually promises - and let the surrounding prose vary. Pin less, and the next reword stops being a CI failure and goes back to being the non-event it always should have been.
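A minimal sketch of what that looser check could look like, under some loud assumptions: ObservedError and violates are hypothetical names, the error type is assumed to be ValidationException, and the "minimum length (illustrative)" tail on both messages is invented for the example rather than copied from AWS. The field name is matched case-insensitively so that tableName and TableName both count.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedError:
    error_type: str  # exception type the service reported
    message: str     # free-form prose, which is allowed to vary


def violates(
    observed: ObservedError, error_type: str, field: str, constraint_pattern: str
) -> bool:
    """True if the error has the expected type, names the field, and names the constraint."""
    if observed.error_type != error_type:
        return False
    # Field name in single quotes, ignoring case (tableName vs TableName).
    field_named = re.search(rf"'{re.escape(field)}'", observed.message, re.IGNORECASE)
    # Constraint is matched by a caller-supplied pattern, not the whole sentence.
    constraint_named = re.search(constraint_pattern, observed.message, re.IGNORECASE)
    return bool(field_named and constraint_named)


# Both wordings below end in an invented constraint description.
old = ObservedError(
    "ValidationException",
    "Value '' at 'tableName' failed to satisfy constraint: minimum length (illustrative)",
)
new = ObservedError(
    "ValidationException",
    "Value at 'TableName' failed to satisfy constraint: minimum length (illustrative)",
)

old_ok = violates(old, "ValidationException", "tableName", r"minimum length")
new_ok = violates(new, "ValidationException", "tableName", r"minimum length")
```

Both wordings pass the same assertion here, while a different exception type or a different field would still fail it.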
There's a temptation, having seen this, to build a standing "regional drift" monitor: run the strict checks against a row of regions on a schedule, light up whenever they disagree. I've decided not to, at least not yet. If this really is a rollout, the divergence trends to zero the moment it finishes everywhere, and I'd be left maintaining a view that watches nothing most of the time. So I took a one-off snapshot instead - that's how I know the spread is two and two - and I'll build the live view only if drift turns out to be a recurring pattern rather than a single event I've already characterised. No sense turning one honest number into a dozen confusing ones, or standing up infrastructure for something I've already had a good look at.
The smaller, sharper upshot is about the number the suite reports. It's always really been "conformance to DynamoDB in eu-west-2", not "conformance to DynamoDB". I'd quietly treated those as the same thing. They aren't, and the fix is to make the suite say which region it's measuring rather than imply a single global DynamoDB that doesn't exist.
If you've ever written a test that asserts on a DynamoDB error message, or leaned on one in your own code, this is worth sitting with. The string you pinned might be true in your region and not the next one over, and it might not stay true in yours for much longer. Mine changed under me.
The results are at paritysuite.org, and the suite's open source if you want to point it at your own region and see what comes back.