May 12, 2026
6 min read
Build Notes

Turning Scattered Market Data into Signal

The design principles behind market-intelligence pipelines: freshness, lineage, and locality, and why most market reports fail all three.

Michael Donovan
AI Engineer · Founder · Automotive AI Platform Builder
Most dealers don't have an AI problem. They have a visibility problem. Vendors are happy to sell ten dashboards that never talk to each other. I have sat in your chair. I know which numbers move the needle and which ones just move invoices. The Signal is where I write down what actually works, what is vendor theater, and the plays I would run in your store this quarter. No buzzword salad. Just the field notes of someone who has carried a bag and shipped the code.

Most market reports are obituaries. They describe what the market looked like back when somebody collected the data, averaged across a geography too big to act on, with no way to check where a single number came from. Operators buy them, skim them, and file them, because nothing inside survives contact with Monday morning. Vandoko MIP, my market intelligence platform now in live beta at vandoko.ai, exists because I got tired of that. These are the design principles underneath it: how collection, geography, and analysis have to work if you want signal instead of a brochure. None of this is specific to automotive, but automotive is where I learned it, and it is the hardest market I know to fake.

Why market reports fail

Three failure modes kill almost every market report, and they are baked in at design time, not introduced by sloppy execution.

Stale. A report is a snapshot and a market is a stream. Pricing moves, inventory turns, competitors change their story week to week. By the time a quarterly PDF lands on a desk, the conditions it describes are gone. Acting on it is steering by the wake.

Unverifiable. Ask where a number came from and watch the room. Most reports cannot answer. No source, no capture date, no method. You are being asked to bet real money on figures you cannot trace, and that is not intelligence. That is rumor with a cover page.

Averaged into mush. A metro-level average is a number that is true everywhere and useful nowhere. Markets are won at street level, in the radius around a specific location, against the specific competitors inside that radius. Average across a metro and the variation you actually needed to see is the first casualty.

These failures persist because the incentives allow them. A report sold by the copy never has to survive a follow-up question. A platform that operators query every week does. Fixing the failures means designing for their opposites: freshness, lineage, and locality. Each one owns a layer of the pipeline.

Collection: freshness is a discipline

Data has a half-life. A price is good for days, inventory for less, a competitor claim until the next site update. Collection built as a one-time pull ignores the half-life and produces stale conclusions delivered with full confidence, which is worse than no conclusion at all, because someone will act on it.

Scrapey-Do is the collection engine inside MIP, and the principle it embodies is that collection is a discipline, not a scrape. Continuous, not occasional, because the market does not pause between quarterly reports. Every fact stamped with when it was captured and where it came from, because a fact without a capture time is a rumor with good posture. And collection kept separate from interpretation, because the engine's job is to gather and stamp, not to editorialize. Mixing the two is how bias slips in before anyone gets a chance to question it.
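To make the stamping rule concrete, here is a minimal sketch of what a stamped fact could look like. It is not Scrapey-Do's code. The `Fact` record, the `stamp` and `is_fresh` helpers, and the shelf lives in `MAX_AGE` are all hypothetical, chosen only to illustrate "good for days" for a price and "less" for inventory.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical shelf lives, assumed for illustration only.
MAX_AGE = {
    "price": timedelta(days=3),
    "inventory": timedelta(days=1),
}


@dataclass(frozen=True)
class Fact:
    kind: str              # e.g. "price" or "inventory"
    value: float
    source: str            # where the fact was captured
    captured_at: datetime  # when it was captured (timezone-aware)


def stamp(kind, value, source, now=None):
    """Gather and stamp only. No interpretation happens at this layer."""
    if not source:
        raise ValueError("a fact needs a source")
    captured_at = now or datetime.now(timezone.utc)
    return Fact(kind, value, source, captured_at)


def is_fresh(fact, now=None):
    """True while the fact is still inside the shelf life for its kind."""
    now = now or datetime.now(timezone.utc)
    return now - fact.captured_at <= MAX_AGE[fact.kind]
```

Keeping `stamp` free of any scoring or summarizing is the point of the design: everything downstream can ask a fact when and where it came from, and nothing upstream has already decided what it means.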

The unglamorous truth is that the collection layer sets the ceiling for everything above it. No analysis, however clever, can recover signal that was never captured or freshness that was never maintained. Garbage in is forever.

Geography: locality is not a filter

Most pipelines treat geography as a column. Collect everything, group by region at the end, ship the averages. That is backwards. The market that matters is the drive radius around a location, the specific competitors inside it, the roads people actually take. The unit of competition is local, so the unit of analysis has to be local, and that decision cannot be bolted on at the reporting stage.

On-the-Map is the geographic layer of MIP, and the principle is that geography is a first-class dimension, present from collection onward. What gets collected, what gets compared, and who even counts as a competitor are geographic questions before they are data questions. Two businesses forty miles apart can live in different worlds. A pipeline that cannot see that difference will hand both of them the same advice, and at least one of them will pay for it.

The test is simple. Take any metric in any report and ask for it at a tighter radius. If the number cannot be recomputed locally, geography was a decoration, not a dimension, and the report was never about your market.
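The radius test can be sketched in a few lines, assuming each listing carries its own coordinates. The `median_price_within` function and the listing fields (`lat`, `lon`, `price`) are hypothetical names for illustration, and straight-line haversine distance stands in for a real drive radius, which would need road data.

```python
import math
from statistics import median

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def median_price_within(listings, center, radius_miles):
    """Recompute the metric using only listings inside the radius."""
    lat0, lon0 = center
    prices = [
        item["price"]
        for item in listings
        if haversine_miles(lat0, lon0, item["lat"], item["lon"]) <= radius_miles
    ]
    # None means there is no local data, which is an answer in itself.
    return median(prices) if prices else None
```

If the same function returns different answers at 5 miles and 50 miles, the wide number was hiding variation. If it cannot be called at all because the rows have no coordinates, geography was a decoration.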

Analysis: show your work

The analysis layer is where modern pipelines fall in love with themselves. A language model summarizing a pile of data produces fluent, confident prose whether or not the facts underneath support it. Fluency is not lineage. An analysis you cannot trace is mush with a good vocabulary.

Vandoko-AI is the analysis layer of MIP, and the rule it lives under is short: every claim has to answer two questions, says who, and as of when. Conclusions stay tied to the collected facts beneath them, facts that carry their timestamps and sources from Scrapey-Do, scoped to the geography On-the-Map defined. The job of analysis is not to sound smart. It is to compress fresh, verified, local facts into a decision someone can defend in front of a skeptical owner, and to survive the follow-up question.
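One way to enforce "says who, and as of when" is to make a claim impossible to construct without the facts beneath it. The sketch below is an illustration under that assumption, not the Vandoko-AI implementation; `SourcedFact`, `Claim`, `says_who`, and `as_of` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SourcedFact:
    value: float
    source: str
    captured_at: datetime


@dataclass(frozen=True)
class Claim:
    text: str
    facts: tuple  # the SourcedFact objects this claim rests on

    def __post_init__(self):
        # A claim with nothing beneath it is refused at construction time.
        if not self.facts:
            raise ValueError("claim has no supporting facts")

    def says_who(self):
        """Distinct sources behind the claim."""
        return sorted({f.source for f in self.facts})

    def as_of(self):
        """A claim is only as current as its oldest supporting fact."""
        return min(f.captured_at for f in self.facts)
```

Using the oldest capture time for `as_of` is a deliberately conservative choice: a summary built on one stale fact should not read as current just because the rest are fresh.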

Freshness pays off twice here. Analysis on stale data does not just repeat the staleness, it amplifies it, because a confident summary of old facts reads as current. The layer that sounds the most authoritative is exactly the layer that most needs receipts.

Freshness, lineage, locality in practice

Strip away the architecture and the three principles become three questions anyone can ask of any market report, including mine.

When was each data point collected? Not the report date, the capture date of the facts inside it. If nobody can answer, you are reading history.

Can I see the source for this number? Any number, your pick. If the trail dead-ends at "trust us," every other number inherits that answer.

Can I get this at my radius? The same metric, recomputed for the geography you actually compete in. If it only exists at metro scale, it was never about you.
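The three questions can also be run mechanically against a report's underlying rows, if the vendor will hand them over. This is a rough sketch that assumes each data point is a dict with `captured_at`, `source`, `lat`, and `lon` keys. Those field names and the `audit` function are hypothetical.

```python
def audit(data_points):
    """Return, per question, the indexes of data points that cannot answer it."""
    failures = {"when": [], "source": [], "radius": []}
    for i, point in enumerate(data_points):
        if point.get("captured_at") is None:
            failures["when"].append(i)      # no capture date
        if not point.get("source"):
            failures["source"].append(i)    # no traceable source
        if point.get("lat") is None or point.get("lon") is None:
            failures["radius"].append(i)    # cannot be recomputed locally
    return failures
```

A report passes only when all three lists come back empty. Any index that shows up is a number you are being asked to take on trust.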

The questions work because they attack design decisions a pipeline cannot fake after the fact. Freshness is an architecture choice. Lineage is an architecture choice. Locality is an architecture choice. A vendor can redesign a slide deck overnight, but nobody retrofits provenance into a pipeline that never collected it.

A report that survives all three questions is rare. That rarity is the reason MIP exists, and the standard it has to hold itself to, because a market intelligence platform that cannot pass its own audit is just a faster way to print brochures.

The signal is the point

Vandoko MIP is in live beta at vandoko.ai, combining Scrapey-Do for collection, On-the-Map for geography, and Vandoko-AI for analysis, built on the principles above. It is one of several systems I have shipped on the same conviction: operators deserve numbers they can trace, fresh enough to act on, scoped to the market they actually fight in. The rest of the record is at /work. If your market reports keep failing the three questions, I built the alternative, and I am glad to walk you through it.