Sales Data: What to Collect, What to Trust, and What to Act On

Every revenue team I have worked with believes it has a data problem. Almost none of them have the data problem they think they have. They assume the issue is that they are not collecting enough. The real issue is that they collect far too much, trust almost none of it, and feed the whole murky pile into a forecast anyway.

Sales data is the recorded set of facts about your pipeline and deals: stage, amount, close date, age, owner, source, and the outcomes that follow. Useful sales data is captured the same way every time, anchored to a real deal event, and trustworthy enough to act on without a second phone call to confirm it. By that definition, most of what lives in a CRM is not useful sales data. It is data that exists. Those are not the same thing, and confusing them is where forecasts go to die.

I have built revenue and forecast models for B2B SaaS companies for two decades, and the failure is always the same shape. The CRM is full, the reports run, the dashboard renders, and the forecast it produces is precise about numbers that were never real. So let me make the claim I will defend for the rest of this guide: collecting more sales data makes your forecast worse, not better, until you have fixed what the data you already have is worth.

Collect less, trust more

The instinct when a forecast misses is to add fields. Capture more, the thinking goes, and the model will have more to work with. This is exactly backwards, and it is the contrarian point I will stand behind.

Every required field is a tax on the person entering it. A rep closing deals does not want to fill in fourteen properties, so when you require fourteen, they fill the three that gate the save and fake the rest. The fields you added to get richer data degrade the fields you already depended on, because attention spent satisfying a required dropdown is attention not spent getting the close date right. More fields, lower quality, across the board.

The teams with the most trustworthy sales data are not the ones capturing the most. They are the ones capturing the least they can act on, and capturing it cleanly. A short list of fields that reps actually maintain beats a long list half of them ignore. When I sit down with a sales leader whose forecast keeps missing, the first move is almost never to add data. It is to find the handful of fields the forecast truly rests on and make those few unimpeachable, then delete most of the rest so the maintained fields stop competing for attention with fields nobody reads.

The fields that earn their place

Not all sales data is equal, and the divide that matters is leading versus lagging. A lagging field records an outcome after it is decided. A leading field records an input that moves while the deal is still open, which means it is the only kind you can act on in time to change anything.

Here is where the common fields land once you sort them that way, and what each one is actually for.

Field	Type	What it is for	Collect?
Stage entry date	Leading	The timestamp everything else derives from. Lets you measure age in stage and stage conversion.	Yes, automate it
Deal age	Leading	A deal stuck past its normal stage duration is the earliest sign it is slipping or dead.	Yes, derived
Close date and its change history	Leading	A close date that keeps sliding is a forecast risk no snapshot reveals. The history is the signal.	Yes, track changes
Next-step date	Leading	A deal with no scheduled next step is not really in the pipeline, whatever the stage says.	Yes
Amount	Leading	Weights the pipeline. Worthless if it is a placeholder, so require evidence behind it.	Yes, with rigor
Deal source	Leading	Lets you cut conversion by where deals come from, which is where the real differences hide.	Yes
Win or loss reason	Lagging	Teaches the model why deals convert, but only if the values are disciplined, not free text.	Yes, controlled values
Closed revenue	Lagging	The outcome the forecast is judged against. Essential, but it confirms rather than warns.	Yes
Raw activity counts	Neither	Calls and emails logged. Motion, not progress, and rarely tied to any decision.	No, or at least do not require

Read the table by the right-hand column, not the left. The point is not that activity logging is evil. It is that a field you do not act on costs you twice: once in entry friction and once in the trust it borrows from the fields you do act on. The leading set up top is where a forecast finds its early warning. The lagging set is the scorekeeping. The bottom row is the stuff that fills a record and changes nothing, and it is usually the first thing a team over-collects because it is the easiest to capture.
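To make the leading fields concrete, here is a minimal sketch of how deal age and close-date slippage can be derived from nothing more than a stage entry date and a close-date change history. The `Deal` shape, the field names, and the `normal_days` lookup are my assumptions for illustration, not any CRM's schema.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Deal:
    # Hypothetical record shape; names are illustrative, not a CRM schema.
    stage: str
    stage_entered: date
    close_date_history: list[date] = field(default_factory=list)  # oldest first


def age_in_stage(deal: Deal, today: date) -> int:
    """Days the deal has sat in its current stage."""
    return (today - deal.stage_entered).days


def is_stuck(deal: Deal, today: date, normal_days: dict[str, int]) -> bool:
    """True when the deal has outlived the normal duration for its stage.

    normal_days maps stage name to a typical duration taken from your own
    history (assumed input, not something this sketch computes).
    """
    return age_in_stage(deal, today) > normal_days[deal.stage]


def close_date_pushes(deal: Deal) -> int:
    """How many times the close date moved later than its previous value."""
    history = deal.close_date_history
    return sum(1 for prev, cur in zip(history, history[1:]) if cur > prev)
```

Both derived values depend on the timestamp and the change history being captured automatically. If a rep has to type them, they stop being evidence.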

For the wider operating set these fields roll into, the 22 sales operations metrics guide covers the full panel, and the broader sales metrics breakdown shows how the fields become measures.

The DRIFT test for sales data quality

Knowing which fields to keep is half the job. The other half is knowing whether the data in them can be trusted, and "looks complete" is not the same as "is true." A field can be 100 percent populated and 100 percent fictional. So I run every important field through five questions before I let it near a forecast. I call it the DRIFT test, and the name is the point, because untrustworthy data drifts away from reality quietly while the record stays full.

Dated. Does the field carry a timestamp, so you can see movement and not just a current state? A stage with no entry date tells you where a deal is. A stage with an entry date tells you whether it is stuck.

Reconciled. Does the field agree with the others on the same record? A deal in late stage with a close date in the past and no next step is not a real late-stage deal. Cross-field contradictions are how you find the records that lie.

Independent. Is the value set by evidence, independent of the rep's mood? An amount typed from optimism and a stage advanced because a call went well both fail, because they track sentiment rather than reality. Tie the field to a deal event a manager could verify.

Fresh. When was it last touched? A close date last updated sixty days ago in a ninety-day cycle is stale by definition, and a stale field in a forecast is worse than a missing one, because the model treats it as current.

Tied to an action. If this field moved, would anyone do anything? A field nobody acts on will not be maintained, no matter how strict the requirement, so it will rot and pollute everything computed from it.

A field that passes all five is forecast-grade. A field that fails two or more is decoration, and you should either fix the process behind it or stop pretending it means something. Most CRMs I open are full of fields that fail Fresh and Tied, which is precisely why the forecast built on them keeps missing. Fixing that is more a sales process optimization job than a data job, because data quality is a downstream symptom of how the deal motion is run.
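The five questions can be sketched as checks on a single deal record. Treat this as one possible encoding, not the definitive test. The `DealRecord` shape, the `LATE_STAGES` set, the 30-day staleness threshold, and the two boolean flags are all assumptions of mine. The text grades only "passes all five" and "fails two or more", so the "review" label for exactly one failure is also my own placeholder.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

LATE_STAGES = {"proposal", "negotiation"}  # assumption: which stages count as late


@dataclass
class DealRecord:
    # Hypothetical record shape for illustration only.
    stage: str
    stage_entered: Optional[date]
    close_date: date
    next_step_date: Optional[date]
    last_updated: date
    amount_has_evidence: bool  # e.g. a quote a manager could verify
    drives_an_action: bool     # someone does something when this record moves


def drift_checks(deal: DealRecord, today: date, max_stale_days: int = 30) -> dict[str, bool]:
    """Return pass/fail for each of the five DRIFT questions."""
    late = deal.stage in LATE_STAGES
    # Assumption: either contradiction alone is enough to fail Reconciled.
    contradiction = late and (deal.close_date < today or deal.next_step_date is None)
    return {
        "dated": deal.stage_entered is not None,
        "reconciled": not contradiction,
        "independent": deal.amount_has_evidence,
        "fresh": (today - deal.last_updated).days <= max_stale_days,
        "tied": deal.drives_an_action,
    }


def grade(checks: dict[str, bool]) -> str:
    failures = sum(1 for passed in checks.values() if not passed)
    if failures == 0:
        return "forecast-grade"
    if failures >= 2:
        return "decoration"
    return "review"  # one failure: not graded in the text; label is an assumption
```

The staleness threshold is the part most worth tuning to your own cycle length. A fixed 30 days is only a stand-in.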

A worked example: when the data lies and the forecast inherits it

Numbers below are illustrative, not a benchmark. They are chosen to show the mechanism.

Lattimore Cloud is a mid-market B2B SaaS company. Its CRM is, by every completeness report, in great shape. Required fields are filled on 98 percent of open deals. The dashboard is green. The forecast for the quarter calls for 4.2M in new ARR, and leadership presents it to the board with confidence, because the data behind it is, after all, complete.

Run the open pipeline through the DRIFT test and the picture changes fast.

A cluster of enterprise deals worth roughly 1.4M sits in the proposal stage. They pass completeness. They fail Fresh and fail Reconciled. Their close dates were set at deal creation and never updated, several now sit in the past, and not one carries a next-step date. These are not late-stage deals. They are deals that stalled, kept their optimistic stage out of inertia, and were never demoted because no field forced the question. The forecast counted them at full proposal-stage weight.

A second pattern shows up under Independent. Amounts across the commercial segment skew high because reps enter the aspirational number, the one before procurement negotiates it down. Historically those deals close at about 80 percent of entered amount, but the forecast took the entered figure at face value, inflating the whole segment.

So the 4.2M was never real. It was complete data describing a pipeline that did not exist. Re-weight the stalled enterprise cluster to its true probability, mark the commercial amounts to historical realized value, and the honest number lands closer to 3.1M. The gap was not a modeling error. The model did its job perfectly on data that lied to it. This is what people miss about forecast accuracy: only 7% of companies achieve 90%+ forecast accuracy (Gartner), and most of the misses are not bad math. They are good math run on undated, unreconciled, stale inputs that looked complete enough to trust.

Catch it in week two, and Lattimore reworks the stalled deals while there is still a quarter to save them. Catch it at close, and it is a postmortem. The data was the same in both cases. The difference was whether anyone tested it before the forecast did.
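The two corrections in the example, down-weighting stalled deals and marking amounts to historical realized value, can be sketched in a few lines. This does not reproduce Lattimore's figures: the deal list, the stage weight, and the stalled weight are inputs you would supply, and the only number that comes from the example is the roughly 80 percent realization on commercial amounts. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OpenDeal:
    segment: str
    amount: float  # amount as entered by the rep
    stalled: bool  # e.g. the deal failed Fresh and Reconciled


def naive_value(deals: list[OpenDeal], stage_weight: float) -> float:
    """What the forecast counts when it trusts every record at face value."""
    return sum(d.amount * stage_weight for d in deals)


def adjusted_value(
    deals: list[OpenDeal],
    stage_weight: float,
    stalled_weight: float,
    realization: dict[str, float],
) -> float:
    """Re-weight stalled deals and mark amounts to realized value.

    realization maps segment to the share of entered amount that
    historically closes; segments not listed are left at 1.0.
    """
    total = 0.0
    for d in deals:
        weight = stalled_weight if d.stalled else stage_weight
        total += d.amount * realization.get(d.segment, 1.0) * weight
    return total
```

The gap between `naive_value` and `adjusted_value` is the part of the forecast that rests on data nobody tested.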

Turning trustworthy data into a forecast

Once the fields are sorted and the trustworthy ones are isolated, the forecast is the easy part, because the hard part was never the math. It is the data underneath.

The sequence is the same every time. Start from the leading fields that passed DRIFT, weight open pipeline by stage conversion rates pulled from your own history rather than rep confidence, and reconcile the weighted result against what actually closed in comparable past quarters. Then, and this is the step teams skip, inspect by segment. A blended forecast hides exactly the kind of single-segment rot that sank Lattimore's enterprise cluster, the same way a blended win rate hides a collapsing segment behind a healthy average.
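A rough sketch of that sequence, assuming open deals arrive as `(segment, stage, amount)` tuples and that stage weights have already been derived from your own history. The function names and input shapes are mine, chosen for illustration.

```python
from collections import defaultdict


def forecast_by_segment(open_deals, stage_weights):
    """Weighted pipeline per segment.

    open_deals: iterable of (segment, stage, amount) tuples.
    stage_weights: stage name -> historical conversion rate (assumed input).
    """
    totals = defaultdict(float)
    for segment, stage, amount in open_deals:
        totals[segment] += amount * stage_weights[stage]
    return dict(totals)


def ratio_to_history(forecast, historical_actuals):
    """Forecast divided by what comparable past quarters closed, per segment.

    Segments with no usable history (missing or zero) are skipped.
    """
    return {
        segment: forecast[segment] / historical_actuals[segment]
        for segment in forecast
        if historical_actuals.get(segment)
    }
```

A segment whose ratio sits far from 1.0 is the one to inspect first. The blended total can look reasonable while one segment carries all of the error.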

Two failures recur here. The first is forecasting from lagging data alone, building the number off closed-won history while ignoring the leading fields that show this quarter's pipeline is aging faster than last quarter's did. That produces a forecast that is accurate about the past and blind to the present, which matters more now that sales cycles have lengthened 22% since 2022 (Digital Bloom, 2025), so this quarter's deals convert more slowly than your history assumes. The second is trusting stage weights set by habit. If your CRM says proposal-stage deals close at 70 percent because someone configured that number in 2021, but your actual realized rate is 45 percent, every proposal-stage deal enters the forecast 25 points too optimistic before a single deal moves. Derive the weights from data, re-derive them as the motion changes, and never let a default stand in for evidence.
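Deriving the weights is not complicated. A minimal sketch, assuming each closed deal is recorded as the list of stages it reached plus whether it was won. That input shape and both function names are my own choices for illustration.

```python
def realized_stage_rates(closed_deals):
    """Share of deals reaching each stage that ended up closed-won.

    closed_deals: iterable of (stages_reached, won) pairs, where
    stages_reached is a list of stage names and won is a bool.
    """
    reached = {}
    won_count = {}
    for stages_reached, won in closed_deals:
        for stage in set(stages_reached):  # count each stage once per deal
            reached[stage] = reached.get(stage, 0) + 1
            if won:
                won_count[stage] = won_count.get(stage, 0) + 1
    return {stage: won_count.get(stage, 0) / n for stage, n in reached.items()}


def weight_gaps(configured, realized):
    """Percentage points by which each configured weight exceeds the realized rate."""
    return {
        stage: round((configured[stage] - realized[stage]) * 100, 1)
        for stage in configured
        if stage in realized
    }
```

With a configured proposal weight of 0.70 and a history where 9 of 20 proposal-stage deals were won, `weight_gaps` reports a 25-point gap for that stage. Re-running it each quarter is the cheap way to notice when the motion has changed.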

This is also why data quality and forecasting are not two projects. They are one. You do not clean the data and then forecast. The act of forecasting is what surfaces which data was never trustworthy, and the discipline of demanding forecast-grade fields is what keeps the data clean. The complete sales forecasting guide walks the modeling end to end, and the forecast accuracy guide covers how to measure whether the number you produced is holding up.

Fix the inputs before you blame the model

If you take one thing from this, make it the order of operations. When a forecast misses, the room reaches for the model, the methodology, the tool. Almost always, the model was fine. It faithfully reported the future implied by undated, unreconciled, optimistic inputs that happened to be complete enough to look real.

So before you re-platform your forecasting stack or hire someone to rebuild the model, run your own fields through DRIFT and ask which of them you would actually bet the quarter on. Most teams find that three or four fields carry the entire forecast, and that those few are exactly the ones nobody guards. Make those unimpeachable, delete the noise competing for the rep's attention, and the forecast you already had gets dramatically more honest without a single new field. The reason a stale close date or an aspirational amount is so dangerous is that it never announces itself in a single-snapshot report. It only surfaces when every open deal is reconciled against history and inspected live, which is the kind of forecast model ORM builds, so the field that was quietly lying gets caught while the quarter is still yours to fix.

Frequently Asked Questions

What is sales data?

Sales data is the recorded facts about your pipeline and deals: stage, amount, close date, age, owner, source, and the outcomes that follow them. Useful sales data is captured the same way every time, tied to a real deal event, and trustworthy enough to act on. Most of what sits in a CRM is recorded but not trusted, which is a different thing entirely.

Why is sales data quality important for forecasting?

A forecast is a function of the data you feed it, so it inherits every flaw in that data. If close dates slip without being updated, amounts are guesses, and stages are set by habit rather than evidence, the forecast is precise about numbers that were never real. You cannot model your way out of bad inputs. Garbage in, confident garbage out.

What is the difference between leading and lagging sales data?

Lagging sales data records an outcome after it happened, like closed revenue or final win rate. Leading sales data records an input that moves before the outcome, like stage entry dates, deal age, and next-step recency. Leading fields give you time to act while a deal is still live. Lagging fields confirm what already happened.

Which sales data should you actually collect?

Collect the fields you will act on and the timestamps that let you measure movement: stage and the date each stage was entered, amount, close date and its change history, deal source, and next-step date. Skip fields nobody acts on. Every field you require that no one uses adds entry friction and quietly degrades the fields that matter.

How do you turn sales data into a forecast?

Start from trusted fields, weight open pipeline by evidence-based stage conversion rather than rep optimism, reconcile the result against historical actuals, and inspect by segment. The forecast is only as good as the stage dates, amounts, and conversion history underneath it, so data quality is the first step of forecasting, not a separate cleanup project.

What is a vanity field in sales data?

A vanity field is sales data you collect because it is easy, not because it changes a decision. Raw activity logs, lead counts with no conversion attached, and free-text fields nobody queries are the usual ones. They fill the record and flatter the reporting without ever moving a forecast or a deal.

Pete Furseth

ORM Technologies

Pete has built custom revenue forecast models for B2B SaaS companies for over a decade.
