The Hidden Cost of "Collect Everything" Observability
The observability industry sold us on collecting everything. Now organizations are drowning in data and costs. Here's what we learned and how to fix it.
"Collect everything - you never know what you'll need." This advice, repeated by observability vendors for years, has created a crisis. Organizations are paying millions for data they never query, while the truly important signals get lost in the noise. It's time to rethink this approach.
How We Got Here
The "collect everything" philosophy emerged from genuine problems:
- Production debugging often required data that wasn't collected
- Hindsight made it clear what would have been useful
- Storage seemed cheap compared to engineering time
- Vendors had every incentive to encourage more data
The advice made sense at smaller scale. At modern scale, it's a disaster.
The True Cost
Direct Financial Cost
Typical enterprise observability spend:
Log management: $150,000/month
APM/Tracing: $80,000/month
Metrics: $40,000/month
Total: $270,000/month
≈ $3.24M/year
What's actually useful:
Critical data: ~20% of volume
Occasionally useful: ~30% of volume
Never queried: ~50% of volume
You're paying $1.6M/year for data nobody looks at.
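The arithmetic behind that claim is simple enough to sketch. All figures below are the article's illustrative numbers, not real benchmarks:

```python
# Back-of-the-envelope model of the spend breakdown above.
monthly_spend = {
    "log_management": 150_000,
    "apm_tracing": 80_000,
    "metrics": 40_000,
}

# Rough usefulness split of collected data, per the estimate above
never_queried_fraction = 0.50

monthly_total = sum(monthly_spend.values())           # $270,000/month
annual_total = monthly_total * 12                     # ~$3.24M/year
annual_waste = annual_total * never_queried_fraction  # ~$1.62M/year, never queried

print(f"Annual spend: ${annual_total:,.0f}")
print(f"Spend on never-queried data: ${annual_waste:,.0f}")
```

Plug in your own vendor bill and usefulness split; even conservative estimates of the never-queried fraction tend to produce a seven-figure waste number at this scale.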
Hidden Costs
- Query performance - More data = slower searches
- Alert noise - More data = more false positives
- Cognitive overload - Engineers can't find signal in noise
- Infrastructure - Network, compute, storage for useless data
Opportunity Cost
That $1.6M/year could fund:
- 10 additional engineers
- Better tooling
- Actual product improvements
- Customer success initiatives
Why Vendors Won't Tell You This
Observability vendors have misaligned incentives:
- Revenue based on data volume
- No penalty for collecting useless data
- Fear-based marketing ("you might need it!")
- Upselling more features to handle more data
Vendor perspective:
Customer collects 100GB/day → $10,000/month
Customer collects 10GB/day → $1,000/month
Which does the vendor prefer?
What Actually Matters
The 80/20 Rule of Observability
80% of debugging value comes from 20% of telemetry:
High Value (always collect):
- Errors and exceptions with stack traces
- Request failures and their context
- Performance outliers (p99 latency)
- Security events
- Business transactions (orders, payments)
Medium Value (sample or summarize):
- Successful request details
- Detailed traces for normal operations
- Fine-grained metrics
- Verbose framework output
Low Value (filter aggressively):
- Debug logs in production
- Health check responses
- Static asset requests
- Duplicate events
- Framework internal logging
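The three tiers above can be encoded as a simple classifier. This is a minimal sketch assuming structured events with hypothetical "kind", "path", and "level" fields; in practice this logic would live in your collector pipeline rather than application code:

```python
# Hypothetical event fields; adapt to your own telemetry schema.
HIGH_VALUE_KINDS = {"error", "exception", "security", "payment", "order"}
LOW_VALUE_PATHS = {"/healthz", "/health", "/favicon.ico"}

def classify(event: dict) -> str:
    """Return 'high', 'medium', or 'low' collection priority for an event."""
    if event.get("kind") in HIGH_VALUE_KINDS:
        return "high"   # always collect: errors, security, business transactions
    if event.get("path") in LOW_VALUE_PATHS or event.get("level") == "debug":
        return "low"    # filter aggressively: health checks, debug noise
    return "medium"     # sample or summarize everything else
```

For example, `classify({"kind": "payment"})` returns `"high"`, while `classify({"path": "/healthz"})` returns `"low"`.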
A Smarter Approach
Collect with Purpose
Before adding telemetry, ask:
1. What question will this data answer?
2. How often will we query it?
3. What's the cost to collect vs. value?
4. Can we derive this from other data?
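The four questions above can be turned into a lightweight gate for new telemetry. This is a sketch with made-up field names; real teams would set their own thresholds:

```python
def worth_collecting(answers: dict) -> bool:
    """Gate a proposed telemetry source on the four questions above.

    answers is a hypothetical dict: {'question_answered': bool,
    'queries_per_month': int, 'monthly_cost': float,
    'monthly_value': float, 'derivable_elsewhere': bool}
    """
    if not answers["question_answered"]:
        return False  # Q1: no question it answers -> don't collect
    if answers["derivable_elsewhere"]:
        return False  # Q4: derive it from existing data instead
    if answers["queries_per_month"] == 0:
        return False  # Q2: nobody will ever query it
    # Q3: only collect when the estimated value covers the cost
    return answers["monthly_value"] >= answers["monthly_cost"]
```

The point is not the exact thresholds but that every new source should pass an explicit check before it starts accruing cost.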
Implement Tiered Collection
Tier 1: Always collect (100%)
- Errors, security, business-critical
Tier 2: Sample (10-25%)
- Successful operations, detailed traces
Tier 3: On-demand only
- Debug logging, enabled when investigating
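The three tiers above map naturally onto per-tier sampling rates. A minimal sketch, using the tier numbers and rates from the list (the event model and debug flag are assumptions):

```python
import random

# Tier 1: keep everything; Tier 2: keep 10%; Tier 3: drop unless debugging.
TIER_SAMPLE_RATES = {1: 1.0, 2: 0.10, 3: 0.0}

def should_collect(tier: int, debug_mode: bool = False, rng=random.random) -> bool:
    """Decide whether to keep an event, given its tier.

    rng is injectable so sampling decisions are testable/deterministic.
    """
    if tier == 3:
        return debug_mode  # on-demand only: collected while investigating
    return rng() < TIER_SAMPLE_RATES[tier]
```

Head-based sampling like this is the simplest option; tail-based sampling (deciding after the trace completes, so errors are never dropped) pairs well with the Tier 1 rule but needs collector support.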
Use Dynamic Verbosity
Normal state:
Log level: warning
Trace sampling: 5%
Metrics: aggregated
Incident state (auto-detected):
Log level: debug
Trace sampling: 100%
Metrics: high-resolution
This cuts telemetry volume by roughly 80% in normal operation while preserving full visibility during incidents.
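The two profiles above can be expressed as a small configuration switch. This sketch uses Python's standard logging module; incident detection itself (the "auto-detected" part) is out of scope and assumed to be an alerting hook that calls apply_profile:

```python
import logging

# The normal/incident profiles from the text above.
PROFILES = {
    "normal":   {"log_level": logging.WARNING, "trace_sample_rate": 0.05},
    "incident": {"log_level": logging.DEBUG,   "trace_sample_rate": 1.0},
}

def apply_profile(name: str, logger: logging.Logger) -> float:
    """Switch the logger to the given profile; return the trace sample rate
    to hand to your tracing SDK."""
    profile = PROFILES[name]
    logger.setLevel(profile["log_level"])
    return profile["trace_sample_rate"]
```

Usage: when an alert fires, call `apply_profile("incident", logging.getLogger("app"))`; when it resolves, switch back to `"normal"`. Metrics resolution would be toggled the same way via your metrics SDK.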
Real-World Transformation
Case Study: SaaS Company
Before:
Daily log volume: 500 GB
Monthly cost: $45,000
Query response time: 30-60 seconds
False positive alerts: 50/day
After intelligent collection:
Daily log volume: 75 GB
Monthly cost: $8,000
Query response time: 2-5 seconds
False positive alerts: 5/day
Savings: 82%
Visibility: Improved (less noise)
How to Start Reducing
Step 1: Audit Current Data
For each log source:
□ Query frequency (how often searched?)
□ Query patterns (what do people look for?)
□ Volume contribution (% of total)
□ Cost contribution (% of bill)
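The checklist above amounts to a small report over per-source statistics. A sketch, assuming you can export query counts, daily volume, and cost per source from your observability platform (the source names and numbers are invented):

```python
# Hypothetical per-source stats exported from your platform.
sources = {
    "app-errors":    {"queries_30d": 240, "gb_per_day": 5,   "monthly_cost": 450},
    "health-checks": {"queries_30d": 0,   "gb_per_day": 120, "monthly_cost": 10_800},
    "access-logs":   {"queries_30d": 3,   "gb_per_day": 200, "monthly_cost": 18_000},
}

total_cost = sum(s["monthly_cost"] for s in sources.values())

# Rank by cost; flag sources nobody has queried in 30 days.
for name, s in sorted(sources.items(), key=lambda kv: -kv[1]["monthly_cost"]):
    share = s["monthly_cost"] / total_cost
    flag = "REDUCE?" if s["queries_30d"] == 0 else ""
    print(f"{name:15} {s['queries_30d']:>4} queries  {share:5.1%} of bill  {flag}")
```

Even a crude report like this usually surfaces a handful of sources that dominate the bill while contributing nothing to investigations.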
Step 2: Identify Candidates
Prime reduction candidates:
- Sources never queried in 30 days
- Debug/trace level logs
- High-volume, low-value sources
- Redundant data (collected multiple ways)
Step 3: Implement Gradually
Week 1: Filter obvious noise (health checks, etc.)
Week 2: Implement sampling for high-volume sources
Week 3: Reduce retention for low-value data
Week 4: Review and adjust
Step 4: Monitor the Reduction
Track:
- Cost reduction achieved
- Query patterns (are people missing data?)
- Incident investigation times
- Alert quality
Fighting the Fear
The biggest obstacle to data reduction is fear:
"What if we need it?"
Responses:
- In 3 years of "collect everything," how often did you use old debug logs?
- Can you reconstruct from other sources if truly needed?
- Is the fear worth $1M+/year?
- Can you enable verbose logging on-demand?
The 401 Clicks Approach
401 Clicks is built for intentional observability:
- Predictable pricing that doesn't punish reasonable collection
- Built-in filtering for common noise patterns
- Recommendations for what to keep vs. filter
- Cost transparency by source
The New Observability Mindset
Old thinking:
"Collect everything, figure it out later"
"Storage is cheap"
"You never know what you'll need"
New thinking:
"Collect what answers questions"
"Attention is expensive"
"Know what you need before you collect"
Conclusion
The "collect everything" era is ending. Organizations are realizing that more data doesn't mean better observability; often it means worse. The signal gets lost in the noise, bills grow unsustainably, and engineers spend more time searching than investigating.
The future is intentional observability: collecting what matters, filtering what doesn't, and having the discipline to say "we don't need that." Start your audit today. Your budget and your on-call engineers will thank you.
Admin
Published on January 13, 2026