DORA Metrics
"How good is our team's DevOps capability?" This is a hard question to answer objectively. DORA metrics solve that problem: grounded in data from thousands of teams worldwide, they make objective comparison possible.
DORA (DevOps Research and Assessment) is Google Cloud's DevOps research team. This team analyzes thousands of IT organizations worldwide every year and publishes the "State of DevOps Report." DORA metrics are the core result of this research.
DORA Metrics Overview
DORA metrics consist of four key indicators. Let's explore why each one is important.
- Deployment Frequency: Number of deployments to production. More frequent deployments mean faster value delivery to users.
- Lead Time for Changes: Time from commit to deployment. Shorter lead time means faster response to market changes.
- Change Failure Rate: Rate requiring rollback/hotfix. Lower rate means better quality and stability.
- Time to Restore: Time from incident to recovery. Shorter time minimizes user impact.
These four metrics balance each other. For example, trying to increase only deployment frequency might increase failure rate. DORA metrics measure both speed and stability to show true DevOps capability.
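All four metrics can be derived from plain deployment and incident records. Below is a minimal Python sketch; the record fields and the sample values are hypothetical, not KIWI's actual data model:

```python
from datetime import datetime, timedelta

# Hypothetical deployment records: commit time, deploy time, failure flag
deployments = [
    {"committed": datetime(2024, 5, 1, 9, 0), "deployed": datetime(2024, 5, 1, 10, 0), "failed": False},
    {"committed": datetime(2024, 5, 2, 9, 0), "deployed": datetime(2024, 5, 2, 14, 0), "failed": True},
    {"committed": datetime(2024, 5, 3, 9, 0), "deployed": datetime(2024, 5, 3, 9, 30), "failed": False},
]
# Hypothetical incidents: detection time and recovery time
incidents = [
    {"detected": datetime(2024, 5, 2, 15, 0), "restored": datetime(2024, 5, 2, 16, 30)},
]

days_in_period = 7
# Deployment Frequency: deploys per day over the period
deployment_frequency = len(deployments) / days_in_period
# Lead Time for Changes: commit -> production, averaged
lead_times = [d["deployed"] - d["committed"] for d in deployments]
avg_lead_time = sum(lead_times, timedelta()) / len(lead_times)
# Change Failure Rate: share of deployments that failed
change_failure_rate = sum(d["failed"] for d in deployments) / len(deployments)
# Time to Restore: detection -> recovery, averaged
restore_times = [i["restored"] - i["detected"] for i in incidents]
avg_time_to_restore = sum(restore_times, timedelta()) / len(restore_times)

print(f"Deployment frequency: {deployment_frequency:.2f}/day")
print(f"Avg lead time:        {avg_lead_time}")
print(f"Change failure rate:  {change_failure_rate:.0%}")
print(f"Avg time to restore:  {avg_time_to_restore}")
```

Note how the first two metrics measure speed and the last two measure stability; tracking them from the same records is what keeps them in balance.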
Performance Grade Criteria
Each metric is evaluated on a four-level scale: Elite, High, Medium, and Low. What grade is your team at?
Elite Grade
The highest performance grade, achieved by world-class DevOps organizations like Netflix and Google.
- Deployment Frequency: Multiple times a day, can deploy on demand
- Lead Time for Changes: Less than 1 hour
- Change Failure Rate: 0-15%
- Time to Restore: Less than 1 hour
Elite grade is achieved by only about 20% of teams. Reaching this level requires fully automated CI/CD, thorough testing, and a DevOps culture.
High Grade
The level achieved by most mature DevOps teams. This is an achievable goal.
- Deployment Frequency: Between once a day and once a week
- Lead Time for Changes: Between 1 day and 1 week
- Change Failure Rate: 16-30%
- Time to Restore: Less than 1 day
Medium Grade
The level typically associated with teams that have started their DevOps journey. Don't worry - it means there's a lot of room for improvement.
- Deployment Frequency: Between once a week and once a month
- Lead Time for Changes: Between 1 week and 1 month
- Change Failure Rate: 31-45%
- Time to Restore: Between 1 day and 1 week
Low Grade
A level where improvement is urgently needed. But with the right direction, you can improve quickly.
- Deployment Frequency: Less than once a month
- Lead Time for Changes: More than 1 month
- Change Failure Rate: 46% or more
- Time to Restore: More than 1 week
There's no need to be discouraged if you're at Low grade. Many traditional IT organizations start at this level. By focusing on automation and process improvement, you can quickly rise to Medium or High grade.
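The grade boundaries above are simple thresholds, so classifying a team is mechanical. A sketch for two of the metrics, using the ranges from the tables in this section (illustrative Python, not KIWI's implementation):

```python
def grade_change_failure_rate(rate_percent: float) -> str:
    """Map a change failure rate (%) to a grade, per the ranges above."""
    if rate_percent <= 15:
        return "Elite"    # 0-15%
    if rate_percent <= 30:
        return "High"     # 16-30%
    if rate_percent <= 45:
        return "Medium"   # 31-45%
    return "Low"          # 46% or more

def grade_time_to_restore(hours: float) -> str:
    """Map time to restore (in hours) to a grade, per the ranges above."""
    if hours < 1:
        return "Elite"    # less than 1 hour
    if hours < 24:
        return "High"     # less than 1 day
    if hours < 24 * 7:
        return "Medium"   # 1 day to 1 week
    return "Low"          # more than 1 week

print(grade_change_failure_rate(10))  # Elite
print(grade_time_to_restore(36))      # Medium
```

Deployment frequency and lead time follow the same pattern with their own thresholds.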
Viewing DORA Metrics
Let's learn how to check DORA metrics in KIWI.
Viewing from the Dashboard
- Access the [Dashboard] page
- Find the DORA Metrics Panel in the top right of the screen
- The current value and grade for each indicator are color-coded, so you can quickly see which metrics are good and which need improvement
Reading the Metrics Panel
- Current value: The most recently measured value. Check the specific numbers for details.
- Grade: Elite (purple), High (blue), Medium (yellow), or Low (red). Quickly identify status by color.
- Trend: Arrow showing improvement or decline versus the previous period. An upward arrow means improving.
- Details: Click to navigate to the time-series chart and analyze changes over time.
You can tell the status just by looking at colors. Purple and blue mean good status, yellow means improvement needed, red means immediate improvement needed.
Detailed Metrics
Let's take a closer look at what each metric measures and how you can improve it.
Deployment Frequency
Measures the number of times code has been successfully deployed to the production environment.
The more frequently you deploy, the faster you can deliver new features and bug fixes to users. Also, deploying frequently in small units makes it easier to find the cause when problems occur.
In KIWI, you can view deployment counts in daily, weekly, and monthly units.
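Counting deployments per day, week, or month is a simple grouping operation over deployment timestamps. A sketch with hypothetical dates (illustrative Python, not KIWI's internals):

```python
from collections import Counter
from datetime import date

# Hypothetical successful production deployment dates
deploy_dates = [date(2024, 5, 1), date(2024, 5, 1), date(2024, 5, 3),
                date(2024, 5, 8), date(2024, 5, 20)]

# Group the same dates three ways: by day, ISO week, and month
daily = Counter(d.isoformat() for d in deploy_dates)
weekly = Counter(f"{d.isocalendar().year}-W{d.isocalendar().week:02d}"
                 for d in deploy_dates)
monthly = Counter(d.strftime("%Y-%m") for d in deploy_dates)

print(daily)    # e.g. 2 deploys on 2024-05-01
print(weekly)
print(monthly)  # 5 deploys in 2024-05
```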
How to improve:
- Automate CI/CD pipelines
  - Reduce deployment barriers by eliminating manual work
  - Enable KIWI's Auto CI feature to automatically trigger builds when code is pushed
- Build a culture of deploying frequently in small units
  - Split large features into smaller PRs
  - The mindset "deploy often, don't wait for perfection" is important
Lead Time for Changes
Measures the time from when a developer commits code to when it's deployed and running in production.
Shorter lead time means faster response to market changes. You can release new features before competitors.
Measurement span:
- Code commit: Developer pushes code to Git
- Build complete: Build and tests complete in the CI pipeline
- Deploy complete: Deployment to the production environment is complete.
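The measurement span above boils down to subtracting the commit timestamp from the production-deploy timestamp. A minimal sketch with hypothetical timestamps:

```python
from datetime import datetime, timedelta

# Hypothetical timestamps for one change
committed = datetime(2024, 5, 1, 9, 15)   # developer pushes the commit
build_done = datetime(2024, 5, 1, 9, 40)  # CI build and tests finish
deployed = datetime(2024, 5, 1, 10, 5)    # production deployment completes

lead_time = deployed - committed   # the DORA metric: commit -> production
ci_time = build_done - committed   # the portion spent in CI

print(f"Lead time for changes: {lead_time}")  # 0:50:00
print(f"  of which CI took:    {ci_time}")    # 0:25:00
```

Breaking lead time into stages like this shows where to optimize first: if CI dominates, caching and parallel tests help; if the gap between build and deploy dominates, approval steps are the bottleneck.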
How to improve:
- Optimize build time
  - Use Docker layer caching
  - Remove unnecessary dependencies
- Run tests in parallel
  - Running tests in parallel can significantly reduce CI time
- Simplify approval processes
  - Consider removing unnecessary deployment approval steps
Change Failure Rate
The percentage of deployments that required a rollback, hotfix, or emergency patch.
High failure rate means poor user experience and wasted team time. Lowering failure rate allows the team to focus more on developing new features.
Cases counted as failures:
- Cases where a rollback was required due to a problem.
- Cases where a hotfix deployment was required to fix a bug
- Cases where an emergency patch was required due to an unexpected issue
Planned feature modifications or improvement deployments are not counted as failures.
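The rate itself is just failures divided by total deployments. A sketch using hypothetical outcome labels for the failure cases listed above:

```python
# Hypothetical deployment outcomes over a period:
# "ok" is a normal deployment; the other labels count as failures
outcomes = ["ok", "ok", "rollback", "ok", "hotfix",
            "ok", "ok", "ok", "ok", "ok"]

FAILURE_OUTCOMES = {"rollback", "hotfix", "emergency_patch"}
failures = sum(1 for o in outcomes if o in FAILURE_OUTCOMES)
change_failure_rate = failures / len(outcomes)

print(f"Change failure rate: {change_failure_rate:.0%}")  # 20%
```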
How to improve:
- Maintain test coverage at 80% or higher
  - More automated tests mean more bugs caught before deployment
- Conduct thorough code reviews
  - Configure so that merging isn't possible without reviewer approval
- Validate thoroughly in a staging environment
  - Test in staging for at least 1 day before production deployment
- Apply gradual deployments
  - Use Canary or Blue-Green deployments to apply changes to a portion of traffic first
Time to Restore
The time from when a service incident occurs to full recovery.
Incidents can happen anytime. What matters is how quickly you recover. Long recovery times lead to user churn and revenue loss.
Measurement span:
- Incident detection: The monitoring system generates an alert
- Root cause analysis: Analyze logs and metrics to identify the cause.
- Recovery complete: The service returns to a normal state.
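Over multiple incidents this is usually reported as a mean: recovery time minus detection time, averaged. A sketch with hypothetical incident timestamps:

```python
from datetime import datetime, timedelta

# Hypothetical incidents: (alert fired, service back to normal)
incidents = [
    (datetime(2024, 5, 2, 14, 0), datetime(2024, 5, 2, 14, 45)),
    (datetime(2024, 5, 10, 3, 20), datetime(2024, 5, 10, 6, 20)),
]

restore_times = [restored - detected for detected, restored in incidents]
mean_time_to_restore = sum(restore_times, timedelta()) / len(restore_times)

print(f"Mean time to restore: {mean_time_to_restore}")  # 1:52:30
```

Note that the clock starts at detection, which is why strengthening monitoring (faster alerts) directly improves this metric.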
How to improve:
- Strengthen monitoring and alerting
  - The faster you detect incidents, the faster you can respond
- Automate rollback procedures
  - Prepare to roll back to the previous version with one button click
- Document incident response processes
  - Deciding in advance who does what reduces confusion
- Introduce Chaos Engineering
  - Intentionally inject failures during normal operations to train recovery capabilities
Trend Analysis
Looking at trends over time is more important than looking at a single number. Identify whether you're improving or worsening.
Using Time-Series Charts
You can view the trend of each metric over time using time-series charts.
- Daily: Identify short-term issues. Check for any sudden changes in recent days.
- Weekly: Pattern analysis. Look for issues concentrated on specific days (e.g., Friday deploy, Monday rollback).
- Monthly: Long-term trends. Evaluate overall improvement or worsening.
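The "improved or worsened versus last week" check is a period-over-period comparison, and long-term trends are easier to read with a moving average. A sketch with hypothetical weekly deployment counts:

```python
# Hypothetical weekly deployment counts for one team, oldest week first
weekly_deploys = [3, 4, 4, 6, 5, 7]

current, previous = weekly_deploys[-1], weekly_deploys[-2]
change = (current - previous) / previous

# For deployment frequency, more is better; for change failure rate or
# time to restore, the comparison would be inverted (lower is better).
trend = "improving" if change > 0 else "declining" if change < 0 else "flat"

# A moving average smooths week-to-week noise for the long-term view
window = 4
moving_avg = sum(weekly_deploys[-window:]) / window

print(f"This week: {current} deploys ({change:+.0%} vs last week, {trend})")
print(f"{window}-week moving average: {moving_avg}")
```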
Review DORA metric trends together in weekly meetings. Checking "did we improve or worsen compared to last week" makes it easier to set improvement directions.
Benchmark Comparison
KIWI displays grades based on the State of DevOps Report research results from Google Cloud. This allows you to compare your team's position against IT organizations worldwide.
You can objectively answer the question "Is our team doing okay?" High grade means you're in the top 30%.
Real-World Usage Scenarios
Let's look at specific scenarios of how to actually improve DORA metrics.
Scenario 1: Deployment Frequency from Low to Medium Grade
The team deploys less than once a month, which puts it at Low grade by the criteria above. The team has a perception that "deployment is scary" and "deployment is a big deal."
Improvement plan:
- Enable KIWI Auto CI: Auto build on code push eliminates manual work
- Split large features into small PRs: Enables frequent merging in small units.
- Complete code reviews within 24 hours: Eliminates bottlenecks.
- Minimize manual approval steps: Lowers deployment barriers.
Result: Deployment frequency increases to about once a week, reaching Medium grade.
Scenario 2: Change Failure Rate from 30% to Below 15%
30% of deployments require a rollback, which sits at the very bottom of the High grade range (16-30%). Every deployment is stressful, and rollbacks have become routine.
Improvement plan:
- Increase test coverage from 50% to 80%: Catch bugs before deployment.
- Make SAST/SCA scans mandatory: Block security vulnerabilities and bugs in advance.
- Validate at least 1 day in staging: Discover production issues in advance.
- Introduce Canary deployment: Validate with a portion of traffic first
Result: The rollback rate drops to 10%, putting the change failure rate in the Elite range (0-15%).
Don't try to change everything at once. It's important to improve one thing at a time and progress gradually. Try improving one thing per 2-week sprint.
Best Practices
Here are best practices for improving DORA metrics, organized into three perspectives: team culture, technical, and process.
Team Culture
Good DevOps capability comes from culture more than tools.
- Build a culture of deploying frequently in small increments
  - "Deployment is a big deal" should become "deployment is routine"
- Use failures as learning opportunities
  - Rather than blaming someone, discuss why the system didn't prevent the problem
  - Prevent recurrence through post-mortems
- Continuously invest in deployment automation
  - Deployment stays scary as long as it involves a lot of manual work
Technical
The right technical investments are the foundation for improving DORA metrics.
- Optimize CI/CD pipelines to reduce build/deployment time
- Build automated tests to catch regression bugs early.
- Strengthen monitoring and alerting to quickly identify issues.
- Automate rollback procedures to reduce recovery time from incidents.
Process
Good processes produce consistent results.
- Simplify unnecessary deployment approval steps.
- Document incident response processes.
- Review DORA metrics regularly, on a weekly or monthly cadence
Spend just 5 minutes in your weekly team meeting to check DORA metrics together. Checking "did we improve compared to last week?" every week leads to natural improvement.
Related Guides
- Dashboard Usage - Dashboard feature details
- Notification Management - Real-time notifications
- Auto CI Setup - Automated build/deploy setup