As cybersecurity professionals, we often hear about the importance of metrics: how they can measure the effectiveness of our efforts and help build more mature security operations. However, not all metrics are created equal, and some of the most widely shared ones can actually harm security teams when used incorrectly, accelerating burnout in the process. That is why it is worth taking a step back and carefully considering the metrics we use to evaluate the performance of our security teams. In my opinion, most organizations fail to use metrics to their full advantage when managing their cybersecurity operations teams.
After a very enriching Twitter thread started by the great Taz Wake, I did a bit more research and came across this site, where the Gitlab security team shares some of their performance indicators and metrics. I immediately felt compelled to write a blog post, because it is clear that they have put a lot of thought into the metrics they use. Instead of focusing only on traditional or sadly popular metrics such as the number of vulnerabilities discovered, or the time it takes to fix or respond to a specific kind of cybersecurity incident, the Gitlab security team has chosen to move beyond time-to-action approaches and measure factors that affect the quality of life of their team members and have a wider impact on the maturity of their cybersecurity efforts.
For example, the Gitlab security team has included metrics such as team member retention percentage, average age of open positions, and on-call engagement volume. These metrics give them a better sense of the overall health and well-being of their team, an important and often overlooked factor in a team's success. Another metric that stands out is the promotion rate for team members. By tracking it, the Gitlab security team can see how well team members are progressing in their careers and whether they are being given the opportunities they need to grow and develop within the company or team, recognizing that talent retention is key to their mission. This matters because it helps surface bottlenecks or barriers that may be holding team members back, or pushing them to seek opportunities elsewhere, and it allows the team to act before attrition becomes a problem.
In addition to metrics that affect the quality of life of cybersecurity team members, there are other metrics that can be useful in evaluating a team's performance and that provide valuable information to leaders and members alike. For example, the Gitlab security team tracks the rate at which they solve problems through automation, which we already identified in a previous post as a great way to reduce burnout among cybersecurity professionals. They also track the categories of security incidents they respond to, so they can verify the load is fairly distributed or spot gaps when it skews toward one or just a few categories. Other metrics that can help in evaluating the maturity of a cybersecurity team include the rate at which incidents deviate from existing playbooks or documentation, as well as the rate of repeated incidents or non-actionable alerts. These metrics can expose gaps in documentation and training, and highlight tuning candidates that would decrease the workload and reduce burnout among team members.
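To make this a bit more concrete, here is a minimal sketch of how operational metrics like these could be computed from incident records exported from a ticketing system. The field names and sample data are purely illustrative assumptions on my part, not Gitlab's schema or any particular tool's API.

```python
# Minimal sketch, assuming a hypothetical export of incident records.
# Field names ("category", "resolved_by_automation", "followed_playbook",
# "actionable") are illustrative, not any specific tool's schema.
from collections import Counter

incidents = [
    {"category": "phishing", "resolved_by_automation": True,  "followed_playbook": True,  "actionable": True},
    {"category": "phishing", "resolved_by_automation": False, "followed_playbook": False, "actionable": True},
    {"category": "malware",  "resolved_by_automation": False, "followed_playbook": True,  "actionable": False},
]

total = len(incidents)

# Share of incidents closed by automation rather than manual analyst work.
automation_rate = sum(i["resolved_by_automation"] for i in incidents) / total

# Distribution of incidents across categories, to spot uneven load.
category_load = Counter(i["category"] for i in incidents)

# Share of incidents that deviated from existing playbooks (documentation gap signal).
playbook_deviation_rate = sum(not i["followed_playbook"] for i in incidents) / total

# Share of non-actionable incidents or alerts (tuning candidates).
non_actionable_rate = sum(not i["actionable"] for i in incidents) / total

print(f"Automation rate: {automation_rate:.0%}")
print(f"Category load: {dict(category_load)}")
print(f"Playbook deviation rate: {playbook_deviation_rate:.0%}")
print(f"Non-actionable rate: {non_actionable_rate:.0%}")
```

The point of a sketch like this is that none of these numbers require expensive tooling; a periodic export and a few lines of scripting are enough to start tracking trends over time.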
While metrics like mean time to containment or mean time to closure of an incident may seem like useful ways to measure the performance of a cybersecurity team, they can actually be quite harmful in the long run. These metrics promote a culture of speed and quick fixes rather than long-term solutions and team maturity. Pushing for quick containment or closure of incidents can lead to burnout among team members, who may feel pressured to rush through their work to hit the numbers. It can also lead to rushed and incorrect decisions, which can have serious consequences for the security of an organization. Finally, these metrics do little for the long-term maturity of a cybersecurity team, and may even hinder its growth and development, since they reward speed over quality of work, hands-on training, the development of documentation, and a full understanding of a security incident.
In conclusion, it is clear that the Gitlab security team has taken a thoughtful and comprehensive approach to their use of metrics. By making many of their metrics public, they have shown their commitment to transparency and accountability. Additionally, their focus on metrics that measure the quality of life of their team members and the maturity of their team is a refreshing departure from the more traditional time-to-action metrics that are commonly used in the cybersecurity industry. It is important to remember that when developing metrics for cybersecurity teams, we should not just focus on what will impress boards or executives, or what will make our customers happy. We also need to consider the well-being and growth of our teams, and use metrics to help improve their quality of life and support their maturity. By taking this approach, we can create a more sustainable and effective cybersecurity workforce.
_______________________________________
These are my thoughts on metrics and why most organizations fail to use them to their full advantage. Did I miss any important point? Do you have any questions or comments? Let's continue the conversation either here, on Twitter @spapjh, or on Mastodon at spapjh@infosec.exchange.
I think the “quality of life” metrics might actually be able to stop me from having a default “I hate metrics” position.
I like how you have summed up the issue about quick fixes. You’ve captured one of my biggest issues with metric-driven security in a straightforward way.
Thank you for sharing this post!