Evidence-based Methods in Cyber Security Research PDF 6,000Kb
We've given this keynote at the first Annual UK Cyber Security PhD Winter School beginning of 2020. It considers the state-of-play of cyber security research, especially considering our investigation into cyber security user studies and offers recommendations for each area under consideration.
Abstract. In recent years, cyber security security user studies have been appraised in meta-research, mostly focusing on the completeness of their statistical inferences and the fidelity of their statistical reporting. However, estimates of the field's distribution of statistical power and its publication bias have not received much attention. We aim to estimate the effect sizes and their standard errors present as well as the implications on statistical power and publication bias. We built upon a published systematic literature review o 146 user studies in cyber security (2006--2016). We took into account 431 statistical inferences including t-, χ2,r-, one-way F-tests, and Z-tests. In addition, we coded the corresponding total sample sizes, group sizes and test families. Given these data, we established the observed effect sizes and evaluated the overall publication bias. We further computed the statistical power vis-à-vis of parametrized population thresholds to gain unbiased estimates of the power distribution. Results. We obtained a distribution of effect sizes and their conversion into comparable log odds ratios together with their standard errors. We further gained insights into key biases such as the publication bias and the winner's curse.
Note. The technical report appeared as Thomas Groß. Statistical Reliability of 10 Years of Cyber Security User Studies. arXiv:2010.02117, 2020.
The definitive version of this work is published as Thomas Groß. Statistical Reliability of 10 Years of Cyber Security User Studies. In Proceedings of the 10th International Workshop on Socio-Technical Aspects in Security (STAST'2020), LNCS, 11739, Springer Verlag, 2020.
Abstract. Studies in socio-technical aspects of security often rely on user studies and statistical inferences on investigated relations to make their case. To ascertain this capacity, we investigated the reporting fidelity of security user studies. Based on a systematic literature review of 114 user studies in cyber security from selected venues in the 10 years 2006-2016, we evaluated fidelity of the reporting of 1775 statistical inferences. We conducted a systematic classification of incomplete reporting, reporting inconsistencies and decision errors, leading to multinomial logistic regression (MLR). We found that half the cyber security user studies considered reported incomplete results, in stark difference to comparable results in a field of psychology. Our MLR on analysis outcomes yielded a slight increase of likelihood of incomplete tests over time, while SOUPS yielded a few percent greater likelihood to report statistics correctly than other venues. In this study, we offer the first fully quantitative analysis of the state-of-play of socio-technical studies in security.
Note. This technical report appeared as Thomas Groß. Fidelity of Statistical Reporting in 10 Years of Cyber Security User Studies. arXiv:2004.06672, 2020.
The definitive version of this work is published as Thomas Groß. Fidelity of Statistical Reporting in 10 Years of Cyber Security User Studies. In Proceedings of the 9th International Workshop on Socio-Technical Aspects in Security (STAST'2019), LNCS 11739, Springer Verlag, 2020, pp. 1-24
Abstract. In the recent years, there has been a movement to strengthen evidence-based methods in cyber security under the flag of “science of security.” It is therefore an opportune time to take stock of the state-of-play of the field. We evaluated the state-of-play of evidence-based methods in cyber security user studies. We conducted a systematic literature review study of cyber security user studies from relevant venues in the years 2006-2016. We established a qualitative coding of the included sample papers with an a priori codebook of 9 indicators of reporting completeness. We further extracted effect sizes for papers with parametric tests on differences between means for a quantitative analysis of effect size distribution and post-hoc power. Results. We observed that only 21% of studies replicated existing methods while 78% provided the documentation to enable future replication. With respect to internal validity, we found that only 24% provided operationalization of research questions and hypotheses. We observed that reporting did largely not adhere to APA guidelines as relevant reporting standard: only 6% provided comprehensive reporting of results that would support meta-analysis. We, further, noticed a considerable reliance on p-value significance, where only 1% of the studies provided effect size estimates. Of the tests selected for quantitative analysis, 80% reported a trivial to small effect, while only 28% had post-hoc power > 80%. Only 16% were still statistically significant after Bonferroni correction for the multiple-comparisons made. This study offers a first evidence-based reflection on the state-of-play in the field and indicates areas that could help advancing the field’s research methodology.
Note. Kovila Coopamootoo and Thomas Groß. A Systematic Evaluation of Evidence-Based Methods in Cyber Security User Studies. Newcastle University Technical Report TR-1528, 2019