Probabilistic Canary Analysis with Bayesian Hierarchical Models in Continuous Delivery

Srikanth Gorle; Naveen Kumar Siripuram

Authors

Srikanth Gorle, Foot Locker, USA
Naveen Kumar Siripuram, CVS Health, USA

Keywords:

Bayesian inference, hierarchical modeling, canary analysis, continuous delivery, service-level indicators, rollout automation

Abstract

This paper uses Bayesian hierarchical modeling to build a probabilistic framework for automated canary analysis, with the aim of improving decision-making in continuous delivery pipelines. Traditional canary release systems rely on static thresholds and basic statistical tests. These methods ignore uncertainty, handle noise poorly, and can either miss small regressions or raise false positives. We address these problems with a Bayesian posterior updating scheme that detects performance degradation from latency percentiles, error rates, and availability measures. A hierarchical structure lets microservices learn from one another while their operations are monitored. The system was tested on real-world e-commerce workloads, including payment and order management system (OMS) services. In these tests, Bayesian analysis cut false alarms by 40%, sped up rollback for mild regressions, and simplified the configuration of promotion gates at sprint cadence. This article recommends Bayesian inference for data-driven progressive delivery.
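
To make the idea of posterior updating on an error-rate indicator concrete, the following is a minimal sketch and not the authors' implementation. It assumes a conjugate Beta-Binomial model for a single service, a uniform Beta(1, 1) prior, and illustrative decision thresholds; the function names (`beta_posterior`, `prob_canary_worse`, `decide`) and all numbers are hypothetical.

```python
import random


def beta_posterior(alpha_prior, beta_prior, errors, requests):
    """Conjugate Beta-Binomial update; returns the posterior (alpha, beta)."""
    return alpha_prior + errors, beta_prior + (requests - errors)


def prob_canary_worse(baseline, canary, margin=0.0, draws=20000, seed=0):
    """Monte Carlo estimate of P(canary error rate > baseline error rate + margin).

    `baseline` and `canary` are (alpha, beta) posterior parameters.
    """
    rng = random.Random(seed)
    worse = 0
    for _ in range(draws):
        b = rng.betavariate(*baseline)
        c = rng.betavariate(*canary)
        if c > b + margin:
            worse += 1
    return worse / draws


def decide(p_worse, rollback_at=0.95, promote_at=0.05):
    """Map the posterior probability to a gate action (thresholds are assumptions)."""
    if p_worse >= rollback_at:
        return "rollback"
    if p_worse <= promote_at:
        return "promote"
    return "continue"


if __name__ == "__main__":
    # Hypothetical counts: (errors, requests) observed during the canary window.
    baseline = beta_posterior(1, 1, errors=50, requests=10_000)
    canary = beta_posterior(1, 1, errors=30, requests=1_000)
    p = prob_canary_worse(baseline, canary)
    print(f"P(canary worse) = {p:.3f} -> {decide(p)}")
```

Unlike a static threshold, the gate acts on a probability, so a canary with little traffic yields a wide posterior and a "continue" decision rather than a premature verdict. The sketch treats one service in isolation; the hierarchical sharing across microservices described above is not modeled here.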


