Comparing Sequential Forecasters

Yo Joong Choe (Corresponding Author)
https://orcid.org/0000-0002-0614-9477
Data Science Institute, University of Chicago, Chicago, Illinois 60637

Aaditya Ramdas
https://orcid.org/0000-0003-0497-311X
Department of Statistics and Data Science, Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213

Published Online: 17 Oct 2023
https://doi.org/10.1287/opre.2021.0792

Volume 72, Issue 4

July-August 2024

Pages iii-vi, 1317-1750, C2-C3

Article Information

Information

Received: December 20, 2021
Accepted: July 05, 2023
Published Online: October 17, 2023

Cite as

Yo Joong Choe, Aaditya Ramdas (2023) Comparing Sequential Forecasters. Operations Research 72(4):1368-1387.

https://doi.org/10.1287/opre.2021.0792

Acknowledgments

The authors thank Alexander Henzi, Johanna F. Ziegel, Rafael M. Frongillo, and the anonymous reviewers for their valuable feedback on this work. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein. The manuscript was submitted and revised when Y. J. Choe was at Carnegie Mellon University.
