Comparing Sequential Forecasters

Published Online: https://doi.org/10.1287/opre.2021.0792

References

  • Abernethy JD, Frongillo RM (2012) A characterization of scoring rules for linear properties. Mannor S, Srebro N, Williamson RC, eds. Proc. 25th Annual Conf. Learn. Theory, vol. 23, 27.1–27.13.
  • Arnold S, Henzi A, Ziegel JF (2023) Sequentially valid tests for forecast calibration. Ann. Appl. Stat. 17(3):1909–1935.
  • Bauer H (2001) Measure and Integration Theory (De Gruyter, Berlin, New York).
  • Brier GW (1950) Verification of forecasts expressed in terms of probability. Monthly Weather Rev. 78(1):1–3.
  • Darling DA, Robbins H (1967) Confidence sequences for mean, variance, and median. Proc. Natl. Acad. Sci. USA 58(1):66–68.
  • Dawid AP (1984) Statistical theory: The prequential approach. J. Roy. Statist. Soc. Ser. A 147(2):278–290.
  • Dawid AP, Musio M (2014) Theory and applications of proper scoring rules. Metron 72(2):169–183.
  • DeGroot MH, Fienberg SE (1983) The comparison and evaluation of forecasters. Statistician 32(1–2):12–22.
  • Diebold FX, Mariano RS (1995) Comparing predictive accuracy. J. Bus. Econom. Statist. 13(3):253–263.
  • Ehm W, Krüger F (2018) Forecast dominance testing via sign randomization. Electronic J. Statist. 12(2):3758–3793.
  • Ehm W, Gneiting T, Jordan A, Krüger F (2016) Of quantiles and expectiles: Consistent scoring functions, Choquet representations and forecast rankings. J. Roy. Statist. Soc. Ser. B Statist. Methodology 78(3):505–562.
  • Frongillo RM, Kash IA (2021) General truthfulness characterizations via convex analysis. Games Econom. Behav. 130:636–662.
  • Giacomini R, White H (2006) Tests of conditional predictive ability. Econometrica 74(6):1545–1578.
  • Gneiting T (2011) Making and evaluating point forecasts. J. Amer. Statist. Assoc. 106(494):746–762.
  • Gneiting T, Raftery AE (2007) Strictly proper scoring rules, prediction, and estimation. J. Amer. Statist. Assoc. 102(477):359–378.
  • Gneiting T, Balabdaoui F, Raftery AE (2007) Probabilistic forecasts, calibration and sharpness. J. Roy. Statist. Soc. Ser. B Statist. Methodology 69(2):243–268.
  • Good IJ (1971) Comment on “Measuring information and uncertainty” by Robert J. Buehler. Godambe VP, Sprott DA, eds. Foundations of Statistical Inference (Holt, Rinehart and Winston, Toronto), 337–339.
  • Good IJ (1952) Rational decisions. J. Roy. Statist. Soc. Ser. B 14(1):107–114.
  • Grünwald P, de Heide R, Koolen W (2023) Safe testing. J. Roy. Statist. Soc. Ser. B Statist. Methodology. Forthcoming.
  • Grünwald PD, Dawid AP (2004) Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory. Ann. Statist. 32(4):1367–1433.
  • Henzi A, Ziegel JF (2022) Valid sequential inference on probability forecast performance. Biometrika 109(3):647–663.
  • Henzi A, Ziegel JF, Gneiting T (2021) Isotonic distributional regression. J. Roy. Statist. Soc. Ser. B Statist. Methodology 83(5):963–993.
  • Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58(301):13–30.
  • Howard SR, Ramdas A (2022) Sequential estimation of quantiles with applications to A/B testing and best-arm identification. Bernoulli 28(3):1704–1728.
  • Howard SR, Ramdas A, McAuliffe J, Sekhon J (2020) Time-uniform Chernoff bounds via nonnegative supermartingales. Probab. Surveys 17:257–317.
  • Howard SR, Ramdas A, McAuliffe J, Sekhon J (2021) Time-uniform, nonparametric, nonasymptotic confidence sequences. Ann. Statist. 49(2):1055–1080.
  • Jamieson KG, Jain L (2018) A bandit approach to sequential experimental design with false discovery control. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates, Red Hook, NY), 3664–3674.
  • Jamieson K, Malloy M, Nowak R, Bubeck S (2014) lil’UCB: An optimal exploration algorithm for multi-armed bandits. Proc. 27th Conf. Learning Theory. Proceedings of Machine Learning Research, vol. 35 (PMLR, New York), 423–439.
  • Johari R, Koomen P, Pekelis L, Walsh D (2022) Always valid inference: Continuous monitoring of A/B tests. Oper. Res. 70(3):1806–1821.
  • Lai TL (1976a) Boundary crossing probabilities for sample sums and confidence sequences. Ann. Probab. 4(2):299–312.
  • Lai TL (1976b) On confidence sequences. Ann. Statist. 4(2):265–280.
  • Lai TL, Gross ST, Shen DB (2011) Evaluating probability forecasts. Ann. Statist. 39(5):2356–2382.
  • Lehmann EL (1975) Nonparametrics: Statistical Methods Based on Ranks (Holden-Day, San Francisco).
  • McCarthy J (1956) Measures of the value of information. Proc. Natl. Acad. Sci. USA 42(9):654–655.
  • Messner JW, Mayr GJ, Wilks DS, Zeileis A (2014) Extending extended logistic regression: Extended vs. separate vs. ordered vs. censored. Monthly Weather Rev. 142(8):3003–3014.
  • Ovcharov EY (2018) Proper scoring rules and Bregman divergence. Bernoulli 24(1):53–79.
  • Ramdas A, Grünwald P, Vovk V, Shafer G (2023) Game-theoretic statistics and safe anytime-valid inference. Statist. Sci. 38(4):576–597.
  • Ramdas A, Ruf J, Larsson M, Koolen W (2020) Admissible anytime-valid sequential inference must rely on nonnegative martingales. Preprint, submitted September 7, https://arxiv.org/abs/2009.03167.
  • Ramdas A, Ruf J, Larsson M, Koolen WM (2022) Testing exchangeability: Fork-convexity, supermartingales and e-processes. Internat. J. Approximate Reasoning 141:83–109.
  • Robbins H (1970) Statistical methods related to the law of the iterated logarithm. Ann. Math. Statist. 41(5):1397–1409.
  • Robbins H, Siegmund D (1970) Boundary crossing probabilities for the Wiener process and sample sums. Ann. Math. Statist. 41(5):1410–1429.
  • Rosenbaum PR (1995) Observational Studies (Springer, New York, NY), 1–12.
  • Savage LJ (1971) Elicitation of personal probabilities and expectations. J. Amer. Statist. Assoc. 66(336):783–801.
  • Schervish MJ (1989) A general method for comparing probability assessors. Ann. Statist. 17(4):1856–1879.
  • Seillier-Moiseiwitsch F, Dawid A (1993) On testing the validity of sequential probability forecasts. J. Amer. Statist. Assoc. 88(421):355–359.
  • Shafer G (2021) Testing by betting: A strategy for statistical and scientific communication. J. Roy. Statist. Soc. Ser. A 184(2):407–431.
  • Shafer G, Vovk V (2019) Game-Theoretic Foundations for Probability and Finance, vol. 455 (Wiley, Chichester, UK).
  • Shafer G, Shen A, Vereshchagin N, Vovk V (2011) Test martingales, Bayes factors and p-values. Statist. Sci. 26(1):84–101.
  • Vannitsem S, Bremnes JB, Demaeyer J, Evans GR, Flowerdew J, Hemri S, Lerch S, Roberts N, Theis S, Atencia A (2021) Statistical postprocessing for weather forecasts: Review, challenges, and avenues in a big data world. Bull. Amer. Meteorological Soc. 102(3):E681–E699.
  • Ville J (1939) Étude Critique de la Notion de Collectif (Gauthier-Villars, Paris).
  • Vovk V, Wang R (2021) E-values: Calibration, combination and applications. Ann. Statist. 49(3):1736–1754.
  • Vovk V, Takemura A, Shafer G (2005) Defensive forecasting. Internat. Workshop Artificial Intelligence Statist. (PMLR, New York), 365–372.
  • Waggoner B (2021) Linear functions to the extended reals. Preprint, submitted February 18, https://arxiv.org/abs/2102.09552.
  • Waudby-Smith I, Ramdas A (2023) Estimating means of bounded random variables by betting. J. Roy. Statist. Soc. Ser. B Statist. Methodology. Forthcoming.
  • Waudby-Smith I, Arbour D, Sinha R, Kennedy EH, Ramdas A (2021) Time-uniform central limit theory and asymptotic confidence sequences. Preprint, submitted March 11, https://arxiv.org/abs/2103.06476.
  • Winkler RL (1994) Evaluating probabilities: Asymmetric scoring rules. Management Sci. 40(11):1395–1405.
  • Winkler RL, Munoz J, Cervera JL, Bernardo JM, Blattenberger G, Kadane JB, Lindley DV, Murphy AH, Oliver RM, Ríos-Insua D (1996) Scoring rules and the evaluation of probabilities. Test 5(1):1–60.
  • Yen YM, Yen TJ (2021) Testing forecast accuracy of expectiles and quantiles with the extremal consistent loss functions. Internat. J. Forecasting 37(2):733–758.
  • Ziegel JF, Krüger F, Jordan A, Fasciati F (2020) Robust forecast evaluation of expected shortfall. J. Financial Econom. 18(1):95–120.