A Machine Learning Framework for Assessing Experts’ Decision Quality

