- From the Fair Distribution of Predictions to the Fair Distribution of Social Goods (2024)
Sebastian Zezulka, Konstantin Genin
Proceedings of FAccT 2024
[paper]Deploying an algorithmically informed policy is a significant intervention in society. Prominent methods for algorithmic fairness focus on the distribution of predictions at the time of training, rather than the distribution of social goods that arises after deploying the algorithm in a specific social context. However, requiring a ‘fair’ distribution of predictions may undermine efforts at establishing a fair distribution of social goods. First, we argue that addressing this problem requires a notion of prospective fairness that anticipates the change in the distribution of social goods after deployment. Second, we provide formal conditions under which this change is identified from pre-deployment data. That requires accounting for different kinds of performative effects. Here, we focus on the way predictions change policy decisions and, consequently, the causally downstream distribution of social goods. Throughout, we are guided by an application from public administration: the use of algorithms to predict who among the recently unemployed will remain unemployed in the long term and to target them with labor market programs. Third, using administrative data from the Swiss public employment service, we simulate how such algorithmically informed policies would affect gender inequalities in long-term unemployment. When risk predictions are required to be ‘fair’ according to statistical parity and equality of opportunity, targeting decisions are less effective, undermining efforts to both lower overall levels of long-term unemployment and to close the gender gap in long-term unemployment.
- Reliability in Machine Learning (2024)
Thomas Grote, Konstantin Genin, Emily Sullivan
Philosophy Compass
[paper]Issues of reliability are claiming center-stage in the epistemology of machine learning. This paper unifies different branches in the literature and points to promising research directions, whilst also providing an accessible introduction to key concepts in statistics and machine learning – as far as they are concerned with reliability.
- Success Concepts for Causal Discovery (2022)
Konstantin Genin, Conor Mayo-Wilson
Behaviormetrika
[paper]Existing causal discovery algorithms are often evaluated using two success criteria, one that is typically unachievable and the other which is too weak for practical purposes. The unachievable criterion—uniform consistency—requires that a discovery algorithm identify the correct causal structure at a known sample size. The weak but achievable criterion—pointwise consistency—requires only that one identify the correct causal structure in the limit. We investigate two intermediate success criteria—decidability and progressive solvability—that are stricter than mere consistency but weaker than uniform consistency. To do so, we review several topological theorems characterizing the causal discovery problems that are decidable and progressively solvable. We show, under a variety of common modeling assumptions, that there is no uniformly consistent procedure for identifying the direction of a causal edge, but there are statistical decision procedures and progressive solutions. We focus on linear models in which the error terms are either non-Gaussian or contain no Gaussian components; the latter modeling assumption is novel to this paper. We focus especially on which success criteria remain feasible when confounders are present.
- On Falsifiable Statistical Hypotheses (2022)
Konstantin Genin
Philosophies
[paper]Popper argued that a statistical falsification requires a prior methodological decision to regard sufficiently improbable events as ruled out. This suggestion has generated a number of fruitful approaches, but also a number of apparent paradoxes and ultimately, no clear consensus. It is still commonly claimed that, since random samples are logically consistent with all the statistical hypotheses on the table, falsification simply does not apply in realistic statistical settings. We claim that the situation is considerably improved if we ask a conceptual question beforehand: when should a statistical hypothesis be regarded as falsifiable? To this end, we propose several different notions of statistical falsifiability and prove that, whichever definition we prefer, the same hypotheses turn out to be falsifiable. This shows that statistical falsifiability enjoys a kind of conceptual robustness. These notions of statistical falsifiability are arrived at by proposing statistical analogues to intuitive properties enjoyed by exemplary falsifiable hypotheses familiar from classical philosophy of science. This demonstrates that, to a large extent, this philosophical tradition is on the right conceptual track. Finally, we demonstrate that, under weak assumptions, the statistically falsifiable hypotheses correspond precisely to the closed sets in a standard topology on probability measures. This means that standard techniques from statistics and measure theory can be used to determine exactly which hypotheses are statistically falsifiable. In other words, the proposed notion of statistical falsifiability both answers to our conceptual demands and submits to standard mathematical techniques.
- Statistical Undecidability in Linear, Non-Gaussian Models in the Presence of Latent Confounders. (2021)
Konstantin Genin
NeurIPS 2021
[paper]Since Spirtes et al. (2000), it is known that if causal relationships are linear and noise terms are independent and Gaussian, causal orientation is not identified from observational data, even if faithfulness is satisfied. Shimizu et al. (2006) showed that linear, non-Gaussian (LiNGAM) causal models are identified from observational data, so long as no latent confounders are present. That holds even when faithfulness fails. Genin and Mayo-Wilson (2020) refine that identifiability result: not only are causal relationships identified, but causal orientation is statistically decidable. That means that there is a method that converges in probability to the correct orientation and, at every sample size, outputs an incorrect orientation with low probability. These results raise questions about what happens in the presence of latent confounders. Hoyer et al. (2008) and Salehkaleybar et al. (2020) show that, although the causal model is not uniquely identified, causal orientation among observed variables is identified in the presence of latent confounders, so long as faithfulness is satisfied. This paper refines these results. When we allow for the presence of latent confounders, causal orientation is no longer statistically decidable. Although it is possible to converge in probability to the correct orientation, it is not possible to do so with finite-sample bounds on the probability of orientation errors, even if causal faithfulness is satisfied. However, that limiting result suggests adjustments of the standard LiNGAM assumptions that restore decidability.
- Randomized Controlled Trials in Medical AI: A Methodological Critique. (2021)
Konstantin Genin, Thomas Grote
Philosophy of Medicine
[paper]Various high-profile publications claim that medical AI systems perform as well as, or better than, clinical experts. However, very few controlled trials have been performed and the quality of existing studies has been called into question. There is growing concern that existing studies overestimate the clinical benefits of AI systems. This has led to calls for more, and higher-quality, randomized controlled trials of medical AI systems. While this is a welcome development, AI RCTs raise novel methodological challenges that have seen little discussion. In this paper, we discuss some of the challenges arising in the context of AI RCTs and make some suggestions for how to meet them.
- Statistical Decidability in Linear, Non-Gaussian Models. (2020)
Konstantin Genin, Conor Mayo-Wilson
Causal Discovery & Causality-Inspired Machine Learning, NeurIPS 2020
[paper] [proceedings]The main result of this paper is to show that the direction of any causal edge in a LiNGAM is what we call statistically decidable. Statistical decidability is a reliability concept that is, in a sense, intermediate between the familiar notions of consistency and uniform consistency. A set of models is statistically decidable if, for any α > 0, there is a consistent procedure that, at every sample size, hypothesizes a false model with chance less than α. Such procedures may exist even when uniformly consistent ones do not. Uniform consistency requires that one be able to determine the sample size a priori at which one’s chances of identifying the true model are at least 1-α; statistical decidability requires no such pre-experimental guarantees.
- Formal Representations of Belief. (2020)
Konstantin Genin, Franz Huber
Stanford Encyclopedia of Philosophy
[entry]Epistemologists are interested in the norms governing the structure and dynamics of systems of belief: how an individual’s beliefs must cohere in order to be considered rational; how they must be reflected in decision making; and how they ought to accommodate new evidence. Formal epistemologists pursue these questions by constructing mathematical models, or “formal representations,” of belief systems that are, in some sense, epistemically exemplary. These models capture something important about how an ideally rational agent would manage her epistemic life. This entry gives an overview of the formal representations that have been proposed for this purpose.
- Full and Partial Belief. (2019)
Konstantin Genin
Open Handbook of Formal Epistemology
Pettigrew, Richard and Weisberg, John eds.
[paper]The question of how partial and full belief are related has received considerable attention in formal epistemology, giving rise to several subtle, elegant and, unfortunately, incompatible solutions. The debate between these alternatives is the heart of this article. The context and background information necessary to appreciate this debate is developed at some length.
- Learning, Theory Choice, and Belief Revision. (2018)
Konstantin Genin, Kevin T. Kelly.
Studia Logica.
[preprint] [doi]This paper presents new logical relations connecting three topics pertaining to inductive inference: (I) synchronic norms of theory choice, like the preferences for simpler and more falsifiable theories, (II) diachronic norms of theory change familiar from belief revision and AGM theory, and (III) the justification of such norms by truth-conduciveness, or learning performance.
- The Topology of Statistical Verifiability. (2017)
Konstantin Genin, Kevin T. Kelly.
In proceedings of TARK 2017, Liverpool.
[preprint] [doi]In topological learning theory, open sets are interpreted as hypotheses deductively verifiable by true propositional information that rules out relevant possibilities. However, in statistical data analysis, one routinely receives random samples logically compatible with every statistical hypothesis. We bridge the gap between propositional and statistical data by solving for the unique topology on probability measures in which the open sets are exactly the statistically verifiable hypotheses. Furthermore, we extend that result to a topological characterization of learnability in the limit from statistical data.
- Realism, Rhetoric, and Reliability. (2016)
Kevin T. Kelly, Konstantin Genin, Hanti Lin.
Synthese 193(4): 1191-1223.
[preprint] [doi]Glymour’s early work on confirmation theory (1980) eloquently stressed the rhetorical plausibility of Ockham’s razor in scientific arguments. His subsequent, seminal research on causal discovery (Spirtes et al. 2000) still concerns methods with a strong bias toward simpler causal models, and it also comes with a story about reliability---the methods are guaranteed to converge to true causal structure in the limit. However, there is a familiar gap between convergent reliability and scientific rhetoric: convergence in the long run is compatible with any conclusion in the short run. For that reason, Carnap (1945) suggested that the proper sense of reliability for scientific inference should lie somewhere between short-run reliability and mere convergence in the limit. One natural such concept is straightest possible convergence to the truth, where straightness is explicated in terms of minimizing reversals of opinion (drawing a conclusion and then replacing it with a logically incompatible one) and cycles of opinion (returning to an opinion previously rejected) prior to convergence. We close the gap between scientific rhetoric and scientific reliability by showing (1) that Ockham’s razor is necessary for cycle-optimal convergence to the truth, and (2) that patiently waiting for information to resolve conflicts among simplest hypotheses is necessary for reversal-optimal convergence to the truth.
- Theory Choice, Theory Change, and Inductive Truth-Conduciveness. (2015)
Konstantin Genin, Kevin T. Kelly.
Proceedings of the Fifteenth Conference on Theoretical Aspects of Rationality and Knowledge (TARK).
[preprint]This is an extended abstract for the above paper Learning, Theory Choice, and Belief Revision.
- Complexity, Ockham’s Razor, and Truth. (2014)
Kevin T. Kelly, Konstantin Genin.
Modes of Explanation: Affordances for Action and Prediction. Lissack, Michael ed., Palgrave Macmillan.
[doi]Ockham’s razor says: “Choose the simplest theory compatible with the data.” Without Ockham’s razor, theoretical science cannot get very far, since there are always ever more complicated explanations compatible with current evidence. Scientific lore pretends that reality is simple---but gravitation works by a quadratic, rather than a linear, law; and what about the shocking failure of parity conservation in particle physics? Ockham speaks so strongly in its favor that demonstrating its falsity resulted in a Nobel Prize in physics (Lee and Yang, 1957). So why trust Ockham?
- Student Profiling from Tutoring System Log Data: When do Multiple Graphical Representations Matter? (2013)
Ryan Carlson, Konstantin Genin, Martina Rau, Richard Scheines.
Proceedings of the Education Data Mining (EDM) Conference.
[preprint]We analyze log-data generated by an experiment with Fractions Tutor, an intelligent tutoring system. The experiment compares the educational effectiveness of instruction with single and multiple graphical representations. We cluster students by their learning strategy and find that the association between experimental condition and learning outcome is found among students implementing just one of the learning strategies. The behaviors that characterize this group illuminate the mechanism underlying the effectiveness of multiple representations and suggest strategies for tailoring instruction to individual students.
- Performativity and Prospective Fairness (2023)
Sebastian Zezulka, Konstantin Genin
Fairness Through the Lens of Time at NeurIPS 2023
[paper]Deploying an algorithmically informed policy is a significant intervention in the structure of society. As is increasingly acknowledged, predictive algorithms have performative effects: using them can shift the distribution of social outcomes away from the one on which the algorithms were trained. Algorithmic fairness research is usually motivated by the worry that these performative effects will exacerbate the structural inequalities that gave rise to the training data. However, standard retrospective fairness methodologies are ill-suited to predict these effects. They impose static fairness constraints that hold after the predictive algorithm is trained, but before it is deployed and, therefore, before performative effects have had a chance to kick in. However, satisfying static fairness criteria after training is not sufficient to avoid exacerbating inequality after deployment. Addressing the fundamental worry that motivates algorithmic fairness requires explicitly comparing the change in relevant structural inequalities before and after deployment. We propose a prospective methodology for estimating this post-deployment change from pre-deployment data and knowledge about the algorithmic policy. That requires a strategy for distinguishing between, and accounting for, different kinds of performative effects. In this paper, we focus on the algorithmic effect on the causally downstream outcome variable. Throughout, we are guided by an application from public administration: the use of algorithms to (1) predict who among the recently unemployed will stay unemployed for the long term and (2) target them with labor market programs. We illustrate our proposal by showing how to predict whether such policies will exacerbate gender inequalities in the labor market.
- Inductive vs. Deductive Statistical Inference (2018)
Konstantin Genin, Kevin T. Kelly
Presented at the 2018 Philosophy of Science Association Meeting, Seattle
[paper]The distinction between deductive (infallible, monotonic) and inductive (fallible, non-monotonic) inference is fundamental in the philosophy of science. However, virtually all scientific inference is statistical, which falls on the inductive side of the traditional distinction. We propose that deduction should be nearly infallible and monotonic, up to an arbitrarily small, a priori bound on chance of error. A challenge to that revision is that deduction, so conceived, has a structure entirely distinct from ideal, infallible deduction, blocking useful analogies from the logical to the statistical domain. We respond by tracing the logical insights of traditional philosophy of science to the underlying information topology over possible worlds, which corresponds to deductive verifiability. Then we isolate the unique information topology over probabilistic worlds that corresponds to statistical verifiability. That topology provides a structural bridge between statistics and logical insights in the philosophy of science.
- How Inductive is Bayesian Conditioning? (2017)
Konstantin Genin
Presented at Experience and Updating Workshop, Bochum.
[abstract]Bayesian conditioning is widely considered to license inductive inferences to universal hypotheses. However, several authors [Kelly, 1996, Shear et al., 2017] have called attention to a sense in which those inferences are essentially deductive: if H has high credence after conditioning on E, then the material conditional E ⊃ H has even higher prior probability. In this note, I show that a similar feature attends Jeffrey conditioning. Furthermore, I briefly address the extent of non-deductive undermining of prior beliefs.
- A Topological Explanation of Empirical Simplicity. (2016)
Kevin T. Kelly, Konstantin Genin
Presented at the 2016 Philosophy of Science Association Meeting, Atlanta
[paper]We present and motivate a new explication of empirical simplicity that avoids many of the problems with earlier accounts. The proposal is grounded in information topology, the topological space generated by the set of all possible information states inquiry might encounter. Our proposal is closely related to Popper’s, but we show that it improves upon his in at least two respects: maximal simplicity is equivalent to refutability and stronger hypotheses are not simpler. Finally, we explain how to extend the topological viewpoint to statistical inductive inference.
- The Topology of Statistical Inquiry (2018) [abstract] [manuscript]
Konstantin Genin
Accepted for fulfillment of degree requirements, PhD. Logic, Computation and Methodology.