What I love about this is how it flips the usual take on ‘misinformation.’ Instead of starting with who’s wrong, it starts with how people come to believe what they believe. That shift, from judging to understanding, feels like something we badly need, especially when everyone’s convinced they’re the only ones being rational. Honestly, more political debates would go somewhere if we cared less about being right and more about why we think we're right.
Agreed!
Interesting. I wonder which epistemological branch is closest to Machine Learning (ML). It seems to me ML implicitly adopts a minimalist epistemological stance - perhaps closest to a Bayesian or probabilistic epistemology, if that's a thing? (ChatGPT tells me it is.) As far as I can see, in ML knowledge equals knowing a joint probability function, either the density (p.d.f.) or the cumulative (c.d.f.) distribution. Illustrating with a joint p.d.f. next.
This joint p.d.f. (or co-counts in discrete cases) records how often different observed events co-occur. Generally, observations are N-dimensional vectors rather than single scalars. Time is the special dimension in the real world where counting occurs (mathematically, it can be treated as just another dimension).
Usually we partition our observations Z into two sets: Z = (X, Y). X are things we can directly observe, whereas Y are things we care about but cannot observe directly. Hence we need to observe X, and to know the relationship f_{X,Y}(x,y), in order to understand Y.
Before observing X, all we know about Y is its marginal (prior) distribution $f_Y(y)$, obtained by marginalising out X from the joint distribution: $f_Y(y) = \int_x f_{X,Y}(x,y) dx$. (If X is discrete, this integral becomes a summation.)
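In the discrete case that marginalisation is just a column sum over co-counts. A minimal sketch with a made-up joint table (all numbers hypothetical):

```python
import numpy as np

# Hypothetical co-counts for (X, Y): rows index x in {0,1,2}, columns y in {0,1}.
counts = np.array([[30., 10.],
                   [20., 20.],
                   [ 5., 15.]])

joint = counts / counts.sum()   # joint p.m.f. f_{X,Y}(x, y)

# Marginal (prior) of Y: sum out X -- the discrete analogue of the integral.
f_Y = joint.sum(axis=0)         # f_Y(y) = sum_x f_{X,Y}(x, y)

print(f_Y)                      # [0.55 0.45]
```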
After observing a specific value of X, say x = a, we gain more information about Y. Geometrically, we intersect the joint distribution $f_{X,Y}(x,y)$ with the plane $x = a$, yielding a slice $f_{X,Y}(x=a,y)$. However, this slice alone isn't yet a proper p.d.f. because it doesn't integrate to 1. To correct this, we normalise it by dividing it by the marginal distribution of X at $x = a$, i.e., $f_X(a) = \int_y f_{X,Y}(a,y) dy$. This gives us the conditional distribution $f_{Y|X}(y|x=a) = \frac{f_{X,Y}(x=a,y)}{f_X(x=a)}$.
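The slice-and-normalise step, again with a hypothetical discrete joint:

```python
import numpy as np

# Hypothetical discrete joint p.m.f. over (X, Y); rows index x, columns index y.
joint = np.array([[0.30, 0.10],
                  [0.20, 0.20],
                  [0.05, 0.15]])

a = 1                            # observed value x = a
slice_a = joint[a, :]            # f_{X,Y}(x=a, y): the slice, not yet a p.m.f.
f_X_a = slice_a.sum()            # f_X(a): the normalising constant
f_Y_given_a = slice_a / f_X_a    # f_{Y|X}(y | x=a): sums to 1

print(f_Y_given_a)               # [0.5 0.5]
```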
(Noticing this relationship for its Bayesian structure. We've got $f_{X,Y}(x,y) = f_{Y|X}(y|x) f_X(x)$. Marginalising to find $f_Y(y) = \int f_{Y|X}(y|x) f_X(x) dx$ involves integrating over all possible conditionals, weighted by their probability $f_X(x)$. Bayesian updating embedded in ML?)
Once we have this conditional p.d.f. $f_{Y|X}(y|x=a)$, it encodes all our updated knowledge about Y, given the observation $x = a$. We can subsequently use this p.d.f. for forecasting - choosing a point or an interval (two points) - or to weight outcomes in decision-making in various contexts, etc.
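A small sketch of both uses, point and interval, assuming for illustration that the conditional came out as a Gaussian centred at 1:

```python
import numpy as np

# Hypothetical conditional p.d.f. f_{Y|X}(y|x=a) on a grid: here N(1, 1).
y = np.linspace(-5.0, 5.0, 2001)
dy = y[1] - y[0]
pdf = np.exp(-0.5 * (y - 1.0) ** 2) / np.sqrt(2.0 * np.pi)

point = (y * pdf).sum() * dy          # point forecast: the conditional mean
cdf = np.cumsum(pdf) * dy             # crude numerical c.d.f.
lo = y[np.searchsorted(cdf, 0.025)]   # 95% central interval: the two points
hi = y[np.searchsorted(cdf, 0.975)]

print(point, (lo, hi))                # ~1.0 and roughly (-0.96, 2.96)
```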
Interesting! Very into the idea that we will conceptualise knowledge in different ways for different purposes. Or, in fancier language, build different epistemological models. Substacker @Daniel Greco has an excellent book about this:
https://global.oup.com/academic/product/idealization-in-epistemology-9780198860556
Thanks for the link. Yeah after doing ML in my former life, I got exercised enough to plot the above in a pic, here -
https://ljubomirj.github.io/pdf-joint-cond-marg-1of3.png
...now that I've got some time on my hands. (after the end of another episode in my latter life.)
ChatGPT has a propensity to tell you only about a very particular Bayesian flavor of ML methods, because of its representation in the literature between 2000 and 2020, unless you system-prompt it to remember that ML also uses frequentist devices like cross-validation and conformal prediction. I wouldn't automatically assume that Bayesian confirmation theory is the one most closely aligned to ML.
I would argue that Neo-Popperian schools are also quite compatible with rigorous practitioners of empirical ML. See Gelman and Shalizi "Philosophy and the practice of Bayesian statistics". https://arxiv.org/abs/1006.3868
More recently, this exchange between Jim Berger and Aad van der Vaart https://doi.org/10.51387/22-NEJSDS4A tells you a bit more about where Bayesian ML is headed these days: towards Bayesian methods that have good consistency or good "frequentist properties".
https://arxiv.org/abs/2105.14045;
https://fitelson.org/probability/good_bnbc.pdf
Hey thanks - appreciate that. I know this is almost a religious experience for some, this Bayesian thing. I once met a lecturer (~year 2000) who used to tick "Bayesian" on census forms, ID forms on entry to various countries, and so on. I'm an engineer by temperament myself, and don't hold strong feelings there. For me it's a "meh": whatever works is fine.
But I didn't have in mind what ChatGPT summarises. I had in mind what LLMs implement, what epistemology is built into them: by the architecture chosen for them by their creators (i.e. which network weights are forever zeros), and by the training algorithm and the objective function minimised when fitting them to predict the data.
Autoregressive LLMs typically implement a conditional probability function, f_{Y|X}(y|x), where (approximately) X is the chat so far and Y is the next word(s). It's all in "token" embedding space, so word-parts rather than words, etc. (everything I write here is simplified - somewhere between not-quite-right and wrong at some level). For that conditional p.d.f. we have known since 1991 (Richard & Lippmann, "Neural Network Classifiers Estimate Bayesian a posteriori Probabilities"; later a nicer proof in Rojas 1996) that, under conditions, it converges to the posterior probability P(class|input).
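A toy numeric check of that convergence claim (my own construction, not from the paper): fit a one-weight logistic "network" by cross-entropy gradient descent on data whose true posterior is known in closed form:

```python
import numpy as np

# Toy setup: class t is 0/1 with prob 1/2; x ~ N(2t-1, 1).
# Bayes' rule then gives the true posterior P(t=1|x) = sigmoid(2x).
rng = np.random.default_rng(0)
n = 20_000
t = rng.integers(0, 2, n)              # class labels
x = rng.normal(2.0 * t - 1.0, 1.0)     # class-conditional Gaussians

w, b = 0.0, 0.0                        # a one-weight "network"
for _ in range(2000):                  # full-batch gradient descent on cross-entropy
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))
    w -= np.mean((p - t) * x)          # learning rate 1.0, absorbed
    b -= np.mean(p - t)

print(w, b)                            # should land near the Bayes-optimal w=2, b=0
```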
Similarly, for regression (not classification), NN outputs converge to the conditional mean of y|x. Nicely written up and plotted in Bishop (1995), "Neural Networks for Pattern Recognition", page 203. (I've got it, but can't paste a picture here.)
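The same point can be sketched numerically; here a polynomial least-squares fit stands in for the NN (same squared-error loss, simpler function class), on hypothetical data where the true conditional mean is sin(x):

```python
import numpy as np

# Hypothetical data: y = sin(x) + noise, so the conditional mean E[y|x] = sin(x).
rng = np.random.default_rng(1)
x = rng.uniform(-3.0, 3.0, 5000)
y = np.sin(x) + rng.normal(0.0, 0.3, x.size)

coeffs = np.polyfit(x, y, deg=7)       # minimise mean squared error
grid = np.linspace(-2.5, 2.5, 101)     # evaluate away from the edges
err = np.max(np.abs(np.polyval(coeffs, grid) - np.sin(grid)))

print(err)                             # small: the fit tracks E[y|x], not the noise
```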
The picture generators are implemented as samplers: starting from some picture that is the current state (which may be pure noise), and moving to a new state, the next picture. This is guided by the gradient of the conditional log probability, log prob(next pic | current pic, step noise level, prompt). That function is learned during training, when the network gets a noisy picture on input (plus the noise level, and maybe a prompt) and has to guess the clean(er) picture. So again, this is what the network is created to do: compute some function of, or related to, the conditional probability.
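A toy 1-D stand-in for that sampling idea (everything here hypothetical): Langevin-type steps that follow the exact score of a known Gaussian target, in place of a network's learned score for images:

```python
import numpy as np

# Target distribution standing in for "the clean pictures": N(2, 0.5^2).
# Its score, grad log p(x) = -(x - mu)/var, plays the role of the trained network.
rng = np.random.default_rng(2)
mu, var, eps = 2.0, 0.25, 0.01
x = rng.normal(0.0, 1.0, 5000)         # start every chain from pure noise

for _ in range(500):                   # Langevin steps: drift up the score + fresh noise
    score = -(x - mu) / var
    x = x + eps * score + np.sqrt(2.0 * eps) * rng.normal(0.0, 1.0, x.size)

print(x.mean(), x.std())               # ~2.0 and ~0.5: samples match the target
```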
Now - these are gigantic functions with trillions of parameters. To my mind, the unresolved unknowns have less to do with whether it's Bayesian or not. We know it's better to average over more models than fewer - but we can barely afford a single model; an ensemble is a stretch, and more than that almost impossible. The unresolved problems, to my mind, are more like "how big a model can we build with how much data?" and "what kind of data?". Obviously, just "more data" is trivially not enough, as doubling the data set by simply copying it over twice does us no good.
And so on. Hope this makes sense to you as much as it does to me atm. :-)
My introduction to epistemology was as an 18-year-old in Phil 101, deciphering George Berkeley’s “Theory of Knowledge,” an excessively wordy meditation on why trees crashing in forests are silent unless one is foolish enough to be sitting near one. Fast forward to a thirty-year career as a counterintelligence officer, where the endgame was to ‘seek the truth’ (which, for the record, never set me free) through a rigorous skepticism that would impress Bishop Berkeley. We are all epistemologists at heart and, in our peculiar way, fun at parties.
Great projects. I like the spotlight on what expertise is, and who are the real experts. It's kind of been key since ancient Athens, so we haven't really solved it yet.
Agree. One reason is that there’s always going to be a tension between democratic ideals and expertise hierarchies. You can navigate these tensions but it’s difficult to do that at a societal level. One could justifiably claim that right now we’re not doing a good job on this.
I enjoyed this and see a lot of agreement between my own ideas and what Robin is talking about in terms of political epistemology. One quibble I would have is the reference to "postmodern relativism." Obviously, "postmodernism" is a difficult concept to talk about because it's caricatured so much in popular media by disingenuous people, but I find a lot of so called postmodernists are basically saying the same thing Robin is here--knowledge is mediated by communities, which doesn't mean that all knowledge is relative or "fake," but that knowledge has a history and a sociology and we can understand that to allow us to better understand what knowledge is and how it's produced. I'm not trying to say Robin is a postmodernist, but the opposite: the postmodernists say something similar to the argument here, but unfortunately they're often misrepresented.
Yeah I agree with you. It was the caricature I had in mind. Part of the problem is on the side of some writers in this tradition, who have the very French thing of delighting in the provocative. But the main problem is on the side of those who try to interpret the tradition, where there’s a complete absence of interpretive charity, and refusal to understand the basic project. At bottom I think a lot of philosophers, particularly analytic philosophers, are so uninterested in a sociological and historical kind of understanding that they assume everyone they read is in the business of making normative or metaphysical claims.
That makes sense. I had the distorted view myself until I actually went and read some primary sources.
I loved my epistemology and metaphysics class in college (and was interested in street epistemology before that)! The only problem was that I misread when our midterm paper was due, and ended up writing 12 pages on the different methods of knowing, and whether they're equally reliable forms of obtaining knowledge, in one day.
Interestingly enough, I now work in a similar area, showing all political perspectives on daily issues in the news. My boss was a philosophy major, and we spend so much time talking about how we're trying to bridge the gap between an information free-for-all and a ministry of truth.
Sounds like an interesting job!
An interesting set of observations. I'm an epistemologist as well, and when it comes to epistemology, I do absolutely nothing that McKenna does. That's a little weird, but totally true.
Maybe it’s a personal failing but I can’t really write anything interesting about what epistemology (or for that matter any other sub area of philosophy is). So I just wrote about what I do, on the basis that if I can’t make what I do sound worth doing I might as well pack up and go home.
I don't think I can write anything terribly informative about "what epistemology is" either. When I try to describe it to non-philosophers, I say things like "It's the study of a closely-connected group of concepts, including knowledge, evidence, reason, truth, confidence, disagreement, objectivity, subjectivity, and so on". It seems okay.
If I were G. E. Moore, I would wave my hand at a shelf of epistemology books and say "It's what's discussed in these books".
I still find it interesting that two people can do epistemology for many years and yet don't do any of the same topics. It shows how large the field is.
It’s probably a good thing. If everyone is doing the exact same thing then you’ve got a very unhealthy field. I imagine what I do right now is quite different from what most epistemologists do, and I don’t have (or want to have) an argument that it’s the best thing to do as an epistemologist. I do think it’s the way to approach things if you want to do political epistemology, but that’s a much weaker claim.
Very interesting! I understand you are focusing mostly on disciplines where lay people are organizing an embryo of a parallel scientific community, as organized groups of patients suffering from obscure and sometimes discredited conditions do in medicine. As a researcher in physics I don’t see this happening in my discipline though. What we get are mostly “crackpots” that happen to work individually and mostly on theoretical issues. I put the word crackpot in scare quotes because that’s the derogatory term the official community uses to label certain fringe researchers, but from your point of view those represent an equally useful epistemological lab. We do have a dedicated preprint server (viXra, parallel to arXiv) which basically hosts the kind of fringe contributions that would get rejected by arXiv. But based on my impressions we do not have organized, cohesive groups of lay people working on alternatives to the official consensus. I wonder whether this is due to the nature of the discipline (e.g. the fact that theoretical physics is potentially open to individual contributions, while at the opposite end experiments require huge collaborations) or to sociological characteristics of the community. Not that we can draw a clear line, I guess.
Thanks! Yeah I want to draw quite a few distinctions here:
Between sciences that are about human beings and sciences that are not.
Between issues in “human sciences” that directly (or strongly indirectly) impact on ordinary people and issues that do not. (This would include questions of application, but perhaps some more theoretical ones too).
Between relatively “settled” science and science that is still in the process of being settled.
I’m interested in cases where, within the human sciences, you have something—like the development of drugs, treatments, and the like—that is still very much “up in the air” and it is possible for non-credentialed experts to make some sort of contribution. So, no, I’m not about defending crank physicists, as much fun as they may be.
> But our attempts to grasp those truths—at least, the sorts of truths that are relevant in politics—are mediated by social practices and our external environment.
How is political epistemology different from social epistemology?
If you have a very expansive conception of the political, then there is at best only a nominal difference. If you think that social interactions and the like don’t already presuppose a political context, then you can say that political epistemology is a subfield within social epistemology. At least that’s my view.
Fascinating post, I learned a lot. This discipline would go a long way toward inviting in the layman by coming up with a less jargonized way to describe its work. “Epistemological”, a seven-syllable word? My eyes have always glazed over when encountering this utterly opaque term. Now I know how important this work is.
It’s not the best word, but nobody has managed to come up with a better one.