Discussion about this post

Ljubomir Josifovski:

Interesting. I wonder which epistemological branch is closest to Machine Learning (ML). It seems to me ML implicitly adopts a minimalist epistemological stance, perhaps closest to a Bayesian or probabilistic epistemology, if that's a thing. (ChatGPT tells me it is.) As far as I can see, in ML knowledge equals knowing a joint probability function: either the density (p.d.f.) or the cumulative (c.d.f.) distribution. I'll illustrate with a joint p.d.f. next.

This joint p.d.f. (or table of co-occurrence counts, in the discrete case) records how often different observed events co-occur. Generally, observations are N-dimensional vectors rather than single scalars. Time is the special dimension in the real world along which the counting occurs (mathematically it can be treated as just another dimension).
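To make the discrete case concrete, here is a minimal sketch with made-up co-occurrence counts for a binary X and binary Y (the numbers are illustrative assumptions, not from the comment):

```python
import numpy as np

# Hypothetical co-occurrence counts: rows index values of X,
# columns index values of Y. Purely illustrative numbers.
counts = np.array([[30, 10],
                   [20, 40]])

# Normalising the counts gives the joint p.m.f. f_{X,Y}(x, y):
# each entry estimates the probability of that (x, y) pair.
joint = counts / counts.sum()

print(joint)
# All entries sum to 1, as a probability distribution must.
assert np.isclose(joint.sum(), 1.0)
```

This is just relative-frequency estimation; everything below (marginals, conditionals) is read off this one table.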

Usually we partition our observations Z into two sets: Z = (X, Y). X are things we can observe directly, whereas Y are things we care about but cannot observe directly. Hence we observe X and rely on the known relationship f_{X,Y}(x,y) as our means of understanding Y.

Before observing X, all we know about Y is its marginal (prior) distribution $f_Y(y)$, obtained by marginalising out X from the joint distribution: $f_Y(y) = \int_x f_{X,Y}(x,y) dx$. (If Y is discrete, this integral becomes a summation.)
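In the discrete case the marginalisation is just a column sum of the joint table. A toy sketch (the joint values are assumed, matching a small binary example):

```python
import numpy as np

# Toy joint p.m.f. over X (rows) and Y (columns); values are assumptions.
joint = np.array([[0.3, 0.1],
                  [0.2, 0.4]])

# Marginal (prior) of Y: sum out X. This is the discrete analogue of
# f_Y(y) = \int_x f_{X,Y}(x, y) dx.
f_Y = joint.sum(axis=0)
print(f_Y)  # the prior over Y before any X is observed
```

Here `f_Y` comes out as `[0.5, 0.5]`: before seeing X, both values of Y are equally likely under these toy numbers.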

After observing a specific value of X, say x = a, we gain more information about Y. Geometrically, we intersect the joint distribution $f_{X,Y}(x,y)$ with the plane $x = a$, yielding a slice $f_{X,Y}(x=a,y)$. However, this slice alone isn't yet a proper p.d.f. because it doesn't integrate to 1. To correct this, we normalise it by dividing it by the marginal distribution of X at $x = a$, i.e., $f_X(a) = \int_y f_{X,Y}(a,y) dy$. This gives us the conditional distribution $f_{Y|X}(y|x=a) = \frac{f_{X,Y}(x=a,y)}{f_X(x=a)}$.
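The slice-then-normalise step is two lines of code in the discrete case. A sketch, reusing the same assumed toy joint:

```python
import numpy as np

joint = np.array([[0.3, 0.1],
                  [0.2, 0.4]])  # assumed toy joint p.m.f.

a = 0  # observed value x = a, here the first row

# Slice of the joint at x = a; it does NOT sum to 1 on its own.
slice_a = joint[a, :]

# Normalise by the marginal f_X(a) = sum_y f_{X,Y}(a, y)
# to obtain the conditional f_{Y|X}(y | x = a).
f_X_a = slice_a.sum()
f_Y_given_a = slice_a / f_X_a

print(f_Y_given_a)  # a proper p.m.f.: sums to 1
```

With these numbers the slice `[0.3, 0.1]` sums to 0.4, and normalising yields roughly `[0.75, 0.25]`: a sharper belief about Y than the flat prior.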

(Notice this relationship's Bayesian structure. We've got $f_{X,Y}(x,y) = f_{Y|X}(y|x) f_X(x)$. Marginalising to find $f_Y(y) = \int_x f_{Y|X}(y|x) f_X(x) dx$ involves integrating over all possible conditionals, weighted by their probability $f_X(x)$. Bayesian updating embedded in ML?)
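Both identities in that aside can be checked numerically on the toy joint (same assumed numbers as above):

```python
import numpy as np

joint = np.array([[0.3, 0.1],
                  [0.2, 0.4]])  # assumed toy joint p.m.f.

f_X = joint.sum(axis=1)                  # marginal of X: sum out Y
f_Y_given_X = joint / f_X[:, None]       # each row is f_{Y|X}(. | x)

# Product rule: f_{X,Y}(x, y) = f_{Y|X}(y | x) * f_X(x).
assert np.allclose(f_Y_given_X * f_X[:, None], joint)

# Law of total probability: f_Y(y) = sum_x f_{Y|X}(y | x) * f_X(x),
# which must match marginalising the joint directly.
f_Y_via_conditionals = (f_Y_given_X * f_X[:, None]).sum(axis=0)
assert np.allclose(f_Y_via_conditionals, joint.sum(axis=0))
```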

Once we have this conditional p.d.f. $f_{Y|X}(y|x=a)$, it encodes all our updated knowledge about Y given the observation $x = a$ of X. We can then use this p.d.f. for forecasting: choosing a point estimate, or an interval (two points), or weighting outcomes in decision-making in various contexts, etc.
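Two common point forecasts drawn from such a conditional, sketched with assumed numbers (the conditional below is the one obtained earlier from the toy joint):

```python
import numpy as np

# Values Y can take and the conditional p.m.f. after observing x = a.
# Both are assumptions carried over from the toy example.
y_values = np.array([0.0, 1.0])
f_Y_given_a = np.array([0.75, 0.25])

# Point forecast 1: the posterior mean (expected value) of Y.
point_mean = (y_values * f_Y_given_a).sum()

# Point forecast 2: the most probable value (the mode / MAP estimate).
point_map = y_values[np.argmax(f_Y_given_a)]

print(point_mean, point_map)
```

Which summary to use depends on the loss function in the downstream decision: the mean minimises squared error, the mode minimises 0-1 loss.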

Blanca:

What I love about this is how it flips the usual take on 'misinformation.' Instead of starting with who's wrong, it starts with how people come to believe what they believe. That shift, from judging to understanding, feels like something we badly need, especially when everyone's convinced they're the only ones being rational. Honestly, more political debates would go somewhere if we cared less about being right and more about why we think we're right.
