The Society of the Statistic: Where Does Science Fit?


One of my favorite films is Dead Poets Society. It’s a beautiful story of the fight for the human experience in a system that treats people as a means to an end. 

It’s also an honest story. Such a fight isn’t easy (the system, after all, is where much of the power rests), and it necessarily comes with real human costs. In DPS, these costs are illustrated by the suicide of Neil Perry, a boy with a talent and love for acting whose father will accept nothing less than his becoming a doctor. (As for why Neil’s father has such a simplistic metric of success for his child, it’s presumably related to economic status or security. The Perry family is not as wealthy as the other families that send their children to Welton, the prestigious private school where DPS is set.)

Overall, though, DPS is an optimistic story. In the end, many students have learned to think for themselves and value themselves, and the system has lost some of its sway. This is shown in the final scene, where a majority of the students stand on their desks in homage to Keating, the teacher, and in defiance of Mr. Nolan, the headmaster.

Mr. Nolan, in this story, is the embodiment of the system: serious, austere, unempathetic. And his opening speech to a crowd of students (all of whom are white males, by the way) and parents shows how such a system uses statistics and metrics as its guiding light. It’s the very opening lines of the film:

In her first year, Welton Academy graduated five students. Last year we graduated fifty-one. And more than seventy-five percent of those went on to the Ivy League!

Welton may teach some valuable skills. But ultimately, the school and Mr. Nolan are guided by that Ivy League percentage, that one number. It doesn’t matter if its practices are empowering to the students as individuals or not. This Ivy League percentage is what the parents, the customers of Welton, are looking for.

Keating (played by the late, great Robin Williams), on the other hand, is the embodiment of what it means for people to be an end in themselves, rather than just a means to an end. His guiding light is a meaningful human experience for each student. Though, certainly, what such an experience entails differs from student to student, and thus Keating’s education is largely an attempt to encourage the students to think for themselves. But he also opens their eyes to the joys of poetry, a human passion largely forgotten by the system (forgotten because, well, there’s not much economic value in poetry).

One of my favorite Keating scenes is the following. He’s speaking to his classroom full of students in his usual loud and boisterous manner:

No matter what people tell you, words and ideas can change the world. Now, I see that look in Mr. Pitts’ eye, like 19th century literature has nothing to do with going to business school or medical school. Right? Maybe. 

Mr. Hopkins, you may agree with him, thinking: Yes, we should simply study our Mr. Pritchard and learn our rhyme and metre and go quietly about the business of achieving other ambitions. 

Well, I’ve a little secret for you. Huddle up. Huddle up!

The students gather around Keating, who crouches, and continues in a softer, more intimate voice:

We don’t read and write poetry because it’s cute. We read and write poetry because we are members of the human race. And the human race is filled with passion.

Now medicine, law, business, engineering, these are noble pursuits, and necessary to sustain life. But poetry, beauty, romance, love, these are what we stay alive for.

To quote from Whitman: ‘Oh me! Oh life! of the questions of these recurring, Of the endless trains of the faithless, of cities fill’d with the foolish… What good amid these, O me, O life? Answer. That you are here—that life exists and identity, That the powerful play goes on and you may contribute a verse.’

It’s a wonderful scene and a refreshing philosophy. But I do have one thing to add to Keating’s list. I too enjoy poetry, but being the predominantly “left brained” person that I am, I love science even more. And science, at its best and truest, can also be what we stay alive for.

But is there even a need to speak of the value of science? I mean, in the way that Keating speaks of the value of poetry? Science, after all, has much more economic value than poetry, so it’s already valued by our current society, right? 

Well, it would seem so, and it’s often discussed as if it is so, but I’m not so sure. And the reason why stems from the fact that science is not one monolithic entity. There are multiple aspects to it, some of which are valued and some of which are not. For example, Richard Feynman, a true “Keating of Physics,” spoke of these multiple values of science in a speech of his:

The first way science is of value is familiar to everyone. It is that scientific knowledge enables us to do all kinds of things and to make all kinds of things…

Another value of science is the fun called intellectual enjoyment which some people get from reading and learning and thinking about it, and which others get from working in it.

He goes on to describe the intellectual enjoyment aspect further:

The same thrill, the same awe and mystery, come again and again when we look at any problem deeply enough. With more knowledge comes deeper, more wonderful mystery, luring one on to penetrate deeper still. Never concerned that the answer may prove disappointing, but with pleasure and confidence we turn over each new stone to find unimagined strangeness leading on to more wonderful questions and mysteries…

Feynman, then, spoke of two values of science in the above quotes. Building on what he said, we can consider three values:

  1. Economic usefulness: the ability to do all kinds of things and make all kinds of things that enable the economy to grow
  2. Individual usefulness: the ability for people to utilize their scientific understanding to do all kinds of things and make all kinds of things (for intrinsic purposes)
  3. Intellectual enjoyment: the experience of fun, awe, mystery we get in the pursuit, discovery, and usage of scientific insight

Looking at it in this way, we can see that what economic forces will tend to incentivize is economic usefulness. They won’t directly incentivize the other two, though this is not necessarily a problem. If, for example, economic growth could only be obtained by encouraging a significant amount of intellectual enjoyment and the development of a significant amount of individual usefulness, then everyone, scientists and capitalists alike, would be happy. The economy could grow and people could find passion in the pursuit of understanding.

If not, though, the costs are obvious: our scientific society would be like Mr. Nolan.


The Rise of Statistics and the Fall of Science (the Joyful Part)

It’s easy to mistake statistics for science. After all, both are logical and detail-oriented and perhaps performed by the same kinds of logical, detail-oriented people. At the very least, we could imagine that they seem more related to each other than, say, to poetry or music.

But that would be an illusion. If we bring in our separate components of science (economically useful science, individually useful science, and intellectually enjoyable science), I’d claim that, actually, the latter two are more related to poetry and music than they are to statistics.

Because while poetry and music are certainly more emotionally evocative than science, they share a most important experiential character with, for example, intellectually enjoyable science (the science of the “idea” and of the awe and mystery that Feynman spoke of): they’re all holistic and conceptual.

To put it another way: science, like statistics, involves breaking things down, but intellectually enjoyable science involves also putting the things back together. Finding how the things all relate to form a cohesive, meaningful picture.

Statistics/metrics, on the other hand, are one-dimensional. They’re broken down, but they’re not meant to come back together, to form a whole. (Or, to be more precise, they can be combined, but only into another single number, a scalar, or into an unstructured collection.) There are no mechanistic relationships between them because they’re just measurements. Not to disparage measurements, of course! Science absolutely depends on measurements! On experimentation and validation. But what I’m speaking of is when statistics transcend their status as measurements or validators and become the end itself, the goal that we seek.

This, sadly, happens often. Especially when it comes to economic entities. For example, corporations may aim for a certain growth percentage (e.g. of customers or profit), or investors may aim for a certain ROI. Even for corporations that deal with artistic mediums, such as film studios, metrics tend to be the goal. 

In an ideal world, this could merely be a useful abstraction. The businessmen and women at the top of an organization could deal with numbers and logistics, while still empowering the inventors or artists below to “do their thing.” But in practice, it doesn’t work this way. The number-fixation trickles down. And the reason, I believe, is that such a system has a fundamental requirement: in order for the top level to guarantee that it can achieve its goal statistic, it needs to be able to convert each component of the lower layer into a statistic as well. Ditto for the layer below, and the layer below that. So in any organizational structure, be it a single corporation, or an entire industry, or an entire economy, if the top-level goal is a metric, then at some point going down the hierarchy of organization, there needs to be a translation from numbers to people.

And such a thing is impossible. You can attempt, futilely, to approximate people and their experiences/concepts with numbers, but you will always lose the holistic human element. That goal statistic at the top will always create an unbridgeable alienation. And, if we look at the history of modern statistics, we see that, actually, alienation is precisely why it was created.


Eugenics, Capitalism, and The Birth of Modern Statistics

Statistics has always had a dirty little secret (one you certainly won’t find in any high school statistics course). The secret is that the “father of modern statistics,” R.A. Fisher, and many other forefathers of statistics, such as Pearson and Galton, were eugenicists and racists. And not just of the “casual,” societal-norm kind. Actively so.

Now, I admit, this fact alone doesn’t prove anything about statistics. I mean, as a pure mathematical construct, it is ethically agnostic. Also, it’s easy to imagine that, for example, eugenics and racism were more normalized at the time (late 19th century – early/mid 20th century). And it’s easy too to imagine that Fisher may have just been a kind of “twisted genius” (that the mathematical part of his brain was on full throttle, but he didn’t have a balanced connection to humanity). 

But the relationship between statistics and racism runs deeper. Even the eminent statistician Andrew Gelman has recently noted this relationship. In a blog post commenting on the article “RA Fisher and the science of hatred” by Richard J. Evans, he says this:

Fisher’s personal racist attitudes do not make him special, but in retrospect I think it was a mistake for us in statistics for many years to not fully recognize the centrality of racism to statistics, not just a side hustle but the main gig.

It’s clear from Evans’ analysis that, at least for Fisher, eugenics was not merely a “side interest” of his, but crucial to his whole motivation in life. Although we can’t be sure how he came to such a worldview, it seems quite possible that his work in statistics and genetics was, to a large extent, motivated by a desire to give rational support to his racist beliefs (this assumption coming from the fact that statistics and genetics can be used quite successfully to support such beliefs; of course, in a deluded way that misses the human picture).

But speaking of statistics as a whole, its relationship with racism is a more confusing question. Gelman, for example, has a sense of the depth of the relationship. We can see, for example, that statistics can be used to support racist views, to speak of averages of different populations and the like. But what exactly is the relationship?

Viewing this through an economic lens, I believe, makes the relationship clearer.

Consider first the use-cases of statistics/metrics (as goals, rather than measurements):

  1. Processing large quantities of people or things
  2. Organizing people around a common goal (e.g. in lieu of a holistic vision, a common metric is a way to organize people—even large, scattered groups of people—around a common goal)
  3. Removing human/qualitative concerns; reducing them to numerical concerns (e.g. this can be useful for hiding human concerns, because perhaps we don’t want to see them, or numericizing human concerns for the sake of perceived simplification)

Statistics can be used for any of these use-cases, but if you have a “problem” that benefits from all three of these use-cases, then statistics is the perfect tool! 

Though, if you only care about two out of three of these use-cases, there are other, more effective tools. For example, if you are running a government, you absolutely need to handle a large population of people and organize them around some common goals. But that doesn’t imply that we need to remove human concerns or convert them to numerical concerns. Earlier we mentioned how an organizational structure based on statistics requires the conversion of each lower layer into statistics as well. But that’s not the only way to organize and to delegate. For example, a government may delegate by geographic region (in a hierarchical fashion). This is great. It does “partition” people, but partitioning doesn’t inherently remove the human element. The human element is only removed when we convert subgroups of people to a mere collection of numbers or properties.

For R.A. Fisher, though, this third use-case, removing human concerns, was crucial. Because his eugenicist goals involved the dehumanization of a specific subpopulation of people. Similarly, our economic system often has a drive for this third use-case as well, as Marx so clearly illustrated.

But, as we said, metrics are not the only way to delegate and organize. And in the case of novel production or innovation (e.g. science, technology, art), as opposed to commodity production, such a forced metric-fixation is counterproductive. Because it disincentivizes innovation in the holistic human realm. As we can see now, for example, we’re quite effective at “innovating” in the realm of financialization (developing new ways of turning big numbers into even bigger numbers), but it’s questionable how much meaningful human innovation we’re doing.

It is worth asking the question, though: is a tool bound by its original purpose? Modern statistics, which revolves around testing the “significance” of numbers, may have been designed explicitly for alienation. But can it be repurposed for nobler causes? 

Well, in some cases, yes. I mean, a chainsaw can be used for ice sculpting in addition to cutting trees. But a tool that is tailored for a specific purpose will naturally lend itself to similar purposes.

So, in the case of statistics, what are these similar purposes? And, more importantly, how did statistics grow from satisfying a relatively small niche of eugenicists to dominating nearly all realms of science and technology as it does today?


Computers and the Early Rise of Statistics

Computers as a tool were not developed for statistics. Or for alienation. As far as I can tell, there’s nothing inherently alienating about a computer. And I, for one, love programming and believe computing as a medium has unlimited potential for creativity, empowerment, and fun. (For example, the book Masters of Doom shows a brilliant example of how people can combine computational mastery, creativity/art, and courage to produce something wholly new and exciting.)

However, computers, being at their core number-crunchers, lend themselves well to the processing of statistics. And thus, economic entities can easily wield computers in order to wield statistics. And even, perhaps, convince the public that what they’re doing is “science.”

To better understand the role of computers in the rise of statistics (and to understand exactly why I’m putting quotation marks around “science”), let’s consider an example: a statistical model that is well-cited within academia, the Five Factor Model of personality (also known as the Big Five or OCEAN).

Now, the FFM is interesting, so I don’t want to diss it too hard. But it’s important to note the properties of a statistical model like this, contrasted to a more conceptual or mechanistic theory of personality.

First, let’s discuss the history of the FFM. Its development began in 1884 with the “Lexical Hypothesis” of Francis Galton (one of the forefathers of statistics we mentioned earlier). This hypothesis states that any important traits of personality should become encoded in the words (e.g. adjectives) of a language. For example: quiet, friendly, gregarious, visionary, creative, resourceful, etc. This hypothesis was a crucial breakthrough for what would become the FFM because it allows us to perform an interesting kind of alchemy: converting people into a set of numbers (since rating a person against a fixed set of words is equivalent to assigning them a set of numbers).

People began using the lexical hypothesis to develop personality questionnaires throughout the early 1900s. But a big breakthrough came in the 1950s and 1960s with the convergence of computer technology and statistical methods. Specifically, Raymond Cattell, building on the factor analysis pioneered by Charles Spearman, developed new factor-analytic methods (a statistical technique) and applied them to personality questionnaire data. Cattell originally got 16 factors of personality. But later, with the work of Costa, McCrae, and others throughout the 70s, 80s, and 90s, the methodology was refined and 5 factors were found to be more stable and comprehensible.
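
To make that “alchemy” concrete, here is a minimal sketch of the FFM-style pipeline in Python (using scikit-learn, with purely synthetic ratings, so the resulting factors mean nothing): each person becomes a row of adjective ratings, and factor analysis then compresses those rows into a handful of numbers.

    # Toy sketch of the FFM-style pipeline; the data is synthetic and the
    # adjectives are invented for illustration. Assumes NumPy and scikit-learn.
    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    rng = np.random.default_rng(0)

    # Step 1: the "alchemy" -- each person becomes a row of numbers,
    # one rating (1-5) per adjective drawn from the lexical hypothesis.
    adjectives = ["quiet", "friendly", "gregarious", "organized",
                  "curious", "anxious", "resourceful", "creative"]
    n_people = 500
    ratings = rng.integers(1, 6, size=(n_people, len(adjectives))).astype(float)

    # Step 2: factor analysis -- find a small number of latent factors
    # that most efficiently summarize the population of people-numbers.
    fa = FactorAnalysis(n_components=5, random_state=0)
    scores = fa.fit_transform(ratings)   # each person reduced to 5 numbers
    loadings = fa.components_            # how each factor weights each adjective

    print(scores.shape)    # (500, 5): five numbers per person
    print(loadings.shape)  # (5, 8): one row of adjective weights per factor

The point of the sketch is only the shape of the operation: real FFM work used real questionnaire responses and decades of methodological refinement, but the output is the same kind of object, a set of numbers per person, not a mechanism.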

So what does this model give us as people and as a society? Well, it gives us a way to convert people to a descriptive set of five numbers and traits, with these five traits being those that most efficiently summarize the full population of people (or people-numbers). Knowing what the five traits are is interesting. But the assumption is that such traits are static, unchangeable, so it’s not very actionable from an individual perspective (except perhaps to “know your place” relative to others). From an economic standpoint, though, this population summarization could still be useful, for example, in better allocating labor. 

Although the theory pertains to personality, its key breakthroughs were in (1) data generation (the alchemy of converting people to numbers) and (2) data analysis. In other words, the breakthroughs were not in the study of people (psychology), as neither involved actually probing into or investigating people in any kind of deep way.

Now, again, don’t get me wrong. This doesn’t mean that the breakthroughs are not interesting or not actual breakthroughs. Data and statistics are in themselves interesting and are themselves fields of inquiry. But importantly, these are breakthroughs in applied statistics, not psychology. And applied statistics is more a discipline of engineering than of science (as Chomsky has stated many times on this same topic).

Let’s contrast the FFM with a more psychological theory of personality, Carl Jung’s theory of “cognitive functions.” The watered-down and bastardized version of this theory, the MBTI, gets a lot of flak (for good reason, since it strips the mechanics from a mechanistic model to make it more marketable; though, statistically speaking, the MBTI correlates closely with the FFM). But the original theory, from Jung’s massive tome, Psychological Types, is much more interesting and conceptual. Jung’s theory came from the deep study of people in his personal practice of psychoanalysis, as well as from his attempts to explain the differences of mentality among the leaders of the new psychoanalytic movement (e.g. Adler and Freud). For Jung, it started simple, with the concepts of introversion and extraversion (now ubiquitous). But he slowly built up his concepts over time to explain more aspects of how people think (not just categorizing people but explaining their mechanisms of thinking, and observing that the same mechanisms are used interchangeably by different types of people).

Is Jung’s theory “true”? Well, can we even say that Newton’s theory of gravitation is “true”? I mean, given that the “accuracy” of his model was displaced by Einstein’s theory of general relativity? Are the particles of quantum mechanics “real”? What are they exactly?

In other words, it’s difficult to comment on the “truth” of Jung’s theory, but what I can say is that his concepts are, to a good extent, empowering and representative of meaningful aspects of the way people think. Also, there is a beautiful logic to the whole system. It’s largely analogous, in my opinion, to Newton’s theory of gravitation. Newton’s theory is not a statistical theory. He didn’t design it to optimize for predictive accuracy. But his holistic concept of gravity remains as representative and explanatory as ever. And, for example, what Einstein’s theory added to Newton’s picture was not so much better predictive accuracy as it was the wholly new concept of spacetime.

Some people claim that non-statistical theories like the ones we discussed are like astrology. But this is a funny analogy considering that astrology is absolutely not a conceptual or mechanistic theory. Astrology claims that things just are a certain way. Why are they that way? I don’t know. They just are. And in this sense of being non-mechanistic, astrology is actually much more similar to statistical models.

I’ve digressed, but coming back to our main points: a good theory should come with predictive accuracy, but predictive accuracy should not be our goal (and predictive accuracy in lieu of a holistic concept is no theory at all, even though it may still have economic usefulness). Also, suffice it to say that even in the early days of computers, computers were already lending themselves to the spread of statistical modeling.


Big Data and Meta-Statistics

Today, we are well beyond the early days of computers. It hasn’t been so many years in the grand scheme of history, but already computers are ubiquitous and have drastically altered the way we live. Among the many effects of this societal shift, an important one is—you guessed it—more widespread statistical modeling, and specifically, more usage of statistics/metrics as goals. The primary things we seek.

But why exactly has the ubiquity of computers and smartphones led to greater usage of statistics? I can’t speak to the full picture of what has happened, but one crucially important factor is the rise of Big Data, or more precisely, Big User Data.

Over time, we’ve made numerous breakthroughs in the speed and scale of computation, as well as in data storage. And now we have the ability to save and process great swaths of information on how people behave. Furthermore, with the invention of Deep Learning, we have developed powerful methods of what we could call meta-statistics, the ability to take any dataset and create a generic pattern matcher or pattern generator for that data. 
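
To illustrate what I mean by a generic pattern matcher, here’s a minimal sketch in Python (scikit-learn, with data invented for the example): the same few lines will fit essentially any table of inputs and outputs, with no model of the domain behind them.

    # Toy illustration of a "generic pattern matcher": the same code fits
    # whatever (X, y) table it is handed, with no domain model behind it.
    # Assumes NumPy and scikit-learn; the data here is invented.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(1)

    X = rng.normal(size=(1000, 10))   # any inputs: clicks, pixels, survey answers...
    y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=1000)   # any outputs to mimic

    # Fit a small neural network to reproduce the pattern, whatever it is.
    model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
    model.fit(X, y)

    print(model.predict(X[:3]))   # reproduce the pattern on demand

Deep learning scales this same move up by many orders of magnitude (and to text, images, and behavior logs), but the character of the move is unchanged: it reproduces the statistics of the data it is shown rather than explaining them.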

So what do these developments mean? Well, there are many ways that these technologies can be combined for profitable economic purposes. For example, being able to statistically mimic human behavior means that we can automate certain human tasks or functions. (Not to say that the model does the task as well as people. Likely not, since it’s mimicry. But if a statistical model can approximate a task well enough, cheaply enough, then it can be used for automation in many cases.) Also, being able to model people’s behavior means being able, to some extent, to direct consumer behavior. A very lucrative skill.

Not to say that all of our new applications of statistical modeling are manipulative. Certainly not! For example, I love our new voice assistants, which, as far as I can tell, were designed to assist more than to automate or control (also, voice assistants don’t use statistics at the top level, only to perform well-defined subtasks, such as converting speech to text and text to speech). Sadly, though, these truly assistive use-cases appear to be less profitable overall in our present economy.

Coming back to the rise of statistics, these profitable use-cases have encouraged corporations to invest in new and better ways of statistical modeling (especially deep learning). And to invest in their development and application across all fields of science (even in many places where they perhaps don’t really belong).


Conclusion

I hope it’s clear by now: there is an inherent alienating effect of statistics/metrics. It was part of their design and it continues to lend itself to similar use-cases (many of which could be called “imperialistic”). And this is bad for most everyone.

For scientists and technologists, though, there is an additional danger. A more personal danger. When our Keating spirit is destroyed and only Mr. Nolan remains, our very sense of mystery can disappear. I myself have felt this at times, wondering things like have we already discovered most of what there is to discover? Will future generations have anything important left to discover? And I’ve seen others express similar thoughts. 

For example, we may speak of science as a fractal. Or we may speak of low-hanging fruit and high-hanging fruit, the metaphor of fruit on a tree. These metaphors, however, which are related to finiteness and territory, come from our identification as economic agents, from thinking of our own value or success as coming from our contribution to the wider economic system. Capital is finite and territorial, so capitalist science is finite and territorial too.

But speaking of humanist science, a much more natural metaphor is the one Newton gave centuries ago, towards the end of his life:

I do not know what I may appear to the world, but to myself I seem to have been only like a boy playing on the sea-shore, and diverting myself in now and then finding a smoother pebble or a prettier shell than ordinary, whilst the great ocean of truth lay all undiscovered before me.

We’ve found many more smooth pebbles and pretty shells since Newton’s time, but the ocean remains. And that’s because unlike capitalist science, real human science, the science that we stay alive for, is unbounded.
