The Death of Mystery is an Illusion

There is an existential worry I’ve had. I don’t think of it often, but it creeps up from time to time and never fully goes away. It appears in many different forms, such as the following:

  • Have we already discovered the fundamentals of science? Is there much less left to discover than has already been discovered?
  • Given that we’ve already globalized the world, is our sense of awe or mystery about the world permanently gone or diminished?
  • In the near future, will scientists or technologists even be able to make new breakthroughs? Will everything worth discovering already be discovered? Will they be left, at best, to merely explore highly specialized niches?

I call this existential worry the death of mystery. In general, I don’t worry much about the first two bullets above, but I tend to feel the third bullet more: even if mystery is not dead yet, it often feels to me that it is dying.

But today, I had an epiphany that clarified this fear of mine and gave me strong reason to believe that the fear is much more of an illusion than I originally thought.

I got this epiphany while reading The Myth of Artificial Intelligence by Erik J. Larson. To give some context, the thesis of the book is that the vision of AI as an imminent superintelligence (portrayed, for example, by Ray Kurzweil, Elon Musk, and Nick Bostrom) is in fact a myth, and one that is often pushed with ulterior motives, such as growing the AI bubble or otherwise profiting from fear-mongering. In reality, Larson claims, how to create a superintelligence is a complete scientific unknown. Our current approaches to AI, while useful computational methods, in no way indicate that we are heading towards anything resembling a so-called superintelligence. Based on my experience in the field, I must say that I completely agree with Larson.

But let’s return to the subject of this post before we stray too far… This is the passage that triggered my epiphany:

Mythology about AI is bad, then, because it covers up a scientific mystery in endless talk of ongoing progress. The myth props up belief in inevitable success, but genuine respect for science should bring us back to the drawing board. This brings us to the second subject of these pages: the cultural consequences of the myth. Pursuing the myth is not a good way to follow “the smart money,” or even a neutral stance. It is bad for science, and it is bad for us. Why? One reason is that we are unlikely to get innovation if we ignore a core mystery rather than face up to it. A healthy culture for innovation emphasizes exploring unknowns, not hyping extensions of existing methods—especially when these methods have been shown to be inadequate to take us much further. Mythology about inevitable success in AI tends to extinguish the very culture of innovation necessary for real progress—with or without human-level AI. The myth also encourages resignation to the creep of machine-land, where genuine invention is sidelined in favor of futuristic talk advocating current approaches, often from entrenched interests.

My god, what an insightful passage… We can see that this myth about AI is a specific case of the death of mystery. And in this case, it’s not a real death at all! It’s a hoax. A faked death. Entrenched interests are pushing this hoax because they benefit from it. And it’s easy to see how they could benefit: for one, positive speculation about technology means higher valuations of technology companies.

Similarly, the general death of mystery is not a real death either. It’s often a media fabrication, or simply the result of the current cultural attitude, which is constantly changing. I wonder if people in past generations—even hundreds of years ago—ever felt that mystery was dead; I suspect some did.

The simple fact is that if you fixate on what has already been discovered and consider it the end-all-be-all, you will feel that mystery is dead, and if you fixate on the obvious mysteries in front of your face, you will feel that mystery is everywhere. It’s as simple as that. So let’s stop reading or listening to “thought leaders,” and let’s focus on those obvious mysteries in front of our faces.

Is Human Behavior Just Elaborate Running and Tumbling?

1. Introduction

You know that scene in the movie, A Beautiful Mind, where John Nash is studying pigeons “hoping to extract an algorithm to define their movement”? Well, it appears we’ve found that algorithm, and it defines the movement of humans too. Furthermore, it provides us with a new lens for understanding psychological phenomena.

In getting to this algorithm, one key insight is to realize that the dopamine circuit—which simultaneously controls movement, motivation, and encoding reward prediction errors in humans and other vertebrates—serves the same role as the chemotaxis circuit in bacteria, and is algorithmically equivalent as well. 

This insight took me only as far as googling “dopamine chemotaxis.” Upon googling, though, I found that Karin and Alon had taken the insight much further, building a formal model and demonstrating its experimental accuracy (Karin & Alon, 2021). They began with the following puzzle, which we can pose again here: why is it that the dopamine circuit simultaneously controls movement, motivation, and reward error encoding? In other words, what do all of those things have in common?

The answer is that the dopamine circuit is playing a navigational role akin to chemotaxis (chemo meaning chemical + taxis meaning movement). Just as bacteria use a scale-invariant run-and-tumble algorithm to navigate towards chemical attractants (rewards), humans use a similar scale-invariant run-and-tumble algorithm to navigate towards more general rewards. The difference is primarily in how expected rewards are determined. While bacteria determine their expected rewards based on a passive sensing of the environment, humans determine theirs based on a complex, predictive model of the world. And, of course, we have different definitions of what a reward is.

Perhaps it’s not obvious that such a realization will lead to psychological insights, but as we’ll see, it does in fact provide the foundation for a rigorous and unifying understanding of many psychological phenomena. We’ll need to establish a few things before we get there, though, so first, let’s take a look at bacteria.

2. Bacterial Chemotaxis

Bacteria use a process known as chemotaxis to move towards chemical attractants, such as food, and away from chemical repellents, such as toxins. It’s an incredibly well-studied process. As you can imagine, back in the 1600s when cells were first observed under a microscope (by Leeuwenhoek and others), one of the first questions they probably asked was “how are these cells moving?” The modern biochemical study of chemotaxis was pioneered in the 1960s (Adler, 1966). The modern mathematical and algorithmic study of chemotaxis was pioneered in the 1990s (Barkai & Leibler, 1997; Alon et al, 1999). And today it’s understood down to the minute biochemical and algorithmic details.

The process (or algorithm) consists of a biased random walk of “runs” and “tumbles.” During a run, the bacterium coordinates its flagella to swim in a straight line. During a tumble, one flagellum kicks out, causing the bacterium to spin and start swimming in a different direction. The process is random (a random walk), but it’s nevertheless effective in finding attractants due to being biased towards attractants and away from repellents.

The way the biasing works is the following: the bacterium keeps track of how the chemical level changes as it runs. If it is climbing up the gradient, it will tend to keep running in that direction and not tumble, since it “smells” that it’s getting closer to the attractant. The tendency of running vs tumbling is controlled by a single parameter, the tumbling frequency ϕ (tumbles/sec). If the bacterium is climbing up the chemical gradient, tumbling frequency goes down. If it is going down the chemical gradient, tumbling frequency goes up. It’s quite simple. But despite its simplicity, this run-and-tumble algorithm guarantees that, on average, the bacterium will head in the right direction and obtain its rewards.

Figure 1: Runs and tumbles
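The biasing rule can be made concrete with a small simulation. The following is a minimal sketch, not the formal model from the appendix: the attractant field, the parameter values, and the exponential form linking the sensed change to the tumbling frequency are all assumptions chosen for illustration.

```python
import math
import random


def attractant(x, y):
    # Hypothetical attractant field that rises exponentially along +x.
    return math.exp(x / 20.0)


def run_and_tumble(field, steps=2000, dt=0.1, speed=1.0,
                   base_rate=1.0, gain=10.0, seed=0):
    """Simulate one 2D run-and-tumble walker and return its final (x, y)."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    heading = rng.uniform(0.0, 2.0 * math.pi)
    previous = field(x, y)
    for _ in range(steps):
        # Run: swim straight along the current heading.
        x += speed * dt * math.cos(heading)
        y += speed * dt * math.sin(heading)
        current = field(x, y)
        # Sensed signal: relative change in attractant per unit time (assumed form).
        signal = (current - previous) / (previous * dt)
        previous = current
        # Climbing (signal > 0) lowers the tumbling frequency; descending raises it.
        tumble_rate = base_rate * math.exp(-gain * signal)
        # Tumble: with probability tumble_rate * dt, pick a uniformly random heading.
        if rng.random() < min(1.0, tumble_rate * dt):
            heading = rng.uniform(0.0, 2.0 * math.pi)
    return x, y


def mean_final_x(gain, walkers=20):
    """Average final x over several independently seeded walkers."""
    finals = [run_and_tumble(attractant, gain=gain, seed=s)[0]
              for s in range(walkers)]
    return sum(finals) / walkers
```

With these assumed settings, `mean_final_x(gain=10.0)` should come out well above `mean_final_x(gain=0.0)`: a gain of zero removes the bias and leaves an ordinary random walk, while a positive gain makes the walkers drift up the gradient even though every individual tumble is uniformly random.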

There is one other crucial aspect of this algorithm, which is how tumbling frequency changes as a function of the change in the chemical attractant level. If the attractant level goes up, tumbling frequency goes down; however, if the attractant level ceases to continue going up, tumbling frequency will return to its steady state value. In other words, it’s adaptive.

Specifically, it is adaptive in a way that is known as scale invariance, which means, for example, that the attractant level going from 1 to 2 causes the same effect as going from 2 to 4 (or 1000 to 2000). Another way of describing scale invariance is that changes are always relative: the absolute values don’t matter; only the relative difference compared to the baseline matters. Furthermore, this scale invariance has the property of exact adaptation, which means that the tumbling frequency will return to its steady state when the attractant level stops changing. This exact adaptation is evolutionarily crucial: if the bacterium’s response were not exactly adaptive, it would not be robust to changes in the environment, and the bacterium likely would not be able to survive and reproduce long-term (we’ll see shortly how animal reward-taxis has this same property of exact adaptation).

The scale invariance is implemented in bacteria with a simple feedback loop: the protein that increases the rate of tumbling also decreases its own rate of production, so it’s a self-regulating system that is designed to always return to its steady state.

Figure 2: Exact adaptation
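Both properties can be seen in a toy adaptive sensor. This is a sketch under stated assumptions rather than the biochemical circuit: the internal memory `m`, the logarithmic sensing, and the values of `tau`, `gain`, and `phi_st` are hypothetical choices that happen to produce scale invariance and exact adaptation.

```python
import math


def tumbling_response(levels, dt=0.01, tau=1.0, gain=2.0, phi_st=1.0):
    """Return the tumbling frequency at each time step for a series of attractant levels.

    Toy model (an assumption, not the published chemotaxis model): an internal
    memory m slowly tracks log(level), and the tumbling frequency responds only
    to the mismatch between log(level) and m.
    """
    m = math.log(levels[0])  # start fully adapted to the initial level
    phis = []
    for level in levels:
        mismatch = math.log(level) - m
        # A rising attractant level (positive mismatch) suppresses tumbling.
        phis.append(phi_st * math.exp(-gain * mismatch))
        # Feedback: the memory relaxes toward the input, erasing the mismatch.
        m += dt * mismatch / tau
    return phis


def step_input(low, high, n_before=100, n_after=2000):
    """Attractant level that sits at `low`, then jumps to `high` and stays there."""
    return [low] * n_before + [high] * n_after
```

In this sketch, a step from 1 to 2 and a step from 1000 to 2000 produce the same response curve (scale invariance), and in both cases the tumbling frequency drifts back to `phi_st` once the level stops changing (exact adaptation).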

The formal chemotaxis model, as well as pseudo-code, can be found in the appendix. But for now, our intuitive understanding should suffice to begin to understand the “animal algorithm,” which we’ll see is strikingly similar.

3. Animal Reward-taxis

Animal movement has a superficial similarity with bacterial movement in the sense that, well, animals also run and change directions. However, Karin and Alon (K&A) demonstrated that this relationship is not merely superficial; animals are using essentially the same algorithm.

K&A called this process in animals reward-taxis (reward + taxis meaning movement), since animals have a broader definition of reward than simply chemical attractants/repellents (e.g. food, play, shelter, mates, etc.). If animals are in an open environment—such as mice running in an open field or zebrafish swimming in open water—their movement looks like a series of runs-and-tumbles up the expected reward gradient, and it also exhibits the property of scale invariance. 

Specifically, the K&A model translates the bacterial chemotaxis model to the animal reward-taxis model via the following changes:

  1. The 2D (or 3D) map of chemical levels c is translated to a 2D (or 3D) map of expected rewards R

  2. Tumbling frequency ϕ (tumbles/sec) is translated to average run duration ϕ (seconds); note that ϕ now denotes a duration, so a larger ϕ means fewer tumbles

  3. Change in the chemical level Δc is translated to change in expected reward ΔR

  4. When a “tumble” occurs, rather than sampling a new direction from a uniformly random distribution, we sample according to the distribution of expected rewards

In the same way that chemotaxis is scale invariant with respect to chemical attractant levels, reward-taxis is scale invariant with respect to expected reward. 
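Change 4 in the list above is the main structural difference from chemotaxis, so it is worth sketching. The snippet below assumes a discrete set of directions and a probability proportional to expected reward; both are illustrative assumptions, since the text only says that sampling follows the distribution of expected rewards. The direction labels are hypothetical.

```python
import random


def sample_tumble_direction(expected_rewards, rng):
    """Pick a new direction with probability proportional to its expected reward.

    expected_rewards: dict mapping a direction label to a positive expected reward.
    Equal rewards for every direction recover the uniform tumble of chemotaxis.
    """
    directions = list(expected_rewards)
    weights = [expected_rewards[d] for d in directions]
    return rng.choices(directions, weights=weights, k=1)[0]


def direction_frequencies(expected_rewards, samples=10000, seed=0):
    """Estimate how often each direction is chosen over many tumbles."""
    rng = random.Random(seed)
    counts = {d: 0 for d in expected_rewards}
    for _ in range(samples):
        counts[sample_tumble_direction(expected_rewards, rng)] += 1
    return {d: c / samples for d, c in counts.items()}
```

Because only the ratios between the weights matter, multiplying every expected reward by the same constant leaves the sampling distribution unchanged, which fits the scale invariance described above.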

What exactly are “expected rewards”? Well, as we mentioned, all animals have a notion of rewards. And these rewards correspond to specific, numeric values, which can be measured via dopamine responses in the brain. These numeric values, however, are always relative (hence the scale invariance property), and in fact, dopamine response roughly corresponds to the temporal derivative of expected reward (recently, this was shown by precisely manipulating mice’s reward expectations via virtual reality (Kim et al, 2020)). The “expected” part of “expected rewards” means that animals are always making a prediction about future rewards. 

Actually, chemotaxis itself is a special case of reward-taxis. In bacteria’s case, expectations are based simply on sensing the environment and keeping track of the change in chemical levels (and when they tumble, they lose all sense of expectation, which is why their tumbles are uniformly random).

Figure 3: Animal runs and tumbles (adapted from Karin & Alon, 2021)

The experimental evidence for the K&A model is that it explains the generalized matching law of operant behavior (Baum, 1974; Herrnstein, 1961). Furthermore, it explains many physiological studies on the relative nature of dopamine, and provides an elegant, mechanistic explanation of dopamine’s threefold functionality (movement, motivation, and reward error encoding). 

Thus, to summarize, K&A demonstrated that animals, such as mice and zebrafish, navigate up expected reward landscapes using a reward-taxis algorithm akin to bacteria’s chemotaxis. Which leads us, naturally, to the question of humans.

4. Human Reward-taxis

Humans are animals, and humans do reward-taxis just like animals. However, the runs-and-tumbles of people are not as visibly obvious as the runs-and-tumbles of animals in an open environment.

4a. Hierarchical Runs-and-Tumbles

In the case of human reward-taxis, it’s possible to think of a run as striving towards a goal rather than physically running in a straight line. This is a stretch beyond what K&A showed, but we can make the equivalence between movement and striving towards a goal more clear by considering an example: pursuing a career in X. Such a goal is likely a big, long-term goal; however, progress towards that goal actually consists of a hierarchy of progressively smaller runs-and-tumbles:

  1. In order to pursue your career in X, you need to get a job doing X. You may “run” towards a particular job A, only to be rejected and forced to “run” to a different job B.

  2. Furthermore, in order to get job A or job B, you need to go to the interview (let’s say you have to drive there). Hopefully, you’d be able to follow one driving route to get there, but on the off chance that a road is blocked or you hit heavy traffic, you may need to take a different route.

  3. Furthermore, to get the car to drive to your interview site, you may try to use the car you own. Likely it will work, but on the off chance that your car won’t start, you may need to call a taxi. Etcetera.

Figure 4: Hierarchical runs-and-tumbles

At the base level of these hierarchies, striving does always involve some spatial movement, even if it is merely internal. Thus, we can begin to see how, apart from a different perception of expected reward, humans’ runs-and-tumbles are analogous to the runs-and-tumbles of animals in an open environment.

4b. Parallel Runs and Partial Tumbles

Another peculiarity of human reward-taxis (one likely shared with other animals) is that we perform multiple runs-and-tumbles in parallel and, to some extent, simultaneously. For example, we can walk to the grocery store to pick up food for dinner (pursuing the reward of food, and possibly other rewards if the dinner includes other people), while simultaneously imagining solutions to a scientific problem (pursuing career or intellectual rewards). Furthermore, we can do partial tumbles. For example, we can switch jobs while maintaining the same living situation, or switch living situations while maintaining the same job. We can also do more drastic tumbles, changing our direction in several dimensions at once.

4c. The Feeling of Exact Adaptation

Furthermore, if we consider the feeling of reward expectations to correspond to desire, then the idea of exact adaptation of expected rewards (a property of the scale invariance described earlier) should sound familiar: just when we feel like we’ve gotten what we want, we want more. An extreme example of human exact adaptation was popularized by Dan Gilbert, who described a study that tracked lottery winners as well as paraplegics. Naturally, the lottery winners were ecstatic at first, while the paraplegics were distraught. But only a year later, their subjective well-being levels were reported to be roughly equal (Gilbert, 2009). The psychology literature is littered with other examples of this phenomenon (e.g. Furnham & Argyle, 1998; Easterlin, 1968). In other words, we, like all animals, have a growth imperative, a need to continually climb the expected reward gradient.

4d. The Feeling of Run-and-Tumble

To give more intuition for why we are in reality using a run-and-tumble algorithm, consider that in life when we’re “running” in a consistent direction (for example, in a job, in a relationship, or in solving a problem), so long as we’re seeing small improvements, we’ll tend to continue in the same direction, even if that direction is not optimal. It’s only when our growth slows or halts that we “tumble” and change directions. In other words, our run duration ϕ does appear to be a function of how our reward expectations change over time, ΔR. And this is not merely intuition; research backs up this notion. For example, the number one reason people switch jobs is that they are concerned about the lack of opportunities for career advancement (LinkedIn, 2015). It’s all about growth.

4e. What Space Are We Navigating?

So far, we’ve been arguing that we humans do a form of reward-taxis navigation. However, it’s clear that we are not simply navigating the 3D spatial world, moving up and down, left and right, like bacteria or mice in an open field. The question is, then: what space are we navigating?

The answer, to put it simply, is that we’re navigating the space of all our potential actions. These actions include spatial movement. They also include things such as imagination or logical deduction (mental processes), as well as language communication (which involves some amount of physical movement and mental processing). 

There is a hierarchical and chain-reaction nature to actions. At the base level, every action is a combination of physical movement and mental processing; however, these base actions can be organized into a complex set that becomes, for all practical purposes, a single action (for example, driving a car or calling a friend on the phone). To complicate the idea of what constitutes an action further, consider that navigating our expected reward landscape of all possible actions can lead us to new possible actions, and many rewards are, in fact, a means to unlocking more possible actions. In other words, the space we’re navigating is dynamic. Suffice it to say, though, that the space our reward-taxis is navigating is the action space.

4f. The Human Algorithm

Putting all this information together (and taking the complex notion of expected reward as a given for now), we can present a simplified, discrete-time view of the “human algorithm”:

Inputs: initial objective environment E, position within the environment x, set of N possible actions A = {a1, a2, …, aN}, subjective N-dimensional expected reward map M, scalar expected reward level R, change in expected reward ΔR = 0, smoothing weight w (between 0 and 1), run duration set to its steady state value ϕ = ϕst, run direction d

while alive do
  new_R ← maybe obtain reward and update expected reward level
  new_ΔR ← new_R - R
  R ← new_R
  ΔR ← (1-w)*new_ΔR + w*ΔR  (ΔR is a running mean of change in expected rewards)

  ϕ ← f(ΔR, ϕ, ϕst)  (update run duration ‘proportionally’ to ΔR)
  heads ← flip coin with probability p(ϕ) of being heads (p decreasing with ϕ)
  if heads then
    d ← sample new direction according to M (‘tumble’)
  actions ← take a step in direction d (i.e. do the actions corresponding to d)
  E ← update environment (based on actions)
  x ← update position (based on actions)
  A ← update possible actions (based on our new env, pos, and learnings)
  M ← update expected reward map (based on new env, pos, learnings, and actions)

There are, of course, a lot of “implementation details” here, including the fact that this algorithm is running in a highly distributed, decentralized fashion (humans are not simple; we are merely using a top-level algorithm that is, at its core, simple).
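To show that the loop above is at least executable, here is a heavily simplified sketch. Everything concrete in it is an assumption made for illustration: the grid world, the four moves standing in for the action set A, the fixed goal, the exponential expected reward, the log-ratio form of the change in expected reward, and the particular choices of f and p. The environment E and the action set A are held fixed, so only x, R, ΔR, ϕ, d, and M are updated.

```python
import math
import random

# Hypothetical action set: four moves on a 2D grid.
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
GOAL = (40, 0)  # hypothetical location of the reward


def distance_to_goal(pos):
    return abs(pos[0] - GOAL[0]) + abs(pos[1] - GOAL[1])


def expected_reward(pos):
    # Hypothetical expected reward level: decays with distance to the goal.
    return math.exp(-distance_to_goal(pos) / 5.0)


def reward_taxis(steps=600, phi_st=2.0, gain=5.0, w=0.5, seed=0):
    """Run the loop for a fixed number of steps and return the final position x."""
    rng = random.Random(seed)
    x = (0, 0)
    R = expected_reward(x)
    delta_R = 0.0
    d = rng.choice(list(MOVES))
    for _ in range(steps):
        new_R = expected_reward(x)
        # Relative (log) change, so the loop is scale invariant (assumed form).
        new_delta = math.log(new_R / R)
        R = new_R
        delta_R = (1 - w) * new_delta + w * delta_R  # running mean
        # f: run duration grows when expectations rise and returns to
        # phi_st when they stop changing (assumed form).
        phi = phi_st * math.exp(gain * delta_R)
        # Coin flip: a tumble is less likely when the run duration is long.
        if rng.random() < min(1.0, 1.0 / phi):
            # M: expected reward one step away in each direction.
            M = {name: expected_reward((x[0] + dx, x[1] + dy))
                 for name, (dx, dy) in MOVES.items()}
            d = rng.choices(list(M), weights=list(M.values()), k=1)[0]
        dx, dy = MOVES[d]
        x = (x[0] + dx, x[1] + dy)  # take a step in direction d
    return x


def mean_final_distance(runs=20):
    """Average distance to the goal after independently seeded runs."""
    return sum(distance_to_goal(reward_taxis(seed=s)) for s in range(runs)) / runs
```

The walker starts 40 steps from the goal. Under these assumptions it should end up considerably closer on average, without ever computing a route: it only lengthens runs that raise its expected reward and cuts short those that lower it.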

4g. Quick Clarifications

Before moving on, let’s clarify a couple of important questions:

1. K&A demonstrated their results for the midbrain dopamine circuit. What about the forebrain?

It is a leap to take results shown for the midbrain dopamine circuit in zebrafish and mice and extend them to humans, who have a distinctively large forebrain. However, given our understanding of the function of the forebrain, for example its role in prediction and reasoning (Bubic et al, 2010), it is reasonable to assume that its role is not to change the “outer loop” of the reward-taxis algorithm but merely to modulate the expected reward term inside. At the very least, if we take this assumption as an axiom for now, we’ll see that it will lead us to a useful framework.

2. What are the “nice” properties of reward-taxis that make it so universal? Are there any alternative algorithms we use or could use?

We did mention a “nice” property of scale invariance, namely, exact adaptation. However, we have not discussed the nice properties of run-and-tumble. One such nice property is that it is incredibly flexible. By changing the dynamic expected reward term, any complex behavior can be elicited. One other nice property is that it works well as a distributed, decentralized algorithm. This property can be observed in bacterial chemotaxis: each of a bacterium’s flagella is operated by an independent motor, which is connected to its own independent sensing of the environment. If one single flagellum “decides” to tumble, the whole bacterium will tumble. Yet when running, they are all able to work in concert.

This distributed, decentralized aspect could be beneficial for humans as well. Consider that we often detect something urgent from one of our senses, for example, feeling a burst of pain when touching a hot surface, or seeing an object flying at us. Having a distributed response system means that that one sense can directly boost dopamine levels, causing us to move and escape the danger. If our biological system were not distributed and each sense or body part had to communicate directly with every other, our response times would likely be much slower.

That being said, we could imagine alternative algorithms. For example, instead of “randomly” running-and-tumbling, what if we did a kind of breadth-first search, systematically exploring the environment? One possible reason we might not do this is the fact that exploration is a learning process: we often don’t have a good enough mental map of the environment to be able to systematically explore. However, it’s important to also note that run-and-tumble is not actually random in the case of reward-taxis: directions leading to a higher expected reward are more likely to be chosen. Furthermore, run-and-tumble is flexible enough to implement a systematic exploration (just imagine the rewards dynamically changing to create a pseudo-breadth-first search). Thus, there may not exist alternative algorithms that provide additional benefits.

4h. Expanding the Expected Reward Term

Now that we’ve established the outer loop of the reward-taxis algorithm, the question becomes, what is going on with the expected reward term? This map is where all our human “magic” is happening, where the value of our consciousness and predictive model of the world must lie. Reward-taxis itself doesn’t explain the details of the expected reward term; however, we can make some base assumptions that will be helpful in our discussion of psychological phenomena.

Assuming for simplicity that our actions are deterministic (in other words, if you take action $a$ in state $s_t$, then the next state $s_{t+1}(a, s_t)$ is fully determined), the numerical value of our expected rewards adheres to a Bellman equation like this (Bellman, 1954):

$$R(s_t) = r(s_t) + \gamma R(s_{t+1}) \tag{1}$$

In each state we get an immediate reward r (though r can be 0, i.e. no reward), followed by all our future rewards (which are discounted by a constant factor γ since immediate rewards are more useful than future rewards).
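Equation (1) can be evaluated by working backwards along a fixed sequence of states. The sketch below assumes a finite path with known immediate rewards and a given value for whatever comes after the last state; the function name and the example numbers are hypothetical.

```python
def expected_rewards(immediate, gamma=0.9, value_after=0.0):
    """Evaluate R(s_t) = r(s_t) + gamma * R(s_{t+1}) along one fixed path.

    immediate:   list of immediate rewards r(s_0), r(s_1), ..., in time order.
    value_after: expected reward of the state that follows the last one.
    Returns the list R(s_0), R(s_1), ... in the same order.
    """
    values = [0.0] * len(immediate)
    future = value_after
    # Work backwards: each state's value needs the value of the state after it.
    for t in range(len(immediate) - 1, -1, -1):
        future = immediate[t] + gamma * future
        values[t] = future
    return values
```

For example, with no reward for two steps and then a reward of 10, `expected_rewards([0.0, 0.0, 10.0], gamma=0.5)` gives `[2.5, 5.0, 10.0]`: the expected reward doubles with each step taken towards the payoff.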

We could break this equation down by viewing expected reward as a sum over all possible reward outcomes. However, a more useful way to look at it in order to gain psychological insights is to view it as a sum over different reward types. For example, money, accomplishment, love, food, friends, influence, sex, power, etc.:

$$R(s_t) = R_{\text{mon}}(s_t) + R_{\text{acc}}(s_t) + R_{\text{love}}(s_t) + R_{\text{food}}(s_t) + \dots \tag{2}$$

Furthermore, if we focus on “big” rewards (i.e. rewards with a large magnitude that we don’t get often—for example, working hard in order to get a large year-end bonus k time steps in the future), we can rewrite the equation in terms of paths to these big rewards:

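$$R(s_t) = \gamma^k r_{\text{bonus}}(s_{t+k}) + \gamma^{k+1} R(s_{t+k+1}) + \left(r(s_t) + \gamma r(s_{t+1}) + \dots + \gamma^{k-1} r(s_{t+k-1})\right) \tag{3}$$

Intuitively, we can think of $\gamma^k$ as a distance-term (a discount that shrinks as the reward you’re seeking gets further away) and $r(s_t) + \gamma r(s_{t+1}) + \dots + \gamma^{k-1} r(s_{t+k-1})$ as a path-term (these notions of distances and paths will help us to explain some psychological phenomena):

$$R(s_t) = \text{dist}(r_{\text{bonus}}, s_t)\,\left(r_{\text{bonus}}(s_{t+k}) + \gamma R(s_{t+k+1})\right) + \text{path}(r_{\text{bonus}}, s_t) \tag{4}$$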
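As a sanity check, the decomposition into a distance term and a path term can be compared numerically against the recursion in equation (1). This sketch makes one assumption explicit: the small rewards along the path are discounted step by step, which is what unrolling equation (1) for k steps gives. The function names and example values are hypothetical.

```python
def bellman_value(immediate, gamma, value_after):
    """R(s_t) from equation (1), computed backwards along a fixed path."""
    future = value_after
    for reward in reversed(immediate):
        future = reward + gamma * future
    return future


def decompose(immediate, gamma, value_after):
    """Split R(s_t) into a distance term, a path term, and their combination.

    immediate[:-1] are the small rewards on the way (k of them);
    immediate[-1] is the big reward, reached k steps in the future.
    value_after is the expected reward of the state after the big reward.
    """
    k = len(immediate) - 1
    dist = gamma ** k                                            # distance term
    path = sum(gamma ** i * immediate[i] for i in range(k))      # path term
    big = immediate[-1]
    total = dist * (big + gamma * value_after) + path
    return dist, path, total
```

With small rewards of 1 for three steps, a bonus of 100, a discount of 0.5, and an expected reward of 8 afterwards, both routes give 14.75: the bonus contributes 0.125 * (100 + 4) = 13 and the path contributes 1.75.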

5.  A New View of Psychological Phenomena

5a. New Tools

Now that we’ve established an understanding of reward-taxis, let’s see what new tools it gives us for an analysis of human nature and psychology:

  1. A visual and geometrical view of ourselves as an agent navigating an expected reward landscape

  2. A graphical and mathematical understanding of the scale invariant nature of desire and our “growth imperative”

  3. An algorithmic view which lends itself to algorithmic analysis (e.g. what are the extreme cases for the algorithm? What do those extreme cases correspond to in real life?)

  4. The idea that all action is optimizing for expected reward and everything else is essentially a “modifier” of expected reward

The rest of this paper will briefly touch on several aspects of psychology, making conjectures on how our new tools and perspective can provide new understandings of psychological phenomena.

5b. Quirks and “Irrational” Behaviors

In the reward-taxis view of human nature, all actions can be seen as rational given an expected reward landscape (though, this expected reward landscape differs from person to person). This can help us to understand the rationality behind behaviors that appear “irrational” economically or evolutionarily. For example, consider the following:

  1. Why do some “gym rats” build their muscles well past the point of being a way to attract potential mates?

  2. Why are some people willing to free solo climb icy mountains despite the danger?

  3. Why are some people willing to have fewer children in order to pursue career goals?

  4. Why are most billionaire tech CEOs interested in space travel and extraterrestrials?

There are no doubt potential evolutionary explanations for each of these phenomena. But, from a purely evolutionary standpoint, these phenomena are at least (a) not obviously explained or (b) seemingly irrational. 

In the reward-taxis framework, however, each of these can be explained simply as a byproduct of our growth imperative, our need to continually climb the expected reward gradient. In reward-taxis, desire for mates and reproduction can be seen as one of many reward types. Certainly, evolution has made it a large, important term in the reward equation (and certainly, reward-taxis itself is an evolved mechanism), but the desire for different reward types is subjective and subject to change. What is common to all, though, is our desire to keep growing in some subjective direction.

5c. Autonomy

Next, let’s discuss “autonomy.” In reward-taxis, there are many ways to understand autonomy. Reduced autonomy can be understood as an increase in the distance to many desired reward types (equation 4). Viewing ourselves as navigating agents, we can imagine restricted autonomy as an artificial warping of our reward landscapes that restricts our movement.

This also gives us a new view of the concept of intrinsic vs extrinsic motivation and “motivational crowding out” (Deci & Ryan, 2000). In the reward-taxis model, what is important is not a distinction between intrinsic vs extrinsic but (a) whether we’re able to continually sense that our expected rewards are growing, (b) whether we “tumble” and change directions, and (c) whether distances to our rewards are too high. 

Thus, we can look at some classic studies of motivational crowding out and interpret them differently than they’ve been historically interpreted. In one such study, kids who were intrinsically motivated to draw were given extrinsic rewards for drawing; when those rewards were subsequently taken away, the kids no longer appeared to be intrinsically motivated to draw (Lepper et al, 1973). In the reward-taxis view, when the additional extrinsic reward is taken away, the kids no longer feel that they’re climbing the reward gradient and are thus more likely to tumble to a different activity. On this reading, the effect has nothing to do with the reward being extrinsic; it arises merely because the kids’ relative growth was stifled.

5d. Depression

Reward-taxis does not immediately tell us what the source of depression is, but it does lend itself to conjecture. For example, there are at least two intuitive ways depression can be understood in reward-taxis. The first way is our exactly-adaptive desire system being “broken” in the sense that it is not exactly adapting as it normally does; specifically, it is stuck in a lower-than-usual steady state. The second way is being in a local optimum of the expected reward landscape, a state where whatever action you take, you see a decrease in expected reward. We could call this second way our reward expectations being “broken.” Both ways can lead to a state of constant tumbling (or slower movement if we consider K&A’s alternative model where dopamine regulates running speed).

How, though, could our desire system possibly get stuck in a lower-than-usual steady state? Isn’t it designed to be perfectly robust and adaptive?

Well, it’s important to note that, in reality, biological systems that exhibit exact adaptation are not exactly adaptive over an infinite range of inputs. If input levels are sufficiently high or sufficiently low, the systems may break (for example, in the case of chemotaxis, it’s adaptive over five orders of magnitude of inputs). Thus, dopamine levels becoming sufficiently high—for example, via chronic stress—could be a cause of problems.

There is a more realistic hypothesis, though. If we consider the fact that dopamine is a relative measure of expected rewards, we could hypothesize that in depression we’re stuck constantly comparing our current expectations to some alternative state, for example, constantly ruminating over a past state, or constantly dreaming of an alternative state that we’re restricted from.

5e. Personality

Interestingly, even bacteria have different personalities. And even genetically identical bacteria have different personalities. Specifically, some bacteria tumble much more often than others (and some run much longer on average than others). This is due to the fact that although tumbling frequency always returns to its steady state, different bacteria have different steady state values (Spudich & Koshland, 1976). This can be seen as analogous to introversion vs extraversion in humans, which is driven by none other than dopamine (our equivalent to the tumbling frequency protein in bacteria) (DeYoung, 2013).

Beyond introversion and extraversion, we can see how other, more complex personalities could emerge from reward-taxis dynamics. Specifically, looking at the distance term in expected reward from equation (4), $\text{dist}(r, s_t)$, we could say that the distance to a particular reward always depends on some “skill.” Also, using a particular skill to seek a reward will develop the skill further, reducing the distance to other rewards that rely on that skill. Thus, it creates a feedback loop of relying on and developing a particular skill more and more. A simple case of this dynamic can be seen as the cause of right-handedness vs left-handedness, and in fact, it’s also the mechanism Jung proposed as the driving force behind personality differences (Jung, 1921). In that case, the “skills” correspond to discrete, specialized sets of modules in the brain, and preference for certain skills over others manifests as personality differences.

5f. Motivation and Needs

Many psychological theories, for example those of McClelland, Deci & Ryan, and Maslow, among others (McClelland, 1987; Ryan & Deci, 2000; Maslow, 1943), describe human motivation via a set of categories describing our needs, drives, and/or motivators. These categories are insightful and often heavily experimentally validated; however, they do not necessarily represent core causal factors of motivation (and experiments do not typically test whether they are core causal factors).

Thus, reward-taxis, being a mechanistic explanation of our underlying motivation system, can be a helpful aid to these existing psychological theories of needs, drives, and/or motivators. Furthermore, reward-taxis can help us understand how and why different categories of needs are different.

For example, let’s consider Maslow’s dichotomy of growth needs vs deficiency needs. Many studies have questioned whether this dichotomy is a true dichotomy, or whether there is a hierarchy of needs as Maslow claimed (Wahba & Bridwell, 1976). Certainly some needs are absolutely required for survival, and others are important for good health. But there is another side to Maslow’s distinction, which we can understand in reward-taxis as the shape of reward functions.

Some reward types are quickly satiated at a particular threshold, and beyond that threshold, an additional amount of that reward is not even perceived as rewarding. These can be thought of as the deficiency needs (or deficiency rewards). Other reward types are unbounded, never fully satisfied, or reach diminishing returns more slowly. These are the growth needs (or growth rewards). There is a spectrum of such reward functions, which can be incorporated in our above equations by making rewards a function of how much of that reward we already have (which can be captured in the state st).
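As a rough illustration of the two ends of this spectrum, the sketch below compares the extra reward gained from one more unit, depending on how much we already have. The specific shapes (a hard cap for the deficiency reward, a logarithm for the growth reward) and all names and thresholds are assumptions chosen for simplicity, not forms taken from the model.

```python
import math

def deficiency_reward(amount, threshold=3.0):
    """Saturating reward: nothing extra is gained beyond the threshold."""
    return min(amount, threshold)

def growth_reward(amount):
    """Unbounded reward with slowly diminishing returns."""
    return math.log1p(amount)

def marginal_reward(reward_fn, already_have, extra=1.0):
    """Reward gained from `extra` more, given how much we already have."""
    return reward_fn(already_have + extra) - reward_fn(already_have)

if __name__ == "__main__":
    for have in (0.0, 2.0, 5.0, 50.0):
        print(
            have,
            marginal_reward(deficiency_reward, have),
            marginal_reward(growth_reward, have),
        )
```

Past the threshold, one more unit of the deficiency reward adds nothing, while the growth reward keeps adding a little, however much we already have.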

5g. Repressed Needs/Desires

We can also use reward-taxis to re-interpret Freud’s idea of repressed needs/desires and sublimation. Freud claimed that desires that are frowned upon by society, such as certain sexual desires, could be “sublimated” to acts of higher social value, such as scientific, artistic, or career pursuits (Freud, 1930). In the reward-taxis model, this is due to the fact that although some reward types (e.g. sex) may have large magnitude, they may also have high distance (e.g. due to social constraints), thus causing other reward types with lower magnitude but shorter distance to correspond to a higher overall expectation (and thus causing our adaptive desire system to compel us towards these other reward types).
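The trade-off between magnitude and distance can be sketched with a few lines of Python. This assumes, purely for illustration, that expected reward is the magnitude shrunk geometrically with distance; the function `expected_reward`, the `discount` parameter, and the numbers are hypothetical, and the exact form in the model may differ.

```python
def expected_reward(magnitude, distance, discount=0.8):
    """Assumed form: magnitude shrinks geometrically with distance."""
    return magnitude * discount ** distance

# Hypothetical reward types as (magnitude, distance) pairs.
reward_types = {
    "socially constrained desire": (10.0, 20.0),
    "career pursuit": (4.0, 3.0),
}

def most_compelling(options):
    """Name of the reward type with the highest expected reward."""
    return max(options, key=lambda name: expected_reward(*options[name]))

if __name__ == "__main__":
    for name, (magnitude, distance) in reward_types.items():
        print(name, expected_reward(magnitude, distance))
    print("most compelling:", most_compelling(reward_types))
```

With these made-up numbers, the lower-magnitude but closer reward ends up with the higher expectation, which is the pattern described above.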

5h. Consciousness

Lastly, let’s briefly discuss consciousness. Reward-taxis does not explain consciousness, but it does support the idea that the purpose of consciousness is to help us in building a predictive model of the world (in order to make more accurate predictions about future rewards).

6. Conclusion, Limitations, and Future Directions

Here we presented the reward-taxis model of human nature. We claim that reward-taxis does not merely describe certain narrow aspects of human behavior but that (1) all action is based on expected reward and that (2) the algorithm used to act based on expected reward is the reward-taxis algorithm, which appears to be shared, in some form, by all organisms from bacteria to humans.

That said, there are many limitations to this model. For example, understanding how we determine our maps of expected reward is likely more important than understanding the reward-taxis algorithm that utilizes expected reward (the details of these “maps” and how they’re learned over time are essentially what separates bacteria from humans). Although we were able to make some high-level assumptions about reward expectation, reward-taxis itself does not explain it.

Another major limitation is that this is a single-agent model. Any interaction between people would have to be captured within the expected reward, which gives us a limited view, especially given that we’re such a social animal. In other words, it’s primarily viewing humans from a third person perspective and, to some extent, a first person perspective, but it’s missing the “group perspective.”

Besides those limitations, there are many areas that are not necessarily limitations but which we were not able to get into, such as the following (which could be good future directions):

  • Incorporating learning, uncertainty/risk, and energy expenditure into the model

  • Going deeper into particular areas of psychology

  • Validating reward-taxis in humans by seeing if it predicts behavior (in some well-defined scenarios)

  • Considering applications to AI

  • Looking deeper at the “nice properties” of the reward-taxis algorithm (why does it appear to be shared by all life forms?)

  • Considering how to “hack” the algorithm (i.e. live better lives)

7. References

Adler, J. (1966). Chemotaxis in bacteria. Science, 153(3737), 708-716.

Adler, M., & Alon, U. (2018). Fold-change detection in biological systems. Current Opinion in Systems Biology, 8, 81-89.

Alon, U., Surette, M. G., Barkai, N., & Leibler, S. (1999). Robustness in bacterial chemotaxis. Nature, 397(6715), 168-171.

Barkai, N., & Leibler, S. (1997). Robustness in simple biochemical networks. Nature, 387(6636), 913-917.

Belujon, P., & Grace, A. A. (2015). Regulation of dopamine system responsivity and its adaptive and pathological response to stress. Proceedings of the Royal Society B: Biological Sciences, 282(1805), 20142516.

Baum, W.M. (1974). On two types of deviation from the matching law: Bias and undermatching. Journal of the Experimental Analysis of Behavior, 22, 231–42.

Bellman, R. (1954). The theory of dynamic programming. Bulletin of the American Mathematical Society, 60(6), 503-515.

Bubic, A., Von Cramon, D. Y., & Schubotz, R. I. (2010). Prediction, cognition and the brain. Frontiers in human neuroscience, 4, 25.

DeYoung, C. G. (2013). The neuromodulator of exploration: A unifying theory of the role of dopamine in personality. Frontiers in human neuroscience, 7, 762.

Easterlin, R. A. (1968). The American baby boom in historical perspective. In Population, labor force, and long swings in economic growth: the American experience (pp. 77-110). NBER.

Freud, S. (1930/2015). Civilization and its discontents. Broadview Press.

Furnham, A., & Argyle, M. (1998). The psychology of money. Psychology Press.

Gilbert, D. (2009). Stumbling on happiness. Vintage Canada.

Herrnstein, R.J. (1961). Relative and absolute strength of responses as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior, 4, 267–72.

Jung, C. (1921/2016). Psychological types. Routledge.

Karin, O., & Alon, U. (2021). The dopamine circuit as a reward-taxis navigation system. bioRxiv.

Kim, H. R., Malik, A. N., Mikhael, J. G., Bech, P., Tsutsui-Kimura, I., Sun, F., … & Uchida, N. (2020). A unified framework for dopamine signals across timescales. Cell, 183(6), 1600-1616.

Lepper, M. R., Greene, D., & Nisbett, R. E. (1973). Undermining children’s intrinsic interest with extrinsic reward: A test of the “overjustification” hypothesis. Journal of Personality and Social Psychology, 28(1), 129.

LinkedIn. (2015). Job Switchers Survey. LinkedIn Business Solutions.

Maslow, A. H. (1943). A theory of human motivation. Psychological review, 50(4), 370.

McClelland, D. C. (1987). Human motivation. CUP Archive.

Ryan, R. M., & Deci, E. L. (2000). Intrinsic and extrinsic motivations: Classic definitions and new directions. Contemporary educational psychology, 25(1), 54-67.

Spudich, J. L., & Koshland, D. E. (1976). Non-genetic individuality: chance in the single cell. Nature, 262(5568), 467-471.

Wahba, M. A., & Bridwell, L. G. (1976). Maslow reconsidered: A review of research on the need hierarchy theory. Organizational behavior and human performance, 15(2), 212-240.


A1. Chemotaxis Model

In the language of systems biology, the chemotaxis circuit implements a non-linear integral feedback loop, which has the following model (Adler & Alon, 2018):

u is input (chemical from outside the bacterial cell), x is an internal node (a protein), and y is the output (a protein). ρ is a constant (the ratio of removal rates of x and y). The dot notation means the change/derivative (e.g. dx/dt).
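To get a feel for how such a loop behaves, here is a minimal simulation sketch. It assumes one commonly written form of a non-linear integral feedback loop, dx/dt = x(y − 1) and dy/dt = ρ(u/x − y); this form, the function name, and all parameter values are assumptions for illustration and may not match the exact model referred to above. It uses simple Euler steps.

```python
def simulate_feedback_loop(u_before=1.0, u_after=2.0, rho=1.0,
                           dt=0.01, t_step=5.0, t_end=60.0):
    """Return the output y over time when the input steps from u_before to u_after."""
    x, y = u_before, 1.0  # start at the steady state for u_before
    ys = []
    n_steps = int(round(t_end / dt))
    for i in range(n_steps):
        t = i * dt
        u = u_before if t < t_step else u_after
        dx = x * (y - 1.0)      # assumed: x integrates the error in y
        dy = rho * (u / x - y)  # assumed: y responds to the ratio u / x
        x += dt * dx
        y += dt * dy
        ys.append(y)
    return ys

if __name__ == "__main__":
    ys = simulate_feedback_loop()
    print("peak output:", max(ys))
    print("final output:", ys[-1])
```

Under these assumptions, the output y rises after the input doubles and then settles back to its original steady state value (exact adaptation). Stepping the input from 2 to 4 instead of 1 to 2 produces the same response, since only the ratio u/x enters the equations.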

Here is pseudo-code for the chemotaxis algorithm. For a simplified view, we imagine that time is discrete:

Inputs: initial environment E, position within the environment x, attractant level C, change in attractant level ΔC = 0, smoothing weight w (between 0 and 1), tumble frequency set to its steady state value ϕ = ϕst, run direction d

while alive do
  new_C ← measure attractant level
  new_ΔC ← new_C - C
  ΔC ← (1-w)*new_ΔC + w*ΔC  (ΔC is a running mean of change in attractant level)
  C ← new_C  (remember the latest level for the next comparison)

  ϕ ← f(-ΔC, ϕ, ϕst)  (update tumble frequency ‘proportionally’ to -ΔC)
  heads ← flip coin with probability ϕ of being heads
  if heads then
    d ← sample new direction uniformly (tumble)
  x ← move one step in direction d
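
The pseudo-code above can be turned into a small runnable sketch. Several pieces here are assumptions made only so the example runs: the attractant field (highest at the origin, falling off linearly), the particular choice of f (relax toward ϕst, shifted by a gain times -ΔC, then clamped), and all parameter values and function names.

```python
import math
import random

def attractant_level(pos):
    """Hypothetical attractant field: highest at the origin, falling off linearly."""
    return -math.hypot(pos[0], pos[1])

def update_tumble_frequency(delta_c, phi, phi_st, gain=0.08, relax=0.5):
    """One assumed choice for f: relax toward phi_st, shifted by -gain * delta_c."""
    phi = phi + relax * (phi_st - phi) - gain * delta_c
    return min(0.99, max(0.01, phi))  # keep it a valid probability

def run_and_tumble(start=(100.0, 0.0), steps=2000, phi_st=0.2, w=0.5, seed=0):
    """Return the final distance from the attractant peak at the origin."""
    rng = random.Random(seed)
    pos = list(start)
    c = attractant_level(pos)
    delta_c = 0.0
    phi = phi_st
    angle = rng.uniform(0.0, 2.0 * math.pi)  # run direction d
    for _ in range(steps):
        new_c = attractant_level(pos)
        new_delta_c = new_c - c
        delta_c = (1.0 - w) * new_delta_c + w * delta_c  # running mean of change
        c = new_c
        phi = update_tumble_frequency(delta_c, phi, phi_st)
        if rng.random() < phi:  # tumble: pick a new direction uniformly
            angle = rng.uniform(0.0, 2.0 * math.pi)
        pos[0] += math.cos(angle)  # run: move one step in direction d
        pos[1] += math.sin(angle)
    return math.hypot(pos[0], pos[1])

if __name__ == "__main__":
    finals = [run_and_tumble(seed=s) for s in range(10)]
    print("final distances from the peak:", finals)
```

The cell never senses which way the attractant lies. It only tumbles less often when things are getting better, and that alone is enough to bias the random walk toward the peak.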

6 Scientific Reasons Why Your Productivity Hacks Never Work

If you’ve read any productivity or self-help articles or books, you know that reading rarely makes you more productive. Sadly, this article is no different (it’s probably not going to make you more productive).

But at the very least, I hope to teach you a few things about the science of productivity (which is really the science of motivation in general). And more importantly, I hope to help you chill the fuck out. I see way too many productivity cults and way too much self-torture to “get ahead.” It would be OK if these things really did get you ahead, but more often it’s just banging your head against a wall. So without further ado, here are 6 reasons why your productivity hacks never work:

1. You Can’t Use Logic to Change Your Emotional Self

To take an analogy from psychologist Jonathan Haidt, you can think of your mind like a Rider riding an Elephant. The Rider is you, the logical, conscious being and the Elephant is also you but the emotional, subconscious you.

The reason you want to hack your productivity—from the perspective of the Rider—is because you have a lazy Elephant:

“This damn Elephant just won’t go where I want him to go! *whip*”

But at the end of the day, the Elephant just doesn’t want to go. And sadly, elephants don’t read, or they don’t read most non-fiction at least. So if you read productivity advice that’s targeted at you, the Rider, rather than you, the Elephant, it will almost certainly fail. You, the Rider, already knew what you wanted before you got the advice. The problem was your lazy Elephant. The advice may have made you more confident in what you wanted, but your problem wasn’t a lack of Rider confidence.

The productivity advice didn’t target your real problem, so no wonder it failed.

That being said, smart writers know to target your Elephant. For example, here are two books I read that did this masterfully:

  • How to Get Rich by Felix Dennis – Felix Dennis calls this book an anti-self help book because he repeatedly says things like “You probably won’t get rich…Even if you read this you have almost no chance of getting rich…If you’re older, you have even less chance of getting rich.” But ironically, this anti-self help works better than plain vanilla self-help because it triggers your emotions. It may make you say “Screw you, Felix. I’ll show you.” or “Challenge accepted!” Now we’re speaking elephant.
  • The War of Art by Steven Pressfield – When I first opened this book I became immediately skeptical because it was written in an illogical, almost religious tone. But quickly I got hooked and realized that that was exactly why it was effective.

2. Meditation Can Fix Many Problems…But Not Productivity

This lesson is intended for those individuals that attempt to use meditation to increase their productivity. Meditation is ineffective at increasing productivity, and to get to the bottom of why it’s ineffective, we need to break down how emotions operate in our mind and body. With each emotion, there’s a feeling you have, and there are also body changes that occur. For example, with anger you feel…well…angry, but also your blood pressure goes up, your heart rate increases, and your face tightens.

If I had to guess the order in which those events happen, intuitively I’d guess something like: (1) some stimulus comes in (for example, someone insulted you), (2) your brain processes the stimulus and decides to be angry, (3) you feel angry, and finally (4) you have all the body changes, like an angry scowl.

But in fact, (4) comes before (2) and (3). You receive some stimulus, your body immediately reacts, and then your brain detects that your body is angry, so you start feeling angry. It detects rather than decides. Surprising, right? This is called the James-Lange Theory of Emotion, and this weird theory explains how meditation works.

Meditation works by hacking this system. If you’re stressed, you’ll be breathing quickly. But by forcing yourself to breathe slowly, your mind will detect that slow breathing and eventually it will think “oh, everything must be good since my body isn’t stressed.” Believe it or not, even faking a smile for long enough can make you happier.

“Fake it til you make it” works for emotions. Motivation, though, is not an emotion, so it can’t be hacked this way, at least not to the extent you want.

3. You’re Not Celebrating Your Small Wins Enough

So what is motivation if not an emotion? I’ll explain briefly.

Motivation is related to how much you want things. You always want something, be it to write The Great American Novel or watch the next episode on Netflix. You may even want both of those things simultaneously. Thus your brain needs a way to decide at any moment in time what goal you want the most and what action to take to get it.

That’s what motivation is: a decision-making system in your brain to score and choose certain actions based on the expected reward of those actions. When your brain evaluates an action, its dopamine neurons fire dopamine (the chemical), and the amount of dopamine they fire is proportional to the expected reward of the action. As for what rewards are, those are anything that make you feel good, from social recognition to a good cup of coffee. Oh and one other important detail: future rewards are discounted, so for a future reward to match an immediate reward, it needs to be bigger (and the farther in the future it is, the bigger it needs to be).

What’s the takeaway from this? Well, if you want to write The Great American Novel and you want to watch the next episode on Netflix, but 10 out of 10 days you choose Netflix, then your brain just thinks Netflix will be more rewarding for you. Agree with your brain or not, that’s just how it is.

“But I really do want to write my novel!”

Well, when you’re thinking of writing your novel, what are you imagining the reward to be? Perhaps you’re imagining the self-satisfaction of finishing the last page and holding the book in your hand. Or perhaps you’re imagining yourself on a book tour signing autographed copies. Regardless of what you’re imagining, I have no doubt that those are huge rewards for you, but they’re very far in the future. They’re probably not big enough to overcome the discounting.

There are other reasons why your expected rewards may be lower than they should be too. For example, perhaps you’ve already gotten over the hump of starting your novel. Perhaps you’ve even quit your job to focus on writing full-time, but you can’t stop thinking about future scenarios like “what if I run out of money?” or “what if no one even likes what I write?” Those potential negative rewards are going to count against your total expected reward.

Luckily, there are some easy ways to boost the expected rewards of productive actions. One is enjoying the process or rewarding yourself during the process. You can’t wait til the very end for the first sign of reward. It’s just too far away. Any time you make any progress, you should reward yourself, or better yet, pick a goal where you naturally feel rewarded by the process.
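If you like seeing this in numbers, here’s a tiny sketch of the idea. Everything in it is made up for illustration: the exponential form of the discounting, the 0.99 daily discount, the reward sizes, and the names. Your brain is not literally running this code.

```python
def discounted_value(rewards, daily_discount=0.99):
    """Sum of (days_from_now, reward) pairs, each shrunk by daily_discount ** days."""
    return sum(reward * daily_discount ** days for days, reward in rewards)

# Hypothetical numbers: one episode tonight vs. a finished novel in two years.
netflix = [(0, 1.0)]
novel = [(730, 100.0)]

# Same novel, plus a small celebration every week along the way.
novel_with_small_wins = novel + [(7 * week, 0.5) for week in range(1, 105)]

if __name__ == "__main__":
    print("Netflix:", discounted_value(netflix))
    print("Novel:", discounted_value(novel))
    print("Novel with small wins:", discounted_value(novel_with_small_wins))
```

With these numbers, the novel on its own loses to Netflix even though its payoff is 100 times bigger, because the payoff is two years away. Add a small weekly win and the novel comes out ahead.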

If you are rewarding yourself, it’s helpful to make a ritual of it, like a champagne toast or some other symbolic gesture.

Give yourself a toast even for the smallest wins

Personally, I think a symbolic gesture or something involving social recognition is a much better self-reward than buying yourself a new pair of shoes or treating yourself to an ice cream sundae. Why? Because if guilty pleasures are all you have to look forward to from your ambitious, long-term goals, why do you even need the long-term goals? Why not just go straight to the guilty pleasures?

4. You Don’t Have Enough Autonomy

Ah autonomy, one of my great loves. I discovered the importance of autonomy a couple of years ago when I hated my job. Every day I woke up and played the song Thank You by Dido:

My tea’s gone cold. I’m wondering why I got out of bed at all. The morning rain drops on my window, and I can’t see at all. And even if I could, it would all be grey. I’ve got your picture on my wall. It reminds me that it’s not so bad. It’s not so bad.

Thank You (Dido)

Great song, but very dark. Suffice it to say I wasn’t happy at the time. So I scoured all kinds of literature to figure out why I wasn’t happy, and what I discovered was Self-Determination Theory (SDT), a leading theory of human motivation. The theory resonated with me at the time (and still does), so let me tell you what the theory is.

SDT posits that there are three “basic psychological needs” humans need to satisfy in order to be motivated in a particular job or task or role: (1) autonomy, (2) competence and (3) social connectedness. Researchers typically focus on autonomy since that is the most complicated or counterintuitive of the three, so I’ll also focus on that one (more on competence later though).

So what is autonomy? Well, let me start with a puzzle that led to the focus on autonomy in SDT: the “undermining effect.” In a classic study on the undermining effect, researchers visited a class of preschool kids and found that some kids intrinsically enjoyed drawing (they would draw in their free time) and other kids didn’t. They then split the kids that intrinsically enjoyed drawing into 3 groups, A, B and C. They told A that if they completed a drawing they would get a reward (a certificate with a gold seal and ribbon; wish I had gotten one of those). For B, they didn’t mention the reward but still gave the reward to them if they completed a drawing. And for C, they didn’t mention the reward and they didn’t give the reward. Then they let the kids go at it.

And what did they find? Initially group A became more motivated than groups B or C, but later, if they stopped giving out the reward for new drawings, group A became less motivated than they were before the study began, whereas with B and C, nothing changed.

This doesn’t seem very rational. If you intrinsically enjoy drawing, you should still intrinsically enjoy it after the reward session is over. But that’s not what happens. It’s a puzzle.

What SDT researchers have found is that there is a fundamental difference between intrinsic motivation and extrinsic motivation, with intrinsic motivation generally being stronger. Additionally, they’ve found that there aren’t just two classes, intrinsic and extrinsic; there really is a continuum from fully intrinsic to fully extrinsic motivation, and motivation gets progressively stronger the closer you get to fully intrinsic.

For example, if you are a doctor, you may need to do a lot of tasks you don’t intrinsically enjoy, such as paperwork, but if you strongly identify with being a doctor and see paperwork as an integral component of being a doctor, you’ll be motivated to do it. The paperwork is an extrinsic requirement, but you still feel you are the one deciding that you want to do it (you have an internal “locus of causality”).

Compare that to being in a role where your boss tells you to do something, you don’t agree that it’s the right thing to do, and you don’t even think you’re the right person to do it. That’s fully extrinsic.

The moral of the story is that you will be much more motivated the closer you are to intrinsic motivation, whether it’s that you enjoy the work itself, you strongly identify with your role, or you feel some high level of control. If you don’t, you better change something now. Though often, if you are lacking control, you may not have enough control to give yourself more control, so the only option may be to make a drastic change of role or environment.

5. You Need to Level-Up Your Skills First

Another aspect of Self-Determination Theory is competence. It’s pretty simple actually: if your skills are too high, you’ll feel bored (imagine LeBron James playing basketball against middle-schoolers) and if they’re too low, you’ll feel stressed or a lack of control (imagine middle-schoolers playing basketball against LeBron James). It’s very similar to Mihaly Csikszentmihalyi’s concept of Flow if you’ve heard of that.

Moral: if you feel unable to do a particular task or project, maybe you need to take a bit of dedicated time to increase your skills and learn more on other tasks or projects, then come back to the original task or project.

6. Trying to be Productive Makes You Less Productive

Anyone who has taken a shower knows that often great ideas come when you’re not trying to produce great ideas. The same is true with productivity. Often when you’re not trying to be productive, you become more productive.

This is exactly the concept of wu wei (doing by not doing) in Daoism, or Mark Manson’s “Backwards Law.” I also like to call this phenomenon the “Office Space Effect,” since in the movie Office Space, the moment the main character stops giving a shit what his bosses think and just acts as he wants to act is the moment he gets promoted.

Of course, you can’t live your whole life according to wu wei or the Office Space Effect, or you may actually never get anything done. But sometimes not doing is better than doing. It’s a subtle and mysterious phenomenon, but I think there is a scientific explanation for why it happens.

The reason is that we’re often crippled by fear: fear of failing to come up with a new idea, fear of not being promoted, fear of wasting your time or letting your friends or family down. Since fear is equivalent to potential negative rewards, as discussed earlier, it decreases our total expected reward of doing an action, which decreases dopamine and makes us less motivated.

If fear is removed, we become more motivated and more willing to be playful (mentally, physically, however we like), and that can lead to huge rewards.


Real productivity, in my opinion, doesn’t look anything like self-torture. It looks like treating yourself with love and respect and finding an environment where you naturally thrive. So go treat yourself and try a new environment. Good luck!


Thanks to Kan Leung Cheng for philosophizing with me about these ideas.

Lessons from Reading 10,000 AngelList Applications

One of the first people I interviewed off of AngelList was a man from Lagos. He had created a website that displayed Manchester United scores, and it was beautiful. The layout was clean. The colors were bright. It even had elegant animations when you clicked or hovered. Also, the commit log on GitHub demonstrated (with some degree of confidence) that he was, in fact, the one that made the website. I thought to myself, “this guy must be a talented designer and love Manchester United.”

Three days later, though, I saw someone else—this time from the US—who had created the exact same Manchester United website. Pixel-for-pixel they looked identical.

How did this happen? Did they both copy the code? It’s possible, but I doubt it, based on the fact that their GitHub commit logs differed. I think what they did was something even worse: they each spent a dozen hours building the exact same thing when they could’ve just as easily spent a dozen hours building something new.

Clark: …I will have a degree, and you’ll be serving my kids fries at a drive-thru on our way to a skiing trip.

Will: [smiles] Yeah, maybe. But at least I won’t be unoriginal.


The beauty of software is that you can create anything. Software may often (but not always) be limited to the virtual world of “bits” rather than “atoms”, but within that world, the sky is the limit. Games, art, scripts to automate chores and tasks, money-making schemes, AI—it can all be created with code. And yet, 4 out of 5 portfolios I see are made up entirely of copied templates.

The first thing I look for now is the intrinsic motivation to build things that you actually want to build. I’d much rather see an applicant create something ugly but novel than something beautiful but replicated. I’d much rather see a video game than a COVID tracker, even if the COVID tracker was not based on a template.

I saw a surprising stat recently: in the 2020-2021 school year at MIT, out of all declared majors, 43% were computer science or had “computer science” somewhere in the name. Almost half. For comparison, physics made up 5%, architecture 1%, and philosophy a measly 0.3%.

In the talent war for atoms vs bits, it seems that bits are winning, at least at MIT. Heck, even their School of Humanities has a computer science major.

At other technical universities, the percentage is smaller. But that may be largely due to rules in place to hold back the flood of wannabe CS students. For example, at Berkeley, even if you’re admitted as a computer science student, get a few too many Bs in your freshman year—God forbid a C—and you’re out.

Given the rapid rise in computer science students and software job applicants around the world, a lot of truly talented and motivated young people have told me that they worry that the field will become too saturated in 10 years and they’ll be left without a job. Will it? My take is that once you “break in,” say by getting a job in big tech or at a prestigious startup, you’re in for good, but it will become harder and harder to break in.

That’s how universities work, for example, and how most other things in the world work. So the same should be true for software engineering jobs (and already is true, for that matter).

That said, I do believe those truly talented and motivated individuals will continue to be able to break in. Though, being concerned about saturation is a smart worry. Don’t just hop on the machine learning or crypto hype train. Build what you want to build, and learn what you want to learn.

The number one most important thing for any application—not just software job applications—is standing out. And more importantly, getting inside the mind of your reviewer to know what will stand out.

When I interviewed at Google several years ago on the Pittsburgh campus, I had only one “behavioral interview” (the other four were programming challenges without much chit-chat). The interviewer was a thin, soft-spoken guy with a goatee, and he came into the conference room holding my resume. He said, “I looked at your resume and there’s one thing that stands out: you played football. Have you learned anything from playing football that you think will help you in a software engineering role?”

Ironically, just a few days prior, I debated whether to remove the “varsity football” line from my resume. I had only played for one season and had not actually played in the games. Also, I knew that joining the football team had helped me get into the CS program at CMU, and I didn’t want interviewers to suspect that I wasn’t as smart as the other students.

Thankfully, I ignored those thoughts and left the line in. And now I realize that those thoughts were completely off the mark! Google gets a million smart applicants from top programs every year. As a former Google interviewer, I can say that if I saw “Stanford” or “MIT” on your resume, yes I’d notice it, but I would never say “wow” based on that detail alone simply because I saw so many resumes from the same schools. Same goes for seeing a 3.9 GPA (in fact, if I see too high of a GPA, like a 4.0, I tend to make slightly negative assumptions, but that’s a discussion for another post). Quality trumps quantity, unless perhaps the quantity is so ridiculously high that it stands out (rare).

Of course, what stands out completely depends on the mindset of the interviewer and differs drastically from application to application. Applying to one undergrad institution requires something different from applying to another undergrad institution, same for MBA programs, PhD programs, jobs in software, jobs in consulting, you name it. But standing out and having the telepathy to know what will stand out is key.