Really interesting as usual. But the algorithm can’t be as simple as ‘repeat what gives pleasure’. Wasn’t that the big takeaway from Damasio’s work? That we can lose half our brains & make decisions that are proximately optimal but hopelessly non-strategic over the longer term? The ability to defer gratification is an important feature of choosing optimal pathways. That may mean risking getting no primary reward (the marshmallow or whatever) in order to get two later. Or taking a pathway that promises pain for uncertain but tasty rewards.
For me, depression is a psychological stranding where no pathways seem to be available. But if I recall, this helplessness is not ‘learned’; it’s the standard operating system for organisms confronted with aversive situations. And it messes up the reward / learning system by overemphasising errors. I’m not sure how a machine would respond to aversiveness. Would it keep trying new pathways until it ran out of energy, or would it also adopt helplessness & throw in the towel after a suboptimal number of attempts?
Also, as I’ve previously commented, I find the really fascinating bit is the poorly calibrated people: those with a super optimism bias who perceive far more pathways than are available and keep searching for them. But I think these people are also far more likely to get stranded / depressed, precisely because they’re poorly calibrated. That might explain higher levels of depression among entrepreneurs, sports stars etc. Musk is the perfect example of the massively non-calibrated individual cycling through mood states as he’s forever exploring adaptive pathways. Anyway, as always, a thought-provoking post.
Hi Claire,
Thanks for your comment. You're absolutely right, it’s not "as simple as ‘repeat what gives pleasure’."
The decision model I describe, where we form values through reinforcement learning, is more complex than just repeating what worked. You would be able to identify patterns in what you’ve done to get the right outcome, and you’d be able to predict the future flow of rewards from your actions if your past actions help you foresee the consequences. Therefore, even within a reinforcement learning model, you would behave in a fairly sophisticated way, making intertemporal trade-offs.
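To make the intertemporal trade-off concrete, here is a minimal sketch of how a reinforcement learner might weigh one marshmallow now against two later. The discount factor `gamma`, the three-step delay, and the function names are assumptions chosen for illustration, not parameters from the post.

```python
def discounted_value(rewards, gamma):
    """Sum a stream of rewards, discounting each by gamma per time step."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))

# Hypothetical reward streams (one entry per time step).
EAT_NOW = [1, 0, 0, 0]  # one marshmallow immediately
WAIT = [0, 0, 0, 2]     # two marshmallows after three steps

def best_choice(gamma):
    """Pick the option with the larger discounted value."""
    if discounted_value(WAIT, gamma) > discounted_value(EAT_NOW, gamma):
        return "wait"
    return "eat_now"

print(best_choice(0.9))  # patient agent: 2 * 0.9**3 = 1.458 > 1
print(best_choice(0.5))  # impatient agent: 2 * 0.5**3 = 0.25 < 1
```

In this toy setting the same valuation rule produces either deferred gratification or immediate consumption, depending only on how steeply future rewards are discounted.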
On top of that, your predictions about future rewards can also benefit from mental simulations based on models of the world you have. That’s why we often spend time thinking about what we will do or say. These mental simulations give us an estimation of the scenarios that will unfold from different actions, helping us choose the right one. As a result, even in novel situations, we can form estimates and make good decisions.
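The idea of mental simulation can be sketched as planning with a world model: roll each candidate plan forward in imagination and keep the one with the best predicted rewards. The toy model below (its states, actions, and reward numbers) is entirely hypothetical, and it is deterministic for simplicity, whereas real outcomes are uncertain.

```python
from itertools import product

# Hypothetical toy world model: (state, action) -> (next_state, reward).
WORLD_MODEL = {
    ("start", "safe"): ("meadow", 1.0),
    ("start", "risky"): ("thicket", -1.0),   # short-term pain
    ("meadow", "safe"): ("meadow", 0.0),
    ("meadow", "risky"): ("meadow", 0.0),
    ("thicket", "safe"): ("thicket", 0.0),
    ("thicket", "risky"): ("orchard", 5.0),  # later payoff
    ("orchard", "safe"): ("orchard", 0.0),
    ("orchard", "risky"): ("orchard", 0.0),
}
ACTIONS = ("safe", "risky")

def simulate(state, plan, gamma=0.9):
    """Mentally roll a plan forward and return its discounted reward."""
    total = 0.0
    for t, action in enumerate(plan):
        state, reward = WORLD_MODEL[(state, action)]
        total += reward * gamma ** t
    return total

def best_plan(state, horizon, gamma=0.9):
    """Try every action sequence of the given length and keep the best."""
    return max(product(ACTIONS, repeat=horizon),
               key=lambda plan: simulate(state, plan, gamma))

print(best_plan("start", 1))  # looking one step ahead
print(best_plan("start", 2))  # looking two steps ahead
```

With a one-step horizon the simulated agent takes the safe reward; with a two-step horizon it accepts the initial pain because the simulation reveals the larger payoff behind it.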
Regarding your point about aversiveness, the adaptive explanation is that it helps us avoid bad situations. As I mentioned in the post on depression, I believe you’re right that there will likely be variations in personality and in the propensity for depression. Personality differences are a very legitimate topic of inquiry. I just tend to focus on explaining average behaviour under the assumption that psychological traits are functional and useful (except in cases of evolutionary mismatch).
Thanks for your comments as always.
Thanks for the overview. I tend to think that these things are really multilayered in our brains.
For example: depression sits on a layer above dopamine’s action on neurons and applies a modulation to the overall happiness processing. That is just my intuition from first-hand experience with depression.
Found this, which seems related: https://www.eurekalert.org/news-releases/1059645
Indeed, it is co-authored by Read Montague, who is one of the key contributors to this field.
Good stuff, very helpful!
Thanks Dan!