Saturday, September 29, 2018

On Rewards

"Any rational society would either kill me or give me my books.
-Hannibal

Society only tolerates a certain kind of deviance: the kind that carries a great or novel expectation of reward. AI engineers reflect the same tendency when they treat that kind of deviance in agents as refreshingly intelligent behavior, provided it serves the larger objective. A rudimentary example is a bipedal robot figuring out the use of its hands when a walk involves a steep climb, by imitating what it observes in other agents, biological and otherwise. As it is, in a swarm the most critical information always comes from the nearest agents, which are also considered the most reliable. In humans, at least, learning does not always lead to a permanent change of behavior; the expectation of rewards and the fear of punishments often put us on the path of overlooking, or even misinterpreting, what we have learned. And yet, while discovering which response leads to what effect goes far beyond mere observation, observational learning is said to be the most prominent method of learning. It should be evident, therefore, that the attention control mechanism and a hyper-sensitive selection of stimulus and response can make the difference between a satisfactory outcome and the optimal one.
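To make the swarm point a little more concrete, here is a minimal sketch of nearest-neighbor imitation: an agent estimates its heading from the headings of its closest neighbors, weighting the nearest most heavily. The function name, the inverse-distance weighting, and the toy numbers are my own illustrative assumptions, not a prescribed algorithm.

```python
import numpy as np

def imitate_nearest(own_pos, neighbor_pos, neighbor_headings, k=3):
    """Estimate a heading by imitating the k nearest neighbors,
    weighting each one inversely by distance, so the nearest
    (treated as the most reliable) contribute the most."""
    dists = np.linalg.norm(neighbor_pos - own_pos, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + 1e-6)       # closer -> heavier weight
    weights /= weights.sum()
    return weights @ neighbor_headings[nearest]   # weighted average heading

# Hypothetical usage: four neighbors in the plane, each with a 2-D heading.
positions = np.array([[0.0, 1.0], [2.0, 0.0], [5.0, 5.0], [0.5, 0.5]])
headings  = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [1.0, 1.0]])
print(imitate_nearest(np.array([0.0, 0.0]), positions, headings))
```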

Unlike naturally occurring intelligence, Artificial Intelligence gets its teeth mostly from nurture; 'nature' applies mostly to its embodiment and hardware limitations. The case of human multitasking is an interesting study in the nurturing of intelligent behavior. A productive use of downtime fetches a much higher reward value than leisure and inertia. We have effective attention mechanisms to cope with long and complex inputs; we apply controlled attention and switch continuously between tasks to maximize rewards. Our perceptual-motor expertise improves with task familiarity, and an economical distribution of attention is then either always rewarded or at least never punished. More than that, we consistently acquire new goals and build a hierarchy of goals (correlating with a hierarchy of needs), which helps us multitask better and organize our time and lives towards greater reward expectancy.
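As a rough illustration of switching between tasks to maximize reward, here is a small sketch in which attention always goes to the task with the highest expected payoff, and familiarity with a task slowly raises that payoff. The Task class, the familiarity bonus, and the numbers are hypothetical, chosen only to mirror the intuition above.

```python
import random

class Task:
    def __init__(self, name, base_reward):
        self.name = name
        self.base_reward = base_reward
        self.familiarity = 0.0            # grows with practice

    def expected_reward(self):
        # familiarity makes the same unit of attention pay off more
        return self.base_reward * (1.0 + self.familiarity)

    def work_on(self):
        reward = self.expected_reward() * random.uniform(0.8, 1.2)
        self.familiarity += 0.05          # perceptual-motor expertise improves
        return reward

def allocate_attention(tasks, steps=10):
    """Greedy switching: at every step attend to whichever task
    currently promises the highest expected reward."""
    total = 0.0
    for _ in range(steps):
        task = max(tasks, key=lambda t: t.expected_reward())
        total += task.work_on()
    return total

print(allocate_attention([Task("write report", 3.0), Task("idle scrolling", 1.0)]))
```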

Now, if we are to have agents with the ability to acquire new goals on their own, operating under a predetermined hierarchy of goals based on cause-and-effect relationships, AI will need a more purposeful nature of being, one that risks even the willful acceptance of punishments (negative rewards) in service of the greater goal. A finer sensitivity to sentiment, a clear demarcation between mission-specific sensing and non-mission-specific sensing, and causality-based actuation that draws from dynamic sets of rules seem to be the way forward.
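One way to picture such willful acceptance of negative rewards is a small goal hierarchy in which a locally punishing sub-goal is pursued because the parent goal's overall expected value remains positive. The Goal structure, the additive value rule, and the example numbers below are illustrative assumptions, not a proposed architecture.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Goal:
    name: str
    immediate_reward: float               # may be negative, i.e. a punishment
    subgoals: List["Goal"] = field(default_factory=list)

    def expected_value(self) -> float:
        # a goal's value is its own reward plus that of everything beneath it
        return self.immediate_reward + sum(g.expected_value() for g in self.subgoals)

def worth_pursuing(goal: Goal) -> bool:
    """Accept a locally negative reward when the hierarchy as a whole pays off."""
    return goal.expected_value() > 0

# Hypothetical example: a steep climb costs effort now but serves a larger goal.
climb  = Goal("steep climb", immediate_reward=-3.0)
survey = Goal("survey the ridge", immediate_reward=10.0, subgoals=[climb])
print(worth_pursuing(climb), worth_pursuing(survey))   # False True
```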

Any finite set of rules, as frames and boxes generally go, is bound to provide an incomplete approximation of reality. As some philosophers have noted, instinct is the most intelligent of all forms of intelligence, and it is the least bound by deliberations over rewards and punishments, compared with non-instinctive action. With an ever more seamless and direct coupling of perception to action, and a knowledge base as incomplete as our own, agents too, like humans, can sometimes run into a quagmire best described by Napoleon's remark after losing the war in Russia: from the sublime to the ridiculous is but one step. Bonne Guerre.

---