How could our understanding of AI be bettered by better understanding ourselves?
The development of Artificial Intelligence has started us on a path to potentially creating a new version of the human mind in digital form. Bostrom (2014) introduces the danger of what happens when our computers acquire our species' long-held privilege of superior intelligence. At present, Professor Boden (2015) describes artificial intelligence as a "bag of highly specialised tricks", seeing the advent of true 'human-level' intelligence as a distant threat. However, it is precisely generalised AI, capable of making broad, domain-spanning decisions alone, which constitutes anthropogenic risk (Shanahan 2015). Anthropogenic risk itself poses a more pressing problem than natural extinction risk (Ord 2015). With the possibility of human-level artificial intelligence, the imperative to understand the way in which humans make decisions is stronger than ever.

The tenets of decision-making bound the artificial minds which we create, imposing restrictions or freedoms on the way these machines will make decisions of their own. Machines optimise for the best solution against a set of constraints, and artificial minds can be faster and more reliable at solving problems than even the most skilled human optimiser. In many situations, the speed of finding solutions to complex problems has driven today's rates of technology adoption. Yet the fastest, highest-value, rational route to a solution may not be consistent with societal preference. The ruthlessly efficient optimiser does not align with harmonious co-existence. In some domains, 'a mind like ours' (Shanahan 2015), capable of nuanced moral judgement, is preferable. Yudkowsky (2008) considers the nexus of artificial intelligence and existential risk arising from convergent instrumental goals, avoiding the trap of anthropomorphising AI. Human-level ≠ human-like. Such inconsistencies between the way humans and machines behave across different domains create a tension between the decisions made by increasingly autonomous machines and the greater interest of human society at large. The imperative to identify the grey area of decision-making thus becomes apparent.
There is a need for research asking "when is a moral judgement preferable and when is a value judgement required?" In which domains do we prefer predictability and perfect rationality, and in which domains should we fall back on our humanness and subjectivity? I propose a method building on Mero's (1990) contention: "If a man's thinking mechanisms were understood better, then somehow it would also become possible to model and simulate them by artificial means" (p.3). How could our understanding of AI be bettered by better understanding ourselves? Perhaps one starting point lies in conducting experiments testing the transitivity of human preferences across varied moral and value domains, to expose where caution must be exercised in the abdication of such decisions to artificial minds.
Starting from the seminal Kahneman and Tversky (1979) paper establishing that "losses loom larger than gains", a wealth of evidence has been experimentally produced and empirically tested highlighting the inconsistencies of our decision-making processes. In a boundedly rational domain, the human mind is only able to optimise over a restricted set of information, using heuristics and biases to ease the decision burden. Rarely do humans make perfectly rational value or time-consistent judgements. We do not consider the price of every coffee mug in every shop worldwide; we simply buy one that is "good enough". Incorporating such deviant preferences into algorithms has important applications for marketing (Evans et al., 2016). In some decision domains, we are able to reach a decision in the fastest or most appropriate way precisely by falling back on these instincts. Such 'gut instinct' becomes particularly advantageous when time, informational or financial constraints bind more tightly.
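To make the asymmetry concrete, the sketch below shows a prospect-theory-style value function in which a loss is weighted more heavily than an equivalent gain. The parameter values are the commonly cited estimates from the later cumulative prospect theory literature and are used here purely as illustrative assumptions, not as part of the proposed method.

```python
# Illustrative sketch of a prospect-theory-style value function.
# Parameter values (alpha, beta, lam) are commonly cited estimates,
# used here only for illustration.

def prospect_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Subjective value of a gain or loss x relative to a reference point."""
    if x >= 0:
        return x ** alpha               # gains are valued concavely
    return -lam * ((-x) ** beta)        # losses are weighted more heavily

if __name__ == "__main__":
    print(f"value of +100: {prospect_value(100):.1f}")   # ~57.5
    print(f"value of -100: {prospect_value(-100):.1f}")  # ~-129.5: losses loom larger than gains
```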
It is clear that in some situations, perfectly functioning algorithms are the only acceptable default. The scope for artificial intelligence in certain domains requires precisely the removal of erroneous human decisions. An aircraft's autopilot is designed to function better than a human would, taking in enormous quantities of available information, assessing the feasibility of solutions and not getting stressed or distracted along the way. Similarly, we rely on a satellite navigation system when we require the shortest route possible, not one that occasionally takes a wrong turn or stops to enjoy the scenery. In fact, in the study of Dietvorst et al. (2014), experimental subjects display strong algorithm aversion when a machine is anything less than perfect. If the decision-making process coded in the algorithm appears to be at all faulty, we lose trust in delegating our decisions to it. Indeed, the research shows that we lose far more trust than is rational when confronted with imperfect algorithms. To quote the authors: "It seems that errors that we tolerate in humans become less tolerable when machines make them".
The situations where we trust algorithmic rather than human judgement are highly domain-specific (Logg 2017). The difficulty comes not in domains where perfect rationality is clearly optimal for the machine and for mankind, but in domains which require value-based judgements. When a 'mistake' is less clearly defined, choices and values introduce the requirement of outcome preference. But how do we specify the means to the end of these outcome preferences? How should they be determined? Under what conditions? In what domains? In their consideration of "Moral Machines", Wallach and Allen (2008) question "Does humanity really want computers to make moral decisions?" (p.6), citing many philosophers who advise against this abdication of responsibility. However, I call for research addressing precisely the cases where we cannot clearly separate the domains in which we agree to abdicate from those in which we do not.
To address this trade-off between rational value and instinctual moral judgements, consider how machines make decisions over loss of human life: a focus central to existential risk. A self-driving car is a tangible example of an autonomous machine which must make its own decisions of outcome preference in high-cost, crisis situations. Even though a car is not inherently a moral machine, it is exposed to moral dilemmas when choosing between two objects to hit. Most would agree a random decision to swerve left or right is too clinical. Coding this basic command ignores too much of the last-minute, crisis-management 'gut instinct' a human driver might exercise. The next most 'rational' way to code the decision would perhaps be a value judgement based on the 'greater good' of a particular decision to society. Indeed, if the correct decision is the one that optimises the collective benefit of us all, perhaps we have arrived at a decision-making process which prolongs the existence of man in harmonious coexistence with machine. Yet the optimisation problem is not so simple, and solutions relying on a value judgement and those relying on a moral judgement often do not match.
Return to the useful (albeit somewhat clichéd) self-driving car example. Start with the simplest choice: do you hit a cone or a person? Easy: by any metric the cone is the obvious choice. Even between a hedgehog and a person the 'correct' outcome preference is clear. Now consider the creeping grey area of more complex choices. A cancer researcher or a criminal? A cancer researcher or 17 criminals? A cancer researcher versus the CEO of a FTSE-100 company? A criminal versus the CEO of a FTSE-100 company? A baby versus the CEO of a FTSE-100 company? A baby versus a retired CEO of a FTSE-100 company with one month left to live? One can go on.
One 'rational' optimisation solution to this problem might be to consider the net value the individual or object bequeaths to society; a further iteration might use the net present value of the individual or object. But consider a final choice: a Bugatti Veyron or a person? Legislative procedures in states across the globe require a price tag to be placed on a human life, with the UK government placing it between $7 and $9 million. Some other estimates are as low as $2 million. The Bugatti Veyron, the Malibu mansion, even the motorway bridge come out on top every time.
Under this value system, the machine kills, and it does so 'rationally'. Assume such information is acquired from an internet search: the algorithm finds the 'price' documented across the whole of human information online and makes decisions accordingly. Crucially, our way of valuing human life is inadvertently laying the foundations for human extinction. Thus, specifying where certain decision-making processes are appropriate and where they are not arises as a key issue for existential risk.
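As a caricature of this failure mode, the sketch below implements the naive 'minimise monetary cost' rule described above. The price tags and option names are hypothetical placeholders, not real valuations or any deployed system's logic; the point is only that a purely monetary objective selects the outcome most of us would reject.

```python
# Caricature of a purely monetary collision-choice rule.
# All figures and labels are hypothetical placeholders, not real valuations.

from dataclasses import dataclass

@dataclass
class Option:
    name: str
    monetary_cost: float  # the 'price' the algorithm finds documented online

def naive_choice(options):
    """Pick whichever outcome minimises documented monetary cost."""
    return min(options, key=lambda o: o.monetary_cost)

if __name__ == "__main__":
    options = [
        Option("hit the supercar", 2_500_000),    # hypothetical list price
        Option("hit the pedestrian", 2_000_000),  # a low-end 'value of life' estimate
    ]
    # Prints "hit the pedestrian": the rule 'kills rationally'.
    print(naive_choice(options).name)
```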
More troubling still, specifying a machine's outcome preferences, the rules of right and wrong, creates a reflexive problem: it suggests humans could be prosecuted for making the 'wrong' decision. "It is hard enough for humans to develop their own virtues, let alone developing appropriate virtues for computers" (Wallach and Allen 2008, p.8).
Appreciating how these decisions should be approached requires the identification of the troublesome 'grey area'. Experimental research could help us learn more about how human subjects behave in different domains, so that lessons can be applied to the demarcation of machine outcome preferences. Identifying systematic deviations from 'ruthless optimisation' allows machine outcome preferences to be updated in a way consistent with complex, domain-specific human processes. Consider Husserl's "notion of a lifeworld", whereby intelligent decision-making relies on a "background of things and relations against which lived experience is meaningful" (from Shanahan 2015). Identifying the target areas in which human subjective experience is needed helps us make AI more human where it matters.
Decision-making theory assumes homo economicus has perfectly rational preferences. One requirement of these preferences is transitivity. To have transitive preferences, a person, group, or society that prefers choice option A to B and B to C must prefer A to C:

A ≻ B and B ≻ C ⇒ A ≻ C
Preference transitivity can be tested by presenting subjects with a series of pairwise choices. Wherever preferences violate transitivity, irrationality is salient in the decision-making process. If similar violations arise repeatedly across the sample, they represent target areas where something 'human' is overriding the pure value judgement. To better understand whether nominal value helps or hinders the decision process, one treatment group could face 'price tags' on every choice whilst the other group makes subjective value choices.
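A minimal sketch of how such violations might be detected from recorded choices is given below. The data format (winner, loser pairs) and the three-option cycle check are my own illustrative assumptions, not a full experimental design.

```python
# Minimal sketch: detect transitivity violations in pairwise choice data.
# Choices are assumed to be recorded as (preferred, rejected) pairs;
# the triple-wise cycle check is an illustrative simplification.

from itertools import combinations

def transitivity_violations(choices):
    """Return triples (a, b, c) where a is preferred to b, b to c, but c to a."""
    prefers = set(choices)                              # (winner, loser) pairs
    options = {x for pair in choices for x in pair}
    violations = []
    for a, b, c in combinations(options, 3):
        # check every ordering of the triple for a preference cycle
        for x, y, z in [(a, b, c), (a, c, b), (b, a, c),
                        (b, c, a), (c, a, b), (c, b, a)]:
            if (x, y) in prefers and (y, z) in prefers and (z, x) in prefers:
                violations.append((x, y, z))
                break  # record each intransitive triple once
    return violations

if __name__ == "__main__":
    # Hypothetical subject data: prefers researcher to criminal, criminal to
    # CEO, but CEO to researcher -- an intransitive cycle.
    data = [("researcher", "criminal"), ("criminal", "ceo"), ("ceo", "researcher")]
    print(transitivity_violations(data))  # prints the intransitive triple
```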
As Mero (1990) notes, "those wishing to build artificial intelligence, all pry into the essential nature of reason" (p.1). Yet this top-down development approach of imposing human morals derived from decision data is not a panacea. Bottom-up approaches have also been proposed, whereby moral capacity evolves from general aspects of increasingly sophisticated intelligence (Wallach et al., 2008). Additionally, the simplicity of a laboratory experiment makes it an imperfect vehicle for specifying domain-based machine decisions. Finally, Shanahan (2010) encourages us to consider Metzinger's proscription: introducing scope for moral decisions into a machine's optimisation frame may concomitantly introduce the capacity to suffer and, with it, the requirement of rights or freedoms.
Despite these limitations, approaching AI research from a behavioural and experimental economics perspective allows us to better understand how we make decisions ourselves before trying to understand how an AI does the same. This approach represents a preliminary attempt at how we can begin to code the outcome preferences of artificial intelligence, ensuring, at worst, that we do not inadvertently cause our own extinction by carelessly publishing the value of our own lives online and, at best, that we align the goals of a human-level artificial mind with human-like decisions.