By David Driscoll
The 1960s was a great time to be a kid with a fondness for science fiction. All forms of sci-fi entertainment proliferated in the decade. Authors such as Isaac Asimov and Ray Bradbury regularly turned out new sci-fi stories, television offered sci-fi-themed series ranging from the ridiculous (Lost in Space) to the sublime (Star Trek), and several motion pictures that are classics of the genre were released. Among these, one stands out—2001: A Space Odyssey.
I was, in fact, a kid with a taste for sci-fi in the late 1960s, and no movie of the era impressed my like-minded friends and me more than 2001. For a time, within my juvenile social circle, you were nobody until you had seen it. It was unlike other science fiction movies we had seen because (among other things) a key character in the story was not a human or an alien but a highly developed artificial intelligence system called HAL 9000, short for “Heuristically programmed ALgorithmic computer.” Unlike other robotic computers we knew from sci-fi, HAL could conduct conversations in ordinary language in a human tone of voice, lie about its actions, sense threats to its existence (a capability memorably enhanced by an ability to read lips), and commit murder. In the film, after HAL has dispatched the other members of the crew of the spaceship Discovery One, Dr. Dave Bowman, who has survived HAL’s attempt to kill him, disables HAL by disconnecting its circuits; as its functioning ebbs, HAL recounts its origins in 1992 at the HAL Plant in Urbana, Illinois (widely taken as a nod to the University of Illinois’ Coordinated Science Laboratory, which was—and is—a major research center for advanced computing).
Thirty-plus years after HAL emerged from the lab in Urbana, we find ourselves suddenly—and at times alarmingly—with easy access to artificial intelligence (AI) tools that embody many of HAL’s best and worst characteristics. The media these days are full of accounts of the amazing capabilities of AI tools generally and of OpenAI’s ChatGPT in particular. ChatGPT and its rivals (such as Google’s Bard) can conduct extended “conversations” with users, generate comprehensive reports on many topics with remarkable speed, and even pass examinations required for admission to institutions of higher learning and professional licensure. The first versions of ChatGPT could pass examinations but often did so without much distinction; in January, a group of law professors at the University of Minnesota reported that ChatGPT had passed the final exams in four courses they taught but had done so with mediocre grades. In March, when OpenAI released version 4 of ChatGPT, the company reported that it performed far better than its predecessors on many standard exams, including the LSAT, the GRE Verbal and Quantitative sections, and the Uniform Bar Examination. The latest and perhaps most impressive feat performed by ChatGPT, as reported in an upcoming book by Isaac Kohane—a Harvard Medical School faculty member who is both a computer scientist and a physician—was to pass a medical licensing examination with a high score and accurately diagnose a complex illness in a pediatric patient. Kohane reacted with both amazement at what ChatGPT was capable of and concern that its advice—already available to millions without a knowledgeable human intermediary—is not always “safe or effective.”
Those concerns are not trivial. ChatGPT can provide answers that are spectacularly wrong at times. Its knowledge of historical events does not presently extend beyond September 2021, so at this writing ChatGPT will tell you that Elizabeth II is the reigning monarch in the United Kingdom and Boris Johnson is the nation’s prime minister. The latest version of ChatGPT is better than its predecessors at qualifying its responses with caveats about the limits of its historical information. Still, at times, it provides answers that are simply incorrect, and the inaccuracies are more often subtle than obvious. Also, it has been known to gather information from some of the more squalid precincts on the Internet, with the result that its output can incorporate racist or otherwise offensive content. AI tools have been improving with respect to the accuracy and overall quality of their responses, but their output must still be scrutinized closely before being used. For some humorous examples of the potential of AI tools to produce incorrect or otherwise unhelpful responses, I would refer readers to the blog https://www.aiweirdness.com/.
Despite their lingering imperfections, the new AI tools enjoy increasing use in the workplace, and some writers have speculated—at times in sensationalist tones—about their potential to displace humans from their jobs. Could this be an issue for actuaries? With the increasing skill of AI tools at passing standardized examinations, those in a profession whose primary criterion for admission is success on competitive exams may understandably have such concerns. In 2021, the Society of Actuaries published a report on “Emerging Technologies and their Impact on Actuarial Science,” which discussed the potential uses of machine learning and artificial intelligence models in actuarial applications. The list of such uses presented in this report seems modest compared to what is now apparently possible. At the same time, the report’s observations that “the technology today still requires a great deal of human oversight in the creation and application of models” and that the lack of transparency in some of them “may be too restrictive for certain uses that require explainable models” seem as pertinent as ever.
In an effort to test the readiness of ChatGPT for deployment in actuarial services, I asked it a couple of questions that might come up in the work of an actuary.
First, I asked it to write a report for a client whose pension plans were newly subject to the reporting requirements of Section 4010 of ERISA. Specifically, I asked it to explain what Section 4010 covered, what triggers its application, and how its reporting requirements are fulfilled. ChatGPT generated a nicely written, compact report that accurately addressed all these matters.
Second, I asked ChatGPT to complete a basic actuarial math calculation. I gave it the name of a well-known U.S. pension mortality table and an interest rate, and I asked it to use those two assumptions to calculate the present value of an annuity of $100 per month payable to a 60-year-old male. ChatGPT responded with a formula that readers will likely find familiar from the study of the theory of interest: PV = (1 − v^n)/i, where PV is the present value, i is the assumed interest rate, and v = 1/(1 + i). For those whose copies of Kellison’s The Theory of Interest are a bit dusty, this is the formula for the present value of an annuity-immediate of $1 per year for n years. ChatGPT then told me that, because it could not perform the calculation itself, I should take n to be the life expectancy of a 60-year-old male under the specified mortality table; I would then have everything I needed to answer my question. This answer might well seem plausible to a non-actuary, but of course it is incorrect. As many generations of students of life contingencies have learned, valuing a life annuity as an annuity-certain over the payee’s life expectancy does not produce the correct result. Moreover, there are details regarding the timing of the payments that I did not specify in my prompt and that would have to be known in order to answer the question correctly, and ChatGPT said nothing about them.
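The gap between the two methods is easy to demonstrate. The sketch below contrasts the correct actuarial present value of a life annuity-immediate with the annuity-certain shortcut ChatGPT proposed. The mortality rates are hypothetical placeholders of my own invention, not the published table I referred to in my prompt; only the method is the point.

```python
# Contrast the correct actuarial PV of a life annuity with the
# annuity-certain-over-life-expectancy shortcut.
# The mortality rates below are HYPOTHETICAL illustrative values,
# not any published table.

I = 0.05             # assumed annual interest rate
V = 1.0 / (1.0 + I)  # discount factor v = 1/(1 + i)

def q(age):
    """Hypothetical annual mortality rate q_x, rising with age."""
    return min(1.0, 0.01 * 1.09 ** (age - 60))

def survival_probs(age, max_age=120):
    """Return [p_0, p_1, ...] where p_t is the probability that a life
    aged `age` survives t full years (t p_x)."""
    probs, p = [], 1.0
    for t in range(max_age - age + 1):
        probs.append(p)
        p *= 1.0 - q(age + t)
    return probs

def life_annuity_immediate(age):
    """Correct actuarial PV of $1/year paid at each year-end while alive:
    a_x = sum over t >= 1 of v^t * (t p_x)."""
    tpx = survival_probs(age)
    return sum(V ** t * tpx[t] for t in range(1, len(tpx)))

def curtate_life_expectancy(age):
    """e_x = sum over t >= 1 of t p_x."""
    return sum(survival_probs(age)[1:])

def annuity_certain_shortcut(age):
    """ChatGPT's (incorrect) method: annuity-certain over life expectancy,
    PV = (1 - v^n)/i with n = e_x."""
    n = curtate_life_expectancy(age)
    return (1.0 - V ** n) / I

correct = life_annuity_immediate(60)
shortcut = annuity_certain_shortcut(60)
print(f"actuarial PV per $1/yr: {correct:.3f}; shortcut: {shortcut:.3f}")
```

Because the annuity-certain value is a concave function of its term, Jensen’s inequality tells us the shortcut overstates the true actuarial value; under these illustrative assumptions the overstatement is on the order of 7 percent.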
Consequently, while there are ample reasons to think that ChatGPT and the other emerging AI tools can be very helpful in the work of actuaries, there are also ample reasons to believe that their answers must be reviewed carefully before they are accepted and used. This leads to a broader question: What professional standards might apply to actuaries when using AI in the workplace?
In the Code of Professional Conduct, two precepts are particularly relevant. One is Precept 3, which requires that an actuary “shall ensure that Actuarial Services performed by or under the direction of the Actuary satisfy applicable standards of practice.” An actuary using AI tools would be well advised to treat their output as they would the work of a human subordinate who combines great efficiency with imperfect knowledge and judgment. The other is Precept 4, which states that an actuary “who issues an Actuarial Communication shall take appropriate steps to ensure that the Actuarial Communication is clear and appropriate to the circumstances and its intended audience, and satisfies applicable standards of practice.” AI tools have shown a remarkable ability to tailor the language of their responses to prompts that include some direction on that issue. However, because they are inanimate, they do not possess a reliable sense of nuance, of the particular sensitivities of those who read their output, or of their sophistication and ability to understand and act upon what they are told. Even technically correct output from an AI tool may require changes to address these considerations.
Finally, the AI tools that are so frequently in the news these days are known collectively as large language models. Actuaries using models are subject to the requirements of Actuarial Standard of Practice (ASOP) No. 56, Modeling. In the nomenclature of the ASOP, you are using a “model developed by others” if you utilize one of these tools in your work. Doing so triggers section 3.4 of the ASOP, which requires, among other things, that you make a reasonable attempt to understand how the model works and its “key strengths and limitations”—there is a lot in the press these days on these subjects. Additionally, section 4.1 requires that you disclose the extent of your reliance on such models in your reports.
Artificial intelligence is already changing, and will continue to change, how many professions work, and the actuarial profession will be no exception. By adhering to our professional standards while taking advantage of all these tools have to offer, we can deploy them to the maximum benefit of both ourselves and those who rely on our services.
DAVID L. DRISCOLL, MAAA, FSA, FCA, EA, is a past chairperson of the Actuarial Board for Counseling and Discipline.