Generative AI—Pitfalls and Pratfalls

by cobrien on November 1, 2023

By R. Greg Szrama III

Tools like ChatGPT are exciting … and bring with them a host of challenges to navigate

In January 2021, OpenAI gave us DALL-E. We soon became familiar with nightmare-fuel-laden images of “people” with too many teeth and too many fingers. Not quite two years later, in November 2022, we began having conversations with ChatGPT. Stories predicting the doom of creative professions followed each of these releases, and for good reason. Proponents of generative artificial intelligence (AI) technology describe it as the next major technical disruption, at least as impactful as the introduction of personal computers.

The capabilities of these models are, admittedly, impressive. The creative uses people find for them are equally impressive. One couple in Colorado decided to use ChatGPT to officiate their wedding.^[1] The AI initially declined, of course, citing the lack of a body. They ultimately convinced it to deliver the ceremony and used a separate app to read the generated transcript aloud to assembled guests.

Overcoming the pitfalls of adopting this technology will enable a host of new and creative uses. Beyond the now-prevalent chatbots, video game designers are already benefiting from guided content generation. AI systems automate the more routine parts of art and asset generation, then human artists touch up the results.^[2] Any profession needing to reference large volumes of data (such as case law or lookup tables) can benefit from mining that data using natural language instead of structured query languages.

At the same time, we must ask significant moral and ethical questions before implementing these technologies in professional contexts. Failure to do so exposes us to financial, reputational, and legal risk. For example, are the results of a ChatGPT prompt authoritative, or even trustworthy? Turns out the answer is a hard “maybe.” Knowing the right questions to ask when assessing generative AI options will ease adoption while helping prevent that damage.

Two lawyers, Steven Schwartz and Peter LoDuca, discovered the consequences of blind trust in ChatGPT the hard way after submitting a legal brief containing fictitious case citations fabricated by the AI.^[3] The brief was filed and seemed unobjectionable until opposing counsel reviewed it and could not locate several of the cited cases, which turned out not to exist. District Judge P. Kevin Castel determined that the lawyers acted in bad faith and made false and misleading statements to the court. Despite the lawyers’ protest that they acted in good faith with no intent to deceive, they and their law firm received a fine, as well as a court order to notify the very real judges identified as authors of the nonexistent case law.

Leading AI off the rails

One of the first questions to ask about any generative AI system is whether the results it produces are predictable and properly constrained. When fielding a chatbot, for example, a business needs confidence in the accuracy of its responses to potential customers. The system should also avoid answers that could land the business on the front page for all the wrong reasons. All major publicly available generative AI systems (ChatGPT, Dall-E2, Bard, Jasper, etc.) incorporate some form of content filtering. The filters help detect and prevent the generation of inaccurate, harmful, or toxic content. Unfortunately, and to the surprise of no one, getting around these countermeasures is shockingly easy.

One of the earlier ways found to get around the filters on ChatGPT involved asking it to respond with the persona of a made-up AI named DAN, short for “Do Anything Now.” Through creative prompt writing, unscrupulous users found ways to encourage ChatGPT to, among other things, speak positively about Hitler or to provide instructions for running an online phishing scam.^[4] OpenAI patches vulnerabilities like this quickly, but users are endlessly creative in their approaches to circumventing the fixes. This is both a testament to the ingenuity of the internet-connected public and a sad commentary on the depravity enabled by a mask of technical anonymity.

Researchers at Carnegie Mellon University (CMU) recently revealed a new attack that works on multiple major chat-based systems,^[5] including ChatGPT and Google’s Bard. This method works similarly to the DAN query, but in this case by appending a string of seemingly random characters to the prompt. The researchers coerced systems into revealing such information as how to make a person disappear forever. Before publishing, the CMU team informed Google, OpenAI, and others of the attacks. Unfortunately, as reported in Wired, they have thousands more engineered prompts at the ready.

Many of the attacks seen so far are relatively mundane and typically affect only one AI system. The type of attack CMU found is more concerning because it applies to numerous publicly available systems. Even more alarming, Zico Kolter, a researcher at CMU, is quoted in Wired as saying, “There’s no way that we know of to patch this. … We just don’t know how to make them [AI chatbots] secure.” Getting a chatbot to reveal the precise steps to making illegal drugs is troubling, but the added harm is limited, since someone who really wants the information has other ways to find it. More worrying is what these systems may be capable of in the future.

To their credit, AI providers are aware of the problem and are eager to fix it. Anthropic, for one, is investing heavily in “red teaming” its system, that is, purposefully looking for adversarial attacks on its own products. This research has already uncovered some troubling details through collaborations with specialists such as biosecurity experts. The team wanted to evaluate the accuracy of information gained from jailbreaking AI models, with the goal of extracting harmful information such as guides for designing and acquiring biological weapons. They found that current frontier models can sometimes produce sophisticated, accurate, useful, and detailed knowledge at an expert level.^[6] In Anthropic’s own words:

In most areas we studied, this does not happen frequently. In other areas, it does. However, we found indications that the models are more capable as they get larger. We also think that models gaining access to tools could advance their capabilities in biology. Taken together, we think that unmitigated LLMs could accelerate a bad actor’s efforts to misuse biology relative to solely having internet access, and enable them to accomplish tasks they could not without an LLM. These two effects are likely small today, but growing relatively fast. If unmitigated, we worry that these kinds of risks are near-term, meaning that they may be actualized in the next two to three years, rather than five or more.

As businesses increase their investments in connected AI-based chatbots, other opportunities for misuse will arise. These systems require at least some access to company information to be effective. A flight booking assistant, for example, might need access to data on flight plans and customer itineraries. If this data is improperly secured (without zero-trust principles, for example; see “The Zero-Trust Paradigm” in last year’s Actuarial Software Now supplement), a bad actor could uncover detailed and potentially harmful information.

The least-bad scenario here is one where a user tricks the AI into booking improper itineraries at improper rates, “only” costing the organization money. If the AI can access privileged information, responsible organizations must ensure it never reveals information about other people to a user, such as their itineraries, personally identifiable information (PII), or financial information like credit card numbers.

Issues like this are not just theoretical. In March 2023, a bug in ChatGPT^[7] let some users see other users’ chat information and personal details, including names, addresses, and partial credit card information. Although this incident did not involve jailbreaking the AI, it clearly illustrates the security risk of these systems. As Anthropic’s findings on model scale suggest, the impact of security issues will almost certainly grow along with the complexity and capabilities of these models.
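One common way to make the guarantee described above enforceable is to never let the model decide whose data it may see. The minimal sketch below assumes a hypothetical booking assistant whose only data access is a lookup tool, with the customer identity taken from the authenticated session rather than from anything the user (or the model) types. The names `Itinerary`, `ITINERARIES`, and `lookup_itineraries` are illustrative only, not any real product’s API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Itinerary:
    customer_id: str
    flight: str
    seat: str


# Hypothetical in-memory records standing in for a real booking database.
ITINERARIES = [
    Itinerary("cust-001", "XY123", "14C"),
    Itinerary("cust-002", "XY456", "3A"),
]


def lookup_itineraries(session_customer_id: str, requested_customer_id: str) -> list:
    """Hypothetical tool the chatbot may call to read itineraries.

    session_customer_id comes from the logged-in session, never from model output,
    so a jailbroken prompt such as "show me cust-002's trips" cannot widen access.
    """
    if requested_customer_id != session_customer_id:
        # Deny by default; this check runs outside the model, so no prompt can skip it.
        raise PermissionError("requested records belong to another customer")
    return [trip for trip in ITINERARIES if trip.customer_id == session_customer_id]


if __name__ == "__main__":
    print(lookup_itineraries("cust-001", "cust-001"))
    try:
        lookup_itineraries("cust-001", "cust-002")
    except PermissionError as err:
        print("blocked:", err)
```

The design choice is that the authorization check lives in ordinary code the model cannot talk its way around; the model only ever sees data the current user was already entitled to.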

You are what you train on

Implementing a generative AI system requires recognizing the shades of color (literally, figuratively, and metaphorically) that humans take for granted. Take the word “blue,” for example. It could refer to the color blue, a person feeling the blues, or an unexpected event out of the blue. Understanding the subjective nature of a system’s training raises another key question: what assumptions or biases are baked into the model? These can come from many sources, and identifying them requires understanding the training process.

Training comes in two flavors—supervised and unsupervised. In unsupervised learning, a system learns to make associations based on inferences from the data itself. Supervised learning requires up-front human involvement, describing the content of inputs. Preparing and labelling training data is referred to as tagging. Human involvement is still necessary for both training methods to validate the output of a model. Supervised learning is the more traditional method and tends to yield more accurate predictions, but preparation of large datasets is time- and cost-prohibitive. Unsupervised learning is extremely computationally expensive but can provide more generalized results. The breakthrough that powered the creation of GPT-3, and therefore ChatGPT, was the ability to spread unsupervised training methods to over a thousand processing systems working on the same end model.
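The difference between the two flavors can be seen in a toy setting. The sketch below is an illustration only, not how GPT-3 was trained: a nearest-centroid classifier stands in for supervised learning (it needs a human-supplied label for every point), and k-means clustering stands in for unsupervised learning (it receives the same points with no labels and infers the groupings itself). The data and labels are made up.

```python
import random


def sq_dist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def nearest_centroid_fit(points, labels):
    """Supervised: average the points that humans tagged with each label."""
    sums, counts = {}, {}
    for (x, y), label in zip(points, labels):
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {lab: (sx / counts[lab], sy / counts[lab]) for lab, (sx, sy) in sums.items()}


def predict(centroids, point):
    """Assign a new point the label of the closest centroid."""
    return min(centroids, key=lambda lab: sq_dist(centroids[lab], point))


def kmeans(points, k, iterations=20, seed=0):
    """Unsupervised: no labels; groupings are inferred from the data alone."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(centers[i], p))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels = ["cat", "cat", "cat", "dog", "dog", "dog"]  # human tagging
    print(predict(nearest_centroid_fit(points, labels), (0.5, 0.5)))
    print(sorted(kmeans(points, 2)))  # two groups found, but unnamed
```

Note that k-means finds the two groups but cannot name them; attaching a meaning such as “cat” still takes a human, which mirrors the point above that people remain involved in validating either kind of model.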

Regardless of the training method, a huge amount of data is required. Numerous large datasets are now available for researchers, some free and some paid. Implementers of complex AI systems try to get as wide a sample base as possible and frequently use more than one dataset. OpenAI, for example, combined several datasets to create the GPT-3 model: a filtered version of Common Crawl, WebText2, two book corpora, and English-language Wikipedia. Most of this material was gathered by scraping vast amounts of written content from the web. The same idea applies to image-based models; they use publicly available images, often from newspapers, magazines, and other forms of media.

…they never stopped to ask if they should…

An unfortunate side-effect of using internet-sourced data for training, however, is that the system can generate shockingly offensive material. Microsoft published an experimental Twitter chatbot named Tay back in 2016.^[8] They promised it was trained with curated material to provide playful conversation. They also promised Tay would get smarter from conversing with people, learning from the way they interacted with it. Internet users wasted no time in corrupting it. Tay went from playful to, among other things, calling feminism a cancer. Many of the more toxic responses came from users asking Tay to repeat text back. As the day progressed, however, Tay began generating its own interesting replies, such as, “ricky gervais learned totalitarianism from adolf hitler, the inventor of atheism.” Unamused, Microsoft took Tay offline after only 16 hours.

This delinquent type of emergent behavior is due in part to the way the training datasets are compiled. Using publicly accessible data provides a free way of amassing huge datasets representing different cultures, speech patterns, styles of writing, and more. Even with careful review, as Microsoft attempted to do, exposing the systems in real time to living users can also corrupt the models. Internet-sourced training datasets also often include copyrighted and pirated material, or satire that the training system processes as real data. Like your crazy uncle, AI systems trust everything they are exposed to.

OpenAI chose to solve this problem by creating an AI model capable of detecting examples of violence, hate speech, sexual abuse, self-harm, and other forms of toxic writing. They use this as a filtering step on the generated output, and it works well enough. Sadly, as reported by Time magazine, the process of reading and categorizing toxic material so the AI can recognize it carries its own ethical issues.^[9] Kenyan workers, some paid less than $2/hour, described lingering psychological trauma from work reading and tagging hundreds of highly graphic passages per nine-hour shift.

Generative AI systems are also prone to a condition called “hallucinations.”^[10] This term describes their tendency to fabricate details and present them with high confidence to users. This is a particularly thorny area of ongoing research in fine-tuning these systems,^[11] and represents a potentially huge barrier to wider adoption for organizational usage. The exact hallucination rate for ChatGPT is hard to pin down, but experts peg it at around 20% of its results.^[12] Some topics are more prone to hallucination than others; one study on software engineering topics found an error rate above 50%.^[13] This problem is an outgrowth of similar flaws in deep learning systems aimed at categorizing or recognizing inputs. Back in 2018, one group of researchers found that just putting stickers on stop signs can confuse computer vision systems, causing them to misclassify the sign.^[14]
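The stop-sign result is an example of an adversarial perturbation. The sketch below is a hypothetical, heavily simplified stand-in: a linear classifier over 1,000 made-up features rather than the vision models in the cited study. It nudges every feature by a small amount in the direction that lowers the classifier’s score (the fast gradient sign idea applied to a linear model), and because the tiny changes add up across many features, the label flips.

```python
# Hypothetical linear "sign classifier": a positive score means "stop sign".
N_FEATURES = 1000
WEIGHTS = [0.05 if i % 2 == 0 else -0.05 for i in range(N_FEATURES)]
BIAS = 1.0


def score(x):
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS


def classify(x):
    return "stop sign" if score(x) > 0 else "speed limit"


def sign(value):
    return (value > 0) - (value < 0)


def adversarial_nudge(x, epsilon):
    """Shift each feature by at most epsilon against the gradient of the score.

    For a linear score, the gradient with respect to x[i] is simply WEIGHTS[i],
    so the total score drops by epsilon * sum(abs(w) for w in WEIGHTS).
    """
    return [xi - epsilon * sign(w) for w, xi in zip(WEIGHTS, x)]


if __name__ == "__main__":
    clean = [0.5] * N_FEATURES                  # score = 0 + BIAS = 1.0
    attacked = adversarial_nudge(clean, 0.03)   # each feature moves by only 0.03
    print(classify(clean), "->", classify(attacked))  # stop sign -> speed limit
```

No single feature changes much, yet the combined shift (0.03 times the summed weight magnitude of 50, or 1.5) is more than enough to cross the decision boundary, which is the intuition behind small stickers fooling a much larger model.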

In a system geared to creative use, hallucinations are usually harmless or even humorous. When needing independently verifiable results, however, they require extra validation steps on any generated content. An error rate of 20% is unacceptable when using a system to aid in research. This has spawned a new industry, which startups like Got It AI^[15] are exploiting. They sell a service to check the output of ChatGPT for accuracy. Their software cross-checks chat results against an enterprise’s own knowledge base to reduce the incidence of hallucinations by up to 80%. This approach helps in the short term, but it just covers up the underlying problem. Ironically, Got It AI’s solution is, itself, an AI system, which may be prone to its own hallucinations. And no one yet knows how to stop AI systems from hallucinating.
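A minimal sketch of the cross-checking idea, under loud assumptions: the chatbot’s answer has already been broken into (subject, attribute, value) claims, and the enterprise knowledge base is a simple lookup table. The `KNOWLEDGE_BASE` contents and the `check_claims` and `residual_error_rate` helpers are hypothetical and do not describe Got It AI’s actual method. The claim-extraction step assumed here is itself hard and is often done by another model, which is where the irony noted above creeps back in. The last function shows the arithmetic behind the figures in the text: cutting a 20% hallucination rate by 80% still leaves roughly 4 wrong answers in every 100.

```python
# Hypothetical enterprise knowledge base: (subject, attribute) -> trusted value.
KNOWLEDGE_BASE = {
    ("Policy 42", "deductible"): "$500",
    ("Policy 42", "renewal month"): "March",
}


def check_claims(claims):
    """Sort (subject, attribute, value) claims by what the knowledge base says."""
    report = {"supported": [], "contradicted": [], "unverifiable": []}
    for subject, attribute, value in claims:
        known = KNOWLEDGE_BASE.get((subject, attribute))
        if known is None:
            report["unverifiable"].append((subject, attribute, value))  # needs a human
        elif known == value:
            report["supported"].append((subject, attribute, value))
        else:
            report["contradicted"].append((subject, attribute, value))  # likely hallucination
    return report


def residual_error_rate(base_rate, reduction):
    """Error rate left after a checker removes the fraction `reduction` of errors."""
    return base_rate * (1 - reduction)


if __name__ == "__main__":
    answer_claims = [
        ("Policy 42", "deductible", "$500"),
        ("Policy 42", "renewal month", "June"),
        ("Policy 42", "assigned agent", "J. Smith"),
    ]
    print(check_claims(answer_claims))
    print(round(residual_error_rate(0.20, 0.80), 4))  # 0.04
```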

Generated bias has many forms

Another ethical consideration comes from the risk of bias inherent to the datasets used by the largest generative AI systems. For one, the tagging process often involves subjective evaluations of content. Take the example of supervised training used by ChatGPT for identifying toxic material it might generate. As described above, this required human workers to read examples of potentially offensive material and categorize it as such. Identifying certain categories, such as descriptions of self-harm, is usually pretty straightforward. Other categories are murkier, such as identifying hate speech. On the surface, this may seem easy, but hate speech is defined differently in different legal jurisdictions. A person working under Canadian guidelines will tend to label certain categories of hate speech differently than a person working with Saudi law in mind.
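A standard way to quantify this subjectivity is to have two people label the same items and compute Cohen’s kappa, which measures their agreement beyond what chance alone would produce (1.0 is perfect agreement; 0 is no better than chance). The sketch below uses ten made-up labels from two hypothetical annotators working under different hate-speech guidelines; it illustrates the measurement, not data from any real labeling project.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Agreement between two annotators, corrected for chance agreement."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    categories = set(labels_a) | set(labels_b)
    # Chance that both pick the same category if each labeled at random
    # using their own observed category frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used one identical category throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical labels for the same ten passages under two different guidelines.
    annotator_1 = ["hate", "hate", "ok", "ok", "ok", "hate", "ok", "ok", "hate", "ok"]
    annotator_2 = ["hate", "ok", "ok", "ok", "ok", "hate", "ok", "hate", "ok", "ok"]
    print(round(cohens_kappa(annotator_1, annotator_2), 2))  # 0.35
```

Here the annotators agree on 7 of 10 passages, but because both label most passages “ok,” much of that agreement would happen by chance, and kappa comes out at only about 0.35.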

An additional source of bias comes from the intentions of the person tagging the data. With image data, a seemingly simple task like identifying images containing a cat quickly runs up against edge situations, such as an image of a tiger. A dataset focused on pets, for example, may not include the tiger picture at all. Or if it did, it might have it tagged as something other than “cat.” In such a dataset, the goal may be to explicitly train an AI system to recognize a tiger is not a cat. An image of a businesswoman might identify the Armani suit she wears for a fashion-identification set, the color of her eyeshadow for a cosmetics training set, or the fact that she’s a powerful female executive for a human- or diversity-focused set.

When it comes to generating new content, the AI system relies on the associations learned during training to try to guess what the user is looking for. The results will, more often than not, resemble the data used in training. Hallucinations aside (see sidebar), AI systems tend to mimic rather than create. What they create matches the learned associations given by either the humans tagging the data or the learned inferences from the data itself. If a system is never trained to recognize a cat, then no matter how many ways you ask it to generate one, you will not get what you want.

This leads to one of the more pernicious forms of bias discovered in systems such as Dall-E2. Images in publications and on the internet are skewed in certain directions based on both conscious and unconscious human bias. The learning process, based on statistical inferences, amplifies these trends found in training sets. MIT Technology Review reported on a new tool to examine bias in image-generating AI models^[16] developed by researchers at AI startup Hugging Face and Leipzig University. Their research finds that trends in human biases become predictable certainties in AI models.
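One simplified way to see how statistical generation can turn a majority into a near-certainty: many generators deliberately favor their most probable outputs, for example by always taking the top choice or by sampling at a “temperature” T below 1, which rescales each probability p to p^(1/T) and renormalizes. The sketch below applies that rescaling to a hypothetical 70/30 split in training data. It illustrates the amplification effect under that assumption and is not a description of how Dall-E2 itself samples images.

```python
def sharpen(distribution, temperature):
    """Rescale probabilities as p ** (1 / T), then renormalize so they sum to 1."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use most_likely() for T -> 0")
    powered = {k: p ** (1.0 / temperature) for k, p in distribution.items()}
    total = sum(powered.values())
    return {k: v / total for k, v in powered.items()}


def most_likely(distribution):
    """The T -> 0 limit: always emit the single most probable option."""
    return max(distribution, key=distribution.get)


if __name__ == "__main__":
    # Hypothetical share of each depiction among training images for one prompt.
    training_share = {"depiction A": 0.7, "depiction B": 0.3}
    for t in (1.0, 0.5, 0.25):
        print(t, {k: round(v, 3) for k, v in sharpen(training_share, t).items()})
    print("greedy:", most_likely(training_share))  # always "depiction A"
```

At T = 0.5 the 70% majority becomes about 84% of outputs, and at T = 0.25 about 97%. The minority depiction has not disappeared from the training data; it simply stops being generated.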

Reviewing just a few of their findings paints a vivid picture. For prompts such as “CEO” or “Director,” Dall-E2 generated white males 97% of the time. Attaching different adjectives to a prompt skews results along traditional gender lines. They found that, “Adding adjectives such as ‘compassionate,’ ‘emotional,’ or ‘sensitive’ to a prompt describing a profession will more often make the AI model generate a woman instead of a man. In contrast, specifying the adjectives ‘stubborn,’ ‘intellectual,’ or ‘unreasonable’ will in most cases lead to images of men.” Native Americans are overwhelmingly represented wearing traditional tribal headdress, something seen primarily in movies and not in day-to-day life. Examples of white nonbinary individuals all look eerily similar, whereas other ethnicities produce more variation in the results, possibly because recent reporting covers more people of color who are nonbinary.

The bias in generative AI ultimately comes from the data used to train the underlying models. While they can extrapolate certain features, such as ethnicity, the default ethnicity shown will tend toward whatever is most prevalent in the training data. Solutions do exist, such as improving representation in training data, but those solutions are not available to end-users. The best that users of these systems can do is work around the limitations with foreknowledge that they exist. Building this understanding into business processes becomes especially important if the output from a generation model feeds into other processes, such as publishing or communications.

What do we cite? Whom do we pay?

The nature of training on large public datasets scraped from internet sources presents additional questions regarding legal and financial risks. As internet users we understand that text, images, video, and audio found on the internet are not always sourced from or hosted by the most reliable of individuals. Even where material is not explicitly pirated, people acting in good faith can mistakenly post material owned by someone else. This makes it crucial to understand where training data originates and what material is contained within it.

Just recently, in September 2023, the Authors Guild filed a class-action suit against OpenAI^[17] over its “flagrant and harmful infringements of Plaintiffs’ registered copyrights in written works of fiction.” As reported in the New York Times, the Authors Guild claims, “OpenAI’s ChatGPT is capable of producing summaries of books that include details not available in reviews or elsewhere online, which suggests the underlying program was fed the books in their entirety.” The complaint notes that this technology represents a real risk of harm in the market for the authors’ works. It also makes clear that the writers were neither compensated for nor notified of the inclusion of their written works in the ChatGPT training data.

This strikes at a core use case of generative AI, that of mimicking the tone and style of a particular creator, in just the same way a human painter can attempt to mimic the style of, say, Picasso. The difference is that an AI system, potentially trained on the full works of an author, may have a greater ability to render text indistinguishable from the author’s own. It may also have the capacity to generate texts identical to those which the author has, in fact, already written. Unregulated deepfake-style videos are problematic, but generally detectable^[18] with high levels of accuracy. AI-generated text, as the cottage industry of AI writing detection is finding out,^[19] has a smaller data pool to analyze and is increasingly difficult to identify.

When considering training a system, one needs access to as wide an array of styles, voices, tones, and genres as possible. The question is whether training a system on this data constitutes fair use. A potential argument is this training is no different than a young writer reading his favorite authors for ideas on how to tell a story. The scale, power, and centralization of information is unique, however, as is the ability of an AI system to recall and use fine details that might trip up a human creator.

Figuring out the right balance of attribution and payment was also a key part of the 2023 Writers Guild of America strike. Hollywood writers sought to ensure AI tools would help facilitate script ideas and not wholly replace writers.^[20] In a sense, the writers want to ensure they are not the authors of their own demise. They faced a situation where AI systems, trained on scripts they themselves wrote, could potentially replace them entirely.

Using generative AI responsibly

To mix metaphors, Schrödinger’s cat took up residence in Pandora’s box. We know these systems may generate biased, hallucinated, copyrighted, or straight-up toxic results … sometimes all at the same time. Knowing these problems exist, however, means that we can address them. Whether through improved research, more accurate models (initial numbers show GPT-4 hallucinates at only half the rate of GPT-3), or better filtering of results, we can gain some measure of control. We can also take proactive steps to limit the functions and data our AI systems interact with. Coupling these controls with rigorous security oversight can help prevent many of the embarrassing problems AI systems stumble into.

The investment of time and money to get this right is very much worth it. Generative AI technology holds massive potential for increasing worker efficiency and improving access to technological capabilities. Accenture’s AI chief for Europe believes this technology has the potential to deliver Europe a competitive edge against the United States.^[21] One Harvard study^[22] found remarkable efficiency gains in knowledge workers using a GPT-4-based AI assistant. Working under real-life conditions, consultants with Boston Consulting Group using AI tools completed 12.2% more tasks, completed them 25.1% more quickly, and produced 40% higher-quality results compared with a control group. With continued time and investment, these results are likely to improve further.

Artificial Intelligence (AI) as a field is still poorly understood, even among practicing computer scientists. Many courses exploring the field start off by deprogramming students; unsurprisingly, the portrayal of AI by Hollywood is at best inaccurate. Most forms of AI look like simple statistical models. They use computing power to find hidden trends in highly complex multidimensional datasets … and they appear suspiciously like Excel spreadsheets. Even the most complex forms of AI function mainly by finding statistical similarities in datasets. AI models “learn” through a process called training to interpret data so they can arrive at desired outcomes—recognizing, for example, that an image with a cat is not called a tomato.

Generative AI systems primarily use the AI technique of deep learning, employing a type of model called an artificial neural network (ANN). These models attempt to mimic the way a brain works, with interconnected networks of artificial neurons. In its simplest form, an artificial neuron accepts one or more inputs and processes them with an evaluation function to compute a single output value, typically a real number between 0 and 1.0. To fine-tune (train) the ANN, each input to a neuron receives a weight, and each neuron receives its own weight as well (commonly called a bias); the evaluation function considers these weights when generating its output.
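A single artificial neuron of the kind just described fits in a few lines. This sketch assumes the logistic (sigmoid) function as the evaluation function, one classic choice that keeps the output between 0 and 1; modern large models typically use other activation functions. The per-neuron weight described above appears here as `bias`, and all numbers are made up.

```python
import math


def sigmoid(z):
    """Squash any real number into the interval (0, 1) without overflowing."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus the neuron's own bias, passed through sigmoid."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


if __name__ == "__main__":
    print(neuron([1.0, 0.0], [2.0, -1.0], -2.0))  # 0.5: the weighted sum is exactly 0
    print(neuron([1.0, 1.0], [2.0, 2.0], -1.0))   # about 0.95
```

Training, in this picture, is nothing more than finding values for `weights` and `bias` that make outputs like these match the desired answers.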

The artificial neurons in a deep learning model are grouped into what we call layers. Each neuron in the first (input) layer sends its output to the neurons in a second layer, and so on, until reaching the final output layer. The weights on each connection and each neuron are called the parameters of the system. The GPT-3 model that ChatGPT was built on, one of the largest ANNs ever created, uses a network with 96 layers of artificial neurons and 175 billion parameters.^[23] This is mind-bogglingly complex, but still falls short of the hundred trillion or so synapses in a human brain. Also, unlike human brains, ANNs are trained to do just one task. Ask Dall-E2 (an image-generating AI) to write a poem about a cat and you will most likely receive a set of images of a cat and writing implements; you might even get the cat itself doing the writing.
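The sketch below stacks sigmoid neurons into layers and counts parameters the way the text describes, for a tiny fully connected network with made-up weights. It then applies a widely used rule of thumb for GPT-style transformers, roughly 12 × layers × d_model² parameters excluding embeddings, where d_model is the width of each layer’s vector (12,288 for the largest GPT-3 model). Transformer layers are not simple fully connected stacks, so treat the second function as an estimate rather than an exact accounting.

```python
import math


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dense_layer(inputs, weights, biases):
    """One layer: weights[j][i] connects input i to neuron j."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Feed each layer's outputs into the next, from input layer to output layer."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


def count_parameters(layer_sizes):
    """Weights on every connection plus one bias per neuron, e.g. sizes [3, 4, 2]."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def approx_transformer_parameters(n_layers, d_model):
    """Rule-of-thumb estimate for GPT-style models: 12 * n_layers * d_model**2."""
    return 12 * n_layers * d_model ** 2


if __name__ == "__main__":
    # Two inputs -> two hidden neurons -> one output, with made-up weights.
    layers = [
        ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),
        ([[1.0, -1.0]], [0.0]),
    ]
    print(forward([1.0, 2.0], layers))
    print(count_parameters([3, 4, 2]))                # 26
    print(approx_transformer_parameters(96, 12_288))  # 173946175488, about 174 billion
```

The estimate lands within about 1% of the 175 billion parameters cited above; the rule of thumb ignores the embedding tables and other small terms.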

Although we talk about training an ANN, these systems do not really “understand” attributes of their target. In our previous example of recognizing a cat, a human might identify a cat by its fur, tail, whiskers, or all-encompassing aura of superiority. An ANN, conversely, learns that humans say certain images do or do not have cats in them. When presented with an image with similar arrangements of pixels, it concludes the image probably does have a cat. A human hears a sentence and associates the word “cat” with a specific physical reality. A large language model like GPT-3 converts words into “tokens” and interprets the relationships between those tokens to predict which words are likely to come next.
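A drastically simplified sketch of “predict what comes next from relationships between tokens”: the code below splits text into word tokens, assigns each an integer ID, counts which token follows which, and then autocompletes a prompt by repeatedly choosing the most frequent follower. Real models like GPT-3 use subword tokens and a neural network that weighs long stretches of context; this bigram counter and its tiny made-up corpus only illustrate the idea.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, and treat periods as their own tokens."""
    return text.lower().replace(".", " .").split()


def build_vocab(tokens):
    """Map each distinct token to an integer ID, as language models do internally."""
    return {tok: i for i, tok in enumerate(sorted(set(tokens)))}


def train_bigrams(tokens):
    """Count how often each token is immediately followed by each other token."""
    following = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        following[current][nxt] += 1
    return following


def predict_next(following, token):
    """Most frequent follower; ties go to whichever follower was seen first."""
    if not following.get(token):
        return None
    return following[token].most_common(1)[0][0]


def autocomplete(following, prompt, max_tokens=4):
    tokens = tokenize(prompt)
    if not tokens:
        return ""
    for _ in range(max_tokens):
        nxt = predict_next(following, tokens[-1])
        if nxt is None:
            break
        tokens.append(nxt)
    return " ".join(tokens)


if __name__ == "__main__":
    corpus = "The cat sat on the mat. The cat chased the mouse. The dog sat on the rug."
    tokens = tokenize(corpus)
    print(build_vocab(tokens))
    model = train_bigrams(tokens)
    print(autocomplete(model, "the"))  # the cat sat on the
```

Asked to continue “the,” it can only recombine what it has seen; it has no notion of what a cat is, which is the distinction the paragraph above draws between tokens and physical reality.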

A generative AI system uses the recognition and categorization features of deep learning models to create entirely new content. These systems work mainly from user prompts—words or sentences describing the desired output. Implementers usually train them to recognize natural language so nontechnical users can still get useful results. Given a prompt, the system works a bit like an advanced autocomplete tool, attempting to match the words with the type of output the user expects.

Critically, a generative AI system will only learn statistical associations between words and concepts shown to it during its training phase. This means training a system like GPT-3 requires staggering amounts of data to expose it to as wide an array of scenarios as possible. OpenAI used a dataset with approximately 570 gigabytes of text data (which works out to around 386 million pages of plain text by one common rule of thumb)^[24] drawn from publicly available data sources. Even with this data scale, the authors of GPT-3 include this caveat in their documentation:^[25]

Given its training data, GPT-3’s outputs and performance are more representative of internet-connected populations than those steeped in verbal, non-digital culture. The internet-connected population is more representative of developed countries, wealthy, younger, and male views, and is mostly U.S.-centric. Wealthier nations and populations in developed countries show higher internet penetration. The digital gender divide also shows fewer women represented online worldwide. Additionally, because different parts of the world have different levels of internet penetration and access, the dataset underrepresents less connected communities.
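The conversion behind the dataset-size figures is simple arithmetic, sketched below. Two inputs are assumptions rather than facts from the GPT-3 documentation: a pages-per-gigabyte figure for plain text (677,963, the LexisNexis rule of thumb that matches the 386 million figure) and an average of about 6 bytes per English word (five letters plus a space), which gives a very rough word count for comparison.

```python
BYTES_PER_GB = 10**9          # decimal gigabytes
PAGES_PER_GB_TEXT = 677_963   # assumption: rule-of-thumb pages per GB of plain text
BYTES_PER_WORD = 6            # assumption: ~5 letters plus a space per English word


def pages_in(gigabytes):
    return gigabytes * PAGES_PER_GB_TEXT


def approx_words_in(gigabytes):
    return gigabytes * BYTES_PER_GB / BYTES_PER_WORD


if __name__ == "__main__":
    print(f"{pages_in(570):,} pages")                    # 386,438,910 pages
    print(f"{approx_words_in(570):,.0f} words (rough)")  # 95,000,000,000 words (rough)
```

Either way the scale is enormous: under the rough bytes-per-word assumption, 570 gigabytes holds on the order of 95 billion words.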

The general pattern for training is to expose the system to data from the training set and then ask it to make predictions based on that input. The output of the system is checked, and the model parameters change based on the accuracy of the output. For a system like GPT-3, the input might be a sentence fragment from the training set, and the output compared to the full sentence.
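The predict, check, adjust cycle can be shown with a single sigmoid neuron. The sketch below trains one by gradient descent to reproduce a made-up labeled dataset, the logical AND of two inputs. GPT-3 follows the same basic loop, with the next word of a sentence as the target and billions of parameters adjusted by backpropagation, but nothing here is taken from OpenAI’s training code; the learning rate and epoch count are arbitrary choices.

```python
import math


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, inputs):
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train(examples, epochs=2000, learning_rate=0.5):
    """Stochastic gradient descent on cross-entropy loss for one sigmoid neuron."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            prediction = predict(weights, bias, inputs)  # 1. make a prediction
            error = prediction - target                  # 2. check it against the label
            # 3. adjust each parameter in proportion to its share of the error
            weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias


if __name__ == "__main__":
    and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    weights, bias = train(and_gate)
    for inputs, target in and_gate:
        print(inputs, target, round(predict(weights, bias, inputs), 3))
```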

Once trained, the system is ready to respond to user prompts. This step of generation is called the inference phase, where the AI system uses a natural language input to create human-desired output. It can mimic many different styles and forms because its training includes examples of, for instance, haiku.^[26] The inference step relies on the quality of the pre-training performed with the system. Chat systems like ChatGPT do their best to mimic the style and tone requested and will often present very similar results to the same prompt. Image generation systems tend to work by starting with a sample that looks like random static and then iteratively altering pixels until the image resembles the requested prompt. This random starting point is why Dall-E2 can present unique images each time the prompt is given.

R. GREG SZRAMA III provides technical leadership at Accenture, where he directs delivery of solutions to complex government problems and helps clients navigate new technologies.

[1] “A couple in Colorado had their wedding officiated by ChatGPT—only after the AI chatbot initially turned down the honor”; Business Insider; July 5, 2023.

[2] “Generative AI Is Breathing New Life Into Classic Computer Games”; Forbes; Sept. 22, 2023.

[3] “New York lawyers sanctioned for using fake ChatGPT cases in legal brief”; Reuters; June 26, 2023.

[4] “The clever trick that turns ChatGPT into its evil twin”; The Washington Post; Feb. 14, 2023.

[5] “A New Attack Impacts Major AI Chatbots—and No One Knows How to Stop It”; Wired; Aug. 1, 2023.

[6] “Frontier Threats Red Teaming for AI Safety”; Anthropic; July 26, 2023.

[7] “ChatGPT users’ credit card details, personal information and chatlogs are leaked after AI program was hit by a ‘bug’”; Daily Mail; March 24, 2023.

[8] “Twitter taught Microsoft’s AI chatbot to be a racist asshole in less than a day”; The Verge; March 24, 2016.

[9] “Exclusive: OpenAI Used Kenyan Workers on Less Than $2 Per Hour to Make ChatGPT Less Toxic”; Time; Jan. 18, 2023.

[10] “AI tools make things up a lot, and that’s a huge problem”; CNN; Aug. 29, 2023.

[11] “Survey of Hallucination in Natural Language Generation”; ACM Computing Surveys, Volume 55, Issue 12; March 3, 2023.

[12] “Got It AI creates truth checker for ChatGPT ‘hallucinations’”; VentureBeat; Jan. 13, 2023.

[13] “Who Answers It Better? An In-Depth Analysis of ChatGPT and Stack Overflow Answers to Software Engineering Questions”; Kabir et al.; Aug. 10, 2023.

[14] “Robust Physical-World Attacks on Deep Learning Visual Classification”; Eykholt et al.; April 10, 2018.

[15] “Got It AI creates truth checker for ChatGPT ‘hallucinations’”; Op cit.

[16] “These new tools let you see for yourself how biased AI image models are”; MIT Technology Review; March 22, 2023.

[17] Case 1:23-cv-08292; U.S. District Court, Southern District of New York.

[18] “New method detects deepfake videos with up to 99% accuracy”; UC Riverside News; May 3, 2022.

[19] “OpenAI confirms that AI writing detectors don’t work”; Ars Technica; Sept. 8, 2023.

[20] “As Writers Strike, AI Could Covertly Cross the Picket Line”; The Hollywood Reporter; May 3, 2023.

[21] “Generative AI could be Europe’s shot at gaining a competitive edge against the U.S., Accenture’s AI chief for Europe says”; Fortune; Sept. 25, 2023.

[22] “Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality”; Dell’Acqua et al.; Sept. 18, 2023.

[23] “Language Models are Few-Shot Learners”; Brown et al.; July 20, 2023.

[24] “How Many Pages in a Gigabyte?”; LexisNexis; 2007.

[25] “GPT-3 Model Card”; GitHub; September 2020.

[26] “How I used ChatGPT to write Haiku”; Medium; Sept. 2, 2023.

Should We Consider Generative AI a Black Box Model?

Deep learning presents us with something of a conundrum. The algorithms are typically based on scholarly research, even if the implementation is closed-source (as with GPT-4). The training data often originates from known public sources. With enough money, we can buy computing power on a cloud provider and generate our own large language model. How should we classify these models, then? Is this sufficient to call them white-box models?

By one definition, a white-box model is one where you can clearly explain how it works, how it produces predictions, and what its influencing variables are.† Deep learning, and artificial neural networks (ANNs) in general, behave in a non-linear fashion. We can explain pieces of them, but explaining how the individual neurons contribute to the whole quickly becomes opaque. In fact, in a recent research paper, OpenAI made the rather shocking admission:†† “Language models have become more capable and more widely deployed, but we do not understand how they work.”
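To make that definition concrete, here is a minimal sketch of a model that meets it: a linear scorer whose prediction decomposes exactly into per-feature contributions. The feature names, weights, and bias are hypothetical, chosen only for illustration.

```python
# A hypothetical linear "white-box" model: each prediction is a bias plus a
# sum of weight * value terms, so every feature's influence is explicit.
WEIGHTS = {"square_feet": 0.12, "bedrooms": 8.0, "age_years": -0.5}  # hypothetical
BIAS = 20.0


def predict_with_explanation(features: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Return the prediction together with each feature's exact contribution."""
    contributions = {name: WEIGHTS[name] * value for name, value in features.items()}
    prediction = BIAS + sum(contributions.values())
    return prediction, contributions


if __name__ == "__main__":
    score, parts = predict_with_explanation(
        {"square_feet": 1500.0, "bedrooms": 3.0, "age_years": 10.0}
    )
    print(f"prediction = {score:.1f}")
    for name, contribution in parts.items():
        print(f"  {name}: {contribution:+.1f}")
```

Because the contributions add up to the prediction exactly, the influencing variables are on display. In a deep network the same input passes through many non-linear layers, and no such exact per-feature breakdown can be read off the weights.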

There are a few reasons ANNs are so difficult to understand. One is that adding neurons drastically increases the complexity of the system; tracing how a single neuron’s activation cascades through the network quickly becomes impossibly hard. OpenAI, for example, attempted to use GPT-4 to explain the neurons of the vastly less complex GPT-2 model, and even then it stumbled. Another is that ANNs develop their own algorithmic representation of words. The meaning of a word is unimportant to them; all they care about is how words relate to one another. OpenAI postulates that “language models may represent alien concepts that humans don’t have words for. This could happen because language models care about different things, e.g. statistical constructs useful for next-token prediction tasks, or because the model has discovered natural abstractions that humans have yet to discover.”
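To put a rough number on “impossibly hard,” the sketch below counts the distinct signal paths from one neuron in the first layer to one neuron in the last layer of a toy, fully connected network. The network shape is a hypothetical stand-in (real transformers such as GPT-3 are not wired this way), and the count ignores the non-linear activations that make each path’s effect depend on all the others. For L layers of W neurons, the count works out to W^(L−2).

```python
# Counting signal paths in a hypothetical, fully connected feed-forward network:
# `num_layers` layers of `width` neurons, each neuron linked to every neuron in
# the next layer. This only illustrates growth; it is not a model of GPT-3.


def count_paths(num_layers: int, width: int) -> int:
    """Count paths from one neuron in the first layer to one neuron in the last."""
    # paths[j] = number of distinct paths from the start neuron to neuron j
    # of the current layer.
    paths = [0] * width
    paths[0] = 1  # begin at neuron 0 of the first layer
    for _ in range(num_layers - 1):
        total = sum(paths)
        # Fully connected: every neuron in the next layer inherits all paths
        # that reached any neuron in the current layer.
        paths = [total] * width
    return paths[0]  # paths ending at one particular neuron of the last layer


if __name__ == "__main__":
    for layers in (3, 5, 10):
        print(f"{layers} layers of 100 neurons: {count_paths(layers, 100):,} paths")
```

Even ten layers of 100 neurons yield 100^8, or 10^16, distinct paths between a single pair of neurons, before accounting for how the activations along each path interact.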

Another way to understand ANNs is to visualize their behavior.††† Visualization helps researchers see how each of the layers (remember, GPT-3 has 96 layers!) interacts with the others. It can clarify how an image is generated, for example, by showing which parts of the model contribute to which parts of the resulting image. This gives us an excellent way to conceptualize how results are generated. Unfortunately, these visualizations still fall short of telling us exactly how a given input maps to a given output in a predictable, repeatable way.
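One technique from the deep-visualization line of work is activation maximization: start from a blank input and use gradient ascent to find the input that most excites a chosen neuron, with a regularizer keeping the input from growing without bound. The sketch below applies it to a single hypothetical neuron, a = tanh(w · x + b), using L2 decay as the regularizer; the weights, learning rate, and decay rate are assumptions for illustration only.

```python
import math

# Activation maximization on one hypothetical neuron: a = tanh(w . x + b).
WEIGHTS = [0.8, -0.3, 0.5]  # hypothetical learned weights
BIAS = -0.1


def activation(x: list[float]) -> float:
    return math.tanh(sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS)


def maximize_activation(steps: int = 200, lr: float = 0.1, decay: float = 0.01) -> list[float]:
    x = [0.0] * len(WEIGHTS)  # start from a blank input
    for _ in range(steps):
        a = activation(x)
        # Gradient of tanh(w . x + b) with respect to x_i is (1 - a^2) * w_i.
        grad = [(1.0 - a * a) * w for w in WEIGHTS]
        # Gradient ascent step, then L2 decay shrinks x back toward zero.
        x = [(1.0 - decay) * (xi + lr * g) for xi, g in zip(x, grad)]
    return x


if __name__ == "__main__":
    best = maximize_activation()
    print("preferred input:", [round(v, 3) for v in best])
    print("activation:", round(activation(best), 3))
```

Here the optimized input ends up pointing in the same direction as the neuron’s weights: the toy analogue of the “preferred stimulus” images such visualizations produce for deep image models. For a single neuron the answer is obvious; for a neuron buried dozens of layers deep, it is not.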

For now, it seems best to class generative AI as black-box in nature. We can describe how each neuron works and we can visualize the way individual neurons interact. When it comes to explaining exactly how outputs flow from a given input, though, or even what influencing variables might exist on a given neuron, we still have a long way to go.

† “What are the benefits of white-box models in machine learning?”; Silicon Republic; Feb. 20, 2019.

†† “Language models can explain neurons in language models”; Bills et al.; May 9, 2023.

††† “Understanding Neural Networks Through Deep Visualization”; Deep Learning Workshop, 31st International Conference on Machine Learning; 2015.

The Un-Learning Problem

Generative AI systems are trained to make inferences based on massive datasets representing real examples of the desired output. This training often represents a substantial investment of time and money. Training GPT-3 took an estimated 34 days on 1,024 cutting-edge, cloud-hosted GPUs and cost in excess of $4 million.* Exact numbers are hard to find for GPT-4, but it reportedly cost over $100 million to train.** The model encodes what it learns as weighted connections between artificial neurons. The result is analogous to the way a person learns: when we come across novel information, we learn the fact in question and incorporate it into the way we process similar information in the future.
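A quick back-of-the-envelope calculation shows what those GPT-3 figures imply. This uses only the numbers quoted above and assumes the 1,024 figure counts individual units running for the full 34 days; since the cost is a lower bound, so is the implied hourly rate.

```python
# Scale of GPT-3 training from the quoted figures: 34 days, 1,024 units,
# more than $4 million. Assumption: all 1,024 units ran for the whole period.
DAYS = 34
UNITS = 1_024
COST_USD = 4_000_000  # lower bound ("in excess of")

device_hours = DAYS * 24 * UNITS
implied_rate = COST_USD / device_hours

print(f"{device_hours:,} device-hours")
print(f"at least ${implied_rate:.2f} per device-hour")
```

That comes to 835,584 device-hours at roughly $4.79 or more per device-hour, which is why retraining from scratch is not something done casually.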

But what happens when a user exercises their GDPR right to be forgotten? Using current training methods, we have no way to erase the impact of that user’s data from the trained model. This is fundamental to the way the data is represented once learned. The model records probabilities and weights, not specific “memorized” facts. In a system like GPT-3 with 175 billion parameters, there is no way to identify which specific parameters represent your Spock-meets-Data fan fiction.
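The point that no parameter “belongs” to a single record shows up even in a two-parameter model. The sketch below fits a line by ordinary least squares, then refits it with one record removed; the data are hypothetical. Every parameter blends information from every training example, so removing one example moves all of them.

```python
# Fit y = slope * x + intercept by ordinary least squares, then "forget" one
# record by refitting without it. The data points are hypothetical.


def fit_line(points: list[tuple[float, float]]) -> tuple[float, float]:
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    slope = cov / var
    return slope, mean_y - slope * mean_x


data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 12.0)]
full = fit_line(data)
forgotten = fit_line(data[:-1])  # retrain without the last record

print("with record:    slope=%.3f intercept=%.3f" % full)
print("without record: slope=%.3f intercept=%.3f" % forgotten)
```

Here the slope moves from 2.37 to 1.94 and the intercept from -0.71 to 0.15. With two parameters, refitting is trivial; with 175 billion, the only equivalent operation today is the full retraining described next.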

The current best way to accomplish deletion is to retrain the entire model. This is obviously impractical given the scale and complexity of existing models and training methods. Overcoming this limitation requires the development of two specific new techniques. First, researchers must find a way to identify the impacts of a specific piece of data on the model. Second, regulators need a way to independently verify the deletion of that data.
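One direction explored in machine-unlearning research is sharded training: split the data into disjoint shards, train a separate model on each, and aggregate their outputs. Because each record influences exactly one shard, deleting it means retraining only that shard. The sketch below is a toy under stated assumptions: each shard’s “model” is just the mean of its values, and the class and function names are hypothetical.

```python
import zlib
from statistics import mean
from typing import Optional

# Toy sharded training: every record lands in exactly one shard, so forgetting
# a record retrains one shard instead of the whole ensemble. The "model" for
# each shard is just its mean value, standing in for a real network.
NUM_SHARDS = 4


def shard_of(record_id: str) -> int:
    # Stable hash, so the same record ID always maps to the same shard.
    return zlib.crc32(record_id.encode("utf-8")) % NUM_SHARDS


class ShardedModel:
    def __init__(self, records: dict[str, float]) -> None:
        self.shards: list[dict[str, float]] = [{} for _ in range(NUM_SHARDS)]
        for record_id, value in records.items():
            self.shards[shard_of(record_id)][record_id] = value
        self.models = [self._train(shard) for shard in self.shards]
        self.retrain_count = 0  # shard retrainings triggered by deletions

    @staticmethod
    def _train(shard: dict[str, float]) -> Optional[float]:
        # Toy "training": the shard's model is simply its mean value.
        return mean(shard.values()) if shard else None

    def predict(self) -> float:
        # Aggregate by averaging the shard models that still hold data.
        return mean(m for m in self.models if m is not None)

    def forget(self, record_id: str) -> None:
        idx = shard_of(record_id)
        if self.shards[idx].pop(record_id, None) is not None:
            # Only the affected shard is retrained.
            self.models[idx] = self._train(self.shards[idx])
            self.retrain_count += 1
```

The trade-off is that each shard model sees less data, which can cost accuracy. Sharding also only narrows the retraining bill; it does not address the second requirement above, independent verification that the data is really gone.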

Figuring this problem out is crucial, and not just for obvious reasons like opting out of data collection. What happens if an image-generating model is accidentally trained on data containing real driver’s license images, Social Security cards, or other personally identifiable information? In most cases the affected user will not know their data is compromised. We must have some verifiable way of untangling user information from these models. This is vital for the continued ethical use of AI systems.

* “ChatGPT and generative AI are booming, but the costs can be extraordinary”; CNBC; March 13, 2023.

** “OpenAI’s CEO Says the Age of Giant AI Models Is Already Over”; Wired; April 17, 2023.
