December 16, 2024 in Prompt Engineering
Enhance Innovation by Boosting Idea Generation with Large Language Models
https://doi.org/10.1287/orms.2024.04.03
Generative AI (GenAI) large language models (LLMs) are getting ever more powerful in their core function of answering questions. The models, which are based on neural network architectures, display nuanced understanding informed by context, tap into a vast human knowledge base obtained from massive training data, and generate existing and new patterns of information. These abilities are useful for many tasks including business applications such as automating customer service, generating reports and enhancing business communication. The ability to generate new patterns of information is a double-edged sword, however: the new patterns can include hallucinations (i.e., incorrect answers without error awareness), but also creative and innovative ideas that are novel and useful.
The latter capability raises the question of whether LLMs can be a tool that helps businesses and other organizations enhance innovation by boosting idea generation. Several recent research studies have addressed this question by comparing innovative idea generation by humans alone to that of LLMs working independently after an initial request for information (also known as a prompt) by a human. The studies adopted creativity outcome metrics that cognitive psychologists and cognitive neuroscientists have used in their research on human creativity as an information process in the brain, including measures of novelty, originality, diversity, productivity and usefulness, as revealed in individual and group brainstorming. Here, we will take a closer look at these studies and consider ways that LLM performance may be enhanced through “prompt” engineering (i.e., improving the user input to the LLM) and other recent trends in GenAI LLMs.
Retail Product Innovation
Girotra and colleagues [1] asked product design students to generate ideas for retail products that could be sold to college students at a low cost of less than $50, including but not limited to ideas for products that do not yet exist, without needing the products to be clearly feasible. They then had the LLM GPT-4 perform the same task using the same prompt that the students received in one LLM condition and expanded that prompt in a second LLM condition by giving it a sample of well-received ideas from the students. A separate group of college students provided independent, crowd-sourced ratings of idea quality and novelty. Idea quality was rated by purchase intent.
The results revealed that the LLM was far more productive and efficient in idea generation than the students. In 15 minutes, the LLM generated 200 ideas for less than $1 (U.S. dollars) per idea, whereas during the same time period, students produced only five ideas for $25 per idea. Crucially, the LLM also outperformed the students in idea quality, showing greater average idea quality and producing most of the best ideas. However, for novelty, it was the students who showed a slight advantage. Finally, giving the LLM examples of high-quality ideas did not impact the quality and novelty of the LLM-generated ideas.
Novel Business Opportunities
Boussioux and colleagues [2] examined idea generation in the area of sustainable, circular business opportunities, an area that requires expertise from various relevant disciplines such as environmental science, economics, design and engineering. They compared human experts from these disciplines with GPT-4 in terms of the ability to generate ideas that are novel (i.e., different from existing solutions) and diverse (i.e., reflecting a variety of categories of information), have environmental and financial value, and are feasible to implement and scale. Both the human experts and GPT-4 were instructed about these metrics of success at the start of their ideation task. The human- and LLM-generated ideas were rated independently by a separate large group of human evaluators who came from diverse backgrounds and had passed a basic knowledge test about the circular economy.
The major findings were that AI outperformed humans on the environmental value and financial value of the ideas, whereas human experts outperformed the AI on novelty and diversity. Although it stayed below human levels of idea novelty, the AI’s baseline novelty performance could be improved in two ways. A version of the LLM that mimicked a single human aiming for unique, distinct solutions produced ideas with greater novelty than a version of the LLM that mimicked multiple humans working independently in parallel. An improvement in novelty was also obtained by a diversity manipulation that involved enhancing the idea generation prompt so that the LLM would consider the expertise of personas from as many as 23 problem-relevant industries.
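The persona-based diversity manipulation can be illustrated with a small prompt builder. This is only a sketch: the function name `persona_prompt`, the prompt wording and the example industries are assumptions made for illustration, not the prompt used in the study.

```python
from typing import List


def persona_prompt(task: str, industries: List[str]) -> str:
    """Build a prompt that asks for one idea per industry persona.

    The wording is hypothetical; the study's actual prompt is not
    reproduced in this article.
    """
    personas = "\n".join(f"- An expert in {name}" for name in industries)
    return (
        f"Task: {task}\n\n"
        "Consider the task from the viewpoint of each expert below and "
        "propose one distinct idea per expert:\n" + personas
    )


# Hypothetical industries, chosen only to show the output format.
print(persona_prompt(
    "Propose circular-economy business opportunities.",
    ["textiles", "construction", "consumer electronics"],
))
```

Lengthening the list of industries is the lever here: each added persona gives the LLM another category of information to draw on.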
Divergent-Thinking Performance
Haase and Hanel [3] conducted a study that compared humans with various LLMs, including GPT-4, in terms of their performance on the alternate uses task (AUT), a frequently used cognitive test of individual differences in divergent thinking. The AUT is a timed test that instructs test takers to generate as many alternative uses as possible for everyday objects, for example, a brick (e.g., doorstop). Performance on the test is of much practical interest because it is a predictor of creative achievement in domains like art, science, math and technology. The human- and LLM-generated ideas were rated for originality. These ratings were given blind to condition (i.e., human- vs. LLM-generated ideas) and were performed separately by third-party human assessors and an LLM trained with data from such assessors.
The major finding was that LLMs matched the humans in the originality of their responses. However, the LLMs produced 2 to 3 times more ideas than the humans. Finally, the human- and LLM-produced originality ratings were highly correlated in a statistically reliable manner, suggesting that AI could be used to assess this facet of idea generation. The use of AI not only for idea generation but also for idea evaluation is highly relevant for scaling innovation, especially because the use of LLMs for idea generation will result in a far greater number of to-be-evaluated ideas, pushing the human capacity for idea evaluation to the limit or necessitating crowdsourcing.
Creative Narration Assistance
Doshi and Hauser [4] recruited a diverse group of 300 adults from the U.K. for a well-controlled online experiment to find out whether they would produce more creative and higher-quality stories with or without LLM assistance (using GPT-4). Study participants were first grouped into more versus less creative individuals based on scores from a standard test of verbal creativity, requiring them to generate a list of 10 words that are as different from each other as possible. They then wrote a short, eight-sentence story on a preselected topic in the non-LLM-enabled writing condition or an LLM-enabled one. In the latter condition, they could ask the LLM once to generate either one or five story ideas, each consisting of a three-sentence story summary. A separate group of assessors, who were regular readers and blinded to condition, evaluated story creativity as indicated by novelty, originality and rareness, and story usefulness as indicated by being appropriate, feasible and publishable.
Findings were that more creative individuals did not benefit from LLM assistance. However, less creative individuals did benefit from the assistance in terms of the creativity and quality of their stories. This effect was somewhat higher when writers could ask for more story ideas. However, across writers, there was a reduction in the diversity of the content of LLM-enabled stories compared with the stories by humans alone. Thus, there was a trade-off between individual and collective creativity.
Promising LLM Ideation
The results from these four empirical studies indicate that LLM-enabled generation of innovative ideas can outperform humans alone by vastly exceeding the latter in productivity and usefulness. However, results were mixed for idea novelty, with Study 4 on creative story writing showing greater novelty in the LLM-enabled than in the human-alone condition, Studies 1 and 2 on business idea generation showing the reverse pattern, and Study 3 on creativity test performance reporting matched levels of novelty in the LLM-enabled and human-alone conditions.
These results are rather promising for several reasons. First, new, more powerful LLMs continue to be released. Second, general pretrained LLMs can be enhanced with retrieval-augmented generation (RAG), in which the application programming interface (API) is given access to a domain-specific database, such as one storing the latest analytics regarding what conditions produce optimal results in operations. Third, these LLMs could also be enhanced with a chatbot function that would allow the user to interact with a synthetic expert in innovation methodology, which in turn could be provided in the form of a domain-specific LLM that is trained in that specific area and could work in tandem with the general pretrained LLM. Fourth, there is an opportunity for empirical studies that consider the facilitative role of cognitive and brain factors in human-AI innovation teaming. Finally, in my experience, it should be possible to improve that teaming through better prompt engineering.
Prompt Engineering Recommendations
- DEFINITION: Ask for the generation of ideas that are creative as indicated by their high originality and/or diversity and define these two facets for the LLM, as necessary.
- DIFFERENTIATION: Ask for the generation of new ideas that are low versus high in originality and/or diversity.
- EXPLANATION: Ask the LLM not only for the generation of creative ideas but also for an explanation as to why they are creative.
- EXCHANGE: Expand a single-prompt approach into a multi-turn approach involving not one but several Q&A cycles in which the human user instructs the LLM on how the creativity of its answers can be improved. Such a cyclic exchange mimics how a human innovation expert would work with their human assistant.
- FEEDBACK: Consider that automatic AI-based scoring of the novelty of ideas generated during each cycle could serve a useful feedback function for evaluating answers to successive prompts.
- COGNITION: Instruct the LLM in the application of evidence-based cognitive creativity strategies, such as (1) combining two familiar ideas that have not been combined before, (2) solving a problem in one domain of expertise by analogy to another domain of expertise or (3) disassembling objects into their component parts and reassembling them into useful novel objects.
- INCUBATION: Mimic positive effects of problem incubation on insight problem solving by restarting the LLM with a clean slate after each burst of ideas. This type of problem solving engages creativity through a novel restructuring of the problem, which helps us overcome mental fixation and is more likely to occur when prior ideas that got us stuck are less active or no longer active in our minds. It’s worth exploring whether this clean-slate approach may also be achieved by alternating the target innovation task with a distractor task.
- NUDGING: Remind the LLM during each prompt cycle what ideas it has already generated and continue to request new ideas.
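Several of these recommendations can be combined in a simple prompt loop. The following is a minimal sketch rather than a tested method: all names (`build_prompt`, `novelty`, `run_cycles`), the prompt wording and the 0.5 threshold are assumptions, and `ask_llm` stands for any function that sends a prompt to an LLM and returns its text reply. The word-overlap score is a crude stand-in for the AI-based novelty scoring mentioned under FEEDBACK.

```python
from typing import Callable, List

# DEFINITION: tell the LLM what "creative" means (hypothetical wording).
DEFINITION = (
    "Generate ideas that are creative, meaning high in originality "
    "(unlike existing solutions) and diversity (spanning different categories)."
)


def build_prompt(task: str, previous_ideas: List[str], n_ideas: int = 5) -> str:
    """Assemble the prompt for one cycle."""
    parts = [f"Task: {task}", DEFINITION]
    # EXPLANATION: ask why each idea is creative.
    parts.append(
        f"List {n_ideas} ideas, one per line, each followed by ' | ' "
        "and a one-sentence explanation of why it is creative."
    )
    if previous_ideas:
        # NUDGING: remind the LLM of what it has already produced.
        listed = "\n".join(f"- {idea}" for idea in previous_ideas)
        parts.append(
            "You have already generated these ideas; do not repeat them:\n" + listed
        )
    return "\n\n".join(parts)


def novelty(idea: str, previous_ideas: List[str]) -> float:
    """FEEDBACK stand-in: 1 minus the highest Jaccard word overlap
    between the idea and any earlier idea."""
    words = set(idea.lower().split())
    if not previous_ideas or not words:
        return 1.0
    overlaps = []
    for prev in previous_ideas:
        prev_words = set(prev.lower().split())
        overlaps.append(len(words & prev_words) / len(words | prev_words))
    return 1.0 - max(overlaps)


def run_cycles(
    task: str,
    ask_llm: Callable[[str], str],
    cycles: int = 3,
    min_novelty: float = 0.5,
) -> List[str]:
    """EXCHANGE: run several prompt cycles and keep sufficiently novel ideas."""
    kept: List[str] = []
    for _ in range(cycles):
        reply = ask_llm(build_prompt(task, kept))
        for line in reply.splitlines():
            idea = line.split("|")[0].strip(" -\t")
            if idea and novelty(idea, kept) >= min_novelty:
                kept.append(idea)
    return kept
```

If `ask_llm` is stateless, each cycle starts from a clean slate in the spirit of INCUBATION, with earlier ideas re-entering only through the reminder list. In practice, the lexical score would likely be replaced by an embedding-based or LLM-based novelty rating.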
Conclusion
The use of LLMs for innovative ideation has demonstrated potential, especially in terms of its ability to enhance the productivity of idea generation and the usefulness of the produced ideas. It remains to be explored to what extent these benefits generalize to conditions beyond those of the few current published experiments on this topic. However, it is very likely that within the context of secure, responsible and transparent human-AI teaming, it will be possible to enhance and broaden these benefits. Improvements in the novelty of the ideas will be crucial in this regard and will likely be obtained through better RAG, AI chatbots specialized in innovation, and prompt engineering. An interdisciplinary approach to this effort with contributions from operations research and cognitive science is likely to help businesses and other organizations be more competitive through more efficient and effective scaling of innovation, reinforced by the increased understanding of innovation as a creative individual and social process that the endeavor of human-AI innovation teaming will yield.
References
- Girotra, K., Meincke, L., Terwiesch, C., & Ulrich, K.T., 2023, “Ideas are Dimes a Dozen: Large Language Models for Idea Generation in Innovation,” The Wharton School Research Paper, https://ssrn.com/abstract=4526071.
- Boussioux, L., Lane, J.N., Zhang, M., Jacimovic, V., & Lakhani, K.R., 2024, “Generative AI and Creative Problem Solving,” Harvard Business School Technology and Operations Management, Unit Working Paper 24-005.
- Haase, J. & Hanel, P.H.P., 2023, “Artificial Muses: Generative Artificial Intelligence Chatbots Have Risen to Human-level Creativity,” Journal of Creativity, Vol. 33, No. 3, https://doi.org/10.1016/j.yjoc.2023.100066.
- Doshi, A.R. & Hauser, O., 2023, “Generative Artificial Intelligence Enhances Creativity but Reduces the Diversity of Novel Content,” https://ssrn.com/abstract=4535536.
Hendrik Haarmann is a senior cognitive scientist currently working at the Office of Innovation of the National Security Agency. Drawing from insights of his research on human thinking, language, memory and its brain basis, he provides the U.S. federal government and its partners with consultancy on the enhancement of learning, performance and human-AI teaming for new and sustained mission value.
