February 6, 2024 in Five-Minute Analyst

What If…Jarvis Were Real?

Nick Ulmer, CAP


https://doi.org/10.1287/LYTX.2024.01.09

How do you define artificial intelligence, or AI? For some, it is just a smart thermostat on their wall. Others think about self-driving cars. I think about the many science fiction examples, including “Jarvis” in the Marvel Cinematic Universe (MCU). One might believe that the precursor to that future is being born in the form of large language models (LLMs), a field that has seen massive growth this past year.

The field of LLMs drew a lot of press with the release of proprietary models from companies like OpenAI, Google, Microsoft and others. However, many open-source options are also flooding the internet. These typically target niche areas and increasingly draw on smaller, proprietary sources of training data. This has in turn led to a proliferation of ways to combine tools, along with countless tutorials on prompt engineering.

Personally, I’ve used various new LLM tools as a writing or coding co-pilot. I then take the recommendations and tweak them as necessary to fit my needs and conduct useful analysis on my data. This is in keeping with the generally good practice of not putting sensitive information into GPT chat sessions. If we are going to achieve the “Jarvis” experience, then we need to cut out the steps in the middle. We need to provide data to an AI, whether directly or through its own connections and collection efforts. Then we need to be able to simply ask for the answer we seek, and interactively keep asking for more information and insights as we explore our data with our new AI analyst partner.

Rather than feed in sensitive data, I reached back to a previous “Five-Minute Analyst” article and attempted the same analysis, but with AI assistance. That article took a table of data on MCU movies and characters, with associated revenue and appearances, and attempted to quantify the worth of the various superheroes.

My first attempts were to use OpenAI ChatGPT and Google Bard directly. For small data sets, both will let you pass in CSV (comma-separated values) files: you literally copy and paste your data into the prompt. ChatGPT was able to provide basic statistics, such as the movies with the most revenue and the overlap of characters, and it opens the door for the user to ask more specific questions. It will tell you that ensemble movies usually generate more revenue but that some solo-character movies, such as “Black Panther” and “Doctor Strange,” also do well. I then asked more directly for a ranking of characters based on their impact on movie revenue. This is where things began to go sideways.
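To make the copy/paste workflow concrete, here is a minimal sketch that assumes the data set is laid out as one row per movie, with a revenue column and one 0/1 column per character. The movie names, character names and revenue figures are hypothetical placeholders, not the article’s data. It computes two of the basic statistics mentioned above, the top movies by revenue and the number of movies each pair of characters shares, using only the Python standard library.

```python
import csv
import io
from collections import Counter
from itertools import combinations

# Hypothetical toy data (not the article's data set): one row per movie,
# revenue in invented units, and one 0/1 column per character.
CSV_TEXT = """movie,revenue,Hero A,Hero B,Hero C,Hero D
Film 1,1000,1,1,1,1
Film 2,300,1,0,0,0
Film 3,500,0,1,0,0
Film 4,800,1,1,0,0
Film 5,750,0,0,1,0
"""


def load(text):
    """Parse the CSV into a list of (movie, revenue, set of characters present)."""
    reader = csv.DictReader(io.StringIO(text))
    characters = reader.fieldnames[2:]  # every column after movie and revenue
    rows = []
    for record in reader:
        cast = {name for name in characters if record[name] == "1"}
        rows.append((record["movie"], float(record["revenue"]), cast))
    return rows


def top_movies(rows, n):
    """Return the names of the n highest-revenue movies, highest first."""
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    return [movie for movie, _, _ in ranked[:n]]


def pair_overlap(rows):
    """Count how many movies each pair of characters appears in together."""
    pairs = Counter()
    for _, _, cast in rows:
        for pair in combinations(sorted(cast), 2):
            pairs[pair] += 1
    return pairs


rows = load(CSV_TEXT)
print(top_movies(rows, 3))                # ['Film 1', 'Film 4', 'Film 5']
print(pair_overlap(rows).most_common(1))  # [(('Hero A', 'Hero B'), 2)]
```

On a data set this small, every number can be checked by eye, which is exactly what makes it a good test bed for an LLM’s answers.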

ChatGPT did indeed give a ranking, with the caveat that the ranking is subjective. What it was really saying is that other factors matter, such as how often a character appears. Its answer was effectively the total revenue of all the movies in which each character appears, ranked highest to lowest. Unsurprisingly, that puts Tony Stark at the top. I know this because I asked for the numeric values it used for the ranking and checked them against the data set, which I could do only because the data set was small. Notably, ChatGPT refused to provide any visual representation, such as a simple bar chart. When I explained that I wanted to weight each character’s revenue by the number of characters in each respective film and redo the ranking, I found myself back in a co-pilot situation: ChatGPT suggested this was better done through a Python script and even wrote a sample one. But wait, there’s more.
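The two rankings described above can be compared side by side. In this sketch, the unweighted score credits each character with the full revenue of every movie they appear in, which is what ChatGPT’s answer amounted to. The weighted score splits each movie’s revenue equally among its cast. Equal splitting is an assumption about what “weight by the number of characters” should mean, and the data are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: (movie, revenue, characters appearing).
ROWS = [
    ("Film 1", 1000.0, {"Hero A", "Hero B", "Hero C", "Hero D"}),
    ("Film 2", 300.0, {"Hero A"}),
    ("Film 3", 500.0, {"Hero B"}),
    ("Film 4", 800.0, {"Hero A", "Hero B"}),
    ("Film 5", 750.0, {"Hero C"}),
]


def rank(rows, weighted):
    """Rank characters by revenue credit, highest first.

    weighted=False: each character gets the movie's full revenue.
    weighted=True: the movie's revenue is split equally across its cast.
    """
    totals = defaultdict(float)
    for _, revenue, cast in rows:
        if not cast:
            continue  # avoid dividing by zero for a movie with no listed characters
        share = revenue / len(cast) if weighted else revenue
        for character in cast:
            totals[character] += share
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


print(rank(ROWS, weighted=False))
# [('Hero B', 2300.0), ('Hero A', 2100.0), ('Hero C', 1750.0), ('Hero D', 1000.0)]
print(rank(ROWS, weighted=True))
# [('Hero B', 1150.0), ('Hero C', 1000.0), ('Hero A', 950.0), ('Hero D', 250.0)]
```

In this toy example, Hero C, whose solo film does well, overtakes Hero A once ensemble revenue is shared, which is the kind of reordering the weighting is meant to surface. The weighted totals also add back up to total box office, a quick check that no revenue was double-counted.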

After failing to get what I wanted from ChatGPT with direct questions, I tried to lead it through the math. I asked it to count the characters in each movie and then the number of movies in which each character appeared. Much to my disappointment, I got completely wrong answers. What I was effectively asking ChatGPT to do was count the binary (yes or no) entries in a matrix. It couldn’t. Oh, it did provide numbers, but they were wrong, which led me to doubt all of its prior calculations. This soured me on the idea that I had found a path to “Jarvis.” To its credit, when I provided my data to Google Bard, it simply told me it couldn’t do what I was asking. No wrong answer. No hallucinating. Just a simple “Sorry, I’m not the right tool for the job.” What now?
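The counting task described above is a pair of sums over a 0/1 matrix: row sums give each movie’s cast size and column sums give each character’s number of appearances. A useful sanity check, sketched below with hypothetical data, is that both tallies must add up to the same total (the number of 1s in the matrix), and any counts an LLM claims can be compared against the computed ones. In pandas the same counts would come from `df.sum(axis=1)` and `df.sum(axis=0)`; the sketch uses plain Python to stay dependency-free.

```python
# Hypothetical 0/1 matrix: rows are movies, columns are characters.
CHARACTERS = ["Hero A", "Hero B", "Hero C", "Hero D"]
MOVIES = ["Film 1", "Film 2", "Film 3", "Film 4", "Film 5"]
MATRIX = [
    [1, 1, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
]

# Row sums: how many characters appear in each movie.
cast_size = {movie: sum(row) for movie, row in zip(MOVIES, MATRIX)}

# Column sums: how many movies each character appears in.
appearances = {
    name: sum(row[j] for row in MATRIX) for j, name in enumerate(CHARACTERS)
}

# Both tallies count the same 1s, so their totals must agree.
assert sum(cast_size.values()) == sum(appearances.values())


def mismatches(claimed, actual):
    """Return {key: (claimed value, actual value)} wherever a claim is wrong or missing."""
    return {k: (claimed.get(k), v) for k, v in actual.items() if claimed.get(k) != v}


# Hypothetical answer from a chat session, checked against the computed counts.
print(mismatches({"Hero A": 3, "Hero B": 4, "Hero C": 2, "Hero D": 1}, appearances))
# {'Hero B': (4, 3)}
```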

I chose to go back to ChatGPT, but this time through an API. I’m not the first person to want to use LLM prompts to analyze data. I found many options, and I will leave it to someone with more than five minutes to explore them. I quickly landed on PandasAI, which builds on the Pandas package in Python. I set up an API balance to pay for ChatGPT usage and wrote some code to link it to a Python script in which I used PandasAI.
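As a rough illustration of that setup, the sketch below shows what linking PandasAI to the OpenAI API can look like. It assumes the 1.x release of the `pandasai` package that was current in early 2024 (`SmartDataframe` and `pandasai.llm.OpenAI`) and an API key in the `OPENAI_API_KEY` environment variable. Later major versions of PandasAI changed this interface, so check the documentation for whatever version you install. The function name `ask_pandasai` and the file name in the usage comment are hypothetical.

```python
import os


def ask_pandasai(csv_path, question):
    """Ask one natural-language question about a CSV file via PandasAI.

    Assumes pandasai 1.x and an OpenAI key in OPENAI_API_KEY. The imports are
    deferred so this file can be loaded even where pandasai is not installed.
    Every call sends a request to the OpenAI API and is billed to your account.
    """
    import pandas as pd
    from pandasai import SmartDataframe
    from pandasai.llm import OpenAI

    llm = OpenAI(api_token=os.environ["OPENAI_API_KEY"])
    smart_df = SmartDataframe(pd.read_csv(csv_path), config={"llm": llm})
    # The result depends on the question: for example, a number, a DataFrame or a chart.
    return smart_df.chat(question)


# Example usage (hypothetical file name; not executed here because it costs API credit):
# ask_pandasai("mcu_movies.csv", "Plot a bar chart of the top five movies by revenue")
```

The design point is that the analyst writes only the question; PandasAI has the LLM generate pandas code and runs it against the actual data, rather than asking the LLM to do the arithmetic in its head.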

Figure 1: Quickly generated bar chart and table created using PandasAI, a small data file in CSV format, and a prompt using an API connection to OpenAI ChatGPT. The table is a count of appearances by character across the movies in the data set.

It is important to note that there are open-source LLMs that won’t charge you for usage. I think I spent a total of 8 cents running various prompts through the ChatGPT API. But was it 8 cents well spent?

First, PandasAI was able to quickly produce bar charts of whatever I wanted, as well as nicely formatted tables. See Figure 1, in which I got a plot of the top five movies by revenue and an accurate count of movie appearances by character, with emphasis on “accurate!” With a little more effort, I was also able to count the characters that appeared in each movie. By leading it through the math with a few questions, I was able to get to a weighted ranking (of sorts) as discussed above.

Although I didn’t get there in quite the same way, the results were useful and similar to those in the previous Five-Minute Analyst article. Even more useful, with nothing more than good prompt engineering I was able to quickly create more interesting plots (as shown in Figure 2) of characters by movie and of revenue compared with ensemble cast size. The plot makes clear that there is a relationship, but movies like “Black Panther” can break the rules. In the opposite direction, movies like “Captain America: Civil War” can fail to live up to expectations. The tool’s quick exploration, combined with more time and pennies, could easily generate more insights and visuals. But no one is paying me to talk about the MCU (yet!).

Figure 2: Two plots created using OpenAI ChatGPT prompts passed via API using the PandasAI package. Being able to visualize the small data set of MCU movies vs. characters so quickly is one benefit of LLMs’ growing reach into the everyday life of the analyst. Unlike my past efforts, this was not co-pilot code; I simply provided the data and wrote a request for the desired visuals.

In the end, we don’t yet have “Jarvis,” but we are getting closer. For now, rest assured that analysts will still need to bring more to the table than being savvy prompt engineers.


Nick Ulmer, CAP, has been an operations research analyst since 2014. He is the inaugural chair of the INFORMS Military Veterans Interest Forum and a Principal Operations Research Analyst for CANA LLC, leading teams of analytics professionals to produce high-level analytics products across federal and commercial domains.
