Are You Ready for a ChatGPT World?
The hottest AI chatbot around probably writes better than you. Too bad it can’t always get its facts straight
A few weeks ago, a YouTuber named Greg Kocis posted a video in which he asked ChatGPT to write a song like Taylor Swift. In seconds, the artificial intelligence-powered chatbot penned two verses, a chorus and a bridge.
Die-hard Swifties may argue over whether the lyrics are truly Taylor. But it’s easy to imagine the superstar singing them in a way reminiscent of her country hit “Our Song.” Consider ChatGPT’s first verse: “I’m driving down this lonely road; Trying to forget the past; But every song on the radio; Brings back memories that don’t last.”
And the Grammy Award goes to . . . ChatGPT?
Since it launched in November, ChatGPT has become the talk of the town. Using machine-learning algorithms, it can write articles, messages, essays and, as Greg Kocis demonstrated, songs. Users can ask it questions. ChatGPT answers quickly in clear, conversational sentences that might take humans many minutes to compose and perfect.
ChatGPT’s creator, the American research lab OpenAI, is attracting blue-chip investors such as Microsoft, which aims to add more artificial intelligence to its search functions, Office suite and other products. OpenAI may soon sell shares, and experts have pegged the company’s potential worth at US$29 billion. That would make it one of the most valuable startups ever.
But ChatGPT has also induced hand-wringing. Critics worry it will stifle creativity. Schools and universities, meanwhile, must grapple with how students will use the technology and whether it will hinder critical thought. ChatGPT, after all, can do more than write simple sentences. It passed both a Wharton business school MBA exam and the U.S. Medical Licensing Examination, according to two recent research papers.
So how good is ChatGPT? What is it lousy at? And how will such high-powered AI applications (including Google’s new Bard chatbot) fit into everyday business operations? To answer these questions and more, we spoke with Stephen Thomas, Distinguished Professor of Management Analytics at Smith School of Business and executive director of the school’s Analytics and AI Ecosystem.
ChatGPT is a chatbot, and chatbots aren’t entirely new. So why is there so much buzz about ChatGPT in particular?
ChatGPT represents a big step forward from previous chatbots. Most impressive is its ability to produce human-like responses. In addition, ChatGPT can follow directions about the style and format of its answers. If you ask it for a list of six things, it will provide you with a list of six things. If you ask it to sound like a pirate, it will sound like a pirate.
OpenAI trained ChatGPT on over one billion documents, so it has “learned”—or at least seen—a staggering number of facts and relationships about the real world. That training allows it to sound intelligent on almost any given topic. What sets ChatGPT apart from other chatbots is that humans manually fine-tuned it during the development process, a highly unusual and expensive task. A team of human labellers would ask ChatGPT a question and then manually write the desired response, from which ChatGPT would learn. They did this thousands of times.
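To make that fine-tuning step concrete, here is a minimal Python sketch of what labeller-written question-and-answer pairs might look like as training data. The field names and examples are invented for illustration; they are not OpenAI’s actual format or process, only the general idea of a model learning to imitate human-written responses.

```python
# Hypothetical illustration of human-labelled fine-tuning data.
# Each example pairs a question with a response written by a human labeller.
labelled_examples = [
    {
        "prompt": "Explain photosynthesis to a ten-year-old.",
        "ideal_response": "Plants use sunlight, water and air to make their own food...",
    },
    {
        "prompt": "List three risks of relying on chatbots for facts.",
        "ideal_response": "1. They can state wrong facts confidently. 2. ...",
    },
]

def to_training_text(example: dict) -> str:
    """Join prompt and labeller-written answer into one training sequence.

    During supervised fine-tuning, the model is trained to predict the answer
    tokens that follow the prompt, so it learns to imitate the labellers.
    """
    return f"User: {example['prompt']}\nAssistant: {example['ideal_response']}"

for ex in labelled_examples:
    print(to_training_text(ex))
```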
What are the potential business applications for this type of AI?
We’re still in the early days of ChatGPT and similar chatbots, and the exact list of use cases is still being explored. I don’t think many business users are leaning on ChatGPT daily quite yet.
Still, there is excitement that ChatGPT could help in the creation of any boilerplate text: summarizing meeting notes, summarizing articles, generating ideas for marketing copy, writing standard email responses, writing reference letters, summarizing existing research on a given topic, producing code snippets, creating possible names of products, books or articles, and, of course, building a human-like chatbot for customer service.
Naturally, users in other domains are exploring ChatGPT for use in higher education, game development, the arts, music, health care and more.
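As a sketch of what one of these use cases might look like in practice—summarizing meeting notes—here is a short Python example that calls OpenAI’s API. It assumes you have an API key, and the exact client interface varies by library version (the pre-1.0 `openai` package is shown), so treat it as illustrative rather than definitive.

```python
import openai  # pip install openai (pre-1.0 interface shown; newer versions differ)

openai.api_key = "YOUR_API_KEY"  # placeholder; supply your own key

meeting_notes = """
- Q3 budget approved, pending final sign-off from finance
- Launch date for the loyalty app moved to October 15
- Maria to draft the customer-survey questions by Friday
"""

# Ask the model to condense the notes into a few bullet points.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": f"Summarize these meeting notes in three bullet points:\n{meeting_notes}",
        },
    ],
)

print(response["choices"][0]["message"]["content"])
```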
What are the limitations of this technology at the moment?
As with most tech advances, especially in AI, the hype around ChatGPT has been tremendous and, in some cases, unjustified. While ChatGPT is a big step in the right direction toward sounding human—a goal that has eluded researchers for decades—it still suffers from several significant limitations that need to be resolved before business users can genuinely count on it.
The most critical issue, in my view, is that ChatGPT is often factually wrong, an issue sometimes called the “hallucination effect.” For example, ChatGPT will confidently tell you the distance from Chicago to Tokyo is 7,600 miles. The distance is actually 6,300 miles. It will tell you that the best ski hill near Kingston, Ontario is Mad River Mountain in Ohio, which is more than nine hours away. It will give you a detailed, step-by-step answer to a calculus question that looks entirely plausible but is incorrect. ChatGPT is not always wrong, but you never know when it is or isn’t without consulting some other source.
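As one example of “consulting some other source,” the Chicago–Tokyo figure can be checked with a few lines of Python using the standard great-circle (haversine) formula and approximate city coordinates—no language model required.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on Earth, in miles."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * asin(sqrt(a)) * 3959  # Earth's mean radius in miles

# Approximate coordinates for Chicago and Tokyo
chicago = (41.88, -87.63)
tokyo = (35.68, 139.65)

print(round(haversine_miles(*chicago, *tokyo)))  # roughly 6,300 miles, not 7,600
```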
These correctness issues arise because ChatGPT does not try to be correct and couldn’t check its correctness even if it wanted to. ChatGPT is not a “fact machine”; it is a word re-arranger. ChatGPT is not connected to any knowledge bases, datasets, lists or other curated knowledge sources. Instead, ChatGPT generates a sequence of statistically probable words that it thinks it might have seen somewhere in its training data. Whereas humans have a mental model of the world and use words to communicate that model, ChatGPT only works at the word-manipulation level without an underlying model of the world. It has been described as a stochastic parrot. It doesn’t know the meaning behind any of the words it generates, only that the words seem to “sound about right.”
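To illustrate the “word re-arranger” idea, here is a deliberately tiny Python sketch of next-word sampling. The probability table is invented for illustration; a real model estimates such probabilities over a huge vocabulary using far more context. But the core move—pick a statistically likely next word, with no fact-checking step—is the same in spirit.

```python
import random

# Toy next-word probabilities "learned" from text. Invented for illustration only.
next_word_probs = {
    ("the", "ski"): {"hill": 0.6, "trip": 0.3, "lodge": 0.1},
    ("ski", "hill"): {"near": 0.5, "is": 0.4, ".": 0.1},
    ("hill", "near"): {"Kingston": 0.7, "Toronto": 0.3},
}

def generate(start, steps=3):
    """Repeatedly sample a statistically probable next word -- no fact-checking involved."""
    words = list(start)
    for _ in range(steps):
        dist = next_word_probs.get(tuple(words[-2:]))
        if dist is None:
            break
        choices, weights = zip(*dist.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate(("the", "ski")))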
These issues are not lost on the research community. Fortunately, many teams—including researchers here at Smith—are working to address the correctness issue. We must be patient, though. These problems are challenging and will take time.
There’s a fear that ChatGPT will take the human element out of writing and make communications bland and robotic. Is that a concern?
The fear is justified. ChatGPT will take the human element out of writing if that’s how we use it. But we don’t have to. A similar fear has been around since the beginning of AI and technology in general. Very seldom, however, has the fear been realized. Usually, technology automates tasks that humans don’t want to perform, allowing humans to spend more time on human activities: being creative, innovating, managing human relationships, showing emotion, planning and thinking strategically.
I predict the same for ChatGPT. No one will use ChatGPT to write a fantasy novel. To be clear, people will try, but the books will be awful. No one will use ChatGPT to write sensitive emails to clients. No one will use ChatGPT to create their oh-so-important sales pitch deck. No one will use ChatGPT to create a new cancer treatment. No one will use ChatGPT to write anything of vast business importance.
Instead, ChatGPT will shine at automating the simple, redundant, boring things: responding to customer service questions like “when will my shipment arrive?”, writing quick responses to standard email questions, summarizing documents into two or three bullet points, or coming up with ideas that a human later edits.
What has your experience been like using ChatGPT, and what have you learned?
I’ve been in the natural language processing R&D space for over 15 years, so I approached ChatGPT more skeptically than most. After all, it hasn’t been that long since Meta pulled its chatbot for being offensive, or since a Google engineer claimed that one of the company’s chatbots was sentient.
Upon using ChatGPT, I was genuinely impressed with the human-like responses and ability to follow directions. After a bit of careful prompt engineering, I got it to draft an introduction to a newsletter I was writing. I even asked ChatGPT to give me a murder mystery version and a sports announcer version of that newsletter, and it did a great job. It was tons of fun. I had tasted the Kool-Aid.
Next, out of curiosity, I asked ChatGPT to generate a course outline for my upcoming graduate course. It took about three seconds. And to a layperson, the outline might pass muster. The output certainly looked like a course outline and had many of the appropriate keywords and formatting. To me, though, the outline was complete rubbish. It contained made-up textbooks and references; topics that didn’t make sense, were too vague or appeared in the wrong order; and other glaringly obvious, even embarrassing, mistakes. It wasn’t even clear that the outline would have been a helpful starting point.
As I continued to play around with ChatGPT, I quickly noticed that its engineers have put several filters around certain sensitive and inappropriate topics. While such filters seem laudable at first glance—releasing a chatbot with no filters has caused Microsoft trouble in the past—they quickly raise difficult questions about whether OpenAI, a private U.S.-based company, should be in a position to decide what is and is not appropriate and correct for the rest of us.
My big takeaway thus far is that ChatGPT is fantastic at sounding human, summarizing and formatting its output just the way I want. But I cannot yet trust the output for factual correctness.