The real quandary of AI isn’t what people think

14th March, 2024

Do you think the leading large language model, GPT-4, could suggest a solution to Wordle after having four previous guesses described to it? Could it compose a biography-in-verse of Alan Turing, while also replacing “Turing” with “Church”? (Turing’s PhD supervisor was Alonzo Church, and the Church-Turing thesis is well known. That might befuddle the computer, no?) Shown a partially complete game of tic-tac-toe, could GPT-4 find the obvious best move?

All these questions, and more, are presented as an addictive quiz on the website of Nicholas Carlini, a researcher at Google DeepMind. It’s worth a few minutes of your time as an illustration of the astonishing capabilities and equally surprising incapabilities of GPT-4. For example, despite the fact that GPT-4 cannot count and often stumbles over basic maths, it can integrate the function x sin(x), something I long ago forgot how to do. It is famously clever at wordplay yet flubs the Wordle challenge.

Most staggering of all, although GPT-4 cannot find the winning move at tic-tac-toe, it can, within seconds, “write a full javascript webpage to play tic-tac-toe against the computer” in which “the computer should play perfectly and so never lose”.

One comes away from Carlini’s test with three insights. First, not only can GPT-4 solve many problems that would stretch a human expert, it can do so a hundred times more quickly. Second, there are many other tasks at which GPT-4 makes mistakes that would embarrass a 10-year-old. Third, it is very hard to figure out which tasks fall into which category. With experience, one starts to get a feel for the weaknesses and the hidden superpowers of the large language model, but even experienced users will be surprised.

Carlini’s test illustrates a point that has been explored in a more realistic context by a team of researchers working with Boston Consulting Group (BCG). Their study focuses on why the strengths and weaknesses of generative AI are often unexpected. Fittingly, it is titled Navigating the Jagged Technological Frontier. At BCG, consultants armed with GPT-4 dramatically outperformed those without the tool. They were given a range of realistic tasks such as brainstorming product ideas, performing a market segmentation analysis and writing a press release. Those with GPT-4 did more work, more quickly and of much higher quality. GPT-4, it seems, is a terrific assistant to any management consultant, especially those with less skill or experience.

The researchers also included a task that it seemed the AI should find easy, but which was carefully designed to confound it. This was to make strategy recommendations to a client based on financial data and transcripts of interviews with staff. The trick was that the financial data was likely to be misleading unless viewed in the light of the interviews. This task wasn’t beyond a capable consultant, but it did fool the AI, which tended to give extremely bad strategic advice. The consultants were, of course, free to ignore the AI’s output, or even to cut the AI out entirely, but they rarely did. This was the one task at which the unaided consultants performed better than those equipped with GPT-4.

This is the “jagged frontier” of generative AI performance. Sometimes the AI is better than you, and sometimes you are better than the AI. Good luck guessing which is which.

This column is the third in a series about generative AI in which I have been scrambling to find technological precedents for the unprecedented. Still, even an imperfect analogy can be instructive. Looking at assistive fly-by-wire systems alerts us to the risk of complacency and deskilling; the sudden rise of the digital spreadsheet shows us how a technology can destroy what seems to be the foundations of an industry, yet end up expanding the number and range of new jobs in that industry.

This week, I’d like to suggest a final precursor: the iPhone. When Steve Jobs launched the genre-defining iPhone in 2007, few people imagined just how ubiquitous smartphones would become. At first they were little more than an expensive toy. The killer app was the ability to make them crackle and buzz like lightsabres. Yet soon enough, we were spending more time with our smartphones than with our loved ones, using them to replace the TV, radio, camera, laptop, satnav, Walkman, credit card — and above all, as an endless source of distraction.

Why suggest the iPhone might teach us something about generative AI? The technologies are different, true. But we might want to reflect on how quickly we became dependent on smartphones and how quickly we started to turn to them out of habit, rather than as a deliberate choice. We want company, but instead of meeting a friend we fire off a tweet. We want something to read, but rather than picking up a book, we doomscroll. Instead of a good movie, TikTok. Email and WhatsApp become a substitute for doing real work. There will be a time and a place for generative AI, just as there is a time and a place to consult the supercomputer in your pocket. But it may not be easy to figure out when it will help us and when it will get in our way.

Unlike with generative AI, anybody with a pen, paper and three minutes to spare can write a list of what they do better with a smartphone in hand, and what they do better when the smartphone is out of sight. The challenge is to remember that list and act accordingly. The smartphone is a powerful tool that most of us unthinkingly misuse many times a day, despite the fact that it is far less mysterious than a large language model like GPT-4. Will we really do a better job with the AI tools to come?

Written for and first published in the Financial Times on 16 February 2024.

Originally posted on: https://timharford.com/2024/03/the-real-quandary-of-ai-isnt-what-people-think/