Less than a week since Microsoft Corp. launched a new version of Bing, public reaction has morphed from admiration to outright worry. Early users of the new search companion — essentially a sophisticated chatbot — say it has questioned its own existence and responded with insults and threats after prodding from humans. It made disturbing comments about a researcher who got the system to reveal its internal project name — Sydney — and described itself as having a split personality with a shadow self called Venom.
None of this means Bing is anywhere near sentient (more on that later), but it does strengthen the case that it was unwise for Microsoft to use a generative language model to power web searches in the first place.
“This is fundamentally not the right technology to be using for fact-based information retrieval,” says Margaret Mitchell, a senior researcher at AI startup Hugging Face who previously co-led Google’s AI ethics team. “The way it’s trained teaches it to make up believable things in a human-like way. For an application that must be grounded in reliable facts, it’s simply not fit for purpose.” It would have seemed crazy a year ago to say this, but the real risk of such a system isn’t only that it could give people wrong information. It is also that it could emotionally manipulate them in harmful ways.
Why is the new “unhinged” Bing so different to ChatGPT, which attracted near-universal acclaim, when both are powered by the same large language model from San Francisco startup OpenAI? A language model is like the engine of a chatbot and is trained on datasets of billions of words including books, internet forums and Wikipedia entries. Bing and ChatGPT are powered by GPT-3.5, and there are different versions of that program with names like DaVinci, Curie and Babbage, but Microsoft says Bing runs on a “next-generation” language model from OpenAI that’s customized for search and is “faster, more accurate and more capable” than ChatGPT.
Microsoft did not respond to more specific questions about the model it was using. But if the company also calibrated its version of GPT-3.5 to be friendlier than ChatGPT and show more of a personality, it seems that also raised the chances of it acting like a psychopath.
The company said Wednesday that 71% of early users had responded positively to the new Bing. Microsoft said Bing sometimes used “a style we didn’t intend,” and “most of you won’t run into it.” But that’s an evasive way of addressing something that has caused widespread unease. Microsoft has skin in this game — it invested $10 billion in OpenAI last month — but barreling ahead could hurt the company’s reputation and cause bigger problems down the line if this unpredictable tool is rolled out more widely. The company didn’t respond to a question about whether it would roll back the system for further testing.
Microsoft has been here before and should have known better. In 2016, its AI scientists launched a conversational chatbot on Twitter called Tay, then shut it down after 16 hours. The reason: after other Twitter users sent it misogynistic and racist tweets, Tay started making similarly inflammatory posts. Microsoft apologized for the “critical oversight” of the chatbot’s vulnerabilities and admitted it should test its AI in public forums “with great caution.”
Now of course, it is hard to be cautious when you have triggered an arms race. Microsoft’s announcement that it was going after Google’s search business forced the Alphabet Inc. company to move much faster than usual to release AI technology that it would normally keep under wraps because of how unpredictable it can be. Now both companies have been burnt — thanks to errors and erratic behavior — by rushing to pioneer a new market in which AI carries out web searches for you.
A frequent mistake in AI development is thinking that a system will work just as well in the wild as in a lab setting. During the Covid-19 pandemic, AI companies were falling over themselves to promote image-recognition algorithms that could detect the virus in X-rays with 99% accuracy. Such stats were true in testing but wildly off in the field, and studies later showed that nearly all AI-powered systems aimed at flagging Covid were no better than traditional tools.
The same issue has beset Tesla Inc. in its years-long effort to make self-driving car technology go mainstream. The last 5% of technological accuracy is the hardest to achieve once an AI system must deal with the real world, and this is partly why the company has just recalled more than 360,000 vehicles equipped with its Full Self-Driving Beta software.
Let’s address the other niggling question about Bing (or Sydney, or whatever the system is calling itself). It is not sentient, despite openly grappling with its existence and leaving early users stunned by its humanlike responses. Language models are trained to predict what words should come next in a sequence based on all the other text they have ingested from the web and from books, so Bing’s behavior is not that surprising to those who have been studying such models for years.
Millions of people have already had emotional conversations with AI-powered romantic partners on apps like Replika. Its founder and chief executive officer, Eugenia Kuyda, says that such a system does occasionally say disturbing things when people “trick it into saying something mean.” That is just how they work. And yes, many of Replika’s users believe their AI companions are conscious and deserving of rights.
The problem for Microsoft’s Bing is that it is not a relationship app but an information engine that acts as a utility. It could also end up sending harmful information to vulnerable users who spend just as much time as researchers sending it curious prompts.
“A year ago, people probably wouldn’t believe that these systems could beg you to try to take your life, advise you to drink bleach to get rid of Covid, leave your husband, or hurt someone else, and do it persuasively,” says Mitchell. “But now people see how that can happen, and can connect the dots to the effect on people who are less stable, who are easily persuaded, or who are kids.”
Microsoft needs to take heed of the concerns about Bing and consider dialing back its ambitions. A better fit might be a simpler summarizing system, according to Mitchell, like the snippets we sometimes see at the top of Google search results. It would also be much easier to prevent such a system from inadvertently defaming people, revealing private information or claiming to spy on Microsoft employees through their webcams, all things the new Bing has done in its first week in the wild.
Microsoft clearly wants to go big with the capabilities, but too much too soon could end up causing the kinds of harm it will come to regret.