AI makes us stupid

Written by Andrew Orlowski on 15 Nov 2021
ANDREW ORLOWSKI LOOKS AT THE LATEST APPLICATION OF ARTIFICIAL INTELLIGENCE TO LINGUISTICS, WITH SPECIFIC REFERENCE TO THE SCIENCE OF SPEECH PROCESSING, AND ASKS IF THE RUSH TO DISPENSE WITH HARD-EARNED DOMAIN KNOWLEDGE COMES AT A PRICE.

“Where is the wisdom we have lost in knowledge?” asked the poet TS Eliot. “Where is the knowledge we have lost in information?”  

In recent years, developments in neural networks, known as deep learning, have revived interest in ‘Artificial Intelligence’ and captured the imagination of policy makers. New techniques proliferate, but this success may come at a cost. Both the ‘deep’ and the ‘learning’ are misleading labels; an increasing reliance on deep learning means researchers lose the fundamental knowledge required to understand the problem.

Neural networks

Neural networks are the third wave of interest in AI in fifty years. The first became known retrospectively as the ‘symbolic’ era of AI, or GOFAI, for Good Old-Fashioned AI. This approach reflected the prevailing view of cognitive science: that human cognition is analogous to symbolic computation in digital computers. So AI research attempted to abstract the real world’s processes and relationships into symbols that could be manipulated by a computer. Hopes had been high: in 1960, Herbert Simon had predicted that “machines will be capable, within twenty years, of doing any work that a man can do.”

But with little to show for their efforts a decade later, enthusiasm waned; the 1973 Lighthill Report into AI resulted in a major reduction in UK funding.

By contrast, the current wave of AI does not attempt to model or abstract the world into symbolic logic. Instead it attempts to approximate results using statistical probability, a largely imitative approach. However, this can lead to ‘brittleness’: since the AI doesn’t understand what it is doing, a small change can break it completely.

In one lauded demonstration, a DeepMind system learned to beat an Atari game, yet it broke if the paddle changed size. The AI didn’t “understand” the concept of a paddle, or even of a game; merely that it responded to stimuli and that repeated operations could “win”. The cost of this shift is illustrated by the story of a novel UK approach to speech analysis that found itself marooned by the rapid progress of AI.

The expert view

Speech technology veteran Matthew Karas and Josh Greifer, an expert in real-time processing, founded a startup, Speech Engineering Limited (SEL), to exploit a new idea. Underlying their software platform was ‘a compiler for algorithms’: a system that decides in real time how to tackle a processing problem.

Greifer had ported the original Cubase software to the Mac at Steinberg, then taken some time out as a hedge fund manager before returning to the industry. Karas had studied speech processing at Cambridge University’s computer science department before setting up the world’s first industrial-strength CMS (Content Management System): he devised the technical architecture of the BBC’s News Online, a skunkworks startup within the BBC that became the corporation’s biggest success. The system cost a third as much as its commercial rivals, but did much more.

What the pair had realised was that the multicore, low-latency processors in today’s smartphones and laptops were going largely unused. Focusing on one approach, such as machine learning, was putting the cart before the horse: it was letting the tool define the problem. Why not let the platform choose which techniques to use, depending on the problem?

Speech Engineering Ltd set out to do just that. They worked on the waveform directly, allowing the platform to decide which algorithms to use, machine learning among them, and to switch between them up to a hundred times a second. In effect, the platform was optimising the pipeline.

“You only ever need to extract features as the signal comes in - take whatever features of the signal you need,” Karas explained. “If the process doesn’t need full-bandwidth audio and just needs linguistic features, then you just take that part.” Another advantage was the ability to chain multiple processes together.
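
To make the idea concrete, here is a minimal sketch of per-frame algorithm selection in Python. Everything in it (the frame size, the spectral-flatness test and the two stand-in processing routines) is an illustrative assumption, not SEL’s platform or its API.

```python
# A conceptual sketch of per-frame algorithm selection, in the spirit of the
# "compiler for algorithms" described above. Every name here is invented for
# illustration; it is not SEL's platform.
import numpy as np

FRAME = 512  # samples per analysis frame (an assumed value)

def spectral_flatness(frame: np.ndarray) -> float:
    """Crude noisiness measure used here to choose a processing strategy."""
    mag = np.abs(np.fft.rfft(frame)) + 1e-12
    return float(np.exp(np.mean(np.log(mag))) / np.mean(mag))

def classical_dsp(frame: np.ndarray) -> str:
    return "dsp"   # stand-in for a cheap classical signal-processing routine

def learned_model(frame: np.ndarray) -> str:
    return "ml"    # stand-in for a heavier machine-learning model

def process_stream(samples: np.ndarray):
    """Work on the waveform directly, deciding frame by frame which algorithm
    to apply - potentially switching many times a second."""
    for start in range(0, len(samples) - FRAME + 1, FRAME):
        frame = samples[start:start + FRAME]
        # Extract only the features this frame needs, as the signal arrives.
        if spectral_flatness(frame) > 0.5:
            yield classical_dsp(frame)   # noise-like frame: cheap DSP path
        else:
            yield learned_model(frame)   # speech-like frame: learned model

decisions = list(process_stream(np.random.randn(16000)))
```

In a real system the decision logic and the candidate algorithms would be far richer, but the shape is the same: the platform, not the researcher, picks the tool frame by frame.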

The result of their work was a speech application, Eloqute, that demonstrated the merit of the new approach. Eloqute improved a user’s pronunciation. But unlike established rivals, it did so on a local device, a smartphone or PC, rather than in the cloud, giving the user real-time feedback and allowing them to use any text they wanted rather than a rote set of phrases.

“We created a new way of learning language,” Karas says. “Each time you have a lesson and it improves you, it does it in such a way that the impact on intelligibility is the greatest. So the impediment that is most annoying is fixed first”. 

Potential investors loved the application, but didn’t care much about the platform, even though the potential was revolutionary, going far beyond speech. One investor who did take an interest was Mike Lynch, who gave SEL space and investment to prove their mettle. Neither of the founders expected what happened next. 

Generative Adversarial Networks (GANs)

In 2016 Google’s DeepMind published the details of WaveNet, a deep neural network that generates raw audio. Around the same time, Generative Adversarial Networks (GANs) were transforming the field. The approach attempted to improve on the laborious, human-labour-intensive ‘training’ process by which a model is tuned: one stack of neural networks is pitched against another, hence the name.

It was too computationally demanding to yield real-world results, Google cautioned, but within two years performance had improved a thousandfold. GANs began to be applied to ever more general problems.

“People taking a more ‘deep learning’ type approach were moving ahead faster than us,” Karas recalls. “It isn’t that the two things are incompatible, but we just didn’t have the bandwidth to do both, so we ended up working on WaveNet-style systems, because people like Google and NVIDIA were just giving the stuff away.”

As defined by Google, the generator learns to generate plausible data, and the generated instances become negative training examples for the discriminator. The discriminator learns to distinguish the generator’s fake data from real data. The discriminator penalizes the generator for producing implausible results.

“A GAN with multiple discriminators means that a generator learns to generate better and better and the discriminator learns to discriminate better and better,” he explains. Speech recognition based on phonemes and language models couldn’t compete with the results. “It makes what we were doing irrelevant.” 
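
For readers who want to see the mechanics, the generator-versus-discriminator loop described above can be sketched in a few lines of PyTorch. The toy one-dimensional data and tiny networks below are illustrative assumptions only; this is neither WaveNet nor anything SEL or Google built.

```python
# A minimal GAN training loop on toy 1-D data (illustrative only).
import torch
import torch.nn as nn

latent_dim = 16

# Generator: maps random noise to a fake "sample" (here just 32 values).
generator = nn.Sequential(
    nn.Linear(latent_dim, 64), nn.ReLU(),
    nn.Linear(64, 32),
)

# Discriminator: scores a sample as real (1) or fake (0).
discriminator = nn.Sequential(
    nn.Linear(32, 64), nn.LeakyReLU(0.2),
    nn.Linear(64, 1), nn.Sigmoid(),
)

loss_fn = nn.BCELoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

for step in range(1000):
    real = torch.randn(64, 32) * 0.5 + 2.0      # stand-in for real data
    noise = torch.randn(64, latent_dim)
    fake = generator(noise)

    # The discriminator learns to tell real from fake; the generated
    # instances serve as its negative training examples.
    d_loss = loss_fn(discriminator(real), torch.ones(64, 1)) + \
             loss_fn(discriminator(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # The generator is penalised when the discriminator spots its fakes,
    # so it learns to produce ever more plausible samples.
    g_loss = loss_fn(discriminator(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```

The multiple-discriminator variant Karas describes follows the same pattern, with the generator penalised by several critics rather than one.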

Linguistics

This has had consequences, though. In a vivid analogy, the linguist Noam Chomsky illustrated the loss of deep understanding that a reliance on statistical interpretation brings.

He compared neural net AI to a meteorological institute that decided to dismantle its physics department and rely instead on sensors and statistical analysis to record and predict the weather. On many days, Chomsky pointed out, the institute would make a decent forecast. But it would have lost the ability to understand weather systems. Chomsky maintained that the new brute-force AI methods were a considerable regression: raw power obscured the fundamental lack of understanding, even a loss of expertise.

Karas sees the parallel. 

“Knowledge no longer matters. I first realised this in little ways,” he says.  “The point Chomsky made is that the AI is a black box. With GANs, even the experts can’t predict if it will succeed, as the network is designing itself.” 

“Creating deterministic processes which learn highly complex skills by practising them, is a new field of engineering. That's why it appears to be mysterious. Right now, one would have to be an expert to design the right kind of network, and decide which parameters to feed it, to succeed.” 

Speech is time-ordered

He did note that one speech problem stumped the GAN teams, and that it was something a specialist would have picked up earlier.

Speech is time-ordered, and causality matters. A neural network that can clean up a photo by scanning indiscriminately backwards and forwards (a job it does very well) didn’t know these two factors were important when it came to linear speech. Not all data is equivalent: the forward pass is more important than the backward pass in training the network.
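
The constraint is easy to see in code. Below is a sketch of the kind of causal convolution that WaveNet-style audio models rely on, again in PyTorch with layer sizes chosen arbitrarily for illustration: padding is applied only on the left, so each output sample depends on the present and the past, never on samples that have not yet arrived. An image filter faces no such restriction.

```python
# Sketch of a causal 1-D convolution of the kind used in WaveNet-style audio
# models (illustrative; not code from any system mentioned in the article).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    def __init__(self, channels: int, kernel_size: int, dilation: int = 1):
        super().__init__()
        # Pad only on the left, so the receptive field looks backwards in time.
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                      # x: (batch, channels, time)
        x = F.pad(x, (self.left_pad, 0))       # no right padding: no future samples
        return self.conv(x)

audio = torch.randn(1, 1, 16000)               # one second of fake 16 kHz audio
causal = CausalConv1d(channels=1, kernel_size=3, dilation=2)
print(causal(audio).shape)                     # output aligned with input: (1, 1, 16000)
```

For an application like Eloqute, which has to give feedback in real time, a model that peeks at future samples cannot run on a live signal at all.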

“Deep Learning is more like science than engineering,” Karas reflects. “You do experiments and find out new facts about human speech, or indeed, Go. Speech scientists and game theorists have done this for years, and now have new tools. The aim of our project was to teach people to speak more clearly, and now, the best way to do that, would not be to hire speech scientists, but it would still be to ‘do speech science’, just using a new kind of lab.” 
