THE AI DETECTIVES

As neural nets push into science, researchers probe back

By Paul Voosen

Science, 7 July 2017, Vol. 357, Issue 6346, pp. 22-27

Jason Yosinski sits in a small glass box at Uber's San Francisco, California, headquarters, pondering the mind of an artificial intelligence. An Uber research scientist, Yosinski is performing a kind of brain surgery on the AI running on his laptop. Like many of the AIs that will soon be powering so much of modern life, including self-driving Uber cars, Yosinski's program is a deep neural network, with an architecture loosely inspired by the brain. And like the brain, the program is hard to understand from the outside: It's a black box.

This particular AI has been trained, using a vast sum of labeled images, to recognize objects as random as zebras, fire trucks, and seat belts. Could it recognize Yosinski and the reporter hovering in front of the webcam? Yosinski zooms in on one of the AI's individual computational nodes—the neurons, so to speak—to see what is prompting its response. Two ghostly white ovals pop up and float on the screen. This neuron, it seems, has learned to detect the outlines of faces. "This responds to your face and my face," he says. "It responds to different size faces, different color faces."

No one trained this network to identify faces. Humans weren't labeled in its training images. Yet learn faces it did, perhaps as a way to recognize the things that tend to accompany them, such as ties and cowboy hats. The network is too complex for humans to comprehend its exact decisions. Yosinski's probe had illuminated one small part of it, but overall, it remained opaque. "We build amazing models," he says. "But we don't quite understand them. And every year, this gap is going to get a bit larger."

Each month, it seems, deep neural networks, or deep learning, as the field is also called, spread to another scientific discipline. They can predict the best way to synthesize organic molecules (see sidebar, "Neural networks learn the art of chemical synthesis"). They can detect genes related to autism risk (see sidebar, "Combing the genome for the roots of autism"). They are even changing how science itself is conducted (see p. 18). The AIs often succeed in what they do. But they have left scientists, whose very enterprise is founded on explanation, with a nagging question: Why, model, why?

That interpretability problem, as it's known, is galvanizing a new generation of researchers in both industry and academia. Just as the microscope revealed the cell, these researchers are crafting tools that will allow insight into how neural networks make decisions. Some tools probe the AI without penetrating it; some are alternative algorithms that can compete with neural nets, but with more transparency; and some use still more deep learning to get inside the black box. Taken together, they add up to a new discipline. Yosinski calls it "AI neuroscience."

THE URGENCY COMES not just from science. According to a directive from the European Union, companies deploying algorithms that substantially influence the public must by next year create "explanations" for their models' internal logic. The Defense Advanced Research Projects Agency, the U.S. military's blue-sky research arm, is pouring $70 million into a new program, called Explainable AI, for interpreting the deep learning that powers drones and intelligence-mining operations. The drive to open the black box of AI is also coming from Silicon Valley itself, says Maya Gupta, a machine-learning researcher at Google in Mountain View, California. When she joined Google in 2012 and asked AI engineers about their problems, accuracy wasn't the only thing on their minds, she says. "I'm not sure what it's doing," they told her. "I'm not sure I can trust it."

Rich Caruana, a computer scientist at Microsoft Research in Redmond, Washington, knows that lack of trust firsthand. As a graduate student in the 1990s at Carnegie Mellon University in Pittsburgh, Pennsylvania, he joined a team trying to see whether machine learning could guide the treatment of pneumonia patients. In general, sending the hale and hearty home is best, so they can avoid picking up other infections in the hospital. But some patients, especially those with complicating factors such as asthma, should be admitted immediately. Caruana applied a neural network to a data set of symptoms and outcomes provided by 78 hospitals. It seemed to work well. But disturbingly, he saw that a simpler, transparent model trained on the same records suggested sending asthmatic patients home, indicating some flaw in the data. And he had no easy way of knowing whether his neural net had picked up the same bad lesson. "Fear of a neural net is completely justified," he says. "What really terrifies me is what else did the neural net learn that's equally wrong?"

[Photos: Anh Nguyen. Researchers have created neural networks that, in addition to filling gaps left in photos, can identify flaws in an artificial intelligence.]

AI IN ACTION

How algorithms can analyze the mood of the masses

With billions of users and hundreds of billions of tweets and posts every year, social media has brought big data to social science. It has also opened an unprecedented opportunity to use artificial intelligence (AI) to glean meaning from the mass of human communications, psychologist Martin Seligman has recognized. At the University of Pennsylvania's Positive Psychology Center, he and more than 20 psychologists, physicians, and computer scientists in the World Well-Being Project use machine learning and natural language processing to sift through gobs of data to gauge the public's emotional and physical health.

That's traditionally done with surveys. But social media data are "unobtrusive, it's very inexpensive, and the numbers you get are orders of magnitude greater," Seligman says. It is also messy, but AI offers a powerful way to reveal patterns.

In one recent study, Seligman and his colleagues looked at the Facebook updates of 29,000 users who had taken a self-assessment of depression. Using data from 28,000 of the users, a machine-learning algorithm found associations between words in the updates and depression levels. It could then successfully gauge depression in the other users based only on their updates.

In another study, the team predicted county-level heart disease mortality rates by analyzing 148 million tweets; words related to anger and negative relationships turned out to be risk factors. The predictions from social media matched actual mortality rates more closely than did predictions based on 10 leading risk factors, such as smoking and diabetes. The researchers have also used social media to predict personality, income, and political ideology, and to study hospital care, mystical experiences, and stereotypes. The team has even created a map coloring each U.S. county according to well-being, depression, trust, and five personality traits, as inferred from Twitter (wwbp.org).

"There's a revolution going on in the analysis of language and its links to psychology," says James Pennebaker, a social psychologist at the University of Texas in Austin. He focuses not on content but style, and has found, for example, that the use of function words in a college admissions essay can predict grades. Articles and prepositions indicate analytical thinking and predict higher grades; pronouns and adverbs indicate narrative thinking and predict lower grades. He also found support for suggestions that much of the 1728 play Double Falsehood was likely written by William Shakespeare: Machine-learning algorithms matched it to Shakespeare's other works based on factors such as cognitive complexity and rare words. "Now, we can analyze everything that you've ever posted, ever written, and increasingly how you and Alexa talk," Pennebaker says. The result: "richer and richer pictures of who people are."

—Matthew Hutson
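The depression study's pipeline, learn word-to-score associations on most users and predict for the held-out rest, is the classic supervised text-regression recipe. Here is a minimal hypothetical sketch with scikit-learn; the updates and scores are invented, and the real study used far richer language features:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge

# Invented stand-ins for Facebook updates paired with
# self-assessed depression scores.
updates = [
    "had a great day with friends",
    "can't sleep again, everything is heavy",
    "excited for the weekend trip",
    "another heavy night, can't sleep",
]
scores = [1.0, 8.0, 2.0, 7.0]

# Learn word-to-score associations from the "training" users...
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(updates[:3])
model = Ridge().fit(X, scores[:3])

# ...then gauge depression for a held-out user from words alone.
print(model.predict(vectorizer.transform(updates[3:])))
```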

Today's neural nets are far more powerful than those Caruana used as a graduate student, but their essence is the same. At one end sits a messy soup of data—say, millions of pictures of dogs. Those data are sucked into a network with a dozen or more computational layers, in which neuron-like connections "fire" in response to features of the input data. Each layer reacts to progressively more abstract features, allowing the final layer to distinguish, say, terrier from dachshund.

At first the system will botch the job. But each result is compared with labeled pictures of dogs. In a process called backpropagation, the outcome is sent backward through the network, enabling it to reweight the triggers for each neuron. The process repeats millions of times until the network learns—somehow—to make fine distinctions among breeds. "Using modern horsepower and chutzpah, you can get these things to really sing," Caruana says. Yet that mysterious and flexible power is precisely what makes them black boxes.
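That training recipe (forward pass, comparison with labels, backpropagation, repeat) fits in a few lines of code. Below is a minimal sketch in PyTorch; the layer sizes, the fake data, and the two-class dog task are illustrative stand-ins, not any network described in this article.

```python
import torch
import torch.nn as nn

# A small stand-in for an image classifier: successive layers react to
# progressively more abstract features of the flattened input.
model = nn.Sequential(
    nn.Linear(64 * 64 * 3, 512), nn.ReLU(),
    nn.Linear(512, 128), nn.ReLU(),
    nn.Linear(128, 2),  # e.g., terrier vs. dachshund
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

# Random tensors standing in for millions of labeled dog pictures.
images = torch.randn(256, 64 * 64 * 3)
labels = torch.randint(0, 2, (256,))

for step in range(1000):
    logits = model(images)          # forward pass
    loss = loss_fn(logits, labels)  # compare results with the labels
    optimizer.zero_grad()
    loss.backward()                 # backpropagation: send the outcome backward
    optimizer.step()                # reweight the "triggers" for each neuron
```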

MARCO RIBEIRO, a graduate student at the University of Washington in Seattle, strives to understand the black box by using a class of AI neuroscience tools called counterfactual probes. The idea is to vary the inputs to the AI—be they text, images, or anything else—in clever ways to see which changes affect the output, and how. Take a neural network that, for example, ingests the words of movie reviews and flags those that are positive. Ribeiro's program, called Local Interpretable Model-Agnostic Explanations (LIME), would take a review flagged as positive and create subtle variations by deleting or replacing words. Those variants would then be run through the black box to see whether it still considered them to be positive. On the basis of thousands of tests, LIME can identify the words—or parts of an image or molecular structure, or any other kind of data—most important in the AI's original judgment.

The tests might reveal that the word "horrible" was vital to a panning or that "Daniel Day Lewis" led to a positive review. But although LIME can diagnose those singular examples, that result says little about the network's overall insight.

New counterfactual methods like LIME seem to emerge each month. But Mukund Sundararajan, another computer scientist at Google, devised a probe that doesn't require testing the network a thousand times over: a boon if you're trying to understand many decisions, not just a few. Instead of varying the input randomly, Sundararajan and his team introduce a blank reference—a black image or a zeroed-out array in place of text—and transition it step-by-step toward the example being tested. Running each step through the network, they watch the jumps it makes in certainty, and from that trajectory they infer features important to a prediction.

[Graphic: G. Grullón/Science. "Opening up the black box." Loosely modeled after the brain, deep neural networks are spurring innovation across science. But the mechanics of the models are mysterious: They are black boxes. Scientists are now developing tools to get inside the mind of the machine. Panel 1, "Inside the black box": a neural network, such as one taught to perform image recognition, is made out of layers of triggers, or "neurons," which fire when given data that cross certain thresholds and pass that information to a new layer; with its triggers set randomly at first, the network is wrong, but shown many correct "volcanoes" it adjusts its triggers until, after many repetitions, it can correctly identify a volcano. Panel 2, "Into the darkness": researchers have developed three broad classes of tools to look inside neural networks. Probing the black box: perturb the inputs to a trained network to see what most affects its decision-making; the probing can reveal the cause for one decision, but not the overall logic. Controlling the black box: some models guarantee relationships between two variables, like square footage and house price; these models are more transparent and can be wired into a neural network, helping control it. Embracing the darkness: neural networks can be used to help understand other neural networks; combining an image generator with an image classifier can expose knowledge gaps, such as accurate labels learned for the wrong reasons.]

Sundararajan compares the process to picking out the key features that identify the glass-walled space he is sitting in—outfitted with the standard medley of mugs, tables, chairs, and computers—as a Google conference room. "I can give a zillion reasons." But say you slowly dim the lights. "When the lights become very dim, only the biggest reasons stand out." Those transitions from a blank reference allow Sundararajan to capture more of the network's decisions than Ribeiro's variations do. But deeper, unanswered questions are always there, Sundararajan says—a state of mind familiar to him as a parent. "I have a 4-year-old who continually reminds me of the infinite regress of 'Why?'"
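The probe described here is what Sundararajan and colleagues published in 2017 under the name integrated gradients. A minimal sketch, assuming a differentiable PyTorch model that maps an input to class scores, with a black image or zeroed-out array as the blank reference:

```python
import torch

def integrated_gradients(model, x, baseline, target, steps=50):
    """Accumulate gradients along the straight path from a blank
    reference (e.g., a black image) to the real input x."""
    total = torch.zeros_like(x)
    for k in range(1, steps + 1):
        point = baseline + (k / steps) * (x - baseline)  # step toward the input
        point.requires_grad_(True)
        score = model(point.unsqueeze(0))[0, target]     # certainty for the target label
        score.backward()
        total += point.grad                              # jump in certainty at this step
    return (x - baseline) * total / steps                # attribution per input feature
```

Features with large attributions are the "biggest reasons" that still stand out as the lights come up from black.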

GUPTA HAS A DIFFERENT TACTIC for coping with black boxes: She avoids them. Several years ago Gupta, who moonlights as a designer of intricate physical puzzles, began a project called GlassBox. Her goal is to tame neural networks by engineering predictability into them. Her guiding principle is monotonicity—a relationship between variables in which, all else being equal, increasing one variable directly increases another, as with the square footage of a house and its price.

Gupta embeds those monotonic relationships in sprawling databases called interpolated lookup tables. In essence, they're like the tables in the back of a high school trigonometry textbook where you'd look up the sine of 0.5. But rather than dozens of entries across one dimension, her tables have millions across multiple dimensions. She wires those tables into neural networks, effectively adding an extra, predictable layer of computation—baked-in knowledge that she says will ultimately make the network more controllable.
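A one-dimensional toy version of an interpolated lookup table shows the idea; the keypoints and prices below are invented, and Gupta's real tables hold millions of entries across multiple dimensions (this line of work later surfaced in Google's open-source TensorFlow Lattice library):

```python
import numpy as np

# Keypoints for square footage; the table values are invented.
sqft_keys = np.array([500, 1000, 2000, 4000])
# Learned table entries, constrained to be nondecreasing so that
# more square footage can never predict a lower price.
price_values = np.array([120_000, 180_000, 300_000, 520_000])
assert np.all(np.diff(price_values) >= 0)  # monotonicity baked in

def predict_price(sqft):
    # Linear interpolation between table entries; monotone entries
    # guarantee a monotone prediction.
    return np.interp(sqft, sqft_keys, price_values)

print(predict_price(1500))  # 240000.0, between the 1000- and 2000-sqft entries
```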

Caruana, meanwhile, has kept his pneumonia lesson in mind. To develop a model that would match deep learning in accuracy but avoid its opacity, he turned to a community that hasn't always gotten along with machine learning and its loosey-goosey ways: statisticians.

In the 1980s, statisticians pioneered a technique called a generalized additive model (GAM). It built on linear regression, a way to find a linear trend in a set of data. But GAMs can also handle trickier relationships by finding multiple operations that together can massage data to fit on a regression line: squaring a set of numbers while taking the logarithm for another group of variables, for example. Caruana has supercharged the process, using machine learning to discover those operations—which can then be used as a powerful pattern-detecting model. "To our great surprise, on many problems, this is very accurate," he says. And crucially, each operation's influence on the underlying data is transparent.
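The additive structure is what keeps a GAM transparent: the prediction is a sum of per-feature shape functions, each of which can be plotted and inspected on its own. Here is a tiny backfitting sketch in numpy; Caruana's published models learn the shape functions with boosted decision trees rather than the binned averages used below.

```python
import numpy as np

def fit_gam(X, y, n_bins=16, rounds=20):
    """Tiny backfitting GAM: learn one lookup-style shape function per
    feature so that y ~ mean + f1(x1) + f2(x2) + ..."""
    n, d = X.shape
    # Bin each feature so a shape function is just a set of per-bin values.
    cuts = [np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)[1:-1])
            for j in range(d)]
    idx = [np.digitize(X[:, j], cuts[j]) for j in range(d)]
    mean = y.mean()
    shapes = [np.zeros(n_bins) for _ in range(d)]
    pred = np.full(n, mean)
    for _ in range(rounds):
        for j in range(d):
            # Residual with every shape function except f_j removed.
            residual = y - (pred - shapes[j][idx[j]])
            for b in range(n_bins):
                members = idx[j] == b
                if members.any():
                    shapes[j][b] = residual[members].mean()
            pred = mean + sum(s[i] for s, i in zip(shapes, idx))
    return mean, cuts, shapes  # each shapes[j] can be plotted by itself
```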

AI IN ACTION

Combing the genome for the roots of autism

[Photo: BSIP SA/Alamy Stock Photo. Artificial intelligence tools are helping reveal thousands of genes that may contribute to autism.]

For geneticists, autism is a vexing challenge. Inheritance patterns suggest it has a strong genetic component. But variants in scores of genes known to play some role in autism can explain only about 20% of all cases. Finding other variants that might contribute requires looking for clues in data on the 25,000 other human genes and their surrounding DNA—an overwhelming task for human investigators. So computational biologist Olga Troyanskaya of Princeton University and the Simons Foundation in New York City enlisted the tools of artificial intelligence (AI).

"We can only do so much as biologists to show what underlies diseases like autism," explains collaborator Robert Darnell, founding director of the New York Genome Center and a physician scientist at The Rockefeller University in New York City. "The power of machines to ask a trillion questions where a scientist can ask just 10 is a game-changer."

Troyanskaya combined hundreds of data sets on which genes are active in specific human cells, how proteins interact, and where transcription factor binding sites and other key genome features are located. Then her team used machine learning to build a map of gene interactions and compared those of the few well-established autism risk genes with those of thousands of other unknown genes, looking for similarities. That flagged another 2500 genes likely to be involved in autism, they reported last year in Nature Neuroscience.

But genes don't act in isolation, as geneticists have recently realized. Their behavior is shaped by the millions of nearby noncoding bases, which interact with DNA-binding proteins and other factors. Identifying which noncoding variants might affect nearby autism genes is an even tougher problem than finding the genes in the first place, and graduate student Jian Zhou in Troyanskaya's Princeton lab is deploying AI to solve it.

To train the program—a deep-learning system—Zhou exposed it to data collected by the Encyclopedia of DNA Elements and Roadmap Epigenomics, two projects that cataloged how tens of thousands of noncoding DNA sites affect neighboring genes. The system in effect learned which features to look for as it evaluates unknown stretches of noncoding DNA for potential activity.

When Zhou and Troyanskaya described their program, called DeepSEA, in Nature Methods in October 2015, Xiaohui Xie, a computer scientist at the University of California, Irvine, called it "a milestone in applying deep learning to genomics." Now, the Princeton team is running the genomes of autism patients through DeepSEA, hoping to rank the impacts of noncoding bases.

Xie is also applying AI to the genome, though with a broader focus than autism. He, too, hopes to classify any mutations by the odds they are harmful. But he cautions that in genomics, deep learning systems are only as good as the data sets on which they are trained. "Right now I think people are skeptical" that such systems can reliably parse the genome, he says. "But I think down the road more and more people will embrace deep learning."

—Elizabeth Pennisi
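The ranking step described above, scoring unknown genes by how much their interaction profiles resemble those of established risk genes, can be sketched with plain cosine similarity. The matrix below is an invented stand-in for the team's functional-interaction data, and their actual method trained a machine-learning classifier rather than using raw similarity:

```python
import numpy as np

rng = np.random.default_rng(0)
# Invented stand-in: one row per gene, columns are interaction features
# distilled from hundreds of data sets.
profiles = rng.random((25_000, 100))
known_risk = [12, 404, 7_777]  # indices of established autism risk genes

# Average cosine similarity of every gene to the known risk genes.
unit = profiles / np.linalg.norm(profiles, axis=1, keepdims=True)
score = (unit @ unit[known_risk].T).mean(axis=1)
score[known_risk] = -np.inf  # don't re-flag what is already known

candidates = np.argsort(score)[::-1][:2500]  # genes to follow up
```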

Caruana's GAMs are not as good as AIs at handling certain types of messy data, such as images or sounds, on which some neural nets thrive. But for any data that would fit in the rows and columns of a spreadsheet, such as hospital records, the model can work well. For example, Caruana returned to his original pneumonia records. Reanalyzing them with one of his GAMs, he could see why the AI would have learned the wrong lesson from the admission data. Hospitals routinely put asthmatics with pneumonia in intensive care, improving their outcomes. Seeing only their rapid improvement, the AI would have recommended the patients be sent home. (It would have made the same optimistic error for pneumonia patients who also had chest pain and heart disease.)

Caruana has started touting the GAM approach to California hospitals, including Children's Hospital Los Angeles, where about a dozen doctors reviewed his model's results. They spent much of that meeting discussing what it told them about pneumonia admissions, immediately understanding its decisions. "You don't know much about health care," one doctor said, "but your model really does."

SOMETIMES, YOU HAVE TO EMBRACE the darkness. That's the theory of researchers pursuing a third route toward interpretability. Instead of probing neural nets, or avoiding them, they say, the way to explain deep learning is simply to do more deep learning.

Like many AI coders, Mark Riedl, director of the Entertainment Intelligence Lab at the Georgia Institute of Technology in Atlanta, turns to 1980s video games to test his creations. One of his favorites is Frogger, in which the player navigates the eponymous amphibian through lanes of car traffic to an awaiting pond. Training a neural network to play expert Frogger is easy enough, but explaining what the AI is doing is even harder than usual.

Instead of probing that network, Riedl asked human subjects to play the game and to describe their tactics aloud in real time. Riedl recorded those comments alongside the frog's context in the game's code: "Oh, there's a car coming for me; I need to jump forward." Armed with those two languages—the players' and the code—Riedl trained a second neural net to translate between the two, from code to English. He then wired that translation network into his original game-playing network, producing an overall AI that would say, as it waited in a lane, "I'm waiting for a hole to open up before I move." The AI could even sound frustrated when pinned on the side of the screen, cursing and complaining, "Jeez, this is hard."

AI IN ACTION

Machines that make sense of the sky

[Photos: Kiyoshi Takahase Segundo/Alamy Stock Photo. AI that "knows" what a galaxy should look like transforms a fuzzy image (left) into a crisp one (right).]

This past April, astrophysicist Kevin Schawinski posted fuzzy pictures of four galaxies on Twitter, along with a request: Could fellow astronomers help him classify them? Colleagues chimed in to say the images looked like ellipticals and spirals—familiar species of galaxies.

Some astronomers, suspecting trickery from the computation-minded Schawinski, asked outright: Were these real galaxies? Or were they simulations, with the relevant physics modeled on a computer? In truth they were neither, he says. At ETH Zurich in Switzerland, Schawinski, computer scientist Ce Zhang, and other collaborators had cooked the galaxies up inside a neural network that doesn't know anything about physics. It just seems to understand, on a deep level, how galaxies should look.

With his Twitter post, Schawinski just wanted to see how convincing the network's creations were. But his larger goal was to create something like the technology in movies that magically sharpens fuzzy surveillance images: a network that could make a blurry galaxy image look like it was taken by a better telescope than it actually was. That could let astronomers squeeze out finer details from reams of observations. "Hundreds of millions or maybe billions of dollars have been spent on sky surveys," Schawinski says. "With this technology we can immediately extract somewhat more information."

The forgery Schawinski posted on Twitter was the work of a generative adversarial network, a kind of machine-learning model that pits two dueling neural networks against each other. One is a generator that concocts images, the other a discriminator that tries to spot any flaws that would give away the manipulation, forcing the generator to get better. Schawinski's team took thousands of real images of galaxies, and then artificially degraded them. Then the researchers taught the generator to spruce up the images again so they could slip past the discriminator. Eventually the network could outperform other techniques for smoothing out noisy pictures of galaxies.

Schawinski's approach is a particularly avant-garde example of machine learning in astronomy, says astrophysicist Brian Nord of Fermi National Accelerator Laboratory in Batavia, Illinois, but it's far from the only one. At the January meeting of the American Astronomical Society, Nord presented a machine-learning strategy to hunt down strong gravitational lenses: rare arcs of light in the sky that form when the images of distant galaxies travel through warped spacetime on the way to Earth. These lenses can be used to gauge distances across the universe and find unseen concentrations of mass.

Strong gravitational lenses are visually distinctive but difficult to describe with simple mathematical rules—hard for traditional computers to pick out, but easy for people. Nord and others realized that a neural network, trained on thousands of lenses, can gain similar intuition. In the following months, "there have been almost a dozen papers, actually, on searching for strong lenses using some kind of machine learning. It's been a flurry," Nord says.

And it's just part of a growing realization across astronomy that artificial intelligence strategies offer a powerful way to find and classify interesting objects in petabytes of data. To Schawinski, "That's one way I think in which real discovery is going to be made in this age of 'Oh my God, we have too much data.'"

Joshua Sokol is a journalist in Boston.

Riedl calls his approach "rationalization," which he designed to help everyday users understand the robots that will soon be helping around the house and driving our cars. "If we can't ask a question about why they do something and get a reasonable response back, people will just put it back on the shelf," Riedl says. But those explanations, however soothing, prompt another question, he adds: "How wrong can the rationalizations be before people lose trust?"
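To make the data layout concrete: rationalization starts from game states recorded alongside what human players said aloud at those moments. The toy below substitutes nearest-neighbor retrieval for Riedl's actual encoder-decoder translation network, and its state features and comments are invented.

```python
import numpy as np

# Hypothetical training data: game-state features recorded alongside
# what a human player said aloud at that moment.
states = np.array([
    [1.0, 0.0, 0.0],  # car approaching, no gap, not at edge
    [0.0, 1.0, 0.0],  # gap open ahead
    [0.0, 0.0, 1.0],  # pinned at the side of the screen
])
comments = [
    "There's a car coming for me; I need to wait.",
    "I'm waiting for a hole to open up before I move.",
    "Jeez, this is hard.",
]

def rationalize(state):
    # Replay the comment from the most similar recorded state; a real
    # system trains a sequence-to-sequence translation network instead.
    distances = np.linalg.norm(states - state, axis=1)
    return comments[int(np.argmin(distances))]

print(rationalize(np.array([0.1, 0.9, 0.0])))  # "I'm waiting for a hole..."
```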

BACK AT UBER, Yosinski has been kicked out of his glass box. Uber's meeting rooms, named after cities, are in high demand, and there is no surge pricing to thin the crowd. He's out of Doha and off to find Montreal, Canada, unconscious pattern recognition processes guiding him through the office maze—until he gets lost. His image classifier also remains a maze, and, like Riedl, he has enlisted a second AI to help him understand the first one.

First, Yosinski rejiggered the classifier to produce images instead of labeling them. Then, he and his colleagues fed it colored static and sent a signal back through it to request, for example, "more volcano." Eventually, they assumed, the network would shape that noise into its idea of a volcano. And to an extent, it did: That volcano, to human eyes, just happened to look like a gray, featureless mass. The AI and people saw differently.
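The "more volcano" request is an instance of activation maximization: gradient ascent on the pixels of the input itself. A minimal hypothetical PyTorch sketch, where `classifier` stands in for the image-recognition network and `target_class` for the volcano label:

```python
import torch

def dream_class(classifier, target_class, steps=200, lr=0.1):
    """Start from colored static and nudge the pixels to raise the
    classifier's score for one class ("more volcano")."""
    image = torch.randn(1, 3, 224, 224, requires_grad=True)  # colored static
    optimizer = torch.optim.Adam([image], lr=lr)
    for _ in range(steps):
        score = classifier(image)[0, target_class]
        optimizer.zero_grad()
        (-score).backward()  # minimize the negative = ascend the class score
        optimizer.step()
    return image.detach()

# Without a prior on what natural images look like, the result is often
# the gray, featureless mass described above; the GAN supplies that prior.
```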

Next, the team unleashed a generative adversarial network (GAN) on its images. Such AIs contain two neural networks. From a training set of images, the "generator" learns rules about imagemaking and can create synthetic images. A second "adversary" network tries to detect whether the resulting pictures are real or fake, prompting the generator to try again. That back-and-forth eventually results in crude images that contain features that humans can recognize.
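That back-and-forth is the standard GAN training loop, sketched below with placeholder fully connected networks; the real generator and discriminator would be deep convolutional nets, and this is not the specific model Yosinski's team used.

```python
import torch
import torch.nn as nn

# Placeholder networks; real versions would be deep convolutional nets.
G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 1))
loss = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.rand(32, 784)         # stand-in for a batch of training images
for step in range(1000):
    fake = G(torch.randn(32, 64))  # the generator concocts images from noise
    # The adversary learns to tell real pictures from fakes...
    d_loss = (loss(D(real), torch.ones(32, 1))
              + loss(D(fake.detach()), torch.zeros(32, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # ...which forces the generator to try again and get better.
    g_loss = loss(D(fake), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```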

Yosinski and Anh Nguyen, his former intern, connected the GAN to layers inside their original classifier network. This time, when told to create "more volcano," the GAN took the gray mush that the classifier learned and, with its own knowledge of picture structure, decoded it into a vast array of synthetic, realistic-looking volcanoes. Some dormant. Some erupting. Some at night. Some by day. And some, perhaps, with flaws—which would be clues to the classifier's knowledge gaps.

Their GAN can now be lashed to any network that uses images. Yosinski has already used it to identify problems in a network trained to write captions for random images. He reversed the network so that it can create synthetic images for any random caption input. After connecting it to the GAN, he found a startling omission. Prompted to imagine "a bird sitting on a branch," the network—using instructions translated by the GAN—generated a bucolic facsimile of a tree and branch, but with no bird. Why? After feeding altered images into the original caption model, he realized that the caption writers who trained it never described trees and a branch without involving a bird. The AI had learned the wrong lessons about what makes a bird. "This hints at what will be an important direction in AI neuroscience," Yosinski says. It was a start, a bit of a blank map shaded in.

The day was winding down, but Yosinski's work seemed to be just beginning. Another knock on the door. Yosinski and his AI were kicked out of another glass box conference room, back into Uber's maze of cities, computers, and humans. He didn't get lost this time. He wove his way past the food bar, around the plush couches, and through the exit to the elevators. It was an easy pattern. He'd learn them all soon.

AI IN ACTION

Neural networks learn the art of chemical synthesis

Organic chemists are experts at working backward. Like master chefs who start with a vision of the finished dish and then work out how to make it, many chemists start with the final structure of a molecule they want to make, and then think about how to assemble it. "You need the right ingredients and a recipe for how to combine them," says Marwin Segler, a graduate student at the University of Münster in Germany. He and others are now bringing artificial intelligence (AI) into their molecular kitchens.

They hope AI can help them cope with the key challenge of moleculemaking: choosing from among hundreds of potential building blocks and thousands of chemical rules for linking them. For decades, some chemists have painstakingly programmed computers with known reactions, hoping to create a system that could quickly calculate the most facile molecular recipes. However, Segler says, chemistry "can be very subtle. It's hard to write down all the rules in a binary way."

So Segler, along with computer scientist Mike Preuss at Münster and Segler's adviser Mark Waller, turned to AI. Instead of programming in hard and fast rules for chemical reactions, they designed a deep neural network program that learns on its own how reactions proceed, from millions of examples. "The more data you feed it the better it gets," Segler says. Over time the network learned to predict the best reaction for a desired step in a synthesis. Eventually it came up with its own recipes for making molecules from scratch.

The trio tested the program on 40 different molecular targets, comparing it with a conventional molecular design program. Whereas the conventional program came up with a solution for synthesizing target molecules 22.5% of the time in a 2-hour computing window, the AI figured it out 95% of the time, they reported at a meeting this year. Segler, who will soon move to London to work at a pharmaceutical company, hopes to use the approach to improve the production of medicines.

Paul Wender, an organic chemist at Stanford University in Palo Alto, California, says it's too soon to know how well Segler's approach will work. But Wender, who is also applying AI to synthesis, thinks it "could have a profound impact," not just in building known molecules but in finding ways to make new ones. Segler adds that AI won't replace organic chemists soon, because they can do far more than just predict how reactions will proceed. Like a GPS navigation system for chemistry, AI may be good for finding a route, but it can't design and carry out a full synthesis—by itself.

Of course, AI developers have their eyes trained on those other tasks as well.

—Robert F. Service
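The learning step Segler describes, scoring which known reaction rule best produces a desired molecule, is at heart a large classification problem. A toy hypothetical sketch in PyTorch; the fingerprint size, template count, and network are invented, and the real system trained on millions of published reactions and wrapped this step in a search over multi-step routes:

```python
import torch
import torch.nn as nn

N_FINGERPRINT = 2048   # bits describing a molecule's structure
N_TEMPLATES = 10_000   # known reaction rules to choose among

# The network learns, from examples, which reaction rule best makes a molecule.
net = nn.Sequential(
    nn.Linear(N_FINGERPRINT, 512), nn.ELU(),
    nn.Linear(512, N_TEMPLATES),   # one score per candidate reaction
)

def best_reactions(molecule_fingerprint, k=5):
    """Rank candidate reactions for the last step of a synthesis."""
    scores = net(molecule_fingerprint)
    return torch.topk(torch.softmax(scores, dim=-1), k)

# A route search would apply this step recursively, working backward
# from the target molecule to purchasable building blocks.
fp = torch.rand(N_FINGERPRINT)
probs, template_ids = best_reactions(fp)
```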

Paul Voosen, "The AI detectives," Science 357(6346), 22-27 (2017). DOI: 10.1126/science.357.6346.22