[House Hearing, 115 Congress]
[From the U.S. Government Publishing Office]
BIG DATA CHALLENGES AND
ADVANCED COMPUTING SOLUTIONS
=======================================================================
JOINT HEARING
BEFORE THE
SUBCOMMITTEE ON ENERGY &
SUBCOMMITTEE ON RESEARCH AND TECHNOLOGY
COMMITTEE ON SCIENCE, SPACE, AND TECHNOLOGY
HOUSE OF REPRESENTATIVES
ONE HUNDRED FIFTEENTH CONGRESS
SECOND SESSION
__________
JULY 12, 2018
__________
Serial No. 115-69
__________
Printed for the use of the Committee on Science, Space, and Technology
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]
Available via the World Wide Web: http://science.house.gov
_________
U.S. GOVERNMENT PUBLISHING OFFICE
30-879 PDF WASHINGTON : 2018
COMMITTEE ON SCIENCE, SPACE, AND TECHNOLOGY
HON. LAMAR S. SMITH, Texas, Chair
FRANK D. LUCAS, Oklahoma EDDIE BERNICE JOHNSON, Texas
DANA ROHRABACHER, California ZOE LOFGREN, California
MO BROOKS, Alabama DANIEL LIPINSKI, Illinois
RANDY HULTGREN, Illinois SUZANNE BONAMICI, Oregon
BILL POSEY, Florida AMI BERA, California
THOMAS MASSIE, Kentucky ELIZABETH H. ESTY, Connecticut
RANDY K. WEBER, Texas MARC A. VEASEY, Texas
STEPHEN KNIGHT, California DONALD S. BEYER, JR., Virginia
BRIAN BABIN, Texas JACKY ROSEN, Nevada
BARBARA COMSTOCK, Virginia CONOR LAMB, Pennsylvania
BARRY LOUDERMILK, Georgia JERRY McNERNEY, California
RALPH LEE ABRAHAM, Louisiana ED PERLMUTTER, Colorado
GARY PALMER, Alabama PAUL TONKO, New York
DANIEL WEBSTER, Florida BILL FOSTER, Illinois
ANDY BIGGS, Arizona MARK TAKANO, California
ROGER W. MARSHALL, Kansas COLLEEN HANABUSA, Hawaii
NEAL P. DUNN, Florida CHARLIE CRIST, Florida
CLAY HIGGINS, Louisiana
RALPH NORMAN, South Carolina
DEBBIE LESKO, Arizona
------
Subcommittee on Energy
HON. RANDY K. WEBER, Texas, Chair
DANA ROHRABACHER, California MARC A. VEASEY, Texas, Ranking
FRANK D. LUCAS, Oklahoma Member
MO BROOKS, Alabama ZOE LOFGREN, California
RANDY HULTGREN, Illinois DANIEL LIPINSKI, Illinois
THOMAS MASSIE, Kentucky JACKY ROSEN, Nevada
STEPHEN KNIGHT, California JERRY McNERNEY, California
GARY PALMER, Alabama PAUL TONKO, New York
DANIEL WEBSTER, Florida BILL FOSTER, Illinois
NEAL P. DUNN, Florida MARK TAKANO, California
RALPH NORMAN, South Carolina EDDIE BERNICE JOHNSON, Texas
LAMAR S. SMITH, Texas
------
Subcommittee on Research and Technology
HON. BARBARA COMSTOCK, Virginia, Chair
FRANK D. LUCAS, Oklahoma DANIEL LIPINSKI, Illinois, Ranking
RANDY HULTGREN, Illinois Member
STEPHEN KNIGHT, California ELIZABETH H. ESTY, Connecticut
BARRY LOUDERMILK, Georgia JACKY ROSEN, Nevada
DANIEL WEBSTER, Florida SUZANNE BONAMICI, Oregon
ROGER W. MARSHALL, Kansas AMI BERA, California
DEBBIE LESKO, Arizona DONALD S. BEYER, JR., Virginia
LAMAR S. SMITH, Texas EDDIE BERNICE JOHNSON, Texas
C O N T E N T S
July 12, 2018
Page
Witness List..................................................... 2
Hearing Charter.................................................. 3
Opening Statements
Statement by Representative Randy K. Weber, Chairman,
Subcommittee on Energy, Committee on Science, Space, and
Technology, U.S. House of Representatives...................... 4
Written Statement............................................ 6
Statement by Representative Marc A. Veasey, Ranking Member,
Subcommittee on Energy, Committee on Science, Space, and
Technology, U.S. House of Representatives...................... 8
Written Statement............................................ 9
Statement by Representative Barbara Comstock, Chairwoman,
Subcommittee on Research and Technology, Committee on Science,
Space, and Technology, U.S. House of Representatives........... 10
Written Statement............................................ 11
Statement by Representative Lamar Smith, Chairman, Committee on
Science, Space, and Technology, U.S. House of Representatives.. 12
Written Statement............................................ 13
Written Statement by Representative Eddie Bernice Johnson,
Ranking Member, Committee on Science, Space, and Technology,
U.S. House of Representatives.................................. 15
Written Statement by Representative Daniel Lipinski, Ranking
Member, Subcommittee on Research and Technology, Committee on
Science, Space, and Technology, U.S. House of Representatives.. 17
Witnesses:
Dr. Bobby Kasthuri, Researcher, Argonne National Laboratory;
Assistant Professor, The University of Chicago
Oral Statement............................................... 19
Written Statement............................................ 22
Dr. Katherine Yelick, Associate Laboratory Director for Computing
Sciences, Lawrence Berkeley National Laboratory; Professor, The
University of California, Berkeley
Oral Statement............................................... 31
Written Statement............................................ 34
Dr. Matthew Nielsen, Principal Scientist, Industrial Outcomes
Optimization, GE Global Research
Oral Statement............................................... 47
Written Statement............................................ 49
Dr. Anthony Rollett, U.S. Steel Professor of Materials Science
and Engineering, Carnegie Mellon University
Oral Statement............................................... 57
Written Statement............................................ 59
Discussion....................................................... 66
Appendix I: Answers to Post-Hearing Questions
Dr. Bobby Kasthuri, Researcher, Argonne National Laboratory;
Assistant Professor, The University of Chicago................. 92
Dr. Katherine Yelick, Associate Laboratory Director for Computing
Sciences, Lawrence Berkeley National Laboratory; Professor, The
University of California, Berkeley............................. 97
Dr. Matthew Nielsen, Principal Scientist, Industrial Outcomes
Optimization, GE Global Research............................... 104
Dr. Anthony Rollett, U.S. Steel Professor of Materials Science
and Engineering, Carnegie Mellon University.................... 113
Appendix II: Additional Material for the Record
Document submitted by Representative Neal P. Dunn, Committee on
Science, Space, and Technology, U.S. House of Representatives.. 120
BIG DATA CHALLENGES
AND ADVANCED COMPUTING SOLUTIONS
----------
THURSDAY, JULY 12, 2018
House of Representatives,
Subcommittee on Energy and
Subcommittee on Research and Technology,
Committee on Science, Space, and Technology,
Washington, D.C.
The Subcommittees met, pursuant to call, at 10:15 a.m., in
Room 2318, Rayburn House Office Building, Hon. Randy Weber
[Chairman of the Subcommittee on Energy] presiding.
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]
Chairman Weber. The Committee on Science, Space, and
Technology will come to order.
Without objection, the Chair is authorized to declare
recess of the Subcommittees at any time.
Good morning, and welcome to today's hearing entitled ``Big
Data Challenges and Advanced Computing Solutions.'' I now
recognize myself for five minutes for an opening statement.
Today, we will explore the application of machine-learning-
based algorithms to big-data science challenges. Born from the
artificial intelligence--AI--movement that began in the 1950s,
machine learning is a data-analysis technique that gives
computers the ability to learn directly from data without being
explicitly programmed.
Generally speaking--and don't worry; I'll save the detailed
description for you all, our expert witnesses--machine learning
is used when computers are trained--more than husbands are
trained, right, ladies--on large data sets to recognize
patterns in that data and learn to make future decisions based
on these observations.
Today, specialized algorithms termed ``deep learning'' are
leading the field of machine-learning-based approaches. These
algorithms are able to train computers to perform certain tasks
at levels that can exceed human ability. Machine learning also
has the potential to improve computational science methods for
many big-data problems.
As the Nation's largest federal sponsor of basic research
in the physical sciences with expertise in big-data science,
advanced algorithms, data analytics, and high-performance
computing, the Department of Energy is uniquely equipped to
fund robust fundamental research in machine learning. The
Department also manages the 17 DOE national labs and 27 world-
leading scientific user facilities, which are instrumental to
connecting basic science and advanced computing.
Machine learning and other advanced computing processes
have broad applications in the DOE mission space from high
energy physics to fusion energy sciences to nuclear weapons
development. Machine learning also has important applications
in academia and industry. In industry, common examples of
machine-learning techniques are in automated driving, facial
recognition, and automated speech recognition.
At Rice University near my home district, researchers seek
to utilize machine-learning approaches to address challenges in
geological sciences. In addition, the University of Houston's
Solutions Lab supports research that will use machine learning
to predict the behavior of flooding events and aid in
evacuation planning. This would be incredibly beneficial for my
district and all areas that are prone to hurricanes and to
flooding. In fact, in Texas we're still recovering from
Hurricane Harvey, the wettest storm in United States history.
The future of scientific discovery includes the
incorporation of advanced data analysis techniques like machine
learning. With the next generation of supercomputers, including
the exascale computing systems that DOE is expected to field by
2021, American researchers utilizing these technologies will be
able to explore even bigger challenges. With the immense
potential for machine-learning technologies to answer
fundamental scientific questions, provide the foundation for
high-performance computing capabilities, and to drive future
technological development, it's clear that we should prioritize
this research.
I want to thank our accomplished panel of witnesses for
their testimony today, and I look forward to hearing what role
Congress should play in advancing this critical area of
research.
[The prepared statement of Chairman Weber follows:]
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]
Chairman Weber. I now recognize the Ranking Member for an
opening statement.
Mr. Veasey. Thank you, Chairman Weber. Thank you,
Chairwoman Comstock, and also, thank you to the distinguished
panel for being here this morning.
As you know, there are a growing number of industries today
that are relying on generating and interpreting large amounts
of data to overcome new challenges. The energy sector
in particular is making strides in leveraging these new
technologies and techniques. Today, we're going to hear more
about the advancements that we're going to see in the upcoming
years.
    Sensor-equipped aircraft engines, locomotives, and gas and wind
turbines are now able to track production efficiency and the
wear and tear on vital machinery. This enables significant
reductions in fuel consumption, as well as carbon emissions.
The technologies are also significantly improving our ability
to detect failures before they occur and prevent disasters, and
by doing so will save money, will save time, and lives. And by
using analytics, sensors, and operational data, we can manage
and optimize systems ranging from energy storage components to
power plants and to the electric grid.
As digital technologies revolutionize the energy sector, we
also must ensure the safe and responsible use of these
processes. With our electric grid always in under persistent
threats from everything from cyber to other modes of
subterfuge, the security of these connected systems is of the
utmost importance. Nevertheless, I'm excited to learn more
about the value and benefits that these technologies may be
able to provide for our economy and our environment alike.
I'm looking forward to hearing what we can do in Congress
to help guide and support the responsible development of these
new data-driven approaches to the management of these ever more
complex systems that our society is very dependent on.
Thank you, and, Mr. Chairman, I yield back the balance of
my time.
[The prepared statement of Mr. Veasey follows:]
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]
Chairman Weber. Thank you, Mr. Veasey.
I now recognize the Chairwoman of the Research and
Technology Subcommittee, the gentlewoman from Virginia, Mrs.
Comstock, for an opening statement.
Mrs. Comstock. Thank you, Chairman Weber.
A couple of weeks ago, our two Subcommittees joined
together on a hearing to examine the state of artificial
intelligence and the types of research being conducted to
advance this technology. The Committee learned about the
nuances of the term artificial intelligence, such as the
difference between narrow and general AI and implications for a
world in which AI is ubiquitous.
Today, we delve deeper into disciplines originating from
the AI movement of the 1950s that include machine learning,
deep learning, and neural networks. Until recently, machine
learning and especially deep-learning technologies were only
theoretical because deep-learning models require massive
amounts of data and computing power. But advances in high-
performance graphics processing units, cloud computing, and
data storage have made these techniques possible.
Machine learning is pervasive in our day-to-day lives from
tagging photos on Facebook to protecting emails with spam
filters to using a virtual assistant like Siri or Alexa for
information. Machine-learning-based algorithms have powerful
applications that ultimately help make our lives more fun,
safe, and informative.
In the federal government, the Department of Energy stands
out for its work in high-performance computing and approaches
to big-data science challenges. The Energy Department
researchers are using machine-learning approaches to study
protein behavior, to understand the trajectories of patient
health outcomes, and to predict biological drug responses. At
Argonne National Laboratory, for example, researchers are using
intensive machine-learning-based algorithms to attempt to map
the human brain.
A program of particular interest to me involves a DOE and
Department of Veterans Affairs venture known as the MVP-
CHAMPION program. This joint collaboration will leverage DOE's
high-performance computing and machine-learning capabilities to
analyze health records of more than 20 million veterans
maintained by the VA. The goal of this partnership is to arm
the VA with data it can use to potentially improve health care
offered to our veterans by developing new treatments and
preventive strategies and best practices.
The potential for AI to help humans and further scientific
discoveries is obviously immense. I look forward to what our
witnesses will testify to today about their work, which may
give us a glimpse into the revolutionary technologies of
tomorrow that we're here to discuss.
So I thank you, Mr. Chairman, and I yield back.
[The prepared statement of Mrs. Comstock follows:]
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]
Chairman Weber. I thank the gentlelady.
And let me introduce our witnesses. Our first witness is
Dr. Bobby--Mr. Chairman, are you going to----
Chairman Smith. Mr. Chairman, thank you. In the interest of
time, I just ask unanimous consent to put my opening statement
in the record.
Chairman Weber. Without objection.
[The prepared statement of Chairman Smith follows:]
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]
[The prepared statement of Ranking Member Johnson follows:]
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]
[The prepared statement of Mr. Lipinski follows:]
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]
Chairman Weber. Thank you. I appreciate that.
Now, I will introduce the witnesses. Our first witness is
Dr. Bobby Kasthuri, the first neuroscience researcher at
Argonne National Lab and an Assistant Professor in the
Department of Neurobiology at the University of Chicago. You're
busy. Dr. Kasthuri's current research focuses on innovation and
new approaches to brain mapping, including the use of high-
energy x-rays from synchrotron sources for mapping brains in
their entirety.
He holds a Bachelor of Science from Princeton University,
an M.D. from Washington University School of Medicine, and a
Ph.D. from Oxford University where he studied as a Rhodes
scholar. Welcome, Doctor.
Our second witness today is Dr. Katherine Yelick, a
Professor of Electrical Engineering and Computer Sciences at
the University of California, Berkeley, and the Associate
Laboratory Director for Computing at Lawrence Berkeley National
Laboratory. Her research is in high-performance computing,
programming languages, compilers, parallel algorithms, and
automatic performance tuning.
Dr. Yelick received her Bachelor of Science, Master of
Science, and Ph.D. all in computer science at the Massachusetts
Institute of Technology. Welcome, Dr. Yelick.
Our next witness is Dr. Matthew Nielsen, Principal
Scientist at the GE Global Research Center. Dr. Nielsen's
current research focuses on digital twin and computer modeling
and simulation of physical assets using first-principle physics
and machine-learning methods.
He received a Bachelor of Science in physics at Alma
College in Alma, Michigan, and a Ph.D. in applied physics from
Rensselaer.
Dr. Nielsen. Rensselaer.
Chairman Weber. Rensselaer, okay, Polytechnic Institute in
Troy, New York. Welcome, Dr. Nielsen.
And our final witness today is Dr. Anthony Rollett, the
U.S. Steel Professor of Metallurgical Engineering and Materials
Science at Carnegie Mellon University, a.k.a. CMU. Dr. Rollett
has been a Professor of Materials Science and Engineering at CMU
for over 20 years and is the Co-Director of CMU's
NextManufacturing Center. Dr. Rollett's research focuses on
microstructural evolution and microstructure property
relationships in 3-D.
He received a Master of Arts in metallurgy and materials
science from Cambridge University and a Ph.D. in materials
engineering from Drexel University. Welcome, Dr. Rollett.
I now recognize Dr. Kasthuri for five minutes to present
his testimony. Doctor?
TESTIMONY OF DR. BOBBY KASTHURI, RESEARCHER,
ARGONNE NATIONAL LABORATORY;
ASSISTANT PROFESSOR,
THE UNIVERSITY OF CHICAGO
Dr. Kasthuri. Thank you. Chairman Smith, Chairman Weber,
Chairwoman Comstock, Ranking Members Veasey and Lipinski, and
Members of the Subcommittees, thank you for this opportunity to
talk and appear before you. My name is Bobby Kasthuri. I'm a
Neuroscientist at Argonne National Labs and an Assistant
Professor in the Department of Neurobiology at the University
of Chicago.
And the reason I'm here talking to you today is because I
think we are at a pivotal moment in our decades-long quest to
understand the brain. And the reason we're at this pivotal
moment is that what we're actually witnessing in real time is
the collision of two different disciplines, two different worlds,
the worlds of computer science and neuroscience. And if we can
nurture and develop this union, it could fundamentally change
many things about our society.
First, it could fundamentally change how we think about
understanding the brain. It could change and revolutionize how
we treat mental illness, and perhaps even more significantly,
it can change how we think and imagine and build our future
computers and our future robots based on how brains solve
problems.
The major obstacle between us and realizing this vision is
that, for many neuroscientists, modern neuroscience is
extremely expensive and extremely resource-intensive. To give
you an idea of the scale, I thought it might help to give you
an example of the enormity of the problem that we're trying to
solve.
    The human brain, your brain, probably contains on the order
of 100 billion brain cells, or neurons, and the main thing that
neurons do is connect with each other. In your brain, each
neuron probably connects on average 10,000 times, with 10,000
other neurons. That means in your brain there are orders of
magnitude more connections between neurons than stars in the
Milky Way galaxy. And what's even more important for
neuroscientists is that we believe that this map, this map of
you, this map of connections contains all of the things that
make us human. Our creativity, our ability to think critically,
our fears, our dreams are all contained in that map.
But unfortunately, that map, if we were to do it, wouldn't
be one gigabyte of data; it wouldn't be 100 gigabytes of data.
It could be on the order of a billion gigabytes of data, perhaps the
largest data set about anything ever collected in the history
of humanity. The problem is that for many neuroscientists even
analyzing a fraction of this map is beyond their resources, the
resources of their laboratory, the resources of the
universities, and perhaps the resources of even large
institutions. And if we don't address this gap, then what will
happen is that only the richest neuroscientists will be able to
answer their questions, and we would like every neuroscientist
to have access to answer the most important questions about
brains and ultimately promote this fusion of computer science
and neuroscience.
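    As a rough back-of-the-envelope check on those figures,
assuming about 10^11 neurons with roughly 10^4 connections per
neuron:

\[
  10^{11}\ \text{neurons} \times 10^{4}\ \tfrac{\text{connections}}{\text{neuron}}
  \approx 10^{15}\ \text{connections},
  \qquad\text{versus roughly } 10^{11}\ \text{stars in the Milky Way};
\]
\[
  10^{9}\ \text{gigabytes} = 10^{18}\ \text{bytes} \approx 1\ \text{exabyte}.
\]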
Luckily, there is a potential solution, and the potential
solution is the Department of Energy and the national lab
system, which is part of the Department of Energy. As stewards
of our scientific architecture, as stewards of some of the most
advanced technological and computing capabilities available,
the Department of Energy and the national labs can address this
gap, and in fact, they do address this gap in many different
sciences.
If I was a young astrophysicist or a young materials
scientist, no one would expect me to get money and build my own
space telescope. Instead, I would leverage the amazing
resources of the national lab system to answer my fundamental
questions. And although many fields of science have learned how
to leverage the expertise and the resources available in the
national lab system, neuroscientists have not.
A national center for brain mapping situated within the DOE
lab system could actually be a sophisticated clearinghouse to
ensure that the correct physics and engineering and computer
science tools are vetted and accessible for measuring brain
structure and brain function. Since the national labs are also
the stewards of our advanced computing infrastructure, they're
ideally suited to incubate these revolutions in computer and
neurosciences.
    As a biologist, I only recently learned that, decades
earlier, the DOE and the national labs helped usher in perhaps
humanity's greatest scientific achievement of the 20th century,
the mapping of the human genome and the understanding of the
genetic basis of life. We believe that the DOE and the national
lab system can make a similar contribution to understanding the
human brain.
Other countries like Japan, South Korea, and China,
cognizant of the remarkable benefits to economic and national
security that understanding brains and using them to make
computer science better would bring, have already invested in
national efforts in artificial intelligence and national efforts
to understand the brain. The United States has not yet, and I
think it's important at the end of my statement for everyone to
remember that we are the ones who went to the moon, we are the
ones who harnessed the power of nuclear energy, and we are the
ones that led the genomic revolution. And I suspect it's the
moment now for the United States to lead again, to map and help
reverse engineer the physical substrates of human thought,
arguably the most challenging quest of the 21st century and
perhaps the last great scientific frontier.
Thank you for your time and attention today. I welcome any
questions you might have.
[The prepared statement of Dr. Kasthuri follows:]
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]
Chairman Weber. Thank you, Doctor.
Dr. Yelick, you're recognized for five minutes.
TESTIMONY OF DR. KATHERINE YELICK,
ASSOCIATE LABORATORY DIRECTOR
FOR COMPUTING SCIENCES,
LAWRENCE BERKELEY NATIONAL LABORATORY;
PROFESSOR, THE UNIVERSITY OF CALIFORNIA, BERKELEY
Dr. Yelick. Chairman Smith, Chairman Weber, Chairwoman
Comstock, Ranking Members Veasey and Lipinski, distinguished
Members of the Committee, thank you for holding this hearing
and for the Committee's support for science. And thank you for
inviting me to testify.
My name is Kathy Yelick and I'm the Associate Laboratory
Director for Computing Sciences at Lawrence Berkeley National
Laboratory, a DOE Office of Science laboratory managed by the
University of California. I'm also Professor of Electrical
Engineering and Computer Sciences at the University of
California, Berkeley.
Berkeley Lab is home to five national scientific user
facilities serving over 10,000 researchers covering all 50
States. The combination of experimental, computational, and
networking facilities puts Berkeley Lab on the cutting edge of
data-intensive science.
In my testimony today, I plan to do four things: first,
describe some of the large-scale data challenges in the DOE
Office of Science; second, examine the emerging role of machine
learning; third, discuss some of the incredible opportunities
for machine learning in science, which leverage DOE's role as a
leader in high-performance computing, applied mathematics,
experimental facilities, and team-based science; and fourth,
explore some of the challenges of machine learning and data-
intensive science.
Big-data challenges are often characterized by the four
``V's,'' the volume, that is the total size of data; the
velocity, the rate at which the data is being produced;
variability, the diversity of different types of data; and
veracity, the noise, errors, and the other quality issues in
the data. Scientific data has all of these.
Genomic data, for example, has grown by over a factor of
1,000 in the last decade, but the most abundant form of life,
microbes, are not well-understood. Microbes can fix nitrogen,
break down biomass for fuels, or fight algal blooms. DOE's
Joint Genome Institute has over 12 trillion bases--that is DNA
characters A, C, T, and G--of microbial DNA, enough to fill the
Library of Congress if you printed them in very boring books
that only contain those four characters.
But genome sequencers produce only fragments with errors,
and the DNA of the entire microbial community is all mixed
together. So it's like taking the Library of Congress,
shredding all of the books, throwing in some junk, and then
asking somebody to reconstruct the books from them. We use
supercomputers to do this, to assemble the pieces, to find the
related genes, and to compare the communities.
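    To make the shredded-library analogy concrete, a minimal
sketch of overlap-based assembly on tiny, error-free toy reads
might look like the following Python; the read strings, the
minimum-overlap cutoff, and the greedy strategy are illustrative
assumptions only, and the real assemblers used for metagenomes
are parallel, error-tolerant, and far more sophisticated.

# Toy greedy assembler: repeatedly merge the two reads with the largest
# suffix/prefix overlap until no overlap of at least MIN_OVERLAP remains.
# Illustrative only; real metagenome assembly copes with errors and scale.

MIN_OVERLAP = 3

def overlap(a, b):
    """Length of the longest suffix of a that is a prefix of b."""
    for length in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def greedy_assemble(reads):
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)  # (overlap length, index i, index j)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen < MIN_OVERLAP:
            break  # nothing left to merge confidently
        merged = reads[i] + reads[j][olen:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

if __name__ == "__main__":
    fragments = ["ACGTAC", "GTACGGA", "CGGATTC"]
    print(greedy_assemble(fragments))  # expected output: ['ACGTACGGATTC']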
DOE's innovations are actually helping to create some of
these data challenges. The detectors used in electron
microscopes, which were developed at Berkeley Lab and since
commercialized, now produce data almost 10,000 times faster
than just ten years ago.
Machine learning is an amazingly powerful strategy for
analyzing data. Perhaps the most well-known example is
identifying images such as cats on the internet. A machine-
learning algorithm is fed a large set of, say, ten million
images, some of which are labeled as having cats, and
the algorithm uses those images to build a model, sort of a
probability of which images are likely to contain cats. Now, in
science we're not looking for cats, but images arise in many
different scientific disciplines from electron microscopes to
light sources to telescopes.
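    A minimal sketch of that train-on-labeled-examples workflow,
assuming scikit-learn and a small synthetic stand-in for image
features (the feature vectors, labels, and logistic-regression
choice below are illustrative assumptions, not any laboratory's
actual pipeline), could look like this:

# Minimal supervised-learning sketch: fit a classifier on labeled examples,
# then ask it for the probability that new, unlabeled examples are "cats".
# Synthetic feature vectors stand in for real image pixels or embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Pretend each image is summarized by 64 numeric features.
n_images, n_features = 2_000, 64
X = rng.normal(size=(n_images, n_features))
true_weights = rng.normal(size=n_features)
y = (X @ true_weights + rng.normal(scale=0.5, size=n_images) > 0).astype(int)  # 1 = "cat"

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
print("P(cat) for first 5 test images:", model.predict_proba(X_test[:5])[:, 1])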
Nobel laureate Saul Perlmutter used images of supernovae--
exploding stars--to measure the accelerating expansion of the
universe. The number of images produced each night from
telescopes has grown from tens per night to tens of millions
per night over the last 30 years. They used to be analyzed
manually by scientific experts, and now, much of that work has
been replaced by machine-learning algorithms. The upcoming LSST
telescope will produce 15 terabytes of data every night. If you
watched one night's worth of data as a movie, it would take
over ten years, so you can imagine why scientists are
interested in using machine learning to help them analyze that
data.
Machine learning can be used to find patterns that cluster
similar items or approximate complicated experiments. A recent
survey at Berkeley Lab found over 100 projects that are using
some form of machine learning. They use it to track subatomic
particles, analyze light source data, search for new materials
for better batteries, improve crop yield, and identify abnormal
behavior on the power grid.
    Machine learning does not replace the need for high-
performance computing simulations but adds a complementary tool
for science. Recent earthquake simulations of the Bay Area show
that just a 3-mile difference in location of an identical
building makes a significant difference in the safety of that
building. It really is all about location, location, location.
And the team that did this work is looking at taking data from
embedded sensors and eventually even from smart meters to give
even more detailed location-specific results.
There is tremendous enthusiasm for machine learning in
science but some cautionary notes as well. Machine-learning
results are often lacking in explanations, interpretations, or
error bars, a frustration for scientists. And scientific data
is complicated and often incomplete. The algorithms are known
to be biased by the data that they see. A self-driving car may
not recognize voices from Texas if it's only seen data from the
Midwest.
Chairman Weber. Hey, hey.
Dr. Yelick. Or we may miss a cosmic event in the southern
hemisphere if they've only seen data from telescopes in the
northern hemisphere. Foundational research in machine learning
is needed, along with the network to move the data to the
computers and share it with the community and make it as easy
to search for scientific data as it is to find a used car
online.
Machine learning has revolutionized the field of artificial
intelligence and it requires three things: large amounts of
data, fast computers, and good algorithms. DOE has all of
these. Scientific instruments are the eyes, ears, and hands of
science, but unlike artificial intelligence, the goal is not to
replicate human behavior but to augment it with superhuman
measurement, control, and analysis capabilities, empowering
scientists to handle data at unprecedented scales, provide new
scientific insights, and solve important societal challenges.
Thank you.
[The prepared statement of Dr. Yelick follows:]
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]
Chairman Weber. Thank you, Doctor.
Dr. Nielsen, you're recognized for five minutes.
TESTIMONY OF DR. MATTHEW NIELSEN,
PRINCIPAL SCIENTIST,
INDUSTRIAL OUTCOMES OPTIMIZATION,
GE GLOBAL RESEARCH
Dr. Nielsen. Chairman Smith, Chairman Weber, and Chairwoman
Comstock, Ranking Members Veasey and Lipinski, and Members of
the Subcommittee, it is an honor to share General Electric's
perspective on innovative machine-learning-based approaches to
big-data science challenges that promote a more resilient,
efficient, and sustainable energy infrastructure. I am Matt
Nielsen, a Principal Scientist at GE's Global Research Center
in upstate New York.
The installed asset base of GE's power and renewable
businesses generates roughly 1/3 of the planet's power, and 40
percent of the world's electricity is managed by our software.
GE Energy's assets include everything from gas and steam power
to nuclear, grid solutions, energy storage, onshore and offshore
wind, and hydropower.
The nexus of physical and digital technologies is
revolutionizing what industrial assets can do and how they are
managed. One of the single most important questions industrial
companies such as GE are grappling with is how to most
effectively integrate the use of AI and machine learning into
their business operations to differentiate the products and
services they offer. GE has been on this journey for more than
a decade.
A key learning for us--and I can attest to this as being a
physicist--has been the importance of tying our digital
solutions to the physics of our machines and to the extensive
knowledge on how they are controlled. I'll now highlight a few
industrial applications of AI and machine learning where GE is
collaborating with our customers and federal agencies like the
U.S. Department of Energy.
At GE, digital twins are a chief application of AI and
machine learning. Digital twins are living digital models of
industrial assets, processes, and systems that use machine
learning to see, think, and act on big data. Digital twins
learn from a variety of sources, including sensor data from the
physical machines or processes, fleet data, and industrial-
domain expertise. These computer models continuously update as
new data becomes available, enabling a near-real-time view of
the condition of the asset.
To date, GE scientists and engineers have created nearly
1.2 million digital twins. Many of the digital twins are
created using machine-learning techniques such as neural
networks. The application of digital twins in the energy sector
is enabling GE to revolutionize the operation and maintenance
of our assets and to drive new innovative approaches in
critical areas such as services and cybersecurity.
Now onto digital ghosts. Cyber threats to industrial
control systems that manage our critical infrastructure such as
power plants are growing at an alarming rate. GE is working
with the Department of Energy on a cost-shared program to build
the world's first industrial immune system for electric power
plants. It can not only detect and localize cyber threats but
also automatically act to neutralize them, allowing the system
to continue to operate safely.
    This effort engages a cross-disciplinary team of engineers
from GE Global Research and our power business. They are
pairing the digital twins of the power plant's machines that I
mentioned with industrial controls knowledge and machine learning.
The key again for this industrial immune system is the
combination of advanced machine learning with a deep
understanding of the machines' thermodynamics and physics.
We have demonstrated to date the ability to rapidly and
accurately detect and even localize simulated cyber threats
with nearly 99 percent accuracy using our digital ghost
techniques. We're also making significant progress now in
automatically neutralizing these threats. It is a great example
of how public-private research partnerships can advance
technically risky but universally needed technologies.
Along with improving cyber resiliency, AI and machine-
learning technologies are enabling us to improve GE's energy
services portfolio, helping our customers optimize and reduce
unplanned downtime for their assets. Through GE's asset
performance management platform, we help our customers avoid
disruptions by providing deep, real-time data insights on the
condition and operation of their assets. Using AI, machine
learning, and digital twins, we can better predict when
critical assets require repair or have a physical fault. This
allows our customers to move from a schedule-based maintenance
system to a condition-based maintenance system.
The examples I have shared and GE's extensive developments
with AI and machine learning have given us first-hand
experience of what it takes to successfully apply these
technologies in our Nation's energy infrastructure. My full
recommendations are in my written testimony, and I'll only
summarize them here.
Number one, continue to fund opportunities for public-
private partnerships to expand the application and benefits of
AI and machine learning across the energy sector.
    Two, encourage collaboration between AI and machine-learning
experts and subject-matter engineers and scientists.
And number three, continue to invest in the Nation's high-
performance computing assets and expand opportunities for
private industry to work with the national labs.
I appreciate the opportunity to offer our perspective on
how the development of AI and machine-learning technologies can
meet the shared goals of creating a more efficient and
resilient energy infrastructure.
One final thought is to reinforce a theme that I've
emphasized throughout my testimony, and that is the importance
of having teams of physical and digital experts involved in
driving the future of AI and machine-learning solutions.
Thank you, and I look forward to answering any questions.
[The prepared statement of Dr. Nielsen follows:]
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]
Chairman Weber. Thank you, Dr. Nielsen.
Dr. Rollett, you're recognized for five minutes.
TESTIMONY OF DR. ANTHONY ROLLETT,
U.S. STEEL PROFESSOR OF
MATERIALS SCIENCE AND ENGINEERING,
CARNEGIE MELLON UNIVERSITY
Dr. Rollett. So my thanks to Chairman Weber, Chairman
Smith, Chairwoman Comstock, Ranking Members Veasey and
Lipinski, and all the Members for your interest.
Speaking as a metallurgist, it's my pleasure and privilege
to testify before you because I've found big data and machine
learning, which depend on advanced computing, to be a never-
ending source of insight for my research, be it on additive
manufacturing or in developing new methods of research on
structural materials.
My bottom line is that there are pervasive opportunities,
as you've heard, to benefit from big data and machine learning.
Nevertheless, there are many challenges to be addressed in
terms of algorithm development, learning how to apply the
methods to new areas, transforming data into information,
upgrading curricula, and developing regulatory frameworks.
New and exciting manufacturing technologies such as 3-D
printing are coming on stream that generate big data, but they
need further development, especially for qualification, in
other words, the science that underpins the processes and
materials needed to satisfy requirements.
    So consider that printing a part with a powder bed machine,
a standard machine, requires 1,000-fold repetition of spreading
a hair's-breadth layer of powder, writing the desired shape in
each layer, shifting the part by that same hair's breadth, and
repeating. So if you divide the dimension of a part by a hair's
breadth and multiply by the yards of laser-melting track in each
layer, you can easily estimate that each part contains miles and
miles of track; hence the big data.
The recent successes with machine learning have used data
that is already information-rich, as you've heard, cats, dogs,
and so on. To apply these methods to advanced manufacturing and
basic science, however, we have to find better ways to transform
the data stream into an information stream.
Another very important context is that education in all
STEM subjects needs to include the use of advanced computing
for data analysis and machine learning. And I know that this
Committee has focused on expanding computer science education,
so thank you for that.
So for printing, please understand that the machines are
highly functional and produce excellent results. Nevertheless,
if we're going to be able to qualify these machines to produce
reliable parts that can be used in, for example, commercial
aviation, we've got some work to do.
If I might ask for the video, Daniel, if you can manage to
get that to play. So I'd like to illustrate the challenges in
my own research.
[Video shown.]
    Dr. Rollett. I often use the light sources, in other
words, x-rays from synchrotrons, most of which are curated by
the Department of Energy. I use several modes of
experimentation such as computed tomography, diffraction
microscopy, and dynamic x-ray radiography. So this DXR
technique produces movies of the melting of the powder layers
exactly as it occurs in 3-D printing with the laser. And again,
at the micrometer scale you can see about a millimeter there.
And you can also see that the dynamic nature of the process
means that one must capture this at the same rate as, say, the
more familiar case of a bullet going through armor.
Over the last couple of years, we've gotten many deep
insights as to how the process works, but again, for the big-
data aspect, each of these experiments lasts about a
millisecond. That's about 500 times faster than you can blink.
And it provides gigabytes of images, hence, the big data.
Storing and transmitting such large amounts of data, which are
arriving at ever-increasing rates, is a challenge for this
vital public resource. I should say that the light sources
themselves are well aware of this challenge. Giving more
serious attention to such challenges requires funding agencies
to adopt the right vision in terms of recognizing the need for
fusion of data science with the specific applications.
I also want to say that cybersecurity is widely understood
to be an important problem with almost weekly stories about
data leaks and hacking efforts. What's not quite so well
understood is exactly how we're going to interface
manufacturing with cybersecurity.
So, in summary, I suggest that there are three areas of
opportunity. First, federal agencies should continue to support
the application of machine learning to advanced manufacturing,
particularly for the qualification of new technologies and
materials. I thank and commend all of my funders for supporting
these advances and particularly want to call out the FAA for
providing strong motivation here.
In the future, research initiatives should also seize the
potential for moonshot efforts on objectives such as
integrating artificial intelligence capabilities directly into
advanced manufacturing machines and advancing synergy between
technologies such as additive manufacturing and robotics.
Second, we need to continue to energize and revitalize STEM
education at all levels to reflect the importance of the data
in learning and computing with a focus on manufacturing. I
myself have had to learn these things as I've gone along.
Third, based on the evidence that machine learning is being
successfully applied in many areas, we should encourage
agencies to seek programs in areas where it's not so obvious
how to apply the new tools and to instantiate programs in
communities where data, machine learning, and advanced
computing are not yet prevalent.
Having traveled abroad extensively, I can assure you that
the competition is serious. Countries that we used to dismiss
out of hand, they're publishing more than we are and securing
more patents than we do.
Again, I thank you for the opportunity to testify and share
my views on this vital subject. I know that we will all be glad
to answer your questions.
[The prepared statement of Dr. Rollett follows:]
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]
Chairman Weber. Thank you, Doctor. I now recognize myself
for five minutes.
This question is for all the witnesses. You've all used
similar terminology in your testimonies like artificial
intelligence, machine learning, and deep learning. So that we
can all start off on the same page, I'll start with Dr.
Kasthuri. But could you explain what these terms mean and how
they relate to each other?
In the interest of time, I'm going to divvy these up. Dr.
Kasthuri, you take artificial intelligence. Dr. Yelick, you
take machine learning. Dr. Nielsen, you take deep learning. All
right? Doctor, you're up.
Dr. Kasthuri. Thank you, Chairman Weber. That's an
excellent question. In the interest of time I'm not going to
speak about artificial intelligence. There are clearly experts
sitting next to me. I'm interested in the idea of finding
natural intelligence wherever we can, and I would say that the
confusion that exists in these terminologies also exists when we
think about intelligence beyond the artificial space. And I'm
happy--perhaps after I let the other scientists speak--to talk
about how we define natural intelligence in different ways,
which might help elucidate the ways we define artificial
intelligence.
Chairman Weber. All right. Fair enough. Dr. Yelick, do you
feel that monkey on your back?
Dr. Yelick. Yes. Thank you very much for the question. So
let me try to cover a little bit of all three. So artificial
intelligence is a very long-standing subfield of computer
science looking at how to make computers exhibit humanlike
behavior. And some of the most powerful techniques for
subproblems in artificial intelligence, such as computer
vision and speech processing, are machine-learning algorithms.
These algorithms have been around for a long time, but the
availability of large amounts of labeled data and large amounts
of computing have really made them take off in terms of being
able to solve those artificial intelligence problems in certain
ways.
    Machine learning itself is a broad class of algorithms that
come from statistics and computer science, but the specific
class called deep-learning algorithms--and I won't go into the
details; I will defer that if somebody else wants to try to
explain deep-learning algorithms--is what is
used for these particular breakthroughs in artificial
intelligence.
I would say that the popular press often equates the word
artificial intelligence with the term deep learning because the
algorithms have been so powerful, and so that can create some
confusion.
Chairman Weber. All right. Thank you. Dr. Nielsen?
Dr. Nielsen. Yes, I'm not an expert in deep learning, but
we are practitioners of deep learning at GE. And really it's
taken off in, I would say, the last several years as we've seen
a rise in big data. So we have nearly 300,000 assets spread
globally, each one generating gigabytes of data. To process
those gigabytes of data and make sense of them, we're using
deep-learning techniques. It's a subfield, as
you mentioned, of machine-learning algorithms but allows us to
extract more information, more relationships if you will.
So, for example, we use deep learning to help us build a
computer model of a combined-cycle power plant, very complex
system, very complex thermodynamics. And it's only because we
have been able to collect now years and years of historical
data and then process it through a deep-learning algorithm. So,
for us, deep learning is a breakthrough enabled by advances in
computing technology, advances in big-data science, and it's
allowing us to build what we think are more complex models of
not only our assets but the processes that they perform.
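    As a rough, hypothetical sketch of what fitting a data-driven
model to years of historical sensor data can look like (the
sensor names, the invented plant relationship, and the small
scikit-learn network below are assumptions for illustration,
not GE's digital twin software):

# Toy "surrogate model" sketch: learn a mapping from plant sensor readings
# to an output of interest (e.g., power produced) from historical samples.
# Entirely synthetic; real digital twins combine physics with fleet data.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

n_samples = 5_000
ambient_temp = rng.uniform(0, 40, n_samples)   # degrees C (invented)
fuel_flow = rng.uniform(5, 15, n_samples)      # kg/s (invented)
humidity = rng.uniform(10, 90, n_samples)      # percent (invented)

# Invented nonlinear relationship standing in for plant thermodynamics.
power_mw = (30 * fuel_flow - 0.4 * ambient_temp
            - 0.02 * humidity + rng.normal(scale=2.0, size=n_samples))

X = np.column_stack([ambient_temp, fuel_flow, humidity])
X_train, X_test, y_train, y_test = train_test_split(X, power_mw, random_state=0)

# Scale inputs, then fit a small neural-network regressor as the surrogate.
surrogate = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2_000, random_state=0),
).fit(X_train, y_train)

print("R^2 on held-out data:", surrogate.score(X_test, y_test))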
Chairman Weber. And, Dr. Rollett, before you answer, you
issued a warning, quite frankly, in your statement that more
patents are being filed by some foreign countries than by us.
Do you attribute that to what we're talking about here?
Go ahead.
Dr. Rollett. In very simple terms, I think what I'm calling
attention to is investment level in the science that underpins
all kinds of things, so whether it be the biology of the brain,
the functioning of the brain or how you make machines work, how
you construct machines, control algorithms, so on, and so
forth. That's really what I'm trying to get at.
Chairman Weber. Okay.
Dr. Rollett. And I'm trying to give you some support, some
ammunition that what you're doing as a committee, set of
Subcommittees is really worthwhile.
Chairman Weber. Yes, well, thank you. I appreciate that.
I'm going to move on to the second question. Several of you
mentioned your reliance on DOE facilities, which is, again,
what you're talking about, particularly the light sources and
supercomputing which we are focused on and have been to a couple
of those facilities. For the types of big-data research that you
all perform, my question is, how necessary is it for the United
States to keep up to date? You've already addressed that with
the patents statement, the warning that you issued, but what I
want to know is, would you opine on who the nearest competitor
is? And have you interfaced with any scientists or individuals
from those countries? And if so, in what field and in what way?
Doctor?
Dr. Kasthuri. I would say that, internationally, sort of
the nearest two competitors to us are Germany and China. And in
general in the scientific world there is a tension between
collaboration and competition independent of whether the
scientist lives in America or doesn't live in America.
I think the good news is that for us at least in
neuroscience we realize that the scale of the problem is so
enormous and has so much opportunity, there's plenty of food
for everyone to eat. So right now, we live in a world of
cooperation between individual scientists where we share data,
share problems, and share solutions back and forth, although of
course I'm less familiar with what happens at levels much higher
than that.
Chairman Weber. Thank you. Dr. Yelick?
Dr. Yelick. Yes, in the area of high-performance computing
I would say the closest competitor at this point is China. And
in science we also like to look at derivatives, so what we
really see is that China is growing very, very rapidly in terms
of their leadership. At this point we do have the fastest
computer on the Top500 list in the United States, but of course
until recently the top machines--the number-one and number-three
machines--were from China. But perhaps more importantly than
that, there are actually more machines manufactured in China on
that list than there are machines manufactured in the United
States, so there is a huge and growing interest,
and certainly a lot of research, a lot of funding in China for
artificial intelligence, machine learning, and all of that
applied to science and other problems.
Chairman Weber. Have you met with anybody from over in
China involved in the field?
Dr. Yelick. Yes. Last summer, I actually did a tour of all
of the major supercomputing facilities in China, so I got to
see what were the number-one and number-three machines at that
time--and was very impressed by the scientists. One of the
things that you see, by the way, is a lot of very junior
scientists, the students that they are training in these areas;
they use these machines to also draw talent back to China from
the United States or to keep talent that was trained in China
in the United States. And they have very impressive people in
terms of the computer scientists and computational scientists.
Chairman Weber. And, Dr. Nielsen, very quickly because I'm
out of time.
Dr. Nielsen. Yes, I would just like to echo that, like Dr.
Rollett, we follow publications and patents, and we're seeing a
growing number from China, so I'd like to echo that statement.
We're also seeing growing interest in the use of
high-performance computing to go look at things like
cybersecurity from China, so obviously, that's the number-one
location we're looking at.
Chairman Weber. Good. Thank you, Dr. Rollett. I'm happy to
move on now. So I'm now going to recognize the gentlelady from
Oregon for five minutes.
Ms. Bonamici. Thank you very much, Mr. Chairman.
What an impressive panel and what a great conversation and
an important one.
I represent northwest Oregon where Intel is developing the
foundation for the first exascale machines. We know the
potential of high-performance computing in energy
exploration, predicting climate and weather, predictive and
preventive medicine, emergency response, just a tremendous
amount of potential. And we certainly recognize on this
Committee that investment in exascale systems and high-
performance computing is important for our economic
competitiveness, national security, and many reasons.
And we know--I also serve on the Education Committee, and I
know that our country has some of the best scientists and
programmers and engineers, but what really sets our country
apart is entrepreneurs and innovation. And those
characteristics require creative and critical thinking, which
is fostered through a well-rounded education, including the
arts.
I don't think anyone on this Committee is going to be
surprised to hear me mention the STEAM Caucus, which I'm
cochairing with Representative Stefanik from New York, working
on integrating arts and design into STEM learning to educate
innovators. We have out in Oregon this wonderful organization
called Northwest Noggin, which is a collaboration of our
medical school, Oregon Health Sciences University, Portland
State University, Pacific Northwest College of Art, and the
Regional Arts and Culture Council. And they go around exciting
the public about ongoing taxpayer-supported neuroscience
research. And they're doing great work and expanding the number
of people who are interested in science and also communicating
with all generations and all people about the benefits of
science.
So, Dr. Rollett, in your testimony you talked about the
role of data analytics across manufacturing--the manufacturing
sector. And you noted that it's not necessarily going to be
important for all data analytic workers to have a computer
science degree, so what skills are most important for
addressing the opportunities? You did say in your testimony
that technology forces us to think differently about how to
make things, so talk about the NextManufacturing Center at
Carnegie Mellon and what you're doing to prepare students for
evolving fields? And we know as technology changes we need
intellectual flexibility as well, so how do you educate people
for that kind of work?
Dr. Rollett. So thank you for the opportunity to address
that. The way that we're approaching that is telling our
students don't be afraid of these new techniques. Jump in, try
them, and lo and behold, almost every time they're trying it--
sometimes it's a struggle, but almost every time that they try
it they're discovering, oh, this actually works. Even if it's
not big data in quite the sense that, say, Kathy would tell us,
even small data works.
So, for example, in these powder bed machines you spread a
layer. Well, if you just take a picture of that layer and then
another picture and you keep analyzing it and you use these
computer vision techniques, which are sort of a subset of
machine learning, lo and behold, you can figure out whether
your part is building properly or not. That's the kind of thing
that we've got to transmit to all of our students to say it's
not that bad, jump in and try it and little by little, you'll
get there.
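    A deliberately simple sketch of that layer-by-layer picture
comparison, assuming synthetic images and a made-up difference
threshold rather than any real machine's data, might be:

# Toy layer-monitoring sketch: flag a powder-bed layer whose image differs
# too much from a known-good reference spread. Threshold and images are
# synthetic; production systems use far richer computer-vision features.
import numpy as np

def layer_looks_ok(layer_img, reference_img, threshold=0.05):
    """Return True if the mean absolute pixel difference is below threshold."""
    diff = np.abs(layer_img.astype(float) - reference_img.astype(float))
    return diff.mean() / 255.0 < threshold

rng = np.random.default_rng(1)
reference = np.full((128, 128), 200, dtype=np.uint8)           # uniform good spread
good_layer = reference + rng.integers(-5, 6, reference.shape)  # minor noise only
bad_layer = reference.copy()
bad_layer[40:60, :] = 60                                       # dark streak: short feed of powder

print("good layer passes:", layer_looks_ok(good_layer, reference))
print("bad layer passes:", layer_looks_ok(bad_layer, reference))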
Ms. Bonamici. I think over the years many students have
been very risk-averse and they don't want to risk taking
something where they might not get the best grade possible, so
we have to work on overcoming that because there's so much
potential out there until students have the opportunity to get
in and have some of that hands-on learning.
Dr. Yelick, I'm in the Northwest and it's not a question of
if but when we have an earthquake off the Northwest coast, and
a tsunami could be triggered of course by that earthquake along
the Cascadia subduction zone. So in your testimony you discuss
the research at Berkeley Lab to simulate a large magnitude
earthquake, and I listened very carefully because you were
talking about the effects on an identical building in different
areas. This data could be really crucial as we are assessing
the need for more resilient infrastructure not only in Oregon
but across the country. So what technical challenges are you
facing in curating, sharing, labeling, and
searching that data? And what support can the federal
government provide to accelerate a resolution of these issues?
Dr. Yelick. Well, thank you very much for the question.
Yes, this is very exciting work that's going on, and simulating
earthquakes is currently at a regional scale. There are
technology challenges to trying to even get that to larger-
scale simulations, but I think even more importantly the work
that I talked about is trying to use information about the
geology to try to give you much more precise information about
the safety of a particular location.
    And the challenge is to try to collect this data and then
to actually invert it, that is, turn it into a model. You
collect the data, and then in some sense you're trying to
develop a set of equations that, based on the data collected
from little tiny seismic events, tell you something about how
that particular subregion--even a yard or a city block--is going
to behave in an earthquake. And you can use the information from
those tiny seismic events to infer how it will behave in a
large, significant earthquake. And so there are technical
challenges, mathematical challenges of doing that, as well as
the scale of computing for both inverting the data and then
doing the simulation.
And I think you bring up a very good point about the
community needs for these community data sets because you
really want to make it possible for many groups of people, not
just, for example, a power company that has smart meter data
but for other people to access that kind of data.
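    A toy version of that inversion step, assuming a small linear
forward operator and made-up noise and regularization values
(real seismic inversion is nonlinear and enormously larger),
might look like:

# Toy inverse problem: recover model parameters m from noisy observations
# d = G @ m + noise via regularized least squares (Tikhonov / ridge).
# Stand-in for the far larger, nonlinear inversions used in seismology.
import numpy as np

rng = np.random.default_rng(7)

n_obs, n_model = 200, 50
G = rng.normal(size=(n_obs, n_model))                # forward operator (invented)
m_true = rng.normal(size=n_model)                    # "ground truth" subsurface model
d = G @ m_true + rng.normal(scale=0.1, size=n_obs)   # noisy observations

lam = 1e-2  # regularization strength (made-up value)
# Solve (G^T G + lam * I) m = G^T d for the estimated model.
m_est = np.linalg.solve(G.T @ G + lam * np.eye(n_model), G.T @ d)

print("relative error of recovered model:",
      np.linalg.norm(m_est - m_true) / np.linalg.norm(m_true))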
Ms. Bonamici. Thank you. And I want to follow up with that.
I'm running out of time, but as we talk about infrastructure
and investment in infrastructure, we know that by making better
decisions at the outset we can save lives and save property, so
the more information we have about where we're building and how
we're building is going to be a benefit to people across this
country, as well as in northwest Oregon. So thank you again to
this distinguished panel. I yield back.
Chairman Weber. Thank you, ma'am.
The gentlelady from Virginia, Mrs. Comstock, is recognized.
Mrs. Comstock. Thank you, Mr. Chairman, and thank all of
you here. This has been very interesting once again.
Now, I guess I'd ask all of you, what are the unexamined
big-data challenges that could benefit from machine learning?
And what are the consequences for the United States of not
being the world leader in that area if we aren't moving forward
in the future? Maybe, Dr. Rollett, if you'd like to start. You
look like you had an answer ready to go, so----
Dr. Rollett. I'll give you a small example from my own
field. So when we deal with materials, then we have to look
inside the materials. So we typically take a piece of steel and
we cut it and we polish it and we take pictures of it. So
traditionally, what we've done is play the expert witness as it
were. You look at these pictures, which I often say resemble a
Jackson Pollock painting more than anything remotely as simple
as a cat. And so the excitement in our field is that we now
have the tools to start to tease things out of these pictures,
so that we go from being completely dependent on sort of
gray-bearded experts to letting the computer do a lot of the
job for you. And that speeds things up
and it automates them and it allows companies to detect
problems that they're running across. So it's just one example.
Dr. Kasthuri. Congresswoman Comstock, thank you for the
question. I have two sort of answers specifically to thinking
about brains and then to thinking about education. I think
these are the potential things that we can lose. One of the
things that I find fascinating about how our brains work is
that whether you are Einstein thinking up relativity or Mozart
making a concerto or you're just at home watching reality TV,
all brains operate at about 20 watts. These light bulbs in this
room are probably at 60 watts. And although you might already
think some of your colleagues are dim bulbs, in this sense,
what's amazing about the things that they can accomplish is
that they accomplish them at energy efficiencies that are
currently unheard of for any type of algorithm.
So I feel like if we can leverage machine learning and deep
analytics to understand how the brain passes and processes
information at energy efficiencies unheard of in our current
algorithms and robots, that's a huge benefit to both the
national and economic security of our country. That's the
first.
And the second thing I'd like to add, the other reason that
it's important for us to lead now--and I'll do it by example--
is that in 1962 at Rice University John F. Kennedy announced
that we were going to the moon. And he announced it and in his
speech he said we're going to go to the moon--and I
paraphrase--not because it's easy but because it's hard and
because hard things test our mettle and test our capabilities.
The other interesting fact about that is that in 1969 when
we landed on the moon, the average age of a NASA scientist was
29 years old, so quick math suggests that when Kennedy
announced the moonshot, many of these people were in college.
They were students. And there was something inspirational about
positing something difficult, positing something visionary. And
I suspect that recruiting that generation of scientists to the
moonshot has benefited this country in ways that we haven't yet
calculated. And I suspect
that if we don't move now, we lose both of these opportunities,
among many others.
Mrs. Comstock. So it's really a matter of getting that
focus and attention and commitment so that you have that next
generation understanding this is really a long-term investment,
and we have a passion for it, so they will.
Dr. Kasthuri. Exactly.
Dr. Yelick. I'll just add briefly that, in terms of the
threat associated with this, it is really about continuing to
be a leader in computing but also about the control and use of
information. And you can see the kinds of
control and use of information. And you can see the kinds of
examples we've given are really important, and you hear about
it in the news about the control and use of information. We
need leaders in understanding how to do that and make sure that
information is used wisely.
We teach our freshmen at Berkeley a course in data science,
so whether they're going to go off and become English majors or
art majors or engineers, we think it's really important for
people to understand data.
Dr. Nielsen. And just real briefly, I'd like to build a
little bit on Dr. Rollett's comments. For us, we're seeing
tremendous benefit in big data for things like trying to better
predict when an aircraft engine part has to be repaired, when
it needs to be inspected, very critical for the safety of that
engine. For gas turbines, same thing. Wind turbine parts need
to be inspected and repaired.
    So where does big data come in? It comes in with
computational fluid dynamics, for which we leverage the
high-performance computing infrastructure of the United States;
with materials science and materials knowledge, trying to
understand grain structure, et cetera. So for us, that nexus of
the digital technologies with the physics, understanding the
thermodynamics of our assets, is leading us to what I think is
just a better place to be for maintenance scheduling, safety,
resiliency, et cetera.
Mrs. Comstock. Thank you. I really appreciate all of your
answers.
I yield back, Mr. Chairman.
Chairman Weber. The gentleman from Virginia, Mr. Beyer, is
recognized for five minutes.
Mr. Beyer. Mr. Chairman, thank you very much, and thank you
all very much for doing this.
Dr. Kasthuri, so on the BRAIN Initiative--I think obviously
the most, maybe the most exciting thing happening in the world
today--I was fascinated by this whole notion of the Connectome,
100 billion neurons with 1 quadrillion connections. You talk
about it being--if you took all of the written material in the
world into one data set, it'd be just a small fraction of the
size of this brain map. Is it possible that it's simpler than
that? It sort of strains my understanding that there are few
things in nature that are as complex as that. Why in evolution
have we developed something--and every human being on the
planet has a brain that already contains more connections than
every bit of written material?
Dr. Kasthuri. Congressman Beyer, that's a great question,
and like most scientists I'm going to do a little bit of
handwaving and a little bit of conjecture because the question
that you're asking is the question that we are trying to
answer. We know reasonably well that there are, as you said,
100 billion brain cells, neurons, that make on the order of 1
quadrillion connections in the brain. Now, when I say the data
of that, I'm really talking about the raw image data. What it
will take is a picture of every part of the brain, and if you
added up all the data of all those pictures together, it would
be the largest data set ever collected.
Now, I suspect we have to do that at least once and then it
might be possible that there are patterns within that data that
then simplify the next time that we have to map your brain. One
way to think about this is that before we had a map of DNA, we
didn't realize that there was a pattern within DNA, meaning
every three nucleotides--A, C, T, et cetera--codes for an amino
acid. And that essentially simplifies the data structure to,
let's say, one third. I don't need to know each letter; I just
need to know that these three things form an internal pattern
that then gets repeated again and again and again. And that was
a fundamental
insight. We have no similar insight into the brain. Is there a
repetitive pattern that would actually reduce the amount of
data that we had to collect?
So, you're right, it might be that the second brain or the
third brain isn't going to be that much data, but now let me
give you the counter because as a scientist I have to do both
sides or all sides. The other thing we know is that each human
brain is unique, very much like a snowflake. Your brain, the
connectivity, the connections in your brain at some level have
to represent your life history, what your brain has
experienced.
And so the question for me--and I think it's really one of
the most important questions--is that even within the snowflake
there are things that are unique to snowflakes and things that
are the same. They either have seven arms or eight arms or six
arms--I get them confused with spiders, but it's one of those.
So there's regularity in a snowflake at the level of
the arms, but there is uniqueness at the level of the things
that jut out of the seven arms of the snowflake. And the
fundamental question is what is unique, what is the part that
makes each of us a neurological snowflake and what is common
between all of us? And one of the very first goals of doing a
map would be to discover the answer to your question.
Mr. Beyer. Yes, well, thank you for a very thoughtful
answer. And I keep coming back to the Einstein notion of always
looking for the simplest answers, things that unify it all
together. So here's another simple question. You talked in
your very first paragraph about reverse engineering human
cognition into our computers, good idea? At our most recent AI
hearing here a lot of the controversy was, you know, dealing
with Elon Musk and others and their concerns about what happens
when consciousness emerges in machines.
Dr. Kasthuri. Again, a fantastic question. Here's my
version of an answer. We deal with smarter things every day.
Many of our children, especially mine, wind up getting
consciousness and being smarter than us, certainly smarter than
me, but yet we don't worry about the fact that this next
generation of children, forever the next generation of children
will always be smarter than us because we've developed ways as
a society to instill in them the value systems that we have.
And there are multiple avenues for how we can instill in our
children the value systems that we have.
I suspect we might use the same things when we make smart
algorithms, the same way we make smart children. We won't just
produce smart algorithms but we'll instill in them the values
that we have the same way that we instill our values in our
children.
Now, that didn't answer your question of whether reverse
engineering the brain is a specific good idea for AI or not.
The only thing I would say is that no matter what we can
imagine AI--artificial intelligence--doing, there is a
biological system that does it at an energy efficiency and
speed that the physical silicon AI system does not match. But I
suspect these answers are probably best debated amongst you,
and then you could tell us.
Mr. Beyer. Well, that was a very optimistic thing. I want
to say one of the things we do is we keep the car keys in those
circumstances.
Mr. Chairman, I yield back.
Chairman Weber. Thank you. The gentleman from Kansas is
recognized for five minutes.
Mr. Marshall. Well, thank you, Mr. Chairman.
Speaking of Kansas, I'm sure you all remember President
Eisenhower is the one who started NASA in 1958, but it was
President Kennedy, as several of you have stated, that, you
know, gave us the definitive goal to get to the moon. And as a
young boy I saw that before my eyes, the whole country wrapped
around that.
Each of you gets one minute. What's your big, hairy,
audacious goal, your idea? It took 11 years, '58 to '69, to get
to the Moon. Where are we going to be in 11 years? Dr. Rollett,
we'll start with you, and you each get one minute.
Dr. Rollett. I think we're going to see that manufacturing
is a much more clever operation. It understands the materials.
It understands how things are going to last, and it draws in a
much wider set of disciplines than it currently does. I have to
admit I don't exactly have an analogy to going to the moon, but
that's a very good challenge.
Mr. Marshall. What I like about your idea is that it's going
to add to the GDP. Our GDP grows when we become more efficient,
not when the federal government sends dollars to States for
social projects, so I love adding to GDP.
Dr. Nielsen, I guess you're next.
Dr. Nielsen. So I would love it if every one of our
assets--and I mentioned there are about 300,000 globally--had
their own digital twin, so every aircraft engine had its own
digital twin. A digital twin is a computer model that when the
asset is operating, we're collecting data. So imagine an
aircraft engine taking off. As soon as that aircraft engine
takes off, we pull the data back from the aircraft engine and
we update the computer model. That computer model becomes a
digital twin of the physical asset. If every one of our
300,000-plus assets had a digital twin, we'd be able to know
with very good precision when it needed to be maintained, when
it needed to be pulled off wing, and what kind of repairs need
to occur when it goes to a repair shop.
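    A toy sketch of that update loop, with a hypothetical engine identifier, degradation measure, and threshold rather than GE's actual models, might be:

    # After each flight, measured data updates a per-engine model, and
    # the updated model is used to decide when maintenance is needed.
    class EngineTwin:
        def __init__(self, engine_id, health=1.0):
            self.engine_id = engine_id
            self.health = health          # 1.0 = new, lower = degraded

        def update(self, measured_health, weight=0.5):
            """Blend the latest flight's measurement into the model state."""
            self.health = (1 - weight) * self.health + weight * measured_health

        def needs_maintenance(self, threshold=0.85):
            return self.health < threshold

    twin = EngineTwin("ESN-12345")                    # hypothetical serial number
    for measured in [0.98, 0.95, 0.90, 0.84, 0.78]:   # health inferred per flight
        twin.update(measured)
        print(round(twin.health, 3), twin.needs_maintenance())
    # The twin flags the engine for inspection on the last flight.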
Mr. Marshall. You can do that with satellites and a whole
bunch of things.
Dr. Nielsen. We can pull back data from a whole variety of
different pathways. It's then utilizing that data in the most
efficient way, which we use machine learning and AI-type
technologies----
Mr. Marshall. Maybe get internet to rural places by doing
that, right?
Dr. Nielsen. Yes.
Mr. Marshall. Okay. We better go on. Dr. Yelick?
Dr. Yelick. So I think one of the biggest challenges is
understanding the microbiome and being able to use that
information about the microbiome in both health applications
and agriculture, in engineering, materials, and other areas.
So I think that we already know that your microbiome, your
own personal microbiome is associated with things like obesity,
diabetes, cardiovascular disease, and many other disorders. We
don't understand it as well in agriculture, but we're looking
at things like taking images of fields, putting biosensors into
the fields and putting all this information together to
understand how to make--to improve the microbiome to improve
crop yield and reduce other problems. So I think it's about
both understanding and controlling the microbiome, which is a
huge computational problem.
Mr. Marshall. Okay. Dr. Kasthuri?
Dr. Kasthuri. The thing I would really like to have done in
11 years is understand how brains learn. And actually it
reminds me of something that I should've said earlier about the
differences between artificial intelligence, machine learning,
deep learning, and how brains learn. The main difference is
that for many of these algorithms you have to provide them
thousands of examples, millions of examples, billions of
examples before they can then produce inferences or predictions
that are based on those examples.
For those of you with children, you know that that's not
the way children learn. They can learn in one example. They can
learn in half an example. Sometimes I don't even know where
they're learning these things. And when they learn something,
they learn not only the very specific details of that thing,
they can immediately abstract it to a bunch of other examples.
For me, this happened with my son the first time he learned
what a tiger was. He could see an image of a tiger, and then,
as soon as he learned that, he could recognize a cartoon of a
tiger, a tiger upside down, the back of a tiger or the side of
a tiger--from that first example he was able to infer, to
learn, all of these other general applications.
If in 11 years we could understand how the brain does that
and then reverse engineer that into our algorithms and our
computers and robots, I suspect that will influence our GDP in
ways that we hadn't yet imagined.
Mr. Marshall. Okay. Thank you so much. I yield back.
Chairman Weber. I thank the gentleman.
The gentleman from the great State of Texas is recognized.
Mr. Veasey. Thank you, Mr. Chairman.
Dr. Rollett, am I pronouncing that right?
Dr. Rollett. It'll do.
Mr. Veasey. Okay. In your testimony you talk about the huge
amounts of data that are generated by experiments using light
sources to examine the processes involved in additive
manufacturing. You also highlight the need for more advanced
computing algorithms to help researchers extract information
from this data. And you state that we are essentially building
the infrastructure for digital engineering and manufacturing. I
was hoping that you'd be able to expand on that a little bit
and tell us also what are the necessary components of such
infrastructure.
Dr. Rollett. Right. So one of the things that I didn't have
time to talk about is where does the data go? And so, you know,
one's generating terabytes, the standard story is you go to a
light source, you do an experiment, all of that data has to go
on disk drives, and then you literally carry the disk drives
back home. So despite the substantial investments in the
internet and the data pipe so to speak, from the perspective of
an experiment, it's still somewhat clumsy. So even that
infrastructure could do with some attention.
It's also the case that the algorithms that exist have been
developed for a fairly specialized set of applications. So, you
know, the deep-learning methods, they exist, and what we're
doing at the moment is basically borrowing them and applying
them everywhere that we can. In other words, we haven't gone
very far with developing the specialized techniques or the
specialized applications.
So even that little movie that I showed, to be honest, I
mean, the furthest that we've got is doing very basic analysis
so far, and we actually need cleverer, more sophisticated
algorithms to analyze all of that information that's latent in
those images. I know that sounds like I'm not doing my job,
but, I'm just trying to get some idea across of the challenges
of taking techniques that have been worked up and then taking
them to a completely different domain and doing something
worthwhile.
Mr. Veasey. I was also hoping that you'd be able to
describe the progress your group has made in teaching computers
to recognize different kinds of metal power--powders using----
Dr. Rollett. Powders.
Mr. Veasey. --additive manufacturing. I think that you----
Dr. Rollett. Right.
Mr. Veasey. --go on to say that these successes have the
potential to impact improvements to materials, as well as the
generation of new materials. And I was hoping you could talk
about that a little bit more--about the ability of a computer
to recognize different types of metal, improvements to
materials, and how that can impact the development of new
materials.
Dr. Rollett. So thank you for the question. So I was trying
to think of a powder--I mean, think of talcum powder or
something like that. You spread some on a piece of paper and
you look at it and you think, well, that powder looks much like
any other powder. It looks like something you would use in the
garden or whatever. So the point I'm trying to get across is
that when you take these pictures of these materials, one
material looks much like another. However, when you take
pictures with enough resolution and you allow these machine-
learning algorithms to work on them, then what you discover is
they can see differences that no human can see.
So it turns out that you can use the computer to
distinguish powders from different sources, different
materials, so on and so forth. And that's pretty magic. That
means that you can again, if you're a company and you're using
these powders, you can detect whether you've got--you know, if
somebody's giving you what's supposed to be the same powder,
you can analyze it and say, no, it's not the same powder after
all. So there's considerable power in that.
Another example is things break, they fracture, and you
might be surprised, but there's quite a substantial business in
analyzing failures. You know, bicycles break and somebody has
to absorb the liability. Bridges crack; somebody has to deal
with that. Well, that's another case where the people involved
look at pictures of these fracture surfaces and they make
expert judgments.
So one of the things we're discovering is that we can
actually, again, use some of the computer vision techniques to
figure out if this fracture is a different kind of fracture or
this is a different fatigue failure that's occurred. Again,
it's magic. It opens up--not eliminating the expert, not at
all. The analogy is with radiography on cancers. It's helping
the experts to do a better job, to do a faster job, to be able
to help the people that they're working for.
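    The kind of classification Dr. Rollett describes can be sketched, very roughly, by training an off-the-shelf classifier on simple texture statistics from synthetic micrographs; the data, features, and supplier difference below are invented for illustration:

    # Micrographs of powders from two suppliers look alike to the eye,
    # but texture statistics fed to a learned classifier can separate them.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(2)

    def fake_micrograph(source):
        # Supplier B's powder is assumed slightly coarser (higher variance).
        sigma = 0.10 if source == 0 else 0.13
        return rng.normal(0.5, sigma, size=(32, 32))

    def texture_features(img):
        grad = np.abs(np.diff(img, axis=0)).mean()   # crude edge/texture measure
        return [img.mean(), img.std(), grad]

    X, y = [], []
    for source in (0, 1):
        for _ in range(300):
            X.append(texture_features(fake_micrograph(source)))
            y.append(source)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
    print("held-out accuracy:", clf.score(X_test, y_test))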
Mr. Veasey. Thank you very much. I appreciate that.
And, Mr. Chairman, I yield back.
Chairman Weber. Thank you, sir.
The gentlelady from Arizona is now recognized.
Mrs. Lesko. Thank you, Mr. Chairman.
I have to say this Committee is really interesting. I learn
about all types of things and people studying the brains. I
think we're going to hear about flying cars sometime soon,
which is exciting. I'm from Arizona, and the issues that are
really big in my district, which are the suburbs of Phoenix
mostly, are actually national security and border security. And
we have two border ports of entry connecting Mexico and
Arizona, and I have the Luke Air Force Base in my Congressional
district. And so I was wondering if you had any ideas how
machine learning, artificial intelligence are being used in
border security and national security. If you have any
thoughts?
Dr. Yelick. Well, I can say generally speaking that in
national security, like in science, you're often looking for
some signal, some pattern in very noisy data. So whether you're
looking at telephones or you're looking at some other kind of
collected information, you are looking for patterns. And
machine learning is certainly used in that.
I'm not aware in border security of the current
applications of machine learning. I would think that things
like face-recognition software would probably be useful there,
and I just don't know of the current applications.
Dr. Nielsen. So I know some of the colleagues at our
research center are exploring things like security, using
facial recognition but trying to take it a step further, so
using principles of machine learning, et cetera, trying to
detect the intent of a person. So they'll use computer vision,
they'll watch a group of individuals but try to infer, make
inferences about the intent of what that group is doing. Is
there something going to happen? Who is in charge of this
group? What are they trying to do?
And they're working with the Department of Defense on many
of these applications. And I think there are going to be
tremendous breakthroughs where artificial intelligence and
machine learning are going to help us not only recognize people
but also trying now to recognize the intent of what that person
is trying to do.
Dr. Rollett. And you mentioned an Air Force Base, so
something that maybe not everybody's aware of is that the
military operates very old vehicles, and they have to repair
and replace a lot. And that means that manufacturing is not
just a matter of delivering a new aircraft; it's also a matter
of how you keep old aircraft going. I mean, think of the B-52s
and how old they are.
And so there are very important defense applications for
machine learning, for manufacturing, and manufacturing in the
repair-and-replace sense. And again, when you're running old
vehicles, you're very concerned about outliers, which hasn't
come up very much so far today, but taking data and recognizing
where you've got a case that's just not in the cloud, it's not
in with everybody else and figuring out what that means and how
you're going to deal with it.
Mrs. Lesko. Anyone else? There's one person left.
Dr. Kasthuri. Of course, yes. It's me. So of course my work
doesn't deal directly with either border security or national
security, but just to echo one other sentiment, one of the
things I'm interested in is that, as our cameras get faster,
instead of taking 30 shots per second, we can now take 60 shots
per second, 90 shots per second, 120 frames per second usually,
and you start watching people's facial features as they are
just engaging in normal life. It turns out that we produce a
lot of microfacial features that happen so fast and so quick
that they often aren't detected consciously by each other but
convey a tremendous amount of information about things like
intent and et cetera.
I suspect that, as our technology, as our cameras get
better and of course if you take 120 pictures in a second
versus 30 pictures in a second, that's already four times more
data that you're collecting per second. If we can deal with the
data and get better cameras, we will actually be making
inferences about intentions sooner rather than later.
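    The data-rate arithmetic behind that point, using an assumed resolution and bit depth, is roughly:

    # Raising the frame rate from 30 to 120 frames per second multiplies
    # the raw data rate by four. Resolution and bit depth are assumed.
    width, height, bytes_per_pixel = 1920, 1080, 3   # assumed 1080p color video

    def data_rate_mb_per_s(fps):
        return width * height * bytes_per_pixel * fps / 1e6

    for fps in (30, 120):
        print(fps, "fps ->", round(data_rate_mb_per_s(fps), 1), "MB/s")
    # 120 fps is 4x the 30 fps rate: roughly 746 MB/s vs 187 MB/s uncompressed.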
Mrs. Lesko. Very interesting. I'm glad that you all work in
these different fields.
And I yield back my time, Mr. Chairman.
Chairman Weber. Thank you, ma'am.
The gentleman from Illinois, Mr. Foster, is recognized.
Mr. Foster. Thank you, Mr. Chairman. And thank you to our
witnesses.
And, let's see, I guess I'll start with some hometown
cheerleading for Argonne National Lab, which--and I find it
quite remarkable. Argonne lab has been--they've come out to
events that we've had in my district dealing with the opioid
crisis, and I find it incredible that one single laboratory has
everything from using the Advanced Photon Source and its
upgrades to directly image what are called G-protein-coupled
receptors, at the very heart of the chemical interaction with
the brain, all the way up through modeling the high-level
function of the brain, the Connectome, and everything in
between. And it's really one of the magic things that happens
at Argonne and at all of the--particularly the multipurpose
laboratories, which are really gems of our country.
Now, one thing I'd like to talk about--and it relates to
big data and supercomputing--is that you have to make a bunch
of technological bets in a situation where the technology is
changing really, really rapidly. You know, for example, you
have the choice of--for the data pipes, you can do
conventional, very wide floating point things for partial
differential equations and equations of state, things like
that, the way supercomputing has been done for years, and yet
there's a lot of movement for artificial intelligence toward
much narrower data paths, you know, 8 bits or even less or 1
bit if you're talking about simulating the brain firing or not.
You know, you have questions on the storage where you can
have--classically, we have huge external data sets, you know,
like the full geometry of the brain that you will then use
supercomputing to extract the Connectome. Or now we're seeing
more and more internally generated data sets, like these
game-playing systems playing each other, where you just
generate the data and throw it away. You don't care about
storage at all. Or
simulation of billions of miles of driving where that data
never has to be stored at all, and so that really affects the
high-level design of these machines.
In Congress, we have to commit to projects, you know, on a
sort of five-year time cycle when every six months there are
new disruptive things. We have to decide are these largely
going to be front ends to quantum computing or not? And so how
do you deal with that sort of, you know, internally in your
planning? And should we move more toward the commercial model
of move fast, take risks, and break things, or do we have--are
our projects that we have to approve in Congress things that
have to have no chance of failing? And do you think Congress is
too far on one side or the other of that tradeoff?
Dr. Yelick. I guess as a computer scientist maybe I'll
start here and I would say that you've asked a very good
question. I think this issue of risk and technology is very
important, and we do need to take lots of risks and try lots of
things, especially right now as not only are processors not
getting any faster because of the end of Dennard scaling, but
we're facing the end of Moore's law, which is the end of
transistors getting denser on a chip. And we really need to try
a number of different things, including quantum, neuromorphic
computing, and others.
The issue of even the design of computers--if we look at the
exascale computing program--is very important. Of course, the
first machine targeted for Argonne National Lab is in 2021, and
the process that is really fundamental to the exascale project
is this idea of codesign, that is, bringing together people who
understand the applications, like Tony, with the people who
understand the applied mathematics and the people who
understand the computer architecture design.
And the exascale program is looking at both applying
machine-learning algorithms for things like the Cancer
Initiative, as well as the microbiome where you also have these
very tiny datatypes, only four characters that you can store in
maybe two bits, and putting all of that together. So those
machines are being codesigned to try to understand all those
different applications and work well on the traditional high-
performance simulation applications, as well as some of these
new data-analysis problems.
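    The two-bit point can be illustrated with a small packing routine; this is a generic sketch, not the exascale project's actual code:

    # DNA has only four letters, so each base fits in two bits,
    # packing four bases per byte.
    CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

    def pack(seq):
        """Pack a DNA string into bytes, four bases per byte."""
        out = bytearray()
        for i in range(0, len(seq), 4):
            byte = 0
            for base in seq[i:i + 4]:
                byte = (byte << 2) | CODE[base]
            out.append(byte)
        return bytes(out)

    seq = "GATTACAGATTACA"
    packed = pack(seq)
    print(len(seq), "bases ->", len(packed), "bytes")   # 14 bases -> 4 bytes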
To answer your question directly, I think that, if
anything, that project is very focused on that goal of 2021,
and some other machines will come after that in '22 and '23.
And the application--so it's not just about delivering the
machines; it's about delivering 25 applications that are all
being developed at the same time to run on those machines.
It is a very exciting project. I actually lead the
microbiome project in exascale, and I think it's a great amount
of fun. But it is a project that doesn't have much room for
risk or basic research, and so I do think it's very important
to rebuild the fundamental research program, for example, at
the Department of Energy, to make sure that ten years from now,
when we could have some other kind of future program, we would
have the people trained to answer those basic questions and
figure out how to build another computing device of some kind.
Mr. Foster. Well, yes, thank you. That was a very
comprehensive answer. But if you could just in my last one
second here just sort of--do you think Congress is being too
risk-averse in our expectations or, you know, should we be more
risk-tolerant and allow you occasionally to fail because you
made a technological bet that, you know, has not come through?
Dr. Yelick. You know, I think I'll answer that from the
science perspective. As a scientist, I absolutely want to be
able to take risks and I want to be able to fail. I think the
Congressional question I will leave to you to debate.
Mr. Foster. Thank you. I yield back.
Chairman Weber. Thank you.
The gentleman from California, Mr. Rohrabacher, is
recognized.
Mr. Rohrabacher. Thank you very much, Mr. Chairman.
I wanted to get into some basics here. This is for the
whole panel. Who's going to be put out of work because of the
changes that you see coming as we do what's necessary to fully
understand what you're doing scientifically? Who's going to be
put out of work?
Dr. Rollett. I hope very much that nobody's going to be put
out of work.
Mr. Rohrabacher. Oh, you've got to be kidding. I mean,
whenever there's a change for the better, I mean, otherwise,
we'd have people working in----
Buggy whips would still be----
Dr. Rollett. Yes. I think the point here is to sustain
American industry at its most sophisticated and competitive
level.
Mr. Rohrabacher. What professions are going to be losing
jobs? You're making me--I mean, everybody's afraid to say that.
Come on, you know?
Dr. Rollett. I would say they've mostly been lost. I mean,
if you look at steel mills, we have steel mills. They used to
run with 30,000 people.
Mr. Rohrabacher. Right.
Dr. Rollett. That's why the population of Pittsburgh was so
large years ago, right? It's decreased enormously----
Mr. Rohrabacher. Okay. Well, where can we expect that in
the future from this new technology or this new understanding
of technology? Anybody want to tell me?
Dr. Kasthuri. I have a very quick----
Mr. Rohrabacher. Don't be afraid now.
Dr. Kasthuri. I have a very quick answer. Historically, a
lot of science is done on getting relatively cheap labor to
produce data and to analyze data, by that I mean graduate
students, postdoctoral fellows, young assistant professors, et
cetera. I suspect----
Mr. Rohrabacher. So they're not going to be needed
probably?
Dr. Kasthuri. Well, I suspect that they should still be
trained but then perhaps that they won't be used specifically
in just laboriously collecting data and analyzing data.
Mr. Rohrabacher. Okay. So let's go through that. Where are
the new jobs going to be created? What new jobs will be created
by the advances that you're advocating and want us to focus
some resources on?
Dr. Kasthuri. I'm hoping that when the people who are
trained in science no longer have to do all of that work, they
do--they then expand into other fields that could use
scientific education like the legal system or Congress.
Mr. Rohrabacher. But what specifically can we look at, say,
that will remind Congressmen always to turn off the ringer even
when it's their wife? Now, I'm in big trouble, okay? Tell me--
so, what jobs are going to be created? What can we expect from
what your research is in the future? Do you have a specific job
that you can say this--we're going to be able to do this, and
thus, people will have a job doing it?
Dr. Yelick. Well, I think there will be a lot more jobs in
big data and data analysis and things like that and more
interesting jobs I think going along with what was already
said, that it's really about replacing--so if we replace taxi
drivers with self-driving cars that eliminates a certain class
of jobs but it'll----
Mr. Rohrabacher. Okay. Well, there you go.
Dr. Yelick. Right, but it allows people to then spend their
time doing something more interesting such as perhaps analyzing
the future of the transportation system and things like that.
Mr. Rohrabacher. Well, but taxicab driver--finally, I got
somebody to admit somebody's going to be hurt and going to have
to change their life. And let me just note that happens with
every bit of progress. Some people are left out and they have
to form new type of lifestyles, and we need to understand that.
Maybe we need to prepare for it as we move forward.
What diseases do you think that--especially when we're
talking about controlling things that are going on in the human
mind, what diseases do you think that we can bring under
control that are out of control now? Diabetes obviously has
something to do with how the brain is telling the body what to
do--maybe even cancer? What diseases do you think that
we can have a chance of curing with this?
Dr. Kasthuri. I think there's a range of neurological
diseases that obviously we'll be able to do a better job curing
or ameliorating once we understand the brain. These range from
neurodegenerative diseases like Alzheimer's and Parkinson's to
more mental illness, psychiatric illnesses and to even early
developmental diseases like autism. I think all of these will
absolutely be benefited by a better understanding----
Mr. Rohrabacher. Then if we can control the way the brain
is functioning, for the maladies that you're suffering--like I
say, diabetes, et cetera--maybe we can tell the brain not to do
that once we have that deeper understanding.
One last question. I got just a couple seconds. I remember
in 2001, HAL got out of control and tried to kill those people.
Elon Musk is warning us. I understand somebody's already
brought that up. But if we do end up with very independent-
minded robots, which is what I think we're talking about here,
why shouldn't we think of that as a potential danger, as well
as a potential asset? I mean, Elon Musk is right in that.
Dr. Rollett. Well, I was going to throw in that I think one
opportunity would be in health care and for example, the use of
robots as assistants, so not replacing people but having robots
help them. Well, those robots have to be programmed, they have
to be built.
Mr. Rohrabacher. Right.
Dr. Rollett. I mean, there's a huge infrastructure that we
don't have.
Mr. Rohrabacher. Yes, but if you were building robots that
can think independently, who knows--you know, and they're
helping us in the hospitals or wherever it is, what if HAL gets
out of control?
Dr. Rollett. Right, right. So I think AI is being discussed
mostly in the context of how do you do something? How do you
make something work? When it comes to what these machines
actually do, you also need supervision. And what I think we
have to do is to build in AI that addresses control and
evaluation, you know, the equivalent of the little guy on your
shoulder saying don't do that; you're going to get into
trouble. So you need something like that, which I haven't heard
people talk about much.
Mr. Rohrabacher. Okay. Well, thank you very much, Mr.
Chairman. I yield back.
Chairman Weber. You've been watching too many
Schwarzenegger films.
Mr. Rohrabacher. That's true.
Chairman Weber. The gentleman yields back and, Mr.
McNerney, you're recognized for five minutes.
Mr. McNerney. I thank the Chairman. And I apologize to the
panel for having to step in and out in the hearing so far.
Mr. Nielsen, I'm a former wind engineer. I spent about 20
years in the business. And I understand that the digital twin
technology has allowed GE to produce--to increase production by
about 20 percent. Is that right?
Dr. Nielsen. About five percent on an average wind turbine,
yes.
Mr. McNerney. Five percent?
Dr. Nielsen. Five percent, which is pretty amazing when you
think we're not switching any of the hardware. It's just making
that control system on a wind turbine much smarter using a----
Mr. McNerney. And five percent is believable.
Dr. Nielsen. Five percent----
Mr. McNerney. Twenty percent for the wind farm----
Dr. Nielsen. No--yes, it's five percent for----
Mr. McNerney. Okay. Okay. I can believe that. As Chair of
the Grid Innovation Caucus, I'm particularly interested in
using new technology to create a smarter grid. We have things
like the duck curve that are affecting the grid. How can all
this technology improve grid stability and reliability and
efficiency and so on?
Dr. Nielsen. Yes, so we're now embarking on research for
understanding how to better integrate disparate power sources
together regionally. So imagine us trying to use AI, machine
learning, to say, okay, I have a single combined-cycle power
plant. How do I better optimize its efficiency, produce fewer
emissions, use less fuel, allow more profit from it? But we're
taking that now a step further and saying, how do I then look
regionally and integrate not only that combined-cycle power
plant but the solar farm, the wind farm, et cetera? How do I
balance that and optimize at a grid-scale level versus just a
microscale level?
    So that's some of the research that's ongoing now. We're
continuing to work on it. But our plan is to better figure out
that macroscale optimization problem.
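    A toy version of that regional balancing problem can be written as a small linear program; the costs, capacities, and demand below are invented, and real dispatch adds ramp rates, forecasts, transmission limits, and much more:

    # Choose how much power to draw from a gas plant, a wind farm, and a
    # solar farm to meet demand at minimum fuel cost.
    from scipy.optimize import linprog

    cost = [50.0, 0.0, 0.0]                    # $/MWh for gas, wind, solar
    capacity = [(0, 400), (0, 150), (0, 100)]  # MW limits for each source
    demand = 500.0                             # MW to serve this hour

    # Minimize cost subject to gas + wind + solar == demand, within capacities.
    result = linprog(c=cost, A_eq=[[1, 1, 1]], b_eq=[demand], bounds=capacity)
    print(result.x)   # expected: all wind (150) and solar (100), gas covers 250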
Mr. McNerney. So, I mean, once you get that figured out,
then you need to have some sort of a SCADA or control system
that can dispatch and----
Dr. Nielsen. Yes, correct.
Mr. McNerney. Okay. So that's another product for GE or for
the other----
Dr. Nielsen. Yes. Correct.
Mr. McNerney. Okay.
Dr. Nielsen. We're figuring out how to not only build those
optimization routines but how to then put them in what we call
edge devices, the SCADA systems, the----
Mr. McNerney. Sure.
Dr. Nielsen. --unit control systems, et cetera. So it's not
only trying to figure out the algorithm but making sure that
algorithm can execute in a timescale that can be put into some
of these, as you mentioned, SCADA systems and control systems.
Mr. McNerney. Okay. Well, with the digital ghost, a power
plant can replicate an industrial system and its component
parts to check for cyber vulnerability. Is that right?
Dr. Nielsen. So we use digital ghost at what we call the
cyber physical layer. So imagine having a digital twin of a gas
turbine. So that digital twin tells us how that gas turbine is
behaving and should behave. We then compare that to what signal
is being generated, what the sensors are reading, and we say
that behavior doesn't look right. Our digital twin says
something's not correct. The thermodynamics aren't correct.
Mr. McNerney. Well, I mean, I can see that for mechanical--
--
Dr. Nielsen. Yes.
Mr. McNerney. --systems. What about cyber?
Dr. Nielsen. So what we're doing is we're not applying it
at sort of the network layer. We're not watching network
traffic. We're actually looking at the machine level and
understanding if the machine is behaving as it should be given
the inputs, the control signals, as well as the outputs, the
sensors, et cetera. Some recent attacks look at replicating
sensors----
Mr. McNerney. So the same sort of behavior characteristics
are going to be monitored--can tell you whether or not there's
a cyber issue or some other sort of mechanical failure----
Dr. Nielsen. Yes.
Mr. McNerney. --impending?
Dr. Nielsen. Perfect. It's a----
Mr. McNerney. Very good.
Dr. Nielsen. It's an anomaly detection scheme, yes.
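    That residual-style check can be sketched as follows, with an assumed surrogate model and threshold rather than GE's actual digital ghost:

    # Compare what the physics-based twin says a sensor should read
    # against what the sensor actually reports, and flag readings whose
    # residual is too large.
    import numpy as np

    def twin_predicted_temp(load_mw):
        # Assumed simple surrogate: exhaust temperature vs load.
        return 400.0 + 0.5 * load_mw

    def flag_anomalies(loads, measured_temps, threshold=15.0):
        residuals = np.abs(measured_temps - twin_predicted_temp(loads))
        return np.where(residuals > threshold)[0]

    loads = np.array([100.0, 150.0, 200.0, 250.0])
    measured = np.array([451.0, 474.0, 538.0, 526.0])   # third reading spoofed or faulty
    print(flag_anomalies(loads, measured))               # expected: [2]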
Mr. McNerney. Dr. Yelick, thank you for coming. And I
visited your lab a number of times. It's always a pleasure to
do so. I think you guys are doing some really good work out
there.
One of the things that was striking was the work you did on
exascale computing, simulating a San Francisco earthquake and
how striking that is. Do you think we have the collective use--
have we collectively used this information to harden our
systems, to harden our communities against an earthquake, or is
that something that is yet to happen?
Dr. Yelick. That's something that is yet to happen. We're
just starting to see some of this very detailed information
coming from the simulations. And as I mentioned earlier, even
bringing in more detailed data into the simulations to give you
better geological information about the stability of a certain
region or even a certain local area, a city block or whatever,
and using that information is not something that is happening
yet but obviously should be.
Mr. McNerney. This is sort of a rhetorical question but
somebody can answer it if you feel like. I know we hear about
the social challenges of digital technology and AI and big
data, you know, in terms of job displacement. Does AI tell us
anything about that, about how we should respond to this
crisis?
Dr. Yelick. I don't know of any studies that have used AI
to do that. People do use AI to understand the market,
economics, and things like that, and I'm sure that people are
using large-scale data analytics of various kinds, and they
certainly are to understand changes in jobs and what will
happen with them.
It is, by the way, a very active area of discussion within
the computer science community, about both the ethics, which I
think you heard about at a previous hearing on AI, and also the
issues of replacing jobs.
Mr. McNerney. Sure. Dr. Rollett?
Dr. Rollett. If I might jump in, I would encourage you to
think about supporting research in policy and even social
science to address that issue because AI displacing people is
about education, it's about retraining, it's about how people
behave. So we scientists are really at sort of the front end of
this, but there's a lot of implications that are much broader
than what we've talked about this morning.
Mr. McNerney. All right. Thank you. Mr. Chairman, I yield
back.
Chairman Weber. Thank you, sir.
The gentleman from Florida, Dr. Dunn, is recognized.
Mr. Dunn. Thank you very much, Chairman Weber.
And I want to add my thank you to the panel and underscore
my personal belief in how important all of your work is. I've
visited Dr. Bobby Kasthuri's lab, a great fan of your work and
your energy level. Dr. Yelick, we'll be visiting you in the
near future, so that'll be fun, too.
I want to focus on the niche in big computing, which is
artificial intelligence, and I apologize I missed that hearing
earlier, but it was near and dear to my heart.
I think we all see many potential benefits of artificial
intelligence, but there are some potential problems, and I
think it serves us to face those as we're having this virtual
lovefest for artificial intelligence. You know, and we've known
this since at least the '60s. I mean, the Isaac Asimov robotic
novels and the robotic laws, the Three Laws of Robotics, which
I have copies of in my printout in case anybody doesn't
remember them. I bet this group does.
But what I want to do is--I also, by the way, was looking
for guides for artificial intelligence and I came up with the
12 Boy Scout laws, too, so I don't know how that--so I want to
offer some quotes and then get some thoughts from you, and
these are quotes from people who are recognizably smart people.
Stephen Hawking said, ``I think the development of artificial
intelligence could spell the end of the human race.'' Elon
Musk, quoted several times here, said, ``I think we should be
very careful about artificial intelligence. If I were to guess
what our biggest existential threat is, it's probably that.''
Bill Gates responded, ``I agree with Elon Musk and don't
understand why some people are not concerned.''
    And then finally, Jaan Tallinn, one of the inventors of
Skype, said that with strong artificial intelligence,
``planning ahead is a better strategy than learning from
mistakes.'' And he went on to say, ``It really sucks to be the
number-two intelligent species on the planet; just ask the
gorillas.''
So in everybody's handout you have a very brief summary of
a series of experiments run at MIT on artificial intelligence.
The first one was named Norman, which was an artificial
intelligence educated on biased data, not false data but biased
data and turned into a deeply sociopathic intelligence. There
was another one Tay, which was really just an artificial
intelligence Twitterbot, which they turned loose into the
internet, and I think it wasn't the intention of the MIT
researchers, but people engaged with Tay and tried to provoke
it to say racist and inappropriate things, which it did. And
there are some other experiments from MIT as well.
So I want to note, like Dr. Kasthuri, I have sons that are
more clever than I, but they are not virtual supermen, nor do
they operate at the speed of light, so, you know, there's ways
of working with them. I'm not so sure about that with
artificial intelligence.
My first question: what are the implications of a future
where, with black-box machine learning, the process can't even
be interpreted? You know, once it gets several layers in, we
can't interpret it. What are the implications of that today, to
you, Dr. Kasthuri and Dr. Yelick, if I could?
Dr. Kasthuri. Congressman Dunn, thank you for the kind
words to start. And I actually suspect there is a reasonable
concern that the things that we develop in artificial
intelligence are different than the other things like our
children because their ability to change is at the speed of
computers as opposed to the speed of our own. So I agree that
there's legitimate cause for concern.
I suspect that we will have to come up with lessons and
safeguards the same way that we've done with every existential
crisis: the discovery of nuclear energy, the application to
nuclear weapons. As humans, we do have some history of living
on the edge and figuring out how to get the benefit of
something and keep the risk at bay.
You're right that if algorithms can change faster than we
can think, our existing previous historical safeguards might
not work.
To the specific question that you asked about the non-
interpretability, for me, without knowing what the algorithm is
producing, how do you innovate? If you don't know the
fundamental nature of what the algorithm is--its principles for
how it comes to a conclusion, I worry that we won't be able to
innovate on those results.
And this is interestingly perhaps as a thought exercise:
What if a machine-learning algorithm could tell me--could
make--could collect enough data to make a prediction about a
brain, about your brain or someone else's brain that was
incredibly accurate? Would we at that moment care how that
machine-learning algorithm arrived at its conclusion? Or would
we at that moment take the results that the algorithm produces
and just go on with it, in which case there could be a missed
opportunity for learning something deeply fundamental and
principled about the brain.
Mr. Dunn. And very quickly, Dr. Yelick.
Dr. Yelick. Well, I agree with that. I think that these
deep learning algorithms which have these multiple layers,
which is why they're deep, they have millions perhaps of
parameters inside of them. And we don't really understand when
you get an answer out why all these parameters put together
tell you that that's a cat and this one's not a cat. And so
that may be okay if we're trying to figure out where to place
ads, as long as we give it unbiased data about where to place
the ads so the right--so----
Mr. Dunn. But it might be more of a problem if it was flying
a drone swarm on an attack someplace?
Dr. Yelick. Well, where it's a problem is if I'm a
scientist, I want to understand why. It's not enough to say
there's a correlation between these two things. And if the, you
know, drone is flying in the right place, that's really
probably the most important thing about some kind of a
controlled vehicle. But in science, you want to----
Mr. Dunn. We're dangerously close to being way, way, way
over time, so I better yield back here, Mr.--thank you very
much, though. I appreciate the chance.
Chairman Weber. All right. The gentlelady from Nevada, Ms.
Rosen, is recognized.
Ms. Rosen. Thank you. I want to thank you for one of the
most interesting, informative hearings, and I want to say this
is on the bleeding edge of everything that we need to worry
about, for sure.
But one thing we haven't talked about is data storage. And
data storage specifically is critical infrastructure in this
country, right, because we have tons and tons of data
everywhere, and where it goes and how we keep it is going to be
of utmost importance.
And so I know that we're trying to focus on that in the
future, and in my district in Nevada we have a major data
storage company. It has state-of-the-art reliability. We have
lots of quality standards to ensure its data is secure, but
like I said, we don't consider it critical infrastructure.
So right now, in this era of unprecedented data breaches and
data hacks--every moment they are just pounding on us--in your
view, for the data storage centers that house government and
private sector data, where are their vulnerabilities and what
are the implications? How should we be sure that we classify
them as critical infrastructure?
Dr. Yelick. So, clearly, those data centers are storing
very important information that should be protected. And, as
you said, even at the computing centers that we run in the
labs, there's a constant barrage of attacks, although we store
at NERSC the center at Berkeley lab only scientific data, so it
is not really critical data. I think that using these kinds of
machine-learning techniques to look for patterns is one of the
best mechanisms we have to prevent attack, and they do have to
learn from these patterns in order to figure out what is--and--
what is abnormal behavior. And we're looking at--as we build
out the next network, even kind of embedding that information
into the network so that you can see patterns of attack even
before they get to a particular data set or a particular
computer system.
Ms. Rosen. Thank you. I have one other question. And you
were talking about using predictive analytics with a digital
twin to talk about fatigue in planes. But how can we use that
to discuss infrastructure fatigue as we talk about the
infrastructure failures around this country in bridges, roads,
ports, et cetera, et cetera? So----
Dr. Rollett. That's I think a question of recognizing the
need and talking to the agencies and finding out whether you
consider there are adequate programs to do that. I'm going to
guess that there is not a huge amount of activity, but I don't
know, so that's why I'm being very cautious in my answer.
But I suspect it's one of the opportunity areas. It's an
area where there is data. It's often rather incomplete, but it
would definitely benefit from having the techniques applied,
the machine-learning techniques to try to find the patterns, to
try to identify outliers, particularly trends that are not
good.
Ms. Rosen. Thank you.
Dr. Nielsen. I would just----
Ms. Rosen. Oh, please, yes. Yes.
Dr. Nielsen. Oh, I'm sorry. I would just second the
comments made. I mean, at GE we obviously focus a lot of our
attention on the commercial assets that we build, but there's
no reason the technologies, the ideas that are being applied
there could be applied to bridges and infrastructure and all
that.
Ms. Rosen. Right.
Dr. Nielsen. It's just, I think, a matter of will and
policy to do that, right?
Ms. Rosen. So do you think it would be well worth our time
here in this Committee to promote those kinds of policies or
research, for you all or someone to use the predictive
analytics? Congresswoman Esty and I sit on some infrastructure
committees, and it's really important that we try to find
points of failure before they fail, right?
Dr. Rollett. Absolutely. And I would encourage you to bring
state and local government into that discussion because they
often own a lot of those assets.
Ms. Rosen. Yes. Thank you. I yield back my time.
Chairman Weber. The gentlelady yields back.
The gentlelady from Connecticut is recognized.
Ms. Esty. Thank you so much. And this is tremendously
important for this Committee and for the U.S. Congress to be
dealing with, and we really appreciate you taking the time with
us today.
All of you have mentioned somewhat in passing this critical
importance of how are the algorithms structured and how are we
going to embed the values if we have AI moving much faster than
our brains can function or at least on multiple levels
simultaneously?
So we did have a hearing last month in talking about this,
and one of the issues that came up that everyone supported--and
I'd like your thoughts on that--is the critical importance of a
diverse workforce in doing that. If you're going to try to
train AI, it needs to represent the diversity of human
experience, and therefore, it can't be like my son who did
computer science in astrophysics. If they all look like that,
if those are--the algorithms are all being developed by, you
know, 26-year-olds like my son Thomas, we're not going to have
the diversity of life experience.
So, first, if you can quickly--because I've got a couple of
questions--thoughts on how do we ensure that? Because we're
looking at that issue. We talk about that diverse workforce all
the time, but when we're looking at AI and algorithms, it
becomes vitally important that we do this. It's not about
checking the box to say to the Department of Labor that we've
got a diverse workforce. This is actually vital to what we need
to
do.
Dr. Yelick. So if I can just comment on that. Yesterday,
before I left UC Berkeley, I gave a lecture to the freshman
summer introductory computing class. My title was rather
ostentatious: ``How to Save the World with Computing.'' What
I find is that when you talk about the applications of
computing and including data analytics and machine learning and
real problems that are societal problems, you tend to bring in
a much more diverse workforce. That class in particular has had
over 50 percent women and a very good representation at least
relative to the norm of underrepresented minorities as well.
Ms. Esty. Anyone else who--I mean it--MIT has found that
when they change the title of some of their computer science
classes to again be applied in sort of more political and
social realms, they had a dramatic change in terms of
composition of classes.
Dr. Nielsen. Yes, I would just quickly build upon that,
too. To me, when you look at AI and machine learning, you have
to have a critical eye. You have to always be looking at it.
And I think a diverse workforce and diverse experience can
bring more perspectives to help critically question why those
algorithms are doing what they're doing. What are the outcomes?
How can we improve them? So I would support that supposition,
yes.
Dr. Yelick. I'll just mention that the name of the course--
which I was not teaching, by the way, I was giving a guest
lecture--is ``The Beauty and Joy of Computing,'' so maybe that
helps.
Ms. Esty. Well, that helps. And if I could have you turn
again to something some of you have mentioned, the important
role of federal research. I mean, that's what this Committee is
looking at: what is uniquely the federal role. Across the
board there is more and more effort--we see it in space
research and other places--to move work into the private
sector, with the notion that the federal government is not very
good at picking winners and losers. So if you can all talk
about what you think are the most critical tasks for federal
investment in, say, foundational and basic research that will
then be developed by the GEs and by companies not yet formed or
conceived of. Because, again, I see it as our job to defend
putting those basic research dollars in; we don't know where
they're going to go, but we do know they're vital to keep us
competitive or, frankly, just to have better research and more
care.
Dr. Kasthuri. So perhaps I can go really quickly. There is
a model of funding scientific research built on the idea that
if you plant a million seeds in the ground, a few flowers will
grow--where individual labs and individual scientists have the
freedom to judge what is the next important question to
address.
And I can see why having the federal government decide the
next important question to address might not be the most
efficient way to push science forward. But where I do see the
federal government really playing a role is at the level of
facilities and resources. What I imagine is that the federal
government establishes large-scale resources and facilities,
like the national lab system, and then allows individual
scientists to pursue their individual ideas while leveraging
those federal resources. And I wonder if this is a compromise:
allowing those seeds to grow, with the federal government--
maybe this is appropriate, maybe not--providing the fertilizer
for those seeds.
Ms. Esty. They think we generate a lot of it, at least in
this place.
Dr. Yelick. So I would just add the importance of
fundamental research, as well as of the facilities and
infrastructure, and of the applied mathematics, computer
science, and statistics that are very important in machine
learning. And, as we said, these machine-learning algorithms
have been used a lot in nonscientific domains, and there's a
lot of interest in applying them in scientific domains. I think
the peer-review process in science will make machine learning
better for everybody if we really put a lot of scrutiny on it.
Dr. Rollett. And very quickly, I wanted to add that I think
it's important that program managers in the federal government
have some discretion over what they fund and can take risks. And
it's also important that the agencies have effective means of
getting community input. And I don't want to name names, but
some agencies have far more effective mechanisms for that than
others.
Ms. Esty. Well, we might want to follow up with that last
point.
And I wanted to put something out for you to help us with--
you mentioned it, Dr. Yelick, on peer review--this systemic
problem: because of pressures to publish or perish and to show
success, we are not sharing the failures, which are absolutely
essential for science to make progress. It's one of the issues
we've touched on a lot in this Committee. We don't have any
good answers, and it's gotten worse because of the pressures to
get grant money and to show progress. But I am deeply concerned
that those pressures, from both the private sector and the
public sector, are making it harder for us--people hoard the,
quote, ``bad results,'' but they're absolutely essential for us
to learn from.
And so I don't know how we change that dynamic, but it is
something we could really use your thoughts on, because maybe
AI can help us with disclosing the dead ends so that we learn
from them and move forward. But how we deal with sharing the
not-useful results, which may turn out to be very useful down
the line, is a big issue for us.
Dr. Yelick. I completely agree with that. I think the first
step is sharing the scientific data and allowing people to
reproduce the successful results but also, as you said, to
examine the supposed failures. There are many examples of this
in physics and other disciplines where people go back to data
that may be 10 or 20 years old and find some new discovery in
it.
Ms. Esty. Thank you very much. I really appreciate your
indulgence in staying with us here to the bitter end. Thank
you--not bitter because of you, just the fact that the bell has
rung, and we had a lot of questions for you. We appreciate it.
Thank you so much.
Chairman Weber. After Edison failed 1,000 times at the
lightbulb, his staffer said, doesn't that frustrate you? He
goes, what are you talking about? We're 1,000 ways closer to
success.
So I thank the witnesses for their testimony and the
Members for their questions. The record will remain open for
two weeks for additional written comments and written questions
from the Members.
This hearing is adjourned.
[Whereupon, at 12:08 p.m., the Subcommittees were
adjourned.]
Appendix I
----------
Answers to Post-Hearing Questions
Responses by Dr. Bobby Kasthuri
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]
Responses by Dr. Katherine Yelick
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]
Responses by Dr. Matthew Nielsen
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]
Responses by Dr. Anthony Rollett
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]
Appendix II
----------
Additional Material for the Record
Documents submitted by Representative Neal Dunn
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]