[House Hearing, 115 Congress]
[From the U.S. Government Publishing Office]




 
                        BIG DATA CHALLENGES AND 
                      ADVANCED COMPUTING SOLUTIONS

=======================================================================

                             JOINT HEARING

                               BEFORE THE

                        SUBCOMMITTEE ON ENERGY &
                SUBCOMMITTEE ON RESEARCH AND TECHNOLOGY

              COMMITTEE ON SCIENCE, SPACE, AND TECHNOLOGY
                        HOUSE OF REPRESENTATIVES

                     ONE HUNDRED FIFTEENTH CONGRESS

                             SECOND SESSION

                               __________

                             JULY 12, 2018

                               __________

                           Serial No. 115-69

                               __________

 Printed for the use of the Committee on Science, Space, and Technology
 
 
 
 
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT] 
 


       Available via the World Wide Web: http://science.house.gov
       
       
       
                         _________ 

           U.S. GOVERNMENT PUBLISHING OFFICE
                   
30-879 PDF           WASHINGTON : 2018            
       

              COMMITTEE ON SCIENCE, SPACE, AND TECHNOLOGY

                   HON. LAMAR S. SMITH, Texas, Chair
FRANK D. LUCAS, Oklahoma             EDDIE BERNICE JOHNSON, Texas
DANA ROHRABACHER, California         ZOE LOFGREN, California
MO BROOKS, Alabama                   DANIEL LIPINSKI, Illinois
RANDY HULTGREN, Illinois             SUZANNE BONAMICI, Oregon
BILL POSEY, Florida                  AMI BERA, California
THOMAS MASSIE, Kentucky              ELIZABETH H. ESTY, Connecticut
RANDY K. WEBER, Texas                MARC A. VEASEY, Texas
STEPHEN KNIGHT, California           DONALD S. BEYER, JR., Virginia
BRIAN BABIN, Texas                   JACKY ROSEN, Nevada
BARBARA COMSTOCK, Virginia           CONOR LAMB, Pennsylvania
BARRY LOUDERMILK, Georgia            JERRY McNERNEY, California
RALPH LEE ABRAHAM, Louisiana         ED PERLMUTTER, Colorado
GARY PALMER, Alabama                 PAUL TONKO, New York
DANIEL WEBSTER, Florida              BILL FOSTER, Illinois
ANDY BIGGS, Arizona                  MARK TAKANO, California
ROGER W. MARSHALL, Kansas            COLLEEN HANABUSA, Hawaii
NEAL P. DUNN, Florida                CHARLIE CRIST, Florida
CLAY HIGGINS, Louisiana
RALPH NORMAN, South Carolina
DEBBIE LESKO, Arizona
                                 ------                                

                         Subcommittee on Energy

                   HON. RANDY K. WEBER, Texas, Chair
DANA ROHRABACHER, California         MARC A. VEASEY, Texas, Ranking 
FRANK D. LUCAS, Oklahoma                 Member
MO BROOKS, Alabama                   ZOE LOFGREN, California
RANDY HULTGREN, Illinois             DANIEL LIPINSKI, Illinois
THOMAS MASSIE, Kentucky              JACKY ROSEN, Nevada
STEPHEN KNIGHT, California           JERRY McNERNEY, California
GARY PALMER, Alabama                 PAUL TONKO, New York
DANIEL WEBSTER, Florida              BILL FOSTER, Illinois
NEAL P. DUNN, Florida                MARK TAKANO, California
RALPH NORMAN, South Carolina         EDDIE BERNICE JOHNSON, Texas
LAMAR S. SMITH, Texas
                                 ------                                

                Subcommittee on Research and Technology

                 HON. BARBARA COMSTOCK, Virginia, Chair
FRANK D. LUCAS, Oklahoma             DANIEL LIPINSKI, Illinois, Ranking 
RANDY HULTGREN, Illinois                 Member
STEPHEN KNIGHT, California           ELIZABETH H. ESTY, Connecticut
BARRY LOUDERMILK, Georgia            JACKY ROSEN, Nevada
DANIEL WEBSTER, Florida              SUZANNE BONAMICI, Oregon
ROGER W. MARSHALL, Kansas            AMI BERA, California
DEBBIE LESKO, Arizona                DONALD S. BEYER, JR., Virginia
LAMAR S. SMITH, Texas                EDDIE BERNICE JOHNSON, Texas
                            C O N T E N T S

                             July 12, 2018

                                                                   Page
Witness List.....................................................     2

Hearing Charter..................................................     3

                           Opening Statements

Statement by Representative Randy K. Weber, Chairman, 
  Subcommittee on Energy, Committee on Science, Space, and 
  Technology, U.S. House of Representatives......................     4
    Written Statement............................................     6

Statement by Representative Marc A. Veasey, Ranking Member, 
  Subcommittee on Energy, Committee on Science, Space, and 
  Technology, U.S. House of Representatives......................     8
    Written Statement............................................     9

Statement by Representative Barbara Comstock, Chairwoman, 
  Subcommittee on Research and Technology, Committee on Science, 
  Space, and Technology, U.S. House of Representatives...........    10
    Written Statement............................................    11

Statement by Representative Lamar Smith, Chairman, Committee on 
  Science, Space, and Technology, U.S. House of Representatives..    12
    Written Statement............................................    13

Written Statement by Representative Eddie Bernice Johnson, 
  Ranking Member, Committee on Science, Space, and Technology, 
  U.S. House of Representatives..................................    15

Written Statement by Representative Daniel Lipinski. Ranking 
  Member, Subcommittee on Research and Technology, Committee on 
  Science, Space, and Technology, U.S. House of Representatives..    17

                               Witnesses:

Dr. Bobby Kasthuri, Researcher, Argonne National Laboratory; 
  Assistant Professor, The University of Chicago
    Oral Statement...............................................    19
    Written Statement............................................    22

Dr. Katherine Yelick, Associate Laboratory Director for Computing 
  Sciences, Lawrence Berkeley National Laboratory; Professor, The 
  University of California, Berkeley
    Oral Statement...............................................    31
    Written Statement............................................    34

Dr. Matthew Nielsen, Principal Scientist, Industrial Outcomes 
  Optimization, GE Global Research
    Oral Statement...............................................    47
    Written Statement............................................    49

Dr. Anthony Rollett, U.S. Steel Professor of Materials Science 
  and Engineering, Carnegie Mellon University
    Oral Statement...............................................    57
    Written Statement............................................    59

Discussion.......................................................    66

             Appendix I: Answers to Post-Hearing Questions

Dr. Bobby Kasthuri, Researcher, Argonne National Laboratory; 
  Assistant Professor, The University of Chicago.................    92

Dr. Katherine Yelick, Associate Laboratory Director for Computing 
  Sciences, Lawrence Berkeley National Laboratory; Professor, The 
  University of California, Berkeley.............................    97

Dr. Matthew Nielsen, Principal Scientist, Industrial Outcomes 
  Optimization, GE Global Research...............................   104

Dr. Anthony Rollett, U.S. Steel Professor of Materials Science 
  and Engineering, Carnegie Mellon University....................   113

            Appendix II: Additional Material for the Record

Document submitted by Representative Neal P. Dunn, Committee on 
  Science, Space, and Technology, U.S. House of Representatives..   120


                          BIG DATA CHALLENGES


                    AND ADVANCED COMPUTING SOLUTIONS

                              ----------                              


                        THURSDAY, JULY 12, 2018

                  House of Representatives,
                         Subcommittee on Energy and
           Subcommittee on Research and Technology,
               Committee on Science, Space, and Technology,
                                                   Washington, D.C.

    The Subcommittees met, pursuant to call, at 10:15 a.m., in 
Room 2318, Rayburn House Office Building, Hon. Randy Weber 
[Chairman of the Subcommittee on Energy] presiding.

[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]



    Chairman Weber. The Committee on Science, Space, and 
Technology will come to order.
    Without objection, the Chair is authorized to declare 
recess of the Subcommittees at any time.
    Good morning, and welcome to today's hearing entitled ``Big 
Data Challenges and Advanced Computing Solutions.'' I now 
recognize myself for five minutes for an opening statement.
    Today, we will explore the application of machine-learning-
based algorithms to big-data science challenges. Born from the 
artificial intelligence--AI--movement that began in the 1950s, 
machine learning is a data-analysis technique that gives 
computers the ability to learn directly from data without being 
explicitly programmed.
    Generally speaking--and don't worry; I'll save the detailed 
description for you all, our expert witnesses--machine learning 
is used when computers are trained--more than husbands are 
trained, right, ladies--on large data sets to recognize 
patterns in that data and learn to make future decisions based 
on these observations.
    Today, specialized algorithms termed ``deep learning'' are 
leading the field of machine-learning-based approaches. These 
algorithms are able to train computers to perform certain tasks 
at levels that can exceed human ability. Machine learning also 
has the potential to improve computational science methods for 
many big-data problems.
    As the Nation's largest federal sponsor of basic research 
in the physical sciences with expertise in big-data science, 
advanced algorithms, data analytics, and high-performance 
computing, the Department of Energy is uniquely equipped to 
fund robust fundamental research in machine learning. The 
Department also manages the 17 DOE national labs and 27 world-
leading scientific user facilities, which are instrumental to 
connecting basic science and advanced computing.
    Machine learning and other advanced computing processes 
have broad applications in the DOE mission space from high 
energy physics to fusion energy sciences to nuclear weapons 
development. Machine learning also has important applications 
in academia and industry. In industry, common examples of 
machine-learning techniques are in automated driving, facial 
recognition, and automated speech recognition.
    At Rice University near my home district, researchers seek 
to utilize machine-learning approaches to address challenges in 
geological sciences. In addition, the University of Houston's 
Solutions Lab supports research that will use machine learning 
to predict the behavior of flooding events and aid in 
evacuation planning. This would be incredibly beneficial for my 
district and all areas that are prone to hurricanes and to 
flooding. In fact, in Texas we're still recovering from 
Hurricane Harvey, the wettest storm in United States history.
    The future of scientific discovery includes the 
incorporation of advanced data analysis techniques like machine 
learning. With the next generation of supercomputers, including 
the exascale computing systems that DOE is expected to field by 
2021, American researchers utilizing these technologies will be 
able to explore even bigger challenges. With the immense 
potential for machine-learning technologies to answer 
fundamental scientific questions, provide the foundation for 
high-performance computing capabilities, and to drive future 
technological development, it's clear that we should prioritize 
this research.
    I want to thank our accomplished panel of witnesses for 
their testimony today, and I look forward to hearing what role 
Congress should play in advancing this critical area of 
research.
    [The prepared statement of Chairman Weber follows:]
    
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]    
   
    
    Chairman Weber. I now recognize the Ranking Member for an 
opening statement.
    Mr. Veasey. Thank you, Chairman Weber. Thank you, 
Chairwoman Comstock, and also, thank you to the distinguished 
panel for being here this morning.
    As you know, there are a growing number of industries today 
that are relying on generating and interpreting large amounts 
of data to overcome new challenges. The new--the energy sector 
in particular is making strides in leveraging these new 
technologies and techniques. Today, we're going to hear more 
about the advancements that we're going to see in the upcoming 
years.
    Sensor-equipped aircraft engines, locomotive, gas, and wind 
turbines are now able to track production efficiency and the 
wear and tear on vital machinery. This enables significant 
reductions in fuel consumption, as well as carbon emissions. 
The technologies are also significantly improving our ability 
to detect failures before they occur and prevent disasters, and 
by doing so will save money, will save time, and lives. And by 
using analytics, sensors, and operational data, we can manage 
and optimize systems ranging from energy storage components to 
power plants and to the electric grid.
    As digital technologies revolutionize the energy sector, we 
also must ensure the safe and responsible use of these 
processes. With our electric grid always in under persistent 
threats from everything from cyber to other modes of 
subterfuge, the security of these connected systems is of the 
utmost importance. Nevertheless, I'm excited to learn more 
about the value and benefits that these technologies may be 
able to provide for our economy and our environment alike.
    I'm looking forward to hearing what we can do in Congress 
to help guide and support the responsible development of these 
new data-driven approaches to the management of these evermore 
complex systems that our society is very dependent on.
    Thank you, and, Mr. Chairman, I yield back the balance of 
my time.
    [The prepared statement of Mr. Veasey follows:]
    
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]    
        
    Chairman Weber. Thank you, Mr. Veasey.
    I now recognize the Chairwoman of the Research and 
Technology Subcommittee, the gentlewoman from Virginia, Mrs. 
Comstock, for an opening statement.
    Mrs. Comstock. Thank you, Chairman Weber.
    A couple of weeks ago, our two Subcommittees joined 
together on a hearing to examine the state of artificial 
intelligence and the types of research being conducted to 
advance this technology. The Committee learned about the 
nuances of the term artificial intelligence, such as the 
difference between narrow and general AI and implications for a 
world in which AI is ubiquitous.
    Today, we delve deeper into disciplines originating from 
the AI movement of the 1950s that include machine learning, 
deep learning, and neural networks. Until recently, machine 
learning and especially deep-learning technologies were only 
theoretical because deep-learning models require massive 
amounts of data and computing power. But advances in high-
performance graphics, processing units, cloud computing, and 
data storage have made these techniques possible.
    Machine learning is pervasive in our day-to-day lives from 
tagging photos on Facebook to protecting emails with spam 
filters to using a virtual assistant like Siri or Alexa for 
information. Machine-learning-based algorithms have powerful 
applications that ultimately help make our lives more fun, 
safe, and informative.
    In the federal government, the Department of Energy stands 
out for its work in high-performance computing and approaches 
to big-data science challenges. The Energy Department 
researchers are using machine-learning approaches to study 
protein behavior, to understand the trajectories of patient 
health outcomes, and to predict biological drug responses. At 
Argonne National Laboratory, for example, researchers are using 
intensive machine-learning-based algorithms to attempt to map 
the human brain.
    A program of particular interest to me involves a DOE and 
Department of Veterans Affairs venture known as the MVP-
CHAMPION program. This joint collaboration will leverage DOE's 
high-performance computing and machine-learning capabilities to 
analyze health records of more than 20 million veterans 
maintained by the VA. The goal of this partnership is to arm 
the VA with data it can use to potentially improve health care 
offered to our veterans by developing new treatments and 
preventive strategies and best practices.
    The potential for AI to help humans and further scientific 
discoveries is obviously immense. I look forward to what our 
witnesses will testify to today about their work and--which may 
give us a glimpse into the revolutionary technologies of 
tomorrow that we're here to discuss.
    So I thank you, Mr. Chairman, and I yield back.
    [The prepared statement of Mrs. Comstock follows:]
    
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]    
    
    
    Chairman Weber. I thank the gentlelady.
    And let me introduce our witnesses. Our first witness is 
Dr. Bobby--Mr. Chairman, are you going to----
    Chairman Smith. Mr. Chairman, thank you. In the interest of 
time, I just ask unanimous consent to put my opening statement 
in the record.
    Chairman Weber. Without objection.
    [The prepared statement of Chairman Smith follows:]
    
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]    
    
    
    [The prepared statement of Ranking Member Johnson follows:]
    
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]    

    
    
    [The prepared statement of Mr. Lipinski follows:]

[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]    
    
    Chairman Weber. Thank you. I appreciate that.
    Now, I will introduce the witnesses. Our first witness is 
Dr. Bobby Kasthuri, the first neuroscience researcher at 
Argonne National Lab and an Assistant Professor in the 
Department of Neurobiology at the University of Chicago. You're 
busy. Dr. Kasthuri's current research focuses on innovation and 
new approaches to brain mapping, including the use of high-
energy x-rays from synchrotron sources for mapping brains in 
their entirety.
    He holds a Bachelor of Science from Princeton University, 
an M.D. from Washington University School of Medicine, and a 
Ph.D. from Oxford University where he studied as a Rhodes 
scholar. Welcome, Doctor.
    Our second witness today is Dr. Katherine Yelick, a 
Professor of Electrical Engineering and Computer Sciences at 
the University of California, Berkeley, and the Associate 
Laboratory Director for Computing at Lawrence Berkeley National 
Laboratory. Her research is in high-performance computing, 
programming languages, compilers, parallel algorithms, and 
automatic performance tuning.
    Dr. Yelick received her Bachelor of Science, Master of 
Science, and Ph.D. all in computer science at the Massachusetts 
Institute of Technology. Welcome, Dr. Yelick.
    Our next witness is Dr. Matthew Nielsen, Principal 
Scientist at the GE Global Research Center. Dr. Nielsen's 
current research focuses on digital twin and computer modeling 
and simulation of physical assets using first-principle physics 
and machine-learning methods.
    He received a Bachelor of Science in physics at Alma 
College in Alma, Michigan, and a Ph.D. in applied physics from 
Rensselaer.
    Dr. Nielsen. Rensselaer.
    Chairman Weber. Rensselaer, okay, Polytechnic Institute in 
Troy, New York. Welcome, Dr. Nielsen.
    And our final witness today is Dr. Anthony Rollett, the 
U.S. Steel Professor of Metallurgical Engineering and Materials 
Science at Carnegie Mellon University, a.k.a. CMU. Dr. Rollett 
has been a Professor of Materials Science Engineering at CMU 
for over 20 years and is the Co-Director of CMU's 
NextManufacturing Center. Dr. Rollett's research focuses on 
microstructural evolution and microstructure property 
relationships in 3-D.
    He received a Master of Arts in metallurgy and materials 
science from Cambridge University and a Ph.D. in materials 
engineering from Drexel University. Welcome, Dr. Rollett.
    I now recognize Dr. Kasthuri for five minutes to present 
his testimony. Doctor?

          TESTIMONY OF DR. BOBBY KASTHURI, RESEARCHER,

                  ARGONNE NATIONAL LABORATORY;

                      ASSISTANT PROFESSOR,

                   THE UNIVERSITY OF CHICAGO

    Dr. Kasthuri. Thank you. Chairman Smith, Chairman Weber, 
Chairwoman Comstock, Ranking Members Veasey and Lipinski, and 
Members of the Subcommittees, thank you for this opportunity to 
talk and appear before you. My name is Bobby Kasthuri. I'm a 
Neuroscientist at Argonne National Labs and an Assistant 
Professor in the Department of Neurobiology at the University 
of Chicago.
    And the reason I'm here talking to you today is because I 
think we are at a pivotal moment in our decades-long quest to 
understand the brain. And the reason we're at this pivotal 
moment is that we're actually witnessing in real time is the 
collision of two different disciplines, two different worlds, 
the worlds of computer science and neuroscience. And if we can 
nurture and develop this union, it could fundamentally change 
many things about our society.
    First, it could fundamentally change how we think about 
understanding the brain. It could change and revolutionize how 
we treat mental illness, and perhaps even more significantly, 
it can change how we think and imagine and build our future 
computers and our future robots based on how brains solve 
problems.
    The major obstacle between us and realizing this vision is 
that, for many neuroscientists, modern neuroscience is 
extremely expensive and extremely resource-intensive. To give 
you an idea of the scale, I thought it might help to give you 
an example of the enormity of the problem that we're trying to 
do.
    The human brain, your brains, probably contain on order 100 
billion brain cells or neurons, and the main thing that neurons 
do is connect with each other. And so in your brain there's 
probably--each neuron connects on average 10,000 times with 
10,000 other neurons. That means in your brain there are orders 
of magnitude more connections between neurons than stars in the 
Milky Way galaxy. And what's even more important for 
neuroscientists is that we believe that this map, this map of 
you, this map of connections contains all of the things that 
make us human. Our creativity, our ability to think critically, 
our fears, our dreams are all contained in that map.
    But unfortunately, that map, if we were to do it, wouldn't 
be one gigabyte of data; it wouldn't be 100 gigabytes of data. 
It could be on order a billion gigabytes of data, perhaps the 
largest data set about anything ever collected in the history 
of humanity. The problem is that for many neuroscientists even 
analyzing a fraction of this map is beyond their resources, the 
resources of their laboratory, the resources of the 
universities, and perhaps the resources of even large 
institutions. And if we don't address this gap, then what will 
happen is that only the richest neuroscientists will be able to 
answer their questions, and we would like every neuroscientist 
to have access to answer the most important questions about 
brains and ultimately promote this fusion of computer science 
and neuroscience.
    Luckily, there is a potential solution, and the potential 
solution is the Department of Energy and the national lab 
system, which is part of the Department of Energy. As stewards 
of our scientific architecture, as stewards of some of the most 
advanced technological and computing capabilities available, 
the Department of Energy and the national labs can address this 
gap, and in fact, they do address this gap in many different 
sciences.
    If I was a young astrophysicist or a young materials 
scientist, no one would expect me to get money and build my own 
space telescope. Instead, I would leverage the amazing 
resources of the national lab system to answer my fundamental 
questions. And although many fields of science have learned how 
to leverage the expertise and the resources available in the 
national lab system, neuroscientists have not.
    A national center for brain mapping situated within the DOE 
lab system could actually be a sophisticated clearinghouse to 
ensure that the correct physics and engineering and computer 
science tools are vetted and accessible for measuring brain 
structure and brain function. Since the national labs are also 
the stewards of our advanced computing infrastructure, they're 
ideally suited to incubate these revolutions in computer and 
neurosciences.
    Decades earlier, as a biologist, I just recently learned 
that the DOE and the national labs helped usher in humanity's 
perhaps greatest scientific achievement of the 20th century, 
the mapping of the human genome and the understanding of the 
genetic basis of life. We believe that the DOE and the national 
lab system can make a similar contribution to understanding the 
human brain.
    Other countries like Japan, South Korea, and China, 
cognizant of the remarkable benefits to economic and national 
security that understanding brains and using them to make 
computer science better have already invested in national 
efforts in artificial intelligence and national efforts to 
understand the brain. The United States has not yet, and I 
think it's important at the end of my statement for everyone to 
remember that we are the ones who went to the moon, we are the 
ones who harnessed the power of nuclear energy, and we are the 
ones that led the genomic revolution. And I suspect it's the 
moment now for the United States to lead again, to map and help 
reverse engineer the physical substrates of human thought, 
arguably the most challenging quest of the 21st century and 
perhaps the last great scientific frontier.
    Thank you for your time and attention today. I welcome any 
questions you might have.
    [The prepared statement of Dr. Kasthuri follows:]
    
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]    
    
    
    Chairman Weber. Thank you, Doctor.
    Dr. Yelick, you're recognized for five minutes.

               TESTIMONY OF DR. KATHERINE YELICK,

                 ASSOCIATE LABORATORY DIRECTOR

                    FOR COMPUTING SCIENCES,

             LAWRENCE BERKELEY NATIONAL LABORATORY;

       PROFESSOR, THE UNIVERSITY OF CALIFORNIA, BERKELEY

    Dr. Yelick. Chairman Smith, Chairman Weber, Chairwoman 
Comstock, Ranking Members Veasey and Lipinski, distinguished 
Members of the Committee, thank you for holding this hearing 
and for the Committee's support for science. And thank you for 
inviting me to testify.
    My name is Kathy Yelick and I'm the Associate Laboratory 
Director for Computing Sciences at Lawrence Berkeley National 
Laboratory, a DOE Office of Science laboratory managed by the 
University of California. I'm also Professor of Electrical 
Engineering and Computer Sciences at the University of 
California, Berkeley.
    Berkeley Lab is home to five national scientific user 
facilities serving over 10,000 researchers covering all 50 
States. The combination of experimental, computational, and 
networking facilities puts Berkeley Lab on the cutting edge of 
data-intensive science.
    In my testimony today, I plan to do four things: first, 
describe some of the large-scale data challenges in the DOE 
Office of Science; second, examine the emerging role of machine 
learning; third, discuss some of the incredible opportunities 
for machine learning in science, which leverage DOE's role as a 
leader in high-performance computing, applied mathematics, 
experimental facilities, and team-based science; and fourth, 
explore some of the challenges of machine learning and data-
intensive science.
    Big-data challenges are often characterized by the four 
``V's,'' the volume, that is the total size of data; the 
velocity, the rate at which the data is being produced; 
variability, the diversity of different types of data; and 
veracity, the noise, errors, and the other quality issues in 
the data. Scientific data has all of these.
    Genomic data, for example, has grown by over a factor of 
1,000 in the last decade, but the most abundant form of life, 
microbes, are not well-understood. Microbes can fix nitrogen, 
break down biomass for fuels, or fight algal blooms. DOE's 
Joint Genome Institute has over 12 trillion bases--that is DNA 
characters A, C, T, and G--of microbial DNA, enough to fill the 
Library of Congress if you printed them in very boring books 
that only contain those four characters.
    But genome sequencers produce only fragments with errors, 
and the DNA of the entire microbial community is all mixed 
together. So it's like taking the Library of Congress, 
shredding all of the books, throwing in some junk, and then 
asking somebody to reconstruct the books from them. We use 
supercomputers to do this, to assemble the pieces, to find the 
related genes, and to compare the communities.
    DOE's innovations are actually helping to create some of 
these data challenges. The detectors used in electron 
microscopes, which were developed at Berkeley Lab and since 
commercialized, have produced data that's almost 10,000 times 
faster than just ten years ago.
    Machine learning is an amazingly powerful strategy for 
analyzing data. Perhaps the most well-known example is 
identifying images such as cats on the internet. A machine-
learning algorithm is fed a large set of, say, ten million 
images of which some of them are labeled as having cats, and 
the algorithm uses those images to build a model, sort of a 
probability of which images are likely to contain cats. Now, in 
science we're not looking for cats, but images arise in many 
different scientific disciplines from electron microscopes to 
light sources to telescopes.
    Nobel laureate Saul Perlmutter used images of supernovae--
exploding stars--to measure the accelerating expansion of the 
universe. The number of images produced each night from 
telescopes has grown from tens per night to tens of millions 
per night over the last 30 years. They used to be analyzed 
manually by scientific experts, and now, much of that work has 
been replaced by machine-learning algorithms. The upcoming LSST 
telescope will produce 15 terabytes of data every night. If you 
watch that, one night's worth of data as a movie, it would take 
over ten years, so you can imagine why scientists are 
interested in using machine learning to help them analyze that 
data.
    Machine learning can be used to find patterns that cluster 
similar items or approximate complicated experiments. A recent 
survey at Berkeley lab found over 100 projects that are using 
some form of machine learning. They use it to track subatomic 
particles, analyze light source data, search for new materials 
for better batteries, improve crop yield, and identify abnormal 
behavior on the power grid.
    Machine learning, it does not replace the need for high-
performance computing simulations but adds a complementary tool 
for science. Recent earthquake simulations of the bay area show 
that just a 3-mile difference in location of an identical 
building makes a significant difference in the safety of that 
building. It really is all about location, location, location. 
And the team that did this work is looking at taking data from 
embedded sensors and eventually even from smart meters to give 
even more detailed location-specific results.
    There is tremendous enthusiasm for machine learning in 
science but some cautionary notes as well. Machine-learning 
results are often lacking in explanations, interpretations, or 
error bars, a frustration for scientists. And scientific data 
is complicated and often incomplete. The algorithms are known 
to be biased by the data that they see. A self-driving car may 
not recognize voices from Texas if it's only seen data from the 
Midwest.
    Chairman Weber. Hey, hey.
    Dr. Yelick. Or we may miss a cosmic event in the southern 
hemisphere if they've only seen data from telescopes in the 
northern hemisphere. Foundational research in machine learning 
is needed, along with the network to move the data to the 
computers and share it with the community and make it as easy 
to search for scientific data as it is to find a used car 
online.
    Machine learning has revolutionized the field of artificial 
intelligence and it requires three things: large amounts of 
data, fast computers, and good algorithms. DOE has all of 
these. Scientific instruments are the eyes, ears, and hands of 
science, but unlike artificial intelligence, the goal is not to 
replicate human behavior but to augment it with superhuman 
measurement control and analysis capabilities, empowering 
scientists to handle data at unprecedented scales, provide new 
scientific insights, and solve important societal challenges.
    Thank you.
    [The prepared statement of Dr. Yelick follows:]
    
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]    
    
    
    Chairman Weber. Thank you, Doctor.
    Dr. Nielsen, you're recognized for five minutes.

               TESTIMONY OF DR. MATTHEW NIELSEN,

                      PRINCIPAL SCIENTIST,

               INDUSTRIAL OUTCOMES OPTIMIZATION,

                       GE GLOBAL RESEARCH

    Dr. Nielsen. Chairman Smith, Chairman Weber, and Chairwoman 
Comstock, Ranking Members Veasey and Lipinski, and Members of 
the Subcommittee, it is an honor to share General Electric's 
perspective on innovative machine-learning-based approaches to 
big-data science challenges that promote a more resilient, 
efficient, and sustainable energy infrastructure. I am Matt 
Nielsen, a Principal Scientist at GE's Global Research Center 
in upstate New York.
    The installed asset base of GE's power and renewable 
businesses generates roughly 1/3 of the planet's power, and 40 
percent of the world's electricity is managed by our software. 
GE Energy's assets include everything from gas and steam power, 
nuclear, grid solutions, energy storage, onshore and offshore 
wind, and hydropower.
    The nexus of physical and digital technologies is 
revolutionizing what industrial assets can do and how they are 
managed. One of the single most important questions industrial 
companies such as GE are grappling with is how to most 
effectively integrate the use of AI and machine learning into 
their business operations to differentiate the products and 
services they offer. GE has been on this journey for more than 
a decade.
    A key learning for us--and I can attest to this as being a 
physicist--has been the importance of tying our digital 
solutions to the physics of our machines and to the extensive 
knowledge on how they are controlled. I'll now highlight a few 
industrial applications of AI machine learning where GE is 
collaborating with our customers and federal agencies like the 
U.S. Department of Energy.
    At GE, digital twins are a chief application of AI and 
machine learning. Digital twins are living digital models of 
industrial assets, processes, and systems that use machine 
learning to see, think, and act on big data. Digital twins 
learn from a variety of sources, including sensor data from the 
physical machines or processes, fleet data, and industrial-
domain expertise. These computer models continuously update as 
new data becomes available, enabling a near-real-time view of 
the condition of the asset.
    To date, GE scientists and engineers have created nearly 
1.2 million digital twins. Many of the digital twins are 
created using machine-learning techniques such as neural 
networks. The application of digital twins in the energy sector 
is enabling GE to revolutionize the operation and maintenance 
of our assets and to drive new innovative approaches in 
critical areas such as services and cybersecurity.
    Now onto digital ghosts. Cyber threats to industrial 
control systems that manage our critical infrastructure such as 
power plants are growing at an alarming rate. GE is working 
with the Department of Energy on a cost-shared program to build 
the world's first industrial immune system for electric power 
plants. It cannot only detect and localize cyber threats but 
also automatically act to neutralize them, allowing the system 
to continue to operate safely.
    This effort engages a cross disciplinary team of engineers 
from the global research and our power business. They are 
pairing the digital twins that I mentioned of the power plants 
machines, industrial controls knowledge, and machine learning. 
The key again for this industrial immune system is the 
combination of advanced machine learning with a deep 
understanding of the machines' thermodynamics and physics.
    We have demonstrated to date the ability to rapidly and 
accurately detect and even localize simulated cyber threats 
with nearly 99 percent accuracy using our digital ghost 
techniques. We're also making significant progress now in 
automatically neutralizing these threats. It is a great example 
of how public-private research partnerships can advance 
technically risky but universally needed technologies.
    Along with improving cyber resiliency, AI and machine-
learning technologies are enabling us to improve GE's energy 
services portfolio, helping our customers optimize and reduce 
unplanned downtime for their assets. Through GE's asset 
performance management platform, we help our customers avoid 
disruptions by providing deep, real-time data insights on the 
condition and operation of their assets. Using AI, machine 
learning, and digital twins, we can better predict when 
critical assets require repair or have a physical fault. This 
allows our customers to move from a schedule-based maintenance 
system to a condition-based maintenance system.
    The examples I have shared and GE's extensive developments 
with AI and machine learning have given us a first-hand 
experience into what it takes to successfully apply these 
technologies into our Nation's energy infrastructure. My full 
recommendations are in my written testimony, and I'll only 
summarize them here.
    Number one, continue to fund opportunities for public-
private partnerships to expand the application and benefits of 
AI and machine learning across the energy sector.
    Two, encourage the collaboration between AI, machine 
learning, and subject matter experts, engineers, and 
scientists.
    And number three, continue to invest in the Nation's high-
performance computing assets and expand opportunities for 
private industry to work with the national labs.
    I appreciate the opportunity to offer our perspective on 
how the development of AI and machine-learning technologies can 
meet the shared goals of creating a more efficient and 
resilient energy infrastructure.
    One final thought is to reinforce a theme that I've 
emphasized throughout my testimony, and that is the importance 
of having teams of physical and digital experts involved in 
driving the future of AI and machine-learning solutions.
    Thank you, and I look forward to answering any questions.
    [The prepared statement of Dr. Nielsen follows:]
    
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]    
   
    
    Chairman Weber. Thank you, Dr. Nielsen.
    Dr. Rollett, you're recognized for five minutes.

               TESTIMONY OF DR. ANTHONY ROLLETT,

                    U.S. STEEL PROFESSOR OF

               MATERIALS SCIENCE AND ENGINEERING,

                   CARNEGIE MELLON UNIVERSITY

    Dr. Rollett. So my thanks to Chairman Weber, Chairman 
Smith, Chairwoman Comstock, Ranking Members Veasey and 
Lipinski, and all the Members for your interest.
    Speaking as a metallurgist, it's my pleasure and privilege 
to testify before you because I've found big data and machine 
learning, which depend on advanced computing, to be a never-
ending source of insight for my research, be it on additive 
manufacturing or in developing new methods of research on 
structural materials.
    My bottom line is that there are pervasive opportunities, 
as you've heard, to benefit from big data and machine learning. 
Nevertheless, there are many challenges to be addressed in 
terms of algorithm development, learning how to apply the 
methods to new areas, transforming data into information, 
upgrading curricula, and developing regulatory frameworks.
    New and exciting manufacturing technologies such as 3-D 
printing are coming on stream that generate big data, but they 
need further development, especially for qualification, in 
other words, the science that underpins the processes and 
materials needed to satisfy requirements.
    So consider that printing a part with a powder bed machine, 
standard machine, requires 1,000-fold repetition of spreading a 
hair's-breadth layer of powder, writing that desired shape in 
each layer, shifting the part by that same hair's breadth, and 
repeating. So if you think about taking a part and dividing the 
dimension of that part by a hair's breadth, multiplied by yards 
of laser-melting track, you can easily estimate that each part 
contains miles and miles of tracks, hence, the big data.
    The recent successes with machine learning have used data 
that is already information-rich, as you've heard, cats, dogs, 
and so on. And so to advanced manufacturing and basic science, 
however, we have to find better ways to transform the data, 
stream into a big information stream.
    Another very important context is that education in all 
STEM subjects needs to include the use of advanced computing 
for data analysis and machine learning. And I know that this 
Committee has focused on expanding computer science education, 
so thank you for that.
    So for printing, please understand that the machines are 
highly functional and produce excellent results. Nevertheless, 
if we're going to be able to qualify these machines to produce 
reliable parts that can be used in, for example, commercial 
aviation, we've got some work to do.
    If I might ask for the video, Daniel, if you can manage to 
get that to play. So I'd like to illustrate the challenges in 
my own research.
    [Video shown.]
    Dr. Rollett. I often used the light sources, in other 
words, x-rays from synchrotrons, most of which are curated by 
the Department of Energy. I use several modes of 
experimentation such as computer topography, diffraction 
microscopy, and dynamic x-ray radiography. So this DXR 
technique produces movies of the melting of the powder layers 
exactly as it occurs in 3-D printing with the laser. And again, 
at the micrometer scale you can see about a millimeter there. 
And you can also see that the dynamic nature of the process 
means that one must capture this at the same rate as, say, the 
more familiar case of a bullet going through armor.
    Over the last couple of years, we've gotten many deep 
insights as to how the process works, but again, for the big-
data aspect, each of these experiments lasts about a 
millisecond. That's about 500 times faster than you can blink. 
And it provides gigabytes of images, hence, the big data. 
Storing and transmitting such large amounts of data, which are 
arriving at ever-increasing rates, is a challenge for this 
vital public resource. I should say that the light sources 
themselves are well aware of this challenge. Giving more 
serious attention to such challenges requires funding agencies 
to adopt the right vision in terms of recognizing the need for 
fusion of data science with the specific applications.
    I also want to say that cybersecurity is widely understood 
to be an important problem with almost weekly stories about 
data leaks and hacking efforts. What's not quite so well 
understood is exactly how we're going to interface 
manufacturing with cybersecurity.
    So, in summary, I suggest that there are three areas of 
opportunity. First, federal agencies should continue to support 
the application of machine learning to advanced manufacturing, 
particularly for the qualification of new technologies and 
materials. I thank and commend all of my funders for supporting 
these advances and particularly want to call out the FAA for 
providing strong motivation here.
    In the future, research initiatives should also seize the 
potential for moonshot efforts on objectives such as 
integrating artificial intelligence capabilities directly into 
advanced manufacturing machines and advancing synergy between 
technologies such as additive manufacturing and robotics.
    Second, we need to continue to energize and revitalize STEM 
education at all levels to reflect the importance of the data 
in learning and computing with a focus on manufacturing. I 
myself have had to learn these things as I've gone along.
    Third, based on the evidence that machine learning is being 
successfully applied in many areas, we should encourage 
agencies to seek programs in areas where it's not so obvious 
how to apply the new tools and to instantiate programs in 
communities where data, machine learning, and advanced 
computing are not yet prevalent.
    Having traveled abroad extensively, I can assure you that 
the competition is serious. Countries that we used to dismiss 
out of hand, they're publishing more than we are and securing 
more patents than we do.
    Again, I thank you for the opportunity to testify and share 
my views on this vital subject. I know that we will all be glad 
to answer your questions.
    [The prepared statement of Dr. Rollett follows:]
    
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]    
   
    
    Chairman Weber. Thank you, Doctor. I now recognize myself 
for five minutes.
    This question is for all the witnesses. You've all used 
similar terminology in your testimonies like artificial 
intelligence, machine learning, and deep learning. So that we 
can all start off on the same page, I'll start with Dr. 
Kasthuri. But could you explain what these terms mean and how 
they relate to each other?
    In the interest of time, I'm going to divvy these up. Dr. 
Kasthuri, you take artificial intelligence. Dr. Yelick, you 
take machine learning. Dr. Nielsen, you take deep learning. All 
right? Doctor, you're up.
    Dr. Kasthuri. Thank you, Chairman Weber. That's an 
excellent question. In the interest of time I'm not going to 
speak about artificial intelligence. There are clearly experts 
sitting next to me. I'm interested in the idea of finding 
natural intelligence wherever we can, and I would say that the 
confusion that exists in these terminologies also exist when we 
think about intelligence beyond the artificial space. And I'm 
happy to--maybe perhaps after I let the other scientists speak 
to talk about how we define natural intelligence different 
ways, which might help elucidate the ways we define artificial 
intelligence.
    Chairman Weber. All right. Fair enough. Dr. Yelick, do you 
feel that monkey on your back?
    Dr. Yelick. Yes. Thank you very much for the question. So 
let me try to cover a little bit of all three. So artificial 
intelligence is a very long-standing subfield of computer 
science looking at how to make computers behave with humanlike 
behavior. And one of the most powerful techniques for some of 
the subproblems in artificial intelligence such as computer 
vision and speech processing are machine-learning algorithms. 
These algorithms have been around for a long time, but the 
availability of large amounts of labeled data and large amounts 
of computing have really made them take off in terms of being 
able to solve those artificial intelligence problems in certain 
ways.
    The specific type of machine learning is a broad class of 
algorithms that come from statistics and computer science, but 
the specific classes called deep learning algorithms, and I 
won't go into the details. I will defer that if somebody else 
wants to try to explain deep learning algorithms, but they are 
used for these particular breakthroughs in artificial 
intelligence.
    I would say that the popular press often equates the word 
artificial intelligence with the term deep learning because the 
algorithms have been so powerful, and so that can create some 
confusion.
    Chairman Weber. All right. Thank you. Dr. Nielsen?
    Dr. Nielsen. Yes, I'm not an expert in deep learning, but 
we are practitioners of deep learning at GE. And really it's 
taken off in, I would say, the last several years as we've seen 
a rise in big data. So we have nearly 300,000 assets spread 
globally and each one generating gigabytes of data. Now, 
processing that gigabytes of data and trying to make sense of 
it we're using deep learning techniques. It's a subfield, as 
you mentioned, of machine-learning algorithms but allows us to 
extract more information, more relationships if you will.
    So, for example, we use deep learning to help us build a 
computer model of a combined-cycle power plant, very complex 
system, very complex thermodynamics. And it's only because we 
have been able to collect now years and years of historical 
data and then process it through a deep-learning algorithm. So, 
for us, deep learning is a breakthrough enabled by advances in 
computing technology, advances in big-data science, and it's 
allowing us to build what we think is more complex models of 
not only our assets but the processes that they perform.
    Chairman Weber. And, Dr. Rollett, before you answer, you 
issued a warning quite frankly in your statement that there's 
been more patents filed by some of the foreign countries than 
we are. Do you attribute that to what we're talking about here? 
Go ahead.
    Dr. Rollett. In very simple terms, I think what I'm calling 
attention to is investment level in the science that underpins 
all kinds of things, so whether it be the biology of the brain, 
the functioning of the brain or how you make machines work, how 
you construct machines, control algorithms, so on, and so 
forth. That's really what I'm trying to get at.
    Chairman Weber. Okay.
    Dr. Rollett. And I'm trying to give you some support, some 
ammunition that what you're doing as a committee, set of 
Subcommittees is really worthwhile.
    Chairman Weber. Yes, well, thank you. I appreciate that.
    I'm going to move on to the second question. Several of you 
mentioned your reliance on DOE facilities, which is, again, 
what you're talking about, particularly light sources and 
supercomputing which we are focused on, have been to a couple 
of those for the types of big-data research that you all 
perform and my question is how necessary is it for the United 
States to keep up to date? You've already address that with the 
patents statement, a warning that you issued, but what I want 
to know is have any of you all--would you opine on who the 
nearest competitor is? And have you interfaced with any 
scientists or individuals from those companies? And if so, in 
what field and in what way? Doctor?
    Dr. Kasthuri. I would say that, internationally, sort of 
the nearest two competitors to us are Germany and China. And in 
general in the scientific world there is a tension between 
collaboration and competition independent of whether the 
scientist lives in America or doesn't live in America.
    I think the good news is that for us at least in 
neuroscience we realize that the scale of the problem is so 
enormous and has so much opportunity, there's plenty of food 
for everyone to eat. So right now, we live at the world of 
cooperation between individual scientists where we share data, 
share problems, and share solutions back and forth unless of 
course familiar with what happens at levels much higher than 
that.
    Chairman Weber. Thank you. Dr. Yelick?
    Dr. Yelick. Yes, in the area of high-performance computing 
I would say the closest competitor at this point is China. And 
in science we also like to look at derivatives, so what we 
really see is that China is growing very, very rapidly in terms 
of their leadership. At this point we do have the fastest 
computer and the top-500 list in the United States, but of 
course until recently that was the top two--the number-one and 
-three machines were from China. But perhaps more importantly 
than that there are actually more machines manufactured in 
China on that list than there are machines that are fractured 
in the United States, so there is a huge and growing interest, 
and certainly a lot of research, a lot of funding in China for 
artificial intelligence, machine learning, and all of that 
applied to science and other problems.
    Chairman Weber. Have you met with anybody from over in 
China involved in the field?
    Dr. Yelick. Yes. Last summer, I actually did a tour of all 
of the major supercomputing facilities in China, so I got to 
see what were the number-one and number-three machines at that 
time--and was very impressed by the scientists. I think one of 
the things that you see--and a lot of, by the way, very junior 
scientists, the students that they are training in these areas, 
they use these machines to also draw talent back to China from 
the United States or to keep talent that was trained in China 
in the United States. And they have very impressive people in 
terms of the computer scientists and computational scientists.
    Chairman Weber. And, Dr. Nielsen, very quickly because I'm 
out of time.
    Dr. Nielsen. Yes, I would just like to echo that, like Dr. 
Rollett, we follow publications and patents, and we're seeing a 
growing number from China, so I'd like to echo that just from 
that statement. We're seeing growing interest in the use of 
high-performance computing to go look at things like 
cybersecurity from China, so obviously, that's the number-one 
location we're looking at.
    Chairman Weber. Good. Thank you, Dr. Rollett. I'm happy to 
move on now. So I'm now going to recognize the gentlelady from 
Oregon for five minutes.
    Ms. Bonamici. Thank you very much, Mr. Chairman.
    What an impressive panel and what a great conversation and 
an important one.
    I represent northwest Oregon where Intel is developing the 
foundation for the first exascale machines. We know the 
potential of high-performance computing and all energy 
exploration, predicting climate weather, predictive and 
preventive medicine, emergency response, just a tremendous 
amount of potential. And we certainly recognize on this 
Committee that investment in exascale systems and high-
performance computing is important for our economic 
competitiveness, national security, and many reasons.
    And we know--I also serve on the Education Committee, and I 
know that our country has some of the best scientists and 
programmers and engineers, but what really sets our country 
apart is entrepreneurs and innovation. And those 
characteristics require creative and critical thinking, which 
is fostered through a well-rounded education, including the 
arts.
    I don't think anyone on this Committee is going to be 
surprised to hear me mention the STEAM Caucus, which is--I'm 
cochairing with Representative Stefanik from New York, working 
on integrating arts and design into STEM, learning to educate 
innovators. We have out in Oregon this wonderful organization 
called Northwest Noggin, which is a collaboration of our 
medical school, Oregon Health Sciences University, Portland 
State University, Pacific Northwest College of Art, and the 
Regional Arts and Culture Council. And they go around exciting 
the public about ongoing taxpayer-supported neuroscience 
research. And they're doing great work and expanding the number 
of people who are interested in science and also communicating 
with all generations and all people about the benefits of 
science.
    So, Dr. Rollett, in your testimony you talked about the 
role of data analytics across manufacturing--the manufacturing 
sector. And you noted that it's not necessarily going to be 
important for all data analytic workers to have a computer 
science degree, so what skills are most important for 
addressing the opportunities? You did say in your testimony 
that technology forces us to think differently about how to 
make things, so talk about the next manufacturing center at 
Carnegie Mellon and what you're doing to prepare students for 
evolving fields? And we know as technology changes we need 
intellectual flexibility as well, so how do you educate people 
for that kind of work?
    Dr. Rollett. So thank you for the opportunity to address 
that. The way that we're approaching that is telling our 
students don't be afraid of these new techniques. Jump in, try 
them, and lo and behold, almost every time they're trying it--
sometimes it's a struggle, but almost every time that they try 
it they're discovering, oh, this actually works. Even if it's 
not big data in quite the sense that, say, Kathy would tell us, 
even small data works.
    So, for example, in these powder bed machines you spread a 
layer. Well, if you just take a picture of that layer and then 
another picture and you keep analyzing it and you use these 
computer vision techniques, which are sort of a subset of 
machine learning, lo and behold, you can figure out whether 
your part is building properly or not. That's the kind of thing 
that we've got to transmit to all of our students to say it's 
not that bad, jump in and try it and little by little, you'll 
get there.
    Ms. Bonamici. I think over the years many students have 
been very risk-averse and they don't want to risk taking 
something where they might not get the best grade possible, so 
we have to work on overcoming that because there's so much 
potential out there until students have the opportunity to get 
in and have some of that hands-on learning.
    Dr. Yelick, I'm in the Northwest and it's not a question of 
if but when we have an earthquake off the Northwest coast, and 
a tsunami could be triggered of course by that earthquake along 
the Cascadia subduction zone. So in your testimony you discuss 
the research at Berkeley Lab to simulate a large magnitude 
earthquake, and I listened very carefully because you were 
talking about the effects on an identical building in different 
areas. This data could be really crucial as we are assessing 
the need for more resilient infrastructure not only in Oregon 
but across the country. So what technical challenges are you 
facing and sort of curating, sharing, and labeling and 
searching that data? And what support can the federal 
government provide to accelerate a resolution of these issues?
    Dr. Yelick. Well, thank you very much for the question. 
Yes, this is very exciting work that's going on, and simulating 
earthquakes is currently at a regional scale. There are 
technology challenges to trying to even get that to larger-
scale simulations, but I think even more importantly the work 
that I talked about is trying to use information about the 
geology to try to give you much more precise information about 
the safety of a particular location.
    And the challenge is to try to collect this data and then 
to actually invert it, that is turn it into a model so you 
collect the data and then in some sense you're trying to 
develop a set of equations that say how that area--based on the 
data that's been collected from little tiny seismic events, 
it'll tell you something about how that particular subregion, 
even a yard or a city block or something like that, how that 
city block is going to behave in an earthquake. And you can use 
the information from tiny seismic events and then to infer how 
it will behave in a large significant earthquake. And so 
there's technical challenge, mathematical challenges of doing 
that, as well as the scale of computing for both doing the 
data, inverting the data but also then doing the simulation.
    And I think you bring up a very good point about the 
community needs for these community data sets because you 
really want to make it possible for many groups of people, not 
just, for example, a power company that has smart meter data 
but for other people to access that kind of data.
    Ms. Bonamici. Thank you. And I want to follow up with that. 
I'm running out of time, but as we talk about infrastructure 
and investment in infrastructure, we know that by making better 
decisions at the outset we can save lives and save property, so 
the more information we have about where we're building and how 
we're building is going to be a benefit to people across this 
country, as well as in northwest Oregon. So thank you again to 
this distinguished panel. I yield back.
    Chairman Weber. Thank you, ma'am.
    The gentlelady from Virginia, Mrs. Comstock, is recognized.
    Mrs. Comstock. Thank you, Mr. Chairman, and thank all of 
you here. This has been very interesting once again.
    Now, I guess I'd ask to all of you, what are the unexamined 
big-data challenges that could benefit from machine learning? 
And what are the consequences for the United States for not 
being the world leader in that if we aren't going forward in 
the future? Maybe, Dr. Rollett, if you'd like to start. You 
look like you had an answer ready to go, so----
    Dr. Rollett. I'll give you a small example from my own 
field. So when we deal with materials, then we have to look 
inside the materials. So we typically take a piece of steel and 
we cut it and we polish it and we take pictures of it. So 
traditionally, what we've done is play the expert witness as it 
were. You look at these pictures, which I often say resemble 
more of a Jackson Pollock painting than anything that remotely 
as a simple as a cat, and so the excitement in our field is 
that we now have the tools that we can start to tease things 
out of these pictures, that we go from something where we are 
completely dependent on sort of gray-bearded experts to let the 
computer do a lot of the job for you. And that speeds things up 
and it automates them and it allows companies to detect 
problems that they're running across. So it's just one example.
    Dr. Kasthuri. Congresswoman Comstock, thank you for the 
question. I have two sort of answers specifically to thinking 
about brains and then to thinking about education. I think 
these are the potential things that we can lose. One of the 
things that I find fascinating about how our brains work is 
that whether you are Einstein thinking up relativity or Mozart 
making a concerto or you're just at home watching reality TV, 
all brains operate at about 20 watts of energy. These light 
bulbs in this room are probably at 60 watts of energy. And 
although you might already think some of your colleagues are 
dim bulbs, in this sense, what's amazing about the things that 
they can accomplishes that they accomplish them at energy 
efficiencies that are currently unheard of for any type of 
algorithm.
    So I feel like if we can leverage machine learning, deep 
analytics, and understand how the brain passes information and 
processes information for energies that are really energy 
efficiencies unheard of in our current algorithms and robots, 
that's a huge benefit to both the national and economic 
securities of our country. That's the first.
    And the second thing I'd like to add, the other reason that 
it's important for us to lead now--and I'll do it by example--
is that in 1962 at Rice University John F. Kennedy announced 
that we were going to the moon. And he announced it and in his 
speech he said we're going to go to the moon--and I 
paraphrase--not because it's easy but because it's hard and 
because hard things test our mettle and test our capabilities.
    The other interesting fact about that is that in 1969 when 
we landed on the moon, the average age of a NASA scientist was 
29 years old, so quick math suggests that when Kennedy 
announced the moonshot, many of these people were in college. 
They were students. And there was something inspirational about 
positing something difficult, positing something visionary. And 
I suspect that this has benefited us--in recruiting this 
generation of scientists to the moonshot has benefited this 
country in ways that we yet haven't calculated. And I suspect 
that if we don't move now, we lose both of these opportunities, 
among many others.
    Mrs. Comstock. So it's really a matter of getting that 
focus and attention and commitment so that you have that next 
generation understanding this is really a long-term investment, 
and we have a passion for it, so they will.
    Dr. Kasthuri. Exactly.
    Dr. Yelick. I'll just add briefly that I think we really 
want to--in terms of the threat associated with this is really 
about continuing to be a leader in computing but also about the 
control and use of information. And you can see the kinds of 
examples we've given are really important, and you hear about 
it in the news about the control and use of information. We 
need leaders in understanding how to do that and make sure that 
information is used wisely.
    We teach our freshmen at Berkeley a course in data science, 
so whether they're going to go off and become English majors or 
art majors or engineers, we think it's really important for 
people to understand data.
    Dr. Nielsen. And just real briefly, I'd like to build a 
little bit on Dr. Rollett's comments. For us, we're seeing 
tremendous benefit in big data for things like trying to better 
predict when an aircraft engine part has to be repaired, when 
it needs to be inspected, very critical for the safety of that 
engine. For gas turbines, same thing. Wind parts need to be 
inspected and repaired.
    So where does big data come in? It comes in with 
computational fluid dynamics, which we leverage--actually, the 
high-performance computing infrastructure of the United States 
materials science, material knowledge, trying to understand 
grain structure, et cetera. So for us, that nexus of the 
digital technologies with the physics, understanding the 
thermodynamics of our assets are leading us into what I think 
is just a better place to be from maintenance scheduling, 
safety, resiliency, et cetera.
    Mrs. Comstock. Thank you. I really appreciate all of your 
answers.
    I yield back, Mr. Chairman.
    Chairman Weber. The gentleman from Virginia, Mr. Beyer, is 
recognized for five minutes.
    Mr. Beyer. Mr. Chairman, thank you very much, and thank you 
all very much for doing this.
    Dr. Kasthuri, so on the BRAIN Initiative I think obviously 
the most--maybe the most exciting thing happening in the world 
today, I was fascinated by this whole notion of the Connectome, 
1 billion neurons with 1 quadrillion connections, you talk 
about it being if you took--of all the written material in the 
world into one data set, it'd just be a small fraction of the 
size of this brain map. Is it possible that it's simpler than 
that, that it sort of strains my understanding that there are 
few things in nature that are as complex as that. Why in 
evolution have we developed something that--and every human 
being on the planet has a brain that's already--contains more 
connections than every bit of written material?
    Dr. Kasthuri. Congressman Beyer, that's a great question, 
and like most scientists I'm going to do a little bit of 
handwaving and a little bit of conjecture because the question 
that you're asking is the question that we are trying to 
accomplish. We know reasonably well that there are, as you 
said, 100 billion brain cells, neurons, that make on order 1 
quadrillion connections in the brain. Now, that--when I say the 
data of that, I'm really talking about the raw image data. What 
will it take to take a picture of every part of the brain and 
if you added up all the data of all those pictures together, it 
would be the largest data set ever collected.
    Now, I suspect we have to do that at least once and then it 
might be possible that there are patterns within that data that 
then simplify the next time that we have to map your brain. One 
way to think about this is that before we had a map of DNA, we 
didn't realize that there was a pattern within DNA, meaning 
every three nucleotides--A, C, T, et cetera--codes for a 
protein. And that essentially simplifies the data structure to, 
let's say, 1/3. I don't need to know, I just need to know that 
these three things are an internal pattern that then gets 
repeated again and again and again. And that was a fundamental 
insight. We have no similar insight into the brain. Is there a 
repetitive pattern that would actually reduce the amount of 
data that we had to collect?
    So, you're right, it might be that the second brain or the 
third brain isn't going to be that much data, but now let me 
give you the counter because as a scientist I have to do both 
sides or all sides. The other thing we know is that each human 
brain is unique, very much like a snowflake. Your brain, the 
connectivity, the connections in your brain at some level have 
to represent your life history, what your brain has 
experienced.
    And so the question for me--and I think it's really one of 
the most important questions--is even within the snowflake 
there are things that are unique to snowflakes but they're the 
same. They either have seven arms are eight arms or six arms. I 
get them confused with spiders, but it's one of those is the 
answer. So there's regularity in a snowflake at the level of 
the arms, but there is uniqueness at the level of the things 
that jut out of the seven arms of the snowflake. And the 
fundamental question is what is unique, what is the part that 
makes each of us a neurological snowflake and what is common 
between all of us? And that would be one of the very first 
goals of doing a map is to discover the answer to your 
question.
    Mr. Beyer. Yes, well, thank you for a very thoughtful 
answer. And I keep coming back to the Einstein notion that 
always looking for the simplest answers, things that unify it 
altogether. So here's another simple question. You talked in 
your very first paragraph about reverse engineering human 
cognition into our computers, good idea? At our most recent AI 
hearing here a lot of the controversy was, you know, dealing 
with Elon Musk and others and their concerns about what happens 
when consciousness emerges in machines.
    Dr. Kasthuri. Again, a fantastic question. Here's my 
version of an answer. We deal with smarter things every day. 
Many of our children, especially mine, wind up getting 
consciousness and being smarter than us, certainly smarter than 
me, but yet we don't worry about the fact that this next 
generation of children, forever the next generation of children 
will always be smarter than us because we've developed ways as 
a society to instill in them the value systems that we have. 
And there are multiple avenues for how we can instill in our 
children the value systems that we have.
    I suspect we might use the same things when we make smart 
algorithms, the same way we make smart children. We won't just 
produce smart algorithms but we'll instill in them the values 
that we have the same way that we instill our values in our 
children.
    Now, that didn't answer your question of whether reverse 
engineering the brain is a specific good idea for AI or not. 
The only thing I would say is that no matter what we can 
imagine AI--artificial intelligence doing, there is a 
biological system that does that at more energy efficiency and 
its speed for which that AI physical silicon system does not. 
But I suspect these answers are probably best debated amongst 
you and then you could tell us.
    Mr. Beyer. Well, that was a very optimistic thing. I want 
to say one of the things we do is we keep the car keys in those 
circumstances.
    Mr. Chairman, I yield back.
    Chairman Weber. Thank you. The gentleman from Kansas is 
recognized for five minutes.
    Mr. Marshall. Well, thank you, Mr. Chairman.
    Speaking of Kansas, I'm sure you all remember President 
Eisenhower is the one who started NASA in 1958, but it was 
President Kennedy, as several of you have stated, that, you 
know, gave us the definitive goal to get to the moon. And as a 
young boy I saw that before my eyes, the whole country wrapped 
around that.
    Each of you get one minute. What's your big, hairy, 
audacious goal, your idea, it took 11 years, '58 to '69 to get 
to the Moon. Where are we going to be in 11 years? Dr. Rollett, 
we'll start with you and you each get one minute.
    Dr. Rollett. I think we're going to see that manufacturing 
is a much more clever operation. It understands the materials. 
It understands how things are going to last, and it draws in a 
much wider set of disciplines than it currently does. I have to 
admit I don't exactly have an analogy to going to the moon, but 
that's a very good challenge.
    Mr. Marshall. What I like about your idea is that's going 
to add to the GDP. Our GDP grows when we become more efficient, 
not when federal government sends dollars to States for social 
projects, so I love adding to GDP.
    Dr. Nielsen, I guess you're next.
    Dr. Nielsen. So I would love it if every one of our 
assets--and I mentioned there are about 300,000 globally--had 
their own digital twin, so every aircraft engine had its own 
digital twin. A digital twin is a computer model that when the 
asset is operating, we're collecting data. So imagine an 
aircraft engine taking off. As soon as that aircraft engine 
takes off, we pull the data back from the aircraft engine and 
we update the computer model. That computer model becomes a 
digital twin of the physical asset. If every one of our 
300,000-plus assets had a digital twin, we'd be able to know 
with very good precision when it needed to be maintained, when 
it needed to be pulled off wing, what kind of repairs when it 
went to a repair shop, what kind of repairs need to occur.
    Mr. Marshall. You can do that with satellites and a whole 
bunch of things.
    Dr. Nielsen. We can pull back data from a whole variety of 
different pathways. It's then utilizing that data in the most 
efficient way, which we use machine learning and AI-type 
technologies----
    Mr. Marshall. Maybe get internet to rural places by doing 
that, right?
    Dr. Nielsen. Yes.
    Mr. Marshall. Okay. We better go on. Dr. Yelick?
    Dr. Yelick. So I think one of the biggest challenges is 
understanding the microbiome and being able to use that 
information about the microbiome in both health applications 
and agriculture, in engineering, materials, and other areas.
    So I think that we already know that your microbiome, your 
own personal microbiome is associated with things like obesity, 
diabetes, cardiovascular disease, and many other disorders. We 
don't understand it as well in agriculture, but we're looking 
at things like taking images of fields, putting biosensors into 
the fields and putting all this information together to 
understand how to make--to improve the microbiome to improve 
crop yield and reduce other problems. So I think it's about 
both understanding and controlling the microbiome, which is a 
huge computational problem.
    Mr. Marshall. Okay. Dr. Kasthuri?
    Dr. Kasthuri. The thing I would really like to have done in 
11 years is understand how brains learn. And actually it 
reminds me of something that I should've said earlier about the 
differences between artificial intelligence, machine learning, 
deep learning, and how brains learn. The main difference is 
that for many of these algorithms you have to provide them 
thousands of examples, millions of examples, billions of 
examples before they can then produce inferences or predictions 
that are based on those examples.
    For those of you with children, you know that that's not 
the way children learn. They can learn in one example. They can 
learn in half an example. Sometimes I don't even know where 
they're learning these things. And when they learn something, 
they learn not only the very specific details of that thing, 
they can immediately abstract it to a bunch of other examples.
    For me, this happened with my son the first time he learned 
what a tiger was. An image of a tiger he could see, and then as 
soon as he learned that, he could see a cartoon of a tiger, he 
could see a tiger upside down, he could see the back of a tiger 
or the side of a tiger, and from the first example be able to 
infer, learn all of these other general applications.
    If in 11 years we could understand how the brain does that 
and then reverse engineer that into our algorithms and our 
computers and robots, I suspect that will influence our GDP in 
ways that we hadn't yet imagined.
    Mr. Marshall. Okay. Thank you so much. I yield back.
    Chairman Weber. I thank the gentleman.
    The gentleman from the great State of Texas is recognized.
    Mr. Veasey. Thank you, Mr. Chairman.
    Dr. Rollett, am I pronouncing that right?
    Dr. Rollett. It'll do.
    Mr. Veasey. Okay. In your testimony you talk about the huge 
amounts of data that are generated by experiments using light 
sources to examine the processes involved in additive 
manufacturing. You also highlight the need for more advanced 
computing algorithms to help researchers extract information 
from this data. And you state that we are essentially building 
the infrastructure for digital engineering and manufacturing. I 
was hoping that you'd be able to expand on that a little bit 
and tell us also what are the necessary components of such 
infrastructure.
    Dr. Rollett. Right. So one of the things that I didn't have 
time to talk about is where does the data go? And so, you know, 
one's generating terabytes, the standard story is you go to a 
light source, you do an experiment, all of that data has to go 
on disk drives, and then you literally carry the disk drives 
back home. So despite the substantial investments in the 
internet and the data pipe so to speak, from the perspective of 
an experiment, it's still somewhat clumsy. So even that 
infrastructure could do with some attention.
    It's also the case that the algorithms that exist have been 
developed for a fairly specialized set of applications. So, you 
know, the deep-learning methods, they exist, and what we're 
doing at the moment is basically borrowing them and applying 
them everywhere that we can. But, in other words, we haven't 
gone very far with developing the specialized techniques or the 
specialized applications.
    So even that little movie that I showed, to be honest, I 
mean, the furthest that we've got is doing very basic analysis 
so far, and we actually need cleverer, more sophisticated 
algorithms to analyze all of that information that's latent in 
those images. I know that sounds like I'm not doing my job, 
but, I'm just trying to get some idea across of the challenges 
of taking techniques that have been worked up and then taking 
them to a completely different domain and doing something 
worthwhile.
    Mr. Veasey. I was also hoping that you'd be able to 
describe the progress your group has made in teaching computers 
to recognize different kinds of metal power--powders using----
    Dr. Rollett. Powders.
    Mr. Veasey. --additive manufacturing. I think that you----
    Dr. Rollett. Right.
    Mr. Veasey. --go on to say that these successes have the 
potential to impact improvements to materials, as well as the 
generation of new materials. And I hope--was hoping you could 
talk about that a little bit more and for the ability of a 
computer to recognize different types of metal and improvements 
to materials and how that can impact the development of new 
materials.
    Dr. Rollett. So thank you for the question. So I was trying 
to think of a powder--I mean, think of talcum powder or 
something like that. You spread some on a piece of paper and 
you look at it and you think, well, that powder looks much like 
any other powder. It looks like something you would use in the 
garden or whatever. So the point I'm trying to get across is 
that when you take these pictures of these materials, one 
material looks much like another. However, when you take 
pictures with enough resolution and you allow these machine-
learning algorithms to work on them, then what you discover is 
they can see differences that no human can see.
    So it turns out that you can use the computer to 
distinguish powders from different sources, different 
materials, so on and so forth. And that's pretty magic. That 
means that you can again, if you're a company and you're using 
these powders, you can detect whether you've got--you know, if 
somebody's giving you what's supposed to be the same powder, 
you can analyze it and say, no, it's not the same powder after 
all. So there's considerable power in that.
    Another example is things break, they fracture, and you 
might be surprised, but there's quite a substantial business in 
analyzing failures. You know, bicycles break and somebody has 
to absorb the liability. Bridges crack; somebody has to deal 
with that. Well, that's another case where the people involved 
look at pictures of these fracture surfaces and they make 
expert judgments.
    So one of the things we're discovering is that we can 
actually, again, use some of the computer vision techniques to 
figure out if this fracture is a different kind of fracture or 
this is a different fatigue failure that's occurred. Again, 
it's magic. It opens up--not eliminating the expert, not at 
all. The analogy is with radiography on cancers. It's helping 
the experts to do a better job, to do a faster job, to be able 
to help the people that they're working for.
    Mr. Veasey. Thank you very much. I appreciate that.
    And, Mr. Chairman, I yield back.
    Chairman Weber. Thank you, sir.
    The gentlelady from Arizona is now recognized.
    Mrs. Lesko. Thank you, Mr. Chairman.
    I have to say this Committee is really interesting. I learn 
about all types of things and people studying the brains. I 
think we're going to hear about flying cars sometime soon, 
which is exciting. I'm from Arizona, and the issues that are 
really big in my district, which are the suburbs of Phoenix 
mostly, are actually national security and border security. And 
we have two border ports of entry connecting Mexico and 
Arizona, and I have the Luke Air Force Base in my Congressional 
district. And so I was wondering if you had any ideas how 
machine learning, artificial intelligence are being used in 
border security and national security. If you have any 
thoughts?
    Dr. Yelick. Well, I can say generally speaking that in 
national security, like in science, you're often looking for 
some signal, some pattern in very noisy data. So whether you're 
looking at telephones or you're looking at some other kind of 
collected information, you are looking for patterns. And 
machine learning is certainly used in that.
    I'm not aware in border security of the current 
applications of machine learning. I would think that things 
like face-recognition software would probably be useful there, 
and I just don't know of the current applications.
    Dr. Nielsen. So I know some of the colleagues at our 
research center are exploring things like security, using 
facial recognition but trying to take it a step further, so 
using principles of machine learning, et cetera, trying to 
detect the intent of a person. So they'll use computer vision, 
they'll watch a group of individuals but try to infer, make 
inferences about the intent of what that group is doing. Is 
there something going to happen? Who is in charge of this 
group? What are they trying to do?
    And they're working with the Department of Defense on many 
of these applications. And I think there's going to be 
tremendous breakthroughs where artificial intelligence and 
machine learning are going to help us not only recognize people 
but also trying now to recognize the intent of what that person 
is trying to do.
    Dr. Rollett. And you mentioned an Air Force Base, so 
something that maybe not everybody's aware of is that the 
military operates very old vehicles, and they have to repair 
and replace a lot. And that means that manufacturing is not 
just a matter of delivering a new aircraft; it's also a matter 
of how you keep old aircraft going. I mean, think of the B-52s 
and how old they are.
    And so there are very important defense applications for 
machine learning, for manufacturing, and manufacturing in the 
repair-and-replace sense. And again, when you're running old 
vehicles, you're very concerned about outliers, which hasn't 
come up very much so far today, but taking data and recognizing 
where you've got a case that's just not in the cloud, it's not 
in with everybody else and figuring out what that means and how 
you're going to deal with it.
    Mrs. Lesko. Anyone else? There's one person left.
    Dr. Kasthuri. Of course, yes. It's me. So of course my work 
doesn't deal directly with either border security or national 
security, but just to echo one other sentiment, one of the 
things I'm interested in is that, as our cameras get faster, 
instead of taking 30 shots per second, we can now take 60 shots 
per second, 90 shots per second, 120 frames per second usually, 
and you start watching people's facial features as they are 
just engaging in normal life. It turns out that we produce a 
lot of microfacial features that happen so fast and so quick 
that they often aren't detected consciously by each other but 
convey a tremendous amount of information about things like 
intent and et cetera.
    I suspect that, as our technology, as our cameras get 
better and of course if you take 120 pictures in a second 
versus 30 pictures in a second, that's already four times more 
data that you're collecting per second. If we can deal with the 
data and get better cameras, we will actually be making 
inferences about intentions sooner rather than later.
    Mrs. Lesko. Very interesting. I'm glad that you all work in 
these different fields.
    And I yield back my time, Mr. Chairman.
    Chairman Weber. Thank you, ma'am.
    The gentleman from Illinois, Mr. Foster, is recognized.
    Mr. Foster. Thank you, Mr. Chairman. And thank you to our 
witnesses.
    And, let's see, I guess I'll start with some hometown 
cheerleading for Argonne National Lab, which--and I find it 
quite remarkable. Argonne lab has been--they've come out to 
events that we've had in my district dealing with the opioid 
crisis, I find it incredible that one single laboratory--we 
have everything from using the advanced photon source and its 
upgrades to directly image what are called G-coupled protein 
receptors at the very heart of the chemical interaction with 
the brain all the way up through modeling the high-level 
function of the brain, the Connectome, and everything in 
between. And it's really one of the magic things that happens 
at Argonne and at all of the--particularly the multipurpose 
laboratories, which are really gems of our country.
    Now, one thing I'd like to talk about--and it relates to 
big data and superconducting--is that you have to make a bunch 
of technological bets in a situation where the technology is 
changing really, really rapidly. You know, for example, you 
have the choice of--for the data pipes, you can do 
conventional, very wide floating point things for partial 
differential equations and equations of state, things like 
that, the way supercomputing has been done for years, and yet 
there's a lot of movement for artificial intelligence toward 
much narrower data paths, you know, 8 bits or even less or 1 
bit if you're talking about simulating the brain firing or not.
    You know, you have questions on the storage where you can 
have--classically, we have huge external data sets, you know, 
like the full geometry of the brain that you will then use 
supercomputing to extract the Connectome. Or now we're seeing 
more and more internally generated data sets like these are 
games playing each other where you just generate the data, 
throw it away. You don't care about storage at all. Or 
simulation of billions of miles of driving where that data 
never has to be stored at all, and so that really affects the 
high-level design of these machines.
    In Congress, we have to commit to projects, you know, on a 
sort of five-year time cycle when every six months there are 
new disruptive things. We have to decide are these largely 
going to be front ends to quantum computing or not? And so how 
do you deal with that sort of, you know, internally in your 
planning? And should we move more toward the commercial model 
of move fast, take risks, and break things, or do we have--are 
our projects that we have to approve in Congress things that 
have to have no chance of failing? And do you think Congress is 
too far on one side or the other of that tradeoff?
    Dr. Yelick. I guess as a computer scientist maybe I'll 
start here and I would say that you've asked a very good 
question. I think this issue of risk and technology is very 
important, and we do need to take lots of risks and try lots of 
things, especially right now as not only are processors not 
getting any faster because of the end of Dennard scaling, but 
we're facing the end of Moore's law, which is the end of 
transistors getting denser on a chip. And we really need to try 
a number of different things, including quantum, neuromorphic 
computing, and others.
    The issue of even the design of computers, if we look at 
the exascale computing program, very important. Of course, the 
first machine targeted for Argonne National Lab is in 2021, and 
the process that is really fundamental to the exascale project 
is this idea of codesign, that is, bringing together people who 
understand the applications like Tony and with the people that 
understand the applied mathematics, and people that understand 
the computer architecture design.
    And the exascale program is looking at both applying 
machine-learning algorithms for things like the Cancer 
Initiative, as well as the microbiome where you also have these 
very tiny datatypes, only four characters that you can store in 
maybe two bits, and putting all of that together. So those 
machines are being codesigned to try to understand all those 
different applications and work well on the traditional high-
performance simulation applications, as well as some of these 
new data-analysis problems.
    To answer your question directly, I think that, if 
anything, that project is very focused on that goal of 2021, 
and some other machines will come after that in '22 and '23. 
And the application--so it's not just about delivering the 
machines; it's about delivering 25 applications that are all 
being developed at the same time to run on those machines.
    It is a very exciting project. I actually lead the 
microbiome project in exascale, and I think it's a great amount 
of fun. But it is a project that doesn't have much room for 
risk or basic research, and so I do think it's very important 
to rebuild the fundamental research program, for example, the 
Department of Energy to make sure that ten years from now we 
could have some other kind of future program that we would have 
the people that are trained in order to answer those basic 
questions and figure out how to build another computing device 
of some kind.
    Mr. Foster. Well, yes, thank you. That was a very 
comprehensive answer. But if you could just in my last one 
second here just sort of--do you think Congress is being too 
risk-averse in our expectations or, you know, should we be more 
risk-tolerant that allow you occasionally to fail because you 
made a technological bet that is--you know, that has not come 
through?
    Dr. Yelick. You know, I think I'll answer that from the 
science perspective. As a scientist, I absolutely want to be 
able to take risks and I want to be able to fail. I think the 
Congressional question I will leave to you to debate.
    Mr. Foster. Thank you. I yield back.
    Chairman Weber. Thank you.
    The gentleman from California, Mr. Rohrabacher, is 
recognized.
    Mr. Rohrabacher. Thank you very much, Mr. Chairman.
    I wanted to get into some basics here. This is for the 
whole panel. Who's going to be put out of work because of the 
changes that you see coming as we do what's necessary to fully 
understand what you're doing scientifically? Who's going to be 
put out of work?
    Dr. Rollett. I hope very much that nobody's going to be put 
out of work.
    Mr. Rohrabacher. Oh, you've got to be kidding. I mean, 
whenever there's a change for the better, I mean, otherwise, 
we'd have people working in----
    Buggy whips would still be----
    Dr. Rollett. Yes. I think the point here is to sustain 
American industry at its most sophisticated and competitive 
level.
    Mr. Rohrabacher. What professions are going to be losing 
jobs? You're making me--I mean, everybody's afraid to say that. 
Come on, you know?
    Dr. Rollett. I would say they've mostly been lost. I mean, 
if you look at steel mills, we have steel mills. They used to 
run with 30,000 people.
    Mr. Rohrabacher. Right.
    Dr. Rollett. That's why the population of Pittsburgh was so 
large years ago, right? It's decreased enormously----
    Mr. Rohrabacher. Okay. Well, where can we expect that in 
the future from this new technology or this new understanding 
of technology? Anybody want to tell me?
    Dr. Kasthuri. I have a very quick----
    Mr. Rohrabacher. Don't be afraid now.
    Dr. Kasthuri. I have a very quick answer. Historically, a 
lot of science is done on getting relatively cheap labor to 
produce data and to analyze data, by that I mean graduate 
students, postdoctoral fellows, young assistant professors, et 
cetera. I suspect----
    Mr. Rohrabacher. So they're not going to be needed 
probably?
    Dr. Kasthuri. Well, I suspect that they should still be 
trained but then perhaps that they won't be used specifically 
in just laboriously collecting data and analyzing data.
    Mr. Rohrabacher. Okay. So let's go through that. Where are 
the new jobs going to be created? What new jobs will be created 
by the advances that you're advocating and want us to focus 
some resources on?
    Dr. Kasthuri. I'm hoping that when the people who are 
trained in science no longer have to do all of that work, they 
do--they then expand into other fields that could use 
scientific education like the legal system or Congress.
    Mr. Rohrabacher. But what specifically can we look at, say, 
that will remind Congressmen always to turn off the ringer even 
when it's their wife? Now, I'm in big trouble, okay? Tell me--
so, what jobs are going to be created? What can we expect from 
what your research is in the future? Do you have a specific job 
that you can say this--we're going to be able to do this, and 
thus, people will have a job doing it?
    Dr. Yelick. Well, I think there will be a lot more jobs in 
big data and data analysis and things like that and more 
interesting jobs I think going along with what was already 
said, that it's really about replacing--so if we replace taxi 
drivers with self-driving cars that eliminates a certain class 
of jobs but it'll----
    Mr. Rohrabacher. Okay. Well, there you go.
    Dr. Yelick. Right, but it allows people to then spend their 
time doing something more interesting such as perhaps analyzing 
the future of the transportation system and things like that.
    Mr. Rohrabacher. Well, but taxicab driver--finally, I got 
somebody to admit somebody's going to be hurt and going to have 
to change their life. And let me just note that happens with 
every bit of progress. Some people are left out and they have 
to form new type of lifestyles, and we need to understand that. 
Maybe we need to prepare for it as we move forward.
    What diseases do you think that--especially when we're 
talking about controlling things that are going on in the human 
mind, what diseases do you think that we can bring under 
control that are out of control now? Diabetes, obviously has 
something to do with the brain is telling the body what to do, 
different--maybe even cancer? What diseases do you think that 
we can have a chance of curing with this?
    Dr. Kasthuri. I think there's a range of neurological 
diseases that obviously we'll be able to do a better job curing 
or ameliorating once we understand the brain. These range from 
neurodegenerative diseases like Alzheimer's and Parkinson's to 
more mental illness, psychiatric illnesses and to even early 
developmental diseases like autism. I think all of these will 
absolutely be benefited by a better understanding----
    Mr. Rohrabacher. Then if we can control the way the brain 
is functioning, the maladies that you're suffering like I say 
diabetes and et cetera, that maybe we can tell the brain not to 
do that and once we have that deeper understanding.
    One last question. I got just a couple seconds. I remember 
2001 Hal got out of control and tried to kill these people. And 
Elon Musk is warning us. I understand somebody's already 
brought that up. But if we do end up with very independent-
minded robots, which is what I think we're talking about here, 
why shouldn't we think of that as a potential danger, as well 
as a potential asset? I mean, Elon Musk is right in that.
    Dr. Rollett. Well, I was going to throw in that I think one 
opportunity would be in health care and for example, the use of 
robots as assistants, so not replacing people but having robots 
help them. Well, those robots have to be programmed, they have 
to be built.
    Mr. Rohrabacher. Right.
    Dr. Rollett. I mean, there's a huge infrastructure that we 
don't have.
    Mr. Rohrabacher. Yes, but if you were building robots that 
can think independently, who knows--you know, and they're 
helping us in the hospitals or wherever it is, what if Hal gets 
out of control?
    Dr. Rollett. Right, right. So I think AI is being discussed 
mostly in the context of how do you do something? How do you 
make something work? When it comes to what these machines 
actually do, you also need supervision. And what I think we 
have to do is to build in AI that addresses control and 
evaluation, you know, the equivalent of the little guy on your 
shoulder saying don't do that; you're going to get into 
trouble. So you need something like that, which I haven't heard 
people talk about much.
    Mr. Rohrabacher. Okay. Well, thank you very much, Mr. 
Chairman. I yield back.
    Chairman Weber. You've been watching too many 
Schwarzenegger films.
    Mr. Rohrabacher. That's true.
    Chairman Weber. The gentleman yields back and, Mr. 
McNerney, you're recognized for five minutes.
    Mr. McNerney. I thank the Chairman. And I apologize to the 
panel for having to step in and out in the hearing so far.
    Mr. Nielsen, I'm a former wind engineer. I spent about 20 
years in the business. And I understand that the digital twin 
technology has allowed GE to produce--to increase production by 
about 20 percent. Is that right?
    Dr. Nielsen. About five percent on an average wind turbine, 
yes.
    Mr. McNerney. Five percent?
    Dr. Nielsen. Five percent, which is pretty amazing when you 
think we're not switching any of the hardware. It's just making 
that control system on a wind turbine much smarter using a----
    Mr. McNerney. And five percent is believable.
    Dr. Nielsen. Five percent----
    Mr. McNerney. Twenty percent for the wind farm----
    Dr. Nielsen. No--yes, it's five percent for----
    Mr. McNerney. Okay. Okay. I can believe that. As Chair of 
the Grid Innovation Caucus, I'm particularly interested in 
using new technology to create a smarter grid. We have things 
like the duck curve that are affecting the grid. How can all 
this technology improve grid stability and reliability and 
efficiency and so on?
    Dr. Nielsen. Yes, so we're now embarking on research for 
understanding how to better integrate disparate power sources 
together in regional, so imagine us trying to use AI machine 
learning, say, okay, I have a single combined-cycle power 
plant. How do I better optimize the efficiency of it, produce 
less emissions, use less fuel, allow more profit from it? But 
we're taking that now a step further and saying how do I then 
look regionally and integrating not only that combined-cycle 
power plant but the solar farm, the wind farm, et cetera? How 
do I balance that and optimize at a grid-scale level versus 
just a microscale level?
    So that's some of the research that's ongoing now. We're 
continuing to work on it. But that's our plan is to better 
figure out that macroscale optimization problem.
    Mr. McNerney. So, I mean, once you get that figured out, 
then you need to have some sort of a SCADA or control system 
that can dispatch and----
    Dr. Nielsen. Yes, correct.
    Mr. McNerney. Okay. So that's another product for GE or for 
the other----
    Dr. Nielsen. Yes. Correct.
    Mr. McNerney. Okay.
    Dr. Nielsen. We're figuring out how to not only build those 
optimization routines but how to then put them in what we call 
edge devices, the SCADA systems, the----
    Mr. McNerney. Sure.
    Dr. Nielsen. --unit control systems, et cetera. So it's not 
only trying to figure out the algorithm but making sure that 
algorithm can execute in a timescale that can be put into some 
of these, as you mentioned, SCADA systems and control systems.
    Mr. McNerney. Okay. Well, with the digital ghost, the--a 
power plant can replicate an industrial system and the 
component parts for cyber vulnerability. Is that right?
    Dr. Nielsen. So we use digital ghost at what we call the 
cyber physical layer. So imagine having a digital twin of a gas 
turbine. So that digital twin tells us how that gas turbine is 
behaving and should behave. We then compare to what signal is 
being generated, what sensors are being--signal's been 
generated, and we compare that behavior and say that behavior 
doesn't look right. Our digital twin says something's not 
correct. The thermodynamics aren't correct.
    Mr. McNerney. Well, I mean, I can see that for mechanical--
--
    Dr. Nielsen. Yes.
    Mr. McNerney. --systems. What about cyber?
    Dr. Nielsen. So what we're doing is we're not applying it 
at sort of the network layer. We're not watching network 
traffic. We're actually looking at the machine level and 
understanding if the machine is behaving as it should be given 
the inputs, the control signals, as well as the outputs, the 
sensors, et cetera. Some recent attacks look at replicating 
sensors----
    Mr. McNerney. So the same sort of behavior characteristics 
are going to be monitored--can tell you whether or not there's 
a cyber issue or some other sort of mechanical failure----
    Dr. Nielsen. Yes.
    Mr. McNerney. --impending?
    Dr. Nielsen. Perfect. It's a----
    Mr. McNerney. Very good.
    Dr. Nielsen. It's an anomaly detection scheme, yes.
    Mr. McNerney. Dr. Yelick, thank you for coming. And I 
visited your lab a number of times. It's always a pleasure to 
do so. I think you guys are doing some really good work out 
there.
    One of the things that was striking was the work you did on 
exascale computing, simulating a San Francisco earthquake and 
how striking that is. Do you think we have the collective use--
have we collectively used this information to harden our 
systems, to harden our communities against an earthquake, or is 
that something that is yet to happen?
    Dr. Yelick. That's something that is yet to happen. We're 
just starting to see some of this very detailed information 
coming from the simulations. And as I mentioned earlier, even 
bringing in more detailed data into the simulations to give you 
better geological information about the stability of a certain 
region or even a certain local area, a city block or whatever, 
and using that information is not something that is happening 
yet but obviously should be.
    Mr. McNerney. This is sort of a rhetorical question but 
somebody can answer it if you feel like. I know we hear about 
the social challenges of digital technology and AI and big 
data, you know, in terms of job displacement. Does AI tell us 
anything about that, about how we should respond to this 
crisis?
    Dr. Yelick. I don't know of any studies that have used AI 
to do that. People do use AI to understand the market, 
economics, and things like that, and I'm sure that people are 
using large-scale data analytics of various kinds, and they 
certainly are to understand changes in jobs and what will 
happen with them.
    It is, by the way, a very active area of discussion within 
the computer science community about both the ethics, which you 
heard about I think at previous hearing of AI, but also the 
issues of replacing jobs.
    Mr. McNerney. Sure. Dr. Rollett?
    Dr. Rollett. If I might jump in, I would encourage you to 
think about supporting research in policy and even social 
science to address that issue because AI displacing people is 
about education, it's about retraining, it's about how people 
behave. So we scientists are really at sort of the front end of 
this, but there's a lot of implications that are much broader 
than what we've talked about this morning.
    Mr. McNerney. All right. Thank you. Mr. Chairman, I yield 
back.
    Chairman Weber. Thank you, sir.
    The gentleman from Florida, Dr. Dunn, is recognized.
    Mr. Dunn. Thank you very much, Chairman Weber.
    And I want to add my thank you to the panel and underscore 
my personal belief in how important all of your work is. I've 
visited Dr. Bobby Kasthuri's lab, a great fan of your work and 
your energy level. Dr. Yelick, we'll be visiting you in the 
near future, so that'll be fun, too.
    I want to focus on the niche in big computing, which is 
artificial intelligence, and I apologize I missed that hearing 
earlier, but it was near and dear to my heart.
    I think we all see many potential benefits of artificial 
intelligence, but there are some potential problems, and I 
think it serves us to face those as we're having this virtual 
lovefest for artificial intelligence. You know, and we've known 
this since at least the '60s. I mean, the Isaac Asimov robotic 
novels and the robotic laws, the Three Laws of Robotics, which 
I have in my printout, the copies of in case anybody doesn't 
remember them. I bet this group does.
    But what I want to do is--I also, by the way, was looking 
for guides for artificial intelligence and I came up with the 
12 Boy Scout laws, too, so I don't know how that--so I want to 
offer some quotes and then get some thoughts from you, and 
these are quotes from people who are recognizably smart people. 
Stephen Hawking said, ``I think the development of artificial 
intelligence could spell the end of the human race.'' Elon 
Musk, quoted several times here, said, ``I think we should be 
very careful about artificial intelligence. If I were to guess 
what our biggest existential threat is, it's probably that.'' 
Bill Gates responded, ``I agree with Elon Musk and I don't 
understand why people are concerned.''
    And then finally, Jaan Tallinn, one of the inventors of 
Skype, said with ``strong and artificial intelligence, planning 
ahead is a better strategy than learning from mistakes.'' And 
went on to say, ``It really sucks to be the number-two 
intelligent species on the planet; just ask the gorillas.''
    So in everybody's handout you have a very brief summary of 
a series of experiments run at MIT on artificial intelligence. 
The first one was named Norman, which was an artificial 
intelligence educated on biased data, not false data but biased 
data and turned into a deeply sociopathic intelligence. There 
was another one Tay, which was really just an artificial 
intelligence Twitterbot, which they turned loose into the 
internet, and I think it wasn't the intention of the MIT 
researchers, but people engaged with Tay and tried to provoke 
it to say racist and inappropriate things, which it did. And 
there are some other experiments from MIT as well.
    So I want to note, like Dr. Kasthuri, I have sons that are 
more clever than I, but they are not virtual supermen, nor do 
they operate at the speed of light, so, you know, there's ways 
of working with them. I'm not so sure about that with 
artificial intelligence.
    My question first, what are the implications of a future 
where black-box machine learning, the process can't even be 
interpreted? You know, once it gets several layers in, we can't 
interpret it. What's the implications today on that to you, Dr. 
Kasthuri and Dr. Yelick, if I could?
    Dr. Kasthuri. Congressman Dunn, thank you for the kind 
words to start. And I actually suspect there is a reasonable 
concern that the things that we develop in artificial 
intelligence are different than the other things like our 
children because their ability to change is at the speed of 
computers as opposed to the speed of our own. So I agree that 
there's legitimate cause for concern.
    I suspect that we will have to come up with lessons and 
safeguards the same way that we've done with every existential 
crisis: the discovery of nuclear energy, the application to 
nuclear weapons. As humans, we do have some history of living 
on the edge and figuring out how to get the benefit of 
something and keep the risk at bay.
    You're right that if algorithms can change faster than we 
can think, our existing previous historical safeguards might 
not work.
    To the specific question that you asked about the non-
interpretability, for me, without knowing what the algorithm is 
producing, how do you innovate? If you don't know the 
fundamental nature of what the algorithm is--its principles for 
how it comes to a conclusion, I worry that we won't be able to 
innovate on those results.
    And this is interestingly perhaps as a thought exercise: 
What if a machine-learning algorithm could tell me--could 
make--could collect enough data to make a prediction about a 
brain, about your brain or someone else's brain that was 
incredibly accurate? Would we at that moment care how that 
machine-learning algorithm arrived at its conclusion? Or would 
we at that moment take the results that the algorithm produces 
and just go on with it, in which case there could be a missed 
opportunity for learning something deeply fundamental and 
principled about the brain.
    Mr. Dunn. And very quickly, Dr. Yelick.
    Dr. Yelick. Well, I agree with that. I think that these 
deep learning algorithms which have these multiple layers, 
which is why they're deep, they have millions perhaps of 
parameters inside of them. And we don't really understand when 
you get an answer out why all these parameters put together 
tell you that that's a cat and this one's not a cat. And so 
that may be okay if we're trying to figure out where to place 
ads as long as we give it unbiased data about where the place 
the ads so the right--so----
    Mr. Dunn. But it might be more problem if it was flying a 
drone swarm on attack some place?
    Dr. Yelick. Well, where it's a problem is if I'm a 
scientist, I want to understand why. It's not enough to say 
there's a correlation between these two things. And if the, you 
know, drone is flying in the right place, that's really 
probably the most important thing about some kind of a 
controlled vehicle. But in science, you want to----
    Mr. Dunn. We're dangerously close to being way, way, way 
over time, so I better yield back here, Mr.--thank you very 
much, though. I appreciate the chance.
    Chairman Weber. All right. The gentlelady from Nevada, Ms. 
Rosen, is recognized.
    Ms. Rosen. Thank you. I want to thank you for one of the 
most interesting, informative, and I want to say this is on the 
bleeding edge of everything that we need to worry about for 
sure.
    But one thing we haven't talked about is data storage. And 
data storage specifically is critical infrastructure in this 
country, right, because we have tons and tons of data 
everywhere, and where it goes and how we keep it is going to be 
of utmost importance.
    And so I know that we're trying to focus on that in the 
future, and in my district in Nevada we have a major data 
storage company. It has state-of-the-art reliability. We have 
lots of quality standards to ensure its data is secure, but 
like I said, we don't consider it critical infrastructure.
    So right now in this era of unprecedented data breaches, 
data hacks, every moment they are just pounding on us, in your 
view what are--the data storage centers that house the 
government and private sector, where are their vulnerabilities 
and what are the implications? How should we be sure that we 
classify them as critical infrastructure?
    Dr. Yelick. So, clearly, those data centers are storing 
very important information that should be protected. And, as 
you said, even at the computing centers that we run in the 
labs, there's a constant barrage of attacks, although we store 
at NERSC the center at Berkeley lab only scientific data, so it 
is not really critical data. I think that using these kinds of 
machine-learning techniques to look for patterns is one of the 
best mechanisms we have to prevent attack, and they do have to 
learn from these patterns in order to figure out what is--and--
what is abnormal behavior. And we're looking at--as we build 
out the next network, even kind of embedding that information 
into the network so that you can see patterns of attack even 
before they get to a particular data set or a particular 
computer system.
    Ms. Rosen. Thank you. I have one other question. And you 
were talking about using predictive analytics with a digital 
twin to talk about fatigue in planes. But how can we use that 
to discuss infrastructure fatigue as we talk about the 
infrastructure failures around this country in bridges, roads, 
ports, et cetera, et cetera? So----
    Dr. Rollett. That's I think a question of recognizing the 
need and talking to the agencies and finding out whether you 
consider there are adequate programs to do that. I'm going to 
guess that there is not a huge amount of activity, but I don't 
know, so that's why I'm being very cautious in my answer.
    But I suspect it's one of the opportunity areas. It's an 
area where there is data. It's often rather incomplete, but it 
would definitely benefit from having the techniques applied, 
the machine-learning techniques to try to find the patterns, to 
try to identify outliers, particularly trends that are not 
good.
    Ms. Rosen. Thank you.
    Dr. Nielsen. I would just----
    Ms. Rosen. Oh, please, yes. Yes.
    Dr. Nielsen. Oh, I'm sorry. I would just second the 
comments made. I mean, at GE we obviously focus a lot of our 
attention on the commercial assets that we build, but there's 
no reason the technologies, the ideas that are being applied 
there could be applied to bridges and infrastructure and all 
that.
    Ms. Rosen. Right.
    Dr. Nielsen. It's just, I think, a matter of will and 
policy to do that, right?
    Ms. Rosen. So I--do you think that would be well worth our 
time here in this Committee to promote those kinds of policies 
or research for you all or someone to do the--use the 
predictive analytics? Congresswoman Esty and I sit on some 
infrastructure committees, and really important that we try to 
find out points of failure before they fail, right?
    Dr. Rollett. Absolutely. And I would encourage you to bring 
state and local government into that discussion because they 
often own a lot of those assets.
    Ms. Rosen. Yes. Thank you. I yield back my time.
    Chairman Weber. The gentlelady yields back.
    The gentlelady from Connecticut is recognized.
    Ms. Esty. Thank you so much. And this is tremendously 
important for this Committee and for the U.S. Congress to be 
dealing with, and we really appreciate you taking the time with 
us today.
    All of you have mentioned somewhat in passing this critical 
importance of how are the algorithms structured and how are we 
going to embed the values if we have AI moving much faster than 
our brains can function or at least on multiple levels 
simultaneously?
    So we did have a hearing last month in talking about this, 
and one of the issues that came up that everyone supported--and 
I'd like your thoughts on that--is the critical importance of a 
diverse workforce in doing that. If you're going to try to 
train AI, it needs to represent the diversity of human 
experience, and therefore, it can't be like my son who did 
computer science in astrophysics. If they all look like that, 
if those are--the algorithms are all being developed by, you 
know, 26-year-olds like my son Thomas, we're not going to have 
the diversity of life experience.
    So, first, if you can quickly--because I've got a couple of 
questions--thoughts on how do we ensure that? Because we're 
looking at that issue. We talk about that diverse workforce all 
the time, but when we're looking at AI and algorithms, it 
becomes vitally important that we do this. It's not about 
checking the box to say the Department of Labor that we've got 
a diverse workforce. This is actually vital to what we need to 
do.
    Dr. Yelick. So if I can just comment on that. Yesterday, 
before I left UC Berkeley, I gave a lecture to the freshman 
summer class introductory computing class. My title was rather 
ostentatious as ``How to Save the World with Computing.'' What 
I find is that when you talk about the applications of 
computing and including data analytics and machine learning and 
real problems that are societal problems, you tend to bring in 
a much more diverse workforce. That class in particular has had 
over 50 percent women and a very good representation at least 
relative to the norm of underrepresented minorities as well.
    Ms. Esty. Anyone else who--I mean it--MIT has found that 
when they change the title of some of their computer science 
classes to again be applied in sort of more political and 
social realms, they had a dramatic change in terms of 
composition of classes.
    Dr. Nielsen. Yes, I would just quickly build upon that, 
too. I think to me when you look at AI and machine learning, 
you have to have a critical eye. You have to always be looking 
at it. And I think a diverse workforce and diverse experience 
can help just bring more perspectives to help critically 
question why are those algorithms doing what they're doing? 
What is the outcomes? How can we improve that? So I would 
support that supposition, yes.
    Dr. Yelick. I'll just mention that the name of the course--
which I was not teaching, by the way, I was giving a guest 
lecture--is ``The Beauty and Joy of Computing,'' so maybe that 
helps.
    Ms. Esty. Well, that helps. And if I could have you turn 
again--and some of you have mentioned the important role of 
federal research. I mean that's what this Committee is looking 
at, what is uniquely the federal role. As you see across the 
board, there's more and more effort and being engaged and we 
see it in space research and other places to move into the 
private sector with the notion the federal government is not 
very good at picking winners and losers. So if you can all talk 
about what you think are the most critical tasks for federal 
investment in, say, foundational and basic research that then 
will be developed by the GE's and others and companies not yet 
formed or conceived of because, again, that's part of our job 
is to figure out--I see it as our job to defend putting those 
basic research dollars in because we don't know where they're 
going to go but we do know they're vital to keep us, whether 
it's competitive or frankly just have better research and more 
care.
    Dr. Kasthuri. So perhaps I can go really quick. I suspect 
that there is a model of funding scientific research that's 
this idea that if you plant a million seeds in the ground, a 
few flowers will grow, where individual labs and individual 
scientists have the freedom to judge what is the next important 
question to address.
    And I can see why having the federal government decide the 
next important question to address might not be the most 
efficient way to push science forward. But where I do see the 
federal government really playing a role is in the level of 
facilities and resources, that what I imagine is that the 
federal government establishes large-scale resources and 
facilities like the national lab system and then allow 
individual scientists to promote their individual ideas but 
leveraging the federal resources. And I wonder if this is a 
compromise between allowing these seeds to grow but the federal 
government--maybe this is appropriate but maybe not--providing 
the fertilizer for those seeds.
    Ms. Esty. They think we generate a lot of it at least in 
this place.
    Dr. Yelick. So I would just add I think the importance of 
fundamental research, as well as the facilities and 
infrastructure and the applied mathematics, the computer 
science, statistics, very important in machine learning. And, 
as we said, these machine-learning algorithms have been used a 
lot in nonscientific domains. There's a lot of interest in 
applying them in scientific domains. I think the peer-review 
process in science will make machine learning better for 
everybody if we really put a lot of scrutiny on it.
    Dr. Rollett. And very quickly, I wanted to add that I think 
it's important that program managers in the federal government 
have some discretion over what they fund and take risks. And 
it's also important that the agencies have effective means of 
getting community input. And I don't want to name names, but 
some agencies have far more effective mechanisms for that than 
others.
    Ms. Esty. Well, we might want to follow up with that last 
point.
    And I wanted to just put out for you to help us with--and 
you mentioned it, Dr. Yelick, with--on peer review, this 
systematic--because of pressures to publish or perish and show 
success is we are not sharing the failures, which are 
absolutely essential for science to make progress. It's one of 
the issues we've touched on a lot in this Committee. We don't 
have any good answers, and it's gotten worse because of the 
pressures to do--to get grant money and to show progress. But I 
am deeply concerned about those pressures both from the private 
sector and the public sector making it harder for us--people 
hoard the, quote, ``bad results,'' but they're absolutely 
essential for us to learn from them.
    And so I don't know how we change that dynamic, but I think 
that is something that we could really use your thoughts on 
that because whether it's--AI can maybe help us with disclosing 
the dead ends and we learn from the dead ends and we move 
forward. But it is something that we have a big issue with in 
how we deal with the sharing of the not-useful results, which 
may turn out to be very useful down the line.
    Dr. Yelick. I completely agree with that. I think the first 
step in that is sharing the scientific data and allowing people 
to reproduce the successful results but also, as you said, 
examine the supposed failures to see--there are many examples 
of this in physics and other disciplines where people go back 
to data that may be 10 or 20 years old and find some new 
discovery in it.
    Ms. Esty. Thank you very much. I really appreciate your 
indulgence to keep us here to the bitter end. Thank you. Not 
the bitter, not you, just the fact that the bell has rung, and 
we had a lot of questions for you. We appreciate it. Thank you 
so much.
    Chairman Weber. After failing 1,000 times for the 
lightbulb, Dr. Edison, his staffer said doesn't that frustrate 
you? He goes, what are you talking about? We're 1,000 ways 
closer to success.
    So I thank the witnesses for their testimony and the 
Members for their questions. The record will remain open for 
two weeks for additional written comments and written questions 
from the Members.
    This hearing is adjourned.
    [Whereupon, at 12:08 p.m., the Subcommittees were 
adjourned.]

                               Appendix I

                              ----------                              


                   Answers to Post-Hearing Questions




                   Answers to Post-Hearing Questions
Responses by Dr. Bobby Kasthuri

[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]


Responses by Dr. Katherine Yelick

[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]


Responses by Dr. Matthew Nielsen

[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]

Responses by Dr. Anthony Rollett

[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]


                              Appendix II

                              ----------                              


                   Additional Material for the Record




            Documents submitted by Representative Neal Dunn
            
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]