[House Hearing, 117 Congress]
[From the U.S. Government Publishing Office]



 
                  PAPER MILLS AND RESEARCH MISCONDUCT:
                         FACING THE CHALLENGES
                        OF SCIENTIFIC PUBLISHING

=======================================================================

                                     
                                     

                                HEARING

                               BEFORE THE

                     SUBCOMMITTEE ON INVESTIGATIONS
                             AND OVERSIGHT

                                 OF THE

                      COMMITTEE ON SCIENCE, SPACE,
                             AND TECHNOLOGY

                                 OF THE

                        HOUSE OF REPRESENTATIVES

                    ONE HUNDRED SEVENTEENTH CONGRESS

                             SECOND SESSION

                               __________

                             JULY 20, 2022

                               __________

                           Serial No. 117-65

                               __________

 Printed for the use of the Committee on Science, Space, and Technology
 
 
 

[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]
                                     
                                     
                                     
                                     

       Available via the World Wide Web: http://science.house.gov
       
       
       
                      ______

             U.S. GOVERNMENT PUBLISHING OFFICE 
48-028             WASHINGTON : 2022       
       
       

              COMMITTEE ON SCIENCE, SPACE, AND TECHNOLOGY

             HON. EDDIE BERNICE JOHNSON, Texas, Chairwoman
ZOE LOFGREN, California              FRANK LUCAS, Oklahoma, 
SUZANNE BONAMICI, Oregon                 Ranking Member
AMI BERA, California                 MO BROOKS, Alabama
HALEY STEVENS, Michigan,             BILL POSEY, Florida
    Vice Chair                       RANDY WEBER, Texas
MIKIE SHERRILL, New Jersey           BRIAN BABIN, Texas
JAMAAL BOWMAN, New York              ANTHONY GONZALEZ, Ohio
MELANIE A. STANSBURY, New Mexico     MICHAEL WALTZ, Florida
BRAD SHERMAN, California             JAMES R. BAIRD, Indiana
ED PERLMUTTER, Colorado              DANIEL WEBSTER, Florida
JERRY McNERNEY, California           MIKE GARCIA, California
PAUL TONKO, New York                 STEPHANIE I. BICE, Oklahoma
BILL FOSTER, Illinois                YOUNG KIM, California
DONALD NORCROSS, New Jersey          RANDY FEENSTRA, Iowa
DON BEYER, Virginia                  JAKE LaTURNER, Kansas
CHARLIE CRIST, Florida               CARLOS A. GIMENEZ, Florida
SEAN CASTEN, Illinois                JAY OBERNOLTE, California
CONOR LAMB, Pennsylvania             PETER MEIJER, Michigan
DEBORAH ROSS, North Carolina         JAKE ELLZEY, TEXAS
GWEN MOORE, Wisconsin                MIKE CAREY, OHIO
DAN KILDEE, Michigan
SUSAN WILD, Pennsylvania
LIZZIE FLETCHER, Texas
                                 ------                                

              Subcommittee on Investigations and Oversight

                  HON. BILL FOSTER, Illinois, Chairman
ED PERLMUTTER, Colorado              JAY OBERNOLTE, California,
AMI BERA, California                   Ranking Member
GWEN MOORE, Wisconsin                STEPHANIE I. BICE, Oklahoma
SEAN CASTEN, Illinois                MIKE CAREY, OHIO

                         C  O  N  T  E  N  T  S

                             July 20, 2022

                                                                   Page

Hearing Charter..................................................     2

                           Opening Statements

Statement by Representative Bill Foster, Chairman, Subcommittee 
  on Investigations and Oversight, Committee on Science, Space, 
  and Technology, U.S. House of Representatives..................     9
    Written Statement............................................    10

Statement by Representative Jay Obernolte, Ranking Member, 
  Subcommittee on Investigations and Oversight, Committee on 
  Science, Space, and Technology, U.S. House of Representatives..    12
    Written Statement............................................    12

Written statement by Representative Eddie Bernice Johnson, 
  Chairwoman, Committee on Science, Space, and Technology, U.S. 
  House of Representatives.......................................    13

                               Witnesses:

Dr. Jennifer Byrne, Director of Biobanking, New South Wales 
  Health Pathology; Professor of Molecular Oncology, University 
  of Sydney
    Oral Statement...............................................    15
    Written Statement............................................    18

Mr. Chris Graf, Research Integrity Director, Springer Nature; 
  Chair of the Governance Board, STM Association Integrity Hub
    Oral Statement...............................................    25
    Written Statement............................................    27

Dr. Brandon Stell, Neuroscientist, French National Centre for 
  Scientific Research; President and Co-Founder, The PubPeer 
  Foundation
    Oral Statement...............................................    36
    Written Statement............................................    38

Discussion.......................................................    43

             Appendix I: Answers to Post-Hearing Questions

Dr. Jennifer Byrne, Director of Biobanking, New South Wales 
  Health Pathology; Professor of Molecular Oncology, University 
  of Sydney......................................................    56

Mr. Chris Graf, Research Integrity Director, Springer Nature; 
  Chair of the Governance Board, STM Association Integrity Hub...    59

Dr. Brandon Stell, Neuroscientist, French National Centre for 
  Scientific Research; President and Co-Founder, The PubPeer 
  Foundation.....................................................    63

            Appendix II: Additional Material for the Record

Documents submitted by Representative Bill Foster, Chairman, 
  Subcommittee on Investigations and Oversight, Committee on 
  Science, Space, and Technology, U.S. House of Representatives
    ``Neutron Production and Absorption in Uranium,'' H.L. 
      Anderson, E. Fermi, and Leo Szilard........................    66
    ``Uranium: Neutron Production and Absorption,'' B. Foster and 
      E. Perlmutter..............................................    71
    Letter, Allison C. Lerner, Inspector General, National 
      Science Foundation.........................................    74


                        PAPER MILLS AND RESEARCH


                    MISCONDUCT: FACING THE CHALLENGES


                        OF SCIENTIFIC PUBLISHING
                              ----------                              


                        WEDNESDAY, JULY 20, 2022

                  House of Representatives,
      Subcommittee on Investigations and Oversight,
               Committee on Science, Space, and Technology,
                                                   Washington, D.C.

    The Subcommittee met, pursuant to notice, at 10:02 a.m., in 
room 2318 of the Rayburn House Office Building, Hon. Bill 
Foster [Chairman of the Subcommittee] presiding.

[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]


    Chairman Foster. Well, thank you. And this hearing will now 
come to order. Without objection, the Chair is authorized to 
declare recess at any time.
    Before I deliver my opening remarks, I wanted to note that, 
today, the Committee is meeting both in person and virtually, 
which in my case is probably a good thing since I'm currently 
on day five of self-isolation due to mildly symptomatic COVID.
    I want to announce a couple of reminders to the Members 
about the conduct of the hearing. First, Members and staff who 
were attending in person may choose to be masked, but it is not 
a requirement. However, any individuals with symptoms, a 
positive test, or exposure to someone with COVID-19 should wear 
a mask while present.
    Members who are attending virtually should keep their video 
feed on as long as they are present in the hearing. Members are 
responsible for their own microphones. Please also keep your 
microphones muted unless you're speaking.
    And finally, if Members have documents that they wish to 
submit for the record, please email them to the Committee 
Clerk, whose email address was circulated prior to the hearing.
    Well, good morning, and welcome to our Members and 
witnesses. For today's hearing, I'm proud to announce that 
Representative Perlmutter and I, in our copious spare time, 
have been conducting experiments on a groundbreaking topic in 
nuclear physics. We are excited to share the results of that 
effort today as a preprint, and we plan to submit it to the 
Reviews of Modern Physics.
    Well, just kidding, folks. What I'm actually referring to 
is an automated rip-off of a seminal paper published in the 
Journal of Physical Review in 1939 by Enrico Fermi called 
``Neutron Production and Absorption in Uranium,'' which has 
certain applications and relevance to nuclear power and nuclear 
weapons. We took Dr. Fermi's paper and ran it through a free 
online fake text generator that uses artificial intelligence 
(AI) to disguise plagiarism, and this took about 15 seconds. We 
then took about five minutes to tweak a few sentences to 
disguise the true source a little better. And once it was 
ready, we ran this paper through two well-respected plagiarism 
checkers. Each of these detectors found our fake paper was, and 
I quote, ``100 percent unique and 0 percent plagiarism.'' Not 
surprisingly, these fake content generators have presumably 
been tuned up to generate low plagiarism scores, sort of the 
spambot equivalent of the generative adversarial network AI 
technique that's used to generate deep fake images and videos. 
And we even sent it over to the Inspector General at the 
National Science Foundation (NSF).
    Now, any real human physicist peer reviewer for a journal 
or an NSF grant proposal would notice immediately that this 
paper uses silly technical jargon and plagiarizes from a very 
famous paper, and they would also find it unconventional that 
the report was authored by two sitting Congressmen and includes 
an acknowledgement to our Ranking Member Jay Obernolte. But you 
can imagine how bad actors might use tools as we did to sneak 
plagiarized content past journal editors and peer reviewers.
    The AI-assisted plagiarism tool we used to make this fake 
paper is only one of the many in the arsenals of paper mills. 
These are criminal enterprises that sell authorship credits for 
the fraudulent papers they place in academic journals. 
Scientific disciplines such as the life sciences, which rely 
heavily on images to communicate the results of experiments, 
are popular targets for fraud because of how easy it is to 
manipulate images.
    Now, with the advent of sophisticated natural language 
processing software, it's becoming just as easy to churn out 
fraudulent but outwardly coherent text. Add in a few basic 
templates and the creation of hundreds of papers, complete with 
figures and citations, becomes the work of an afternoon. And 
much to the--this is much to the disgust of real scientists who 
might spend months on a single paper.
    The scientific community must rise to meet this challenge, 
and it is already taking the first steps. Journals are looking 
for new ways to collaborate in detecting fraud during the 
review process. One recent effort is the STM (International 
Association of Scientific, Technical, and Medical Publishers) 
Integrity Hub, which would serve as a platform for journals to 
share dedicated fraud detection tools. The first tool under 
construction will flag the simultaneous submission of papers to 
multiple journals, which is a strong indicator of paper mill 
activity.
    There's also a strong international community of 
researchers unaffiliated with publishers, many of them 
volunteers, who work to identify fraudulent papers following 
publication. Just as automation is enabling those committing 
fraud, it is also being used by these researchers to combat it. 
Next-generation plagiarism checkers don't just compare text to 
text, but intelligently scan for indicators of AI-generated 
text. Other tools detect manipulated images or identify 
erroneous science within the text of the paper itself. The 
automation arms race is upon us. We are here today to discuss 
how researchers and publishers can develop tools and policies 
that will help them stay ahead of the paper mills.
    As we discuss scientific misconduct today, one of the most 
important things to keep in mind is the scale of the problem. 
Hundreds of papers with signs of fraud are indeed a serious 
concern. However, according to the NSF, a whopping 2.9 million 
papers were published last year alone. The number of cases of 
fraud must be viewed within that context. Creating and 
maintaining a body of scientific literature without flaws of 
any kind is an impossible quest, but published scientific 
literature remains the greatest body of human knowledge in the 
world, and it is our responsibility to look after its 
integrity.
    This effort begins with a public dialog. As Dr. Fermi said 
famously, ``Whatever nature has in store for mankind, 
unpleasant as it may be, men must accept, for ignorance is 
never better than knowledge.''
    I look forward to earning more knowledge today with the 
help of our esteemed witnesses.
    [The prepared statement of Chairman Foster follows:]

    Good morning and welcome to our Members and witnesses.
    For today's hearing, I'm proud to announce that 
Representative Perlmutter and I have, in our spare time, been 
conducting experiments on a groundbreaking topic in nuclear 
physics. We're excited to share the results of that effort 
today as a preprint. We plan to submit it to Reviews of Modern 
Physics.
    Just kidding. What I'm actually holding is a cheap rip-off 
of a seminal paper called ``Neutron Production and Absorption 
in Uranium,'' which was published in the journal Physical 
Review in 1939. Its author was Enrico Fermi. We took Dr. 
Fermi's paper and ran it through a free online text generator 
that uses artificial intelligence to disguise plagiarism. This 
took 15 seconds. We then took five minutes to tweak a few 
sentences to disguise their true source a little better. Once 
it was ready, we ran this paper through two well-respected 
plagiarism checkers. We even sent it over to the Inspector 
General at the National Science Foundation. Each of these 
detectors found our fake paper was, and I quote--``100% unique, 
0% plagiarism.''
    Now, any real physicist peer reviewer for a journal or an 
NSF grant would notice immediately that this paper uses silly 
technical jargon and plagiarizes from a very famous paper. They 
would also find it unconventional that the report was authored 
by two sitting Congressmen and includes an acknowledgement to 
Ranking Member Jay Obernolte. But you can imagine how bad 
actors might use tools as we did to sneak plagiarized content 
past journal editors and peer reviewers.
    The AI-assisted plagiarism tool we used to make the fake 
paper is only one of many in the arsenals of ``paper mills.'' 
These are criminal enterprises that sell authorship credits for 
the fraudulent papers they place in academic journals. 
Scientific disciplines such as the life sciences, which rely 
heavily on images to communicate the results of experiments, 
are popular targets for fraud because of how easy it is to 
manipulate images. Now, with the advent of sophisticated 
natural language processing software, it is becoming just as 
easy to churn out fraudulent but coherent text. Add in a few 
basic templates and the creation of hundreds of papers--
complete with figures and citations--becomes the work of an 
afternoon, much to the disgust of real scientists who might 
spend months on a single paper.
    The scientific community must rise to meet this challenge, 
and it is already taking the first steps. Journals are looking 
for new ways to collaborate in detecting fraud during the 
review process. One recent effort is the STM Integrity Hub, 
which will serve as a platform for journals to share dedicated 
fraud detection tools. The first tool under construction will 
flag the simultaneous submission of papers to multiple 
journals, a strong indicator of paper mill activity.
    There is also a strong international community of 
researchers unaffiliated with publishers, many of them 
volunteers, who work to identify fraudulent papers following 
publication. Just as automation is enabling those committing 
fraud, it is also being used by these researchers to combat it. 
Next generation plagiarism checkers don't just compare text to 
text, but intelligently scan for indicators of AI-generated 
text. Other tools detect manipulated images or identify 
erroneous science within the text of the paper itself. The 
automation arms race is upon us. We are here today to discuss 
how researchers and publishers can develop tools and policies 
that will keep them ahead of the paper mills.
    As we discuss scientific misconduct today, one of the most 
important things to keep in mind is the scale of this problem. 
Hundreds of papers with signs of fraud are indeed a serious 
concern. However, according to NSF, a whopping 2.9 million 
papers were published last year alone. The number of cases of 
fraud must be viewed within that context. Creating and 
maintaining a body of scientific literature without flaws of 
any kind is a quixotic quest.
    But published scientific literature remains the greatest 
body of human knowledge about the world, and it is our 
responsibility to look after its integrity. This effort begins 
with a public dialogue. As Dr. Fermi said famously--

    ``Whatever Nature has in store for mankind, unpleasant as 
it may be, men must accept, for ignorance is never better than 
knowledge.''

    I look forward to earning some more knowledge today with 
the help of our esteemed witnesses.
    I now yield to Ranking Member Obernolte for his opening 
statement.

    Chairman Foster. And I now request unanimous consent to 
include in the record for this hearing both the real paper by 
Dr. Fermi and the fake one that we created, as well as a letter 
from the NSF Inspector General about they--how they tried to 
detect our sleight of hand.
    Many thanks to Inspector General Allison Lerner, Dr. Aaron 
Manka, and their colleagues for their help in this.
    And now the Chair will now recognize the Ranking Member for 
the Subcommittee on Investigations and Oversight, Mr. 
Obernolte, for an opening statement.
    Mr. Obernolte. Thank you very much, Chairman Foster, and 
I'm sure everyone here on the dais joins me in wishing you a 
speedy recovery and wishing that your illness remains 
asymptomatic. We're looking forward to having you back with us 
in person.
    And I want to thank you for convening this hearing on an 
incredibly important topic, the topic of research integrity. It 
really is a topic that underpins our entire system of academic 
research here in this country.
    A couple of years ago, I went back to graduate school to 
finish my doctorate, and finishing that dissertation was one of 
the hardest things I've ever done in my life. An important part 
of any research is to review the field of literature and the 
body of work on your research topic to determine exactly what's 
been done before and what the state-of-the-art in your research 
is. And I'll tell you, as I was going through the research in 
my field, it never occurred to me that some of those papers 
might be fraudulent.
    That's one of the reasons why this hearing is so important 
is to call attention to what is a cutting-edge field in the 
body of academic research and literature, you know, this 
emergence of fraudulent research and paper mills and also to 
cast some light on the ways that technology can both enable 
this bad behavior by creating powerful tools that anyone could 
use to generate fraudulent papers, but also in combating the 
spread of fraudulent research by identifying the papers that 
might have been generated with artificial intelligence 
technology. So it's one of the reasons why I'm very much 
looking forward to this hearing.
    I actually think that this is one of the things that we 
here in Congress can do very effectively, which is to 
simultaneously be a podium for the dissemination of information 
such as this because spreading awareness of this is going to be 
key to preventing the proliferation of this bad behavior. But I 
also think we have a role to play in catalyzing more research 
into how prevalent this problem is, in funding some Federal 
research into identifying the spread of the problem and 
identifying not only the causes of the problem but also some of 
the technology-based solutions to that problem, and in general 
raising awareness of this issue. And I also hope that everyone 
in the academic community joins us in recognizing just how 
destructive these paper mills have the potential to be for 
research integrity in general. I think the greater awareness 
that we have not only this is a problem but the--of the need to 
make severe penalties apparent for those who engage in 
destructive behavior like this, I think that's going to be key 
to controlling the spread of this problem.
    So, Mr. Chairman, again, thank you very much for convening 
the hearing. I'm looking forward to hearing from our witnesses. 
I yield back.
    [The prepared statement of Mr. Obernolte follows:]

    Good morning. Thank you, Chairman Foster, for convening 
this hearing. And thanks to our witnesses for appearing before 
us today.
    We are here today to discuss one of the most important 
aspects of scientific work, and the objective trust it 
instills, research integrity. As a member of the academic 
community myself, I am both troubled and perplexed to hear 
about the issue of paper mills. I am troubled because of the 
potential harm that these fraudulent papers can do to the 
scholarly record and perplexed by the motivations of 
researchers who choose to buy papers from a paper mill.
    There is a saying in academia, referenced in today's 
Hearing Charter: ``publish or perish''. This mentality, along 
with other stringent career requirements internationally, seems 
to be driving some researchers to pad their resumes with paper 
mill papers.
    Given our role in authorizing and overseeing the national 
research enterprise, I think it is important that we recognize 
this dynamic, while also thinking about how we can prevent this 
bad behavior. It is vital that publishers and universities 
remain diligent in preventing these fraudulent publications, 
and that there are consequences for engaging in this bad 
behavior.
    As so often happens, the advancement of technology is an 
important tool to help the academic community rise to this 
challenge. Emerging technologies like AI are being used today 
to combat fraud by detecting plagiarism and faulty data. We 
should be wary though, because as these tools advance, so do 
tools to enable more bad behavior. One stark example is 
presented today by Chairman Foster's experiment -using an AI 
tool to create a fake academic paper. Even more problematic--
this paper was not flagged as plagiarism by advanced plagiarism 
tools. Technology brings us new opportunities, but also new 
challenges. To combat this fraud it is important that the 
community remains diligent about the strengths and weaknesses 
of technology, and considers how additional investments in 
research can help to address this problem.
    This is one area where I believe the Federal Government can 
play an important role--funding additional research on fraud 
detection. By placing emphasis and resources on research to 
create tools to help detect and flag fraudulent papers, federal 
research agencies can provide valuable input on what methods 
and tools should be considered best practices.
    I am looking forward to hearing from our witnesses today. 
Each of them represents an important perspective in the 
academic community on how to combat this issue at a different 
stage in the process.
    Thank you, Chairman Foster, for convening this hearing. And 
thanks again to our witnesses for appearing before us today. I 
look forward to our discussion.
    I yield back the balance of my time.

    Chairman Foster. Thank you.
    [The prepared statement of Chairwoman Johnson follows:]

    Good morning. Today's hearing will consider what seems to 
be a growing threat to the integrity of scientific publishing. 
The number of papers retracted in 2021 crossed 3,500, and 
volunteer sleuths find hundreds of cases of research misconduct 
each year.
    I do not want to suggest that scientific journals are not 
paying attention to research misconduct. Quality control in 
paper submissions is a journal's bread and butter. Their 
reputations are a direct result of how successful they are in 
keeping fraudulent content out of print. But I also understand 
that if the goal is to keep 100% of fraud, fabrication, and 
plagiarism out of print, the odds are not in their favor.
    With the dawn of foreign paper mills, the production of 
fraudulent content is now systematic. Language models powered 
by artificial intelligence are growing more sophisticated every 
day, making it easier than ever to produce fake content that 
looks authentic, or plagiarize real content so that it looks 
original.
    As the methods of bad actors grow more powerful, we need to 
consider whether the scientific publishing enterprise is arming 
itself accordingly. Do journals have access to cost-effective, 
automated tools to assist with detecting misconduct before they 
even get to the peer review stage? Are there any automated 
tools that peer reviewers themselves can use to assist in their 
evaluation of original research? Are journals both motivated 
and equipped to investigate and adjudicate in a timely fashion 
any claims of misconduct that might be made about a paper that 
they have already published? Do journals always make it clear 
when it an article has been retracted for misconduct, so that 
the influence of the offending science is curtailed 
appropriately?
    Our hearing today is focused on scientific journals, which 
are privately managed and funded. Prevention and detection of 
misconduct in federally funded research is its own critical 
issue. But I want to underscore that because of how scientists 
lean on the other work of others, scientific integrity in 
privately-funded research is still a public good. Remember that 
in order to ``see further'' in his research, Sir Isaac Newton 
``stood on the shoulders of giants.'' Scientists use the 
published work of others to inform their own findings. Those 
other scientists are often halfway around the world, trying to 
publish and get ahead in an environment that the United States 
doesn't control.
    If fraudulent work from any nation is allowed to persist in 
the scientific literature, it can undermine the good faith 
efforts of honest researchers. It can even influence laws or 
the behavior of the public to disastrous effect. Consider the 
fraudulent Wakefield paper, first published in 1998, which 
suggested childhood vaccines cause autism. A savvy journalist 
raised alarms about the critical flaws in this paper in 2004, 
but it was not officially retracted until 2010. It wreaked 
untold harm on public health in the interim.
    I commend the volunteers like Dr. Byrne and Dr. Stell for 
their dedication to integrity in the scholarly record. The work 
that you and your peers do is a true service to the public. I 
know that it is often done at great personal sacrifice. I also 
want to commend Mr. Graf and STM for acknowledging the threats 
to your industry and for pursuing some scalable tools to 
address it. I look forward to hearing today about how 
government can be a partner to you going forward.
    I yield back.

    Chairman Foster. And at this time, I'd like to introduce 
our witnesses. Our first witness is Dr. Jennifer Byrne. Dr. 
Byrne is a Professor of Molecular Oncology at the University of 
Sydney. Her research has helped inform international debate on 
systemic fraud within the preclinical research literature, and 
she's an advocate for improved post-publication error reporting 
and correction. Dr. Byrne was a keynote speaker for both the 
2021 Computational Research Integrity Conference and the 
Singapore Research Ethics Conference. She also chaired the 
Paper Mill Symposium at the 2022 World Conference on Research 
Integrity.
    After Dr. Byrne is Mr. Charles Graf--Chris Graf, excuse me. 
Mr. Graf is the Research Integrity Director and Leader of the 
Editorial Excellence Team in Springer Nature. He also chairs 
the Governance Committee of the STM Integrity Hub, an 
initiative launched in early 2022 to help publishers 
collaborate to protect research integrity. Mr. Graf previously 
served as the Chair of the Committee on Publication Ethics, or 
COPE, and as a member of the Program Committee for the Seventh 
World Conference on Research Integrity.
    Our final witness is Dr. Brandon Stell. Dr. Stell leads a 
research team with the French National Centre for Scientific 
Research, CNRS, studying the processing of sensory information 
in the brain. In 2012, Dr. Stell co-founded the website 
PubPeer.com to provide scientists with a forum to discuss the 
published research literature. PubPeer has since grown to be 
one of the leading sites for scientific discussion with a 
dedicated community of users who have helped strengthen the 
scientific record by exposing and correcting its weaknesses.
    As our witnesses should know, each of you have five--
whoops--yes, excuse me. As each of you should know, you will 
have five minutes for your spoken testimony. Your written 
testimony will be included in the record for the hearing, and 
you will all--when you all have completed your spoken 
testimony, we will begin with questions. Each Member will have 
five minutes to question the panel.
    And we will start now with Dr. Byrne.

                TESTIMONY OF DR. JENNIFER BYRNE,

                    DIRECTOR OF BIOBANKING,

               NEW SOUTH WALES HEALTH PATHOLOGY;

                PROFESSOR OF MOLECULAR ONCOLOGY,

                      UNIVERSITY OF SYDNEY

    Dr. Byrne. Thank you very much. So it's a pleasure to be 
here today. And I would very much like to thank Chairman Foster 
and Ranking Member Obernolte and all of the distinguished 
Members of the Committee. So my name is Jennifer Byrne, and I 
am a cancer researcher who has been studying what I believe to 
be systematic research fraud for about the past seven years.
    As we have heard, paper mills are commercialized 
organizations that provide undeclared services to authors of 
scientific and scholarly publications including fabrication, 
fabricated data, and manuscripts. This represents a significant 
threat to science in terms of both its practice and its 
reputation. And the literature must take a no-tolerance 
approach toward papers that may have been constructed solely 
for career or commercial gain.
    So there are a number of major factors that either drive 
authors toward paper mills or enable their activities, the 
first of which is unrealistic publication requirements that can 
be leveled across a broad range of authors, such as academic 
students and medical doctors, who may be--who may not be able 
to achieve the publication requirements that their institutions 
require of them.
    But the--more in the field of publishing there has been an 
increasing focus upon a commercial focus through increasing use 
of author-paid publication services. These add to digital 
publishing capacities over the last 20 years that have greatly 
increased the capacity for papers to be published rapidly and 
also have enabled the creation of new journals. In contrast, 
the experimental sciences have not experienced the same 
capacity to increase their rate of data production.
    Finally, a very important issue is the imbalance that 
currently exists between the production and the correction of 
scientific and academic publications. So most systems require 
appropriate balances between production and quality control. 
For example, cooking in a kitchen requires somebody to clean 
the kitchen afterwards. But at the moment in the scientific 
literature, the activity of production is greatly--it greatly 
overwhelms the capacity to clean and remove waste from the 
literature. This is a major advantage for research fraud and 
fraudulent publications because once they are published, they 
are very difficult to remove.
    So the scope of the presence of fraudulent papers from 
paper mills within the scientific literature is largely unknown 
and--because this has been understudied. We recently screened 
just under 12,000 human gene research papers where we 
identified over 700 papers with errors that could signal paper 
mill involvement. Extrapolating from screening a very tiny 
fraction of the literature, I would estimate that the human 
gene literature contains more than 100,000 papers that have 
been produced by paper mills. This is a very serious issue, and 
the overall presence of paper mill publications and literature 
could be much higher because, obviously, many disciplines 
beyond human gene research have been targeted.
    So the possible ramifications of large numbers of 
fraudulent papers are very concerning. For the research 
community, it is very likely that these papers are already 
misleading researchers in their research directions. They can 
damage research careers at all stages, encourage the support of 
unproductive research directions, and slow research 
translations through opportunity costs.
    So clearly, given this scale of paper mill contributions, 
automated tools are necessary for the identification of the 
products of paper mills. We have used automation to screen 
papers for wrongly identified nucleotide sequences. These are 
reagents that are used in experiments, and their identities 
cannot be determined by the human eye, but they can be verified 
by appropriate detectors.
    So the Seek & Blastn tool that was created by Dr. Cyril 
Labbe in Grenoble in 2017 uses an automated system of 
detection. Experience with this tool indicates that it provides 
a scale that cannot be matched by human experts. But its 
results need to be checked by humans in order to avoid false 
accusations of research errors, and clearly, this type of 
support can be difficult to obtain through research grants.
    Tools such as Seek & Blastn can also be used by paper mills 
to remove errors from their papers and create papers that are 
more plausible and more likely to be published. So it is very 
important in my view that we move toward targeting paper mills 
through features that represent their business model as opposed 
to features of their products.
    Publishers are now likely to be actively screening 
manuscripts for features of paper mills as an attempt to both 
detect and deter future submissions. We have proposed that 
another method that could be taken would be to require all 
research manuscripts to be posted to preprint service at the 
time of submission to reduce the duplicate submissions that 
Chairman Foster referred to in his opening address. We also 
believe that more aggressive steps are required to specifically 
disrupt the paper mill model such as to delay manuscript 
submissions through compulsory registrations at least one year 
prior to manuscript submission. This would not deter 
experimental scientists but would greatly damage the rapid 
publication timeframes that paper mills rely upon.
    We would also like to see journals turning the same tools 
that they're using for screening manuscripts into their own 
archives to identify the papers from paper mills that have 
likely already been published and that are already misleading 
researchers in their daily work.
    The Committee on Publication Ethics has recently described 
the need for retraction processes to rapidly adapt in response 
to the possibility of paper mills. We have proposed that 
journals could rapidly flag papers with verifiable errors using 
neutrally-worded notices such as editorial notes before 
investigation starts, as opposed to when they conclude so that 
researchers can be aware of papers having problematic features.
    So, in summary, paper mills represent an unprecedented 
challenge to scientific and academic publishing, but they also 
provide a tremendous opportunity to enact transformational 
change. This can be achieved by increasing the oversight of 
scientific publishing, recalibrating our capacity to correct 
published information, as well as to produce new information 
and overhauling the reward systems that underpin the careers of 
researchers and other professionals who publish it within the 
academic literature.
    Thank you very much again for this opportunity to speak 
before the Committee, and I'll be very happy to answer any 
questions.
    [The prepared statement of Dr. Byrne follows:]
    
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]    
    
       
    Chairman Foster. Thank you. And next is Mr. Graf.

                  TESTIMONY OF MR. CHRIS GRAF,

         RESEARCH INTEGRITY DIRECTOR, SPRINGER NATURE;

                 CHAIR OF THE GOVERNANCE BOARD,

                 STM ASSOCIATION INTEGRITY HUB

    Mr. Graf. Thank you, Chairman Foster and Ranking Member 
Obernolte and esteemed Members of the Committee for inviting 
me. I'll give an overview following Jenny Byrne's overview of 
what research publishers are doing to safeguard research 
integrity.
    As introduced previously, I'm Chris Graf, the Research 
Integrity Director at Springer Nature, which is one of the 
world's leading research publishers and Chair of the Governance 
Committee for the Integrity Hub, which is a collaborative tech 
initiative from the STM Association, which is the global trade 
body for research publishers.
    My written testimony explains how the research publishing 
sector is facing one of its current challenges, namely that of 
paper mills and research misconduct. And I conclude in that 
written testimony that the opportunities exploited by paper 
mills are created somewhere upstream where research is done. 
Publishers can and are doing more to stop papers from paper 
mills, and other actors also have a responsibility, including 
the organizations that fund and employ and set policy for 
researchers and for research. I'd argue that's where the 
solution lies in a broad coalition of those who are able to act 
to make change happen.
    I would like to use the rest of my five minutes for some 
background. I'll talk briefly about science then briefly about 
publishing, and then a little more extensively about paper 
mills. First, science, so trust in science remains strong. The 
2021 survey from NORC at the University of Chicago reports that 
48 percent of Americans have a great deal of confidence in the 
scientific community. And members of those scientific 
communities published--well, my numbers say 5 million peer-
reviewed scientific articles in '21. That's from a dimensions 
data base. That contrasts with the 2.9 million that the NSF 
reported, so lots of millions, but somewhere between 2.9 and 
five. Within that, if we zoom in a little on COVID, there are 
now 630,000 COVID papers in the World Health Organization data 
base for literature on COVID, on coronavirus disease. Half of 
those were published last year in 2021. So that's a view across 
science and a bit about publishing that science.
    But we're here to talk about paper mills and misconduct. So 
that's when things go wrong or actually very wrong. What 
happens then? Well, you may know but we retract scientific 
papers when un-addressable concerns are identified. Those 
concerns range from honest and fundamental errors that might be 
embarrassing for a researcher, but, you know, that researcher 
should be applauded for addressing them and for retracting 
those papers and clearing up the inaccurate information they've 
published. And they range from those honest and fundamental 
errors through questionable and misleading research, which 
could be naive and might be negligent, but probably isn't 
malicious, right through to misconduct, including that promoted 
by paper mills.
    And that kind of retraction doesn't happen often. 
Historically, 4 in 10,000, peer-reviewed science articles are 
retracted after publication. Zooming in to look at that through 
the lens of COVID, about 300 of the 630,000 COVID papers 
published so far have been retracted. And I think that's 
similar to the general rate that I described earlier, 4 in 
10,000.
    You know, I think that's an indicator of significant and 
successful investments made into quality and into integrity by 
researchers first and also by publishers when it comes to 
publishing them. I'd argue that the contribution that research 
publishers make to quality and integrity is--well, I'd argue 
it's true. It's based on years of collaborative efforts. 
Publishers with other stakeholders for years have been 
developing and sharing resources about how to manage honest but 
fundamental mistakes through to the other end of the range that 
we talked about earlier, to misconduct and systemic--systematic 
manipulations.
    And publishers continue to invest in screening for 
integrity, including routine checks for plagiarism that we've 
heard about, but they're being enhanced and improved, as well 
as other indicators for ethics and quality like the disclosures 
of conflicts of interest. Some publishers, as Jenny referred 
to, are beginning to roll out screening for image manipulation, 
which is a newer fingerprint that might indicate the presence 
of a paper mill. And both of these require investments not only 
in technology but in actual people to use that technology.
    Even so, I agree that paper mills are a growing threat. 
Evidence suggests they're operating with relative freedom. When 
they find a way into a journal that has weak defenses, they 
certainly exploit that, and they do cause real damage. We've 
referred to that already. They steal credentials from 
legitimate researchers, for example, they con their way into 
editorial positions of power at journals, and then they use 
that to their advantage. And that's identity theft, and that's 
fraud. They do many other inappropriate things as well.
    So let me close with what I think the challenge is. 
Legitimate researchers currently benefit from a largely trust-
based system. Solving the paper mill problem without making 
publishing harder and less trust-based for the vast majority of 
legitimate researchers I think is the challenge. The new STM 
Integrity Hub shows how publishers are taking collective action 
and using their combined knowledge and technology to do just 
that.
    So that's where I'll end. Thank you for the opportunity to 
present on how the publishing sector is responding to the 
challenges of research misconduct. It really was a privilege 
and is a privilege to be part of today's hearing. I genuinely 
look forward to your questions and also to continue to serve in 
any way that might be useful for you. Thank you.
    [The prepared statement of Mr. Graf follows:]
    
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]    
    
    
    Chairman Foster. Thank you. And next is Dr. Stell.

        TESTIMONY OF DR. BRANDON STELL, NEUROSCIENTIST,

        FRENCH NATIONAL CENTRE FOR SCIENTIFIC RESEARCH;

        PRESIDENT AND CO-FOUNDER, THE PUBPEER FOUNDATION

    Dr. Stell. Chairman Foster, Ranking Member Obernolte, and 
all distinguished Members of the Committee, it is an honor to 
join the hearing today. I'm a U.S. citizen and neuroscientist 
with the French National Center for Scientific Research and 
President and Co-Founder of the PubPeer Foundation, which is a 
nonprofit organization that maintains the website PubPeer.com. 
My testimony today is my own and does not necessarily reflect 
the views of the CNRS.
    In the fall of 2012, I launched PubPeer with the help of 
two colleagues, aiming to provide a forum for scientists to 
discuss the scientific literature. Today, I run it with the 
help of Boris Barbour, another CNRS researcher. My motivation 
for the website is to capture discussions of the scientific 
literature that we scientists typically have in the lab and 
share them publicly to help other scientists evaluate the 
scientific literature. The site has witnessed the emergence of 
a community of expert reviewers and helped expose low-quality 
research. The numbers--numerous regular PubPeer users are now 
clearly more experts than most journal and institution staff 
when it comes to the forensic examination of the literature. We 
currently receive around 3,500 comments and 700,000 pageviews 
per month.
    Today, I will share insights from the first 10 years of 
running this website. When we launched this site, we received a 
flood of comments pointing out serious flaws in the literature. 
It was as if there was a backlog of unprocessed problems, and 
the website provided a new release valve to share them. Prior 
to the website, the procedure for sharing such problems was to 
write to the authors, their institutions, and the journals and 
hope that one of them would correct the record. Even when this 
process succeeded, it was extremely slow, and conflicts of 
interest often discouraged any action.
    The site allows this blockage to be circumvented. Issues 
can be immediately displayed online so that anyone interested 
can be made aware and authors can respond. Shortening by months 
or often years the time it takes to find out about these issues 
in the literature saves researchers' time and ultimately tax 
dollars that would have been spent trying to build on flawed 
research.
    How can this be? How did these issues find their way into 
the literature? Perhaps the underlying reason is not too 
surprising. With the expansion of science, we continue to 
create new and important advances at a faster rate than ever, 
recent examples being development of vaccines during the 
pandemic and the James Webb Telescope. However, this expansion 
creates challenges for identifying and supporting the best 
scientist and scientists--science and scientists.
    Job postings for faculty researchers and funding 
opportunities can now receive hundreds or thousands of 
applications, and shortcuts for screening those applications 
become more and more tempting. It's much faster to look up 
metrics about a journal where an article is published than it 
is to actually read the article. And applicants, perhaps 
falsely, believe that those metrics are the key to the 
advancement of their careers. In that atmosphere, it's easy to 
see how many of these issues we see raised on PubPeer find 
their way into the literature. Instead of publishing a boring 
result in a journal with a lower metric, there are incentives 
to select, misrepresent, or even falsify data in an attempt to 
give a falsely positive result that would land in a journal 
with higher metrics.
    A sensational example of the problem is paper mills, which 
produce articles for the sole purpose of artificially inflating 
the publication and citation metrics, while hoping that nobody 
ever reads them to see that they are fake. They do get 
published, sometimes by reputable publishers, but perhaps 
that's not too surprising since journals collect fees for every 
article they publish, regardless of its quality.
    Although I believe these paper mill articles cause little 
harm to the overall progress of science since they would rarely 
be confused for real scientific results by scientists, they do 
highlight the underlying problem with scientific publishing. 
Incentives need to shift to place higher importance on the 
content of articles and not on the metrics.
    How can we fix this problem? The metrics surrounding 
articles are now ingrained in the community and unlikely to 
disappear anytime soon, even if they should. Commentary on 
sites like PubPeer can provide parallel sources of information 
that can be much more informative. If scientific commentary 
continues to grow and involve diverse sections of the 
community, evaluation committees could start relying on it more 
than the current metrics so that incentives might shift back 
toward solid reproducible results that stand up to public 
scrutiny.
    If contributions to this body of evaluation were rewarded 
when evaluating researchers for funding, promotion, and prizes, 
it is likely that scientists would participate to a greater 
extent. The Federal Government through its funding of science 
could play a huge role, but that potential influence is largely 
unrealized today. To our knowledge, funding agencies like the 
NIH (National Institutes of Health) and NSF have no procedures 
to exploit information available through community curation 
sites like PubPeer.
    In addition to providing evaluation of the publications 
referenced in grant applications, the information from these 
sites could be used to reward scientists that make exceptional 
contributions to public evaluation.
    I appreciate the Committee's interest in this very 
important issue, and I thank you for your time and look forward 
to answering any questions you may have.
    [The prepared statement of Dr. Stell follows:]
    
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]    
       
    Chairman Foster. Thank you. And at this point, we will 
begin our first round of questions, and the Chair will now 
recognize himself for five minutes.
    So first, to all our--all of our witnesses here, in your 
testimony, you all discuss the problem of incentives. The 
scientific community, you know, still largely follows the model 
of publish or perish where simply getting papers out the door 
is a big component of advancing a career. And so, briefly, what 
would be involved in generating better matrix--metrics that the 
scientific community could use to assess the quality of the 
research? For example, do the metrics adequately, you know, 
punish the ratings of researchers that have a high rate of 
retracted papers? Or maybe the traditional H index, you know, 
which I presume has been badly corrupted by paper mills you 
know, maybe it should--instead of being recorded as a positive 
integer should be recorded as a complex number with both a real 
and imaginary component for the H index.
    So if you could just speak briefly on what improved metrics 
are under under consideration at this point, and we'll go in 
reverse order. And I'll start with Dr. Stell.
    Dr. Stell. So I personally feel that metrics in general are 
a horrible way of evaluating science. I think that we need to, 
as a community, put more focus on the content of the articles, 
read them, and evaluate scientists for what they publish and 
not the metrics associated with those articles. And I think 
that one way of accomplishing this is by creating this body of 
information from scientists that are commenting on articles and 
creating a secondary source of information in addition to these 
metrics that will hopefully overtake the use of metrics one 
day.
    Chairman Foster. And, Mr. Graf?
    Mr. Graf. Thank you. Yes, that's a good question. The 
things that I read about research assessment will point in the 
same sort of direction that Brandon just described, toward a 
more qualitative approach to understanding the impact of a 
piece of research. And in some nations that--I believe that's 
done through descriptive case studies where researchers prepare 
a report about a piece of work that explains its importance. 
And then the Research Assessment body uses that, perhaps with a 
suite of metrics, not relying on one, but with multiple metrics 
to form an opinion and to reward that researcher. So I 
definitely think the mood in the room is a distinct desire to 
move away from reductive metrics and toward something that's a 
lot more reflective, the complex nature of research itself, and 
much more qualitative.
    Chairman Foster. Yes, Dr. Byrne?
    Dr. Byrne. Thank you. Just to add to those comments, I 
think it's helpful if there are publication requirements 
leveled on professionals, that those requirements or 
expectations recognize people's training and capacity to 
conduct research. Clearly systems where, for example, medical 
doctors are required to publish papers when they have neither 
the time, the resources, or the training to conduct research, 
that seems like a bad system. I would prefer a medical doctor 
to be assessed based upon their capacity to care for patients 
and provide cures.
    In terms of other forms of metrics, I think a number of 
commentators have thought about the desirability of the journal 
impact factor to also reflect other dimensions of publisher 
activity such as the capacity to correct the literature, as 
well as to produce it. Thank you.
    Chairman Foster. And so another thing that I think has 
occurred to probably all of us, reading your testimony, is the 
role of government in supporting more and better automated 
tools on this. Is that--you know, I loved, Dr. Byrne, your 
analogy with waste removal, that we just need a system for 
waste removal in the system. And so are--and we--you know, in 
normal communities throughout the country, part of the budget 
goes into waste removal, and maybe part of the solution is just 
turning up the fraction of our municipal budgets we would 
devote to that.
    Are there specific proposals that you've seen that seem to 
make sense for--you know, for putting more muscle into this 
effort?
    Dr. Byrne. Look, I think--I've thought about this a lot, 
and it is unusual that science is so geared toward publication 
or production and sort of relentless production without that 
capacity to just cleanup every so often. So there could be a 
system where, you know, a certain proportion of research 
budgets could be devoted to that particular activity. Another 
system could be that if you are given research funding to do 
original research, that perhaps a proportion of that budget 
could be also devoted to a certain kind of quality improvement 
activity within the literature.
    Chairman Foster. Thank you, and my time is expired. But I 
love the idea of financial incentive somehow. I've learned 
and--when I moved from science to politics is that people 
respond awesomely to them, so that if there was some bounty on 
taking down garbage papers, I suspect we've got a very active 
international community collecting that bounty.
    And so at this point, I'd like to recognize the Ranking 
Member for five minutes of questions.
    Mr. Obernolte. Thank you, Mr. Chairman. And thanks to the 
witnesses, a really interesting topic here.
    So, you know, I think it's always interesting to 
concentrate on the--you know, the most negative consequences of 
the problem that we're discussing. So the attorneys call that 
the parade of horribles, the worst thing that could possibly 
happen. And in my opinion, when you're talking about paper 
mills, we've got two potential consequences. You've got 
fraudulent research getting rewarded by unqualified people 
graduating or by unqualified people earning promotion, right? 
So we can attack that perhaps with tools to identify these 
fraudulent papers and hopefully create consequences for the 
people that are using the paper mills.
    But to me, the more consequential problem is the impact on 
research integrity, this idea--and I think Dr. Byrne brought it 
up a little bit--that we're--researchers might be misled in 
their daily work by the presence of these fraudulent papers in 
the body of literature.
    But I want to tunnel down a little bit on the likelihood 
that that would happen. And I'd like to open this up to anyone 
that wants to comment of our witnesses on this because my first 
question is, wouldn't--if you were an editor of a journal and 
you're reviewing a paper that was submitted and it came from a 
paper mill, wouldn't it be pretty immediately apparent? I mean, 
right now, the paper that Dr. Foster generated, you know, 
that--anyone with a technical background could read it and 
realize that there were inconsistencies. But, you know, we're 
looking at maybe one more generation ahead, you know, where AI 
tools might create something that would convince someone that 
wasn't an expert in the field. But if you're an editor in a 
journal, I mean, you're familiar with what the state-of-the-art 
is in various fields. Wouldn't you immediately know, you know, 
reading a paper, well, this doesn't make any sense because, you 
know, really what people--the topics of research are A, B, and 
C, and, you know, this isn't even consistent with that. 
Wouldn't people know about that?
    Dr. Byrne. Thank you. Look, I might just--I'll stop and 
just give a brief answer. I think that some paper mill papers 
are actually highly plausible. They have already been accepted 
for publication, which means they've passed editorial review, 
they've passed peer review, and they've moved into the 
literature. So they're actually highly plausible, and they very 
closely resemble genuine papers. And I think that's the great 
danger. I mean, I can certainly think of many researchers who 
would read the kinds of papers that we study, particularly 
students and early career researchers that have not spent 30 
years reading the literature as I have. These papers are highly 
plausible, and they are capable of misleading people purely 
because of that.
    Mr. Obernolte. Interesting.
    Dr. Byrne. Thank you.
    Mr. Obernolte. So a follow-on question, you mentioned peer 
review. And I am astonished that peer review wouldn't help us 
take care of this problem because, ostensibly, if you've got a 
panel that's doing peer review on a paper, those people are 
familiar with, you know, the cutting--the--where the cutting 
edge of the research in that field is. I mean, I remember when 
I was in graduate school. For, you know, one brief shining 
moment, you're supposed to be the world's expert at a certain 
very narrow field, and so one brief shining moment, you know, I 
knew everything there was about public sector budgeting and 
research on various budgeting methodologies because that's what 
my dissertation was on. And at that moment if you had given me 
a paper on something related to that field, I would have been 
able to say, wait, I know there hasn't been any research on 
that at all and this doesn't actually make any sense because 
here are the topics that people are researching. So doesn't 
good--a good peer-review system help us fix that problem?
    Mr. Graf. May I take up the response there?
    Mr. Obernolte. Dr. Graf, go ahead.
    Mr. Graf. If I could try and tell a brief story to explain 
why the--how the paper mills navigate around both editors and 
peer reviewers, that might help. So, last year, I became aware 
of an editor-in-chief of a journal who'd become concerned about 
the volume of content being published and the scope of the 
content being published in a guest-edited issue that he had 
appointed a guest editor for. He emailed the guest-edited 
issue, who he thought was--guest editor who he thought was the 
guest editor for that issue. And the guest editor, she 
responded saying this has got nothing to do with me. So then we 
checked the email addresses and noticed that the guest editor 
was using an email address very close to the email address of 
the legitimate researcher who the editor-in-chief thought he'd 
appointed but not the email address. And so our assumption is 
probably the truth, that a paper mill has--was using identity 
theft and identity fraud to place a fake guest editor in charge 
of this guest-edited issue. That fake guest editor was then 
appointing fake peer reviewers to fake peer review the content 
and passing the content through as if it were legitimate and 
essentially, you know, then triggering the publication of that 
content.
    So paper mills are devious and have worked ways around the 
largely trust-based and professional courtesy-based system that 
has been working quite well for decades, if not hundreds of 
years, and we need to tool up to prevent that better.
    Mr. Obernolte. Right. Well, it's interesting. You just 
raised a--well, perhaps we'll have a second round here. I see 
I'm out of time, and I don't want to abuse the process. But 
it's interesting that, Mr. Graf, you've just raised, you know, 
kind of another follow-on problem, which is the corruption of 
the peer-review process, in the service of these paper mills, 
which, you know, I think is--you know, I put that up, along 
with the other consequences of what we're discussing here. But 
thank you very much. I'll yield back, Mr. Chairman.
    Chairman Foster. Thank you. And I think we will attempt to 
have a second round of questions if time and Members' 
attendance allow.
    We'll now recognize Representative Casten for five minutes.
    Mr. Casten. Thank you so much. This is a fascinating 
conversation. I want to just start--I want to understand from 
Dr. Byrne in your research, do you have any sense--and I 
realized that you've--you're trying to quantify it. Do you have 
any sense of how much of the issue is plagiarism versus fraud?
    Dr. Byrne. That's a very good question. We mostly--we don't 
really study plagiarism in detail. My background as a molecular 
biologist is the kinds of reagents that are used to construct 
experimental results. So we study the reagents that are used 
and whether those reagents are correctly identified. Sometimes 
we do see reagents appearing across model papers when that 
seems highly unlikely, so that indicates that there may be some 
role for plagiarism. But we feel that it is more likely that 
most paper mills are creating papers according to templates, so 
they have a--kind of a basic skeleton of structure and then 
they fill in the gaps in different ways. And they often----
    Mr. Casten. OK. And----
    Dr. Byrne [continuing]. Target topics that are not very 
well understood and so peer reviewers have no knowledge of 
these papers and so can't critique them.
    Mr. Casten. And I ask this because it seems like the the 
tools that we might have to address them are different, you 
know, and there's ways to detect fraud, and there's different 
ways to detect plagiarism. I'm curious for your thoughts--and I 
guess this would apply in either case. There certainly has been 
some talk in the--you know, especially in sort of the bioethics 
community about should we mandate that researchers publish 
negative results since there's no real incentive to do that? 
And all of us who have ever worked in a lab knows that most 
experiments don't get you publishable results. If--and I don't 
know how we do that. But if we were to mandate that--you know, 
that labs or researchers had to publish, you know, all their 
experimental results even if they were negative, would that 
help or hinder this problem?
    Dr. Byrne. Look, I don't know. I think that the publication 
of negative results is very important, but one of the issues 
around publishing negative results is all of these publications 
take time, and I think it will be difficult to incentivize that 
process. I mean, researchers, like everybody, are more 
interested in something that they are intrinsically interested 
in, and sometimes they don't find negative results particularly 
interesting. But I agree that I think we have to find ways of 
removing this incessant focus upon results that must be 
positive, and we need to be teaching our students that a 
negative result is just as important as a positive result. It's 
a result.
    Mr. Casten. Yes, I mean, I wonder sort of, you know, to use 
a bad baseball analogy, if I only knew that a hitter--if I only 
knew the stats on a hitter when they got hits, I might not--you 
know, I wouldn't know the difference between Ted Williams and, 
you know, whoever's the third string on a baseball team. And so 
if I knew a researcher was batting 800, I'd be a little 
skeptical, right? Anyway----
    Dr. Byrne. Yes, that's----
    Mr. Casten. The--this is a very wonky one, and I suppose 
this would only work for falsified data. There's this wonderful 
little numerical trick that, you know, in any sequence of 
numbers that sequential, if you look at the probability of the 
digits, ones are more likely than twos, twos are more likely 
than threes. Is there a way to automate that? You know, because 
I would imagine if I'm--you know, if I'm reporting the number 
of colonies in an agar plate, that's a very hard thing to fake 
and--but it's algorithmically testable. Is that worth time? Are 
people already doing that? Is that just a dumb thing that I 
read about years ago that's not relevant today?
    Dr. Byrne. No, no, it's certainly not a dumb thing. I mean, 
people are certainly looking at that kind of thing, 
particularly clinical trial data, patient data, where it's 
actually very, very difficult to fake random data. So I think 
the answer is that we need different kinds of tools for 
different kinds of data and different kinds of science. And we 
don't have all of those tools. We have some of them now, but we 
don't have them all.
    Mr. Casten. So my last question is sort of deeply 
philosophical, and I'll try to get this off in a minute. It's 
always struck me that whether you're doing basic science, you 
know, at the lab bench or doing research, you know, going 
through the literature review, I always thought Immanuel Kant 
got it right, that, you know, all you can do is prove things 
wrong, you can never totally prove them right. And things are 
more likely to be true when you try to falsify them and fail. 
And I think our human brains are really good at that kind of 
analysis, you know, the--our ability to say, well, if this 
thing is true and the causality arrow points in this direction, 
then that would imply that this other thing is true. Let me see 
if that's the case. And if not, I got some problems over here. 
I don't know how you write algorithms to do that. I think our 
brains are just sort of uniquely set up to do that. And it's 
what the traditional peer review process is really good at.
    Is that even a--is it even algorithmically possible to do 
that sort of Kantian falsifiability? And if so, or if not, is 
there any way to sort of satisfy that a paper that is deemed 
worthy of publication someone has attempted to do that 
falsifiability and failed? Does that make sense? I realized I'm 
getting very philosophy of science there, but does that make 
sense as an approach?
    Dr. Byrne. Look, it makes sense. I don't know if it's 
possible, but that's--I'm not an algorithm person, so, you 
know, perhaps one of the other witnesses could answer.
    Dr. Stell. So yes, I think that this would be possible. 
This sort of thing might be possible. But I think that the more 
algorithms we build, people are going to find ways around these 
algorithms. And so I think that, you know, this is maybe not--
shouldn't be our focus is trying to find every individual 
instance of fraud but changing the incentives so that fraud is 
no longer a winning strategy, that we put the focus on the 
content of the articles and make it so that it's just not 
important to do fraud anymore.
    Mr. Casten. Thanks. And I yield back.
    Chairman Foster. Thank you. I'll now recognize 
Representative Bice for five minutes.
    Mrs. Bice. Thank you, Chairman Foster, and thank you for 
the witnesses for being with us this morning.
    My first question really is to any of the witnesses, and 
that is how long do you think that paper mills have been 
impacting academia? And how many of them do you believe that 
are based in Russia or China?
    Dr. Byrne. I can start. In terms of how long they've been 
operating, I don't think we have a clear answer for that. I 
would estimate that at least since 2008, so possibly for about 
15 years. In terms of the numbers of paper mills, again, if I 
refer to the literature, a paper was written in 2013 that 
estimated the number of of ghostwriters that were operating in 
China at that time in 2011, they estimated nearly 1,800 full-
time-equivalent ghostwriters in China. There's been very little 
research done on this topic, and so I think the answer is--you 
know, the answers today are not clear.
    Mrs. Bice. Thank you. Mr. Graf, did you want to follow up 
with that?
    Mr. Graf. Thank you. I could add a little. I think it's 
true that the strategies and tactics that paper mills have been 
using since data Jenny Byrne cited have changed. In my--the 
information I've studied, the use of AI and algorithms to 
generate text in articles probably began around 2019, and 
that's when you've got a massive change really in the tools 
that paper mills have got available to them. So that talks both 
to sort of how long and also to change in their practices, 
which refers back to what Brandon said, which is the sort of 
arms race thing. It really would be--it would be beneficial to 
get out of the arms race because--and really address the 
incentive system. I think that's probably the right way 
forward.
    Mrs. Bice. How do you--to follow up on that particular 
point, Dr. Stell, how do you--what do you think we should be 
doing to try to address this in a way that still allows for 
researchers to be able to publish papers, but it's not solely 
focused on the data? Where's the fine line there to be able to 
publish a research paper that has value, that can be utilized 
by the community, but yet isn't solely--isn't isn't focused on 
specifically the data that's contained within it?
    Dr. Stell. Thanks for the question. Yes, I think that 
anything we can do to put more focus on the content of articles 
is going to help enormously. And I think the one thing that 
hasn't been done that could be done is rewarding people for 
contributing to a body of commentary. If we have this body of 
commentary, this is going to take the focus off of metrics and 
put it on expert opinion of scientists. And if we can reward 
expert scientists for their commentary, we're going to get more 
participation, and then these evaluation committees are going 
to start using that expert evaluation. And so I think that for 
me the way forward is to just create another body of evaluation 
parallel to metrics, which are going to continue to exist for a 
little while but perhaps could be replaced by more expert 
informative opinion.
    Mrs. Bice. And on Dr. Graf's--I'm sorry, Mr. Graf's point 
and, Dr. Stell, you're welcome to comment as on this, but I 
think that the--one of the concerns I have being a Member of 
the House Armed Services Committee is how do we ensure that 
these research papers aren't being influenced by foreign 
governments in a way that could have a negative impact on 
security? Any thoughts there?
    Dr. Stell. Go ahead, Chris.
    Mr. Graf. No, I think it's a question about misinformation, 
isn't it? And I don't think that the motives behind all of this 
are to promote misinformation. I think they're the simplest of 
motives. Paper mills want to earn money. They earn it when 
researchers give it to them. Researchers want to earn money. 
They earn it when they get a paper published and when they get 
promoted. So I don't know. I don't have evidence to suggest 
there's a conspiracy from foreign agencies going on.
    Mrs. Bice. You don't think that there's sort of 
falsification in a way that American academia would be impacted 
by that falsification?
    Mr. Graf. I don't think that the motives are to impact 
American academia, no. I think the motives are very isolated to 
the individual researcher who's buying this bogus service from 
this--from the paper mill.
    Dr. Stell. Can I jump in?
    Mrs. Bice. I think my time has expired. I yield back.
    Chairman Foster. Well, thank you. And I think we will now 
start a second round of questions, and so I will recognize 
myself for five minutes.
    So I'd like to return to the subject of trying to get the 
financial incentives right. Because, as Mr. Graf pointed out, 
that really drives a lot of this. You know, I've often 
fantasized that if it cost people 50 cents to send me an email 
and then they got the 50 cents back if I saw fit to respond to 
it, that my spam filter would have a lot less work to do. And 
so, first off, has anyone estimated how many man-hours would be 
actually required to adequately peer review this flood of 
papers and what the total cost of that would be and the cost 
per paper?
    Mr. Graf. Not to my knowledge.
    Chairman Foster. Even order of magnitude? OK. If you can 
just reply for the record, I'd be interested in that. Because 
that's--you know, it seems like a daunting number of man-hours, 
and you want to engage the best and brightest in any field 
toward reviewing those papers, and it's a huge burden to put on 
their time and might not be the best use of their time.
    Mr. Graf. I can add a little information, but it's not 
really about peer review. It's about the internal review in an 
investigation of potential paper mill papers that I've been 
conducting with my team. And that review has been ongoing since 
September last year and has taken--I hadn't added up the 
person-hours, but there's been a team of, let's say, five 
people. It's more than five, so maybe it's between five and 10 
people working at least part-time, sometimes full-time, on the 
exercise, and they have spent money on consultants as well. 
So--and that's been focusing on a total kind of universe of 
about--where we looked at about 3,000 papers, so that's not 
looking at the whole world, right? That's only looking at a 
small part of the world. And there you go. That's the sort of 
sketch that I've got for you, the amount of--for you about the 
amount of effort that it's taken post-publication, not with the 
peer-review community but with colleagues internally at a 
publishing company.
    Chairman Foster. Now, would a more significant cost to 
submit a paper for review be an effective partial remedy, or 
would that really place an unacceptable burden on emerging 
researchers? You know, for example, if--you know, when you got 
a grant, five percent of the grant money you could allocate 
toward getting any publications reviewed, and then you'd spend 
that money where you thought it would do the most good. Are 
there incentives like that that could be put into place?
    Mr. Graf. That's an interesting idea.
    Chairman Foster. Has that ever been talked about?
    Mr. Graf. Publicly, there's a couple of different campaigns 
to--led--there's one led by James Heathers to--a gentleman 
called James Heathers to claim payment for peer review. And 
perhaps if--I don't know how much sort of ground that movement 
has been made--has made. I do worry about equity and access to 
publishing services.
    Chairman Foster. Yes, so presumably, this would be based--
you know, the fees will be based on ability to pay based on 
your situation and your country's situation.
    Mr. Graf. Yes.
    Chairman Foster. But--and that--but maybe that's a partial 
solution. You also mentioned identity fraud, and there is a lot 
of progress. In fact, where I'm marking up this week a bill 
that we're pushing forward to secure digital identity. This is 
just providing tools for individuals to prove they are who they 
say they are online and also to attach verifiable credentials 
to that digital identity so you can't fraudulently claim, you 
know, basically, credentials that you haven't earned. And so is 
that--is there--are there things underway in the academic 
community already? It also provides a mechanism for punishing 
people that abuse the system or at least identifying them so 
that they can be dealt with with appropriate suspicion? Are 
there any--anything along those lines underway?
    Dr. Byrne. I think I can just speak to that briefly. So 
there's a system of author identity called ORCID (Open 
Researcher and Contributor ID) that has been running for some 
time. Some journals are now requesting that all authors have 
ORCID identities as a way of combating paper mills. But I think 
paper mills are very adept at getting around these kinds of 
fairly small hurdles that we place in front of them. So there 
is evidence that paper mills then simply take out ORCID 
identifiers for their potentially real or fake authors. So 
that's a major issue, I think. I think that probably also 
pertains to paying fees for submitting manuscripts. Paper mills 
would probably be willing to pay those fees.
    Chairman Foster. OK. But presumably when you claimed an 
academic credential from--and then attempted to attach it to a 
fake identity, at some point, the university whose academic 
credentials are being stolen would blow the whistle on you.
    Anyway, I will--I'm out of time here, and I'll yield to the 
Ranking Member for five minutes.
    Mr. Obernolte. Well, thank you, Chairman Foster.
    I want to talk a little bit more about research integrity 
because I think that that is the most dangerous consequence of 
these false and fraudulent papers floating around. So, you 
know, the presence of misinformation is an issue that we are 
dealing with as a larger society. It's not just academia, and 
it's not just research, certainly no social media. There's lots 
of information out there that's true, there's lots that's not 
true, and there aren't a lot of tools for a user to try and 
figure out what's--how to differentiate between those two.
    But we've got a tool in academic literature that is not 
available to social media, and that is the hierarchy of 
different scientific publications. I mean to call something 
literature ignores the complexity of the fact that scientific 
publishing is not monolithic. And we've got journals that are 
highly respected in their field and then journals that are less 
so.
    So, you know, a question for anyone who would like to 
answer. Does--is there a way we can leverage that hierarchy? 
You know, the fact that there are journals that can serve as 
highly trusted sources of information, can we leverage that to 
help us solve this problem?
    Mr. Graf. If I may start, one of the things that we can do 
is really get behind the transition, or transformation even, 
away from subscription publishing to open-access publishing. 
When we make all of the more trustworthy information that's 
available in journals, including those of the top end of the 
hierarchy, more open, it's there as a counterbalance to the 
misinformation that is freely and openly available on the web. 
So I think there's an argument there for that transformation to 
open.
    Mr. Obernolte. Well, Mr. Graf, I agree that that would be 
great, but, I mean, there are monetization problems with doing 
that. How would you solve those? How would you alter the 
monetary incentives that empower this current subscription 
model?
    Mr. Graf. That's a whole query of its own. But yes, we're 
working on that as--across the research publishing sector, and 
we're, you know, intent on making the transformation--moving 
the money that is currently being spent on subscriptions into a 
way to enable the--those journals to then be open. So it's--
yes, it's complicated. And one type of openness won't suit all 
research disciplines and all journals or regions on the planet, 
but that's our goal.
    Dr. Stell. One thing I would just like to add to that is 
that there have been studies looking at the impact factor of a 
journal and the number of errors that are found in the journal. 
And the higher-ranking journals are not necessarily immune to 
the problem. So I think if we start relying on these tiers to 
tell us which is reliable information, it's not as accurate as 
we would hope it would be.
    Mr. Obernolte. Interesting. I wouldn't have guessed.
    Dr. Stell, while I've got you, one of the things that I was 
fascinated by is in the previous round of questioning you were 
talking about how some papers, fraudulent papers had been 
identified on PubPeer. And I'm wondering, you know, as we 
grapple with this issue of eliminating the commercial and 
academic incentives for publishing fraudulent papers, what were 
the consequences for the authors of the papers that your 
website identified? Did they have consequences?
    Dr. Stell. There are examples of absolutely zero 
consequences, but there are also examples of people being fired 
from their positions for having been caught cheating and 
exposed on PubPeer. So the consequences range.
    Mr. Obernolte. So when people--when there were 
consequences, what triggered the consequence? So if someone on 
PubPeer looks at a paper and says this was generated by a paper 
mill, here's my proof, here are the, you know, inconsistencies. 
And so, you know, the PubPeer community agrees this paper is 
fraudulent, does someone reach out to the employer of the 
person, the author of this fraudulent paper and say that, you 
know, if they cited this paper in their resume, then they were 
hired under fraudulent, you know, circumstances? I mean, did 
someone take that affirmative action?
    Dr. Stell. That's a very good question. We're not part of 
that process. What we do is we provide a platform for people to 
discuss these things and make it public. And I have to say that 
paper mills are a real minority of the discussions that happen 
on PubPeer. But the fraudulent work is usually some sort of 
image that has been copied, some data that has been 
misrepresented. It's been exposed on PubPeer. That is public 
for everyone to view, including the people's employers and 
other committees. And so it's--presumably, it's taken up by 
those people, and they're taking actions. It's not--we're not 
part of anything other than making that information public.
    Mr. Obernolte. Right. Well, I see I'm out of time. I want 
to thank everyone for the, you know, really interesting and 
consequential hearing here. But let me reiterate my conviction 
that when we're talking about the spread of disinformation, and 
academic literature is no exception, I think that trying to 
focus on eliminating all fraudulent papers is a fool's errand 
because I think they're going to be out there, so I think the 
better solution is to try and create trusted sources of 
information where some peer-reviewed vetting takes place and 
people can have a higher degree of trust in. And, you know, to 
your point, it's clear that more work needs to be done there. 
But, you know, certainly I think that's where the solutions are 
going to be found.
    Anyway, thank you, everyone, for the discussion. It's been 
really interesting. I yield back.
    Chairman Foster. Thank you. And I just want to second the 
Ranking Member's comments about the importance of trusted 
sources. You know, it used to be--I--at least, I felt when I 
was in my career in science, if you stood up and said something 
that you know was not true, that that was a career-ending 
thing. And now, you know, as I moved from science to politics, 
it seems like you stand up and say something that's not true, 
it seems to only increase your chances of reelection. And so we 
have to do a better job, and science should continue to lead by 
example of taking a very hard line when outright fraud is 
detected. And I was encouraged to hear that at least some 
universities are--you know, you're losing tenure and you're out 
of here if you participate in any of this because that--it's 
important. Science always operates at the edge of what is 
known, and we cannot tolerate deliberate lying when we're 
trying to flesh out the details of nature's complexity.
    Now, before we bring this hearing to a close, I wanted to 
thank again our witnesses for testifying. And the record will 
remain open for two weeks for additional statements from the 
Members and for any additional questions the Committee may ask 
of our witnesses.
    So the witnesses are now excused, and the hearing is now 
adjourned.
    [Whereupon, at 11:07 a.m., the Subcommittee was adjourned.]

                               Appendix I

                              ----------                              


                   Answers to Post-Hearing Questions




                   Answers to Post-Hearing Questions
Responses by Dr. Jennifer Byrne

[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]


Responses by Mr. Chris Graf

[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]


Responses by Dr. Brandon Stell

[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]




                              Appendix II

                              ----------                              


                   Additional Material for the Record




           Documents submitted by Representative Bill Foster
           
 [GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]