Monday, December 26, 2022

Scientific failure rate of up to 92% for ChatGPT in botanical essay on Symbolanthus (ring-gentians)

A lot has been written recently about AI-written essays and how hard it is to tell them from ones written by humans.  The construction and grammar of the texts can be elaborate and error-free, and the texts include a variety of details that make it hard to discern whether a human or an AI bot wrote them.  

But there is one thing that the AI bots seem really bad at, and that is evaluating whether the information they include in their essays is actually correct, especially if you ask them to write about a more obscure subject.  

To evaluate the factual content of an AI-generated essay, not just the writing style and grammar, you have to ask ChatGPT (or another AI 'writer') to produce text about something you know well, so that you can judge its content and facts.  So I did.

About Symbolanthus: I am a botanist at a large research university, and I am the world expert on the ring gentians, Symbolanthus.  There are 38 species; not all of these are described yet, but at least 25 species are present online in various floras, databases, iNaturalist, and such.  Symbolanthus is a member of the plant family Gentianaceae, and it is found in the wet, tropical parts of South and Central America from Bolivia to Costa Rica along the Andean mountain chain, and also in the Guayana Highlands of Brazil, Venezuela, and Guyana, plus a few Caribbean islands.  The flowers are gorgeous, large and pretty in pink, magenta, light green to white, often with stripes on the inside. The plants are shrubs, sometimes small in height.  

Image: Symbolanthus macranthus from Ecuador, with a large pink flower and glossy green leaves.

The pilot test: I wanted to test ChatGPT to check whether a student in my botany classes could use it successfully to write an essay. In my classes I primarily grade based on factual content, not on perfect grammar.  So, on Dec 9, 2022, I asked ChatGPT to write an essay about Symbolanthus. I did this ten times, by repeatedly requesting it to "Write an essay on Symbolanthus". 

Fact analysis: I saved all ten short essays (see pdf here).  Then I sorted the types of information ChatGPT had included in the essays into ten information categories in a table in Excel (available upon request), tabulated all information from each essay into the right category, and calculated how many information categories it had gotten majorly wrong in each essay. Not all categories were mentioned in every essay; categories that an essay did not mention were left out of that essay's count. 
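For readers who want to repeat this kind of tally, here is a minimal sketch in Python of the per-essay calculation described above: count the categories an essay got majorly wrong and divide by the number of categories it actually mentioned. The function name, the category names, and the example scores are all hypothetical and made up for illustration. They are not the categories or the data from my Excel table.

```python
from typing import Optional


def category_failure_rate(scores: dict[str, Optional[bool]]) -> float:
    """Return the share of mentioned categories that were majorly wrong.

    scores maps a category name to True (majorly wrong), False (correct),
    or None (the essay did not mention this category at all).
    """
    # Categories the essay never mentioned are left out of the count.
    mentioned = [wrong for wrong in scores.values() if wrong is not None]
    if not mentioned:
        raise ValueError("essay mentions no categories")
    # True counts as 1 and False as 0, so sum() gives the number wrong.
    return sum(mentioned) / len(mentioned)


# Hypothetical scoring of one essay, for illustration only.
example_essay = {
    "family": True,
    "morphology": True,
    "distribution": False,
    "horticultural use": True,
    "medicinal use": None,
    "species names": None,
}

rate = category_failure_rate(example_essay)
print(f"{rate:.0%} of mentioned categories wrong")  # prints "75% of mentioned categories wrong"
```

In this made-up example, four categories are mentioned and three of them are wrong, so the failure rate is 75%. The two unmentioned categories do not affect the result.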

Results (see table pdf here for details): 

Depending on the essay, ChatGPT produced wrong facts in 67-92% of the scientific information categories it covered.  Those percentages count categories of information, not individual facts.  Counted as individual misstated facts, there were up to 50 errors per essay.

  • ChatGPT never classified Symbolanthus into its right family, the Gentianaceae. Instead it said Asparagaceae, Asteraceae (6 times), Acanthaceae, or Melastomataceae. 
  • ChatGPT never correctly described any of Symbolanthus' morphological features, such as leaves, flowers, seeds, or fruits. 
  • ChatGPT often said it is grown in gardens due to its beautiful flowers, used as a cut flower, used in flower arrangements, etc.  It is not used in horticulture. 
  • ChatGPT included various ethnobotanical and herbal medicine uses.  I am not aware of any record of Symbolanthus having medicinal uses. 
  • ChatGPT invented species that it says belong to Symbolanthus but that do not exist (Symbolanthus tatei, for example).
  • ChatGPT said sunflower (Helianthus annuus) is a member of Symbolanthus.
  • ChatGPT invented a new ecological adaptation - rising air temperatures increase the color of the flower. Very interesting. But fake.

The result is that every essay on Symbolanthus written by ChatGPT is a blend of scientific facts from any plant in the world, and contains nearly nothing that is scientifically correct.  It becomes an absurd hodge-podge of 'blender facts'.  These are not minor errors. In fact, they are an abundance, a tornado, of major errors.  

You cannot reshuffle information as part of AI writing; the facts need to stay stable.  Unless you don't care about reality, in which case facts don't matter.

If you don't know the topic, however, it all looks completely reasonable in these essays because the facts seem solid, but they are not.  

Now, if you use ChatGPT to write an essay or gather information about any other subject, how much would you trust it?  

(This post was written fast, and any grammar errors and typos are my own.)