Social media 1, ChatGPT 0


Way back in February, I made a harsh comment about ChatGPT on Mastodon.

I teach my writing class today. I’m supposed to talk about ChatGPT. Here’s what I will say.
NEVER USE CHATGPT. YOU ARE HERE TO LEARN HOW TO WRITE ABOUT SCIENCE, YOU WILL NOT ACCOMPLISH THAT BY USING A GODDAMNED CRUTCH THAT WILL JUST MAKE SHIT UP TO FILL THE SPACE. WRITE. WRITE WITH YOUR BRAIN AND YOUR HANDS. DON’T ASK A DUMB CYBERMONKEY TO DO IT FOR YOU.
I have strong opinions on this matter.

Nothing has changed. I still feel that way. Especially in a class that’s supposed to instruct students in writing science papers, ChatGPT is a distraction. I’m not there to help students learn how to write prompts for an AI.

But then some people just noticed my tirade here in April, and I got some belated rebuttals. Here, for instance, kjetiljd defends ChatGPT.

Wow, intense feelings. Have you ever written something, crafted a proper prompt to ask ChatGPT-4 to critique your text? Or asked it to come up with counter-arguments to your point of view? Or asked it to analyze a text in terms of eg. thesis/antithesis/synthesis? Or suggest improvements in readability? You know … done … (semi-)scientific … experiments with it? With carefully crafted prompts my hypothesis is that it can be used to improve both writing and thinking…

Maybe? The flaw in that argument is that ChatGPT will happily make stuff up, so the foundation of its output is on shaky ground. So I said I preferred good sources. I didn’t mention that part of this class was teaching students how to do research using the scientific literature, which makes ChatGPT a cheat to get around learning how to use a library.

I prefer to look up counter-arguments in the scientific literature, rather than consulting a demonstrable bullshit artist, no matter how much it is dressed up in technology.

kjetiljd’s reply is to tell me I should change the focus of my class to be about how to use large language models.

And if I were a student I would probably prefer advice on the use of LLMs from a scientific writing teacher who seemed to have some experience in the field, or at least seemed to … how should I say this … have looked up counter-arguments from the scientific literature …?

I guess I’m just ignorant then. Unfortunately, this class is taught by a group of faculty here, and I had a pile of sources about using ChatGPT as a writing aid that were included in the course’s Canvas page. I didn’t find them convincing.

Sure, I’ve looked at the counter-arguments. They all seem rather self-serving, or more commonly, non-existent.

So kjetiljd hands me some more sources. Ugh.

Here are a few more or less random papers on the topic – they exist, are they all self-serving? https://www.semanticscholar.org/paper/ChatGPT-4-and-Human-Researchers-Are-Equal-in-A-Sikander-Baker/66dcd18c0f48a14815edca1d715fa8be8909cca6 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10164801/ https://www.semanticscholar.org/paper/Chat

I read the first one and was unimpressed. They trained ChatGPT on a small set of review articles, asked it to write a similar review, and then had some people judge whether it was similar in content and style. Is ChatGPT a dumb cybermonkey? This article says yes.

I was about done at this point, so I just snidely pointed out that scientists scorn papers written by AIs.

Don’t get caught!

https://retractionwatch.com/papers-and-peer-reviews-with-evidence-of-chatgpt-writing/

I was done, but others weren’t. Chaucerburnt analyzed the three articles kjetiljd suggested. They did not fare well.

The first paper describes a trial where researchers took 18 recent human-written articles, got GPT-4 to write alternate introductions to them, and then got eight reviewers to read and rate these introductions.

Some obvious points:

– 18 pairs of articles is not a lot. With only a small number of trials, there’s a significant risk that an inferior method will win a “best of 18” over a superior method by pure luck.
– 8 reviewers, likewise, is not a very large number. Important here is that the reviewers were recruited “by convenience sampling in our research network” – that is, not a random sample, but people who were already contacts of the authors. This risks getting a biased set of reviewers whose preferences are likely to coincide with the researchers’.
– The samples were reviewed on dimensions of “publishability” (roughly, whether the findings reported are important and novel), “readability”, and “content quality” (here apparently meaning whether they had too much detail, not enough, or just right.)

What’s missing here?

None of the assessment criteria have anything to do with *accuracy*. There’s no fact-checking to evaluate whether the introduction has any connection to reality.

Under the criteria used here, GPT could probably get excellent “publishability” scores by claiming to have a cure for cancer. It could improve “readability” by replacing complex truths with over-simple falsehoods.

And it could improve “content quality” by inventing false details or deleting important true ones in order to get just the right amount of detail, since apparently “quality” doesn’t depend on whether the details are *true*, only on how many there are.

The reviewers weren’t even asked to read the rest of the article and evaluate whether the introduction accurately represented the content.

I daresay the human authors could’ve scored a lot higher on these metrics if they weren’t constrained by the expectation that their content should be truthful – something which this comparison doesn’t reward.

They also note “We removed references from the original articles as GPT-4’s output does not automatically include references, and also since this was beyond the scope of this study.” Because, again, truthfulness is not part of the assessment here.

(FWIW, when I tried similar experiments with an earlier version of GPT, I found it was very happy to include references – I merely had to put something like “including references” in the prompt. The problem was that these references were almost invariably false, citing papers that never existed or which didn’t say what GPT claimed they said.)

I concur, and that was my impression, too. The AI-written version was not assessed for originality or accuracy, only on superficial criteria of plausibility. AI is very good at generating plausible-sounding text.
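
Chaucerburnt’s first point, about how little 18 comparisons can tell you, is easy to put a rough number on. Here’s a minimal back-of-the-envelope sketch; the 60% figure is an invented assumption for illustration, not anything reported in the paper:

    #include <cmath>
    #include <cstdio>

    // Sketch only: suppose reviewers prefer the genuinely better introduction on
    // any one of the 18 items with probability p = 0.6 (an assumed number), and
    // ask how often the worse method still ties or wins overall purely by chance.
    int main() {
        const int n = 18;
        const double p = 0.6;

        double worseTiesOrWins = 0.0;
        for (int k = 0; k <= n / 2; ++k) {   // better text preferred on at most half the items
            double coeff = 1.0;              // binomial coefficient C(n, k)
            for (int i = 1; i <= k; ++i)
                coeff = coeff * (n - k + i) / i;
            worseTiesOrWins += coeff * std::pow(p, k) * std::pow(1.0 - p, n - k);
        }
        std::printf("P(worse method ties or wins over %d items) = %.2f\n", n, worseTiesOrWins);
        return 0;
    }

With those made-up numbers, the worse method still ties or comes out ahead roughly a quarter of the time, which is exactly the small-sample problem being pointed at.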

Chaucerburnt went on to look over the other two articles, which I hadn’t bothered to read.

The second article linked – which feels very much like it was itself written by GPT – makes a great many assertions about the ways in which GPT “can help” scientists in writing papers, but is very light on evidence to support that it’s good at these things, or that the time it saves in some areas is greater than the time required to fact-check.

It acknowledges plagiarism as a risk, and then offers suggestions on how to mitigate this: “When using AI-generated text, scientists should properly attribute any sources used in the text. This includes properly citing any direct quotations or paraphrased information”… – this seems more like general advice for human authors than relevant to AI-generated text, where the big problem is *not knowing* when the LLM is quoting/paraphrasing somebody else’s work.

It promotes the use of AI to improve grammar and structure – but the article itself has major structural issues. For instance, it has a subsection on “the risk of plagiarism” followed by “how to avoid the risk of plagiarism”.

But most of the content in “the risk of plagiarism” is in fact stuff that belongs in the “how to avoid” section.

Some of it is repeated between sections – e.g. each of those sections has a paragraph advising authors to use plagiarism-detection software, and another on citing sources.

On the grammatical side, it has a bunch of errors, e.g.:

“AI tools like ChatGPT is capable of…”

“The risk of plagiarism when use AI to write review articles”

“Use ChatGPT to write review article need human oversight”

“Conclusion remarks”

“Are you tired of being criticized by the reviewers and editors on your English writings for not using the standard English, and suggest you to ask a native English speaker to help proofreading or even use the service from a professional English editor?”

(Later on, it contradicts that by noting that “AI-generated text usually requires further editing and formatting…Human oversight is necessary to ensure that the final product meets the necessary requirements and standards.”)

If that paper is indeed written by GPT, it’s a good example of why not to use GPT to write papers.

The third article gets the same treatment.

The last of the three papers you linked is a review of other people’s publications about ChatGPT. It’s more of a summary of what other people are saying for and against GPT’s use than an assessment of which of these perspectives are well-informed.

(Of 60 documents included in the study, only 4 are categorised as “research articles”. The most common categories are non-peer-reviewed preprints and editorials/letters to the editor.)

It does note that 58 out of 60 documents expressed concerns about GPT, and states that despite its perceived benefits, “the embrace of this AI chatbot should be conducted with extreme caution considering its potential limitations.”

Not exactly an enthusiastic recommendation for GPT adoption.

Going a step further, Chaucerburnt reassures me that my role in the class is unchallenged.

I’ve seen people use AI for critique, and my impression is that it does more harm than good there.

If a human reviewer tells me that my sentences are too long and complex, there’s a very high probability that they’re saying this because it’s true, at least for them.

If an AI “reviewer” tells me that my sentences are too long and complex, it’s saying it because this is something it’s seen people say in response to critique requests and it’s trying to sound like a human would. Is it actually true, even at the level that a human reviewer’s subjective opinion is true? No way to know.

Beyond that, a lot of it comes down to Barnum statements: https://medium.com/@herbert.roitblat/this-way-to-the-egress-barnum-effect-or-language-understanding-in-gpt-type-models-597c27094f35

Many authors can benefit from generic advice like “consider your target audience”, but we don’t need to waste CPU cycles to give them that.

This term I had a couple of student papers, here at the end, that would not have benefited from ChatGPT at all. Once a student gets on a roll, you’ll sometimes get sections that go on at length — they’re trying to summarize a concept, and their answer is to keep writing until every possible angle is covered. The role of the editor is to say, “Enough. Cut, cut, cut — try to be more succinct!” I’ve got one term paper that is an ugly mess at 30 pages, but has good content that would make it an “A” paper at 20 pages. ChatGPT doesn’t do that. It can’t do that, because its mission is to generate glurge that mimics other papers, and there’s nobody behind it who understands the content.

Anyway, sometimes social media comes through and you get a bunch of humans writing interesting stuff on both sides of an argument. I’d hate to see how ugly social media could get if AIs were chatting, instead.

Comments

  1. Kagehi says

    The only comments I am going to make on the subject are that a) it’s plausible (there is that word again) that this strain of chat systems could make a decent front end for a real AI, one which can self-check and self-reference, but they are literally designed to do neither, so they lack even the self-awareness of a house fly at the moment (making them at best idiot savants whose specialty is churning out data, without validation of that data), and b) some people don’t do much better in specific contexts, such as the clown who lamented the waste of water needed to “cool” servers running ChatGPT systems by rambling about image generation programs – which can literally be run on your home freaking computer, and do not use ChatGPT. Point being – if someone is going to whine and complain about technology, its misuse, and the waste of resources put into using it, they had better bloody well have an argument that makes more sense than something ChatGPT itself would hallucinate.

  2. gijoel says

    People seem to forget that the purpose of these assignments is to learn. Getting an AI to do it for you means you don’t learn a damn thing.

  3. birgerjohansson says

    45 years ago I sat down to write a modest paper for school about the history of science fiction. It just swelled and swelled. I realised then how hard it is to be succinct about a topic you love. ChatGPT would have been useless.

  4. billseymour says

    I once tried to write an introduction to C++ for Java coders.  I thought it would be quick and easy because the target audience would already know what a class is and be familiar with the statement syntax that both C++ and Java inherit from Algol by way of C.

    No such luck.  Even writing in a less formal, more conversational style, the paper still had to be true and as nearly unambiguous as is possible in a natural language.  It was easy enough to say, for example,

    You know how, in Java, things of class type have pointer semantics and things of primitive type have value semantics and never the twain shall meet?  Well, in C++, the twains shall meet because they’re wunning on the same twack.  Everything in C++ fundamentally has value semantics, and you can achieve pointer and reference semantics for everything; you just have to do it explicitly.

    But then I had to explain the difference between pointers and references, the latter of which Java lacks; and I had to explain how to avoid the inefficiency of passing around objects of class type by value.
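
    A toy version of what I mean, with a made-up Widget class and function names invented just for this comment:

        #include <iostream>
        #include <string>

        struct Widget { std::string name; };           // made-up example class

        void byValue(Widget w)  { w.name = "copy"; }   // gets its own copy; the caller's object is untouched
        void byRef(Widget& w)   { w.name = "ref"; }    // a reference: an alias for the caller's object
        void byPtr(Widget* w)   { w->name = "ptr"; }   // an explicit pointer, which could be null

        int main() {
            Widget w{"original"};
            byValue(w);                   // w.name is still "original"
            byRef(w);                     // w.name is now "ref"
            byPtr(&w);                    // w.name is now "ptr"
            std::cout << w.name << '\n';  // prints "ptr"
        }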

    Even something as simple as

    “: public” is what Java means by “extends”.

    is complicated by the need to explain that it’s “public inheritance” that’s the O-O IS-A, and that C++ has the concept of “private inheritance” which is something else entirely.
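
    Roughly, and again this is just a made-up sketch rather than anything from my actual write-up:

        struct Base { int x = 0; };

        struct PubDerived  : public Base  {};   // what Java means by "extends": a PubDerived IS-A Base
        struct PrivDerived : private Base {};   // implemented-in-terms-of: the Base part is hidden from outsiders

        void takesBase(const Base&) {}

        int main() {
            PubDerived  pub;
            PrivDerived priv;
            takesBase(pub);       // fine: public inheritance preserves the IS-A relationship
            // takesBase(priv);   // error: the conversion to Base is inaccessible here
            (void)priv;           // silence the unused-variable warning
        }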

    And imagine explaining the difference between C++ templates and Java generics.  Or the preprocessor that C++ inherits from C.

    I’ve already gone way too long in this comment, so I guess you get the idea. 8-)

    I can’t imagine ChatGPT writing anything that wouldn’t lead readers seriously astray.

  5. says

    While I’m open to the idea of using AI tools as a writing aid, that study is bafflingly bad. Write an introduction to a scholarly paper, but don’t include references, don’t assess accuracy, and make no effort to match with the rest of the paper? That’s not an introduction, that’s an indictment of the authors’ understanding of what an introduction requires.

    And it’s obvious why they wouldn’t assess references, or accuracy, or matching to the wider context, because those are precisely the things that current LLMs are bad at. Even in the spirit of teaching students how to use new technological tools, this is bad advice, because it completely fails to highlight LLMs’ weaknesses, and does not demonstrate any practical means of compensating for them.

  6. says

    Wow, intense feelings. Have you ever written something, crafted a proper prompt to ask ChatGPT-4 to critique your text? Or asked it to come up with counter-arguments to your point of view? Or asked it to analyze a text in terms of eg. thesis/antithesis/synthesis? Or suggest improvements in readability?

    Wow, what a smug condescending twit. Those first few sentences alone should get them dismissed to the Smugasso Sea where they belong.

  7. Scott Petrovits says

    @#6 – Yeah, almost every die-hard “AI” proponent I’ve read or talked to sounds like this. If you don’t like “AI”, you’re using it wrong, you silly person. Like Jordan Peterson fans, all you have to do is read more, get more into it, steep yourself in the bad thing, and it won’t be bad anymore. It’s kinda hard to watch.

  8. anthrosciguy says

    It seems to me that basic grammar should be one of the easiest tasks for an AI to accomplish. That it fails at this is a very bad sign.

  9. rockwhisperer says

    I haven’t used ChatGPT, and don’t intend to at this moment. It isn’t going to make me a better writer, either of scientific papers or fiction (I write both). For my MS degree, I happened to have a master writer as a thesis adviser, and benefited enormously from his coaching. Learning to write well was itself worth the degree.

    Maybe I’m weird, but when I want to convey information, I don’t want some[one,thing] else to say it for me.

  10. outis says

    Many good comments here, but apart from all that has been said, even if one were to use the damn things, then the necessity would arise to check everything that was written with microscopic care. And that negates the whole “AI will make us more productive” argument.
    I mean, should I use one of those dinguses to write up a technical procedure, so much time would be needed to check that all details match that I might just write the damn thing myself. The last thing I want to see is those electronic idiots hallucinating up wrong reaction times and equipment! And if I need to “train” the things just to write one procedure, this again negates the need.
    I’d venture to say that LLMs are not ready for use in the science and tech fields, not by a long way.

  11. beholder says

    @6, 7, 8

    Wow, intense feelings. Have you ever written something, crafted a proper prompt to ask ChatGPT-4 to critique your text? Or asked it to come up with counter-arguments to your point of view? Or asked it to analyze a text in terms of eg. thesis/antithesis/synthesis? Or suggest improvements in readability?

    He’s not wrong. LLMs can be a useful tool. They increase access and they help some people communicate ideas they otherwise could not. Useful tools can always be used badly, of course, but that’s not a mark against them.

    I’m more worried about using a service someone else controls, one they can pull the plug on at any time, and about the inherent, intractable security issues of trusting a model someone else trained. It’s problematic to suggest using those proprietary services as a disability aid or a learning tool for those reasons. But if you have the resources to train an LLM yourself? Go for it.

    Opposing the use of an entire technology on principle will only make you seem more and more ridiculous as time goes on.