When OpenAI’s ChatGPT burst onto the scene in November 2022, it also created a degree of distrust between writers and readers, so much so that the International Conference on Machine Learning (ICML) recently banned the tool’s use in papers submitted to the conference.
ChatGPT is a chatbot that uses OpenAI’s large language model (LLM) GPT-3.5 to generate long strings of text that resemble content produced by human writers. In its most recent call for submissions (which opens Monday, Jan. 9), the ICML included a note in its “Ethics” section prohibiting the use of text generated by ChatGPT and other LLMs, unless “presented as part of the paper’s experimental analysis.”
As it stands now, though, the conference’s LLM policy amounts to little more than a sternly worded warning to potential contributors rather than a truly enforceable rule.
“As many, including ourselves, have noticed, LLMs released in the past few months, such as OpenAI’s ChatGPT, are now able to produce text snippets that are often difficult to distinguish from human-written text,” the program chairs wrote in a post further clarifying the conference’s LLM policy. “Such rapid progress often comes with unanticipated consequences as well as unanswered questions.”
The program chairs admit that the policy will essentially be enforced on an honor system, giving submissions the benefit of the doubt in most cases. Detecting whether a string of text was generated by an LLM is not easy: tools like OpenAI’s GPT-2 Output Detector and Edward Tian’s GPTZero attempt to identify LLM-generated text, but they’re not exactly foolproof.
I spent New Years building GPTZero — an app that can quickly and efficiently detect whether an essay is ChatGPT or human written
— Edward Tian (@edward_the6) January 3, 2023
In an examination of GPTZero, MultiLingual found that the tool suggested a handful of human-written texts, drawn from sources such as simple encyclopedia entries and web copy, could have been generated by an LLM.
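Detectors like these generally work by scoring how statistically predictable a passage looks to a language model; GPTZero, for instance, reportedly flags text with low perplexity and low “burstiness” (variation in perplexity from sentence to sentence) as machine-written. The sketch below is a minimal illustration of that idea, not a reconstruction of either tool: it scores a passage’s perplexity under GPT-2 using the Hugging Face transformers library, and the cutoff value is hypothetical, chosen purely for demonstration.

```python
# Minimal perplexity-based detection sketch. Assumes the Hugging Face
# `transformers` library and PyTorch are installed; the threshold below
# is illustrative, not a value used by any real detector.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Return GPT-2's perplexity on `text` (lower = more predictable)."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels equal to input_ids, the model returns the mean
        # cross-entropy loss over the sequence; its exponential is perplexity.
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return torch.exp(loss).item()

# Hypothetical cutoff for demonstration only; real detectors calibrate
# thresholds (and combine additional signals) on labeled data.
ILLUSTRATIVE_THRESHOLD = 50.0

sample = "The quick brown fox jumps over the lazy dog."
score = perplexity(sample)
label = ("possibly machine-generated" if score < ILLUSTRATIVE_THRESHOLD
         else "likely human-written")
print(f"perplexity={score:.1f} -> {label}")
```

The false positives MultiLingual observed are a natural consequence of this approach: plain, formulaic human writing, such as a simple encyclopedia entry, is also highly predictable to a language model, so it can score a low perplexity and trip the detector.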
While OpenAI’s tool was more reliable in discerning human-written text from LLM-generated text, it remains unclear how the ICML’s chairs could determine whether a text warrants further scrutiny through one of these tools in the first place. In October, MultiLingual reposted an article generated entirely by GPT-3; while some readers on LinkedIn found the article poorly written, it’s still hard to say what, if anything, differentiates it from text written by an unskilled human writer.
Indeed, the ICML noted in its explanation that it would not run automated or semi-automated tests to identify LLM-generated text snippets.
“As we learn more about consequences and impacts of LLMs in academic publication, and as we redesign the LLM policy in future conferences (after ICML 2023), we will consider different options and technologies to implement and enforce the latest LLM policy in future iterations,” the ICML’s statement reads.