In 2022, the year ChatGPT was launched, “artificial intelligence” was Fundéu’s term of the year. Note that in 2023 the word of the year was “polarisation”. A sort of amalgam of the two is what we have observed in the reactions – some angry, some celebratory – to the judgement issued by the US District Court for the Southern District of New York on 7 November 2024 in Raw Story Media, Inc. and Alternet Media, Inc. v. OpenAI.
This judgement ruled on a lawsuit brought against OpenAI by two entities involved in the exploitation of journalistic content. According to the plaintiffs, OpenAI had infringed their copyrights by training ChatGPT’s models with the plaintiffs’ protected works and performances.
The judge in the case, Colleen McMahon, ruled in favour of OpenAI and made some statements that have been widely reported in the media, such as that the likelihood of ChatGPT generating content that plagiarises the content on which it has been trained is “remote”:
“(…) given the quantity of information contained in the repository [the court had previously noted that the plaintiffs accuse ChatGPT of scraping most of the internet], the likelihood that ChatGPT would output plagiarized content from one of Plaintiffs’ articles seems remote.”
Some have wanted to see this statement, and the dismissal of the lawsuit itself, as a general victory for AI development companies over the rights holders of the works used to train their models. Now we have the precedent we needed! Fair use for all!
However, for a correct understanding of the scope of this ruling – which, as we will see, is not as decisive as it seems – we must go deeper into the context in which Judge McMahon’s statements were made and set out the nuances of the court case and the ruling. We will try to offer a few key points below.
First of all, it should be borne in mind that this is a dismissal on procedural grounds, in this case – very briefly – because of a lack of standing to claim damages and seek injunctive relief, taking into account the type of alleged infringement. In common law, this mechanism is known as a motion to dismiss. Although it does result in an initial dismissal of the claim, there is nothing to prevent the plaintiffs from re-filing the claim after having remedied the defects, and they have announced in the press that they will do so. In that case, the claim would be examined afresh in light of the new documents and approach provided.
Moreover, we must put into context the assessment that the likelihood of ChatGPT plagiarising content is remote. On a first reading, it would appear to be the main reason for the dismissal, and it has been suggested that it can be extrapolated to the ongoing lawsuits against large AI developers. In fact, these statements were made in a very specific context: whether it was appropriate to grant injunctive relief ordering OpenAI to cease using the plaintiffs’ content. The premise for such relief was a substantial likelihood that the alleged infringement would materialise, and that infringement was limited to the plaintiffs’ content being reproduced, verbatim or almost verbatim, as output by the tool. It was that likelihood and no other that was the subject of the analysis and assessment. In that context, the Court found that, although the plaintiffs provided statistics from third parties indicating that an earlier version of ChatGPT did generate responses with significant amounts of plagiarised content, that substantial likelihood had not been established in relation to the current version of ChatGPT.
The statements on whether the plaintiffs have been able to prove harm must be read in the same vein. If the harm, according to their strategy, derives only from the verbatim or almost verbatim reproduction of their protected content, and the risk of this happening is found, on a first assessment, not to be high, the chances of proving real harm are substantially reduced. This, of course, does not mean – as some hasty headlines seemed to imply – that the judgement generally denies that rights holders may suffer harm from the use of their protected works in AI systems.
In general, the main problem that the plaintiffs seem to have encountered was precisely their procedural strategy. On the one hand, they brought the claim on the basis that the infringing conduct could only materialise through the verbatim reproduction of content as output, and not, for example, through the use of the content itself for training purposes. On the other hand, they based their strategy not on copyright infringement as such, but on an infringement of the Digital Millennium Copyright Act, a statute that serves to protect the proper functioning of electronic markets, but not so much to safeguard the property rights of the agents or individuals who operate in them. The plaintiffs argue that OpenAI infringed this law by removing the security measures – CMI or Copyright Management Information – attached to their content, a removal that in itself constitutes an infringement. However, this is an infringement of a different type from those based on intellectual property rights. This approach, which narrows the scope of the lawsuit considerably, has made it difficult for them to obtain the legal remedies they were seeking.
In conclusion, without detracting from the interest or relevance of the court decision, the truth is that it will not be the banner under which AI system developers in the US can, once and for all, use protected content to train their models without infringing rights. Its reasoning – which, as we have said, could even change if an amended complaint is filed – is confined to the assessment of very specific aspects that do not support the generalisations some have tried to draw from it.
Sometimes, despite ourselves, we must accept that reality – in this case, embodied in the small print of a judgement – can spoil a great headline.
Violeta Arnaiz. Director of Technological Intellectual Property of the Technological Innovation Consultancy Area at PONS IP.