GenAI is by its very nature a copyright despoiler, plagiarist

By Nigel Morris-Cotterill | Jan 30, 2024

OpenAI recently informed the UK government its ChatGPT cannot function if it observes copyright laws. Telling ChatGPT and other GenAI to respect IP is almost like telling a cat not to chase pigeons, argues financial crime risk and compliance specialist Nigel Morris-Cotterill.

LOCATION: No one knows, and very few can find out.

OpenAI is a commercial software developer - not open source, and definitely not free. The company’s primary product, ChatGPT, is a search engine with many ‘bots’ that ‘scrape’ the World Wide Web for material, which it stores and uses to build a database and an ‘analytical model’ that in turn responds to queries and generates (apparently) original text.

Some say the original (scraped) data is deleted after the model is built. This is a dubious claim - every skeptical cell in one’s being should scream, ‘That can’t be right!’ The cost of reacquiring that data would be far too great, and it would have to be reacquired if the next generation of the model is to have a full dataset as its foundation.

OpenAI recently made a written submission to the United Kingdom’s House of Lords Communications and Digital Select Committee that said the following: “Because copyright today covers virtually every sort of human expression - including blogposts, photographs, forum posts, scraps of software code, and government documents - it would be impossible to train today’s leading artificial intelligence [AI] models without using copyrighted materials.”

In other words, unless ChatGPT collects, stores, and processes the intellectual property generated by individuals and companies, it cannot produce derivative works. The ‘generative AI community’ does not like it, however, when this is spelled out so clearly. They claim that “it is not derivative.”

OpenAI had more to say on the subject, too: “Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens.”

That seems to imply the needs of a nebulous and undefined group it describes as “citizens” are actually needs - which, according t

The content herein is subject to copyright by The Yuan. All rights reserved. The content of the services is owned or licensed to The Yuan. Such content from The Yuan may be shared and reprinted but must clearly identify The Yuan as its original source. Content from a third-party copyright holder identified in the copyright notice contained in such third party’s content appearing in The Yuan must likewise be clearly labeled as such.
