According to a new report published Tuesday by 404 Media, Automattic, the parent company of websites like WordPress and Tumblr, is in talks to sell material from its platforms to AI companies like Midjourney and OpenAI for use in training their models. Automattic is also trying to reassure users that they can opt out at any time, although the specifics of the deals remain unclear.
404 Media reports that Automattic is facing internal disagreement because private content that the company was not supposed to retain was among the data being gathered for AI companies. Further complicating matters, the report says that advertisements from an earlier Apple Music campaign and other commercial material not owned by Automattic found their way into the training dataset.
According to 404 Media, the plans at Automattic have generated so much internal debate that a product manager has begun removing his personal images from Tumblr to ensure AI isn’t trained on them.
Since OpenAI first released ChatGPT in late 2022, generative AI has grown significantly in popularity, alongside text-to-image generators from several startups. These systems are “trained” on massive volumes of data to produce seemingly original text, images, and video. However, major publishers have voiced their displeasure, and some have filed lawsuits claiming that much of the data used to train these systems was either stolen or does not qualify as “fair use” under current copyright law.
According to 404 Media, Automattic plans to introduce a new setting as soon as Wednesday that will let users opt out of having their content used to train AI systems. It’s unclear, though, whether this opt-out will be turned on by default. Last year, Squarespace, a WordPress competitor, introduced a similar option that lets users prevent their data from being used for AI training.
On Tuesday, in response to emailed inquiries, Automattic pointed Gizmodo to a recent blog post that essentially confirmed 404 Media’s reporting, while framing the change as a chance to “give you more control over the content you’ve created.”
“Almost every part of our lives is changing quickly due to artificial intelligence, including how we produce and consume media. We at Automattic have always supported individual freedom and a free and open internet. Like other digital businesses, we’re actively monitoring these developments, including how to collaborate with AI companies while honoring our users’ wishes,” the blog post says.
However, the lengthy statement comes across as defensive: it stresses that users can decide whether their content is used for AI training, while acknowledging that “no law exists that requires crawlers to follow these preferences.” The company says it is simply following industry best practices.
“We want to give you tools that give you as much power as possible, no matter where you live,” the post says. According to Automattic, these settings are the most effective way to control how content is crawled on the internet because reputable companies do adhere to them.
“Our partners will respect all opt-outs. We also intend to go a step further and regularly notify partners about people who have recently opted out, asking that their content be removed from past sources and from any future training,” the post continues.