Apple has launched a tool that lets website owners opt out of having their content used to train Apple’s AI. Many high-traffic sites have already used it to exclude their data.
Training the AI behind Siri requires a crawler that scrapes content from the web. However, using web content or other digital information to train AI models without the owner’s permission is unethical, because this content is human-generated and often copyrighted.
For several years, Apple has used its Applebot crawler to gather web data for Siri and Spotlight. Three months ago, Apple introduced Applebot-Extended, a separate user agent that publishers can block in their robots.txt file to opt out of Apple Intelligence training.
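Apple's documentation describes the opt-out as a robots.txt rule that names the Applebot-Extended user agent. As a rough sketch of what that looks like in practice, the snippet below parses a made-up robots.txt with Python's standard `urllib.robotparser` and checks which agents it admits. The sample file, the `example.com` URL, and the `allows_agent` helper are illustrative assumptions, not anything published by Apple or the sites named in this article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks Applebot-Extended (AI training opt-out)
# while leaving every other crawler, including plain Applebot, allowed.
SAMPLE_ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def allows_agent(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"  # placeholder URL
    for agent in ("Applebot-Extended", "Applebot"):
        status = "allowed" if allows_agent(SAMPLE_ROBOTS_TXT, agent, page) else "blocked"
        print(f"{agent}: {status}")
```

With a rule like this, a site can stay visible to the regular Applebot crawler used for search features while signalling that its content should not be used for AI training.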
Apple has also said that it aims to train its models responsibly. “We train our foundation models on licensed data, including data selected to enhance specific features, as well as publicly available data collected by our web-crawler, AppleBot. Web publishers have the option to opt out of the use of their web content for Apple Intelligence training with a data usage control […]”
Apple continued, “We apply filters to remove personally identifiable information like social security and credit card numbers that are publicly available on the Internet.”
According to a Wired report, “WIRED can confirm that Facebook, Instagram, Craigslist, Tumblr, The New York Times, The Financial Times, The Atlantic, Vox Media, the USA Today network, and WIRED’s parent company, Condé Nast, are among the many organizations opting to exclude their data from Apple’s AI training […]”
“In a separate analysis conducted this week, data journalist Ben Welsh found that just over a quarter of the news websites he surveyed (294 of 1,167 primarily English-language, US-based publications) are blocking Applebot-Extended.”
Beyond the opt-out, Apple is also offering some prominent publishers substantial financial deals to license their website data for training its generative-AI-powered Siri. Paying for access in this way could be an effective means of bringing such companies on board.