Deal or No Deal, Publishers Are Pushing Back Over AI Platform’s Content Use

Publishers are furious some AI platforms are using their content to train their models without their permission.
This is "The AI Economy," a weekly LinkedIn-first newsletter about AI's influence on business, work, society and tech, written by Ken Yeung. Sign up here.

It’s a short week in the U.S. with the Fourth of July holiday, but that hasn’t stopped AI news from being made.

For today’s issue of “The AI Economy,” I examine the relationship between AI providers and publishers. Some companies have opted to license the content. However, some tech firms have decided to pursue a different route, which involves scraping and claiming they’re helping drive traffic back to the source. Fed up with the intrusion, at least one publisher has filed a lawsuit alleging copyright infringement. So what happens next?

Plus, we look at an update from Stability AI following user backlash over its recently released Stable Diffusion 3 Medium model.

And don’t forget about this week’s roundup of AI news you may have missed!

Defense Against the Dark AI Crawlers

Leading AI platforms are constantly searching for new sources of information to ingest to help train their models. One reliable and never-ending source of original content is publishers. Initially, these AI companies pursued an “ask for forgiveness, not permission” strategy, scraping authors’ work and providing scant attribution (if any) before public outrage forced some to rethink their ways.

That has led AI companies, most notably OpenAI, to strike licensing deals with media outlets. The ChatGPT-maker has partnered with the Associated Press, Axel Springer (Business Insider and Politico), FT Group (Financial Times), Dotdash Meredith (People, Better Homes & Gardens, Food & Wine), News Corp (The Wall Street Journal, MarketWatch, the Daily Telegraph, Barron’s), Vox Media (The Verge, Polygon, Eater, SB Nation), The Atlantic, and TIME.

Others appear to be avoiding such arrangements for now, but they’re not immune to pushback from publishers. Even with all its media arrangements, OpenAI is facing a lawsuit from The New York Times, which alleges copyright infringement. Most recently, AI search engine Perplexity has come under fire after work by Forbes was lifted and summarized without proper attribution. To make matters worse, Perplexity is said to have taken the content from behind the publisher’s paywall. Forbes has threatened to sue the AI company. And it’s not the only media outlet to be victimized, with Wired accusing Perplexity of plagiarizing at least one of its stories.

Absent a licensing deal, these publications, along with many other websites, have implemented the Robots Exclusion Protocol: instructions stored in a robots.txt file on their servers tell web crawlers, whether from search engines, indexing services or AI providers, which parts of the site to stay away from. However, compliance with these rules is voluntary, and it appears that Perplexity and other AI companies aren’t paying them any heed.
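
The mechanism itself is decades old and very simple. A site publishes its rules in a plain-text robots.txt file, and a well-behaved crawler checks those rules before fetching any page. Here is a minimal sketch using Python’s standard-library robots.txt parser; the user-agent names and URLs are illustrative, not the agents of any real AI company:

```python
from urllib import robotparser

# A well-behaved crawler parses the site's robots.txt before fetching pages.
# "ExampleAIBot" and "SearchBot" are hypothetical user-agent names.
rules = robotparser.RobotFileParser()
rules.parse([
    "User-agent: ExampleAIBot",   # rules for one named crawler
    "Disallow: /",                # block it from the entire site
    "",
    "User-agent: *",              # everyone else
    "Disallow: /premium/",        # keep crawlers out of paywalled content
])

print(rules.can_fetch("ExampleAIBot", "https://example.com/article"))  # False
print(rules.can_fetch("SearchBot", "https://example.com/article"))     # True
print(rules.can_fetch("SearchBot", "https://example.com/premium/x"))   # False
```

The catch, as the Perplexity episode shows, is that nothing in the protocol enforces that check: a crawler that skips it sees no error, no block, no penalty.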

“Perplexity is not ignoring the Robots Exclusions Protocol and then lying about it,” Perplexity Chief Executive Aravind Srinivas tells Fast Company. “I think there is a basic misunderstanding of the way this works. We don’t just rely on our own web crawlers, we rely on third-party web crawlers as well.”

Coincidentally, Perplexity was in the process of unveiling a revenue-sharing deal with “high-quality publishers” right around the time it came under fire over allegedly misusing content. As Semafor notes, this would be a “first-of-its-kind revenue stream for media companies, providing a framework to earn recurring income. In contrast, OpenAI is paying media companies upfront to use their archives for training new AI models.”

Srinivas did not disclose which third-party web crawler service Perplexity is using. It doesn’t necessarily matter who it was. What’s more significant is that the company felt it acceptable to use a workaround to take someone else’s work for its own benefit and assume that a simple attribution would be enough to right any wrong. It’s not. It’s a page taken directly from the playbook of Microsoft’s AI boss, Mustafa Suleyman, who recently said:

I think that with respect to content that’s already on the open web, the social contract of that content since the ‘90s has been that it is fair use. Anyone can copy it, recreate with it, reproduce with it. That has been “freeware,” if you like, that’s been the understanding.

No, not all content on the open web is free for anyone to copy, recreate or reproduce. If that were the case, and web crawlers didn’t comply with the Robots Exclusion Protocol, what protections would be left to prevent AI platforms from scraping content from publisher websites and private companies? What if they lifted entire courses from Salesforce Trailhead, research from Gartner, merchandise information from a major retailer, etc.?

One tech company helping those who don’t want their sites crawled is Cloudflare. This week, the company released a no-code feature that promises to block AI bots from accessing a site’s content. Every request processed is scored between 1 and 99; the lower the number, the greater the likelihood that the request is from a bot.
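
The scoring approach boils down to simple thresholding: very low scores are treated as automated traffic and can be blocked outright. A minimal sketch of that logic in Python; the function name and cutoff value are illustrative assumptions, not Cloudflare’s actual API or rule engine:

```python
LIKELY_BOT_THRESHOLD = 30  # illustrative cutoff: treat scores below this as automated

def should_block(bot_score: int, block_ai_crawlers: bool = True) -> bool:
    """Decide whether to block a request, given a 1-99 bot score.

    Lower scores mean the request is more likely automated; scores near
    99 are almost certainly human. This mirrors the scoring scheme
    described above, not Cloudflare's real implementation.
    """
    if not block_ai_crawlers:
        return False
    return bot_score < LIKELY_BOT_THRESHOLD

print(should_block(1))    # True: almost certainly a bot
print(should_block(85))   # False: likely a human visitor
```

The appeal of the no-code version is that site owners flip a switch instead of writing rules like this themselves.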

In any event, this will continue to play out, both in the court system and as new technologies are invented to prevent AI companies’ web crawlers from visiting websites. Of course, if these AI platforms want to avoid litigation and criticism, they could always pay creators to license their work rather than taking it first.

Further Reading:

▶️ What Perplexity gets wrong about aggregation (The Media Copilot)

Making Stable Diffusion Stable Again

After Stability AI released its “most sophisticated image generation model yet” last month, the company faced heavy criticism over Stable Diffusion 3 Medium’s usage license. At issue are its license fees and image generation limits, which some have complained are “unreasonably low” at 6,000 images per month. The plan costs $20 per month, even if you’re running SD3 locally on your computer.

Users claim the new policies threaten the viability of the AI community, as well as small artists and designers, while also violating open source values.

On Friday, Stability AI issued a community update, acknowledging that SD3 Medium failed to meet “our community’s high expectations” and released improvements to address user concerns.

But before we get to that, here’s some background on SD3 Medium: It’s an open text-to-image model containing 2 billion parameters, promising photorealistic results without complex workflows. The company claims that it overcomes common artifacts in hands and faces and can understand complex prompts involving spatial relationships, compositional elements, actions and styles.

While SD3 Medium is a sophisticated model, its licensing policy turned users away: people didn’t want to pay the fees without receiving the benefits they expected.

Stability AI has launched a new “Stability AI Community License,” under which its released models, including SD3 Medium, can be used for free “much more broadly than they could under the previous licenses.” As such:

  • Non-commercial use remains free: You can continue to run Stability AI models on your device for non-commercial use for free
  • Free commercial use appropriate for individuals and small businesses: If you use Stability AI models under the “Stability AI Community License,” fine-tune SD3, or integrate Stability models into your product or service, it’s free as long as annual revenues don’t exceed $1 million (or local currency equivalent)
  • Fewer limits: Stability AI has eliminated the restrictions on the number of media files that can be created with its Community License Agreement
  • Only commercial users must self-report: If you’re integrating Stability AI’s models or products into your commercial products or services and have annual revenues over $1 million, the company asks you to contact them
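
The tiers above amount to a small decision tree: usage type plus annual revenue determines which terms apply. A quick sketch encoding that logic; the function name and return strings are my own shorthand for the terms described above, not legal language from Stability AI:

```python
REVENUE_CAP_USD = 1_000_000  # annual revenue ceiling for free commercial use

def community_license_tier(commercial: bool, annual_revenue_usd: int) -> str:
    """Summarize which Community License tier applies (illustrative only)."""
    if not commercial:
        # Non-commercial use on your own device stays free
        return "free: non-commercial use"
    if annual_revenue_usd <= REVENUE_CAP_USD:
        # Free commercial tier, with the media-file limits now removed
        return "free: commercial use under the Community License"
    # Above the cap, Stability AI asks you to contact them and self-report
    return "contact Stability AI: self-reporting required"

print(community_license_tier(False, 0))
print(community_license_tier(True, 250_000))
print(community_license_tier(True, 5_000_000))
```
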

The Takeaway: Stability AI has been having a rough time lately. While its text-to-image AI generation models are impressive, the company continues to inflict PR damage on itself. Late last month, Sean Parker, Greycroft and others gave it a lifeline, but it needs to rebound to be competitive against more prominent AI model providers.

▶️ Read more about Stability AI’s latest community update

Today’s Visual Snapshot

How do adults worldwide believe generative AI will impact their lives over the next five years? eMarketer has generated the above graph, pulling from multiple sources to present areas where respondents believe AI will be the most disruptive.

Social media and search are the fields most poised to be impacted, followed by news media and science. In fact, except for “ordinary people” and “retailers,” a plurality believes that gen AI will have a “very or somewhat large impact” on healthcare, banking and finance, the military, the national government, politics and law enforcement.

The takeaway: People worldwide acknowledge that the AI boom is here and recognize its influence on many parts of society. However, while this chart measures the expected level of impact, it does not capture respondents’ positive or negative sentiment toward gen AI. Nevertheless, it paints a picture of artificial intelligence woven into many, if not all, major parts of our daily lives over the next five years.

Quote This

“Machines don’t learn. Right? Machines copy, and then they basically match a user’s prompt with an analysis of patterns in what they’ve copied. And then they finish the pattern based on predictive algorithms or models. Right? That’s not what humans do. Humans have lived experiences. They have souls. They have genius.

They actually listen, get inspired, and then they come out with something different, something new. They don’t blend around patterns based on machine-based algorithms. So nice try, but I don’t think that argument is very convincing…”

— RIAA Chairman and Chief Executive Mitch Glazier, speaking on The New York Times’ “Hard Fork” podcast about Suno’s defense in an ongoing copyright infringement lawsuit. Suno, an AI startup, claims it values originality, but Glazier argues that the company’s methods fundamentally differ from human creativity.

This Week’s AI News



Thanks for reading. Be sure to subscribe so you don’t miss any future issues of this newsletter.

Did you miss any AI articles this week? Fret not; I’m curating the big stories in my Flipboard Magazine, “The AI Economy.”

Follow my Flipboard Magazine for all the latest AI news I curate for “The AI Economy” newsletter.

Connect with me on LinkedIn and check out my blog to read more insights and thoughts on business and technology. 

Do you have a story you think would be a great fit for “The AI Economy”? Awesome! Shoot me a message – I’m all ears!

Until next time, stay curious!

Subscribe to “The AI Economy”

New issues published on Fridays, exclusively on LinkedIn
