
AWS Investigates Perplexity AI: Allegations of Ignoring Web Crawler Rules

Published June 29, 2024


Amazon Web Services (AWS) is investigating Perplexity AI for potential violations of its policies, as reported by Wired. The investigation centers on allegations that Perplexity AI uses a web crawler, hosted on AWS servers, that disregards the Robots Exclusion Protocol.

The protocol lets website operators specify, in a robots.txt file, which pages bots may access. Compliance is voluntary, but reputable companies have generally honored it since the 1990s.
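To make the mechanism concrete, here is a minimal sketch of how a crawler could consult robots.txt before fetching a page, using Python's standard-library `urllib.robotparser`. The robots.txt contents, the `allowed` helper, and the example.com URLs are hypothetical illustrations, not rules published by Wired or any other site. This is not a description of how PerplexityBot or any other real crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks PerplexityBot entirely and keeps all
# other bots out of /private/. Not taken from any real publisher.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed("PerplexityBot", "https://example.com/story"))      # False
    print(allowed("SomeOtherBot", "https://example.com/story"))       # True
    print(allowed("SomeOtherBot", "https://example.com/private/x"))   # False
```

The parser only reports what the file requests. Nothing prevents a crawler from skipping the check or ignoring its answer, which is why adherence to the protocol is voluntary.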

Wired previously reported that a virtual machine hosted on AWS, at IP address 44.221.181.252, had ignored the robots.txt instructions on Wired's website and scraped its content.

This virtual machine, reportedly linked to Perplexity AI, also visited other Condé Nast properties and major news outlets such as The Guardian, Forbes, and The New York Times multiple times over three months. When Wired prompted Perplexity AI's chatbot with its article headlines, the bot produced paraphrased content with minimal attribution, suggesting that the articles had been scraped.


A Reuters report indicated that Perplexity AI may not be the only AI company bypassing robots.txt files to gather training data for large language models.

Wired, however, gave AWS details specifically about Perplexity AI's crawler. In a statement, AWS said its terms prohibit abusive and illegal activities and that it investigates all reports of potential violations, including the information Wired provided about Perplexity AI.

In response, Perplexity AI spokesperson Sara Platnick said the company had responded to AWS's inquiries and denied any wrongdoing. Platnick maintained that PerplexityBot respects robots.txt and does not violate AWS's terms of service.

She described AWS's inquiry as part of a standard investigation process and said Perplexity AI had not been contacted about any issues before Wired's report. She did acknowledge, however, that PerplexityBot may ignore robots.txt when a user includes a specific URL in a chatbot query.

Perplexity AI's CEO, Aravind Srinivas, has also denied allegations that the company ignores the Robots Exclusion Protocol and lied about doing so. He acknowledged that Perplexity AI uses both its own web crawlers and third-party ones, and said the bot Wired identified was one of the third-party crawlers. His admission highlights how complicated the issue is, since Perplexity AI relies on multiple sources for its data collection.

Gizmo Writeups

News

AWS Investigates Perplexity AI: Allegations of Ignoring Web Crawler Rules

Leave a Reply
Cancel reply

Leave a Reply

You May Also Like

Tech

Threads Tests 24-Hour Timer for Ephemeral Posts, Enhancing Content Flexibility

News

AU10TIX Exposes Admin Credentials, Potentially Compromising Client Data for Over a Year

Tech

Live2Diff – AI Transforms Live Video into Real-Time Stylized Content

News

Charles Hoskinson Criticizes Tron’s USDD for Removing Bitcoin Collateral, Raising Concerns About Decentralization

Leave a Reply Cancel reply

Leave a Reply

You May Also Like

Tech

Threads Tests 24-Hour Timer for Ephemeral Posts, Enhancing Content Flexibility

News

AU10TIX Exposes Admin Credentials, Potentially Compromising Client Data for Over a Year

Tech

Live2Diff – AI Transforms Live Video into Real-Time Stylized Content

News

Charles Hoskinson Criticizes Tron’s USDD for Removing Bitcoin Collateral, Raising Concerns About Decentralization

Leave a Reply
Cancel reply