Sunday, October 5, 2025

Cloudflare Accuses Perplexity of Stealth Data Scraping


Cloudflare and Perplexity have recently been at odds after the former accused Perplexity of stealth data scraping. Cloudflare observed Perplexity bots crawling websites despite explicit no-crawl requests. Perplexity, however, denies the claims.

Cloudflare Accuses Perplexity of Stealth Data Scraping

In a recent post, Cloudflare claimed to have observed Perplexity aggressively scraping data from websites in a stealthy manner. By stealth, Cloudflare refers to Perplexity crawling and scraping sites even when those sites disallow such crawls.

Specifically, Cloudflare became suspicious of this activity when several customers complained that Perplexity crawlers were crawling their websites even when disallowed. Cloudflare then tested this behavior by creating websites on dummy domains and querying Perplexity about those domains. Despite implementing all measures to block Perplexity crawlers on these sites, Perplexity's responses to queries about the sites suggested otherwise.

We conducted an experiment by querying Perplexity AI with questions about these domains, and discovered Perplexity was still providing detailed information regarding the exact content hosted on each of these restricted domains. This response was unexpected, as we had taken all necessary precautions to prevent this data from being retrievable by their crawlers.

When websites want to block crawlers (such as Perplexity crawlers, in this case), they add rules to that effect in their robots.txt files. However, Cloudflare's experiment indicated that the Perplexity crawlers bypass the robots.txt files and allowlists for crawlers.
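As a rough illustration of how a compliant crawler is expected to honor such rules, the sketch below uses Python's standard urllib.robotparser module to check a robots.txt before fetching a page. The robots.txt contents, the example.com URL, the is_allowed helper, and the agent names PerplexityBot and SomeOtherBot are assumptions made for this example; they are not taken from Cloudflare's test setup.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (an assumption for illustration).
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A well-behaved crawler runs this check first and skips disallowed URLs.
    for agent in ("PerplexityBot", "Perplexity-User", "SomeOtherBot"):
        print(agent, is_allowed(agent, "https://example.com/article"))
```

Note that robots.txt is only advisory: nothing in the file stops a client that chooses not to run this check, which is why site owners also rely on server-side blocking.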

Perplexity's website does clarify that one of its crawlers, Perplexity-User, may ignore robots.txt rules when acting on user actions. Since this crawler supports user actions within Perplexity, it only accesses a website following a user request and isn't used for web crawling or data scraping. But Cloudflare found the service doing more than what's stated.

Cloudflare even observed Perplexity using undeclared crawlers, with a generic browser user agent mimicking Google Chrome on macOS, to access the content upon detecting a block.
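To see why an undeclared, browser-like user agent matters, consider a minimal sketch of a block rule that matches only on declared crawler names. Everything here is hypothetical: the token list, the blocked_by_user_agent helper, and both user-agent strings are illustrative assumptions, not Cloudflare's actual WAF logic or strings confirmed by the article.

```python
# Declared crawler tokens a site might block (assumed names for illustration).
BLOCKED_AGENT_TOKENS = ("perplexitybot", "perplexity-user")


def blocked_by_user_agent(user_agent_header: str) -> bool:
    """Naive filter: block a request only if its User-Agent names a listed crawler."""
    ua = user_agent_header.lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


# Illustrative header values; neither is quoted from the source article.
DECLARED_UA = "Mozilla/5.0 (compatible; PerplexityBot/1.0)"
GENERIC_CHROME_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

if __name__ == "__main__":
    print(blocked_by_user_agent(DECLARED_UA))        # declared crawler is caught
    print(blocked_by_user_agent(GENERIC_CHROME_UA))  # browser-like client is not
```

A filter of this kind cannot distinguish a crawler presenting a generic Chrome string from an ordinary visitor, so detecting such traffic requires signals beyond the User-Agent header.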

To compare against standard practices, Cloudflare also observed OpenAI's ChatGPT and found it complying with the best practices for bot operations. Even its ChatGPT-User crawler stops when it encounters a disallow directive.

Perplexity Refutes Cloudflare’s Statements

Following this disclosure from Cloudflare, Perplexity denied the claims. In a statement to TechCrunch, Perplexity spokesperson Jesse Dwyer called Cloudflare's blog post a "sales pitch" (since Cloudflare has announced that it is strengthening its WAF rules to block Perplexity crawlers for websites that disallow them). Dwyer also denied any link with the bot mentioned in Cloudflare's post.

While it is still unclear whether Perplexity is engaging in stealth data scraping, its regular data scraping activities are also unpopular with website owners. Recently, a Japanese newspaper, Yomiuri Shimbun, filed a lawsuit against Perplexity AI in the Tokyo District Court, accusing it of "free-riding" on the newspaper's news content and of copyright infringement. The newspaper seeks $14.7 million in damages, citing Perplexity's use of 120,000 articles between June 2023 and July 2025. A decision is yet to arrive, and it could set a precedent for how AI services may use information available online.

Let us know your thoughts in the comments.
