With TikTok Ban Looming Large, Parent ByteDance's Web Scraping Bot Draws Attention For Being More Aggressive Than The One ChatGPT Uses: Report

ByteDance, the company behind TikTok, has introduced a powerful web scraper named “Bytespider.” Launched in April, Bytespider is recognized as one of the most aggressive data collectors online, gathering data far faster than the scrapers run by other major tech firms.

What Happened: Research conducted by Kasada, a bot management company, and Dark Visitors, a group monitoring scraper bots, confirmed Bytespider’s activity. According to Kasada CEO Sam Crowther, Bytespider collects data 25 times faster than GPTBot, the crawler OpenAI uses for ChatGPT, and 3,000 times faster than ClaudeBot from Anthropic, Fortune reported on Friday.

Despite the looming threat of a U.S. ban on TikTok, ByteDance continues its aggressive data collection strategy. President Joe Biden has demanded the sale or shutdown of TikTok due to national security concerns. Adding to the controversy, Bytespider disregards robots.txt, a voluntary convention through which website owners tell scrapers which pages or sites to avoid.
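For readers unfamiliar with robots.txt, here is a minimal sketch of how a compliant scraper might consult it before fetching a page, using Python's standard `urllib.robotparser` module. The rules in `ROBOTS_TXT`, the `example.com` URLs, and the `is_allowed` helper are illustrative assumptions, not taken from any real site or from the report. The check is purely advisory: nothing stops a bot from skipping it, which is the behavior attributed to Bytespider.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, made up for illustration only.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; a real crawler would download them
    # from https://<site>/robots.txt first.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Bytespider", "GPTBot"):
        print(agent, is_allowed(agent, "https://example.com/articles/1"))
```

A well-behaved scraper would run a check like this for every URL and skip anything disallowed; the site itself has no technical means of enforcing the result.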

The increase in web scraping is linked to ByteDance’s development of a new large language model (LLM) to improve TikTok’s search capabilities. A recent update to TikTok’s search function allows real-time keyword searches for ads, potentially enhancing ad visibility.

ByteDance has yet to respond to Benzinga’s queries.

Why It Matters: The aggressive web scraping by ByteDance follows a trend among major tech companies. In June, OpenAI and Anthropic were reported to have ignored web scraping rules, bypassing the robots.txt protocol to gather free data for AI model training. This practice has sparked controversy, highlighting the tension between AI development and data privacy.

In August, NVIDIA faced scrutiny for scraping videos from platforms like YouTube to train its AI models. This revelation raised concerns about content creators’ rights and the ethical implications of using publicly available data without explicit consent.

Similarly, in September, Microsoft-owned LinkedIn was criticized for using user data for AI training without first updating its terms of service, a practice that particularly affected users in the U.S.


This story was generated using Benzinga Neuro and edited by Pooja Rajkumari

 
