Bluesky Users Weigh In on User Data & AI Training Plans

Bluesky Considers User Control Over Data Scraping for AI

March 15, 2025

Bluesky is exploring a proposal that would give users more control over how their data is used for purposes like AI training and archiving, sparking debate among its user base.

The Proposal: User Intent Signals

Bluesky, the burgeoning social network, is considering a notable shift in its data policy. The company recently unveiled a proposal on GitHub that outlines options for users to specify whether they consent to having their posts and data scraped for various purposes, including generative AI training and public archiving.

This move comes as the platform attempts to navigate the growing concerns surrounding data privacy and the use of user-generated content in the development of artificial intelligence. However, the proposal has also ignited a debate among users, with some expressing skepticism about its effectiveness.

User Reactions: From Alarm to Acceptance

The proposal, initially discussed by CEO Jay Graber at South by Southwest on March 10, 2025, gained traction after she posted about it on Bluesky. Some users reacted negatively, interpreting the move as a departure from Bluesky’s previous stance against selling user data or training AI on their posts.

One user, Sketchette, expressed their dismay, stating, “Oh, hell no! The beauty of this platform was the NOT sharing of information. Especially gen AI. Don’t you cave now.” This sentiment reflects a broader worry that the platform might be compromising its commitment to user privacy.

Bluesky’s Rationale: A New Standard for Scraping

Graber responded to the concerns, explaining that generative AI companies are “already scraping public data from across the web,” including Bluesky, as “everything on bluesky is public like a website is public.” To address this reality, Bluesky aims to establish a “new standard” for governing data scraping, similar to the robots.txt files that websites use to communicate crawling permissions.
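For readers unfamiliar with the robots.txt comparison, the short sketch below shows how such a file communicates crawling permissions and how a well-behaved crawler would consult it. It uses Python's standard `urllib.robotparser` module; the crawler names `ExampleAIBot` and `OtherBot` and the example.com URL are made up for illustration and do not come from Bluesky's proposal.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt. "ExampleAIBot" is a hypothetical crawler name.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules above permit this crawler to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/post/1"
    print("ExampleAIBot allowed:", is_allowed("ExampleAIBot", url))
    print("OtherBot allowed:", is_allowed("OtherBot", url))
```

Nothing in this check stops a crawler that never runs it. The file only states the site's wishes, and compliance is left to the crawler.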

The proposed system would allow users of the Bluesky app, or other apps using the underlying AT Protocol, to manage their data preferences across four categories:

  • Generative AI training
  • Protocol bridging (connecting different social ecosystems)
  • Bulk datasets
  • Web archiving (e.g., Internet Archive’s Wayback Machine)

The proposal states that if a user opts out of having their data used for generative AI training, “Companies and research teams building AI training sets are expected to respect this intent when they see it, either when scraping websites, or doing bulk transfers using the protocol itself.” However, the enforceability of this expectation remains a point of contention.
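To make the idea concrete, here is a minimal sketch of how a dataset builder might honor such signals before including a post. It is not the schema from Bluesky's proposal: the `Category` values, the `IntentSignals` record, and the `may_use` and `filter_posts` helpers are all hypothetical names. The sketch also assumes that a missing signal is treated as no consent, which is a conservative choice made here, not something the proposal is reported to require.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Category(Enum):
    # The four categories named in the article; the string values are made up.
    GENERATIVE_AI = "generative_ai"
    PROTOCOL_BRIDGING = "protocol_bridging"
    BULK_DATASETS = "bulk_datasets"
    WEB_ARCHIVING = "web_archiving"


@dataclass
class Post:
    author: str
    text: str


@dataclass
class IntentSignals:
    # Hypothetical record: category -> True (allow) or False (disallow).
    # A missing key means the user has stated no preference.
    preferences: Dict[Category, bool] = field(default_factory=dict)


def may_use(signals: IntentSignals, category: Category) -> bool:
    # Assumption of this sketch: only an explicit "allow" counts as consent.
    return signals.preferences.get(category) is True


def filter_posts(
    posts: List[Post],
    signals_by_author: Dict[str, IntentSignals],
    category: Category,
) -> List[Post]:
    # Keep only posts whose author has explicitly allowed this use.
    return [
        post
        for post in posts
        if may_use(signals_by_author.get(post.author, IntentSignals()), category)
    ]
```

A scraper that skips this check would see exactly the same public posts, which is why the enforceability question matters.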

Expert Outlook: The Challenge of Enforcement

Molly White, who publishes the “Citation Needed” newsletter and the “Web3 is Going Just Great” blog, described the proposal as “a good proposal,” adding that it was “weird to see people flaming BlueSky for it,” since it’s not so much “welcoming in AI scraping” but rather “trying to add a consent signal to allow users to communicate preferences for the scraping that is already happening.”

White also highlighted a potential weakness: “I think the weakness with this and [Creative Commons’] similar proposal for ‘preference signals’ is that they rely on scrapers to respect these signals out of some desire to be good actors,” she noted. “We’ve already seen some of these companies blow right past robots.txt or pirate material to scrape.” This raises questions about the actual impact of the proposal, as it hinges on the ethical behavior of data scrapers.

Bluesky’s proposal to give users control over their data represents a step toward greater openness and user empowerment. However, the success of this initiative depends on the willingness of AI companies and researchers to respect user preferences. As the debate unfolds, it remains crucial for users to stay informed and actively participate in shaping the future of data privacy on social media. What are your thoughts? Share your opinion in the comments below.

Does Bluesky’s data scraping proposal strike a balance between user control and AI advancement, or does it primarily benefit AI companies?

Bluesky’s Data Scraping Proposal: An Expert’s Viewpoint

March 15, 2025

We interview Dr. Anya Sharma, a data privacy expert, on Bluesky’s plan to give users more control over how their data is used for AI training.

Understanding Bluesky’s User Intent Signals

Bluesky, the rising social network, is making waves with its new proposal for user intent signals regarding data scraping. To unpack this, we’re joined by Dr. Anya Sharma, a leading expert in data privacy and online ethics. Dr. Sharma, thank you for being with us.

Dr. Sharma: It’s my pleasure to be here.

Archyde: Dr. Sharma, can you explain in simple terms what Bluesky’s “user intent signals” proposal entails?

Dr. Sharma: Essentially, Bluesky is exploring giving users the ability to specify how they want their data handled when it comes to things like AI training, protocol bridging, bulk datasets, and even web archiving. It’s about allowing users to express consent or denial for various uses of their publicly available data.

Addressing User Concerns About AI Data Scraping

Archyde: The proposal has sparked mixed reactions, with some users worried about Bluesky potentially “caving” to AI companies. How valid are these concerns?

Dr. Sharma: It’s understandable why some users are apprehensive. The initial reaction stems from a place of wanting their data protected. However, Bluesky’s argument is that data scraping is already happening, and they’re trying to create a system where users can express their preferences, like a robots.txt for social media content. It’s not about encouraging scraping, but rather managing it with greater user input.

The Practicality and Enforceability of the Proposal

Archyde: Bluesky aims to set a “new standard” for governing data scraping. Do you think this is achievable, considering that current scraping practices often ignore existing standards like robots.txt?

Dr. Sharma: That’s the million-dollar question. The success hinges on the willingness of AI developers and other organizations to respect these user intent signals. There’s no technical enforcement mechanism, so it relies on good faith and potential reputational consequences for those who ignore it. It would be interesting to see regulatory bodies step in and impose penalties for violating these user preferences in the future.

The Future of Data Privacy on Social Media

Archyde: What are the broader implications of Bluesky’s proposal for data privacy and user empowerment in the social media landscape?

Dr. Sharma: If successful, this could set a precedent for other platforms to follow, potentially leading to a more user-centric approach to data privacy. It empowers users to have a voice in how their publicly shared data is used, shifting the balance of power to some degree. It also puts pressure on organizations to be more transparent and ethical about their data scraping practices.

Thought-Provoking Questions for Our Readers

Archyde: Dr. Sharma, thanks for your insights. What should individuals consider when deciding whether to opt in or opt out of data scraping on platforms like Bluesky? Also, is this a real solution or just a PR move?

Dr. Sharma: If it offers you more control, I would say yes, sign up for the control. But if there are limited options, don’t give up hope that someone will fight for you.

Archyde: An interesting perspective. What do you, our readers, think? Share your comments below. Do you believe Bluesky’s approach is genuinely empowering, or is it merely a symbolic gesture? What measures would truly ensure user data privacy in the age of AI?