A Hugging Face employee made a huge dataset of Bluesky posts, and it’s already very popular.

    • Successful_Try543@feddit.org
      8 hours ago

      For those interested:

      I’ve removed the data from this dataset since there was a lot of community pushback about its creation/uploading. I will leave the dataset repository up to allow room for discussion of how datasets can be used to help improve Bluesky and allow people to build the tools they need to build their own open models and approaches to creating feeds that work for their needs. Please feel free to continue to leave feedback in the discussions here.

      https://huggingface.co/datasets/bluesky-community/one-million-bluesky-posts