A Hugging Face employee made a huge dataset of Bluesky posts, and it’s already very popular.
Already taken down.
For those interested:
I’ve removed the data from this dataset since there was a lot of community pushback about its creation/uploading. I will leave the dataset repository up to allow room for discussion of how datasets can be used to help improve Bluesky and allow people to build the tools they need to build their own open models and approaches to creating feeds that work for their needs. Please feel free to continue to leave feedback in the discussions here.
https://huggingface.co/datasets/bluesky-community/one-million-bluesky-posts