Pretty sure there’s loads of communist theory in all different flavors in English (and dozens of other languages) available for free on the Internet. Kinda makes me wonder: some AI image models let you look at part of their datasets, so is there actually some way to check whether an LLM’s dataset contains any amount of Marxist theory?
I thought this could run offline? Doesn’t that mean we could just dump ProleWiki into it or something? (Or is it already compiled? Idk)
It is already compiled/trained. It’s open source, so you could re-train it, but they spent in the low millions on training, so fully retraining from scratch is impractical for an individual. Maybe there’s a way to do supplemental/reinforcement training on the released model, but I have no idea.
Sure, there’s theory, but in terms of the raw amount of stuff in the training material, I’d bet there’s a lot less high-quality English discussion of Marxism, with a lot of pseudo-Marxist junk mixed in, and probably much less of that junk in Chinese. Some models publish their training datasets, I believe, but not all.
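For the models that do publish their training data, a rough keyword scan would give a first answer to the "is any of it in there?" question. Here is a minimal sketch, assuming you have downloaded a shard stored as JSON Lines with one document per line and the body under a `text` field. The field name, the keyword list, and the function name are all made up for illustration, not any real dataset's schema. A keyword hit also only tells you a document mentions a term, not that it is good theory.

```python
import json
import re
from typing import Iterable, Sequence, Tuple

# Hypothetical keyword list; swap in whatever terms you care about.
KEYWORDS = ["marx", "engels", "lenin", "dialectical materialism", "surplus value"]


def count_matching_docs(
    jsonl_lines: Iterable[str],
    keywords: Sequence[str] = KEYWORDS,
    text_field: str = "text",  # assumed field name, check your dataset
) -> Tuple[int, int]:
    """Return (documents mentioning any keyword, total documents)."""
    # One case-insensitive pattern that matches any keyword as a substring.
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    hits = 0
    total = 0
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        total += 1
        if pattern.search(record.get(text_field, "")):
            hits += 1
    return hits, total


# Tiny in-memory sample standing in for a real dataset shard.
sample = [
    '{"text": "Marx wrote Capital in the 1860s."}',
    '{"text": "A recipe for lentil soup."}',
    '{"text": "Notes on surplus value and wages."}',
]
hits, total = count_matching_docs(sample)
print(f"{hits} of {total} sample documents mention a keyword")
```

For a real shard you would pass an open file instead of the list, e.g. `count_matching_docs(open("shard.jsonl", encoding="utf-8"))`, where `shard.jsonl` is a placeholder path. It reads line by line, so it should cope with files too big to fit in memory.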