HUGE dataset released for open source use

together.ai

RedPajama-Data-v2: an Open Dataset with 30 Trillion Tokens for Training Large Language Models — Together AI

Hacker News @lemmy.smeargle.fans

RedPajama v2 Open Dataset with 30T Tokens for Training LLMs

Hacker News @derp.foo

RedPajama v2 Open Dataset with 30T Tokens for Training LLMs
