RedPajama v2 Open Dataset with 30T Tokens for Training LLMs

together.ai
RedPajama-Data-v2: an Open Dataset with 30 Trillion Tokens for Training Large Language Models — Together AI
