Calaveras AI
We found a 100B-token code dataset that no one else had found.
We sold it to four of the top six labs on Chat Arena.
We have two more datasets coming soon:
- A new pretraining data mix with office data.
- A large multimodal, multiturn RL coding environment.
If you are interested in these datasets, please contact us at
datasets@calaveras.ai