Calaveras AI

We found a 100B-token code dataset that no one else had found.
We sold it to four of the top six labs on Chat Arena.
We have two more datasets coming soon:
  1. A new pretraining data mix with office data.
  2. A large multimodal, multiturn RL coding environment.
If you are interested in these datasets, please contact us at datasets@calaveras.ai