Several papers on sentence classification use the Yelp reviews dataset (two versions: 5 classes vs. 2 classes). I’m looking for it and cannot find it.

The ULMFiT paper says the 5-class dataset has 650K samples, while the binary one has 560K samples. The authors refer to the paper on character-level convnets from NIPS 2015. That paper says its authors took 1,569,264 samples from the Yelp Dataset Challenge 2015 and constructed two classification tasks, but it does not describe the details.

The current version of the Yelp dataset has ~6M reviews. The version on Kaggle has 5.2M samples.

Does anyone know how to obtain the version used in the papers?

