I’m observing a weird behavior when training CNNs on CIFAR10/0 with : https://imgur.com/a/cAFr0.

This happens with ResNets, DenseNets and even a vanilla VGG without batch-norm. I haven’t experienced this while using augmentation or decaying learning rates. The image above is the training log of a VGG on CIFAR10, with 0.00 learning rate (SGD+Nesterov) and 0.000 weight decay. The frequency of these “cycles” seems to be very dependent on the learning rate and weight decay, and they only happen at 100% training accuracy.
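For anyone who wants to measure how often these “cycles” occur, here is a rough sketch of a helper that scans a per-epoch training accuracy log. It assumes the cycles show up as abrupt drops right after an epoch at 100% training accuracy, which is my reading of the plot and not something stated above. The function names, the `min_drop` threshold and the sample log are all made up for illustration.

```python
def find_accuracy_drops(train_acc, full=100.0, min_drop=1.0):
    """Return (epoch, accuracy) pairs where accuracy falls by at least
    `min_drop` percentage points directly after an epoch at `full` accuracy."""
    drops = []
    for epoch in range(1, len(train_acc)):
        prev, cur = train_acc[epoch - 1], train_acc[epoch]
        if prev >= full and prev - cur >= min_drop:
            drops.append((epoch, cur))
    return drops


def cycle_lengths(drops):
    """Number of epochs between consecutive drops."""
    epochs = [epoch for epoch, _ in drops]
    return [later - earlier for earlier, later in zip(epochs, epochs[1:])]


# Made-up accuracy log (percent per epoch), NOT taken from the linked image.
log = [60.0, 85.0, 99.0, 100.0, 100.0, 72.0, 95.0,
       100.0, 100.0, 100.0, 70.0, 98.0, 100.0]

drops = find_accuracy_drops(log)
print(drops)                 # [(5, 72.0), (10, 70.0)]
print(cycle_lengths(drops))  # [5]
```

Running this over logs from several learning rate / weight decay settings would give one cycle length per pair of consecutive drops, which could then be compared across settings.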

Also, I’m using the imagenet example from pytorch’s repo (except I’m training on CIFAR instead). I have tried removing most of the code almost to the point of only having forward / zero_grad / backward / step, and this still happens. I also tried other training scripts / repos, with no luck. I’m guessing this is not a bug / pytorch-related issue, and probably a general issue with network optimization?
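For reference, a stripped-down loop of the kind described above (forward / zero_grad / backward / step) might look roughly like this. It is only a sketch, not the actual script: the random tensors stand in for CIFAR images, the tiny linear model stands in for VGG, and the `lr`, `momentum` and `weight_decay` values are placeholders, not the settings from the run above.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, optimizer, criterion):
    """One epoch of the bare loop; returns (mean loss, accuracy in [0, 1])."""
    model.train()
    total_loss, correct, seen = 0.0, 0, 0
    for inputs, targets in loader:
        outputs = model(inputs)              # forward
        loss = criterion(outputs, targets)
        optimizer.zero_grad()                # zero_grad
        loss.backward()                      # backward
        optimizer.step()                     # step
        total_loss += loss.item() * targets.size(0)
        correct += (outputs.argmax(dim=1) == targets).sum().item()
        seen += targets.size(0)
    return total_loss / seen, correct / seen


torch.manual_seed(0)
# Hypothetical stand-in data: 64 random "images" shaped like CIFAR, 10 classes.
dataset = TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,)))
loader = DataLoader(dataset, batch_size=16, shuffle=True)

# Hypothetical stand-in model (a single linear layer, not VGG).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
criterion = nn.CrossEntropyLoss()
# Placeholder hyperparameters; Nesterov in torch.optim.SGD needs momentum > 0.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9,
                            nesterov=True, weight_decay=5e-4)

history = [train_one_epoch(model, loader, optimizer, criterion) for _ in range(3)]
for epoch, (loss, acc) in enumerate(history):
    print(f"epoch {epoch}: loss={loss:.4f} acc={acc:.3f}")
```

Keeping the per-epoch loss and accuracy in `history` makes it easy to compare runs when only the learning rate or weight decay changes.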

Has anyone observed this before? This is a huge obstacle for a current project that I have, which involves collecting a lot of statistics regarding generalization and regularization to model network dynamics.

Source: https://www.reddit.com/r//comments/81cq0g/d_sudden___when_training_cnns/

