Hi All,

If you work on distributed data-parallel training for neural networks, you have probably come across this paper on Deep Gradient Compression (DGC): https://arxiv.org/abs/1712.01887. It addresses the communication bottleneck in data-parallel DNN training by shrinking the amount of data transmitted: each worker sends only the gradient values that cross a threshold. I was fascinated by the ideas in the paper and wanted to try them out quickly, so I made a version of DGC on MXNet.
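To make the thresholding idea concrete, here is a minimal NumPy sketch. It is not the code from my MXNet branch, and the function names (`sparsify`, `densify`) and the fixed `threshold` value are made up for illustration. It assumes that values below the threshold are kept in a local residual and added to the next gradient rather than dropped, and it leaves out the other pieces the paper describes, such as momentum correction.

```python
import numpy as np

def sparsify(grad, residual, threshold):
    """Pick the entries worth sending; keep the rest locally (hypothetical helper)."""
    acc = residual + grad                    # add what was left unsent earlier
    mask = np.abs(acc) > threshold           # entries whose magnitude crosses the threshold
    indices = np.flatnonzero(mask)           # flat positions of the entries to send
    values = acc.ravel()[indices]            # the values that go over the network
    new_residual = np.where(mask, 0.0, acc)  # sent entries are cleared, the rest carried over
    return indices, values, new_residual

def densify(indices, values, shape):
    """Rebuild a dense gradient from the sparse message on the receiving side."""
    out = np.zeros(int(np.prod(shape)), dtype=values.dtype)
    out[indices] = values
    return out.reshape(shape)

# One step on a toy gradient with an arbitrary threshold of 0.5.
grad = np.array([0.05, -0.9, 0.2, 1.5])
residual = np.zeros_like(grad)
idx, vals, residual = sparsify(grad, residual, threshold=0.5)
received = densify(idx, vals, grad.shape)
```

In this toy step only two of the four values are transmitted, and `received + residual` still equals the original gradient, so nothing is lost, only delayed.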

Here is the code: https://github.com/anandj91/anand-mxnet (Branch: dgc)

I tried training ResNet-1 on CIFAR- with DGC, but I haven’t been able to reproduce the results reported in the paper, and I don’t know what I’m doing wrong. Validation accuracy is around 92.8%, versus 93.66% for the baseline (sending full gradients). Most likely I’m not using the right hyperparameters.

If anybody has tried this before, or is working in this area and is interested in taking a stab at it, feel free to contact me. I would be grateful for help with the hyperparameter tuning so that my DGC implementation converges to the baseline accuracy.


