Distillation of Deepnets

Training modern deepnets can take an inordinate amount of time, even with the best GPU hardware available. For example, training Inception-v3 on the 1,000-class ImageNet dataset with 8 NVIDIA Tesla K40s takes about 2 weeks (Google Research Blog).

Even when a large network is trained successfully, its memory footprint and prediction latency, both of which grow with the number of parameters, can make it challenging to deploy in production.

