    Nice list! I would add two others that push forward our understanding of “why” deep neural networks work the way they do:

    “Generalization in Deep Learning”: https://arxiv.org/abs/1710.05468, which explores a new regularization technique that its authors name Directly Approximately Regularizing Complexity (DARC).

    “Dynamic Routing Between Capsules”: https://arxiv.org/abs/1710.09829, which introduces the capsule network architecture.