If this interests you, you should read this paper. IMO the most in-depth proof that after sufficient training, NNs learn simple, parsimonious, highly-general rules, even in the absence of overwhelming data.