Many complex tasks can be decomposed into simpler, independent parts. Can neural networks systematically capture such discrete, compositional structure despite their continuous, distributed nature? The impressive capabilities of large-scale neural networks suggest the answer is yes. Yet even the most capable models exhibit failure cases that cast doubt on their compositionality. Do we need to endow neural network architectures with modular or even symbolic structure, or will scaling suffice? In this talk, Simon will shed light on this question and identify conditions under which compositional generalization succeeds.