I was curious about this paper:
arxiv.org/pdf/2202.05826.pdf
It shows how to train a network that can solve complex tasks while being trained only on simple ones.
The idea is to apply the same weight-tied block over and over until it converges on the solution.
The more complex the problem, the more iterations you run before getting there.
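Roughly, the recipe looks like this. A minimal sketch in PyTorch — the names and layer sizes are mine, not the paper's code, but the weight-tied loop is the core trick (the paper's nets also feed the raw input back in at every iteration, which I mimic with a concatenation):

```python
import torch
import torch.nn as nn

class IterativeSolver(nn.Module):
    def __init__(self, channels=32, in_channels=3):
        super().__init__()
        self.encode = nn.Conv2d(in_channels, channels, 3, padding=1)
        # The block whose weights are shared across every iteration.
        self.block = nn.Sequential(
            nn.Conv2d(channels + in_channels, channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(),
        )
        # Per-pixel logits, e.g. "is this pixel on the solution path?"
        self.decode = nn.Conv2d(channels, 2, 3, padding=1)

    def forward(self, x, n_iters):
        h = self.encode(x)
        for _ in range(n_iters):
            # Same weights every step; only the hidden state evolves.
            h = self.block(torch.cat([h, x], dim=1))
        return self.decode(h)
```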
For the maze task, it produces intermediate solutions at each iteration, gradually converging on the full path.
This way, a fully convolutional network can learn on small mazes, and it becomes intuitive why it scales to larger ones if you run it for more iterations.
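Hypothetical usage, continuing the sketch above: because the block is fully convolutional and weight-tied, the same trained weights accept any input size, and you just crank up the iteration count for harder instances:

```python
model = IterativeSolver()
small = torch.randn(1, 3, 16, 16)    # stand-in for a small training maze
large = torch.randn(1, 3, 128, 128)  # stand-in for a much bigger test maze

out_small = model(small, n_iters=10)
out_large = model(large, n_iters=100)  # more iterations for a harder problem
print(out_small.shape, out_large.shape)  # (1, 2, 16, 16) (1, 2, 128, 128)
```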
Not that impressive after all.
Stable Diffusion is also based on training a network to solve one small part of the problem (removing a bit of noise), and it's by running it through several steps that you generate an interesting image.
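The analogy in heavily simplified, made-up code — this is nowhere near Stable Diffusion's actual sampler, and `denoiser` and the update rule are placeholders, but it's the same loop-a-small-step pattern:

```python
import torch

def generate(denoiser, steps=50, shape=(1, 3, 64, 64)):
    x = torch.randn(shape)  # start from pure noise
    for t in reversed(range(steps)):
        pred_noise = denoiser(x, t)  # the one small task the net was trained on
        x = x - pred_noise / steps   # crude step toward a clean image
        if t > 0:
            x = x + 0.01 * torch.randn(shape)  # re-inject a little noise
    return x
```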
It feels like this way of training networks is something profound, and underused.