I’ve just now taken an interest in learning how LLMs work and I’m thankful that you shared this! To those more in the know: how are mechanisms like attention and neural network layers usually implemented? All of the learning materials discuss these concepts with drawings of boxes and arrows going every which way, but I’ve never seen anyone point to an existing implementation you could actually read to see how it’s laid out. Also, is a model “just” an executable on disk with inputs and outputs, or is it more involved?
Sorry for not answering your questions directly, but since you’re just starting out and expressing an interest in how things work, I would refer you to Andrej Karpathy’s YouTube channel, in particular this playlist (the first video of which is probably the single best resource for the average programmer to understand the fundamental building blocks of all deep learning).
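Since the question asks what attention actually looks like in code: here's a minimal sketch of scaled dot-product attention (the core operation from "Attention Is All You Need") in plain NumPy. This is a toy illustration under simplifying assumptions — no batching, no masking, no learned projection matrices — not how production libraries lay it out; frameworks like PyTorch add those plus fused GPU kernels.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Q, K, V: (seq_len, d) arrays of query, key, and value vectors.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # similarity of every query to every key
    weights = softmax(scores, axis=-1)   # each row is a probability distribution
    return weights @ V                   # weighted mix of the value vectors

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))          # 4 tokens, 8-dim embeddings (toy sizes)
out = attention(x, x, x)                 # self-attention: Q = K = V
print(out.shape)                         # (4, 8): one output vector per token
```

And to the second question: a model on disk is typically just a file of learned weights (big arrays of floats) plus a config describing the architecture — the "executable" part is generic inference code that loads those arrays and runs matrix multiplications like the ones above.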