Llama 1b. Doing this achieves brr – on an H100, we use 78% of memory bandwidth and outp...
