49 points | by lmxyy16 hours ago
Interesting to see how poor the prompt adhesion is in these examples. The cyanobacteria one is just "an image of the ocean". The skincare one completely ignores 50% of the ingredients in the prompt, and makes coffee beans the size and shape of almonds.
If I found the correct docs https://docs.nvidia.com/deeplearning/cudnn/frontend/latest/o... NVFP4 means 16 4-bit floating-point values (1 sign bit, 2 for the exponent, 1 for the mantissa) each have one shared 8-bit floating point scaling factor (1 sign bit, 4 exponent, 3 mantissa), so strictly speaking it's 4.5 bits per value.
This grouped scaling immediately makes me wonder whether the quantization error could be reduced even more by permuting the matrix so values of similar magnitude are quantized together.