2023-03-18 UTC
# [KevinMarks] The token embeddings have high dimensionality but are very sparse in their raw form. The networks that model them are already more compact, but because of some randomisation during training each one ends up with a different degree of sparseness, so coming up with representations that compress them well is a challenging problem. I expect we will see ways to aggressively prune or compress models for specific applications.
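One common form of pruning alluded to here is magnitude-based weight pruning, which zeroes out the smallest weights in a layer. A minimal sketch, assuming numpy; the function name, target sparsity, and matrix size are illustrative, not from the original comment:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until the given sparsity is reached."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # Threshold at the magnitude of the k-th smallest absolute weight.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    # Weights at or below the threshold are set to zero; the rest are kept.
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

# Example: prune a random 512x512 weight matrix to roughly 90% sparsity.
rng = np.random.default_rng(0)
w = rng.normal(size=(512, 512))
w_pruned = magnitude_prune(w, 0.9)
print(f"fraction of zeroed weights: {np.mean(w_pruned == 0):.2%}")
```

In practice the pruned weights would usually be stored in a sparse format or the model fine-tuned afterwards to recover accuracy, but the core idea of per-application compression is just this kind of thresholding applied layer by layer.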