News
A new study links layer-time dynamics in Transformer models with real-time human processing. The findings suggest that AI models may not only produce outputs similar to those of humans but could also follow ...
Multi-modal Speech Transformer Decoders: When Do Multiple Modalities Improve Accuracy? Authors: Guan, Y., Trinh, V.A., Voleti, V., and Whitehill, J.
Models like LLM2Vec and NV-Embed enhance text-based representation learning by modifying the attention mechanisms in decoder-only LLMs. Despite these innovations, challenges such as handling long ...
When I test a quantized int8 model with TP, the following error occurs: only Tensors of floating point dtype can require gradients [rank0]: File "/opt/conda/envs/py_3 ...
Despite their popularity, we question the necessity of making both the encoder and decoder learnable. To address this, we propose LessNet, a simplified network architecture with only a learnable ...
Experimental results demonstrate that our method not only achieves high ... network structures, such as multilayer perceptron (MLP), convolutional neural networks (CNNs), recurrent neural networks ...