Revisiting Transformer Layer Parameterization Through Causal Energy Minimization
Jin Xu, Camille Couturier, Victor Ruhle, Saravan Rajmohan, James Hensman
May 2026
C. Trojan, Pavel Myshkov, P. Fearnhead, James Hensman, Tom Minka, Chris Nemeth
AISTATS 2026 | April 2026
Hao Kang, Srikant Bharadwaj, James Hensman, Tushar Krishna, Victor Ruehle, Saravan Rajmohan
ArXiv | December 2024, Vol abs/2412.08585
Xi Wang, Liana Mikaelyan, Taketomo Isazawa, James Hensman
October 2024
Saleh Ashkboos, Amirkeivan Mohtashami, Maximilian L. Croci, Bo Li, Martin Jaggi, Dan Alistarh, Torsten Hoefler, James Hensman, Pashmina Cameron
2024 Neural Information Processing Systems | March 2024
Preprint