xT: Nested Tokenization for Larger Context in Large Images

Published as an arXiv preprint, 2024

Recommended citation: Ritwik Gupta, Shufan Li, Tyler Zhu, Jitendra Malik, Trevor Darrell, and Karttikeya Mangalam. "xT: Nested Tokenization for Larger Context in Large Images." arXiv preprint arXiv:2403.01915 (2024). https://arxiv.org/abs/2403.01915