| | How Confident are Video Models? Empowering Video Models to Express their Uncertainty | | 1 | 1 |
| | Self-Improvement in Multimodal Large Language Models: A Survey | | 3 | 1 |
| | Improving GUI Grounding with Explicit Position-to-Coordinate Mapping | | 1 | 1 |
| | Rethinking Thinking Tokens: LLMs as Improvement Operators | | 3 | 2 |