Maowei Jiang (蒋茂苇)

I am a graduate student at Tsinghua University, working on large language models, multimodal agents, Vision-Language-Action (VLA) models, robot learning, and policy optimization. Before joining Tsinghua, I studied at the University of Chinese Academy of Sciences / Shenyang Institute of Automation, Chinese Academy of Sciences, where I worked on deep learning and intelligent perception.

My current research focuses on building embodied agents that can connect perception, reasoning, decision making, and action. I am especially interested in world-model-conditioned VLA policies, online policy refinement for real robots, reinforcement learning for foundation models, and multimodal systems that can generalize to long-horizon tasks.

Email / Google Scholar / GitHub / CV

Research keywords: LLMs, MLLMs, VLA, Robot Learning, Embodied World Models, Multimodal Agents, Policy Optimization, Reinforcement Learning, Long-Horizon Decision Making.

News

2026.06: ReCon and FutureVLA, two first-author works on VLA policy refinement and future-conditioned VLA decision making, are under review.
2026.06: FDVLA is under review at Information Fusion. Prompt2Act (Information Fusion) maps natural-language prompts to executable robot action sequences.
2026: TAPO, which studies policy optimization for LLMs with dynamic teacher signals and perturbed answer injection, was selected as an AAAI 2026 Oral paper.
2025: DAAC, a study of discrepancy-aware adaptive contrastive learning, was accepted to NeurIPS 2025.
2024: Contributed to open-world agent and generative design works, including CARD, MRED-14, and GreenPlanner.

Selected Publications and Submissions

AAAI 2026 Oral

TAPO: Dynamic Teacher and Perturbed Answer Injection for Policy Optimization
Maowei Jiang, et al.
AAAI 2026 Oral
Introduces a dynamic teacher and perturbed answer injection mechanism for improving LLM policy optimization efficiency and mitigating reward hacking.

NeurIPS 2026 Under Review

FutureVLA: Acting on Predicted Futures with Vision-Language-Action Models
Maowei Jiang, et al.
First-author submission
Connects visual world-model prediction with robot action generation by feeding predicted future visual tokens into VLA policies. Achieves strong results on LIBERO and real-robot tasks.

NeurIPS 2026 Under Review

ReCon: Reference-Conditioned Online Refinement for Vision-Language-Action Policies
Maowei Jiang, et al.
First-author submission
Studies online residual correction for frozen VLA policies. On contact-rich real-robot tasks, the approach raises the average success rate from 46.3% to 98.7%.

Information Fusion

Prompt2Act: Mapping Prompts into Sequence of Robotic Actions with Large Foundation Models
Maowei Jiang, et al.
Information Fusion (impact factor 15.5, Q1 Top, CCF-B)
[GitHub]
Maps natural-language prompts into robot action sequences, bridging LLM/MLLM reasoning, task planning, and executable robot actions.

Information Fusion Under Review

FDVLA: A Flow-Diffusion Vision-Language-Action Framework with Dual Reasoning Modulation
Maowei Jiang, et al.
First-author submission
[GitHub]
Explores flow-diffusion VLA modeling and reasoning modulation for complex robot manipulation and action generation.

ACM MM 2026 Under Review

RL2VLA: Reinforcement Learning Fine-tuning for Vision-Language-Action Models
Maowei Jiang, et al.
First-author submission
[GitHub]
Studies reinforcement learning fine-tuning for VLA models by combining supervised behavior learning with policy search and self-improvement.

NeurIPS 2025

DAAC: Discrepancy-Aware Adaptive Contrastive Learning
Maowei Jiang, et al.
NeurIPS 2025
Studies robust representation learning under distribution discrepancy through adaptive contrastive learning.

Agents / Generative Design

CARD / MRED-14 / GreenPlanner
NeurIPS 2024 Workshop on Open-World Agents; ACM MM; CVPR
Contributed to cross-modal agents for editable residential design, a benchmark for low-energy residential floor-plan generation, and function-feasible generative layout planning.

Open Source

Awesome-LLM-Robotics
4.4k+ stars. A curated list of LLM/MLLM + Robotics/RL papers, code, and resources. I contribute to tracking the fast-moving embodied AI literature.

Second-Me
15.5k+ stars. Contributed to interface development and WeChat bot integration for a personalized "AI self" system.

Prompt2Act / RL2VLA / FDVLA
Research repositories for prompt-to-action generation, RL fine-tuning for VLA, and flow-diffusion VLA modeling.

Zero-coder GitHub
98 public repositories covering LLMs, VLA, multimodal learning, computer vision, open-source notes, and research prototypes.

Honors and Competitions

Kaggle Child Mind Institute (CMI) competition: Silver Medal, global rank 75 / 1878.
Huawei Ascend AI Innovation Competition: Excellent Solution Award.
BMW Hackathon: Finalist / second place.
Alibaba Tianchi Few-shot Trademark Detection: Global rank 239 / 2135.
Asia-Pacific Ophthalmology Big Data Competition: Global rank 142 / 10006.

Research Taste

I like problems where language, vision, action, and feedback meet. My long-term goal is to build agents that do not merely describe the physical world but can reason about it, act in it, learn from failures, and improve through real interaction.