About the Company
Founded in 2022, XG Tech is driving the future of smart vehicles. Its mission is to empower the digital transformation of automobiles, moving from distributed computing to a centralized, cross-domain platform.
XG Tech focuses on the intelligent cockpit—the next frontier of differentiation—while seamlessly integrating advanced driving systems. By reimagining cars as mobile living spaces, XG Tech aligns with the evolving trend of vehicles becoming the “third living space.”
Role Summary
As an R&D Engineer (Multimodal AI Agents), you will lead the development of next-generation AI agents capable of perceiving, reasoning, and acting within real-world graphical user interfaces. You will work at the intersection of Computer Vision, Natural Language Processing, and Reinforcement Learning to transform cutting-edge research into scalable, production-ready systems for Android and cross-platform environments.
Key Responsibilities
- Design, train, and optimize Large Multimodal Models (LMMs) for GUI understanding, including screen parsing, grounding, and action prediction
- Build end-to-end agent pipelines covering perception, planning, reasoning, and execution loops across dynamic UI environments
- Track, reproduce, and improve cutting-edge research in GUI agents (e.g., DroidRun, Surfer, AutoGLM) and adapt them to real-world use cases
- Develop robust evaluation frameworks to measure agent performance, reliability, and safety in complex software environments
- Translate research concepts into working prototypes and production-ready systems with strong engineering rigor
- Contribute to patents, technical publications, and internal knowledge building aligned with company innovation goals
How you will stand out:
- Master’s or PhD in Computer Science, Artificial Intelligence, Robotics, or a related field
- Demonstrated research or technical impact through publications, open-source contributions, or significant projects (please include a link to your Google Scholar profile or portfolio in your application)
- Proficiency in Python, with experience in large-scale model training, data processing, and system implementation
- Solid understanding of Vision-Language Models, Transformer architectures, or multimodal learning systems
- Familiarity with agent optimization or training techniques, such as Reinforcement Learning (e.g., PPO, RLHF), as well as agent evaluation frameworks or benchmarking environments
- Experience with UI understanding tasks, such as object detection and OCR
- Proficiency with deep learning frameworks such as PyTorch or JAX
XG Tech is committed to providing equal employment opportunities in accordance with country, state, and local laws. XG Tech does not discriminate against employees or applicants based on characteristics such as race, color, gender identity and/or expression, sexual orientation, marital and/or parental status, religion, political opinion, nationality, ethnic background or social origin, social status, disability, age, indigenous status, and union membership.