NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.
Join NVIDIA, a groundbreaking leader in AI computing and visual technologies, at the forefront of innovation. As an AI in Industry Solution Architecture Intern, you'll be integral to our mission of redefining industries through AI and HPC. Our Solution Architect team builds innovative AI computing platforms, analyzes applications, and delivers outstanding value to our customers. This role offers a remarkable opportunity to harness NVIDIA's newest technologies to optimize large models, develop sophisticated AI workflows, and empower our clients with advanced AI solutions.
What you will be doing:
Contribute to and gain an in‑depth understanding of open‑source inference frameworks such as SGLang and vLLM; collaborate with the community on new features and operators, performance optimization, and model enablement.
Design and implement CUDA kernels/operators (e.g., GEMM, attention and related primitives) for efficient and scalable LLM inference and training.
Develop and optimize KV‑cache offloading frameworks for LLM scenarios, enabling multi‑level KV‑cache offloading and reuse on CPU/SSD/remote storage to accelerate inference (team project: https://github.com/taco-project/FlexKV).
Take ownership of R&D work related to compute performance in distributed training, continuously exploring methods and techniques for performance optimization.
Conduct in‑depth research on computational problems in machine learning, summarize common computational patterns and requirements, and develop sample code, acceleration libraries, or framework components.
What we need to see:
Pursuing a Bachelor's, Master's, or PhD degree in Electrical Engineering, Automation, Computer Science, Computational Mathematics, or a related field.
Strong interest in accelerated computing, parallel computing, and heterogeneous computing, and willingness to dive deep into these areas.
Solid programming skills; good understanding of data structures and general concepts of computer systems.
Strong ability to learn and adapt, with good skills in analyzing and formulating problems and exploring solutions independently.
Ways to stand out from the crowd:
Familiarity with heterogeneous computing, distributed training, parallel computing, or other high‑performance computing areas.
Experience in performance analysis, modeling, or optimization, along with contributions to open‑source frameworks.
Strong capability in defining new problems and exploring solution spaces; this is critical for the role.
Proficiency with AI‑assisted programming tools.