System Architect (Lead) | Up to $18,000
About the Role
We are seeking a highly skilled System Architect to lead the deployment and optimization of NVIDIA GB200 NVL72 clusters in our AI data center (AIDC) facilities. This is a rare opportunity to work at the intersection of high-performance computing, data center engineering, and precision system architecture. You will translate NVIDIA’s global reference architectures into actionable, site-specific implementations that deliver peak operational efficiency and comply strictly with technical standards.
Key Responsibilities
NVIDIA Reference Architecture Implementation
Lead the deployment of GB200 NVL72 clusters in accordance with NVIDIA Reference Architectures. Adapt global blueprints to local facility constraints while maintaining 100% compliance with power, thermal, and networking standards.
Field Technical Guidelines (“Show Tech”)
Develop and maintain master technical manuals translating NVIDIA’s high-level specifications into actionable procedures, including:
Fluidic Operational Windows: Define pressure drop and flow-rate limits per rack.
Connectivity Integrity: Set pass/fail thresholds for 800G link-training and Bit Error Rate (BER).
Fiber Hygiene: Implement standardized “Inspect-Clean-Inspect” protocols for optical interfaces.
Infrastructure Validation & Modeling
Conduct gap analysis and validation modeling to ensure that facility coolant distribution units (CDUs) and 54V DC power distribution meet the requirements of a 48-rack cluster.
Environmental & Dew Point Control
Author guidelines for high-humidity environments that define coolant temperature offsets relative to the ambient dew point, preventing condensation and ensuring safe operation without relying on custom software.
Compliance & Signal Integrity Standards
Audit NVIDIA-certified Fat-Tree topologies to ensure strict adherence to Signal Integrity (SI) requirements. Validate component selection and routing for 800G/1.6T performance compliance.
Tier-3 Technical Escalation
Serve as the final authority for resolving conflicts between site limitations and reference architecture requirements, delivering data-driven solutions.
Vendor Technical Governance
Act as the primary technical liaison between NVIDIA, OEM vendors, and mechanical and electrical (M&E) contractors, ensuring seamless integration of third-party components into the cluster ecosystem.
Qualifications & Experience
Bachelor’s or Master’s degree in Electrical, Mechanical, or Computer Engineering, or a related engineering discipline.
Proven experience in large-scale HPC or AI cluster deployment, ideally with NVIDIA systems.
Strong expertise in data center infrastructure, including power, thermal management, and networking.
Deep knowledge of 800G/1.6T connectivity, fiber optics, and high-performance computing environments.
Experience in drafting technical manuals, operational guidelines, and compliance documentation.
Excellent problem-solving skills and the ability to resolve complex system-level conflicts.
Strong vendor management and cross-functional collaboration skills.
Official account of Jobstore.