Role Overview
Experienced Senior HPC Engineer / Architect specializing in Linux-based high-performance computing (HPC) environments, EDA workflows, and automation-driven infrastructure. Proven expertise in designing, managing, and optimizing large-scale distributed HPC clusters that support ASIC EDA workloads.
Key Responsibilities
- Architect, deploy, and manage large-scale distributed HPC environments across global locations, supporting ASIC and GPU compute clusters
- Design and implement infrastructure automation using Ansible, Shell, and Python for system lifecycle management
- Administer and optimize workload schedulers (LSF, Slurm, Network Computer (NC)), including queue configuration, fair-share policies, and job prioritization
- Perform deep troubleshooting and root cause analysis across compute, storage, networking, and scheduler layers
- Collaborate with engineering teams to improve EDA workload performance and efficiency in global HPC environments
- Develop and deploy self-service automation solutions to reduce manual effort and improve system reliability
- Manage and support the EDA ecosystem, including tool deployment (Cadence, Synopsys), licensing, and workflow optimization
- Implement monitoring and observability frameworks using tools such as Splunk and Grafana for proactive issue detection
- Drive capacity planning, performance tuning, and resource optimization for HPC workloads
- Create and maintain technical documentation, runbooks, and operational standards
- Provide technical leadership and mentoring, influencing HPC architecture and long-term strategy
Technical Skills
HPC & Scheduling: LSF, Slurm, Network Computer (NC), Grid/Batch scheduling
Operating Systems: Red Hat Enterprise Linux (RHEL), CentOS
Automation & Scripting: Ansible, Shell/Bash, Python
EDA Tools: Cadence, Synopsys, EDA workflows & design environments
Monitoring & Observability: Splunk, Grafana, Prometheus
Storage & Filesystems: NFS, AutoFS, distributed storage systems
Authentication & Access: UNIX/Linux integration with Active Directory
Infrastructure: On-premises and hybrid HPC environments
Remote Access & VDI: Exceed TurboX, VNC, NoMachine
Preferred Skills
- Extensive experience with job schedulers such as LSF, Slurm, or equivalent platforms
- Experience supporting EDA / semiconductor design environments
- Exposure to GPU computing and accelerator-based workloads
- Knowledge of EDA licensing systems and license-usage optimization
- Experience with Infrastructure as Code (IaC) and platform standardization
- Familiarity with cloud or hybrid HPC architectures (e.g., AWS or Azure HPC)