We are seeking a hands-on Platform Engineer to support a high-performance computing (HPC) platform used by computational scientists in Research & Development. This role focuses on AWS infrastructure, DevOps automation, container platforms, and high-throughput storage, with heavy use of infrastructure as code. You will own cloud and HPC infrastructure end-to-end and work closely with scientists and engineers to deliver scalable, reliable, and automated platform solutions.
Key Responsibilities:
- Design, build, and operate scalable, high-performance cloud infrastructure on AWS
- Manage infrastructure as code using Terraform, Terragrunt, and CloudFormation
- Build immutable infrastructure with Packer
- Develop and maintain CI/CD pipelines using GitLab CI/CD
- Operate containerized workloads across:
  - Amazon EKS
  - Docker on EC2
  - Singularity (Apptainer) for HPC workloads
- Configure systems using Ansible
- Design and operate high-throughput cloud and HPC storage solutions
- Monitor, troubleshoot, and optimize platforms for performance, reliability, and cost
- Document architectures and operational best practices