DevOps/DevSecOps Engineer
Key Responsibilities
Cloud Infrastructure & Automation
- Design, deploy, and maintain cloud environments on AWS or GCP.
- Develop and manage infrastructure using Infrastructure-as-Code tools such as Terraform or similar technologies.
- Implement secure networking architectures, including VPCs, private connectivity, VPNs, peering, and network segmentation.
- Establish scalable multi-environment and multi-account cloud strategies.
Container Platform & Kubernetes
- Build and manage Kubernetes environments for production workloads.
- Configure cluster security, RBAC, autoscaling, workload isolation, and governance controls.
- Create standardized deployment frameworks for microservices, backend services, and AI-related workloads.
- Support both real-time and batch processing environments.
CI/CD & Release Automation
- Design and optimize CI/CD pipelines to streamline software delivery.
- Implement automated testing, deployment validation, and release management processes.
- Support deployment strategies such as blue-green, canary, and phased rollouts.
- Improve developer productivity through automation and deployment standardization.
Observability & Site Reliability
- Implement monitoring, logging, tracing, and alerting solutions.
- Define service reliability metrics, performance objectives, and operational standards.
- Support incident response processes, root-cause analysis, and continuous service improvement.
- Develop operational runbooks and support on-call readiness.
Security & DevSecOps
- Strengthen cloud security through identity and access management best practices.
- Manage secrets, certificates, and sensitive configuration securely.
- Implement vulnerability scanning, dependency analysis, and container security controls.
- Enforce security policies through automation and governance frameworks.
- Support audit readiness and compliance requirements.
AI/ML Platform Support
- Collaborate with data science and machine learning teams to operationalize AI solutions.
- Support model deployment, inference services, batch processing, and ML infrastructure.
- Monitor performance, availability, and resource consumption of AI workloads.
- Contribute to platform capabilities that improve AI product scalability.
Cost Optimization
- Monitor cloud spending and resource utilization.
- Implement tagging, budgeting, and optimization initiatives.
- Drive efficiency improvements without compromising performance or reliability.
Developer Experience
- Build self-service infrastructure capabilities and reusable engineering templates.
- Improve internal tooling and automation workflows.
- Maintain clear technical documentation and operational guidelines.
- Help create a smooth developer experience across engineering teams.
Requirements
Essential Skills & Experience
- Minimum 4 years of experience in DevOps, Site Reliability Engineering, Platform Engineering, or related infrastructure roles.
- Strong hands-on experience with AWS or Google Cloud Platform.
- Proven expertise with Infrastructure-as-Code tools, preferably Terraform.
- Solid Kubernetes administration and containerization experience, including tools such as Docker, Helm, and Kustomize.
- Strong knowledge of CI/CD pipelines and modern software delivery practices.
- Good understanding of Linux systems, networking, and automation scripting using Python and/or Bash.
- Experience implementing monitoring, logging, and observability solutions.
- Strong understanding of cloud security principles, IAM, secrets management, and infrastructure hardening.
- Ability to collaborate effectively with software engineers, product stakeholders, and technical leadership.
- Strong documentation and communication skills.
Preferred Qualifications
- Experience with GitOps practices and tools such as ArgoCD or Flux.
- Exposure to policy-as-code and infrastructure testing frameworks.
- Familiarity with data engineering or machine learning ecosystems.
- Experience working with distributed systems, streaming platforms, or large-scale data processing technologies.
- Knowledge of enterprise compliance, governance, and security review processes.
- Exposure to reliability engineering practices such as resilience testing or disaster recovery exercises.
Official account of Jobstore.