This role focuses on diagnosing and resolving issues
escalated from Level 1 (L1) support, maintaining system stability, and
coordinating with engineering and vendor teams to ensure continuous service
availability and optimal performance.

Key Responsibilities:

Incident Management:
• Investigate and resolve incidents escalated by the L1
support team.
• Conduct root cause analysis (RCA) for recurring or
critical issues.
• Escalate complex incidents to Level 3 (L3) or vendor teams
where necessary.

System Monitoring & Maintenance:
• Monitor system performance, data flows, and integrations.
• Perform routine system health checks, log reviews, and
preventive maintenance tasks.
• Implement minor configuration changes and system
optimizations as required.

Problem & Change Management:
• Support problem management activities by identifying and
addressing underlying technical issues.
• Participate in change management processes, including
testing and deployment of patches, upgrades, and new features.

Collaboration & Communication:
• Work closely with L1 support, infrastructure, data
engineering, and vendor teams to ensure timely issue resolution.
• Document troubleshooting procedures, resolutions, and updates for
knowledge base improvement.
• Communicate technical updates and incident statuses to
relevant stakeholders.

Continuous Improvement:
• Recommend improvements to monitoring tools, workflows, and
processes to enhance system reliability.
• Support system audits, compliance checks, and performance
reviews.

Requirements / Qualifications
- Diploma or Degree in Information Technology, Computer Science, or a
  related field.
- Experience in application or platform support, preferably in data systems
  or cloud environments.
- Familiarity with DataRobot or data streaming technologies is an advantage.
- Strong analytical and troubleshooting abilities.
- Proficient in incident and problem management processes (e.g., ITIL
  framework).
- Good communication and stakeholder management skills.