Careers at Electronikmedia: Join Our Innovative Team

AWS Infrastructure Application Support

Location: Remote

About the job

As a Level 3 AWS Infrastructure Support Engineer, you will own overnight monitoring and response for Electronikmedia’s clients' AWS-based production environment. You will:
- Monitor system health using Datadog and AWS-native tools
- Investigate alerts and anomalies using established runbooks
- Resolve production incidents when possible
- Escalate complex issues quickly and accurately
- Maintain clean, auditable incident documentation
This role is ideal for someone who thrives in high-trust, high-impact operational environments.

Key Responsibilities

On-Call & Incident Response
- Provide initial response within 15 minutes for all high-priority production alerts
- Investigate, mitigate, and resolve production outages when feasible
- Escalate unresolved or complex issues using the defined escalation matrix
- Act as the owner of production system stability
Monitoring, Alerting & Observability
- Analyze and respond to Datadog monitor alerts across infrastructure and application layers
- Identify abnormal patterns, trend-line deviations, and early indicators of systemic risk
- Proactively notify stakeholders of significant performance or stability concerns
- Contribute insights for preventive and corrective actions
Root Cause & Trend Analysis
- Track recurring alerts and incidents
- Provide analysis and recommendations to reduce alert noise and improve system resilience
- Participate in weekly validation of Datadog alert configurations and thresholds
Communication & Documentation
- Maintain clear, concise, and timely communication during incidents
- Document all incidents, alarms, and observations in Jira during each shift
- Ensure handoff notes are complete and actionable for daytime engineering teams

Technical Environment

Core AWS Services
- ECS (Fargate)
- RDS
- ElastiCache
- EC2
- Lambda
- API Gateway
- S3
Tooling
- Datadog (monitoring, alerts, dashboards)
- Jira (incident tracking and documentation)

Qualifications & Experience

5+ years of hands-on AWS infrastructure administration and support
Proven experience supporting production-grade, high-availability systems
Strong background in incident response within enterprise or scale-up environments

Skills

Deep operational knowledge of AWS services and distributed systems
Strong troubleshooting and root-cause analysis skills under tight SLAs
Ability to follow runbooks while also knowing when to think beyond them
Calm, structured decision-making during production incidents

Certifications (Preferred)

AWS Certified Solutions Architect – Associate or Professional
AWS Certified DevOps Engineer – Professional (Nice to Have)

Service Level Expectations

Alert Escalation SLA: ≤ 15 minutes for high-priority alarms
Availability: Consistent overnight coverage (IST Day Shift)
Reliability: Zero missed critical alerts during assigned coverage windows

Deliverables

- Monthly Service Performance Report, including:
  - Alerts monitored
  - Incidents resolved
  - Escalations
  - SLA adherence metrics
- Weekly Datadog Validation, ensuring alert accuracy and functionality
