Resident Engineer - Data Center
--Fortek Pvt Ltd.--
Job Description:
Position Structure:
Department:
Line Manager:
Stream:
Job Requirements
Base Qualifications:
- Master's Degree
- Graduate
Skills & Tools:
Education
Bachelor’s Degree in Electrical Engineering, Mechanical Engineering, Electronics, Mechatronics, or Computer Engineering
Diploma holders with extensive data center experience may also be considered
Key Skills & Competencies
Technical Skills
Strong knowledge of UPS, generators, electrical distribution, cooling systems
Hands-on experience with BMS, DCIM, and monitoring tools
Understanding of IT infrastructure, cabling standards, and rack power management
Strong troubleshooting and fault isolation skills
Operational & Soft Skills
SLA and KPI-driven mindset
Excellent incident handling and RCA skills
Strong documentation and reporting ability
Ability to work under pressure in critical environments
Excellent communication and coordination skills
Strong customer service orientation
Preferred Certifications (highly desirable)
Uptime Institute:
Accredited Tier Specialist (ATS)
Data Center Operations Specialist (DCOS)
Vendor Certifications:
UPS / Cooling OEM certifications
IT & Service Management:
ITIL Foundation
ISO 27001 / ISO 20000 Awareness
Safety:
NEBOSH / IOSH
First Aid & Fire Safety
Job Experience Required:
Experience
5–8 years of hands-on experience in data center operations, preferably in managed services or SLA-driven environments
Proven experience working in Tier II / Tier III / Tier IV data centers
Experience managing mission-critical infrastructure with 24/7 operations
Working Conditions
24/7 operational environment with shift-based or on-call requirements
On-site presence at the customer's data center
High accountability role in mission-critical infrastructure
Duties & Responsibilities
Key Responsibilities
1. Data Center Operations & SLA Management
Act as the primary on-site engineer responsible for compliance with contractual Service Level Agreements (SLAs).
Ensure 24/7 availability, reliability, and performance of data center infrastructure.
Monitor and manage KPIs, including MTTR, MTBF, uptime, and service response times.
Ensure timely escalation, coordination, and resolution of incidents in line with the SLA matrix.
Maintain service continuity during planned and unplanned activities.
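For candidates unfamiliar with how the metrics above relate, the following is a minimal sketch of one common way to derive MTTR, MTBF, and availability from an outage log. The `Outage` record, the `sla_metrics` function, and the sample figures are hypothetical and only illustrate the arithmetic; actual SLA calculations follow the definitions in the customer contract.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    # Hypothetical record: hours measured from the start of the reporting period
    start_hour: float
    end_hour: float


def sla_metrics(period_hours: float, outages: list[Outage]) -> dict[str, float]:
    """Compute MTTR, MTBF, and availability for one reporting period."""
    downtime = sum(o.end_hour - o.start_hour for o in outages)
    uptime = period_hours - downtime
    count = len(outages)
    # MTTR: average time to restore service per outage
    mttr = downtime / count if count else 0.0
    # MTBF: average operating time between failures
    mtbf = uptime / count if count else float("inf")
    # Availability: fraction of the period the service was up
    availability = uptime / period_hours
    return {"mttr": mttr, "mtbf": mtbf, "availability": availability}


# Example: a 30-day month (720 h) with a 1 h and a 3 h outage (made-up data)
metrics = sla_metrics(720.0, [Outage(100.0, 101.0), Outage(400.0, 403.0)])
print(metrics)
```

With these definitions, availability also equals MTBF / (MTBF + MTTR), which is a useful cross-check when preparing monthly SLA reports.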
2. Electrical Systems Management
Operate and maintain LV/MV panels, UPS systems, battery banks, PDUs, automatic transfer switches (ATS), static transfer switches (STS), and grounding systems.
Monitor power quality, load balancing, redundancy (N, N+1, 2N), and capacity utilization.
Supervise UPS battery health checks, discharge tests, and replacement activities.
Coordinate shutdowns, switchovers, and power maintenance activities with zero impact on live IT loads.
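As an illustration of the redundancy and capacity checks mentioned above, here is a simplified sketch of how an N+1 check on a modular UPS might be expressed. The function names and the sample values are hypothetical, and the sketch assumes identical modules sharing the load on a single bus; real assessments also account for power factor, derating, and OEM limits.

```python
def usable_capacity_kw(module_kw: float, modules: int, spare: int = 1) -> float:
    """Capacity left after `spare` modules are lost (spare=1 models N+1)."""
    if modules <= spare:
        raise ValueError("need more installed modules than spare modules")
    return module_kw * (modules - spare)


def redundancy_holds(load_kw: float, module_kw: float, modules: int, spare: int = 1) -> bool:
    """True if the remaining modules can still carry the load."""
    return load_kw <= usable_capacity_kw(module_kw, modules, spare)


def utilization(load_kw: float, module_kw: float, modules: int, spare: int = 1) -> float:
    """Load as a fraction of the redundant (usable) capacity."""
    return load_kw / usable_capacity_kw(module_kw, modules, spare)


# Example: four 100 kW modules in an N+1 arrangement (made-up figures)
print(redundancy_holds(250.0, 100.0, 4))  # load fits on the three remaining modules
print(redundancy_holds(350.0, 100.0, 4))  # load exceeds the three remaining modules
print(utilization(150.0, 100.0, 4))
```

A 2N arrangement can be checked the same way by treating each independent path separately and confirming that either path alone can carry the full load.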
3. Mechanical & Cooling Systems
Operate and monitor precision air conditioning systems (CRAC/CRAH), chillers, AHUs, DX units, and cooling towers.
Manage temperature, humidity, airflow, hot/cold aisle containment, and energy efficiency.
Perform root cause analysis for cooling alarms and thermal incidents.
Ensure compliance with ASHRAE thermal guidelines.
4. IT & Low Voltage Infrastructure
Support server racks, structured cabling, fiber/copper links, patching, labeling, and rack power distribution.
Coordinate installations, de-installations, and migrations with client IT teams.
Monitor network, BMS, DCIM, CCTV, access control, and fire detection and suppression systems.
Ensure documentation accuracy for rack layouts, power maps, and connectivity diagrams.
5. Preventive & Corrective Maintenance
Plan and execute preventive maintenance (PM) schedules for all DC assets.
Supervise OEMs and subcontractors during PM and corrective maintenance (CM) activities.
Ensure all maintenance is performed in accordance with OEM guidelines and safety standards.
Review maintenance reports, punch lists, and corrective actions.
6. Incident Management & Root Cause Analysis
Lead incident response for alarms, faults, and outages.
Perform detailed Root Cause Analysis (RCA) and submit incident reports within defined timelines.
Implement corrective and preventive actions (CAPA) to avoid recurrence.
Participate in post-incident reviews and service improvement plans.
7. Monitoring, Reporting & Documentation
Monitor alarms and alerts through BMS, DCIM, EMS, and NMS platforms.
Prepare and submit daily logs, weekly summaries, and monthly SLA reports.
Maintain asset registers, O&M manuals, SOPs, EOPs, MOPs, and escalation matrices.
Ensure documentation readiness for audits and client reviews.
8. Compliance, Safety & Best Practices
Ensure compliance with ISO 27001, ISO 20000, ISO 22301, ISO 45001, and local safety regulations.
Enforce HSE policies, LOTO procedures, and risk assessments.
Support internal and external audits, certifications, and inspections.
Promote continuous improvement and operational excellence.
9. Client & Stakeholder Coordination
Act as the on-site technical interface between client, vendors, and internal teams.
Participate in service review meetings, change management discussions, and planning sessions.
Provide technical guidance, recommendations, and capacity planning insights to clients.
Maintain professional communication and customer satisfaction at all times.
Reporting Responsibilities (Daily, Weekly & Monthly)
Daily Reporting:
Daily Tasks:
Monitor DC infrastructure alarms and system health
Log readings for power, cooling, and environmental parameters
Respond to incidents, alarms, and service requests
Update shift logs and incident trackers
Coordinate minor maintenance and vendor activities
Weekly Reporting:
Weekly Tasks:
Review SLA performance and incident trends
Conduct preventive checks on critical systems
Validate backup systems, redundancy paths, and failover readiness
Update asset and configuration documentation
Participate in coordination and planning meetings
Monthly Reporting:
Monthly Tasks:
Perform and oversee scheduled preventive maintenance
Prepare and submit monthly SLA, availability, and performance reports
Conduct capacity, risk, and compliance reviews
Review RCAs and implement improvement actions
Support audits, drills, and management reviews