Resident Engineer - Data Center
--Fortek Pvt Ltd.--
Job Description:
The role ensures:
Continuous uptime
Preventive & corrective maintenance execution
Emergency handling
Energy optimization (PUE/WUE)
Documentation compliance (4P model)
Vendor & OEM coordination
Uptime CTOS alignment
Regulatory & HSSE compliance
The engineer acts as the on-site technical authority and ensures adherence to contractual obligations, operational procedures, and safety standards.
Position Structure:
Department:
Line Manager:
Stream:
Job Requirements
Skills & Tools:
Education
Bachelor’s Degree in Electrical Engineering, Mechanical Engineering, Electronics, Mechatronics, or Computer Engineering
Diploma holders with extensive data center experience may also be considered
Key Skills & Competencies
Technical Skills
Strong knowledge of UPS, generators, electrical distribution, cooling systems
Hands-on experience with BMS, DCIM, and monitoring tools
Understanding of IT infrastructure, cabling standards, and rack power management
Strong troubleshooting and fault isolation skills
Operational & Soft Skills
SLA and KPI-driven mindset
Excellent incident handling and RCA skills
Strong documentation and reporting ability
Ability to work under pressure in critical environments
Excellent communication and coordination skills
Strong customer service orientation
Preferred Certifications (highly desirable)
Uptime Institute:
Accredited Tier Specialist (ATS)
Data Center Operations Specialist (DCOS)
Vendor Certifications:
UPS / Cooling OEM certifications
IT & Service Management:
ITIL Foundation
ISO 27001 / ISO 20000 Awareness
Safety:
NEBOSH / IOSH
First Aid & Fire Safety
Job Experience Required:
Experience
5–8 years of hands-on experience in data center operations, preferably in managed services or SLA-driven environments
Proven experience working in Tier II / Tier III / Tier IV data centers
Experience managing mission-critical infrastructure with 24/7 operations
Working Conditions
24/7 operational environment with shift-based or on-call requirements
On-site presence at customer data center
High accountability role in mission-critical infrastructure
Duties & Responsibilities
Key Responsibilities
1. Data Center Operations & SLA Management
Act as the primary on-site engineer responsible for compliance with contractual Service Level Agreements (SLAs).
Ensure 24/7 availability, reliability, and performance of data center infrastructure.
Monitor and manage KPIs, MTTR, MTBF, uptime metrics, and service response times.
Ensure timely escalation, coordination, and resolution of incidents as per the SLA matrix.
Maintain service continuity during planned and unplanned activities.
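As an illustration of the KPIs named above, the following is a minimal sketch of how MTTR, MTBF, and availability might be derived from an outage log for one reporting period. The `Outage` record and the `reliability_kpis` function are hypothetical names used only for this example, and the calculation assumes outages do not overlap; the contractual SLA may define these metrics differently.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """One outage, in hours since the start of the reporting period (hypothetical record)."""
    start_hour: float
    end_hour: float


def reliability_kpis(period_hours: float, outages: list[Outage]) -> dict[str, float]:
    """Return MTTR, MTBF, and availability for a reporting period.

    Assumes outages do not overlap and all fall inside the period.
    """
    downtime = sum(o.end_hour - o.start_hour for o in outages)
    uptime = period_hours - downtime
    count = len(outages)
    # MTTR: average time to restore service per outage.
    mttr = downtime / count if count else 0.0
    # MTBF: average operating time between failures.
    mtbf = uptime / count if count else float("inf")
    availability_pct = uptime / period_hours * 100
    return {
        "mttr_hours": mttr,
        "mtbf_hours": mtbf,
        "availability_pct": availability_pct,
    }


# Example: a 30-day month (720 h) with a 1 h outage and a 3 h outage.
kpis = reliability_kpis(720, [Outage(100, 101), Outage(400, 403)])
print(kpis)
```

In this example the two outages total 4 hours of downtime, so MTTR is 2 hours and MTBF is 358 hours.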
2. Electrical Systems Management
Operate and maintain LV/MV panels, UPS systems, battery banks, PDUs, automatic transfer switches (ATS), static transfer switches (STS), and grounding systems.
Monitor power quality, load balancing, redundancy (N, N+1, 2N), and capacity utilization.
Supervise UPS battery health checks, discharge tests, and replacement activities.
Coordinate shutdowns, switchovers, and power maintenance activities with zero impact to live IT loads.
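The redundancy levels mentioned above (N, N+1, 2N) can be checked against the live load with simple arithmetic. The sketch below is a simplified illustration that counts identical units only; the function names are hypothetical, and a true 2N design also requires independent distribution paths, which a unit count cannot confirm.

```python
import math


def units_needed(load_kw: float, unit_capacity_kw: float) -> int:
    """N: the minimum number of identical units required to carry the load."""
    return math.ceil(load_kw / unit_capacity_kw)


def spare_units(load_kw: float, unit_capacity_kw: float, units_installed: int) -> int:
    """Units installed beyond N (negative means the load exceeds capacity)."""
    return units_installed - units_needed(load_kw, unit_capacity_kw)


def meets_n_plus_1(load_kw: float, unit_capacity_kw: float, units_installed: int) -> bool:
    """True if the load is still carried after losing any single unit."""
    return spare_units(load_kw, unit_capacity_kw, units_installed) >= 1


def meets_2n_by_count(load_kw: float, unit_capacity_kw: float, units_installed: int) -> bool:
    """True if installed units are at least double N (unit count only)."""
    return units_installed >= 2 * units_needed(load_kw, unit_capacity_kw)


# Example: a 380 kW IT load on three 200 kW UPS modules.
print(units_needed(380, 200))          # 2
print(meets_n_plus_1(380, 200, 3))     # True
print(meets_2n_by_count(380, 200, 3))  # False
```

Running the same check before and after a planned load addition shows whether the change would erode the contracted redundancy level.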
3. Mechanical & Cooling Systems
Operate and monitor precision air conditioning systems (CRAC/CRAH), chillers, AHUs, DX units, and cooling towers.
Manage temperature, humidity, airflow, hot/cold aisle containment, and energy efficiency.
Perform root cause analysis for cooling alarms and thermal incidents.
Ensure compliance with ASHRAE thermal guidelines.
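A compliance check against the thermal guidelines can be as simple as comparing rack inlet readings with the applicable envelope. The sketch below assumes the commonly cited ASHRAE recommended inlet temperature range of 18 to 27 °C; the limits that actually apply depend on the guideline edition and equipment class referenced in the site contract, and humidity limits are not covered here. The rack labels and the function name are hypothetical.

```python
# Assumed recommended inlet air temperature range in degrees Celsius.
# Confirm against the guideline edition and equipment class in the contract.
RECOMMENDED_INLET_C = (18.0, 27.0)


def out_of_range(
    readings_c: dict[str, float],
    limits_c: tuple[float, float] = RECOMMENDED_INLET_C,
) -> dict[str, float]:
    """Return only the racks whose inlet temperature is outside the limits."""
    low, high = limits_c
    return {
        rack: temp
        for rack, temp in readings_c.items()
        if temp < low or temp > high
    }


# Example inlet readings keyed by (hypothetical) rack label.
readings = {"R01": 22.5, "R02": 28.1, "R03": 17.0}
print(out_of_range(readings))  # {'R02': 28.1, 'R03': 17.0}
```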
4. IT & Low Voltage Infrastructure
Support server racks, structured cabling, fiber/copper links, patching, labeling, and rack power distribution.
Coordinate installations, de-installations, and migrations with client IT teams.
Monitor network, BMS, DCIM, CCTV, access control, fire detection and suppression systems.
Ensure documentation accuracy for rack layouts, power maps, and connectivity diagrams.
5. Preventive & Corrective Maintenance
Plan and execute preventive maintenance (PM) schedules for all DC assets.
Supervise OEMs and subcontractors during PM and corrective maintenance (CM) activities.
Ensure all maintenance is performed in accordance with OEM guidelines and safety standards.
Review maintenance reports, punch lists, and corrective actions.
6. Incident Management & Root Cause Analysis
Lead incident response for alarms, faults, and outages.
Perform detailed Root Cause Analysis (RCA) and submit incident reports within defined timelines.
Implement corrective and preventive actions (CAPA) to avoid recurrence.
Participate in post-incident reviews and service improvement plans.
7. Monitoring, Reporting & Documentation
Monitor alarms and alerts through BMS, DCIM, EMS, and NMS platforms.
Prepare and submit daily logs, weekly summaries, and monthly SLA reports.
Maintain asset registers, O&M manuals, SOPs, EOPs, MOPs, and escalation matrices.
Ensure documentation readiness for audits and client reviews.
8. Compliance, Safety & Best Practices
Ensure compliance with ISO 27001, ISO 20000, ISO 22301, ISO 45001, and local safety regulations.
Enforce HSE policies, LOTO procedures, and risk assessments.
Support internal and external audits, certifications, and inspections.
Promote continuous improvement and operational excellence.
9. Client & Stakeholder Coordination
Act as the on-site technical interface between client, vendors, and internal teams.
Participate in service review meetings, change management discussions, and planning sessions.
Provide technical guidance, recommendations, and capacity planning insights to clients.
Maintain professional communication and customer satisfaction at all times.
Reporting Responsibilities (Daily, Weekly & Monthly)
Daily Reporting:
Daily Tasks:
Monitor electrical & HVAC parameters
Log UPS, diesel generator (DG), and battery status
Monitor BMS alarms
Check fire suppression system (FSS) panel status
Update shift log
Close open work orders
Weekly Reporting:
Weekly Tasks:
Validate PM completion
Inspect fire suppression system
Review spare inventory
Conduct redundancy checks
Review incident trends
Monthly Reporting:
Monthly Tasks:
Prepare SLA report
Calculate PUE/WUE
Conduct emergency drill
Conduct vendor performance review
Update asset lifecycle register
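For the monthly PUE/WUE calculation, the following is a minimal sketch using the standard definitions: PUE is total facility energy divided by IT equipment energy, and WUE is site water usage in litres divided by IT equipment energy in kWh. The function names and the sample meter values are illustrative only; the metering points and reporting period should follow the site's own measurement procedure.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be greater than zero")
    return total_facility_kwh / it_equipment_kwh


def wue(site_water_litres: float, it_equipment_kwh: float) -> float:
    """Water Usage Effectiveness in litres per kWh of IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be greater than zero")
    return site_water_litres / it_equipment_kwh


# Example month: 150,000 kWh total, 100,000 kWh IT load, 180,000 L of water.
print(f"PUE = {pue(150_000, 100_000):.2f}")       # PUE = 1.50
print(f"WUE = {wue(180_000, 100_000):.2f} L/kWh")  # WUE = 1.80 L/kWh
```

Both values should come from the same metering period so that month-on-month trends are comparable.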