Resident Engineer - Data Center

Islamabad, Pakistan

Job Description:

The Resident Engineer – Data Center is responsible for the end-to-end on-site technical operations, maintenance, and SLA governance of mission-critical data center infrastructure. This role acts as the single on-site technical authority and service representative, ensuring maximum uptime, compliance with SLAs, operational excellence, and seamless coordination between the client, OEMs, subcontractors, and internal support teams.

The position requires deep hands-on expertise across electrical, mechanical, IT, and monitoring systems, strong troubleshooting capabilities, and the ability to manage incident response, preventive maintenance, compliance audits, and service reporting in line with industry best practices (Uptime Institute, TIA, ISO, ITIL).


Position Structure:

Department:
Technical & Projects

Line Manager:
Muhammad Emmad

Stream:
Data Centers


Job Requirements

  • Key Qualifications:
    • Master's Degree
    • Graduate

    Position Type:
    • FULL-TIME
  • Education

    • Bachelor’s Degree in Electrical Engineering, Mechanical Engineering, Electronics, Mechatronics, or Computer Engineering

    • Diploma holders with extensive data center experience may also be considered

    Key Skills & Competencies

    Technical Skills

    • Strong knowledge of UPS, generators, electrical distribution, and cooling systems

    • Hands-on experience with BMS, DCIM, and monitoring tools

    • Understanding of IT infrastructure, cabling standards, and rack power management

    • Strong troubleshooting and fault isolation skills

    Operational & Soft Skills

    • SLA and KPI-driven mindset

    • Excellent incident handling and RCA skills

    • Strong documentation and reporting ability

    • Ability to work under pressure in critical environments

    • Excellent communication and coordination skills

    • Strong customer service orientation


    Preferred Certifications (highly desirable)

    • Uptime Institute:

      • Accredited Tier Specialist (ATS)

      • Data Center Operations Specialist (DCOS)

    • Vendor Certifications:

      • UPS / Cooling OEM certifications

    • IT & Service Management:

      • ITIL Foundation

      • ISO 27001 / ISO 20000 Awareness

    • Safety:

      • NEBOSH / IOSH

      • First Aid & Fire Safety


    Required Experience:

    • 5–8 years of hands-on experience in data center operations, preferably in managed services or SLA-driven environments

    • Proven experience working in Tier II / Tier III / Tier IV data centers

    • Experience managing mission-critical infrastructure with 24/7 operations

    Working Conditions

    • 24/7 operational environment with shift-based or on-call requirements

    • On-site presence at customer data center

    • High accountability role in mission-critical infrastructure

Duties & Responsibilities

  • Key Responsibilities

    1. Data Center Operations & SLA Management

    • Act as the primary on-site engineer responsible for compliance with contractual Service Level Agreements (SLAs).

    • Ensure 24/7 availability, reliability, and performance of data center infrastructure.

    • Monitor and manage KPIs, MTTR, MTBF, uptime metrics, and service response times.

    • Ensure timely escalation, coordination, and resolution of incidents as per the SLA matrix.

    • Maintain service continuity during planned and unplanned activities.


    2. Electrical Systems Management

    • Operate and maintain LV/MV panels, UPS systems, battery banks, PDUs, ATS, STS, and grounding systems.

    • Monitor power quality, load balancing, redundancy (N, N+1, 2N), and capacity utilization.

    • Supervise UPS battery health checks, discharge tests, and replacement activities.

    • Coordinate shutdowns, switchovers, and power maintenance activities with zero impact on live IT loads.


    3. Mechanical & Cooling Systems

    • Operate and monitor precision air conditioning systems (CRAC/CRAH), chillers, AHUs, DX units, and cooling towers.

    • Manage temperature, humidity, airflow, hot/cold aisle containment, and energy efficiency.

    • Perform root cause analysis for cooling alarms and thermal incidents.

    • Ensure compliance with ASHRAE thermal guidelines.


    4. IT & Low Voltage Infrastructure

    • Support server racks, structured cabling, fiber/copper links, patching, labeling, and rack power distribution.

    • Coordinate installations, de-installations, and migrations with client IT teams.

    • Monitor network, BMS, DCIM, CCTV, access control, and fire detection and suppression systems.

    • Ensure documentation accuracy for rack layouts, power maps, and connectivity diagrams.


    5. Preventive & Corrective Maintenance

    • Plan and execute preventive maintenance (PM) schedules for all DC assets.

    • Supervise OEMs and subcontractors during PM and corrective maintenance (CM) activities.

    • Ensure all maintenance is performed in accordance with OEM guidelines and safety standards.

    • Review maintenance reports, punch lists, and corrective actions.


    6. Incident Management & Root Cause Analysis

    • Lead incident response for alarms, faults, and outages.

    • Perform detailed Root Cause Analysis (RCA) and submit incident reports within defined timelines.

    • Implement corrective and preventive actions (CAPA) to avoid recurrence.

    • Participate in post-incident reviews and service improvement plans.


    7. Monitoring, Reporting & Documentation

    • Monitor alarms and alerts through BMS, DCIM, EMS, and NMS platforms.

    • Prepare and submit daily logs, weekly summaries, and monthly SLA reports.

    • Maintain asset registers, O&M manuals, SOPs, EOPs, MOPs, and escalation matrices.

    • Ensure documentation readiness for audits and client reviews.


    8. Compliance, Safety & Best Practices

    • Ensure compliance with ISO 27001, ISO 20000, ISO 22301, ISO 45001, and local safety regulations.

    • Enforce HSE policies, LOTO procedures, and risk assessments.

    • Support internal and external audits, certifications, and inspections.

    • Promote continuous improvement and operational excellence.


    9. Client & Stakeholder Coordination

    • Act as the on-site technical interface between the client, vendors, and internal teams.

    • Participate in service review meetings, change management discussions, and planning sessions.

    • Provide technical guidance, recommendations, and capacity planning insights to clients.

    • Maintain professional communication and customer satisfaction at all times.

Reporting Responsibilities


  • Daily Tasks:

    • Monitor DC infrastructure alarms and system health
    • Log readings for power, cooling, and environmental parameters
    • Respond to incidents, alarms, and service requests
    • Update shift logs and incident trackers
    • Coordinate minor maintenance and vendor activities

    Weekly Tasks:

    • Review SLA performance and incident trends
    • Conduct preventive checks on critical systems
    • Validate backup systems, redundancy paths, and failover readiness
    • Update asset and configuration documentation
    • Participate in coordination and planning meetings

    Monthly Tasks:

    • Perform and oversee scheduled preventive maintenance
    • Prepare and submit monthly SLA, availability, and performance reports
    • Conduct capacity, risk, and compliance reviews
    • Review RCAs and implement improvement actions
    • Support audits, drills, and management reviews