Site Reliability Engineer

Octal Philippines Inc.

Philippines, Makati City, National Capital Region

Remote

Full-time

Python

Java

Bash

AWS

Azure

Docker

Kubernetes

posted 3 days ago

Octal Philippines Inc. is looking for a skilled Site Reliability Engineer (SRE) to join our dynamic team. The SRE will be responsible for ensuring that our systems are reliable, scalable, and efficient. You will play a critical role in maintaining the uptime of our services, improving system performance, and automating processes to enhance productivity. The ideal candidate is a passionate technologist who thrives in a fast-paced environment and enjoys tackling complex challenges.

Responsibilities:

  • Monitor and maintain the reliability and availability of production systems
  • Implement automation to reduce operational toil and improve system reliability
  • Identify and resolve performance issues and outages
  • Collaborate with development teams to design scalable and robust systems
  • Create and maintain SRE documentation and runbooks
  • Participate in on-call rotation and incident response activities
  • Continuously improve tooling and processes to enhance the efficiency of operations

Requirements

  • Bachelor's degree in Computer Science, Engineering, or related field
  • At least 3 years of experience in a Site Reliability Engineering or related role
  • Strong experience with cloud platforms such as AWS, Azure, or GCP
  • Proficiency in scripting and programming languages (e.g., Python, Go, Bash)
  • Experience with containerization and orchestration technologies like Docker and Kubernetes
  • Strong understanding of monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack)
  • Excellent troubleshooting and analytical skills
  • Ability to work collaboratively in a team-oriented, fast-paced environment

Qualifications:

• Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.

• Proven experience in a Site Reliability Engineer or similar role, with a focus on designing and implementing scalable systems.

• Strong proficiency in programming, scripting, and automation, using languages and frameworks such as Java and ReactJS.

• Experience with cloud platforms such as AWS, Azure, or GCP, and container orchestration tools like Kubernetes.

• Deep understanding of networking and system administration in Windows and Linux/Unix-based environments.

• Excellent problem-solving skills and the ability to troubleshoot complex issues in distributed systems.

• Strong communication skills, with the ability to work effectively in a collaborative team environment and to communicate clearly with stakeholders.

Benefits

Communication Allowance, Health & Life Insurance, and Others
