Senior Site Reliability Engineer

Omilia

AU, onsite. Posted Jun 4, 2026

Skills

Kubernetes, Prometheus, Terraform, Postgres, Ansible, Grafana, Docker, Python, Redis, MySQL, AWS, Go

About the role

Description

We are looking for a Senior Site Reliability Engineer with cloud platform experience. You will join a team responsible for operating and maintaining production clusters and developing our observability solutions, and you will collaborate with teammates on automation strategies, monitoring and alerting, and overall platform reliability. Your goal will be to become an integral part of the team, treating every platform challenge as your own and solving it accordingly.

Responsibilities

Ensure platform reliability and availability across production and pre-production environments through proactive monitoring, alerting, and automation.

Act as first responder for incidents and contribute to problem management and root cause analysis.

Support the development team's reliability efforts and foster a solid reliability culture within the development lifecycle.

Develop troubleshooting documentation for production support resources.

Collaborate with Engineering teams to develop optimised and productive runbooks, operational documentation and automation of operational tasks.

Collaborate with development and cloud engineering teams to embed reliability and performance into the software delivery lifecycle.

Design, implement, and evolve observability solutions (metrics, logs, traces, dashboards) using tools such as Prometheus, Grafana, and ELK.

Participate in on-call rotations and continuously improve alert quality and response processes.

Champion a culture of reliability, performance, and continuous improvement across teams.

Requirements

Bachelor's Degree or MS in Engineering or equivalent.

Experience in operating at least one container orchestration cluster (Kubernetes, Docker Swarm).

Experience developing or maintaining software for production services at scale.

Experience with ELK.

Experience with AWS.

Experience with Grafana/Prometheus stack.

Strong scripting skills (Bash, Python or Go).

Excellent communication skills.

Thinking outside the box and anticipating challenges. It is imperative that we are not simply reactive; we must expect challenges and question the technologies, procedures and thinking already in place. You will be expected to constantly review and challenge at all levels.

Versatility. We work with agile/lean methods. We'd much rather iterate and learn than assume we know all the answers.

Being a team player. You don't (always) work in isolation and are excited by the thought of using your team whilst involving product, experience design, engineering, and more in the process.

Considered a plus:

Telephony knowledge (SIP, VoIP);

Experience in Linux Administration (RedHat, CentOS, AL);

Working knowledge in Configuration Management tools (Terraform, Ansible);

Experience with TCP/IP and general networking concepts;

RDBMS knowledge (MySQL, Postgres);

NoSQL knowledge (Redis).

Benefits

Fixed compensation;

Long-term employment with vacation calculated in working days;

Professional development (courses, training, etc.);

Being part of successful cutting-edge technology products that are making a global impact in the service industry;

Proficient and fun-to-work-with colleagues;

Apple gear.
