Site Reliability Engineer at Levi Strauss in Ciudad De México, MX

Skills

Kubernetes, Prometheus, Terraform, BigQuery, Workday, Grafana, Datadog, GitHub, Python, Azure, Helm, CI/CD, Google Cloud, LLM, Go, ML

About the role

Job Location: Mexico City, Mexico

Calling all originals: At Levi Strauss & Co., you can be yourself — and be part of something bigger. We’re a company of people who like to forge our own path and leave the world better than we found it. Who believe that what makes us different makes us stronger. So add your voice. Make an impact. Find your fit — and your future.

We're seeking a curious and driven Site Reliability Engineer to join our Data & AI Platform Engineering team. In this role, you'll help keep our data and AI platforms running reliably, efficiently, and securely — platforms that power decisions across our global retail operations.

You'll work alongside experienced SREs and engineers to monitor production systems, respond to incidents, reduce operational toil, and build the automation that makes our infrastructure more resilient. This is an excellent opportunity to grow your SRE craft in a fast-paced, collaborative environment on Google Cloud Platform, with exposure to multi-cloud technologies and modern data engineering.

About the Job

Reliability & Incident Response

Monitor production systems using observability tooling — dashboards, alerts, and logs — to detect and triage issues before they impact end users

Participate in on-call rotations, respond to incidents following established runbooks, and escalate appropriately when needed

Contribute to blameless post-mortems, documenting root causes and follow-up action items to prevent recurrence

Help maintain and improve SLO dashboards and alerting thresholds to ensure platform health is visible and measurable

Toil Reduction & Automation

Identify repetitive manual tasks and build automation to eliminate them, reducing toil for yourself and the broader team

Write and maintain scripts, tooling, and CI/CD pipeline components that improve deployment reliability and operational efficiency

Support self-serve infrastructure initiatives that allow engineering teams to safely provision and manage their own resources

Platform Operations & Cloud Infrastructure

Operate and maintain workloads running on GCP, including GKE, Cloud Run, BigQuery, Pub/Sub, GCS, and Composer

Apply Infrastructure-as-Code practices (Terraform, Helm) to consistently and safely manage and version infrastructure changes

Support multi-cloud awareness across GCP and Azure, following team standards for consistency and security across environments

Adhere to data security and governance policies — IAM best practices, secrets management, encryption, and audit logging

Collaboration & Growth

Work closely with Data Engineering, AI Platform, and Software Engineering teams to ensure reliability is considered from design through deployment

Participate in reliability reviews, design discussions, and team ceremonies, contributing ideas and raising operational concerns early

Engage with AI and agentic platform workloads, gaining exposure to the operational patterns of LLM-based systems and data pipelines

Continuously develop your technical skills and SRE craft, supported by team knowledge-sharing, documentation, and hands-on experience

About You

Required Qualifications

Bachelor's degree in Computer Science, Engineering, or related field (or equivalent practical experience)

6+ years of experience in Site Reliability Engineering, DevOps, or Platform/Infrastructure Engineering in production environments

Hands-on experience with GCP services, particularly GKE, Cloud Run, BigQuery, Pub/Sub, and GCS

Working proficiency with Infrastructure-as-Code tools such as Terraform or Helm

Familiarity with observability tooling — metrics, logging, tracing, and alerting (e.g., Cloud Monitoring, Datadog, or Prometheus/Grafana)

Understanding of SLO/SLI concepts and how they relate to production reliability and on-call operations

Exposure to data security fundamentals: IAM, encryption, secrets management, and network policies

Proficiency in at least one scripting or systems language (Python, Bash, or Go) for automation and operational tooling

Strong communication skills with the ability to clearly document incidents, runbooks, and technical processes

Technical Familiarity

Experience with container orchestration — Kubernetes or GKE — and the operational patterns around deploying and managing containerized workloads

Basic understanding of CI/CD pipelines and GitOps workflows (ArgoCD, GitHub Actions, or similar)

Comfort working with data platforms — familiarity with batch or streaming data pipelines is a plus

Awareness of multi-cloud concepts, particularly across GCP and Azure

Desirable Experience

Experience working in retail, e-commerce, or consumer goods environments

Familiarity with Google's SRE principles — error budgets, toil tracking, and production readiness reviews

Exposure to AI or ML platform operations, including monitoring model serving infrastructure

Experience with FinOps or cloud cost visibility tooling

Why Join Us?

If you're an engineer who is passionate about reliability, loves solving operational problems, and wants to grow your SRE craft at a global iconic brand, we'd love to hear from you.

LOCATION

Mexico, D.F., Mexico

FULL TIME/PART TIME

Full time

Current LS&Co Employees, apply via your Workday account.

Questions about this role

How do I apply to this Site Reliability Engineer role at Levi Strauss?

Click "Apply with AI Applyd" above. We auto-fill the application from your resume and answer screening questions in seconds. No copy and paste, no juggling tabs.

What's the typical salary for DevOps / SRE in Mexico?

Compensation for DevOps / SRE roles in Mexico varies widely by seniority, employer size, and remote vs onsite arrangement. Check the salary range on this listing when published, or browse our DevOps / SRE hub for Mexico medians across recent openings.

How fast does AI Applyd auto-apply?

Most applications complete in under 90 seconds. You can track the status in your dashboard and watch the screenshot proof land the moment the application submits.

What ATS does Levi Strauss use?

AI Applyd supports Greenhouse, Lever, Ashby, Workday, iCIMS, SmartRecruiters, Personio, Teamtailor and other major ATS platforms. If we can submit through the platform, we do.