Site Reliability Engineer - 2
About the role
At Navan, “It’s all about the user. All of them.” We’re passionate about providing a seamless one-stop experience for business travelers, no matter how they travel, where they stay, or where they’re going.
We are constantly striving to make the most reliable and scalable systems possible to ensure that our services are available to our travelers when they need them most. With our exponential growth, we have many exciting challenges ahead and we’re looking for a passionate Site Reliability Engineer to join our team. As an SRE you will design and develop the tooling, automation and infrastructure services that power Navan's services, used by thousands of travelers on a daily basis. You will work closely with development teams, release and productivity teams and security teams to identify customer needs and build innovative solutions to solve them.
What You'll Do
Building a fast moving, high growth service. Navan is revolutionizing travel and expense services for the enterprise, and the product is evolving quickly. You are comfortable in a startup environment, enjoy seeing the product take shape, and have strong ownership of the success of your services.
Designing, implementing and operating cloud infrastructure. You’re a fit for us if you think in terms of infrastructure as code, deployment pipelines, and building the guardrails that make going fast also mean going safely.
Identifying reliability anti-patterns and solving them systemically. You dive deep into the data to evaluate the health of your systems, and you use it to improve visibility and reliability across the fleet of services.
Finding and automating the toil out of our processes. You’d prefer to automate it entirely, or build a tool to empower your users rather than be the gatekeeper to the tool.
Leveraging AI tools and platforms in your daily work to achieve autonomous operations, reduce toil, and improve system observability.
Contributing to the definition and adoption of system reliability standards, including formalizing SLO/SLI frameworks, observability standards, and blameless post-mortem practices.
Assisting in the adoption of AI-assisted developer tools and platforms to increase engineering productivity, enforce code quality standards, and enable real-time architectural validation.
What We’re Looking For
2+ years of progressive experience as an SRE or equivalent role.
Passionate about solving problems and learning new tools and technologies
Excellent communication skills working with stakeholders and domain experts across the company to design solutions to user problems
Thrive in a fast-paced environment
Demonstrated ability to contribute to and take ownership of technical infrastructure projects.
Operate with a strong sense of ownership demonstrated through shipping production-quality code and infrastructure equipped with testing, monitoring and documentation
Hands-on operational experience with Java-based applications and services, including JVM profiling and performance tuning (Python, Node.js and Go are a plus)
Hands-on experience building and operating distributed systems in a public cloud environment (preferably AWS), using CI/CD to deploy, manage and operate production systems, with a focus on tooling and automation using tools such as Maven and Jenkins.
Hands-on experience with microservice architecture and related reliability and resiliency patterns such as throttling, queueing, and retries
Hands-on experience writing Infrastructure as Code in Terraform, CloudFormation, or similar tools
A passion for automating away everything, using scripting languages such as Python, Bash, or Groovy (we prefer lazy engineers)
Experience building, using, and automating monitoring systems such as New Relic, Datadog, SignalFx, and Kibana
Hands-on experience deploying, operating, and monitoring production-grade AI/ML microservices (e.g., RAG pipelines, agentic systems) on cloud platforms like AWS Fargate/ECS.
Experience leveraging AI/LLM platforms (e.g., Gemini, Braintrust) and managing their secrets and infrastructure using Infrastructure as Code (Terraform) and AWS SSM.
Demonstrated ability to integrate AI-specific telemetry and advanced observability practices to enable predictive insights and systemic root-cause analysis.
The posted pay range represents the anticipated low and high end of the compensation for this position and is subject to change based on business need. To determine a successful candidate’s starting pay, we carefully consider a variety of factors, including primary work location, an evaluation of the candidate’s skills and experience, market demands, and internal parity. For roles with on-target-earnings (OTE), the pay range includes both base salary and target incentive compensation. Target incentive compensation for some roles may include a ramping draw period. Compensation is higher for those who exceed targets. Candidates may receive more information from the recruiter.
Pay Range: $86,325 USD - $191,900 USD