Senior Software Engineer, AIOps

NVIDIA

Tel Aviv, IL | Onsite | Posted Jun 15, 2026

Skills

Kubernetes, C++, Rust, Go, ML

About the role

NVIDIA is powering the world's most advanced AI factories. To ensure they run seamlessly, we are building a mission-critical Observability and Prediction platform, delivered both as a high-scale SaaS solution and as a robust on-premises deployment for our largest enterprise customers.

We are looking for a Senior Software Engineer to join the AIOps platform team and help build the core distributed systems that ingest massive telemetry streams from GPU clusters and operationalize predictive AI models at scale. You will work at the intersection of high-performance data engineering and production ML, turning research algorithms into reliable, mission-critical software.

What you'll be doing:

Architect and build an agentic AIOps system that autonomously monitors GPU fleet health, aggregates and correlates massive telemetry streams, surfaces intelligent alerts, and orchestrates multi-step diagnostic workflows and corrective actions. This system will power real-time dashboards, automated root-cause analysis, and proactive incident response.

Research, evaluate, and prototype data storage strategies and data representations across diverse database technologies and modalities, ensuring AI models are trained on high-quality, well-structured data that improves predictive accuracy and generalization.

High-Scale Engineering: Design distributed systems to handle the extreme telemetry density of large-scale AI clusters, ensuring efficient data ingestion, processing, and real-time analysis.

Instrument services with deep observability (metrics, logs, traces) to support rapid debugging and continuous performance improvement.

Build and own the model-serving infrastructure that operationalizes predictive algorithms at scale: packaging, versioning, deploying, and monitoring AI models in both SaaS and on-premises environments.

Contribute to the platform's core libraries and abstractions that accelerate development across the broader AIOps engineering team.

What we need to see:

B.Sc./M.Sc. in Computer Science, Computer Engineering, or a related technical field.

8+ years of software engineering experience building production distributed systems.

Core Systems Programming: Expert-level proficiency in languages such as Go, C++, or Rust, with a focus on high-performance, concurrent architectures.

Solid understanding of Kubernetes and container-based deployments for production services.

Experience deploying, monitoring, and maintaining ML models or data-intensive services in a production environment.

Comfort working in ambiguous, fast-moving environments where the product is still being shaped.

Ways to stand out from the crowd:

Experience building ML model-serving platforms or MLOps tooling (model registries, A/B rollout frameworks, feature stores) at scale.

A track record of taking systems from prototype to stable, production-grade platform serving real enterprise customers.

A "Systems" Thinker: You don't just write software; you understand the full stack, from how data moves across the wire to how it’s processed in a distributed cluster.

Practical Innovation: The ability to simplify complex problems and build internal tools or frameworks that empower other engineering teams to move faster.

With competitive salaries and a generous benefits package, NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you are passionate about building mission-critical systems at the frontier of AI infrastructure, we want to hear from you.
