Senior Software Engineer – Agentic Runtime Safety & Observability

Keysight Technologies

ES, Onsite, Posted Jan 16, 2026

Skills

langchain, python, c++, llm, ml

About the role

Overview:

Keysight is at the forefront of technology innovation, delivering breakthroughs and trusted insights in electronic design, simulation, prototyping, test, manufacturing, and optimization. Our ~15,000 employees create world-class solutions in communications, 5G, automotive, energy, quantum, aerospace, defense, and semiconductor markets for customers in over 100 countries.

Our award-winning culture embraces a bold vision of where technology can take us and a passion for tackling challenging problems with industry-first solutions. We believe that when people feel a sense of belonging, they can be more creative and innovative, and can thrive at all points in their careers.

About the Team

Keysight’s Applied AI Autonomy Initiative is building a next-generation agentic orchestration framework that enables AI agents to reason, adapt, and coordinate across complex engineering workflows. The platform combines LLM-based reasoning, reinforcement-inspired feedback loops, and simulation-driven validation to automate and optimize engineering decisions at scale.

This role sits at the core of the initiative, defining how autonomy can be deployed safely, transparently, and predictably in high-assurance engineering environments.

About the Role

As a Senior Engineer – Agentic Runtime Safety, Stability & Observability, you will design and own the runtime safety and reliability layer of Keysight’s agentic orchestration platform.

Your mission is to ensure that AI-driven orchestration remains aligned with human intent, observable, auditable, and recoverable. You will architect guardrails, rollback mechanisms, and observability pipelines that allow autonomous systems to act powerfully without sacrificing trust, control, or predictability.

This role bridges AI systems, runtime engineering, and safety-critical design, working closely with AI architects, ML engineers, and simulation teams.

Responsibilities:

Runtime Safety & Execution Control

Design runtime guardrails ensuring agent actions remain aligned with intent, policies, and system constraints.

Implement intent validation, semantic checks, and execution contracts before orchestration runs.

Define safety boundaries, escalation paths, and rollback conditions within agent workflows.

Fault Isolation, Rollback & Recovery

Architect deterministic rollback, checkpointing, and recovery mechanisms for multi-agent systems.

Design fault-isolation boundaries to prevent local failures from cascading system-wide.

Build sandboxed execution environments for validating AI-generated orchestration logic.

Observability & Diagnostics

Implement end-to-end observability capturing agent decisions, execution traces, and system health.

Develop anomaly detection and confidence-based safety gating for runtime behavior.

Build introspection APIs and dashboards exposing rationale, safety metrics, and performance signals.

Adaptive Governance

Establish feedback loops that adjust orchestration behavior based on performance and safety signals.

Contribute to continuous safety validation and runtime certification pipelines.

Collaborate across teams to embed transparency and traceability into every orchestration cycle.

Qualifications:

Required Qualifications

PhD or 5+ years of experience in systems engineering, runtime reliability, or safety-critical software.

Strong proficiency in Python and C/C++.

Proven experience designing fault-tolerant, observable, and recoverable systems.

Hands-on experience with agentic orchestration frameworks (e.g., LangGraph, LangChain, or similar).

Solid understanding of execution control, intent alignment, and policy enforcement in automated systems.

Experience building telemetry, monitoring, or diagnostics pipelines in complex runtimes.

Desired Qualifications

Background in safety-critical or regulated domains (e.g., aerospace, industrial systems, EDA, HPC).

Experience with semantic validation, policy modeling, or goal disambiguation.

Familiarity with rollback strategies, dynamic gating, or safety scoring in distributed systems.

Experience with Python/C++ interoperability (e.g., PyBind11, gRPC, ZeroMQ).

Exposure to simulation-driven systems or hybrid AI–physics environments.
