AI Video Agent Engineer

Toogeza

Remote, global | Posted Jun 2, 2026

Skills

huggingface, javascript, langchain, python

About the role

We are toogeza, a Ukrainian recruiting company focused on hiring talent and building teams for tech startups worldwide. People make the difference in the big game, and we can help find the right ones.

Currently, we are looking for an AI Video Agent Engineer for Elva.

Location: Remote

Job Type: Full-Time

Overview:

We are building an AI-driven video production system designed as an intelligent multi-agent orchestration layer capable of transforming raw ideas and references into fully structured video content.

The system operates as a black-box creative engine for different types of users:

professional video creators

casual users

creators of short narrative concepts

Users provide an idea, references, or media fragments, and the system automatically orchestrates multiple AI agents responsible for analysis, scripting, editing, and production of video content.

We are looking for an engineer who can design and implement multi-agent pipelines, orchestrate AI tools, and build intelligent workflows that combine video analysis, storytelling logic, and automated editing.

This role sits at the intersection of AI systems architecture, creative tooling, and multimodal content generation.

Responsibilities:

AI Agent Architecture

Design and implement the architecture of a multi-agent video editing system including agents responsible for:

video analysis

narrative generation

editing orchestration

production and output synthesis

Define system prompts, behavioral rules, and structured instructions for agents interacting within the pipeline.

Pipeline Orchestration (n8n)

Develop and maintain complex orchestration pipelines in n8n, including:

multi-agent workflows

tool-calling logic

dynamic routing between tools and models

context passing between agents

Pipelines must be capable of selecting the most appropriate models, tools, and strategies depending on the task.
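
For illustration only, the routing idea above can be sketched in Python. The task types and handler names below are hypothetical and are not taken from the actual system; in n8n the same decision would live in workflow nodes rather than in code.

```python
# Minimal routing sketch: map a task type to the handler that should process it.
# All task types and handlers here are hypothetical placeholders.
ROUTES = {}


def route(task_type):
    """Register a handler function for a given task type."""
    def register(fn):
        ROUTES[task_type] = fn
        return fn
    return register


@route("video_analysis")
def analyze_video(payload):
    # A real handler would call a video-analysis model or service here.
    return {"handler": "analyze_video", "payload": payload}


@route("narrative")
def write_narrative(payload):
    # A real handler would call a text model with a scripting prompt here.
    return {"handler": "write_narrative", "payload": payload}


def dispatch(task):
    """Pick the handler for task["type"] and pass it task["payload"]."""
    handler = ROUTES.get(task["type"])
    if handler is None:
        raise ValueError(f"No route registered for task type {task['type']!r}")
    return handler(task["payload"])
```

Failing loudly on an unknown task type, rather than falling back to a default, keeps routing errors visible while the pipeline is still being designed.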

Multimodal Data Processing

Design robust pipelines for handling:

video materials

image assets

user text prompts

structured metadata

Ensure proper data transformation and context transfer across the pipeline stages.

Tool & API Integration

Integrate both external and internal APIs for multimodal generation and processing, including:

image generation

video generation

speech synthesis

audio generation

video processing services

Rapidly evaluate available APIs and select the best quality tools and models for each task.

Model Orchestration & Optimization

Tune and optimize model interactions, primarily based on Gemini models, including:

prompt engineering

structured outputs

tool-calling workflows

agent collaboration logic

Optimize pipelines for quality, reliability, and execution efficiency.

Future Architecture (RAG & Knowledge Systems)

Design systems that support:

vector databases

retrieval-augmented generation (RAG)

memory and contextual reasoning between agents
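
As a rough sketch of the retrieval step behind RAG and agent memory, the example below stores short notes and returns the ones most similar to a query. It is a toy under stated assumptions: the `MemoryStore` class is hypothetical, and word counts stand in for the learned embeddings and vector database a real system would use.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class MemoryStore:
    """Hypothetical in-memory store; a vector database would replace this."""

    def __init__(self):
        self._items = []

    def add(self, text):
        self._items.append((text, embed(text)))

    def search(self, query, k=1):
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = MemoryStore()
store.add("fast cuts work well for sports highlights")
store.add("slow fades suit calm nature footage")
best = store.search("editing style for sports highlights", k=1)
```

The retrieved notes would then be added to an agent's prompt as context, which is the "augmented" part of retrieval-augmented generation.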

Expected Outcomes:

The pipeline system should be capable of:

Taking an idea + references as input

Analyzing the content

Generating a coherent narrative structure

Selecting appropriate visual and audio elements

Producing a high-quality, structured video output
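
The five steps above can be pictured as a chain of stages that share one context object. The Python below is a minimal sketch of that shape only: the `PipelineContext` fields, the stage functions, and their placeholder logic are all hypothetical, and each stage would be an AI agent in the real system.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineContext:
    """Shared state handed from stage to stage (hypothetical structure)."""
    idea: str
    references: list
    analysis: dict = field(default_factory=dict)
    narrative: list = field(default_factory=list)
    assets: dict = field(default_factory=dict)
    output: dict = field(default_factory=dict)


def analyze(ctx):
    # Placeholder analysis; a real agent would inspect the media itself.
    ctx.analysis = {
        "reference_count": len(ctx.references),
        "keywords": ctx.idea.lower().split(),
    }
    return ctx


def build_narrative(ctx):
    # Placeholder three-beat structure derived from the idea text.
    ctx.narrative = [f"{beat}: {ctx.idea}" for beat in ("Setup", "Development", "Resolution")]
    return ctx


def select_assets(ctx):
    # Assign references to beats round-robin; None when no references exist.
    refs = ctx.references
    ctx.assets = {
        beat: (refs[i % len(refs)] if refs else None)
        for i, beat in enumerate(ctx.narrative)
    }
    return ctx


def produce(ctx):
    # Emit a structured scene list rather than rendered video.
    ctx.output = {"scenes": [{"beat": b, "visual": ctx.assets[b]} for b in ctx.narrative]}
    return ctx


STAGES = [analyze, build_narrative, select_assets, produce]


def run_pipeline(idea, references):
    """Take an idea plus references and run every stage in order."""
    ctx = PipelineContext(idea=idea, references=list(references))
    for stage in STAGES:
        ctx = stage(ctx)
    return ctx


result = run_pipeline("A cat learns to surf", ["beach.mp4", "cat.jpg"])
```

Keeping every intermediate result on the context object makes it easy to inspect what each stage contributed when the final output looks wrong.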

Requirements:

AI Systems & Agent Architecture

Strong experience building multi-agent systems, including:

intent and sub-intent modeling

agent orchestration

agent communication and transport layers

summarization pipelines

context passing between agents

Workflow Orchestration

Hands-on experience with:

n8n

Agent tool-calling

n8n MCP

Experience designing complex automation pipelines is essential.

Programming

Strong practical coding skills with a vibe-coding mindset:

Primary languages:

Python

JavaScript

Bonus experience:

ComfyUI custom nodes

lightweight APIs (e.g., HuggingFace Spaces or inference endpoints)

Multimodal Tooling Knowledge

Ability to quickly navigate API documentation and integrate tools for:

image generation

video generation

speech synthesis

audio generation

multimodal analysis

You should know where and how to obtain the best generation quality for each modality.

Creative Thinking

Strong sense of visual rhythm and composition

Creative intuition and storytelling awareness

Good taste in video structure and montage

Ability to evaluate AI-generated output not only by metrics, but also by creative quality and narrative coherence

Video & Media Processing

Preferred experience with:

FFmpeg

video processing pipelines

image processing workflows

Nice to Have:

The following qualifications are not mandatory, but will significantly strengthen a candidate’s profile during the evaluation process:

Agentic Frameworks

Deep understanding of frameworks such as LangGraph and LangChain for building complex cyclic state graphs and stateful agent systems.

Advanced RAG & Memory Management

Experience designing long-term memory systems for agents, including mechanisms for storing and retrieving successful execution patterns or scenarios to improve future performance through experience-based retrieval.

Self-Correction & Reflection

Experience building agents with feedback loops such as self-reflection, self-critique, or strategy correction, enabling them to verify their own actions and adapt execution logic dynamically.

Evaluation & Observability

Experience with tools for monitoring, debugging, and evaluating agent chains and prompt behavior, such as LangSmith, Arize Phoenix, or Promptfoo, with the ability to identify logical failures, quality regressions, and orchestration bottlenecks.

Ideal Candidate:

The ideal candidate is someone who naturally operates between engineering and creative production, capable of building systems that automate complex media workflows while preserving narrative coherence and aesthetic quality.

You enjoy designing systems where AI agents collaborate to produce meaningful creative outputs.

What’s next?

If this role sounds like a fit — we’d love to hear from you! Just send over your CV and anything else you’d like us to consider.

We’ll review everything within five working days, and if your background matches what we’re looking for, we’ll get in touch to set up a call and get to know each other better.
