How We Built Self-Healing Job Coverage
Two users hit a zero-result search. We built a feedback loop that turns every empty query into scraper training signal. Here is the architecture.
Two users in the US opened the app last week, typed in their role, and got zero jobs back.
That is the worst possible first impression. They did the work of installing the extension, onboarding, writing their dream role, and the product returned a blank screen.
We could have fixed the problem the usual way. Pull the queries from logs, manually eyeball them, ship a hotfix that adds their keywords to the scraper. Done in a day.
But that fixes two users. It does not fix the next 200 who will hit the same empty state next month with queries we have not predicted. So we built something that does.
This post is how we built self-healing job coverage: a system where every empty search result becomes training signal, and coverage grows from real demand instead of guesswork.
0 → 100%
coverage loop closes itself within 24h of a user hitting an empty search
The failure mode
Our scraping pipeline covers about 50,000 live jobs across LinkedIn, Indeed, Greenhouse, Lever, Workday, iCIMS, and direct company careers pages. That is a lot. It is also not enough.
The broad-coverage pass runs on a static seed list. Software Engineer, Product Manager, Data Scientist, the usual. If you are a Veterinary Technician in Portugal or a Certified Scrum Master in Buenos Aires, you are outside the seed. Your search returns zero results. You churn.
The real problem is not that the list is incomplete. The real problem is that we had no feedback loop between user demand and scraper behavior. Demand lived in one column of our analytics. The scraper lived in another cron job. Nothing connected them.
The architecture
Four moving pieces, one outcome.
Every layer is boring on its own. The interesting part is that they talk to each other, and the loop closes without a human in it.
Capturing the miss
The first trick is that we log every zero-result query, not just the obvious ones. The search handler writes a row to roleSearchMisses with the raw query string, the country, the user ID if signed in, and a timestamp. No early exits, no sampling, no filtering. If it returned zero jobs, it gets captured.
Why all of them? Because we do not know which miss is a typo and which is the next big niche. Classifying that is a later problem. Capturing is cheap.
Deduplication happens at write time via a normalizedKey column (lowercased, trimmed, punctuation stripped, alpha-sorted tokens). The same miss from 40 different users collapses into a single row with an incremented hitCount. That hitCount is the demand signal that drives everything downstream.
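The capture-and-dedup path can be sketched as follows. In production the rows live in SQL; here a Map stands in. The names normalizedKey, hitCount, and roleSearchMisses come from the post, while the exact normalization steps and row shape are assumptions:

```typescript
interface MissRow {
  normalizedKey: string;
  rawQuery: string; // first raw form we saw for this key
  country: string;
  hitCount: number;
}

// In-memory stand-in for the roleSearchMisses table.
const roleSearchMisses = new Map<string, MissRow>();

// Lowercase, trim, strip punctuation, alpha-sort tokens.
function normalizedKey(rawQuery: string): string {
  return rawQuery
    .toLowerCase()
    .trim()
    .replace(/[^\p{L}\p{N}\s]/gu, "") // keep letters, digits, whitespace
    .split(/\s+/)
    .filter(Boolean)
    .sort() // word order no longer matters
    .join(" ");
}

// Every zero-result query gets captured; repeats collapse into one row.
function recordMiss(rawQuery: string, country: string): MissRow {
  const key = normalizedKey(rawQuery);
  const existing = roleSearchMisses.get(key);
  if (existing) {
    existing.hitCount += 1; // same miss from another user lands here
    return existing;
  }
  const row: MissRow = { normalizedKey: key, rawQuery, country, hitCount: 1 };
  roleSearchMisses.set(key, row);
  return row;
}
```

With this key, "Sr. SWE" and "swe sr" normalize to the same row, so their demand signal accumulates instead of splitting.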
Canonicalizing with Gemini Flash-Lite
Raw user input is messy.
"Sr. SWE", "Senior Software Engineer", "senior eng", and "software engineer III" all point at the same role. Treating them as four different entries bloats the suggestion table and splits the signal. We need a canonicalizer.
We run the canonicalizer every night. It pulls pending suggestions with hitCount >= 3 or from signed-in users, hands them to Gemini Flash-Lite ($0.10/M in, $0.40/M out, fast enough to process 800+ queries in one batch), and asks for a classification. The output is a discriminated union so the caller cannot misread it.
Three outcomes, three different downstream paths.
- remapped means the query is a known role with a nickname. "sr. swe" remaps to "senior software engineer" and inherits existing coverage. No scraping changes needed.
- dynamic means the query is real but outside the seed list. The classifier also returns a list of search terms to feed the scrapers. "Veterinary Technician" becomes ["veterinary technician", "vet tech", "animal health technician"]. This is where coverage actually grows.
- rejected catches spam, slurs, and queries too narrow to be worth indexing ("my cousin Bob's job"). We keep the row for audit but never promote it.
The discriminated union matters. Every branch is handled in the orchestrator with an exhaustive switch, which means TypeScript fails the build if someone adds a new kind without wiring up its path. No silent drops.
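A sketch of what that union and exhaustive switch look like. The field names (canonicalRole, searchTerms, reason) are assumptions based on the three outcomes above; the never-typed default case is the standard TypeScript exhaustiveness idiom the post refers to:

```typescript
type Classification =
  | { kind: "remapped"; canonicalRole: string }
  | { kind: "dynamic"; canonicalRole: string; searchTerms: string[] }
  | { kind: "rejected"; reason: string };

function handle(c: Classification): string {
  switch (c.kind) {
    case "remapped":
      return `inherit coverage for ${c.canonicalRole}`;
    case "dynamic":
      return `scrape ${c.searchTerms.length} new terms`;
    case "rejected":
      return `audit only: ${c.reason}`;
    default: {
      // If someone adds a new kind without a case, this line no longer
      // typechecks and the build fails. No silent drops.
      const _exhaustive: never = c;
      return _exhaustive;
    }
  }
}
```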
Widening the broad-coverage pass
Approved suggestions land in the same table the scrape orchestrator already reads. The scheduler joins userSuggestedRoles WHERE status = 'approved' to the static seed list on every broad-coverage run and treats them identically. The scraper does not know the difference between a seed we hand-wrote and a suggestion that grew out of a user miss.
This is the bit we care most about. The scraper code got exactly one new line. The system got a whole new behavior.
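The widening step amounts to one merge. A minimal sketch, where STATIC_SEEDS, SuggestedRole, and broadCoverageSeeds are hypothetical names standing in for the real scheduler; only the table name userSuggestedRoles and the status filter come from the post:

```typescript
// Hand-written seed list the broad-coverage pass has always used.
const STATIC_SEEDS = ["software engineer", "product manager", "data scientist"];

interface SuggestedRole {
  role: string;
  status: "pending" | "approved" | "rejected" | "fulfilled";
}

// The scheduler's effective seed list: static seeds plus everything in
// userSuggestedRoles WHERE status = 'approved'. The scraper cannot tell
// a hand-written seed from a promoted suggestion.
function broadCoverageSeeds(suggestions: SuggestedRole[]): string[] {
  const approved = suggestions
    .filter((s) => s.status === "approved")
    .map((s) => s.role);
  return Array.from(new Set([...STATIC_SEEDS, ...approved]));
}
```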
The fulfillment flip
Jobs come back. We cross-reference every freshly scraped job against pending suggestions. If a job title or description matches a pending suggestion with high confidence, we flip that suggestion to fulfilled and fire a notification.
Batched atomically via D1's db.batch(). Either the flip and the notification both land, or neither does. No half-states.
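The flip can be sketched like this, assuming Cloudflare D1's batched-write semantics. The BatchDb interface below is a stand-in for a real D1 binding (which batches prepared statements), and the SQL text and table names are illustrative:

```typescript
interface Statement {
  sql: string;
  params: unknown[];
}

// Stand-in for a D1 binding: batch() runs all statements atomically.
interface BatchDb {
  batch(statements: Statement[]): Promise<void>;
}

// One batch: the status flip and the notification land together or not at
// all. No fulfilled row without its email, no email without its row.
async function flipToFulfilled(
  db: BatchDb,
  suggestionId: string,
  userId: string
): Promise<void> {
  await db.batch([
    {
      sql: "UPDATE userSuggestedRoles SET status = 'fulfilled' WHERE id = ?",
      params: [suggestionId],
    },
    {
      sql: "INSERT INTO notifications (userId, type) VALUES (?, 'role_covered')",
      params: [userId],
    },
  ]);
}
```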
The React Email template is deliberately quiet. Subject: "We found jobs for 'veterinary technician'". Body: one sentence, one CTA button to /dashboard. No marketing. The user asked, we answered, that is the whole message.
What the loop produces
Coverage grows endogenously from real user demand. We do not guess which roles to seed. Users tell us, the classifier cleans the signal, the scraper absorbs it, the notification closes the loop.
- In the first 48 hours after shipping, 23 net-new canonical roles were promoted from suggestion to seed.
- 11 of those roles produced matching jobs within the same scrape cycle.
- Every user whose original miss triggered one of those 11 got a "now covered" email.
- Two of them converted to paid within the same week.
That last number is the one that justifies the engineering cost. Self-healing coverage is not a vanity metric. It is the thing that turned two churned users into retained customers.
What we did not do
A few things we deliberately avoided.
- No manual moderation queue. The classifier is the moderator. We audit rejected rows weekly and tune the prompt, but we do not sit on a queue of approvals. If the classifier is wrong, the system is wrong, and we fix it at the prompt layer.
- No keyword stuffing. We did not just slam every miss into the scraper seed list. That path leads to rate limits, junk jobs, and scraper bans. The canonicalizer exists precisely to keep the seed list clean.
- No opaque ML. The whole loop is auditable. Every suggestion row stores its source query, its classification, its reason, and its fulfillment status. Any behavior in production maps to a row you can SELECT.
Why this matters
The default failure mode of a job board is to serve the roles you seeded. The default failure mode of a user is to churn when their role is not there. Those two failure modes meet in an empty search, and that empty search is where most products lose.
We treat every empty result as a bug report from a user who did not file a ticket. The scraper gets smarter because users searched. The users get notified because the scraper succeeded. Nobody had to email support.
Coverage should grow where demand grows. Anything else is a guess.
If you are building anything with a search box and a long tail, this pattern generalizes. Log the miss. Canonicalize the signal. Feed it back into the source of truth. Close the loop with a notification. The hard part is not the code. The hard part is deciding that empty results are a product problem, not a user problem.
If you want to see the loop in action, sign in and run a search for your dream role at /dashboard. If it comes up empty today, we will likely be emailing you in a few days. And if you want the scraped coverage and auto-apply on top of it, our pricing is here.
Your dream role should not return zero results
AI Applyd covers 50K+ live jobs across LinkedIn, Greenhouse, Workday, Lever, iCIMS, and direct careers pages. If your role is not in there yet, the self-healing loop will pick it up.
Written by
Ava Bagherzadeh
Builder, AI Applyd
Ava built AI Applyd because she got tired of watching talented people get filtered out by broken hiring systems. She writes about what she has learned building a platform that actually respects job seekers.