
JobRadar Part 5: Dropping the Automation Fantasy and Building a Practical Pipeline

April 24, 2026

When I first designed JobRadar, the roadmap was simple:

"Vercel Cron scrapes Indeed/Seek every morning, AI scores them, I get a digest email."

Sounds great. But building it was a different story.

  • Seek/Indeed bot blocking: User-Agent changes got 403s and 429s right back.
  • Playwright on Vercel Lambda: Bundle size exploded, timeout at 10 seconds, ETXTBSY errors.
  • Cloudflare blocking: Glassdoor couldn't even be fetched.

I spent two days in Part 3 trying to make Playwright + Vercel work. The conclusion was clear:

"If auto-collection is blocked, why not have users paste URLs themselves?"

The pivot turned out to be simpler and more powerful. Only listings the user actually cares about go in, so there's no junk data. On-demand means almost no server costs. This post covers two days of building after that shift.


What Got Built

Feature                     Description
--------------------------  ----------------------------------------------------------------
URL input UI                URL input field at the top of the job list, with a platform auto-detection badge
JD scrapers                 Seek / Indeed / Glassdoor / Generic implementations
Instant pipeline            URL add → scraping → AI matching, run automatically in sequence
Job list UX                 Drag to reorder, delete, re-match
Cover letter generation     AI-generated from JD + resume, editable, clipboard copy
Resume experience summary   Resume text → Claude Haiku → English summary auto-fill

Step 1. URL Input and Platform Auto-Detection

The first thing I built was the input field. Paste a URL and the badge shows which site it's from instantly.

// src/lib/detect-platform.ts
export type Platform = 'seek' | 'indeed' | 'linkedin' | 'glassdoor' | 'other'

export function detectPlatform(url: string): Platform {
  if (url.includes('seek.com')) return 'seek'
  if (url.includes('indeed.com')) return 'indeed'
  if (url.includes('linkedin.com')) return 'linkedin'
  if (url.includes('glassdoor.com')) return 'glassdoor'
  return 'other'
}

export const PLATFORM_STYLE: Record<Platform, { label: string; className: string }> = {
  seek:      { label: 'Seek',      className: 'bg-blue-100 text-blue-700' },
  indeed:    { label: 'Indeed',    className: 'bg-yellow-100 text-yellow-700' },
  linkedin:  { label: 'LinkedIn',  className: 'bg-sky-100 text-sky-700' },
  glassdoor: { label: 'Glassdoor', className: 'bg-green-100 text-green-700' },
  other:     { label: 'Other',     className: 'bg-zinc-100 text-zinc-500' },
}

The badge color changing the moment you type a URL turned out to be surprisingly helpful UX. "Yep, I pasted it right."
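Substring checks like url.includes('seek.com') also match when the string appears in a query parameter or inside another domain name. A minimal sketch of a stricter variant that parses the hostname first and compares whole labels; detectPlatformByHost and KNOWN_LABELS are hypothetical names, not part of the project:

```typescript
type Platform = 'seek' | 'indeed' | 'linkedin' | 'glassdoor' | 'other'

// Hostname labels treated as a platform match (assumed list, mirrors the Platform union).
const KNOWN_LABELS: ReadonlyArray<Exclude<Platform, 'other'>> = [
  'seek', 'indeed', 'linkedin', 'glassdoor',
]

function detectPlatformByHost(input: string): Platform {
  let hostname: string
  try {
    hostname = new URL(input.trim()).hostname.toLowerCase()
  } catch {
    return 'other' // not a parseable absolute URL, e.g. while the user is still typing
  }
  // 'www.seek.com.au' -> ['www', 'seek', 'com', 'au']
  const labels = hostname.split('.')
  return KNOWN_LABELS.find(name => labels.includes(name)) ?? 'other'
}
```

Matching on labels covers regional domains such as seek.com.au or glassdoor.co.uk, at the cost of also accepting something like seek.example.com. For a badge that is probably an acceptable trade-off.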


Step 2. cheerio-Based JD Scraper

Without Playwright, getting a JD comes down to fetch plus HTML parsing. The library is cheerio: essentially a jQuery-style API for Node.js, easy to learn and light on the server bundle.

npm install cheerio

Each platform needed a different scraping strategy.

Seek: __NEXT_DATA__ JSON Parsing

Seek is built on Next.js, so <script id="__NEXT_DATA__"> contains the entire job listing as JSON. Much cleaner than parsing the HTML.

// src/lib/scrapers/seek-url.ts
const nextDataRaw = $('#__NEXT_DATA__').text()
if (nextDataRaw) {
  const nextData = JSON.parse(nextDataRaw)
  const job = nextData?.props?.pageProps?.jobDetails?.job
    ?? nextData?.props?.pageProps?.job
    ?? nextData?.props?.pageProps?.jobViewDetails

  if (job) {
    return {
      title: job.title ?? job.header?.jobTitle ?? '',
      company: job.advertiser?.description ?? job.companyName ?? '',
      // ...
    }
  }
}

// cheerio fallback if JSON parsing fails
const title = $('h1[data-automation="job-detail-title"]').text()
  || $('h1').first().text()

The __NEXT_DATA__ structure can change when Seek deploys. Multiple paths are covered with ?? chaining, with a cheerio HTML fallback when that fails.
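One detail worth guarding: JSON.parse throws on malformed input, so the HTML fallback is only reached if that exception is caught. A minimal sketch of the lookup as a pure function that returns null whenever the caller should fall back; pickSeekJob is a hypothetical name, and the paths are the ones tried in the snippet above, not a documented Seek schema:

```typescript
interface SeekJobSummary {
  title: string
  company: string
}

function pickSeekJob(nextDataRaw: string): SeekJobSummary | null {
  let nextData: Record<string, any>
  try {
    nextData = JSON.parse(nextDataRaw)
  } catch {
    return null // malformed JSON: let the caller use the HTML fallback
  }

  // Candidate locations, tried in order.
  const pageProps = nextData?.props?.pageProps
  const job = pageProps?.jobDetails?.job ?? pageProps?.job ?? pageProps?.jobViewDetails
  if (!job || typeof job !== 'object') return null

  const title = job.title ?? job.header?.jobTitle ?? ''
  const company = job.advertiser?.description ?? job.companyName ?? ''
  // Without a title the result is not useful, so treat it as a miss.
  return title ? { title: String(title), company: String(company) } : null
}
```

A null result then means "use the cheerio selectors", whether the JSON was broken or the structure moved.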

Indeed: JSON-LD Structured Data

Indeed provides JobPosting schema data in <script type="application/ld+json"> — standard structured data that's easy to parse.

// src/lib/scrapers/indeed-url.ts
$('script[type="application/ld+json"]').each((_, el) => {
  const data = JSON.parse($(el).text())
  if (data['@type'] === 'JobPosting') {
    ldJob = {
      title: data.title ?? '',
      company: data.hiringOrganization?.name ?? '',
      location: data.jobLocation?.address?.addressLocality ?? '',
      description: data.description ?? '',
      posted_at: data.datePosted ?? null,
    }
  }
})

If no JSON-LD, fall back to data-testid selectors via cheerio.
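JSON-LD in the wild is not always a single object: a script block can hold an array or an @graph wrapper, @type and jobLocation can be arrays, and one broken block should not abort the rest. A minimal sketch that takes the raw text of each ld+json block and returns the first JobPosting; findJobPosting and LdJob are hypothetical names, and the field mapping follows the snippet above:

```typescript
interface LdJob {
  title: string
  company: string
  location: string
  description: string
  posted_at: string | null
}

function findJobPosting(ldJsonBlocks: string[]): LdJob | null {
  for (const raw of ldJsonBlocks) {
    let parsed: any
    try {
      parsed = JSON.parse(raw)
    } catch {
      continue // skip a malformed block instead of failing the whole scrape
    }
    const graph = parsed?.['@graph']
    const nodes: any[] = Array.isArray(parsed) ? parsed : Array.isArray(graph) ? graph : [parsed]

    for (const node of nodes) {
      const type = node?.['@type']
      const isJob = Array.isArray(type) ? type.includes('JobPosting') : type === 'JobPosting'
      if (!isJob) continue
      // jobLocation may be a single Place or a list of them; take the first.
      const place = Array.isArray(node.jobLocation) ? node.jobLocation[0] : node.jobLocation
      return {
        title: node.title ?? '',
        company: node.hiringOrganization?.name ?? '',
        location: place?.address?.addressLocality ?? '',
        description: node.description ?? '',
        posted_at: node.datePosted ?? null,
      }
    }
  }
  return null
}
```

With cheerio, the input array could be collected with something like $('script[type="application/ld+json"]').map((_, el) => $(el).text()).get().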

Generic: 3-Tier Fallback

For non-Seek/Indeed/LinkedIn/Glassdoor sites, try in order:

JSON-LD (JobPosting)
  → Open Graph (og:title, og:description)
    → meta tags (description)
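The three tiers can be sketched as a small resolver that takes whatever each source yielded and returns the first tier with usable text. This assumes the extraction itself (JSON-LD parsing, reading og: and meta tags) has already happened; resolveGeneric and its types are hypothetical names for illustration:

```typescript
interface Fields {
  title?: string | null
  description?: string | null
}

type Source = 'json-ld' | 'open-graph' | 'meta'

interface Resolved {
  title: string
  description: string
  source: Source | null
}

function resolveGeneric(jsonLd: Fields | null, openGraph: Fields | null, meta: Fields | null): Resolved {
  const tiers: Array<[Source, Fields | null]> = [
    ['json-ld', jsonLd],
    ['open-graph', openGraph],
    ['meta', meta],
  ]
  for (const [source, fields] of tiers) {
    const title = fields?.title?.trim() ?? ''
    const description = fields?.description?.trim() ?? ''
    // A tier counts only if it produced some non-blank text.
    if (title || description) return { title, description, source }
  }
  return { title: '', description: '', source: null }
}
```

Returning the source alongside the fields makes it easy to show on the card how reliable the scraped data is likely to be.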

Step 3. Glassdoor Adventures — Cloudflare Bypass

Glassdoor has Cloudflare bot protection that blocks fetch outright. Not even a 403 — the connection just gets dropped.

Tried stealth mode:

// test-stealth.js (ultimately discarded)
const browser = await chromium.launch({ headless: false })
// ... still blocked by Cloudflare

Then I looked more closely at the URL:

/job-listing/group-product-manager-deepl-JV_IC5023222_KO0,49_KE50,55.htm

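Notice KO0,49 and KE50,55. KO encodes the title range and KE the company name range, each as start,end character indices into the slug (the part between /job-listing/ and -JV_), with the end index exclusive. Note that the slug as printed here is only 27 characters long, so the ranges that fit it would be KO0,21 and KE22,27.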

// src/lib/scrapers/glassdoor-url.ts
export function parseGlassdoorUrl(url: string): ScrapedJob {
  const pathname = new URL(url).pathname
  const slugMatch = pathname.match(/\/job-listing\/(.+?)(?:-JV_|-GD_|\.htm)/i)
  const slug = slugMatch?.[1] ?? ''

  const koMatch = pathname.match(/_KO\d+,(\d+)/)   // title end index
  const keMatch = pathname.match(/_KE(\d+),(\d+)/) // company start, end index

  let title = ''
  let company = ''
  if (koMatch && keMatch) {
    const titleEnd = parseInt(koMatch[1], 10)
    const companyStart = parseInt(keMatch[1], 10)
    const companyEnd = parseInt(keMatch[2], 10)
    title = slug.slice(0, titleEnd).replace(/-/g, ' ')
    company = slug.slice(companyStart, companyEnd).replace(/-/g, ' ')
  }
  // ...
}

No fetch needed — extract title and company name purely from the URL. Can't get the full JD, but it's enough for matching. This was the most memorable moment of the whole build.
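Because slice() silently clamps out-of-range indices, a URL whose KO/KE numbers do not fit its slug yields a wrong title and an empty company rather than an error. A minimal self-contained sketch that validates the ranges first; parseGlassdoorSlug is a hypothetical name, the input is assumed to be an absolute URL, and the example URL in the note below is made up so that its indices match its slug:

```typescript
interface SlugFields {
  title: string
  company: string
}

function parseGlassdoorSlug(url: string): SlugFields | null {
  const pathname = new URL(url).pathname
  const slug = pathname.match(/\/job-listing\/(.+?)(?:-JV_|-GD_|\.htm)/i)?.[1] ?? ''
  const ko = pathname.match(/_KO(\d+),(\d+)/) // title start, end
  const ke = pathname.match(/_KE(\d+),(\d+)/) // company start, end
  if (!slug || !ko || !ke) return null

  const titleStart = Number(ko[1])
  const titleEnd = Number(ko[2])
  const companyStart = Number(ke[1])
  const companyEnd = Number(ke[2])

  // Reject ranges that are empty or run past the end of the slug.
  const inRange = (start: number, end: number): boolean => start < end && end <= slug.length
  if (!inRange(titleStart, titleEnd) || !inRange(companyStart, companyEnd)) return null

  const words = (start: number, end: number): string =>
    slug.slice(start, end).replace(/-/g, ' ')
  return { title: words(titleStart, titleEnd), company: words(companyStart, companyEnd) }
}
```

For /job-listing/group-product-manager-deepl-JV_IC0000000_KO0,21_KE22,27.htm this gives the title "group product manager" and the company "deepl". A null result signals that the caller should use a looser fallback.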


Step 4. Instant Pipeline After URL Add

The moment a URL is saved, scraping → AI matching runs automatically in sequence. AddJobForm shows status at each step.

// save → scrape → match sequential execution
const saved = await addJobUrl(formData)           // 1. save to DB
setStatus('Scraping...')
await fetch('/api/scrape-url', {                   // 2. scrape JD
  method: 'POST',
  body: JSON.stringify({ jobId: saved.id, url }),
})
setStatus('AI matching...')
await matchSingleJob(saved.id)                     // 3. AI match
setStatus('Done')

From the user's perspective: paste one URL and within 10 seconds the match score appears on the card.
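The snippet shows the happy path only. Since fetch() resolves even on 4xx/5xx responses, a failed scrape would still move on to matching. A minimal sketch of a step runner that reports where the sequence stopped; runSteps, Step, and postJson are hypothetical helpers, not part of the project:

```typescript
type Step = { label: string; run: () => Promise<void> }

// Runs steps in order, reporting each label; stops at the first failure.
async function runSteps(steps: Step[], setStatus: (status: string) => void): Promise<boolean> {
  for (const step of steps) {
    setStatus(step.label)
    try {
      await step.run()
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err)
      setStatus(`Failed at "${step.label}": ${reason}`)
      return false
    }
  }
  setStatus('Done')
  return true
}

// fetch() only rejects on network errors, so the HTTP status is checked explicitly.
async function postJson(url: string, body: unknown): Promise<void> {
  const res = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  })
  if (!res.ok) throw new Error(`${url} responded ${res.status}`)
}
```

In the form, the two steps after saving would be a 'Scraping...' step that calls postJson('/api/scrape-url', { jobId: saved.id, url }) and an 'AI matching...' step that calls matchSingleJob(saved.id).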


Step 5. Job List UX Improvements

Drag to Reorder (@dnd-kit)

npm install @dnd-kit/core @dnd-kit/sortable @dnd-kit/utilities

// src/components/JobList.tsx
import { DndContext, closestCenter, PointerSensor, useSensor, useSensors } from '@dnd-kit/core'
import type { DragEndEvent } from '@dnd-kit/core'
import { SortableContext, verticalListSortingStrategy, arrayMove } from '@dnd-kit/sortable'

function handleDragEnd(event: DragEndEvent) {
  const { active, over } = event
  if (over && active.id !== over.id) {
    setItems(prev => arrayMove(prev,
      prev.findIndex(i => i.id === active.id),
      prev.findIndex(i => i.id === over.id),
    ))
  }
}

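arrayMove handles the reindexing automatically (it moves the dragged item and shifts the ones in between, rather than swapping two entries), which was simpler than expected. Order lives in client state only and is not persisted to the DB (it's an MVP).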
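To make the move semantics concrete, here is a dependency-free illustration of what the handler relies on. moveItem is a hypothetical stand-in written for this example, not dnd-kit's own implementation, and the index guard is an addition, since findIndex returns -1 when an id is missing:

```typescript
// Returns a new array with the item at `from` moved to position `to`.
function moveItem<T>(items: readonly T[], from: number, to: number): T[] {
  const next = items.slice()
  const valid = (i: number): boolean => Number.isInteger(i) && i >= 0 && i < next.length
  if (!valid(from) || !valid(to)) return next // e.g. findIndex returned -1

  const [moved] = next.splice(from, 1) // take the dragged item out
  next.splice(to, 0, moved)            // reinsert it at the drop position
  return next
}
```

Moving index 0 to index 2 in ['a', 'b', 'c', 'd'] gives ['b', 'c', 'a', 'd'], whereas a swap would give ['c', 'b', 'a', 'd'].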

Click Match Score to Re-match

Click the score badge and that single listing re-matches.

<button
  onClick={() => rematch(job.id)}
  className="text-xs font-bold px-2 py-0.5 rounded-full bg-zinc-100 hover:bg-zinc-200"
>
  {job.match_score ?? 'Unmatched'}
</button>

Click "Unmatched" to trigger a single-listing match immediately. No waiting for a batch run.


Step 6. Cover Letter Generation Pipeline

This is the core feature. Click "Cover Letter" on a job card, a modal opens, and Claude writes a cover letter based on the JD + your resume.

// src/app/actions.ts - generateCoverLetter
export async function generateCoverLetter(jobId: string) {
  const [job, profile] = await Promise.all([
    getJobById(jobId),
    getMyProfile(),
  ])

  const prompt = `
You are a professional cover letter writer.

[Applicant Profile]
${profile.experience_summary}

[Job Listing]
Company: ${job.company}
Title: ${job.title}
JD: ${job.description}

Please write an English cover letter based on the above.
`

  const message = await claude.messages.create({
    model: 'claude-3-haiku-20240307',
    max_tokens: 1024,
    messages: [{ role: 'user', content: prompt }],
  })

  const content = message.content[0].type === 'text' ? message.content[0].text : ''
  await upsertCoverLetter(jobId, content)
  return { content }
}

The modal lets you edit the generated content inline and has a clipboard copy button. Regeneration is also available.
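Listings parsed from a Glassdoor URL have no full JD, so the prompt can end up with an empty JD line. A minimal sketch of building the prompt in a separate function that handles a missing description and caps very long ones; buildCoverLetterPrompt, PromptJob, and the MAX_JD_CHARS budget are assumptions for illustration, not the project's code:

```typescript
interface PromptJob {
  company: string
  title: string
  description?: string | null
}

// Hypothetical character budget for the JD portion of the prompt.
const MAX_JD_CHARS = 6000

function buildCoverLetterPrompt(experienceSummary: string, job: PromptJob): string {
  const jd = (job.description ?? '').trim()
  const jdSection = jd
    ? `JD: ${jd.slice(0, MAX_JD_CHARS)}`
    : 'JD: (not available; rely on the title and company only, and do not invent requirements)'

  return [
    'You are a professional cover letter writer.',
    '',
    '[Applicant Profile]',
    experienceSummary.trim(),
    '',
    '[Job Listing]',
    `Company: ${job.company}`,
    `Title: ${job.title}`,
    jdSection,
    '',
    'Please write an English cover letter based on the above.',
  ].join('\n')
}
```

Keeping the prompt in a pure function also makes it easy to check the exact text being sent without calling the API.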

[Cover Letter Modal]
┌─────────────────────────────────────┐
│ Cover Letter                        │
│ Senior Product Manager · Atlassian  │
├─────────────────────────────────────┤
│                                     │
│  Dear Hiring Manager,               │
│                                     │
│  I am writing to express my...      │
│  [editable textarea]                │
│                                     │
├─────────────────────────────────────┤
│ ↺ Regenerate    [Copy to clipboard] │
└─────────────────────────────────────┘

Step 7. AI Auto-fill from Resume

On the profile page, upload your resume and click "AI Auto-fill" to generate an experience summary automatically.

// src/app/profile/actions.ts
export async function autoFillExperience() {
  const profile = await getMyProfile()
  if (!profile.resume_text) return { error: 'Please upload your resume first.' }

  const message = await claude.messages.create({
    model: 'claude-3-haiku-20240307',
    max_tokens: 512,
    messages: [{
      role: 'user',
      content: `Based on the following resume, write a 3-4 sentence English career summary:\n\n${profile.resume_text}`,
    }],
  })

  const summary = message.content[0].type === 'text' ? message.content[0].text : ''
  await updateExperienceSummary(summary)
  return { summary }
}

This summary feeds directly into the cover letter generation prompt — so its quality directly determines the cover letter quality.


Troubleshooting

Glassdoor URL without KO/KE parameters

Some Glassdoor URLs don't have the KO/KE encoding. The fallback treats the last word of the slug as the company name and everything before it as the title. Accuracy is lower (a multi-word company name gets split), but it's better than "can't parse."

// fallback when no KO/KE
company = parts[parts.length - 1] ?? ''
title = parts.slice(0, -1).join(' ')
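The two lines above depend on a parts array that is not shown. A minimal self-contained sketch, assuming parts is the slug split on hyphens; guessFromSlug is a hypothetical name:

```typescript
function guessFromSlug(slug: string): { title: string; company: string } {
  // Assumption: words in the slug are separated by single hyphens.
  const parts = slug.split('-').filter(Boolean)
  if (parts.length < 2) {
    // A single word cannot be split into title and company.
    return { title: parts.join(' '), company: '' }
  }
  return {
    title: parts.slice(0, -1).join(' '),
    company: parts[parts.length - 1] ?? '',
  }
}
```

For group-product-manager-deepl this gives the title "group product manager" and the company "deepl". For data-analyst-acme-corp it gives the company "corp", which is the accuracy cost mentioned above.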

Seek __NEXT_DATA__ structure change

Seek's pageProps subkeys can change between deployments. The ?? chains cover multiple paths, and the cheerio fallback kicks in when none of them match. Storing the error message in the DB and showing it on the card made debugging much easier.

Vercel serverless function timeout

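Sequential scraping + AI matching can exceed the 10-second limit. Fix it by raising the function's maxDuration (exported from the route file or set under functions in vercel.json, up to the limit of your Vercel plan) or by separating the scrape and match calls.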
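With the App Router, one way to raise the limit for a single route is the maxDuration route segment config. A config fragment as a sketch; the file path is assumed from the /api/scrape-url call, and 30 is an arbitrary example value:

```typescript
// src/app/api/scrape-url/route.ts (path assumed from the /api/scrape-url endpoint)
// Maximum execution time for this route in seconds; the ceiling depends on the Vercel plan.
export const maxDuration = 30
```

Only this route gets the longer limit; other functions keep their defaults.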


Full Pipeline at a Glance

User pastes URL
        ↓
Platform auto-detect (Seek/Indeed/Glassdoor/Other)
        ↓
Save to DB (title/URL/platform)
        ↓
Call /api/scrape-url
  ├── Seek    → __NEXT_DATA__ JSON → cheerio fallback
  ├── Indeed  → JSON-LD → cheerio fallback
  ├── Glassdoor → URL slug parsing (KO/KE encoding)
  └── Generic → JSON-LD → Open Graph → meta tags
        ↓
AI matching (Claude Sonnet, 0–100 score + reasoning)
        ↓
Score displayed on job card
        ↓
Click "Cover Letter"
        ↓
AI cover letter generated from JD + resume
        ↓
Edit → copy to clipboard

Wrap-up

When I was insisting on Playwright automation at the start, I thought "bot blocking is nothing — just change the User-Agent." But Cloudflare doesn't work that way, and Vercel Lambda constraints are real.

Switching to on-demand actually made the code simpler. No batch scheduler, no error retry logic — just "process the request when it comes in." And like the Glassdoor adventure taught me: when you're blocked, find another way. The answer was in the URL slug.

The next post covers the email digest — sending the top-matched listings via Resend every morning.


backtodev

A 40-something PM returns to code. Learning, failing, and growing.