Category: Guides

  • Attach Puppeteer to External Chrome Using browserd

    If you’ve ever tried to launch Chromium directly from Puppeteer, you know the pain — high CPU, zombie processes, and broken sandboxes.
    Instead of spawning Chrome from every Node process, you can run it once as a container and connect remotely.

    That’s exactly what browserd does — it wraps headless Chromium with a small Go proxy that exposes a stable WebSocket at ws://0.0.0.0:9223.
    Your Puppeteer client can attach directly to this endpoint — no extra HTTP fetch, no random /devtools/browser/<id>, and no lifecycle headaches.


    🧱 What is browserd?

    Headless Chromium packaged with a Go proxy that gives you a fixed WebSocket endpoint (ws://0.0.0.0:9223) so Puppeteer or any CDP client can connect immediately — even across load balancers.

    The proxy inside browserd tracks Chromium’s internal DevTools socket and exposes it directly, meaning:

    • You always connect to ws://host:9223
    • You never need to query /json/version or guess the internal DevTools path
    • You can safely scale multiple containers behind a load balancer

    If you don’t want to run containers at all, Peedief’s managed renderer hosts the same stack for you — same automation, zero ops.


    🚀 Quick Start

    Download the seccomp profile (for a safer sandbox) and run the container:

    # Download the Chromium seccomp profile
    curl -o chromium.json https://raw.githubusercontent.com/peedief/browserd/main/chromium.json
    
    # Run browserd container
    docker run --rm \
      --security-opt seccomp=chromium.json \
      -p 9223:9223 \
      --name browserd \
      ghcr.io/peedief/browserd:v1.0.0
    

    That’s it — the WebSocket is live at:

    ws://localhost:9223
    

    The Go proxy automatically connects to the internal Chrome DevTools backend, so you don’t have to worry about /devtools/browser/<id>.


    🧩 Connect Puppeteer to browserd

    In your Node app, install Puppeteer Core (no bundled Chrome):

    npm install puppeteer-core
    

    Then connect directly to the proxy:

    // connect.js
    import puppeteer from 'puppeteer-core';
    
    async function main() {
      const browser = await puppeteer.connect({
        browserWSEndpoint: 'ws://localhost:9223',
      });
    
      try {
        const page = await browser.newPage();
        await page.goto('https://example.com');
        console.log(await page.title());
        await page.close();
      } finally {
        browser.disconnect(); // don't call browser.close()
      }
    }
    
    main().catch((err) => {
      console.error(err);
      process.exit(1);
    });
    

    That’s all it takes.
    No /json/version calls, no internal WebSocket discovery — just a single stable endpoint.
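
For contrast, here is a sketch of the discovery step browserd removes. With a directly launched Chrome (default debug port 9222), you must first fetch /json/version to learn the session-specific WebSocket URL before you can connect. The `fetchFn` parameter below is a hypothetical addition so the helper can be exercised without a live browser:

```javascript
// The discovery dance that browserd eliminates: query /json/version on a
// directly launched Chrome to learn its ephemeral WebSocket URL, then hand
// that URL to puppeteer.connect().
async function discoverEndpoint(host = 'localhost', port = 9222, fetchFn = fetch) {
  const res = await fetchFn(`http://${host}:${port}/json/version`);
  const { webSocketDebuggerUrl } = await res.json(); // e.g. ws://.../devtools/browser/<id>
  return webSocketDebuggerUrl; // changes on every Chrome restart
}

// Old way: two steps, and the URL is different every run.
//   const wsUrl = await discoverEndpoint();
//   const browser = await puppeteer.connect({ browserWSEndpoint: wsUrl });
```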


    🧰 Using Docker Compose

    For a cleaner setup, define it in docker-compose.yml:

    services:
      browserd:
        image: ghcr.io/peedief/browserd:v1.0.0
        ports:
          - "9223:9223"
        security_opt:
          - seccomp=chromium.json

    Then download the seccomp file and bring it up:

    curl -o chromium.json https://raw.githubusercontent.com/peedief/browserd/main/chromium.json
    docker compose up -d browserd

    You now have a fully sandboxed headless Chromium instance that any remote Puppeteer client can connect to at ws://localhost:9223.
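
To sanity-check the endpoint without pulling in Puppeteer, you can speak raw CDP over a WebSocket. This sketch assumes Node 22+ (which ships a built-in global WebSocket client); Browser.getVersion is a standard CDP method:

```javascript
// Raw CDP smoke test against the proxy, no Puppeteer required.
// Assumes Node 22+ for the built-in global WebSocket client.

// Build a minimal CDP command frame (JSON with an id and a method name).
function makeCdpCommand(method, id = 1) {
  return JSON.stringify({ id, method });
}

function smokeTest(url = 'ws://localhost:9223') {
  const ws = new WebSocket(url);
  ws.onopen = () => ws.send(makeCdpCommand('Browser.getVersion'));
  ws.onmessage = (event) => {
    console.log('CDP reply:', event.data); // product/revision info from Chromium
    ws.close();
  };
  ws.onerror = (err) => console.error('Could not reach browserd:', err);
}
```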


    🧠 Why This Is Better

    Old Way (Direct Launch) → With browserd

    • Every Node process spawns its own Chromium instance → one centralized container hosts Chromium for all clients
    • WebSocket URL changes every run (/devtools/browser/<id>) → stable ws://host:9223 endpoint that never changes
    • You must fetch /json/version before connecting → no discovery step; connect instantly
    • High CPU, memory leaks, zombie Chrome processes → one managed Chrome lifecycle handled by the proxy
    • Sandboxing disabled with --no-sandbox for simplicity → runs under a real seccomp sandbox (chromium.json)
    • Hard to scale horizontally → easily load-balance multiple browserd containers, all exposing the same WebSocket path
    • Each app handles crashes separately → browser lifecycle isolated inside the container

    browserd isn’t just cleaner — it’s scalable by design.
    Spin up 3–4 replicas behind NGINX, Traefik, or any load balancer, and your Puppeteer clients can connect to any of them using the exact same ws://host:9223 path.


    🧱 Health & Scaling

    • A /healthz endpoint is exposed for readiness/liveness checks.
    • You can run multiple browserd instances and load-balance them — since every proxy exposes the same stable ws:// path.
    • Each proxy holds one persistent Chromium process internally, managing its DevTools lifecycle for you.
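
A readiness gate can be a few lines of code. The sketch below assumes the /healthz endpoint is served over HTTP on the same port as the WebSocket (9223); verify that against your deployment. The `fetchFn` parameter is a hypothetical addition so the helper can be tested without a running container:

```javascript
// Poll browserd's /healthz endpoint until it answers OK, then let clients connect.
// Assumption: the health endpoint shares the proxy's port; verify for your setup.
async function waitForHealthy(url = 'http://localhost:9223/healthz',
                              { retries = 10, delayMs = 1000, fetchFn = fetch } = {}) {
  for (let i = 0; i < retries; i++) {
    try {
      const res = await fetchFn(url);
      if (res.ok) return true; // proxy (and its Chromium) is ready
    } catch {
      // connection refused while the container is still starting; keep waiting
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false; // never became healthy within the retry budget
}
```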

    ⚙️ Production Tips

    1. Use the seccomp profile (chromium.json) — don’t disable the sandbox unless you absolutely have to.
    2. Limit resources in your compose file:

       deploy:
         resources:
           limits:
             cpus: "2.0"
             memory: 2g
    3. Add auth or IP whitelisting if exposing beyond localhost.
    4. Monitor /healthz for restarts or resource exhaustion.
    5. Scale horizontally — each browserd instance can handle its own browser process.
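
Because the endpoint never changes, crash recovery on the client side reduces to retrying the same URL. Here is a hypothetical retry wrapper; the `connectFn` parameter stands in for puppeteer.connect so the helper stays independent of any particular deployment:

```javascript
// Retry connecting to browserd's stable endpoint with linear backoff.
// Usage sketch (endpoint is an assumption, adjust to your deployment):
//   const browser = await connectWithRetry(
//     (url) => puppeteer.connect({ browserWSEndpoint: url }),
//     'ws://localhost:9223'
//   );
async function connectWithRetry(connectFn, endpoint, attempts = 5, delayMs = 500) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await connectFn(endpoint); // same URL every time, no rediscovery
    } catch (err) {
      lastError = err;
      await new Promise((resolve) => setTimeout(resolve, delayMs * (i + 1)));
    }
  }
  throw lastError; // all attempts exhausted
}
```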

    🧾 Summary

    ✅ Run browserd once → exposes ws://localhost:9223
    ✅ Connect Puppeteer directly using that endpoint
    ✅ No /json/version fetches or dynamic IDs
    ✅ Sandbox stays intact under seccomp profile
    ✅ Load-balance multiple containers effortlessly
    ✅ Shared, stable, and production-ready Chrome layer


    In short:
    browserd turns headless Chrome into a simple, stable, load-balanced microservice you can attach to instantly from Puppeteer or any CDP client.

    If you ever got tired of fighting Chrome flags, zombie processes, or unstable endpoints — this container is your new friend.
    And if you want to skip containers altogether, Peedief.com hosts the same renderer stack for you: no ops, just a clean API.

  • Creating PDF from HTML using Puppeteer in Node.js

    This guide will walk you through how to generate a PDF file from an HTML page using Puppeteer — a popular Node.js library for controlling headless Chrome or Chromium browsers. You’ll also discover a smart way to simplify this workflow using online tools later in the guide.


    🧩 Prerequisites

    Before starting, make sure you have the following installed:

    • Node.js (v22 or later recommended)
    • npm or yarn or pnpm

    If you’re running this on WSL, Linux, or inside a Debian-based container, you may need to install additional system dependencies before using Puppeteer.

    Run the following commands:

    sudo apt-get update
    sudo apt-get install -y \
        fonts-liberation \
        libappindicator3-1 \
        libatk-bridge2.0-0 \
        libnspr4 \
        libnss3 \
        libxss1 \
        libx11-xcb1 \
        libxcomposite1 \
        libxdamage1 \
        libxrandr2 \
        libgbm1 \
        libgtk-3-0 \
        ca-certificates \
        wget \
        curl
    

    These packages ensure that Chromium (used by Puppeteer) runs correctly in a headless environment.


    ⚙️ Step 1: Setup a New Node.js Project

    mkdir html-to-pdf
    cd html-to-pdf
    npm init -y
    

    This creates a new Node.js project with a default package.json file.


    📦 Step 2: Install Puppeteer

    npm install puppeteer
    

    📝 Puppeteer will automatically download a compatible version of Chromium.


    🧱 Step 3: Create a Script to Generate PDF

    Create a new file named generate-pdf.js in your project root and add the following code. Since the script uses ES module import syntax, also set "type": "module" in your package.json (or name the file generate-pdf.mjs):

    import puppeteer from 'puppeteer';
    
    async function generatePDF() {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
    
      const htmlContent = `
        <html>
          <head>
            <style>
              body { font-family: Arial, sans-serif; margin: 40px; }
              h1 { color: #4CAF50; }
              p { font-size: 14px; }
            </style>
          </head>
          <body>
            <h1>Invoice</h1>
            <p>This PDF was generated using Puppeteer.</p>
          </body>
        </html>
      `;
    
      await page.setContent(htmlContent, { waitUntil: 'networkidle0' });
    
      await page.pdf({
        path: 'output.pdf',
        format: 'A4',
        printBackground: true,
        margin: {
          top: '20mm',
          bottom: '20mm',
          left: '15mm',
          right: '15mm'
        }
      });
    
      console.log('✅ PDF generated successfully: output.pdf');
      await browser.close();
    }
    
    generatePDF().catch(console.error);
    

    💡 Step 4: Run the Script

    Run the script using Node.js:

    node generate-pdf.js
    

    After successful execution, you’ll see output.pdf in your project directory.


    🧠 Bonus: Generate PDF from an External URL

    You can also convert any webpage directly to a PDF.

    import puppeteer from 'puppeteer';
    
    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
    
      await page.goto('https://example.com', { waitUntil: 'networkidle0' });
    
      await page.pdf({ path: 'example.pdf', format: 'A4', printBackground: true });
    
      console.log('✅ PDF generated from example.com');
    
      await browser.close();
    })();
    

    🧰 Optional Configurations

    You can customize PDF generation with additional options:

    • path: file path to save the PDF.
    • format: paper size (e.g., A4, Letter).
    • printBackground: whether to include background colors/images.
    • landscape: set to true for landscape orientation.
    • margin: set custom margins (top, right, bottom, left).

    Example:

    await page.pdf({
      path: 'custom.pdf',
      format: 'A4',
      landscape: true,
      printBackground: true,
      margin: { top: '10mm', bottom: '10mm' }
    });
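
Puppeteer can also stamp headers and footers onto every page via displayHeaderFooter. The templates are plain HTML; Puppeteer fills in elements carrying the special classes pageNumber, totalPages, date, title, and url. The options object below is a sketch with placeholder text:

```javascript
// Reusable page.pdf() options with a title header and "Page X of Y" footer.
// Leave margin room for the templates, or they will overlap the page content.
const pdfOptions = {
  format: 'A4',
  printBackground: true,
  displayHeaderFooter: true,
  headerTemplate:
    '<div style="font-size:10px; width:100%; text-align:center;">Monthly Report</div>',
  footerTemplate:
    '<div style="font-size:10px; width:100%; text-align:center;">' +
    'Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>',
  margin: { top: '25mm', bottom: '25mm' },
};

// Usage: await page.pdf({ path: 'report.pdf', ...pdfOptions });
```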
    

    🧑‍💻 Troubleshooting

    1. Puppeteer Download Too Large?

    If you already have Chrome installed, skip downloading Chromium:

    npm install puppeteer-core
    

    Then, connect Puppeteer to your local Chrome installation:

    const browser = await puppeteer.launch({
      executablePath: '/usr/bin/google-chrome', // path to your Chrome
    });
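
Chrome's install location varies by operating system. A small helper with the typical default paths (these are common install locations, not guarantees; verify on your machine):

```javascript
// Typical Chrome install locations per platform; adjust if yours differs.
function chromePathFor(platform = process.platform) {
  switch (platform) {
    case 'darwin': // macOS
      return '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome';
    case 'win32': // Windows
      return 'C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe';
    default: // Linux and friends
      return '/usr/bin/google-chrome';
  }
}

// Usage: await puppeteer.launch({ executablePath: chromePathFor() });
```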
    

    2. Running in Docker or CI/CD?

    Add this config to run Puppeteer in restricted environments (note that --no-sandbox disables Chrome's sandbox, so only use it when the environment itself is trusted):

    const browser = await puppeteer.launch({
      // Recent Puppeteer versions use the new headless mode by default,
      // so `headless: true` is sufficient ('new' is deprecated).
      headless: true,
      args: ['--no-sandbox', '--disable-setuid-sandbox'],
    });
    

    ✅ Summary

    You’ve now learned how to:

    • Create a Node.js project
    • Install Puppeteer
    • Generate a PDF from HTML content or a URL
    • Customize page settings
    • Fix environment dependencies on Linux or containers

    This setup is perfect for generating invoices, reports, or certificates from dynamic HTML templates.
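
For web apps you often want the PDF bytes in memory rather than on disk: omitting the path option makes page.pdf() resolve with the document's bytes (a Buffer or Uint8Array depending on your Puppeteer version). A sketch, with the page object passed in so the helper stays independent of how the browser was launched:

```javascript
// Render HTML to PDF bytes in memory (no `path` option = nothing written to disk),
// e.g. to stream straight into an HTTP response.
async function renderPdfBuffer(page, html) {
  await page.setContent(html, { waitUntil: 'networkidle0' });
  return page.pdf({ format: 'A4', printBackground: true }); // resolves with the bytes
}

// Usage sketch:
//   const pdf = await renderPdfBuffer(await browser.newPage(), '<h1>Invoice</h1>');
//   res.setHeader('Content-Type', 'application/pdf');
//   res.end(pdf);
```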

    If you ever need to automate PDF generation at scale without managing browsers or code execution, you can try services like Peedief — which lets you create and render PDF templates via simple API calls. It’s great for production apps that need reliability without the hassle of maintaining Puppeteer environments.