For developers in the Node.js ecosystem, there's a premier tool for automating browser tasks and scraping modern, JavaScript-heavy websites: Puppeteer. Developed by Google, Puppeteer provides a high-level API to control a headless Chrome or Chromium browser, making it the perfect choice for tackling a complex target like the Meta Ad Library.
Unlike basic HTTP clients, Puppeteer renders the full page, executes JavaScript, and interacts with elements just like a user would. This makes it incredibly effective for scraping dynamic content and handling "infinite scroll" pages without needing to reverse-engineer any private APIs. This guide will provide a complete walkthrough for building a powerful Facebook Ads scraper using Node.js and Puppeteer, no login required.
Prerequisites
- Node.js and npm installed on your system
- A basic understanding of JavaScript (including async/await) and the browser DOM
Step 1: Setting Up Your Node.js Project
First, create a new project folder and initialize it with npm. Open your terminal and run:
mkdir puppeteer-ad-scraper
cd puppeteer-ad-scraper
npm init -y
npm install puppeteer
This will create a package.json file and install Puppeteer. During installation, Puppeteer also downloads a compatible browser build (Chromium, or Chrome for Testing in recent versions) that is guaranteed to work with the API, so you're ready to go.
Step 2: The Puppeteer Scraping Script
Create a file named index.js. We'll build our scraper here. Let's start with the basic code to launch Puppeteer and open a new page.
const puppeteer = require('puppeteer');
const fs = require('fs');

// --- Configuration ---
const AD_LIBRARY_URL = 'https://www.facebook.com/ads/library/?active_status=all&ad_type=all&country=ALL&q=sneakers&search_type=keyword_unordered&media_type=all';
const OUTPUT_FILENAME = 'facebook_ads_data.json';
const ADS_TO_SCRAPE = 100; // Target number of ads to scrape

// Main function wrapped in an Immediately Invoked Function Expression (IIFE)
(async () => {
  console.log('Launching browser...');
  const browser = await puppeteer.launch({
    headless: false, // Set to true for production; false lets you watch the browser while debugging
    args: ['--no-sandbox', '--disable-setuid-sandbox']
  });
  const page = await browser.newPage();
  await page.setViewport({ width: 1280, height: 926 });

  console.log(`Navigating to: ${AD_LIBRARY_URL}`);
  await page.goto(AD_LIBRARY_URL, { waitUntil: 'networkidle2' });

  // --- All other logic will go here ---

  console.log('Closing browser...');
  await browser.close();
})();
Setting headless: false is extremely useful during development, as it lets you watch exactly what the browser is doing.
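One convenient pattern (our addition, not part of the script above) is to control the mode from the command line with an environment variable:

const browser = await puppeteer.launch({
  // Defaults to headless; run `HEADLESS=false node index.js` to watch the browser
  headless: process.env.HEADLESS !== 'false',
  args: ['--no-sandbox', '--disable-setuid-sandbox']
});

That way you never have to edit the script when switching between debugging and production runs.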
Step 3: The Smart Way to Handle Infinite Scroll
Puppeteer allows for a more robust way to handle infinite scroll than just using fixed delays. We can write a function that scrolls down and then waits for the page's height to actually increase, confirming that new content has loaded.
async function scrapeInfiniteScrollItems(page, itemTargetCount) {
  let items = [];
  try {
    let previousHeight;
    while (items.length < itemTargetCount) {
      // extractAdData (defined in Step 4) runs inside the browser and
      // returns the ads currently rendered on the page
      items = await page.evaluate(extractAdData);
      console.log(`Currently scraped ${items.length} unique ads.`);

      previousHeight = await page.evaluate('document.body.scrollHeight');
      await page.evaluate('window.scrollTo(0, document.body.scrollHeight)');

      // Wait for the page height to increase, indicating new content has loaded
      await page.waitForFunction(`document.body.scrollHeight > ${previousHeight}`, { timeout: 10000 });

      // Brief settle time; page.waitForTimeout() was removed in recent Puppeteer versions
      await new Promise((resolve) => setTimeout(resolve, 1000));
    }
  } catch (e) {
    console.log(`Scrolling stopped. Reason: ${e.message}`);
  }
  return items;
}
This function will be called from our main script. The page.waitForFunction call is the key: it pauses the script until the condition inside is true, with a timeout to prevent it from waiting forever.
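Note that waitForFunction also accepts a real function plus arguments after the options object, which avoids building the condition as a string. The call above could equivalently be written as:

await page.waitForFunction(
  (prev) => document.body.scrollHeight > prev, // runs in the browser
  { timeout: 10000 },
  previousHeight // serialized and passed in from Node.js
);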
Step 4: Extracting Data within the Browser Context
With Puppeteer, data extraction happens inside the browser's own environment via page.evaluate(). We pass it a function to execute in the page context. That function must be self-contained: it is serialized and run inside the browser, so it cannot reference variables from your Node.js scope.
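Because of that restriction, any outside value has to be passed in explicitly as a serializable argument. A tiny illustration (the selector here is a hypothetical placeholder):

const count = await page.evaluate(
  (selector) => document.querySelectorAll(selector).length, // runs in the page
  '.hypothetical-ad-card' // sent from Node.js, arrives as `selector`
);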
Disclaimer: These selectors are examples and will change as Meta updates its website. You will need to inspect the page and update them periodically.
// This function runs in the browser's context
function extractAdData() {
  // This selector is an example and WILL change. Find the correct parent selector for ad cards.
  const adContainers = document.querySelectorAll('div.x1a2a7pz.x1a5g62h.x1fw500j.x1unuyjp.x1iorvi4');
  const ads = [];

  adContainers.forEach(ad => {
    const adData = {};

    // Extract ad text (example selector)
    const adTextElement = ad.querySelector('div._7j2a');
    adData.text = adTextElement ? adTextElement.innerText : 'N/A';

    // Extract image URL (example selector)
    const imgElement = ad.querySelector('img.xt7dq6l.xl1xv1r');
    adData.image_url = imgElement ? imgElement.src : 'N/A';

    // Extract landing page URL (example selector)
    const linkElement = ad.querySelector('a[rel="noopener nofollow"]');
    adData.landing_page = linkElement ? linkElement.href : 'N/A';

    // Add to list if it contains some data
    if (adData.text !== 'N/A') {
      ads.push(adData);
    }
  });

  // Remove duplicates based on ad text before returning
  const uniqueAds = Array.from(new Map(ads.map(ad => [ad.text, ad])).values());
  return uniqueAds;
}
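Because those class names are machine-generated and rotate often, a more defensive variation is to anchor on text the Ad Library reliably renders on every card, such as the "Library ID" label, and walk up the DOM to the card container. This is only a sketch under that assumption; the label text, element type, and walk-up depth all need verifying in DevTools:

// Alternative sketch: anchor on visible "Library ID" text instead of generated class names.
// Assumes each ad card renders a span containing "Library ID" -- verify before relying on it.
function extractAdDataByText() {
  const ads = [];
  const labels = document.evaluate(
    "//span[contains(text(), 'Library ID')]", // XPath matches on visible text
    document,
    null,
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
    null
  );
  for (let i = 0; i < labels.snapshotLength; i++) {
    const label = labels.snapshotItem(i);
    // Walk up a few ancestors to reach the card container; the depth is a guess to tune
    let card = label;
    for (let up = 0; up < 6 && card.parentElement; up++) {
      card = card.parentElement;
    }
    ads.push({
      library_id: label.textContent.replace('Library ID:', '').trim(),
      text: card.innerText.slice(0, 500) // raw card text as a coarse fallback
    });
  }
  return ads;
}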
Step 5: Putting It All Together and Saving the Data
Now, let's call our functions from the main script and save the results to a JSON file using Node's built-in fs module.
// --- Inside the main async IIFE, after page.goto() ---
console.log(`Starting scrape, targeting ${ADS_TO_SCRAPE} ads...`);
const scrapedAds = await scrapeInfiniteScrollItems(page, ADS_TO_SCRAPE);
console.log(`Scraping complete. Found ${scrapedAds.length} unique ads.`);
// Save data to a JSON file
fs.writeFileSync(OUTPUT_FILENAME, JSON.stringify(scrapedAds, null, 2));
console.log(`Data saved to ${OUTPUT_FILENAME}`);
// --- The browser.close() call will run after this ---
To run the full script, save index.js and execute node index.js in your terminal.
Best Practices for Puppeteer Scraping
Use a Custom User-Agent
By default, the headless browser announces itself with a user agent containing "HeadlessChrome", which is trivial to flag. Set a realistic user agent to blend in with regular traffic:
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115 Safari/537.36');
Add Random Delays
Insert random pauses between actions to mimic human browsing patterns and avoid triggering rate limits. Note that page.waitForTimeout(), often suggested for this, was removed in recent Puppeteer versions; a Promise-wrapped setTimeout works in any version.
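A minimal helper sketch (the helper name is ours):

const randomDelay = (min, max) =>
  new Promise((resolve) => setTimeout(resolve, min + Math.random() * (max - min)));

// Usage: pause somewhere between 1 and 3 seconds
await randomDelay(1000, 3000);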
Use Proxies
For any serious scraping, rotate your IP address using a proxy service to prevent getting blocked.
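Puppeteer passes a proxy to the browser through Chromium's --proxy-server launch flag; if the proxy requires credentials, supply them per page with page.authenticate(). A sketch with a placeholder host and credentials:

const browser = await puppeteer.launch({
  // Placeholder endpoint; substitute your provider's host and port
  args: ['--proxy-server=http://proxy.example.com:8000']
});
const page = await browser.newPage();
// Only needed for username/password-protected proxies
await page.authenticate({ username: 'PROXY_USER', password: 'PROXY_PASS' });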
Run Headless
For production, set headless: true. It's faster and uses fewer resources because it doesn't render a visible UI.
Puppeteer makes scraping dynamic websites accessible and powerful. By mastering these techniques, you can build reliable scrapers to gather valuable public data from the Meta Ad Library and gain a competitive edge.
Ready-made solution available
While building your own scraper is educational, maintaining it can be time-consuming. AdScraping handles all the complexity for you.