For developers in the Node.js ecosystem, there's a premier tool for automating browser tasks and scraping modern, JavaScript-heavy websites: Puppeteer. Developed by Google, Puppeteer provides a high-level API to control a headless Chrome or Chromium browser, making it the perfect choice for tackling a complex target like the Meta Ad Library.
Unlike basic HTTP clients, Puppeteer renders the full page, executes JavaScript, and interacts with elements just like a user would. This makes it incredibly effective for scraping dynamic content and handling "infinite scroll" pages without needing to reverse-engineer any private APIs. This guide will provide a complete walkthrough for building a powerful Facebook Ads scraper using Node.js and Puppeteer, no login required.
Prerequisites
- Node.js and npm installed on your system
- A basic understanding of JavaScript (including async/await) and the browser DOM
Step 1: Setting Up Your Node.js Project
First, create a new project folder and initialize it with npm. Open your terminal and run:
mkdir puppeteer-ad-scraper
cd puppeteer-ad-scraper
npm init -y
npm install puppeteer
This will create a package.json file and install Puppeteer. During installation, Puppeteer also downloads a compatible browser build (Chromium, or Chrome for Testing in recent versions) that is guaranteed to work with the API, so you're ready to go.
Step 2: The Puppeteer Scraping Script
Create a file named index.js. We'll build our scraper here. Let's start with the basic code to launch Puppeteer and open a new page.
const puppeteer = require('puppeteer');
const fs = require('fs');

// --- Configuration ---
const AD_LIBRARY_URL = 'https://www.facebook.com/ads/library/?active_status=all&ad_type=all&country=ALL&q=sneakers&search_type=keyword_unordered&media_type=all';
const OUTPUT_FILENAME = 'facebook_ads_data.json';
const ADS_TO_SCRAPE = 100; // Target number of ads to scrape

// Main function wrapped in an Immediately Invoked Function Expression (IIFE)
(async () => {
  console.log('Launching browser...');
  const browser = await puppeteer.launch({
    headless: false, // Set to true for production; false lets you watch the browser while debugging
    args: ['--no-sandbox', '--disable-setuid-sandbox']
  });
  const page = await browser.newPage();
  await page.setViewport({ width: 1280, height: 926 });

  console.log(`Navigating to: ${AD_LIBRARY_URL}`);
  await page.goto(AD_LIBRARY_URL, { waitUntil: 'networkidle2' });

  // --- All other logic will go here ---

  console.log('Closing browser...');
  await browser.close();
})();
Setting headless: false is extremely useful during development, as it lets you watch exactly what the browser is doing.
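One convenient pattern (our addition, not part of the script above) is to control the mode from the command line with an environment variable:

const browser = await puppeteer.launch({
  // Defaults to headless; run `HEADLESS=false node index.js` to watch the browser
  headless: process.env.HEADLESS !== 'false',
  args: ['--no-sandbox', '--disable-setuid-sandbox']
});

That way you never have to edit the script when switching between debugging and production runs.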
Step 3: The Smart Way to Handle Infinite Scroll
Puppeteer allows for a more robust way to handle infinite scroll than just using fixed delays. We can write a function that scrolls down and then waits for the page's height to actually increase, confirming that new content has loaded.
async function scrapeInfiniteScrollItems(page, itemTargetCount) {
  let items = [];
  try {
    let previousHeight;
    while (items.length < itemTargetCount) {
      // extractAdData (defined in Step 4) runs inside the browser and
      // returns the ads currently rendered on the page
      items = await page.evaluate(extractAdData);
      console.log(`Currently scraped ${items.length} unique ads.`);

      previousHeight = await page.evaluate('document.body.scrollHeight');
      await page.evaluate('window.scrollTo(0, document.body.scrollHeight)');

      // Wait for the page height to increase, indicating new content has loaded
      await page.waitForFunction(`document.body.scrollHeight > ${previousHeight}`, { timeout: 10000 });

      // Brief settle time; page.waitForTimeout() was removed in recent Puppeteer versions
      await new Promise((resolve) => setTimeout(resolve, 1000));
    }
  } catch (e) {
    console.log(`Scrolling stopped. Reason: ${e.message}`);
  }
  return items;
}
This function will be called from our main script. The page.waitForFunction call is the key: it pauses the script until the condition inside is true, with a timeout to prevent it from waiting forever.
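Note that waitForFunction also accepts a real function plus arguments after the options object, which avoids building the condition as a string. The call above could equivalently be written as:

await page.waitForFunction(
  (prev) => document.body.scrollHeight > prev, // runs in the browser
  { timeout: 10000 },
  previousHeight // serialized and passed in from Node.js
);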
Step 4: Extracting Data within the Browser Context
With Puppeteer, data extraction happens inside the browser's own environment via page.evaluate(). We pass it a function to execute in the page context. That function must be self-contained: it is serialized and run inside the browser, so it cannot reference variables from your Node.js scope.
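Because of that restriction, any outside value has to be passed in explicitly as a serializable argument. A tiny illustration (the selector here is a hypothetical placeholder):

const count = await page.evaluate(
  (selector) => document.querySelectorAll(selector).length, // runs in the page
  '.hypothetical-ad-card' // sent from Node.js, arrives as `selector`
);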
Disclaimer: These selectors are examples and will change as Meta updates its website. You will need to inspect the page and update them periodically.
// This function runs in the browser's context
function extractAdData() {
  // This selector is an example and WILL change. Find the correct parent selector for ad cards.
  const adContainers = document.querySelectorAll('div.x1a2a7pz.x1a5g62h.x1fw500j.x1unuyjp.x1iorvi4');
  const ads = [];

  adContainers.forEach(ad => {
    const adData = {};

    // Extract ad text (example selector)
    const adTextElement = ad.querySelector('div._7j2a');
    adData.text = adTextElement ? adTextElement.innerText : 'N/A';

    // Extract image URL (example selector)
    const imgElement = ad.querySelector('img.xt7dq6l.xl1xv1r');
    adData.image_url = imgElement ? imgElement.src : 'N/A';

    // Extract landing page URL (example selector)
    const linkElement = ad.querySelector('a[rel="noopener nofollow"]');
    adData.landing_page = linkElement ? linkElement.href : 'N/A';

    // Add to list if it contains some data
    if (adData.text !== 'N/A') {
      ads.push(adData);
    }
  });

  // Remove duplicates based on ad text before returning
  const uniqueAds = Array.from(new Map(ads.map(ad => [ad.text, ad])).values());
  return uniqueAds;
}
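Because those class names are machine-generated and rotate often, a more defensive variation is to anchor on text the Ad Library reliably renders on every card, such as the "Library ID" label, and walk up the DOM to the card container. This is only a sketch under that assumption; the label text, element type, and walk-up depth all need verifying in DevTools:

// Alternative sketch: anchor on visible "Library ID" text instead of generated class names.
// Assumes each ad card renders a span containing "Library ID" -- verify before relying on it.
function extractAdDataByText() {
  const ads = [];
  const labels = document.evaluate(
    "//span[contains(text(), 'Library ID')]", // XPath matches on visible text
    document,
    null,
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
    null
  );
  for (let i = 0; i < labels.snapshotLength; i++) {
    const label = labels.snapshotItem(i);
    // Walk up a few ancestors to reach the card container; the depth is a guess to tune
    let card = label;
    for (let up = 0; up < 6 && card.parentElement; up++) {
      card = card.parentElement;
    }
    ads.push({
      library_id: label.textContent.replace('Library ID:', '').trim(),
      text: card.innerText.slice(0, 500) // raw card text as a coarse fallback
    });
  }
  return ads;
}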
Step 5: Putting It All Together and Saving the Data
Now, let's call our functions from the main script and save the results to a JSON file using Node's built-in fs module.
// --- Inside the main async IIFE, after page.goto() ---
console.log(`Starting scrape, targeting ${ADS_TO_SCRAPE} ads...`);
const scrapedAds = await scrapeInfiniteScrollItems(page, ADS_TO_SCRAPE);
console.log(`Scraping complete. Found ${scrapedAds.length} unique ads.`);
// Save data to a JSON file
fs.writeFileSync(OUTPUT_FILENAME, JSON.stringify(scrapedAds, null, 2));
console.log(`Data saved to ${OUTPUT_FILENAME}`);
// --- The browser.close() call will run after this ---
To run the full script, save index.js and execute node index.js in your terminal.
Best Practices for Puppeteer Scraping
Use a Custom User-Agent
By default, the headless browser announces itself with a user agent containing "HeadlessChrome", which is trivial to flag. Set a realistic user agent to blend in with regular traffic:
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115 Safari/537.36');
Add Random Delays
Insert random pauses between actions to mimic human browsing patterns and avoid triggering rate limits. Note that page.waitForTimeout(), often suggested for this, was removed in recent Puppeteer versions; a Promise-wrapped setTimeout works in any version.
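A minimal helper sketch (the helper name is ours):

const randomDelay = (min, max) =>
  new Promise((resolve) => setTimeout(resolve, min + Math.random() * (max - min)));

// Usage: pause somewhere between 1 and 3 seconds
await randomDelay(1000, 3000);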
Use Proxies
For any serious scraping, rotate your IP address using a proxy service to prevent getting blocked.
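Puppeteer passes a proxy to the browser through Chromium's --proxy-server launch flag; if the proxy requires credentials, supply them per page with page.authenticate(). A sketch with a placeholder host and credentials:

const browser = await puppeteer.launch({
  // Placeholder endpoint; substitute your provider's host and port
  args: ['--proxy-server=http://proxy.example.com:8000']
});
const page = await browser.newPage();
// Only needed for username/password-protected proxies
await page.authenticate({ username: 'PROXY_USER', password: 'PROXY_PASS' });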
Run Headless
For production, set headless: true. It's faster and uses fewer resources because it doesn't render a visible UI.
Puppeteer makes scraping dynamic websites accessible and powerful. By mastering these techniques, you can build reliable scrapers to gather valuable public data from the Meta Ad Library and gain a competitive edge.
Ready-made solution available
While building your own scraper is educational, maintaining it can be time-consuming. AdScraping handles all the complexity for you.