Hello World Master

Tutorials, articles and quizzes on Software Development

Get a DOM element using Puppeteer

Puppeteer lets us automate a web browser, which includes using JavaScript to get DOM elements on the page. In an ordinary web browser, we would open the developer tools and use the console to write JavaScript that gets elements.

In Puppeteer, we can do the same from our script. There are two ways to do this: page.$ and page.$eval.

Get DOM elements using page.$

To get an element from a web page loaded by Puppeteer, we can call page.$, which runs document.querySelector in the browser.

First, let's create the basic scaffolding for our Puppeteer application, which just opens a web page:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');

  await browser.close();
})();

Now, after our call to page.goto and before our call to browser.close, we want to add a line of code that gets the first p tag on the page:

const getPTag = await page.$('p');

This runs the following in the browser:

document.querySelector('p');

Note that page.$ is an async function, so since we're using async/await, we have to put await before it.

Let's also add a console.log of getPTag to see what we get back.

ElementHandle {
  _disposed: false,
  _context: ExecutionContext {
    _client: CDPSession {
      eventsMap: [Map],
      emitter: [Object],
      _callbacks: Map(0) {},
      _connection: [Connection],
      _targetType: 'page',
      _sessionId: 'F70B3B16423F4F2AD90513F2EBA7F79A'
    },
    _world: DOMWorld {
      _documentPromise: [Promise],
      _contextPromise: [Promise],
      _contextResolveCallback: null,
      _detached: false,
      _waitTasks: Set(0) {},
      _boundFunctions: Map(0) {},
      _ctxBindings: Set(0) {},
      _settingUpBinding: null,
      _frameManager: [FrameManager],
      _frame: [Frame],
      _timeoutSettings: [TimeoutSettings]
    },
    _contextId: 3,
    _contextName: ''
  },

This doesn't contain anything directly useful to us, so we need to call the getProperty function and pass in the string 'innerHTML':

const getInnerHTMLProperty = await getPTag.getProperty('innerHTML');

And let's console.log getInnerHTMLProperty.

JSHandle {
  _disposed: false,
  _context: ExecutionContext {
    _client: CDPSession {
      eventsMap: [Map],
      emitter: [Object],
      _callbacks: Map(0) {},
      _connection: [Connection],
      _targetType: 'page',
      _sessionId: '920E56028019813F1E7E01A0EF2343DE'
    },
    _world: DOMWorld {
      _documentPromise: [Promise],
      _contextPromise: [Promise],
      _contextResolveCallback: null,
      _detached: false,
      _waitTasks: Set(0) {},
      _boundFunctions: Map(0) {},
      _ctxBindings: Set(0) {},
      _settingUpBinding: null,
      _frameManager: [FrameManager],
      _frame: [Frame],
      _timeoutSettings: [TimeoutSettings]
    },
    _contextId: 3,
    _contextName: ''
  },
  _client: CDPSession {
    eventsMap: Map(29) {
      'Fetch.requestPaused' => [Array],
      'Fetch.authRequired' => [Array],
      'Network.requestWillBeSent' => [Array],

We have one last call to make: we need to get the JSON value of this handle.

const getPtagValue = await getInnerHTMLProperty.jsonValue();

Then, when we run node index.js one last time, we get the text inside the first p tag:

This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.
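The three calls above can be folded into one small helper. This is only a sketch: the name getInnerHTML is made up for illustration, and it assumes you pass in a page created by the scaffolding earlier. It also handles the case where page.$ resolves to null because nothing matched the selector.

```javascript
// Hypothetical helper (not part of Puppeteer): returns the innerHTML of the
// first element matching `selector`, or null when nothing matches.
async function getInnerHTML(page, selector) {
  // Step 1: get an ElementHandle; page.$ resolves to null if there is no match.
  const handle = await page.$(selector);
  if (handle === null) {
    return null;
  }
  try {
    // Step 2: get a JSHandle for the innerHTML property.
    const property = await handle.getProperty('innerHTML');
    // Step 3: turn the JSHandle into a plain value we can use in Node.
    return await property.jsonValue();
  } finally {
    // Release the handle once we no longer need it.
    await handle.dispose();
  }
}
```

To try it, call console.log(await getInnerHTML(page, 'p')) after page.goto and before browser.close.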

Getting DOM elements with page.$eval

We needed three awaited calls to get the result we wanted. This was fairly clean because we've been using async and await.

But we could also get the value we need with just one awaited call, using $eval. Let's create a new file for this implementation and add the basic scaffolding that opens a page.

Now, after page.goto, add the following line of code:

const getPTag = await page.$eval('p', (pTag) => pTag.innerHTML);

And let's add a console.log of getPTag after declaring it.

Now, when we run node evaluateFunction.js, we also get back the contents of the first p tag.
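One thing to keep in mind is that the $eval callback runs in the browser, not in Node, so it cannot see variables from our script. Any values it needs can be passed as extra arguments after the callback. Here is a rough sketch of that idea; the helper name getAttributeValue is hypothetical and it expects a page from the scaffolding above.

```javascript
// Hypothetical helper (not part of Puppeteer): reads one attribute from the
// first element matching `selector`.
async function getAttributeValue(page, selector, attributeName) {
  // `attributeName` lives in Node, so we hand it to the browser-side callback
  // as an extra argument instead of referencing it directly.
  return page.$eval(
    selector,
    (element, name) => element.getAttribute(name),
    attributeName
  );
}
```

For example, await getAttributeValue(page, 'a', 'href') would read the href of the first link on the page. Unlike page.$, page.$eval throws an error when no element matches the selector, so wrap the call in try/catch if the element might be missing.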

page.$ vs page.$eval

Generally, using page.$eval is recommended because:

  • You only need one call. That might not seem like much with one element, but the extra calls become cumbersome if you're working with many elements.
  • The callback passed to $eval runs in the browser, so you can paste the same code into the browser's console if you need to run it manually.

But page.$ has its own benefits: since it returns an ElementHandle from Puppeteer, we get the helper methods Puppeteer provides on that handle, which a plain element in the browser does not have.

For example, the ElementHandle's click function doesn't just get the element and call .click or dispatch a click event; it scrolls the element into view if needed and then clicks it.

The ElementHandle also has drag-and-drop functionality that drags an element and drops it over another element.
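As a small illustration of working with an ElementHandle, here is one way the click behaviour mentioned above might be used. Treat it as a sketch: clickFirstMatch is a made-up name, and the error message is just one possible way to report a missing element.

```javascript
// Hypothetical helper (not part of Puppeteer): clicks the first element
// matching `selector` through its ElementHandle.
async function clickFirstMatch(page, selector) {
  const handle = await page.$(selector);
  if (handle === null) {
    // page.$ resolves to null when nothing matches, so fail with a clear message.
    throw new Error(`No element matches selector: ${selector}`);
  }
  // ElementHandle.click scrolls the element into view if needed, then clicks it.
  await handle.click();
}
```

With the scaffolding from earlier, await clickFirstMatch(page, 'a') would click the first link on the page.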

Get all elements with $$ and $$eval

If you want to get all matching elements (as document.querySelectorAll does), you can use page.$$ instead of page.$, and page.$$eval instead of page.$eval.
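A minimal sketch of both calls, assuming a page from the earlier scaffolding. The helper names getAllInnerHTML and countMatches are hypothetical. With $$eval the callback receives an array of the matching elements, and page.$$ resolves to an array of ElementHandles (empty when nothing matches).

```javascript
// Hypothetical helper: returns the innerHTML of every element matching `selector`.
async function getAllInnerHTML(page, selector) {
  // The callback runs in the browser and receives an array of matching elements.
  return page.$$eval(selector, (elements) =>
    elements.map((element) => element.innerHTML)
  );
}

// Hypothetical helper: counts the elements matching `selector` using page.$$.
async function countMatches(page, selector) {
  // page.$$ resolves to an array of ElementHandles, one per matching element.
  const handles = await page.$$(selector);
  return handles.length;
}
```

For example, console.log(await getAllInnerHTML(page, 'p')) would print an array with the contents of every p tag on the page.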
