Get href attribute in Puppeteer (Node.js)

I know the common methods, such as evaluate, for capturing elements in Puppeteer, but I am curious why I cannot get the href attribute with a plain DOM-style approach such as:

const page = await browser.newPage();

await page.goto('https://www.example.com');

let links = await page.$$('a');
for (let i = 0; i < links.length; i++) {
  console.log(links[i].getAttribute('href'));
  console.log(links[i].href);
}

await page.$$('a') returns an array of ElementHandles. These are objects with their own Puppeteer-specific API; they do not expose the usual DOM API for HTML elements or DOM nodes. So you need to either retrieve attributes/properties in the browser context via page.evaluate() or use the rather complicated ElementHandle API. Here is an example of both ways:

'use strict';

const puppeteer = require('puppeteer');

(async function main() {
  try {
    const browser = await puppeteer.launch();
    const [page] = await browser.pages();

    await page.goto('https://example.org/');

    // way 1
    const hrefs1 = await page.evaluate(
      () => Array.from(
        document.querySelectorAll('a[href]'),
        a => a.getAttribute('href')
      )
    );

    // way 2
    const elementHandles = await page.$$('a');
    const propertyJsHandles = await Promise.all(
      elementHandles.map(handle => handle.getProperty('href'))
    );
    const hrefs2 = await Promise.all(
      propertyJsHandles.map(handle => handle.jsonValue())
    );

    console.log(hrefs1, hrefs2);

    await browser.close();
  } catch (err) {
    console.error(err);
  }
})();
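
As a minimal sketch of how the loop from the question could be adapted: an ElementHandle can be passed as an argument to page.evaluate(), where it is resolved to the real DOM element inside the browser. The helper name collectHrefs is hypothetical, and the function assumes it receives an already opened Puppeteer page.

```javascript
// Hypothetical helper: reads both the href attribute and the href property
// from every <a> element, one ElementHandle at a time.
async function collectHrefs(page) {
  const links = await page.$$('a');
  const hrefs = [];
  for (const link of links) {
    // Inside the callback, `a` is the actual DOM element, so the DOM API works.
    const attribute = await page.evaluate(a => a.getAttribute('href'), link);
    const property = await page.evaluate(a => a.href, link);
    hrefs.push({ attribute, property });
  }
  return hrefs;
}
```

This keeps the shape of the original loop, but every read is a separate round trip to the browser, so for many links the single page.evaluate() call in "way 1" is likely cheaper.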

I don't know why it's such a pain, but this is what I found when I ran into this a while ago.

async function getHrefs(page, selector) {
  return await page.$$eval(selector, anchors => [].map.call(anchors, a => a.href));
}

You can get relative paths with a.getAttribute('href'), whereas a.href returns the resolved absolute URL. Doing the work in a single call also reduces async interaction between Node.js and Chromium, which can be faster than many awaits in a loop.
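
To illustrate the difference between the attribute and the property, here is a small sketch that collects both in one page.$$eval() call, so only a single round trip to the browser is needed. The helper name getRawAndResolvedHrefs and the field names raw and resolved are hypothetical.

```javascript
// Hypothetical helper: returns the href as written in the markup (raw)
// and as resolved by the browser (resolved) for every matching element.
async function getRawAndResolvedHrefs(page, selector) {
  return page.$$eval(selector, anchors =>
    anchors.map(a => ({
      raw: a.getAttribute('href'), // may be relative, e.g. "/about"
      resolved: a.href,            // absolute URL computed by the browser
    }))
  );
}
```

A call such as await getRawAndResolvedHrefs(page, 'a[href]') would then return one object per link.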

A type-safe way for TypeScript users to return the hrefs of the links as an array of strings, by casting each element to HTMLAnchorElement (the interface for a tags; HTMLLinkElement is the interface for link tags):

await page.$$eval('a', (anchors) => anchors.map((link) => (link as HTMLAnchorElement).href));

What about getting the href attribute of a span tag? elementHandle.getProperty('href') works for an a tag but not for a span, because a span can carry an href attribute but has no href DOM property. Read the attribute in the browser context instead, for example: const hrefAttribute = await page.evaluate(() => document.querySelector('span').getAttribute('href'));
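
For elements other than a, such as a span carrying an href attribute, a generic attribute reader may be more convenient. The following is a sketch under the assumption that the selector matches at least one element; the helper name getAttributeOf is hypothetical. It relies on page.$eval() forwarding extra arguments to the page function.

```javascript
// Hypothetical helper: reads any attribute from the first element
// matching `selector`. Returns null if the attribute is absent.
async function getAttributeOf(page, selector, name) {
  // `name` is passed through as the second argument of the page function.
  return page.$eval(selector, (el, attr) => el.getAttribute(attr), name);
}
```

For example, await getAttributeOf(page, 'span.mySpan', 'href') would read the attribute from the span mentioned above.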

Comments
  • Thanks for a clear explanation. Using page.eval() works like a charm.