Puppeteer with Crawlera Proxy


I am not able to make a Puppeteer request through a proxy that requires authentication.

I have tried putting the credentials in the proxy URL: --proxy-server=u:p@proxy.crawlera.com:8010

I have also tried Puppeteer's page.authenticate(u, p).

Either way, I am still getting ERR_NO_SUPPORTED_PROXIES.
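
For reference, the two attempts described above look roughly like this (a minimal sketch with u and p as placeholders for the Crawlera credentials; this is not code from the original post):

const puppeteer = require('puppeteer');

(async () => {
  // Attempt 1: credentials embedded directly in the proxy URL
  const browserA = await puppeteer.launch({
    args: ['--proxy-server=u:p@proxy.crawlera.com:8010']
  });

  // Attempt 2: plain proxy address, credentials passed via page.authenticate()
  const browserB = await puppeteer.launch({
    args: ['--proxy-server=proxy.crawlera.com:8010']
  });
  const pageB = await browserB.newPage();
  await pageB.authenticate({ username: 'u', password: 'p' });

  await browserA.close();
  await browserB.close();
})();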

my code:

require('dotenv').config();
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    ignoreHTTPSErrors: true,
    args: ['--proxy-server=proxy.crawlera.com:8010']
  });

  const page = await browser.newPage();
  // Send the Crawlera credentials (API key as username, empty password)
  // via the Proxy-Authorization header.
  await page.setExtraHTTPHeaders({
    'Proxy-Authorization':
      'Basic ' +
      Buffer.from(`${process.env.CRAWLERA_APIKEY}:`).toString('base64')
  });

  page.on('console', msg => console.log('PAGE LOG:', msg.text()));
  const path = `https://www.andersonassociates.net/`;
  await page.setViewport({ width: 1680, height: 895 });

  try {
    console.log('before-goto', path);
    const start = Date.now();
    const resp = await page.goto(path, {
      timeout: 0,
      waitUntil: 'domcontentloaded'
    });

    console.log('after-goto', path);
    const end = Date.now();
    console.log('start-end-diff', (end - start) / 1000);

    // ok() and status() are methods on Puppeteer's Response object.
    if (!resp.ok()) {
      await browser.close();
      return { status: resp.status(), error: `ASIN NOT OK. ${resp.status()}` };
    }
    console.log('goto', path);
  } catch (error) {
    console.log('page.goto ERROR', error.stack.split('\n'));
    await browser.close();
    return { error: error.toString(), stack: error.stack.split('\n') };
  }

  try {
    await page.screenshot({ path: `tmp/anderson.png`, fullPage: true });
    console.log('screenshot');
    await browser.close();
  } catch (e) {
    await browser.close();
    console.log('screenshot error', e.stack.split('\n'));
  }
})();

Update!

I used Crawlera as my proxy service too, and I had already tried the proxy-chain and page.authenticate approaches with no luck. I think the problem is that Crawlera uses an empty password, and I solved it by using page.setExtraHTTPHeaders:

const browser = await puppeteer.launch({
    ignoreHTTPSErrors: true, // allow HTTPS URLs through the proxy
    args: ['--proxy-server=proxy.crawlera.com:8010']
});
const page = await browser.newPage();
await page.setExtraHTTPHeaders({
    // API key as username, empty password, base64-encoded as "APIKEY:"
    'Proxy-Authorization': 'Basic ' + Buffer.from('<APIKEY>:').toString('base64'),
});

Hope this helps.

How to use a proxy in Puppeteer: Puppeteer is a high-level API for headless Chrome and one of the most popular tools for web automation and web scraping in Node.js. Note: it is recommended to use Puppeteer 1.17 with Chromium 76.0.3803.0; for newer versions of Puppeteer, the latest Chromium snapshot that can be used is r669921. The Crawlera setup boils down to three steps, sketched below: set ignoreHTTPSErrors to true in the puppeteer.launch method; specify Crawlera's host and port in the --proxy-server flag; and send the Crawlera credentials in the Proxy-Authorization header.
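
A minimal end-to-end sketch of those three steps (assuming the API key lives in the CRAWLERA_APIKEY environment variable, as in the question's code, and that you are on a Puppeteer/Chromium combination that still allows the Proxy-Authorization header, per the note above):

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    ignoreHTTPSErrors: true,                          // step 1
    args: ['--proxy-server=proxy.crawlera.com:8010']  // step 2
  });

  const page = await browser.newPage();
  await page.setExtraHTTPHeaders({                    // step 3
    'Proxy-Authorization':
      'Basic ' + Buffer.from(`${process.env.CRAWLERA_APIKEY}:`).toString('base64')
  });

  const resp = await page.goto('https://www.andersonassociates.net/', {
    waitUntil: 'domcontentloaded'
  });
  console.log('status', resp.status());

  await browser.close();
})();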

You can use the proxy-chain npm package for it.

Example:

const puppeteer = require('puppeteer');
const proxyChain = require('proxy-chain');

(async() => {
    const oldProxyUrl = 'http://u:p@proxy.crawlera.com:8010';
    const newProxyUrl = await proxyChain.anonymizeProxy(oldProxyUrl);

    // Prints something like "http://127.0.0.1:45678"
    console.log(newProxyUrl);

    const browser = await puppeteer.launch({
        args: [`--proxy-server=${newProxyUrl}`],
    });

})();
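
The example above only launches the browser. As a sketch (not part of the original answer), the same async function could continue by opening a page through the anonymized proxy and then cleaning it up with proxy-chain's closeAnonymizedProxy:

    const page = await browser.newPage();
    const resp = await page.goto('https://www.andersonassociates.net/', {
        waitUntil: 'domcontentloaded'
    });
    console.log('status', resp.status());

    await browser.close();
    // Shut down the local proxy started by anonymizeProxy()
    await proxyChain.closeAnonymizedProxy(newProxyUrl, true);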

You can read more about it in the blog post.


Could you provide the code samples of what you're doing?

Bear in mind that page.authenticate requires an object to be passed to it, and the credentials have to be set up before doing anything else.

You could try something like this:

await page.authenticate({username, password});
await page.goto(myURL, {waitUntil: 'networkidle0'});
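
Fleshing that out into a complete minimal example (a sketch of the generic pattern; with Crawlera the username would be the API key and the password empty, and the asker reported that this approach still failed for them in that configuration):

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    args: ['--proxy-server=proxy.crawlera.com:8010']
  });
  const page = await browser.newPage();

  // Credentials must be set before the first navigation.
  await page.authenticate({ username: '<APIKEY>', password: '' });

  await page.goto('https://www.andersonassociates.net/', { waitUntil: 'networkidle0' });
  await browser.close();
})();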


Here is where I read about how to set up Crawlera as the proxy provider with Puppeteer.

The blog post states:

⚠️ Note: Puppeteer 1.17 and bundled Chromium 76.0.3803.0 are recommended. The latest Chromium snapshot that can be used with Puppeteer 1.18+ is r669921 (in later versions Proxy-Authorization header, required for sending Crawlera credentials, is blocked).

So in order to deploy the solution with Docker for Puppeteer, that meant I needed to download a specific version of Chromium, and Puppeteer doesn't guarantee it will work with it.

The solution was to use proxy-chain with its anonymizeProxy method, and to set ignoreHTTPSErrors: true when launching the browser.
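
Putting those two pieces together, a minimal sketch of that final setup (with the API key as the username and an empty password, as described earlier in the thread; not verbatim from the blog post):

const puppeteer = require('puppeteer');
const proxyChain = require('proxy-chain');

(async () => {
  // Wrap the authenticated Crawlera endpoint in a local, credential-free proxy.
  const newProxyUrl = await proxyChain.anonymizeProxy('http://<APIKEY>:@proxy.crawlera.com:8010');

  const browser = await puppeteer.launch({
    ignoreHTTPSErrors: true,
    args: [`--proxy-server=${newProxyUrl}`]
  });

  const page = await browser.newPage();
  await page.goto('https://www.andersonassociates.net/', { waitUntil: 'domcontentloaded' });
  await page.screenshot({ path: 'tmp/anderson.png', fullPage: true });

  await browser.close();
  await proxyChain.closeAnonymizedProxy(newProxyUrl, true);
})();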


scrapinghub/crawlera-headless-proxy is a complementary proxy that helps users of frameworks such as Selenium and Puppeteer use Crawlera without needing to build Squid chains or install Polipo. Crawlera uses the proxy authentication protocol described in RFC 7235, but such authentication is rather hard to configure in headless browsers. The most popular way of bypassing this problem used to be Polipo, which has unfortunately been unsupported for a long time. Crawlera also uses X-Headers for configuration.
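
If crawlera-headless-proxy is running locally and handling the Crawlera authentication itself, the Puppeteer side only needs to point at it. A sketch, assuming the headless proxy is already listening on localhost:3128 (the port is an assumption; check the project's README for how to start and configure it):

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    ignoreHTTPSErrors: true,
    // crawlera-headless-proxy terminates the Crawlera authentication,
    // so no credentials are needed on the browser side.
    args: ['--proxy-server=localhost:3128']
  });

  const page = await browser.newPage();
  await page.goto('https://www.andersonassociates.net/', { waitUntil: 'domcontentloaded' });
  await browser.close();
})();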

The Apify SDK's PuppeteerCrawler wraps the same Puppeteer workflow:

new Apify.PuppeteerCrawler({
    requestList,
    handlePageFunction: async ({ page, request }) => {
        // This function is called to extract data from a single web page.
        // 'page' is an instance of Puppeteer.Page with page.goto(request.url) already called.
        // 'request' is an instance of the Request class with information about the page to load.
        await Apify.pushData({ /* data extracted from the page */ });
    }
});

To make it work, you'll need an Apify account with access to the proxy. Visit the Apify platform introduction to find out how to log into your account from the SDK. To run this example on the Apify Platform, select the Node.js 12 + Chrome on Debian (apify/actor-node-chrome) base image on the Source tab when configuring the actor.