How can I catch and process the data from the XHR responses using casperjs?

The data on the webpage is displayed dynamically. Checking for every change in the HTML and extracting the data from it seems like a daunting task, and it would force me to rely on very unreliable XPaths. So I would like to extract the data from the XHR packets instead.

I hope to be able to extract information from XHR packets as well as generate XHR packets to send to the server. Extracting the information is more important to me, because sending can be handled easily by automatically triggering HTML elements using CasperJS.

I'm attaching a screenshot of what I mean.

The text in the response tab is the data I need to process afterwards. (This XHR response has been received from the server.)

This is not easily possible, because the resource.received event handler only provides metadata such as the URL, headers, or status, but not the actual data. The underlying PhantomJS event handler behaves the same way.


Stateless AJAX Request

If the AJAX call is stateless, you may repeat the request:

casper.on("resource.received", function(resource){
    // somehow identify this request, here: if it contains ".json"
    // only act when the stage is "end"; otherwise this would be executed twice
    if (resource.url.indexOf(".json") != -1 && resource.stage == "end") {
        var data = casper.evaluate(function(url){
            // synchronous GET request
            return __utils__.sendAJAX(url, "GET");
        }, resource.url);
        // do something with data, you might need to JSON.parse(data)
    }
});
casper.start(url); // your script
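
One thing to watch for with this approach: the repeated request may itself show up as a resource event, so the handler could end up re-fetching the same URL again and again. Below is a minimal sketch of a guard against that. The helper name makeRefetchGuard is hypothetical (it is not part of CasperJS), and the events are simulated objects carrying only the url and stage fields used above, so the sketch runs on its own:

```javascript
// Hypothetical helper, not part of CasperJS: decides whether a resource
// event should trigger a repeated request, and remembers URLs it has
// already approved so the repeated request is not repeated again.
function makeRefetchGuard(urlFragment) {
    var seen = {}; // URLs that were already approved once
    return function shouldRefetch(resource) {
        if (resource.stage !== "end") {
            return false; // ignore the "start" stage of the same response
        }
        if (resource.url.indexOf(urlFragment) === -1) {
            return false; // not the request we are interested in
        }
        if (Object.prototype.hasOwnProperty.call(seen, resource.url)) {
            return false; // already re-fetched, probably our own repeat
        }
        seen[resource.url] = true;
        return true;
    };
}

// Simulated events; in a real script these would come from "resource.received".
var shouldRefetch = makeRefetchGuard(".json");
var events = [
    { url: "http://example.com/data.json", stage: "start" },
    { url: "http://example.com/data.json", stage: "end" },
    { url: "http://example.com/data.json", stage: "end" }, // our own repeat
    { url: "http://example.com/style.css", stage: "end" }
];
var decisions = events.map(function (event) {
    return shouldRefetch(event);
});
console.log(decisions);
```

Inside the event handler, the URL and stage check would then become a single call such as if (shouldRefetch(resource)) { ... }. Whether the repeated request really triggers the event again depends on the setup, so treat the guard as a precaution rather than a requirement.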

You may want to add the event listener to resource.requested. That way you don't need to wait for the call to complete.

You can also do this right inside the control flow, like this (source: CasperJS waitForResource: how to get the resource i've waited for):

casper.start(url);

var res, resData;
casper.waitForResource(function check(resource){
    res = resource;
    return resource.url.indexOf(".json") != -1;
}, function then(){
    resData = casper.evaluate(function(url){
        // synchronous GET request
        return __utils__.sendAJAX(url, "GET");
    }, res.url);
    // do something with the data here or in a later step
});

casper.run();

Stateful AJAX Request

If it is not stateless, you would need to replace the implementation of XMLHttpRequest. You will need to inject your own implementation of the onreadystatechange handler, collect the information in the page window object and later collect it in another evaluate call.

You may want to look at the XHR faker in sinon.js or use the following complete proxy for XMLHttpRequest (I modeled it after method 3 from How can I create a XMLHttpRequest wrapper/proxy?):

function replaceXHR(){
    (function(window, debug){
        function args(a){
            var s = "";
            for(var i = 0; i < a.length; i++) {
                s += "\t\n[" + i + "] => " + a[i];
            }
            return s;
        }
        var _XMLHttpRequest = window.XMLHttpRequest;

        window.XMLHttpRequest = function() {
            this.xhr = new _XMLHttpRequest();
        }

        // proxy ALL methods/properties
        var methods = [ 
            "open", 
            "abort", 
            "setRequestHeader", 
            "send", 
            "addEventListener", 
            "removeEventListener", 
            "getResponseHeader", 
            "getAllResponseHeaders", 
            "dispatchEvent", 
            "overrideMimeType"
        ];
        methods.forEach(function(method){
            window.XMLHttpRequest.prototype[method] = function() {
                if (debug) console.log("ARGUMENTS", method, args(arguments));
                if (method == "open") {
                    this._url = arguments[1];
                }
                return this.xhr[method].apply(this.xhr, arguments);
            }
        });

        // proxy change event handler
        Object.defineProperty(window.XMLHttpRequest.prototype, "onreadystatechange", {
            get: function(){
                // this will probably never be called
                return this.xhr.onreadystatechange;
            },
            set: function(onreadystatechange){
                var that = this.xhr;
                var realThis = this;
                that.onreadystatechange = function(){
                    // request is fully loaded
                    if (that.readyState == 4) {
                        if (debug) console.log("RESPONSE RECEIVED:", typeof that.responseText == "string" ? that.responseText.length : "none");
                        // there is a response and filter execution based on url
                        if (that.responseText && realThis._url.indexOf("whatever") != -1) {
                            window.myAwesomeResponse = that.responseText;
                        }
                    }
                    onreadystatechange.call(that);
                };
            }
        });

        var otherscalars = [
            "onabort",
            "onerror",
            "onload",
            "onloadstart",
            "onloadend",
            "onprogress",
            "readyState",
            "responseText",
            "responseType",
            "responseXML",
            "status",
            "statusText",
            "upload",
            "withCredentials",
            "DONE",
            "UNSENT",
            "HEADERS_RECEIVED",
            "LOADING",
            "OPENED"
        ];
        otherscalars.forEach(function(scalar){
            Object.defineProperty(window.XMLHttpRequest.prototype, scalar, {
                get: function(){
                    return this.xhr[scalar];
                },
                set: function(obj){
                    this.xhr[scalar] = obj;
                }
            });
        });
    })(window, false);
}

If you want to capture the AJAX calls from the very beginning, you need to add this to one of the first event handlers:

casper.on("page.initialized", function(resource){
    this.evaluate(replaceXHR);
});

or evaluate(replaceXHR) when you need it.

The control flow would look like this:

function replaceXHR(){ /* from above*/ }

casper.start(yourUrl, function(){
    this.evaluate(replaceXHR);
});

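function getAwesomeResponse(){
    // use casper explicitly: this function is also called directly below,
    // where "this" is not bound to the casper instance
    return casper.evaluate(function(){
        return window.myAwesomeResponse;
    });
}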

// stops waiting if window.myAwesomeResponse is something that evaluates to true
casper.waitFor(getAwesomeResponse, function then(){
    var data = JSON.parse(getAwesomeResponse());
    // Do something with data
});

casper.run();

As described above, I create a proxy for XMLHttpRequest so that every time it is used on the page, I can do something with it. The page that you scrape uses the xhr.onreadystatechange callback to receive data. The proxying is done by defining a specific setter function which writes the received data to window.myAwesomeResponse in the page context. The only thing you need to do is retrieve this text.
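
To make the setter trick easier to see in isolation, here is a stripped-down sketch that runs outside a browser. FakeNativeXHR, its respond() method, ProxyXHR, and the captured array are all hypothetical stand-ins introduced for illustration; in the real page the native XMLHttpRequest and window.myAwesomeResponse take their places:

```javascript
// Stand-in for the browser's XMLHttpRequest so the sketch runs outside a
// page. FakeNativeXHR and its respond() method are hypothetical.
function FakeNativeXHR() {
    this.readyState = 0;
    this.responseText = "";
    this.onreadystatechange = null;
}
FakeNativeXHR.prototype.respond = function (text) {
    this.readyState = 4;
    this.responseText = text;
    if (this.onreadystatechange) {
        this.onreadystatechange();
    }
};

var captured = []; // plays the role of window.myAwesomeResponse

// Proxy object that owns a real (here: fake) XHR instance.
function ProxyXHR() {
    this.xhr = new FakeNativeXHR();
}

// Assigning proxy.onreadystatechange goes through this setter, which
// installs a wrapper that records the response before calling the
// page's own handler.
Object.defineProperty(ProxyXHR.prototype, "onreadystatechange", {
    get: function () {
        return this.xhr.onreadystatechange;
    },
    set: function (pageHandler) {
        var real = this.xhr;
        real.onreadystatechange = function () {
            if (real.readyState === 4 && real.responseText) {
                captured.push(real.responseText);
            }
            pageHandler.call(real);
        };
    }
});

// "Page" code: it only ever sees the proxy and sets a handler as usual.
var pageSaw = null;
var request = new ProxyXHR();
request.onreadystatechange = function () {
    pageSaw = this.responseText;
};
request.xhr.respond('{"answer": 42}');

console.log(captured[0]);
console.log(pageSaw);
```

The page's handler still receives the response as before, while a copy ends up in captured. This only covers pages that assign onreadystatechange; pages that use addEventListener or onload would need the corresponding hooks.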


JSONP Request

Writing a proxy for JSONP is even easier if you know the prefix, i.e. the function that is called with the loaded JSON, e.g. insert({"data":["Some", "JSON", "here"],"id":"asdasda"}). You can overwrite insert in the page context

  1. after the page is loaded

    casper.start(url).then(function(){
        this.evaluate(function(){
            var oldInsert = insert;
            insert = function(json){
                window.myAwesomeResponse = json;
                oldInsert.apply(window, arguments);
            };
        });
    }).waitFor(getAwesomeResponse, function then(){
        // the JSONP callback receives an object, not a string, so no JSON.parse here
        var data = getAwesomeResponse();
        // Do something with data
    }).run();
    
  2. or before the request is received (if the function is registered just before the request is invoked)

    casper.on("resource.requested", function(resource){
        // filter on the correct call
        if (resource.url.indexOf(".jsonp") != -1) {
            this.evaluate(function(){
                var oldInsert = insert;
                insert = function(json){
                    window.myAwesomeResponse = json;
                    oldInsert.apply(window, arguments);
                };
            });
        }
    });
    
    casper.start(url).waitFor(getAwesomeResponse, function then(){
        // the JSONP callback receives an object, not a string, so no JSON.parse here
        var data = getAwesomeResponse();
        // Do something with data
    }).run();
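
The wrapping step used in both variants can be sketched on its own, outside a browser. The page object stands in for window, and the helper name captureCallback is hypothetical; only the callback name insert and the sample payload come from the example above:

```javascript
// Stand-in for the page's global object so the sketch runs outside a
// browser; in the page this would be window.
var page = {};
var rendered = [];

// The page's original JSONP callback (name taken from the example above).
page.insert = function (json) {
    rendered.push(json.data.join(" "));
};

// Hypothetical helper: wrap a callback on `target` so that its first
// argument is stored under `storeKey` before the original runs.
function captureCallback(target, name, storeKey) {
    var original = target[name];
    target[name] = function (json) {
        target[storeKey] = json;
        return original.apply(target, arguments);
    };
}

captureCallback(page, "insert", "myAwesomeResponse");

// Simulates the JSONP script executing: insert({...})
page.insert({ data: ["Some", "JSON", "here"], id: "asdasda" });

console.log(page.myAwesomeResponse.id);
console.log(rendered[0]);
```

Note that the stored value is the object passed to the callback, not a JSON string, so it can be used directly once it has been read back from the page.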
    

I may be late to the party, but this answer may help someone like me who runs into this problem in the future.

I started with PhantomJS, then moved to CasperJS, and finally settled on SlimerJS. Slimer has an API modeled on Phantom's (it runs on Firefox's Gecko engine rather than WebKit), is compatible with Casper, and can send you back the response body through the same onResourceReceived callback, in the response.body property.

Reference: https://docs.slimerjs.org/current/api/webpage.html#webpage-onresourcereceived
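
A rough sketch of what such a handler could look like. The factory name makeBodyCollector is hypothetical, and the body and stage fields as well as the captureContent setting mentioned in the comments follow the SlimerJS documentation linked above; treat them as assumptions to verify against your version. The handler is exercised with simulated response objects so that it runs without SlimerJS:

```javascript
// Hypothetical factory: returns a handler suitable for onResourceReceived
// that stores the body of every finished response whose URL matches.
function makeBodyCollector(urlFragment, store) {
    return function onResourceReceived(response) {
        if (response.stage !== "end") {
            return; // wait until the response is complete
        }
        if (response.url.indexOf(urlFragment) === -1) {
            return; // not a response we care about
        }
        if (typeof response.body !== "string" || response.body === "") {
            return; // body was not captured for this resource
        }
        store.push({ url: response.url, body: response.body });
    };
}

var bodies = [];
var handler = makeBodyCollector("/api/", bodies);

// In SlimerJS the wiring would look roughly like this (assumption):
//   page.captureContent = [/json/];
//   page.onResourceReceived = handler;

// Simulated responses instead of a real page load.
handler({ url: "http://example.com/api/items", stage: "start", body: "" });
handler({ url: "http://example.com/api/items", stage: "end", body: '{"items":[1,2]}' });
handler({ url: "http://example.com/logo.png", stage: "end", body: "" });

var items = JSON.parse(bodies[0].body).items;
console.log(items);
```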

@Artjom's answer doesn't work for me in recent Chrome and CasperJS versions.

Based on @Artjom's answer and on gilly3's answer about replacing XMLHttpRequest, I have composed a new solution that should work in most or all versions of the different browsers. It works for me.

SlimerJS does not work with newer versions of Firefox, so it is no good for me.

Here is the generic code to add a listener for the load of an XHR (not dependent on CasperJS):

var addXHRListener = function (XHROnStateChange) {

    var XHROnLoad = function () {
        if (this.readyState == 4) {
            XHROnStateChange(this)
        }
    }

    var open_original = XMLHttpRequest.prototype.open;

    XMLHttpRequest.prototype.open = function (method, url, async, unk1, unk2) {
        this.requestUrl = url
        open_original.apply(this, arguments);
    };

    var xhrSend = XMLHttpRequest.prototype.send;
    XMLHttpRequest.prototype.send = function () {

        var xhr = this;
        if (xhr.addEventListener) {
            xhr.removeEventListener("readystatechange", XHROnLoad);
            xhr.addEventListener("readystatechange", XHROnLoad, false);
        } else {
            function readyStateChange() {
                if (handler) {
                    if (handler.handleEvent) {
                        handler.handleEvent.apply(xhr, arguments);
                    } else {
                        handler.apply(xhr, arguments);
                    }
                }
                XHROnLoad.apply(xhr, arguments);
                setReadyStateChange();
            }

            function setReadyStateChange() {
                setTimeout(function () {
                    if (xhr.onreadystatechange != readyStateChange) {
                        handler = xhr.onreadystatechange;
                        xhr.onreadystatechange = readyStateChange;
                    }
                }, 1);
            }

            var handler;
            setReadyStateChange();
        }
        xhrSend.apply(xhr, arguments);
    };

}

Here is CasperJS code to emit a custom event on load of XHR:

casper.on("page.initialized", function (resource) {
    var emitXHRLoad = function (xhr) {
        window.callPhantom({eventName: 'xhr.load', eventData: xhr})
    }
    this.evaluate(addXHRListener, emitXHRLoad);
});

casper.on('remote.callback', function (data) {
    casper.emit(data.eventName, data.eventData)
});

Here is code to listen to the "xhr.load" event and get the XHR response body:

casper.on('xhr.load', function (xhr) {
    console.log('xhr load', xhr.requestUrl)
    console.log('xhr load', xhr.responseText)
});
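
In practice the listener will see every XHR on the page, so some filtering and defensive parsing is usually needed. A minimal sketch, assuming the event data carries the requestUrl and responseText fields used above; the helper name parseXhrPayload and the sample URLs are hypothetical:

```javascript
// Hypothetical helper: returns the parsed JSON body for matching
// requests, or null when the URL does not match or the body is not JSON.
function parseXhrPayload(xhr, urlFragment) {
    if (!xhr || typeof xhr.requestUrl !== "string") {
        return null;
    }
    if (xhr.requestUrl.indexOf(urlFragment) === -1) {
        return null;
    }
    try {
        return JSON.parse(xhr.responseText);
    } catch (e) {
        return null; // not JSON (HTML fragment, empty body, ...)
    }
}

// Simulated event payloads instead of real "xhr.load" events.
var hit = parseXhrPayload(
    { requestUrl: "/api/users?id=1", responseText: '{"name":"Ann"}' }, "/api/");
var miss = parseXhrPayload(
    { requestUrl: "/static/app.js", responseText: "var a = 1;" }, "/api/");
var broken = parseXhrPayload(
    { requestUrl: "/api/ping", responseText: "<html>" }, "/api/");

console.log(hit, miss, broken);
```

Inside the 'xhr.load' listener this would be called as parseXhrPayload(xhr, "/api/"), and the result checked for null before use.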

Additionally, you can download the content directly and manipulate it later. Here is an example of the script I am using to retrieve a JSON file and save it locally:

var casper = require('casper').create({
    pageSettings: {
        webSecurityEnabled: false
    }
});

var url = 'https://twitter.com/users/username_available?username=whatever';

casper.start('about:blank', function() {
   this.download(url, "hop.json");
});

casper.run(function() {
    this.echo('Done.').exit();
});

Comments
  • @ArtjomB. For now I just extracted the information from the html elements and deployed the script. Getting the value of the responses would be more elegant though. I will look at using this in a few days time. I'm really sorry, I couldn't incorporate this yet.
  • If I want to fetch an image src from a website that uses AJAX to load it, how should I do that? I tried "scrollTo" and "viewport", but it failed in the end.
  • @qianjiahao You should ask a new question and describe your problem properly. I have no idea what you're talking about.
  • Doesn't work for me in late 2019. I composed another answer based on this answer. stackoverflow.com/a/58168312/1265306
  • But unfortunately, SlimerJS has a dependency on Firefox being installed.
  • @nchaud - it's a plus not a minus the way I see it. FF is the strongest browser, with the best support for javascript/DOM
  • @nchaud FF or any other XUL Runner, I suppose.
  • But it needs to be installed, which is the problem if you're developing something that needs to be deployed on, say, client sites. (Understandably this is most heavily used for in-house testing and in that case it's easy enough to install FF/XULRunner on the CI server).
  • @nchaud I actually cannot imagine a situation where you can install SlimerJS but not FF or XULRunner. I think you'd need an installed version of the SlimerJS binary somewhere, so why can't you have FF installed there as well?
  • This worked out for me today.