VBA HTML Listing Info Pull

vba list all html elements
vba html object library
vba getelementsbytagname div class
automatically pull data from website into excel vba
vba parse html file
vba read html file
getelementsbyclassname vba
extract data from website to excel vba

I am looking to follow a series of URL's that are found in column A (example: https://www.ebay.com/itm/Apple-iPhone-7-GSM-Unlocked-Verizon-AT-T-TMobile-Sprint-32GB-128GB-256GB/352381131997?epid=225303158&hash=item520b8d5cdd:m:mWgYDe4a79NeLuAlV-RmAQA:rk:7:pf:0) and pull the following information from them: - Title - Price - Description

I think there are multiple issues with my code... For one, I can't get the program to follow specific URL's listed in the Excel (only if I specify one within the code). Also, pulling multiple fields has given me issues.

Option Explicit
Public Sub ListingInfo()
Dim ie As New InternetExplorer, ws As Worksheet, t As Date
Dim i As Integer
i = 0

Do While Worksheets("Sheet1").Cells(i, 1).Value <> ""
Const MAX_WAIT_SEC As Long = 10
Set ws = ThisWorkbook.Worksheets("Sheet1")
With ie
    .Visible = True
    .Navigate2 Worksheets("Sheet1").Cells(i, 1).Value

    While .Busy Or .readyState < 4: DoEvents: Wend

    Dim Links As Object, i As Long, count As Long
    t = Timer
    Do
        On Error Resume Next
        Set Title = .document.querySelectorAll("it-ttl")
        Set price = .document.querySelectorAll("notranslate")
        Set Description = .document.querySelectorAll("ds_div")
        count = Links.Length
        On Error GoTo 0
        If Timer - t > MAX_WAIT_SEC Then Exit Do
    Loop While count = 0
    For i = 0 To Title.Length - 1
        ws.Cells(i + 1, 1) = Title.item(i)
        ws.Cells(i + 1, 2) = price.item(i)
        ws.Cells(i + 1, 3) = Description.item(i)
    Next
    .Quit
i = i + 1
Loop
End With
End Sub

I would use late binding for MSXML2.XMLHTTP and set a reference to the Microsoft HTML Object Library for the HTMLDocument.

Note: querySelector() references the first item it finds that matches its search string.

Here is the short version:

Public Sub ListingInfo()
    Dim cell As Range
    With ThisWorkbook.Worksheets("Sheet1")
        For Each cell In .Range("A1", .Cells(.Rows.Count, 1).End(xlUp))
            Dim Document As MSHTML.HTMLDocument
            With CreateObject("MSXML2.XMLHTTP")
                .Open "GET", cell.Value, False
                .send
                Set Document = New MSHTML.HTMLDocument
                Document.body.innerHTML = .responseText
            End With
            cell.Offset(0, 1).Value = Document.getElementByID("itemTitle").innerText
            cell.Offset(0, 2).Value = Document.getElementByID("prcIsum").innerText

            If Not Document.querySelector(".viSNotesCnt") Is Nothing Then
                cell.Offset(0, 3).Value = Document.querySelector(".viSNotesCnt").innerText
            Else
                'Try Something Else
            End If
        Next
    End With
End Sub

A more elaborate solution would be to break the code up into smaller routines and load the data into an Array. The main advantage of this is that you can test each subroutine separately.

Option Explicit
Public Type tListingInfo
    Description As String
    Price As Currency
    Title As String
End Type

Public Sub ListingInfo()
    Dim source As Range
    Dim data As Variant
    With ThisWorkbook.Worksheets("Sheet1")
        Set source = .Range("A1:D1", .Cells(.Rows.count, 1).End(xlUp))
        data = source.Value
    End With
    Dim r As Long
    Dim record As tListingInfo
    Dim url As String

    For r = 1 To UBound(data)
        record = getListingInfo()
        url = data(r, 1)
        record = getListingInfo(url)
        With record
            data(r, 2) = .Description
            data(r, 3) = .Price
            data(r, 4) = .Title
        End With
    Next
    source.Value = data
End Sub

Public Function getListingInfo(url As String) As tListingInfo
    Dim ListingInfo As tListingInfo
    Dim Document As MSHTML.HTMLDocument
    Set Document = getHTMLDocument(url)

    With ListingInfo
        .Description = Document.getElementByID("itemTitle").innerText
        .Price = Split(Document.getElementByID("prcIsum").innerText)(1)
        .Title = Document.querySelectorAll(".viSNotesCnt")(0).innerText
        Debug.Print .Description, .Price, .Title
    End With
End Function

Public Function getHTMLDocument(url As String) As MSHTML.HTMLDocument
    Const READYSTATE_COMPLETE As Long = 4

    Dim Document As MSHTML.HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False
        .send
        If .readyState = READYSTATE_COMPLETE And .Status = 200 Then
            Set Document = New MSHTML.HTMLDocument
            Document.body.innerHTML = .responseText
            Set getHTMLDocument = Document
        Else
            MsgBox "URL:  " & vbCrLf & "Ready state: " & .readyState & vbCrLf & "HTTP request status: " & .Status, vbInformation, "URL Not Responding"
        End If
    End With
End Function

Extract or Get data from HTML Element in Excel using VBA, Web queries are one simple feature in Excel that you can use to import data from (it's also called the unordered list element) and it has an id (ids are unique). See this image! Extract HTML Element Contents in WebPage using VBA in Excel. In-addition, to read and extract contents of HTML elements, we’ll have to create few objects using a library. Therefore, first add the Microsoft HTML Object Library reference to the application. From the top menu of your VBA editor, click Tools -> References…. In the References window, find and select Microsoft HTML Object Library and click OK.

There are a lot of things to fix in your code. It is late here so I will just give pointers (and update fully later) and working code below:

  1. Declare all variables and use appropriate type
  2. Review For Loops and how transpose can be used to create a 1d array of urls pulled from sheet to loop over
  3. Review the difference between querySelector and querySelectorAll methods
  4. Review CSS selectors (you are specifying everything as type selector when in fact you are not selecting by tag for the elements of interest; nor by your stated text)
  5. Think about placement of your IE object creation and of your .Navigate2 to make use of existing object
  6. Make sure to use distinct loop counters
  7. Be sure not to overwrite values in sheet

Code:

Option Explicit
Public Sub ListingInfo()
    Dim ie As New InternetExplorer, ws As Worksheet
    Dim i As Long, urls(), rowCounter As Long
    Dim title As Object, price As Object, description As Object
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    urls = Application.Transpose(ws.Range("A1:A2").Value) '<= Adjust
    With ie
        .Visible = True
        For i = LBound(urls) To UBound(urls)
            If InStr(urls(i), "http") > 0 Then
                rowCounter = rowCounter + 1
                .Navigate2 urls(i)
                While .Busy Or .readyState < 4: DoEvents: Wend
                Set title = .document.querySelector(".it-ttl")
                Set price = .document.querySelector("#prcIsum")
                Set description = .document.querySelector("#viTabs_0_is")

                ws.Cells(rowCounter, 3) = title.innerText
                ws.Cells(rowCounter, 4) = price.innerText
                ws.Cells(rowCounter, 5) = description.innerText
                Set title = Nothing: Set price = Nothing: Set description = Nothing
            End If
        Next
        .Quit
    End With
End Sub

Scraping a website HTML in VBA, You can use VBA to extract data from web pages, either as whole tables The home page lists out the questions which have been asked most  As this piece of HTML is quite unique in the whole HTML content we can simply replace the header between the strong tags with the following regex (.*?). This will extract any string of characters between these tags which are not end of line characters.

Here's an approach using Web Requests, using MSXML. It should be significantly faster than using IE, and I'd encourage you to strongly consider using this approach wherever possible.

You'll need references to Microsoft HTML Object Library and Microsoft XML v6.0 to get this working.

Option Explicit

Public Sub SubmitRequest()
    Dim URLs                              As Excel.Range
    Dim URL                               As Excel.Range
    Dim LastRow                           As Long
    Dim wb                                As Excel.Workbook: Set wb = ThisWorkbook
    Dim ws                                As Excel.Worksheet: Set ws = wb.Worksheets(1)
    Dim ListingDetail                     As Variant
    Dim i                                 As Long
    Dim j                                 As Long
    Dim html                              As HTMLDocument

    ReDim ListingDetail(0 To 2, 0 To 10000)

    'Get URLs
    With ws
        LastRow = .Cells(.Rows.Count, 1).End(xlUp).Row
        Set URLs = .Range(.Cells(1, 1), .Cells(LastRow, 1))
    End With

    'Update the ListingDetail
    For Each URL In URLs
        Set html = getHTML(URL.Value2)
        ListingDetail(0, i) = html.getElementByID("itemTitle").innertext 'Title
        ListingDetail(1, i) = html.getElementByID("prcIsum").innertext 'Price
        ListingDetail(2, i) = html.getElementsByClassName("viSNotesCnt")(0).innertext 'Seller Notes
        i = i + 1
    Next

    'Resize array
    ReDim Preserve ListingDetail(0 To 2, 0 To i - 1)

    'Dump in Column T,U,V of existing sheet
    ws.Range("T1:V" & i).Value = WorksheetFunction.Transpose(ListingDetail)
End Sub

Private Function getHTML(ByVal URL As String) As HTMLDocument
    'Add a reference to Microsoft HTML Object Library
    Set getHTML = New HTMLDocument

    With New MSXML2.XMLHTTP60
        .Open "GET", URL
        .send
        getHTML.body.innerHTML = .responseText
    End With
End Function

How to Use VBA to List Objects in Access Database, How to use Access VBA functions to list all tables, queries, forms and reports in How to use VBA to List Microsoft® Access Database Objects any liability for loss or damage to data to which any techniques, methods or code to Excel Extract dates from strings Form design Form #Error List DB objects  List all file names from a folder into worksheet by using VBA code. Using the following VBA to list files in a folder in a worksheet: 1. Open a worksheet, and click to select a cell where you want to put the filenames. 2. Hold down the ALT + F11 keys in Excel, and it opens the Microsoft Visual Basic for Applications window. 3.

VBA: WebScraping to pull data using <ul, <li, <div, <span tags , I need to pull <div class codes which are highlighted in yellow color in the Html code to excel file. The out put that is need is listed below. 64238. Creating Data Validation (Using VBA): Using the code below a drop down list (data validation) will be created in the cell “J2” . The data for the drop down list will come from the range “=A1:A6” in the sheet “Sheet1”.

Multiple List Box Selections in Excel VBA, Next, you can drag a list box on the Userform. When you arrive at the 'Select Type​' frame, remember to draw this frame first before you place the three option  How to extract data to another worksheet using VBA? We have Company Report with Employee Name, State, Emp Id, Manager & Process. The above data contains more than 250 rows.

Excel VBA Combo Box, A combo box is a drop-down list from where a user can select an item or fill in his/​her own choice. To create a combo box in Excel VBA, execute the following  VBA – Get Information From a Web Page – Excel Macro – Amazon Example July 13, 2015 / in Excel , Reference , Tutorial / by admin This is a basic VBA code to get started with InternetExplorer.Application object and work with web browsers and data extraction.

Comments
  • On the first iteration of your Do While i is going to be equal to zero - also this code shouldn't compile as you have a duplicate declaration in your scope (you're declaring i twice).
  • You're also using Option Explicit but I don't see where you declare Title, price, Description - what am I missing here? This sub shouldn't even be able to run.
  • I like this! One small issue that just came to light for me... There seems to be some listings without Document.querySelector(".viSNotesCnt").innerText which then causes an error. Any ideas on how to pull these too, or add a column that pulls this description as well and leave the #N/A's as blanks? Example: ebay.com/itm/Apple-iPhone-X-256GB-Space-Gray-Unlocked-GSM/…
  • @RCarmody I modified the short version to escape the error. I'm not sure what information that you would want to use to replace it.
  • So that worked for the one, but when I try and apply it elsewhere (say, for the prcISum) then I get a type mismatch error. Is this because one is looking at class and the other ID?
  • @RCarmody the proper way to do it is to create a variable for each item and test if the variable is Nothing after setting it.
  • One more question... I got everything working, but this is taking approx 21min per 1000 URLs. I have thousands of URL's to go through - is there any way to speed this up?