How do I extract the email address using scrapy?

I'm trying to extract the email address of each restaurant on TripAdvisor.

I've tried this, but it keeps returning an empty list ([]):

response.xpath('//*[@class= "restaurants-detail-overview-cards-LocationOverviewCard__detailLink--iyzJI restaurants-detail-overview-cards-LocationOverviewCard__contactItem--89flT6"]')

The relevant HTML from the TripAdvisor page is below:

<div class="restaurants-detail-overview-cards-LocationOverviewCard__detailLink--iyzJI restaurants-detail-overview-cards-LocationOverviewCard__contactItem--1flT6"><span><a href="mailto:info@canopylounge.my?subject=?"><span class="ui_icon email restaurants-detail-overview-cards-LocationOverviewCard__detailLinkIcon--T_k32"></span><span class="restaurants-detail-overview-cards-LocationOverviewCard__detailLinkText--co3ei">Email</span><span class="ui_icon external-link-no-box restaurants-detail-overview-cards-LocationOverviewCard__upLinkIcon--1oVn1"></span></a></span></div>

First: there is a mistake in your class name. It ends in --89flT6, but the page has --1flT6.

Second: the class is on the <div>, but @href is on the <a>. The <a> is not a direct child of the <div>, so you need // before it:

'//*[@class="..."]//a/@href'

(I omit the class name here because it is too long to display.)


Instead of such a long class name, you can match on the link itself:

'//a[contains(@href, "mailto")]/@href'

I tested both XPath expressions using lxml:

text = '''<div class="restaurants-detail-overview-cards-LocationOverviewCard__detailLink--iyzJI restaurants-detail-overview-cards-LocationOverviewCard__contactItem--1flT6">
<span><a href="mailto:info@canopylounge.my?subject=?">
<span class="ui_icon email restaurants-detail-overview-cards-LocationOverviewCard__detailLinkIcon--T_k32"></span>
<span class="restaurants-detail-overview-cards-LocationOverviewCard__detailLinkText--co3ei">Email</span>
<span class="ui_icon external-link-no-box restaurants-detail-overview-cards-LocationOverviewCard__upLinkIcon--1oVn1"></span>
</a></span>
</div>'''

import lxml.html

soup = lxml.html.fromstring(text)

print(soup.xpath('//*[@class="restaurants-detail-overview-cards-LocationOverviewCard__detailLink--iyzJI restaurants-detail-overview-cards-LocationOverviewCard__contactItem--1flT6"]//a/@href'))
print(soup.xpath('//a[contains(@href, "mailto")]/@href'))

Here is one way to do it:

import requests
from scrapy import Selector

site_link = 'https://www.tripadvisor.com/Restaurant_Review-g60713-d11882449-Reviews-Coin_Op_Game_Room-San_Francisco_California.html'

res = requests.get(site_link)
sel = Selector(text=res.text)  # build the selector from the HTML text of the requests response
email = sel.xpath("//*[contains(@class,'LocationOverviewCard__contactItem--')]//a[contains(@href,'mailto:')]/@href").get()
email = email.split("mailto:")[1].split("?")[0] if email else ""
print(email)

Output:

info@coinopsf.com
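
The chained split() calls above work for this page. As a possible alternative, here is a minimal sketch that uses only the standard library's urllib.parse. The helper name emails_from_mailto is made up for illustration, and the sketch assumes the href is a plain mailto: link, possibly holding several comma-separated addresses.

```python
from urllib.parse import unquote, urlsplit


def emails_from_mailto(href):
    """Return the addresses in a mailto: href (hypothetical helper, not part of Scrapy)."""
    if not href:
        return []
    parts = urlsplit(href.strip())
    if parts.scheme != "mailto":
        return []
    # For mailto: links the addresses sit in the path;
    # anything after "?" (such as "subject=...") ends up in parts.query.
    addresses = (unquote(a).strip() for a in parts.path.split(","))
    return [a for a in addresses if a]


print(emails_from_mailto("mailto:info@coinopsf.com?subject=?"))
```

Returning a list instead of a single string keeps the empty case (no href, or an href that is not a mailto: link) easy to handle in the calling code.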

Selector also has a .re() method for extracting data using regular expressions.

In [2]: response.xpath('//a[contains(@href, "mailto")]/@href')
Out[2]: [<Selector xpath='//a[contains(@href, "mailto")]/@href' data='mailto:info@coinopsf.com?subject=?'>]

In [3]: response.xpath('//a[contains(@href, "mailto")]/@href').get()
Out[3]: 'mailto:info@coinopsf.com?subject=?'

In [4]: response.xpath('//a[contains(@href, "mailto")]/@href').re(r'mailto:(.*)\?\w')
Out[4]: ['info@coinopsf.com']

In [5]: response.xpath('//a[contains(@href, "mailto")]/@href').re('mailto:([^?]*)')
Out[5]: ['info@coinopsf.com']
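
The patterns from In [4] and In [5] give the same result for this page, but they can differ on other links. The sketch below compares them with the standard library's re module rather than Selector.re(), on the assumption that a single-group pattern behaves the same way in both. The second href, without a query string, is a made-up example.

```python
import re

PATTERN_WITH_QUERY = r"mailto:(.*)\?\w"   # the pattern from In [4]
PATTERN_UP_TO_QUERY = r"mailto:([^?]*)"   # the pattern from In [5]

hrefs = [
    "mailto:info@coinopsf.com?subject=?",  # href from the page above
    "mailto:info@coinopsf.com",            # hypothetical href without a query string
]

for href in hrefs:
    print(href)
    # Needs a "?" followed by a word character, so it finds nothing in a bare mailto: link.
    print("  with query: ", re.findall(PATTERN_WITH_QUERY, href))
    # Captures everything up to the first "?", or to the end if there is none.
    print("  up to query:", re.findall(PATTERN_UP_TO_QUERY, href))
```

If some pages might contain mailto: links without a ?subject= part, the second pattern looks like the safer choice.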

Comments
  • First, check whether the page uses JavaScript to add this link. Scrapy can't run JavaScript; you would need something like Selenium to control a web browser that runs it.
  • Next, check that you wrote the class name correctly. I found a mistake at the end of your class name: --89flT6. Or maybe the page uses different class names for every element.
  • [<Selector xpath='//*[@class= "restaurants-detail-overview-cards-LocationOverviewCard__detailLink--iyzJI restaurants-detail-overview-cards-LocationOverviewCard__contactItem--1flT6"]/span' data='<span><div class="_2wKz--mA" data-enc...'>, <Selector xpath='//*[@class= "restaurants-detail-overview-cards-LocationOverviewCard__detailLink--iyzJI restaurants-detail-overview-cards-LocationOverviewCard__contactItem--1flT6"]/span' data='<span><a href="mailto:info@canopyloun...'>]
  • I got this, but when I try to access @href it returns [] again.
  • Try .xpath('//a[contains(@href, "mailto")]/@href')
  • Slightly shorter: .re('mailto:([^?]*)').
  • Thanks. I added your option.