How do I extract the email address using Scrapy?

I'm trying to extract the email address of each restaurant on TripAdvisor.

I've tried this, but it keeps returning an empty list ([]):

response.xpath('//*[@class= "restaurants-detail-overview-cards-LocationOverviewCard__detailLink--iyzJI restaurants-detail-overview-cards-LocationOverviewCard__contactItem--89flT6"]')

The relevant snippet from the TripAdvisor page is below:

<div class="restaurants-detail-overview-cards-LocationOverviewCard__detailLink--iyzJI restaurants-detail-overview-cards-LocationOverviewCard__contactItem--1flT6"><span><a href=""><span class="ui_icon email restaurants-detail-overview-cards-LocationOverviewCard__detailLinkIcon--T_k32"></span><span class="restaurants-detail-overview-cards-LocationOverviewCard__detailLinkText--co3ei">Email</span><span class="ui_icon external-link-no-box restaurants-detail-overview-cards-LocationOverviewCard__upLinkIcon--1oVn1"></span></a></span></div>

First: there is a mistake in your class name. Your XPath ends with --89flT6, but the HTML has --1flT6.

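Second: the class is on the <div>, but @href is on the <a>. And the <a> is not a direct child of the <div> (there is a <span> in between), so you need // before a:

'//*[@class="..."]//a/@href'

(I skipped the class name because it is too long to display.)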

But instead of such a long class name you can try

'//a[contains(@href, "mailto")]/@href'

I tested both XPath expressions using lxml:

import lxml.html

# HTML from the question (closing tags and quotes restored so the snippet runs)
text = '''<div class="restaurants-detail-overview-cards-LocationOverviewCard__detailLink--iyzJI restaurants-detail-overview-cards-LocationOverviewCard__contactItem--1flT6">
<span><a href="">
<span class="ui_icon email restaurants-detail-overview-cards-LocationOverviewCard__detailLinkIcon--T_k32"></span>
<span class="restaurants-detail-overview-cards-LocationOverviewCard__detailLinkText--co3ei">Email</span>
<span class="ui_icon external-link-no-box restaurants-detail-overview-cards-LocationOverviewCard__upLinkIcon--1oVn1"></span>
</a></span></div>'''

soup = lxml.html.fromstring(text)

# exact match on the full class attribute, then any <a> below that element
print(soup.xpath('//*[@class="restaurants-detail-overview-cards-LocationOverviewCard__detailLink--iyzJI restaurants-detail-overview-cards-LocationOverviewCard__contactItem--1flT6"]//a/@href'))
# any <a> whose href contains "mailto", independent of class names
print(soup.xpath('//a[contains(@href, "mailto")]/@href'))
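
The class-name mistake mentioned at the start of this answer is hard to see by eye in such long generated names. As a quick sanity check, you can compare the two class strings token by token in plain Python. A minimal sketch using the two strings from the question (the variable names are made up here):

```python
# Class attribute as written in the question's XPath.
xpath_class = (
    "restaurants-detail-overview-cards-LocationOverviewCard__detailLink--iyzJI "
    "restaurants-detail-overview-cards-LocationOverviewCard__contactItem--89flT6"
)
# Class attribute as it appears in the page's HTML.
html_class = (
    "restaurants-detail-overview-cards-LocationOverviewCard__detailLink--iyzJI "
    "restaurants-detail-overview-cards-LocationOverviewCard__contactItem--1flT6"
)

xpath_tokens = set(xpath_class.split())
html_tokens = set(html_class.split())

# Tokens present on one side only are what makes an exact @class="..." match fail.
only_in_xpath = sorted(xpath_tokens - html_tokens)
only_in_html = sorted(html_tokens - xpath_tokens)

print("only in XPath:", only_in_xpath)
print("only in HTML: ", only_in_html)
```

Any token listed on either side means an exact @class="..." comparison cannot match that element.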

Here is one way to do it:

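import requests
from scrapy import Selector

site_link = ''  # URL of the restaurant page

res = requests.get(site_link)
# Selector is documented to take a Scrapy response or raw text, so pass the HTML via text=
sel = Selector(text=res.text)
# first mailto: link inside the contact card, matched on the stable part of the class name
email = sel.xpath("//*[contains(@class,'LocationOverviewCard__contactItem--')]//a[contains(@href,'mailto:')]/@href").get()
# strip the "mailto:" prefix and any "?..." query string; fall back to "" if nothing was found
email = email.split("mailto:")[1].split("?")[0] if email else ""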
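
The split-based cleanup above can also be done with the standard library's URL tools, which additionally decode percent-escapes. This is only a sketch: the helper name email_from_mailto and the sample href are made up for illustration and are not taken from the page.

```python
from urllib.parse import unquote, urlsplit


def email_from_mailto(href):
    """Return the address part of a mailto: link, or "" if href is not one.

    Hypothetical helper for illustration only.
    """
    if not href:
        return ""
    parts = urlsplit(href)
    if parts.scheme.lower() != "mailto":
        return ""
    # For "mailto:user@host?subject=..." the address ends up in .path and
    # everything after "?" in .query, so no manual splitting is needed.
    return unquote(parts.path)


# Made-up example value, not real data from TripAdvisor.
sample_href = "mailto:info@example.com?subject=Hello"
address = email_from_mailto(sample_href)
print(address)
```

With the code above you would call it as email_from_mailto(sel.xpath(...).get()), since it already handles the case where .get() returns None.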


Selector also has a .re() method for extracting data using regular expressions.

In [2]: response.xpath('//a[contains(@href, "mailto")]/@href')
Out[2]: [<Selector xpath='//a[contains(@href, "mailto")]/@href' data=''>]

In [3]: response.xpath('//a[contains(@href, "mailto")]/@href').get()
Out[3]: ''

In [4]: response.xpath('//a[contains(@href, "mailto")]/@href').re('mailto:(.*)\?\w')
Out[4]: ['']

In [5]: response.xpath('//a[contains(@href, "mailto")]/@href').re('mailto:([^?]*)')
Out[5]: ['']
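
The two patterns above behave differently when the href has no ?query part. A small sketch with Python's re module on made-up hrefs (as far as I know, .re() applies the regular expression to the selected strings in much the same way) suggests why the second pattern is the safer choice:

```python
import re

# Made-up hrefs for illustration; real values come from the page.
with_query = "mailto:info@example.com?subject=Hello"
without_query = "mailto:info@example.com"

# Requires a literal "?" followed by a word character after the address.
greedy = re.compile(r"mailto:(.*)\?\w")
# Captures everything up to the first "?" or the end of the string.
negated = re.compile(r"mailto:([^?]*)")

for href in (with_query, without_query):
    print(href)
    print("  greedy: ", greedy.findall(href))
    print("  negated:", negated.findall(href))
```

The first pattern finds nothing when the link carries no query string, while the second returns the address in both cases.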

  • First, check whether the page uses JavaScript to add this link. Scrapy can't run JavaScript; you would need something like Selenium to control a web browser that runs it.
  • Next, check that you wrote the class name correctly. I found a mistake at the end of your class name: --89flT6. Or maybe the page uses different class names for every element.
  • [<Selector xpath='//*[@class= "restaurants-detail-overview-cards-LocationOverviewCard__detailLink--iyzJI restaurants-detail-overview-cards-LocationOverviewCard__contactItem--1flT6"]/span' data='<span><div class="_2wKz--mA" data-enc...'>, <Selector xpath='//*[@class= "restaurants-detail-overview-cards-LocationOverviewCard__detailLink--iyzJI restaurants-detail-overview-cards-LocationOverviewCard__contactItem--1flT6"]/span' data='<span><a href="mailto:info@canopyloun...'>]
  • Got this, but when I try to access @href it returns [] again.
  • Try .xpath('//a[contains(@href, "mailto")]/@href')
  • Slightly shorter: .re('mailto:([^?]*)').
  • Thanks. Added your option.
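
The first check in the comments above (is the link in the HTML that was actually downloaded, or is it added later by JavaScript?) can be sketched without Scrapy. This uses Python's built-in html.parser rather than XPath; the names MailtoCollector and find_mailto_links and the two sample strings are invented for illustration. In practice you would pass in response.text.

```python
from html.parser import HTMLParser


class MailtoCollector(HTMLParser):
    """Collect the href of every <a> whose href starts with "mailto:"."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # value is None for attributes written without a value
            if name == "href" and value and value.lower().startswith("mailto:"):
                self.hrefs.append(value)


def find_mailto_links(html):
    """Return all mailto: hrefs found in the given HTML string."""
    parser = MailtoCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Invented samples: one page with the link in its HTML, one without.
static_page = '<div><span><a href="mailto:info@example.com?subject=Hi">Email</a></span></div>'
script_only_page = '<div id="contact"></div><script>/* link added at runtime */</script>'

print(find_mailto_links(static_page))
print(find_mailto_links(script_only_page))
```

An empty result for the real response.text would suggest that the link is not in the static HTML, in which case no XPath will find it unless the page is rendered first.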