Scraping hrefs with BeautifulSoup isn't working


Again I am having trouble scraping hrefs with BeautifulSoup. I have a list of pages to scrape and I can get the data, but I can't get the hrefs, even with code that works in my other scripts.

Here is my code, followed by the h3 elements it returns:

import requests
from bs4 import BeautifulSoup

# One state name per line; spaces become hyphens to match the URL format.
with open('states_names.csv', 'r') as reader:
    states = [line.strip().replace(' ', '-') for line in reader]

url = ''

for state in states:
    page = requests.get(url + state)
    soup = BeautifulSoup(page.text, 'html.parser')
    links = soup.find_all('div', class_='description')
    # When I try to add .get('href') I get a traceback error. Am I trying to scrape the href too early?
    h_page = soup.find_all('h3')

<h3><a href="">Gaines Ridge Dinner Club</a></h3>
<h3><a href="">Purifoy-Lipscomb House</a></h3>
<h3><a href="">Kate Shepard House Bed and Breakfast</a></h3>
<h3><a href="">Cedarhurst Mansion</a></h3>
<h3><a href="">Crybaby Bridge</a></h3>
<h3><a href="">Gaineswood Plantation</a></h3>
<h3><a href="">Mountain View Hospital</a></h3>


1 Answer


Try this. It collects every <a> tag on the page and prints its href when the attribute is present:

soup = BeautifulSoup(page.content, 'html.parser')
# find_all('a') returns every anchor tag; check each one for an href.
possible_links = soup.find_all('a')
for link in possible_links:
    if link.has_attr('href'):
        print(link.attrs['href'])
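
As for the traceback in the question: find_all() (and its older alias findAll()) returns a list-like ResultSet, and calling .get('href') on that whole result raises an AttributeError, because .get() only exists on individual tags. If only the links inside the h3 headings are wanted, a minimal sketch could look like the following. It parses an inline HTML string instead of fetching a page, and the href values and the heading without a link are made-up placeholders, since the real ones are not shown in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the href values below are placeholders, not the real URLs.
html = """
<div class="description">
  <h3><a href="/example/gaines-ridge-dinner-club">Gaines Ridge Dinner Club</a></h3>
  <h3><a href="/example/crybaby-bridge">Crybaby Bridge</a></h3>
  <h3>Heading without a link</h3>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list-like ResultSet, so the href has to be read
# from each individual <a> tag, not from the result as a whole.
hrefs = []
for h3 in soup.find_all("h3"):
    a = h3.find("a", href=True)  # None when the heading has no link
    if a is not None:
        hrefs.append(a["href"])

# The same selection written as a CSS selector.
hrefs_css = [a["href"] for a in soup.select("h3 > a[href]")]

print(hrefs)
```

In the question's loop, the same for-loop would go after h_page = soup.find_all('h3'), iterating over h_page.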
