Learn Web Scraping with Python - Beginner-Friendly Guide
What You Need:
- Python 3.6+
- requests library: pip install requests
- beautifulsoup4: pip install beautifulsoup4
Setup:
pip install requests beautifulsoup4
Example 1: Scrape a webpage title and links
import requests
from bs4 import BeautifulSoup
url = "https://example.com"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
print("Title:", soup.title.string)
for link in soup.find_all("a"):
    print(link.get("href"))
Example 2: Scrape table data
import requests
from bs4 import BeautifulSoup
url = "https://example.com/table"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
table = soup.find("table")
for row in table.find_all("tr"):
    cells = row.find_all("td")
    data = [cell.text.strip() for cell in cells]
    print(data)
Example 3: Save data to CSV
import requests, csv
from bs4 import BeautifulSoup
url = "https://example.com/data"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
items = soup.find_all("div", class_="item")
results = []
for item in items:
    results.append({
        "name": item.find("h3").text,
        "price": item.find("span").text,
    })
with open("output.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(results)
print("Data saved to output.csv")
Useful Tips:
1. Always add headers: requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
2. Handle errors: use try/except for network issues
3. Add delays: import time; time.sleep(2) between requests
4. Respect robots.txt: check target website rules
5. Use sessions for multiple requests to same site
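Tips 1, 2, 3, and 5 can be combined into one small helper. This is only a sketch: the name polite_get, the 2-second default delay, and the User-Agent string are illustrative choices, not requirements of any site or library.

```python
import time
import requests

# Tip 5: a Session reuses the connection for multiple requests to the same site
session = requests.Session()
# Tip 1: send a browser-like User-Agent header with every request
session.headers.update({"User-Agent": "Mozilla/5.0"})

def polite_get(url, delay=2.0):
    """Fetch url after a short pause, returning None on network errors."""
    time.sleep(delay)  # Tip 3: wait between requests
    try:  # Tip 2: handle network issues instead of crashing
        response = session.get(url, timeout=10)
        response.raise_for_status()  # treat 4xx/5xx status codes as errors too
        return response
    except requests.RequestException as exc:
        print(f"Request failed for {url}: {exc}")
        return None
```

For example, polite_get("https://example.com") waits two seconds, then fetches the page through the shared session with the headers already set.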
Common HTTP Status Codes:
- 200: OK, success
- 403: Forbidden, need headers
- 404: Page not found
- 429: Too many requests, slow down
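As a sketch, the codes above can be turned into readable messages with a small lookup helper (the function name describe_status and the dictionary are made up for illustration):

```python
# Hypothetical helper: map the common status codes listed above to short notes.
STATUS_MESSAGES = {
    200: "OK, success",
    403: "Forbidden, try adding browser-like headers",
    404: "Page not found",
    429: "Too many requests, slow down",
}

def describe_status(code):
    """Return a human-readable note for an HTTP status code."""
    return STATUS_MESSAGES.get(code, f"Unexpected status: {code}")

# Typical usage after a request (assuming requests is installed):
# response = requests.get("https://example.com")
# print(describe_status(response.status_code))
```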
That's it! Web scraping is powerful. Use it responsibly. Happy coding!