Introduction to `BeautifulSoup`

BeautifulSoup is one of the most widely used Python packages for parsing HTML. It makes it easy to extract data from a web page's source code.

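Note for beginners

BeautifulSoup is a tool for parsing data; requests is a tool for fetching it. Beginners should make sure they understand the difference between the two.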
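
The division of labour can be sketched as two separate functions. This is a minimal illustration, not a prescribed structure: the function names `fetch_html` and `parse_title` are hypothetical, and the sample HTML string is made up so that the parsing step can run without a network connection.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Fetching step (requests): download a page and return its raw HTML."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def parse_title(html: str) -> str:
    """Parsing step (BeautifulSoup): pull the <h1> text out of raw HTML."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h1")
    return heading.get_text(strip=True) if heading else ""


# The parser only needs a string, so it works on HTML from any source.
sample_html = "<html><body><h1>Hello</h1></body></html>"
print(parse_title(sample_html))
```

Keeping the two steps apart means the parsing code can be tried on a saved file or a literal string, while `fetch_html` is only called when a live page is actually needed.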

⚙️ Installation

pip install beautifulsoup4
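
A quick check along these lines can confirm that the install worked. Note that the package is installed as `beautifulsoup4` but imported as `bs4`. The printed version depends on your environment.

```python
import bs4
from bs4 import BeautifulSoup

# Installed as "beautifulsoup4", imported as "bs4".
print(bs4.__version__)

# Parsing a tiny document confirms the library is usable.
soup = BeautifulSoup("<p>ok</p>", "html.parser")
print(soup.p.text)
```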

🚀 Basic usage

from bs4 import BeautifulSoup

html = """
<html>
  <body>
    <h1>Title</h1>
    <p class="info">This is a paragraph of text</p>
    <a href="https://example.com">Link</a>
  </body>
</html>
"""

# Parse the HTML
soup = BeautifulSoup(html, "html.parser")

# Get the heading text
print(soup.h1.text)  # Output: Title

# Get the paragraph with class="info"
print(soup.find("p", class_="info").text)  # Output: This is a paragraph of text

# Get the hyperlink's URL
print(soup.a["href"])  # Output: https://example.com
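
The lookups above assume that each tag exists. As a hedged sketch of what happens otherwise: `find` returns `None` when nothing matches, and `Tag.get` returns `None` for a missing attribute, so a small guard avoids an `AttributeError` or `KeyError`. The sample HTML and variable names here are illustrative only.

```python
from bs4 import BeautifulSoup

html = """
<html>
  <body>
    <p class="info">first</p>
    <p class="info">second</p>
    <a>no href here</a>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns None when no tag matches, so check before using .text.
heading = soup.find("h1")
heading_text = heading.text if heading is not None else "(no heading)"

# find_all() returns every match, not just the first one.
info_texts = [p.text for p in soup.find_all("p", class_="info")]

# Tag.get() returns None instead of raising KeyError for a missing attribute.
href = soup.a.get("href")

print(heading_text)
print(info_texts)
print(href)
```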

✅ Scraping web page content

import requests
from bs4 import BeautifulSoup

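# Fetch the page
url = "https://example.com"
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status
soup = BeautifulSoup(response.text, "html.parser")

# Print the href of every <a> tag that has one
# (href=True skips anchors without the attribute, avoiding a KeyError)
for link in soup.find_all("a", href=True):
    print(link["href"])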
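
Links in real pages are often relative (for example `/about`). One common way to turn them into absolute URLs is `urllib.parse.urljoin` from the standard library. The sketch below uses a made-up HTML snippet and base URL instead of a live request, and the helper name `extract_links` is hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html: str, base_url: str) -> list[str]:
    """Return absolute URLs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    return [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]


sample_html = """
<a href="/about">About</a>
<a href="https://example.org/docs">Docs</a>
<a>No link</a>
"""

for link in extract_links(sample_html, "https://example.com/index.html"):
    print(link)
```

When scraping a live page, `response.url` from requests would be a reasonable choice for `base_url`, since it reflects the final address after any redirects.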

📝 Summary

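| Operation | Method |
| --- | --- |
| Parse HTML | `BeautifulSoup(html, "html.parser")` |
| Get a tag's text | `soup.h1.text` |
| Find a specific tag | `soup.find("p", class_="info")` |
| Find all `<a>` tags (hyperlinks) | `soup.find_all("a")` |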

🚀 Well suited to web scraping and HTML parsing, so you can quickly extract web page content! 😊