How to scrape text content with Python and save it


To scrape text content with Python and save it to a file, follow these steps:

Import the required libraries: first import the requests library, which sends the HTTP request that retrieves the page, and the BeautifulSoup library, which parses the page content.

import requests
from bs4 import BeautifulSoup

Send an HTTP request and get the page content: use the get method of the requests library to send a GET request, then read the page content from the text attribute of the response.

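url = 'URL of the page to scrape'  # placeholder: replace with a real URL
response = requests.get(url, timeout=10)  # a timeout keeps the call from hanging indefinitely
response.raise_for_status()  # raise an exception on a 4xx/5xx response
html = response.text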
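
The request step above can fail (network errors, error status codes), and pages that do not declare a charset may decode as garbled text. The following is a minimal sketch of one way to handle both cases. The function name fetch_html and the 10-second default timeout are assumptions made for illustration, not part of the original steps.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page's HTML as text, or None if the request fails.

    Hypothetical helper: the name and default timeout are illustrative.
    """
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # raise on 4xx/5xx status codes
    except requests.RequestException as exc:
        print(f"Request failed: {exc}")
        return None

    # requests falls back to ISO-8859-1 for text responses that declare no
    # charset; in that case, use the encoding guessed from the body instead.
    if response.encoding is None or response.encoding.lower() == "iso-8859-1":
        response.encoding = response.apparent_encoding
    return response.text
```

Returning None lets the caller decide whether to retry, skip the page, or stop, instead of writing an error page to disk.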

Parse the page content: use the BeautifulSoup library to parse the page content and extract the text you need.

soup = BeautifulSoup(html, 'html.parser')
text = soup.get_text()
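
Calling get_text() on the whole document also returns the contents of script and style tags, along with a lot of blank lines. As a rough sketch of one way to clean this up, the example below removes those tags first and then collapses whitespace. The helper name extract_text and the sample HTML are made up for illustration.

```python
from bs4 import BeautifulSoup


def extract_text(html):
    """Return the visible text of an HTML document, one non-empty line per row.

    Hypothetical helper: the name is illustrative.
    """
    soup = BeautifulSoup(html, "html.parser")

    # Drop elements whose contents are code or styling, not readable text.
    for tag in soup(["script", "style"]):
        tag.decompose()

    # Collapse runs of whitespace inside each line and skip empty lines.
    lines = (" ".join(line.split()) for line in soup.get_text().splitlines())
    return "\n".join(line for line in lines if line)


# Sample input, invented for this example.
sample_html = """
<html>
  <head><title>Demo</title><style>p { color: red; }</style></head>
  <body>
    <h1>Heading</h1>
    <p>First   paragraph with <b>bold</b> text.</p>
    <script>console.log("not content");</script>
  </body>
</html>
"""

cleaned = extract_text(sample_html)
print(cleaned)
```

If only part of the page is needed, the same idea applies to a single element, for example by calling get_text() on the result of soup.find() instead of on the whole soup.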

Save the text content: write the extracted text to a file. Open the file with the open function, then write the content with the write method.

with open('path of the file to save', 'w', encoding='utf-8') as file:  # placeholder path
    file.write(text)
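
The open call above fails if the target directory does not exist yet. The sketch below shows one possible variant that uses pathlib to create missing parent directories before writing. The helper name save_text and the file names are assumptions for illustration, and the usage part writes into a temporary directory so it leaves nothing behind.

```python
import tempfile
from pathlib import Path


def save_text(text, path):
    """Write text to path as UTF-8, creating parent directories if needed.

    Hypothetical helper: the name is illustrative.
    """
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target


# Usage: save into a nested folder that does not exist yet, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    saved = save_text("café 你好\n", Path(tmp_dir) / "output" / "page.txt")
    round_trip = saved.read_text(encoding="utf-8")
```

Passing encoding="utf-8" explicitly, as the original step does, avoids depending on the platform's default encoding when the page contains non-ASCII text.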

Complete code example:

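import requests
from bs4 import BeautifulSoup

# Fetch the page
url = 'URL of the page to scrape'  # placeholder: replace with a real URL
response = requests.get(url, timeout=10)  # a timeout keeps the call from hanging indefinitely
response.raise_for_status()  # raise an exception on a 4xx/5xx response
html = response.text

# Parse the HTML and extract its text
soup = BeautifulSoup(html, 'html.parser')
text = soup.get_text()

# Save the text as UTF-8
with open('path of the file to save', 'w', encoding='utf-8') as file:  # placeholder path
    file.write(text)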