How can I identify and extract all AJAX links?

6 months ago

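Method 1: Use a regular expression

import re

# Matches absolute http(s) URLs and stops at whitespace, quotes, and angle brackets.
# A regex alone cannot tell an AJAX endpoint from an ordinary link, so filter the results afterwards.
url_pattern = r"https?://[^/\s\"'<>]+/[^\s\"'<>]+"

# Walk the HTML document line by line (html_content is assumed to be a str)
for line in html_content.splitlines():
    # Collect every URL on this line
    matches = re.findall(url_pattern, line)
    if matches:
        print(matches)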
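
As a minimal, self-contained sketch of the regular-expression approach: the sample page and the helper name extract_urls are hypothetical and only serve to make the snippet runnable. The pattern finds absolute URLs in markup and in inline scripts alike, so it cannot by itself tell which of them are requested via AJAX, and it misses relative URLs entirely.

```python
import re

# Absolute http(s) URLs; the match stops at whitespace, quotes, and angle brackets.
URL_PATTERN = re.compile(r"https?://[^/\s\"'<>]+/[^\s\"'<>]+")


def extract_urls(html_content):
    """Return every absolute URL found in the text, in order of appearance."""
    return URL_PATTERN.findall(html_content)


# Hypothetical sample page: one ordinary link, one URL inside a script, one relative link.
sample_html = """
<a href="https://example.com/page/1">Next</a>
<script>fetch("https://example.com/api/items?page=2");</script>
<a href="/relative/link">Relative</a>
"""

urls = extract_urls(sample_html)
print(urls)
```

Scanning the whole string at once, rather than line by line, also catches URLs in minified pages where everything sits on a single line.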

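Method 2: Use BeautifulSoup

from bs4 import BeautifulSoup

# Parse the HTML string (html_content is assumed to be a str)
soup = BeautifulSoup(html_content, "html.parser")

# Walk every <a> element that has an href
for link in soup.find_all("a", href=True):
    # data-ajax-* is an attribute name prefix, not an href prefix, so test the attribute names
    if any(name.startswith("data-ajax") for name in link.attrs):
        print(link["href"])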

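Method 3: Use lxml

from lxml import html

# Parse the HTML string (etree.parse expects a file or file-like object, not markup)
tree = html.fromstring(html_content)

# Select <a> elements that have an href and at least one data-ajax* attribute.
# xpath() is required here; findall() does not support XPath functions such as starts-with().
for link in tree.xpath("//a[@href and @*[starts-with(name(), 'data-ajax')]]"):
    print(link.get("href"))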
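
Where installing a third-party parser is not an option, a similar filter can be sketched with the standard library's html.parser module instead of BeautifulSoup or lxml. This is an illustration under assumptions, not a general AJAX detector: the class name AjaxLinkCollector and the sample markup are hypothetical, and the sketch assumes the page marks its AJAX links with attributes whose names begin with data-ajax.

```python
from html.parser import HTMLParser


class AjaxLinkCollector(HTMLParser):
    """Collect href values of <a> tags that carry a data-ajax* attribute."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this hook.
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        has_marker = any(name.startswith("data-ajax") for name in attr_map)
        if href and has_marker:
            self.links.append(href)


# Hypothetical sample markup: only the first link has both an href and a marker.
sample_html = """
<a href="/items?page=2" data-ajax-target="#list">More</a>
<a href="/about">About</a>
<a data-ajax-target="#x">No href</a>
"""

collector = AjaxLinkCollector()
collector.feed(sample_html)
print(collector.links)
```

Links that are wired up purely in JavaScript, with no marker attribute in the markup, will not be found by any attribute-based filter of this kind.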

Note: