---
title: "Scrapy + Playwright 抓取TikTok Shop Data"
type: source
tags: [scrapy, playwright, tiktok-shop, python, docker, web-scraping]
date: 2026-05-02
---
## Source File
- [[raw/AI/Scrapy + Playwright 抓取TikTok Shop Data.md]]
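## Summary
- Core topic: scraping TikTok Shop store data with the Scrapy + Playwright stack
- Problem domain: TikTok Shop e-commerce data collection, and setting up the environment for crawling web pages
- Method/mechanism: isolate dependencies in a Python venv, install `scrapy` and `scrapy-playwright`, and set up Playwright's Chromium browser so that dynamically rendered pages can be loaded and scraped
- Conclusion/value: provides a complete technical setup for running a TikTok Shop data crawler inside a Docker environment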
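The setup described in the summary can be expressed as a Scrapy `settings.py` fragment. The setting names below come from the scrapy-playwright README. The specific values (Chromium, headless launch) are assumptions chosen to match this note, not the original project's configuration.

```python
# settings.py fragment (sketch): route Scrapy downloads through scrapy-playwright.
# Values such as headless=True are assumptions, not taken from the source project.

# scrapy-playwright takes over Scrapy's HTTP(S) download handlers.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# Playwright's async API requires Twisted to run on the asyncio reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Use the Chromium build installed by `playwright install chromium`.
PLAYWRIGHT_BROWSER_TYPE = "chromium"

# Headless mode is the usual choice inside Docker, where no display is available.
PLAYWRIGHT_LAUNCH_OPTIONS = {"headless": True}
```

In a Debian/Ubuntu-based Docker image, `playwright install --with-deps chromium` also installs the system libraries that Chromium needs.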
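## Key Claims
- Creating a Python venv isolates the project's dependencies and avoids conflicts with system-wide packages
- The scrapy-playwright plugin combines Playwright (browser automation) with Scrapy (crawling framework), so Scrapy can scrape pages that are rendered by JavaScript
- Running inside a Docker container requires extra configuration of the venv and the `PATH` environment variable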
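The venv/`PATH` point can be made concrete. At its core, `source venv/bin/activate` sets `VIRTUAL_ENV` and puts the venv's `bin` directory first on `PATH`. In a Dockerfile each `RUN` step starts a fresh shell, so `source` in one step does not carry over to later steps. The Python sketch below reproduces that environment change for a subprocess. The helper name `venv_environ` and the location `/opt/venv` are hypothetical.

```python
import os
from pathlib import PurePosixPath


def venv_environ(venv_dir, base_env=None):
    """Return a copy of base_env with the venv "activated" (hypothetical helper).

    Mirrors the core of `source venv/bin/activate` on POSIX: set VIRTUAL_ENV,
    prepend <venv>/bin to PATH, and drop PYTHONHOME if present. It does not
    change the shell prompt, which is where the `(venv)` prefix comes from.
    """
    env = dict(os.environ if base_env is None else base_env)
    venv = PurePosixPath(venv_dir)
    env["VIRTUAL_ENV"] = str(venv)
    old_path = env.get("PATH", "")
    # ":" is the POSIX PATH separator used inside Linux containers.
    env["PATH"] = str(venv / "bin") + (":" + old_path if old_path else "")
    env.pop("PYTHONHOME", None)
    return env


# /opt/venv is an example location, not taken from the source project.
env = venv_environ("/opt/venv", {"PATH": "/usr/local/bin:/usr/bin"})
# env["PATH"] → "/opt/venv/bin:/usr/local/bin:/usr/bin"
```

On POSIX, passing this `env` to `subprocess.run([...], env=env)` makes `scrapy` resolve to the venv's copy. In a Dockerfile the usual equivalent is `ENV VIRTUAL_ENV=/opt/venv` followed by `ENV PATH="/opt/venv/bin:$PATH"`.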
## Key Quotes
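> "source venv/bin/activate": enters the Python virtual environment; a `(venv)` prefix in the terminal prompt indicates that activation succeeded
> "scrapy runspider tiktok_shop_spider.py -a shop_url=\"https://www.tiktok.com/shop/store/aopuro/7495894041403296077\"": example command for running the TikTok Shop spider, passing the store URL as the `shop_url` spider argument via `-a`
> "python -c \"from playwright.sync_api import sync_playwright; print('Playwright OK')\"": checks that the Playwright Python package imports correctly (the Chromium browser itself is downloaded separately, e.g. with `playwright install chromium`)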
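The escaped quotes in the `scrapy runspider` command are only shell quoting. `-a shop_url=...` hands `shop_url` to the spider as a keyword argument, and Scrapy's default `Spider.__init__` stores it as `self.shop_url`. The sketch below builds the same commands as argument lists, so no shell quoting is needed. It assumes the spider file name from the quote. `build_runspider_argv` and `playwright_check_argv` are hypothetical helpers.

```python
import shlex
import sys

SHOP_URL = "https://www.tiktok.com/shop/store/aopuro/7495894041403296077"


def build_runspider_argv(spider_file, shop_url):
    """argv for `scrapy runspider <file> -a shop_url=<url>` (hypothetical helper)."""
    return ["scrapy", "runspider", spider_file, "-a", f"shop_url={shop_url}"]


def playwright_check_argv():
    """argv for the Playwright import check, run with the current interpreter (hypothetical helper)."""
    code = "from playwright.sync_api import sync_playwright; print('Playwright OK')"
    return [sys.executable, "-c", code]


# The shell form from the note splits into exactly the same argv list.
shell_cmd = f'scrapy runspider tiktok_shop_spider.py -a shop_url="{SHOP_URL}"'
assert shlex.split(shell_cmd) == build_runspider_argv("tiktok_shop_spider.py", SHOP_URL)

# To actually run it (requires the venv with scrapy installed):
#   subprocess.run(build_runspider_argv("tiktok_shop_spider.py", SHOP_URL), check=True)
```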
## Key Concepts
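- [[Scrapy]]: open-source Python crawling framework with asynchronous HTTP requests and item pipelines
- [[Playwright]]: browser automation tool developed by Microsoft; it supports Chromium, Firefox, and WebKit and can scrape JavaScript-rendered pages
- [[scrapy-playwright]]: integration between Scrapy and Playwright, implemented as a Scrapy download handler, that lets Scrapy spiders drive a Playwright browser
- [[Python venv]]: Python's virtual environment tool, used to isolate project dependencies and avoid package version conflicts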
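Once scrapy-playwright is enabled, a request is rendered in the browser only when its `meta` contains `"playwright": True`. Requests without that key go through Scrapy's regular HTTP download handler. The sketch below builds the keyword arguments a spider would pass to `scrapy.Request`. It is kept as plain data so it runs without Scrapy installed. `playwright_request_kwargs` is a hypothetical helper. The `meta` keys are the ones documented by scrapy-playwright.

```python
def playwright_request_kwargs(url, include_page=False):
    """Keyword arguments for scrapy.Request that opt into Playwright rendering.

    Hypothetical helper. "playwright" and "playwright_include_page" are
    Request.meta keys read by scrapy-playwright.
    """
    meta = {"playwright": True}
    if include_page:
        # Exposes the Playwright Page as response.meta["playwright_page"];
        # the spider must then close the page itself to avoid leaking it.
        meta["playwright_include_page"] = True
    return {"url": url, "meta": meta}


# Inside a spider (requires scrapy and scrapy-playwright):
#   yield scrapy.Request(**playwright_request_kwargs(self.shop_url),
#                        callback=self.parse)
```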
## Key Entities
- [[TikTok Shop]]: ByteDance's e-commerce platform and the crawl target of this document
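The example store URL in this note has the path shape `/shop/store/<store-name>/<numeric-id>`. That pattern is inferred from this single example and is not documented by TikTok. Assuming other store URLs follow it, a spider could derive a stable store key from `shop_url` as shown below. `parse_store_url` is a hypothetical helper.

```python
from urllib.parse import urlparse


def parse_store_url(shop_url):
    """Split a TikTok Shop store URL into (store_name, store_id).

    Hypothetical helper. Assumes the path shape /shop/store/<store_name>/<store_id>,
    inferred from one example URL, and raises ValueError for anything else.
    """
    parts = [p for p in urlparse(shop_url).path.split("/") if p]
    if len(parts) != 4 or parts[:2] != ["shop", "store"] or not parts[3].isdigit():
        raise ValueError(f"unexpected store URL: {shop_url}")
    return parts[2], parts[3]


# parse_store_url("https://www.tiktok.com/shop/store/aopuro/7495894041403296077")
# → ("aopuro", "7495894041403296077")
```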
## Connections
- [[TikTok Shop - Apache Superset Dashboard设计思路]] ← related to ← [[Scrapy + Playwright 抓取TikTok Shop Data]]
- [[可自动化、可扩展、AI增强的电商数据采集与处理系统]] ← belongs to ← [[Scrapy + Playwright 抓取TikTok Shop Data]]
## Contradictions
- (No conflicting content found so far)