nexus/wiki/sources/Scrapy-Playwright-抓取TikTok-Shop-Data.md
2026-04-18 12:03:16 +08:00

---
title: "Scrapy + Playwright: Scraping TikTok Shop Data"
type: source
tags: [playwright, scrapy, tiktok, 跨境电商]
date: 2026-04-18
---
## Source File
- [[raw/跨境电商/Scrapy + Playwright 抓取TikTok Shop Data.md]]
## Summary
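- Core topic: a guide to configuring the environment for scraping TikTok Shop data
- Problem domain: cross-border e-commerce data collection; Docker environment configuration
- Method/mechanism: a crawler architecture that combines a Python virtual environment, Scrapy, and Playwright
- Conclusion/value: provides a complete configuration for running a Python crawler inside a Docker container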
## Key Claims
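- The Scrapy + Playwright combination is presented as the best approach for scraping dynamic pages such as TikTok Shop
- Running the crawler inside a Docker container requires an additionally configured virtual environment (venv) to work properly
- A virtual environment isolates dependencies and avoids polluting the global Python installation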
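
The isolation claim above can be checked from Python itself. The following is a minimal sketch using only the standard library; the helper names `in_virtualenv` and `create_venv` are hypothetical and do not come from the source, and the sketch skips installing pip to stay fast.

```python
import sys
import tempfile
import venv
from pathlib import Path


def in_virtualenv() -> bool:
    """Return True when the current interpreter runs inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


def create_venv(target: Path) -> Path:
    """Create a virtual environment at `target` and return its config file path."""
    # with_pip=False is an assumption made to keep the sketch fast;
    # a real setup would normally want pip available (with_pip=True).
    builder = venv.EnvBuilder(with_pip=False, clear=True)
    builder.create(target)
    return target / "pyvenv.cfg"


if __name__ == "__main__":
    print("inside a venv:", in_virtualenv())
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_venv(Path(tmp) / "venv")
        print("venv created:", cfg.exists())
```

This is roughly the programmatic equivalent of `python3 -m venv venv`; inside a container the shell command is usually simpler, and the check function is mainly useful for confirming which interpreter a crawler process actually runs under.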
## Key Concepts
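- [[虚拟环境 (venv)|Virtual environment (venv)]]: Python's dependency isolation mechanism, created with `python3 -m venv venv`
- [[Scrapy]]: a Python crawling framework, well suited to scraping structured web pages
- [[Playwright]]: Microsoft's browser automation tool; drives Chromium/Chrome (as well as Firefox and WebKit) to render dynamic content
- [[scrapy-playwright]]: a Scrapy download handler that integrates Playwright, so that JavaScript-rendered pages can be scraped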
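
To make the integration concrete, here is a hedged sketch of the Scrapy settings that scrapy-playwright documents for enabling its download handler. The source note does not show its actual configuration, so the values below (Chromium, headless mode) are assumptions, and `playwright_request_meta` is a hypothetical helper added only for illustration.

```python
# settings.py fragment (a sketch, not the source's actual configuration)

# Route HTTP and HTTPS downloads through Playwright instead of
# Scrapy's default downloader.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright requires the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Assumed choices: Chromium in headless mode, which suits a container
# without a display.
PLAYWRIGHT_BROWSER_TYPE = "chromium"
PLAYWRIGHT_LAUNCH_OPTIONS = {"headless": True}


def playwright_request_meta() -> dict:
    """Hypothetical helper: request meta that opts a request into Playwright rendering."""
    # Only requests carrying meta["playwright"] = True are rendered in a browser;
    # all other requests go through Scrapy's regular download path.
    return {"playwright": True}
```

In a spider, the helper's result would be passed as `meta=` to `scrapy.Request`, so only pages that need JavaScript rendering pay the cost of a browser.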
## Key Entities
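- [[Docker]]: containerization platform; the deployment environment in this scenario
- [[TikTok Shop]]: ByteDance's e-commerce platform; the scraping target of this note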
## Connections
- [[Scrapy]] ← uses ← [[scrapy-playwright]]
- [[Playwright]] ← provides ← [[浏览器自动化|Browser automation]]
- [[Docker]] ← requires ← [[虚拟环境 (venv)|Virtual environment (venv)]]
## Contradictions
- (None yet)