
---
title: E-commerce Data Collection
type: concept
tags: [scraper, e-commerce, data-pipeline]
sources: []
last_updated: 2026-04-15
---
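## Definition
Collecting structured product information (title, price, rating, images, reviews, etc.) from e-commerce platforms (Amazon, Taobao, JD, Shopee, etc.) for competitor analysis, price monitoring, or market research.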
## Collected Fields
- title: product title
- price: price
- rating: rating
- image_urls: image URLs
- product_url: product link
- Extended fields: brand, model, category, review count, listing date
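
The field list above maps naturally onto a typed record. The following is a minimal sketch, not a schema defined by this wiki: the class name `ProductRecord`, the English names of the extended fields, and the chosen types are all assumptions made for illustration.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class ProductRecord:
    """One scraped product. Core field names follow the list above."""
    title: str
    price: float
    rating: Optional[float]      # None when the platform shows no rating
    image_urls: List[str]
    product_url: str
    # Extended fields; these English names are illustrative assumptions.
    brand: Optional[str] = None
    model: Optional[str] = None
    category: Optional[str] = None
    review_count: Optional[int] = None
    listed_at: Optional[str] = None  # date string, if the platform exposes one


# Example values are placeholders, not real data.
record = ProductRecord(
    title="Example Widget",
    price=19.99,
    rating=4.5,
    image_urls=["https://example.com/img/1.jpg"],
    product_url="https://example.com/item/1",
    brand="ExampleBrand",
)
row = asdict(record)  # plain dict, ready for JSON or CSV export
```

Keeping the extended fields optional lets the same record type serve platforms that expose different amounts of detail.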
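## Tech Stack
- **Static pages**: primarily [Scrapy], for efficient structured crawling
- **Dynamic pages**: [Playwright], which renders JavaScript before extraction
- **Hybrid approach**: the [scrapy-playwright] plugin, which combines the two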
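
As a rough illustration of the hybrid approach, the fragment below shows the settings that scrapy-playwright documents for enabling its download handler, plus a small helper that decides per request whether a page goes through the browser. The helper `request_meta` is hypothetical and not part of either library; which pages count as "dynamic" is an assumption left to the caller.

```python
# settings.py fragment (sketch): route downloads through scrapy-playwright.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
# scrapy-playwright requires the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"


def request_meta(needs_js: bool) -> dict:
    """Hypothetical helper: build the `meta` dict for a scrapy.Request.

    Only requests carrying meta["playwright"] = True are rendered in a
    browser; all others go through Scrapy's regular downloader.
    """
    return {"playwright": True} if needs_js else {}
```

In a spider this would be used as `scrapy.Request(url, meta=request_meta(True))` for JS-heavy product pages and `request_meta(False)` for static listing pages, so the browser cost is paid only where it is needed.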
## Anti-Blocking Strategies
- User-Agent rotation
- Proxy pool ([[BrightData]]/[[ScraperAPI]])
- DOWNLOAD_DELAY + RANDOMIZE_DOWNLOAD_DELAY
- Distributed scheduling (Scrapyd cluster)
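
The first and third strategies above can be sketched in plain Python. `DOWNLOAD_DELAY` and `RANDOMIZE_DOWNLOAD_DELAY` are real Scrapy settings; the values, the `USER_AGENTS` pool, and both helper functions are assumptions for illustration. `effective_delay` only mimics Scrapy's documented behavior (a wait between 0.5x and 1.5x of `DOWNLOAD_DELAY`) and is not Scrapy's own code.

```python
import random

# Real Scrapy setting names; the values are illustrative.
DOWNLOAD_DELAY = 2.0
RANDOMIZE_DOWNLOAD_DELAY = True

# Hypothetical pool; in practice this would be larger and kept current.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def pick_user_agent(rng=random) -> str:
    """Choose a User-Agent at random for the next request."""
    return rng.choice(USER_AGENTS)


def effective_delay(base: float = DOWNLOAD_DELAY,
                    randomize: bool = RANDOMIZE_DOWNLOAD_DELAY,
                    rng=random) -> float:
    """Approximate the per-request wait implied by the two settings."""
    if not randomize:
        return base
    return rng.uniform(0.5 * base, 1.5 * base)
```

In a Scrapy project the rotation would typically live in a downloader middleware that sets the `User-Agent` header on each request, while the delay itself is handled by Scrapy once the two settings are present.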
## Role in the Wiki
- Core scenario of [[可自动化可扩展AI增强的电商数据采集与处理系统]]
- Collected results (JSON/CSV) → consumed and processed by [[n8n Workflow自动化]]
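
The JSON/CSV hand-off can be sketched with the standard library alone. This is one possible shape under stated assumptions: the column order in `FIELDS`, the `|` separator for multiple image URLs, and the function names are illustrative, and nothing here is specific to how the n8n workflow reads its input.

```python
import csv
import io
import json

# Assumed column order, based on the core field list on this page.
FIELDS = ["title", "price", "rating", "image_urls", "product_url"]


def to_json(records: list) -> str:
    """Serialize records as JSON, keeping non-ASCII titles readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_csv(records: list) -> str:
    """Serialize records as CSV; list-valued image_urls is joined with '|'."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for rec in records:
        row = dict(rec)
        row["image_urls"] = "|".join(rec.get("image_urls", []))
        writer.writerow(row)
    return buf.getvalue()


# Placeholder data for illustration only.
records = [{
    "title": "Example Widget",
    "price": 19.99,
    "rating": 4.5,
    "image_urls": ["https://example.com/a.jpg", "https://example.com/b.jpg"],
    "product_url": "https://example.com/item/1",
}]
json_text = to_json(records)
csv_text = to_csv(records)
```

JSON preserves the list structure of `image_urls`, whereas CSV needs a flattening convention, so the choice of format affects what the downstream consumer has to parse.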