Batch ingest: Multi-Agent Team / DevOps Maturity / 一语点醒梦中人 / NodeWarden
Sources:
- Agent-usecases-multi-Agent-Team.md
- DevOps-Maturity-Model-From-Traditional-IT-to-Advanced-DevOps.md
- AI-一语点醒梦中人.md
- Home-Office-NodeWarden-把-Bitwarden-搬上-Cloudflare-Workers彻底告别服务器.md

Entities: Trebuh, Cloudflare
Concepts: DevOps成熟度模型, 共享内存模式, 空性智慧, 绝处逢生
wiki/concepts/电商数据采集.md
@@ -0,0 +1,33 @@
---
title: E-commerce Data Collection (电商数据采集)
type: concept
tags: [scraper, e-commerce, data-pipeline]
sources: []
last_updated: 2026-04-15
---

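## Definition
Collecting structured product information (title, price, rating, images, reviews, and so on) from e-commerce platforms such as Amazon, Taobao, JD, and Shopee, for use in competitor analysis, price monitoring, or market research.
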
## Collected Fields
- title
- price
- rating
- image_urls (image URLs)
- product_url (product link)
- Extended fields: brand, model, category, review count, listing date

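The field list above could be captured as a typed record. This is a minimal sketch: the core field names come from the list, while the extended field names (`brand`, `model`, `category`, `review_count`, `listed_at`), the types, and the defaults are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ProductRecord:
    """One scraped product. Core fields mirror the list above."""

    title: str
    price: float
    product_url: str
    rating: Optional[float] = None  # not every listing has a rating
    image_urls: list[str] = field(default_factory=list)
    # Extended fields: names and types are illustrative assumptions.
    brand: Optional[str] = None
    model: Optional[str] = None
    category: Optional[str] = None
    review_count: Optional[int] = None
    listed_at: Optional[str] = None  # ISO 8601 date string


# Example record with placeholder values.
record = ProductRecord(
    title="USB-C cable",
    price=9.99,
    product_url="https://example.com/p/1",
    rating=4.5,
)
row = asdict(record)  # plain dict, ready for JSON or CSV export
```

Keeping everything except title, price, and product link optional lets a spider emit partial records when a page omits a field, rather than dropping the item.
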
## Tech Stack
- **Static pages**: primarily [Scrapy], for efficient structured crawling
- **Dynamic pages**: [Playwright], which renders JavaScript before extraction
- **Hybrid approach**: the [scrapy-playwright] plugin, which combines the two

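A minimal sketch of how the hybrid option might be wired up, assuming the scrapy-playwright plugin is installed. The two settings follow that plugin's documented setup; the `request_meta` helper and its `needs_js` flag are hypothetical names introduced here for illustration.

```python
# settings.py (fragment): route downloads through scrapy-playwright.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
# scrapy-playwright needs the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"


def request_meta(needs_js: bool) -> dict:
    """Build the `meta` dict for a scrapy.Request.

    Only requests flagged with {"playwright": True} are rendered in a
    browser; everything else stays on Scrapy's regular downloader.
    """
    return {"playwright": True} if needs_js else {}
```

Inside a spider this would be used along the lines of `scrapy.Request(url, meta=request_meta(needs_js=True))`, so that static listing pages keep the cheaper plain-HTTP path and only JS-heavy pages pay for a browser render.
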
## Anti-Blocking Strategies
- User-Agent rotation
- Proxy pools ([[BrightData]]/[[ScraperAPI]])
- DOWNLOAD_DELAY + RANDOMIZE_DOWNLOAD_DELAY
- Distributed scheduling (Scrapyd cluster)

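The delay settings and User-Agent rotation above might look roughly like this in a Scrapy project. It is a sketch under assumptions: the delay value, the User-Agent strings, and the `RotateUserAgentMiddleware` class are illustrative and not taken from the source. Proxy pools and Scrapyd scheduling are not covered here.

```python
import random

# settings.py (fragment); the delay value is illustrative.
DOWNLOAD_DELAY = 2.0
# With this enabled, Scrapy waits between 0.5x and 1.5x DOWNLOAD_DELAY.
RANDOMIZE_DOWNLOAD_DELAY = True

# Placeholder User-Agent strings; a real pool would be larger and kept current.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_4) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.4 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
]


class RotateUserAgentMiddleware:
    """Hypothetical downloader middleware: random User-Agent per request."""

    def __init__(self, user_agents=USER_AGENTS, rng=None):
        self.user_agents = list(user_agents)
        self.rng = rng or random.Random()

    def process_request(self, request, spider=None):
        # Overwrite the header on every outgoing request.
        request.headers["User-Agent"] = self.rng.choice(self.user_agents)
        return None  # returning None lets Scrapy continue normally
```

To take effect, the class would need to be registered under `DOWNLOADER_MIDDLEWARES` in the project settings, using whatever module path the project actually has.
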
## Role in the Wiki
- Core scenario of [[可自动化可扩展AI增强的电商数据采集与处理系统]]
- Scraped results as JSON/CSV → consumed and processed by [[n8n Workflow自动化]]