蜂采-商品数据采集器

Name: 蜂采-商品数据采集器
Author: ezmo

Batch-scrapes product detail pages, including product name, price, brand, model, sales unit, product images, and product details.

As of June 2026, 蜂采-商品数据采集器 has 19 users in the Productivity category.


Users: +533.3%

Rating: 0%

—

— reviews

Reviews: 0%

—

Version

1.3.2

Manifest V3

90-day change · In the last 90 days this extension had 5 version updates and changed its permissions.

History

10 snapshots

Tracking since Apr 1, 2026.


Date	Users	Rating	Reviews	Version
Apr 1, 2026	3	—	—	1.0.0
Apr 16, 2026	6	—	—	1.0.0
Apr 22, 2026	11	—	—	1.1.2
Apr 27, 2026	12	—	—	1.2.0
May 4, 2026	17	—	—	1.3.0
May 10, 2026	17	—	—	1.3.1
May 15, 2026	12	—	—	1.3.1
May 21, 2026	13	—	—	1.3.1
May 27, 2026	14	—	—	1.3.2
Jun 9, 2026	18	—	—	1.3.2
Now	19	—	—	1.3.2

Changelog

Apr 27, 2026

description

**蜂采**是一款强大的商品数据采集工具，专为电商运营、数据分析人员设计。支持两种采集模式，灵活配置提取规则，自动化批量采集商品信息。

**BeeCollect** is a powerful product data scraping tool designed for e-commerce operations and data analysts. It supports two scraping modes with flexible extraction rules for automated batch data collection.

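### ✨ Key Features

#### 📋 Two Scraping Modes

1. **Manual Input Mode**

   - Import a CSV/Excel file (containing SKU and URL)
   - Enter product links manually
   - Suited to cases where you already have a product list
2. **List Page Mode**

   - Automatically scrapes product list pages
   - Optionally opens detail pages to supplement the data
   - Supports pagination (click "next page" / scroll to load)
   - Suited to batch-scraping every product in a category

#### 🎨 Flexible Rule Configuration

- **Visual selector**: click a page element to generate a CSS selector automatically
- **Multiple field types**: text, number, image, image list, HTML, element attribute
- **Rule set management**: configure different extraction rules for different websites
- **Automatic URL matching**: selects the matching rule set based on the product URL
- **Post-process functions**: custom data-processing logic written in JavaScript
- **Field ordering**: drag to reorder fields

#### 📊 Smart Export

- **Excel format**: generates a formatted Excel file automatically
- **Image download**: downloads main images and description images automatically
- **File naming**: supports custom image file name patterns
- **Image conversion**: converts WebP/AVIF to JPG automatically
- **Flexible storage**: one folder per SKU, or a single shared location

#### 🌍 Multilingual Support

- Simplified Chinese
- English
- Switches automatically based on the browser language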



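### User Guide

#### Quick Start

**Step 1: Prepare the product list**

- Option A: upload an Excel/CSV file containing SKU and URL
- Option B: enter `SKU URL` manually (one per line)
- Option C: configure list page rules to scrape the product list automatically

**Step 2: Configure extraction rules**

1. Open any product detail page
2. Click "Add Rule Set"
3. Enter the rule set name and the URL match pattern (a regular expression)
4. Click "Add Field" and, for each piece of data to collect, configure:
   - Field name (e.g. product name, price, brand)
   - CSS selector (the visual selector can be used)
   - Field type (text / number / image, etc.)
   - Image storage (main image / description image)
5. Click "Test Match" to verify that the rule is correct

**Step 3: Run the scrape**

1. Switch to the "Run Scraping" tab
2. Choose the export directory
3. Configure the image storage options
4. Click "Start Scraping"
5. Wait for scraping to finish; the Excel file and image files are generated automatically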

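#### Advanced Features

**List page scraping**

1. Select "List Page Mode"
2. Set the list entry URL
3. Set the list item selector (one match per product)
4. Configure the list field extraction rules
5. Optional: configure a detail link selector to open detail pages and supplement the data
6. Set the pagination strategy (click next page / scroll to load)
7. Configure stop conditions (optional)

**Custom pre-script**

- Runs custom JavaScript before data extraction
- Useful when the page needs interaction such as clicking or scrolling
- Supports asynchronous operations

**Post-process functions**

- Apply a second processing pass to the extracted data
- Support regex replacement, format conversion, and similar operations
- Can read the values of other fields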
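
As a rough illustration of a post-process function, the sketch below cleans a scraped price string with a regex and reads a second field. The `(value, fields)` signature and the `currency` field name are assumptions made for this example; the calling convention the extension actually uses is not documented here.

```javascript
// Hypothetical post-process function: the (value, fields) signature is an
// assumption for illustration, not the extension's documented interface.
function postProcessPrice(value, fields) {
  // Keep only digits and the decimal point, e.g. "¥1,299.00" becomes "1299.00".
  const cleaned = String(value).replace(/[^\d.]/g, "");
  const amount = Number.parseFloat(cleaned);
  if (Number.isNaN(amount)) return ""; // leave the cell empty when nothing numeric was found
  // Read another field's value (assumed to be exposed as a plain object).
  const currency = fields.currency || "CNY";
  return `${amount.toFixed(2)} ${currency}`;
}

console.log(postProcessPrice("¥1,299.00", { currency: "CNY" })); // 1299.00 CNY
```

Returning an empty string for non-numeric input is one possible choice; it keeps a bad scrape visible as a blank cell instead of writing `NaN` into the spreadsheet.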

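**Rule import and export**

- Export a rule set as a JSON file
- Share it with team members or keep it as a backup
- Import existing rules in one click

### 🔒 Privacy & Security

- ✅ All data processing happens locally
- ✅ No data is uploaded to any server
- ✅ No private user information is collected

### 💡 Use Cases

- 📦 E-commerce operations: batch-collect competitor product information
- 📊 Data analysis: gather market price data
- 🏪 Product selection research: get product details quickly
- 📝 Content creation: collect product images and descriptions
- 🔍 Market research: analyze product characteristics and trends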

# 蜂采 · 商品数据采集器 — 功能介绍与使用说明

# Fengcai · Product Data Scraper — Feature Guide & User Manual

---

## 目录 / Table of Contents

- [简介 / Introduction](#简介--introduction)
- [界面概览 / UI Overview](#界面概览--ui-overview)
- [步骤一：商品列表 / Step 1: Product List](#步骤一商品列表--step-1-product-list)
- [步骤二：详情页规则 / Step 2: Detail Page Rules](#步骤二详情页规则--step-2-detail-page-rules)
- [步骤三：执行采集 / Step 3: Run Scraping](#步骤三执行采集--step-3-run-scraping)
- [截图功能 / Screenshot Feature](#截图功能--screenshot-feature)
- [导出结果 / Export Results](#导出结果--export-results)
- [规则管理 / Rule Management](#规则管理--rule-management)
- [注意事项 / Notes](#注意事项--notes)

---

## 简介 / Introduction

**蜂采**是一款 Chrome 侧边栏扩展，专为电商选品、商品数据整理场景设计。无需编程基础，通过点选配置即可批量抓取商品详情页数据，并自动下载图片、截图，导出 Excel。

**Fengcai** is a Chrome side-panel extension for e-commerce product research and data collection. No coding required — configure rules by clicking, then batch-scrape product pages, auto-download images and screenshots, and export to Excel.

---

## 界面概览 / UI Overview

**中文**

插件以侧边栏形式运行，顶部为三步流程导航：

| 步骤 | 标签       | 说明                                 |
| ---- | ---------- | ------------------------------------ |
| 1    | 商品列表   | 管理待采集的商品 SKU 和链接          |
| 2    | 详情页规则 | 配置各站点的字段提取规则与截图设置   |
| 3    | 执行采集   | 配置采集参数、启动采集、查看实时日志 |

**English**

The extension runs as a side panel, with a three-step navigation bar at the top:

| Step | Tab               | Description                                                       |
| ---- | ----------------- | ----------------------------------------------------------------- |
| 1    | Product List      | Manage SKUs and URLs to scrape                                    |
| 2    | Detail Page Rules | Configure field extraction rules and screenshot settings per site |
| 3    | Run Scraping      | Set scraping parameters, start scraping, view real-time logs      |

---

## 步骤一：商品列表 / Step 1: Product List

### 两种模式 / Two Modes

**中文**

**① 手动输入模式（默认）**

- 导入 CSV 或 Excel 文件（需包含 `SKU`、`URL` 两列）
- 或逐行手动输入，格式：`SKU 空格 URL`
- 列表中 SKU 和 URL 均可点击，在新标签页中打开对应链接
- 支持全选/单选后批量删除

**② 列表页模式**

- 适合从分类页、搜索结果页批量采集
- 配置项：
  - **列表入口 URL**：采集起始页面
  - **列表项选择器**：定位每个商品卡片的 CSS 选择器
  - **列表字段**：在列表页直接提取的字段（标题、价格、主图等）
  - **翻页策略**：下一页按钮 / 无限滚动，支持配置终止条件
  - **进入详情页补充抓取**：勾选后可配置详情链接选择器，抓取详情页数据

> 若列表入口 URL 与当前浏览器活动标签页一致，插件会直接复用该页面，不会新开标签页。

**English**

**① Manual Mode (Default)**

- Import a CSV or Excel file (must include `SKU` and `URL` columns)
- Or enter products manually, one per line, in the form `SKU URL` (the SKU, a space, then the URL)
- Both SKU and URL in the list are clickable links (open in new tab)
- Supports select-all / single-select for bulk deletion
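
To make the manual line format concrete, here is a minimal sketch of how such input could be parsed. The function name, the returned shape, and the decision to require an `http(s)` URL are assumptions for illustration; the extension's own parser may behave differently.

```javascript
// Parse manual input where each non-empty line is "SKU<space>URL".
// Lines that do not yield a SKU plus an http(s) URL are reported, not guessed at.
function parseProductLines(text) {
  const products = [];
  const rejected = [];
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "") continue;
    // Expect exactly two whitespace-separated tokens: the SKU and the URL.
    const match = line.match(/^(\S+)\s+(\S+)$/);
    if (match && /^https?:\/\//i.test(match[2])) {
      products.push({ sku: match[1], url: match[2] });
    } else {
      rejected.push(line);
    }
  }
  return { products, rejected };
}

const { products, rejected } = parseProductLines(
  "ABC001 https://example.com/item/1\n\nXYZ002\thttps://example.com/item/2\nbroken-line"
);
console.log(products.map((p) => p.sku), rejected);
```

Collecting rejected lines separately lets a UI show the user which entries need fixing before scraping starts.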

**② List Page Mode**

- Ideal for bulk collection from category or search results pages
- Configuration:
  - **Entry URL**: Starting page for scraping
  - **Item Selector**: CSS selector to locate each product card
  - **List Fields**: Fields extracted directly from the list page (title, price, image, etc.)
  - **Pagination**: Next-page button or infinite scroll, with configurable stop conditions
  - **Enter Detail Page**: When checked, lets you configure a detail link selector so that detail pages are scraped as well

> If the entry URL matches the currently active browser tab, the extension reuses that tab instead of opening a new one.

---

## 步骤二：详情页规则 / Step 2: Detail Page Rules

### 规则集 / Rule Sets

**中文**

每个**规则集**对应一个或多个站点，包含：

- **名称**：标识该规则集
- **URL 匹配**：正则表达式，可用逗号分隔多个，用于自动匹配商品 URL
- **字段规则列表**：定义要提取的字段
- **高级设置**：页面加载判定、自动滚动、前置脚本
- **截图设置**：见下方截图功能章节

采集时根据商品 URL 自动匹配规则集；无匹配则跳过该商品。

**English**

Each **rule set** corresponds to one or more sites and contains:

- **Name**: Identifier for the rule set
- **URL Pattern**: One or more regular expressions, separated by commas, used to auto-match product URLs
- **Field Rules**: Define which fields to extract
- **Advanced Settings**: Page load conditions, auto-scroll, pre-execution script
- **Screenshot Settings**: See the Screenshot section below

During scraping, the matching rule set is automatically selected based on product URL; unmatched products are skipped.
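
The matching step could look roughly like the sketch below. The rule-set shape (`name`, `urlPattern`) and the naive comma split are assumptions for illustration; in particular, splitting on commas would break a pattern that itself contains a comma, such as the quantifier `{1,3}`, and how the extension handles that case is not stated.

```javascript
// Hypothetical rule-set shape: { name, urlPattern }, where urlPattern holds one
// or more regular expressions separated by commas.
function findRuleSet(url, ruleSets) {
  for (const ruleSet of ruleSets) {
    const patterns = ruleSet.urlPattern
      .split(",")
      .map((p) => p.trim())
      .filter((p) => p !== "");
    for (const pattern of patterns) {
      let regex;
      try {
        regex = new RegExp(pattern);
      } catch {
        continue; // skip patterns that are not valid regular expressions
      }
      if (regex.test(url)) return ruleSet;
    }
  }
  return null; // no match: the product would be skipped
}

const ruleSets = [
  { name: "shop-a", urlPattern: "^https://shop-a\\.example/item/\\d+" },
  { name: "shop-b", urlPattern: "shop-b\\.example/p/, shop-b\\.example/product/" },
];
console.log(findRuleSet("https://shop-b.example/product/42", ruleSets)?.name); // shop-b
console.log(findRuleSet("https://other.example/x", ruleSets)); // null
```

In this sketch the first rule set whose pattern matches wins, so more specific rule sets would need to be listed before broader ones.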

---

### 字段规则配置 / Field Rule Configuration

**中文**

每个字段规则包含：

| 配置项       | 说明                                            |
| ------------ | ----------------------------------------------- |
| 字段名称     | 导出 Excel 的列标题                             |
| CSS 选择器   | 支持多行（按顺序匹配第一个命中的）              |
| 字段类型     | 文本 / 数字 / 图片 / 图片列表 / HTML / 元素属性 |
| 是否为列表   | 提取多个值（图片列表自动勾选）                  |
| 图片存储方式 | 无 / 主图 / 详情图                              |
| 后处理函数   | JavaScript 函数，对提取结果进行二次处理         |
| 字段宽度     | 控制 Excel 导出时该列的宽度                     |

支持：

- **元素拾取器**：点击按钮后在页面上点选元素，自动生成选择器
- **测试匹配**：实时在当前页面验证选择器命中情况
- **拖拽排序**：左侧手柄拖动调整字段顺序

**English**

Each field rule includes:

| Setting               | Description                                           |
| --------------------- | ----------------------------------------------------- |
| Field Name            | Column header in the exported Excel                   |
| CSS Selector          | Multi-line supported (first match wins)               |
| Field Type            | Text / Number / Image / Image List / HTML / Attribute |
| Is List               | Extract multiple values (auto-checked for Image List) |
| Image Storage         | None / Main Image / Description Image                 |
| Post-process Function | JavaScript to transform the extracted value           |
| Column Width          | Controls Excel column width on export                 |

Features:

- **Element Picker**: Click to pick an element on the page; selector is auto-generated
- **Test Match**: Instantly verify the selector against the current page
- **Drag to Reorder**: Drag the left handle to rearrange fields
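
The "first match wins" behaviour of a multi-line CSS selector can be sketched as follows. This is an illustration under assumptions, not the extension's code: `queryFirstMatch` is a hypothetical helper, and `root` stands for anything with a `querySelector` method (in a real page, `document`).

```javascript
// Try each selector line in order and return the first element found.
function queryFirstMatch(root, selectorText) {
  const selectors = selectorText
    .split(/\r?\n/)
    .map((s) => s.trim())
    .filter((s) => s !== "");
  for (const selector of selectors) {
    let element = null;
    try {
      element = root.querySelector(selector);
    } catch {
      continue; // browsers throw a SyntaxError for an invalid selector; move on
    }
    if (element) return { selector, element };
  }
  return null;
}

// Stand-in for `document` so the sketch runs outside a browser.
const fakeRoot = {
  querySelector: (sel) => (sel === ".price-now" ? { textContent: "¥99" } : null),
};
const hit = queryFirstMatch(fakeRoot, "#price\n.price-now\n.price");
console.log(hit.selector, hit.element.textContent); // .price-now ¥99
```

Listing fallbacks this way is handy when a site renders the same field with different markup across page templates.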

---

### 高级设置 / Advanced Settings

**中文**

| 配置项       | 说明                                          |
| ------------ | --------------------------------------------- |
| 页面加载判定 | 元素出现 / 元素消失，配合选择器和超时时间使用 |
| 自动滚动     | 采集前自动滚动页面，触发懒加载内容            |
| 前置脚本     | 在提取数据前执行自定义 JS（支持 async/await） |

**English**

| Setting        | Description                                                       |
| -------------- | ----------------------------------------------------------------- |
| Load Condition | Wait for element to appear / disappear, with selector and timeout |
| Auto Scroll    | Scroll the page before extraction to trigger lazy-loaded content  |
| Pre-script     | Run custom JS before data extraction (supports async/await)       |
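
A load condition with a selector and a timeout can be approximated by polling, as in the sketch below. Whether the extension polls or uses something like a `MutationObserver` is not documented, so treat `waitForCondition` and its parameters as hypothetical.

```javascript
// Poll until check() is truthy or the timeout elapses; resolves to true or false.
// For "element appears", check could be () => document.querySelector(sel) !== null;
// for "element disappears", () => document.querySelector(sel) === null.
function waitForCondition(check, timeoutMs, intervalMs = 100) {
  return new Promise((resolve) => {
    const deadline = Date.now() + timeoutMs;
    const tick = () => {
      if (check()) return resolve(true);
      if (Date.now() >= deadline) return resolve(false); // timed out
      setTimeout(tick, intervalMs);
    };
    tick();
  });
}

// Usage outside a browser: the condition becomes true after about 50 ms.
const startedAt = Date.now();
waitForCondition(() => Date.now() - startedAt >= 50, 1000, 10).then((ok) => {
  console.log(ok ? "condition met" : "timed out");
});
```

Because the function is `async`-friendly, a pre-script could `await` it before extraction begins.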

---

## 步骤三：执行采集 / Step 3: Run Scraping

**中文**

| 配置项       | 说明                                                                     |
| ------------ | ------------------------------------------------------------------------ |
| 抓取间隔     | 相邻两个商品之间的等待时间（秒），建议 ≥ 5                              |
| 图片命名规则 | 主图 / 详情图的文件名格式（支持 `{sku}`、`{index}`、`{ext}` 变量） |
| 图片存储方式 | 合并模式（主图/详情图共用目录）/ 分离模式（按 SKU 子目录）               |

操作按钮：**开始 → 暂停 → 继续 → 停止**

右侧实时日志面板显示每条商品的采集状态、图片下载进度、截图结果等。

导出目录结构预览（第三个 Tab 底部）会实时反映当前配置，包括是否显示 `screenshots/` 文件夹。

**English**

| Setting         | Description                                                                                 |
| --------------- | ------------------------------------------------------------------------------------------- |
| Scrape Interval | Wait time (seconds) between products; recommended ≥ 5                                      |
| Image Naming    | File name pattern for main/description images (`{sku}`, `{index}`, `{ext}` variables) |
| Image Storage   | Merged (shared dirs) / Separated (per-SKU subdirectories)                                   |
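
The image naming variables can be expanded with a small template function. The sketch below is illustrative only: `formatImageName` is a hypothetical name, and the two-digit zero padding of `{index}` is an assumption inferred from file names such as `01_ABC001.jpg` in the export directory example later in this document.

```javascript
// Expand {sku}, {index}, and {ext} in a file name pattern.
function formatImageName(pattern, { sku, index, ext }) {
  const values = {
    sku,
    index: String(index).padStart(2, "0"), // assumed padding width
    ext,
  };
  // Unknown placeholders are left untouched rather than silently dropped.
  return pattern.replace(/\{(\w+)\}/g, (whole, key) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : whole
  );
}

console.log(formatImageName("{index}_{sku}.{ext}", { sku: "ABC001", index: 1, ext: "jpg" }));
// 01_ABC001.jpg
console.log(formatImageName("{sku}_{index}.{ext}", { sku: "ABC001", index: 1, ext: "jpg" }));
// ABC001_01.jpg
```
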

Controls: **Start → Pause → Resume → Stop**

The real-time log panel on the right shows scraping status, image download progress, and screenshot results for each product.

The directory structure preview (bottom of Step 3 tab) updates live based on settings, including whether `screenshots/` is shown.

---

## 截图功能 / Screenshot Feature

**中文**

截图设置位于**详情页规则 Tab** 的字段列表下方，按规则集独立配置。

### 启用截图

勾选**启用截图**后，每个详情页抓取完成后自动截图。

### 截图模式

| 模式                             | 说明                                                      |
| -------------------------------- | --------------------------------------------------------- |
| 可见区域（默认）                 | 截取当前浏览器视口                                        |
| 全页面                           | 截取完整页面（含滚动区域），使用 Chrome DevTools Protocol |
| 滚动到指定元素，然后截取可见区域 | 滚动到目标元素后截取视口                                  |
| 只截取指定元素                   | 精确裁剪到元素边界，不含周围内容                          |

> "全页面"和"只截取指定元素"模式需要 `debugger` 权限，截图完成后立即释放。

### 元素选择器

- 填写 CSS 选择器，指定目标元素
- 点击**拾取器按钮**（准星图标）从页面上直接选取元素
- 点击**测试按钮**（对勾图标）验证选择器是否能正确命中元素，并显示元素位置和尺寸信息

### 文件命名与保存位置

- 文件名：`{SKU}.png`（特殊字符自动替换为 `_`）
- 保存位置：导出目录下的 `screenshots/` 子文件夹
- 采集开始时自动预建该文件夹

### 导入/导出

截图设置随规则集一起保存，支持导入和导出 JSON 规则文件。

---

**English**

Screenshot settings are located **below the field list** in the Detail Page Rules tab, configured per rule set independently.

### Enable Screenshot

When **Enable Screenshot** is checked, a screenshot is taken automatically after each detail page is scraped.

### Screenshot Modes

| Mode                                         | Description                                                                       |
| -------------------------------------------- | --------------------------------------------------------------------------------- |
| Visible Area (Default)                       | Captures the current browser viewport                                             |
| Full Page                                    | Captures the entire page including scrolled content, via Chrome DevTools Protocol |
| Scroll to element, then capture visible area | Scrolls to the target element, then captures the viewport                         |
| Capture only the specific element            | Precisely crops to the element's boundaries                                       |

> "Full Page" and "Capture only the specific element" modes require the `debugger` permission; the debugger attachment is released immediately after each screenshot.
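
For readers curious about what an attach, capture, detach cycle looks like, here is a minimal sketch using the `chrome.debugger` API and the DevTools Protocol `Page` domain. It is not the extension's implementation: `captureFullPage` is a hypothetical helper, and the debugger object is passed in as `dbg` so the flow can be tried with a stub outside the browser (inside an extension it would be `chrome.debugger`).

```javascript
// Attach the debugger, capture a full-page PNG, and always detach afterwards.
async function captureFullPage(tabId, dbg) {
  const target = { tabId };
  await dbg.attach(target, "1.3"); // DevTools Protocol version
  try {
    const metrics = await dbg.sendCommand(target, "Page.getLayoutMetrics");
    const size = metrics.cssContentSize || metrics.contentSize;
    const shot = await dbg.sendCommand(target, "Page.captureScreenshot", {
      format: "png",
      captureBeyondViewport: true,
      clip: { x: 0, y: 0, width: size.width, height: size.height, scale: 1 },
    });
    return shot.data; // base64-encoded image data
  } finally {
    await dbg.detach(target); // release the debugger even if the capture failed
  }
}

// Stub standing in for chrome.debugger, to show the call order.
const calls = [];
const stubDebugger = {
  attach: async () => { calls.push("attach"); },
  sendCommand: async (_target, method) => {
    calls.push(method);
    return method === "Page.getLayoutMetrics"
      ? { cssContentSize: { width: 1280, height: 4200 } }
      : { data: "iVBORw0KGgo=" };
  },
  detach: async () => { calls.push("detach"); },
};
captureFullPage(1, stubDebugger).then(() => console.log(calls.join(" -> ")));
```

Putting `detach` in a `finally` block is what keeps the attachment short-lived even when a capture throws.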

### Element Selector

- Enter a CSS selector for the target element
- Click the **Picker button** (crosshair icon) to pick an element directly from the page
- Click the **Test button** (checkmark icon) to verify the selector matches correctly, showing the element's position and dimensions

### File Naming & Save Location

- Filename: `{SKU}.png` (special characters replaced with `_`)
- Saved to: `screenshots/` subfolder inside the export directory
- The folder is automatically pre-created when scraping begins
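
The file naming rule can be sketched like this. Which characters count as "special" is not specified, so the character set below (those forbidden in Windows file names, plus control characters and whitespace) is an assumption, as is the helper name `screenshotFileName`.

```javascript
// Build "<SKU>.png", replacing characters that are unsafe in file names with "_".
function screenshotFileName(sku) {
  const safe = String(sku).trim().replace(/[<>:"\/\\|?*\x00-\x1f\s]/g, "_");
  return `${safe || "_"}.png`; // fall back to "_" for an empty SKU
}

console.log(screenshotFileName("ABC/001:red")); // ABC_001_red.png
console.log(`screenshots/${screenshotFileName("XYZ 002")}`); // screenshots/XYZ_002.png
```
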

### Import / Export

Screenshot settings are saved as part of the rule set and are included in JSON rule set import/export.

---

## 导出结果 / Export Results

**中文**

采集完成后自动导出：

- **Excel 文件**（`.xlsx`，含时间戳）：含表头样式、边框、自动筛选、冻结首行
- **图片文件**：按规则自动下载到对应目录
- **截图文件**：保存到 `screenshots/` 目录（若启用）

### 导出目录结构示例

```
导出目录/
├── 商品数据_2026-04-22T13-30-00.xlsx
├── images/
│   ├── ABC001/
│   │   ├── main/
│   │   │   ├── 01_ABC001.jpg
│   │   │   └── 02_ABC001.jpg
│   │   └── description/
│   │       └── ABC001_01.jpg
│   └── XYZ002/
│       └── ...
└── screenshots/          # 仅启用截图时存在
    ├── ABC001.png
    └── XYZ002.png
```

**English**

After scraping, the following are exported automatically:

- **Excel file** (`.xlsx` with timestamp): styled headers, borders, auto-filter, frozen first row
- **Image files**: downloaded to the configured directories
- **Screenshots**: saved to `screenshots/` directory (if enabled)

### Export Directory Structure Example

```
export-dir/
├── ProductData_2026-04-22T13-30-00.xlsx
├── images/
│   ├── ABC001/
│   │   ├── main/
│   │   │   ├── 01_ABC001.jpg
│   │   │   └── 02_ABC001.jpg
│   │   └── description/
│   │       └── ABC001_01.jpg
│   └── XYZ002/
│       └── ...
└── screenshots/          # Only present when screenshot is enabled
    ├── ABC001.png
    └── XYZ002.png
```
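
A timestamped workbook name in the style shown above can be produced from a `Date`, as in this sketch. The helper name `exportFileName` and the use of local time are assumptions; the extension may well use UTC or a different prefix.

```javascript
// Build a name like ProductData_2026-04-22T13-30-00.xlsx from a Date.
// Colons are avoided because they are not allowed in Windows file names.
function exportFileName(date, prefix = "ProductData") {
  const pad = (n) => String(n).padStart(2, "0");
  const day = `${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}`;
  const time = `${pad(date.getHours())}-${pad(date.getMinutes())}-${pad(date.getSeconds())}`;
  return `${prefix}_${day}T${time}.xlsx`;
}

console.log(exportFileName(new Date(2026, 3, 22, 13, 30, 0)));
// ProductData_2026-04-22T13-30-00.xlsx
```
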

---

## 规则管理 / Rule Management

**中文**

| 操作              | 说明                                         |
| ----------------- | -------------------------------------------- |
| 导出规则集        | 将当前规则集（含截图设置）保存为 JSON 文件   |
| 导入规则集        | 从 JSON 文件加载规则集，可选择立即应用到当前 |
| 备份自定义规则    | 导出所有自定义规则集为一个 JSON 备份文件     |
| 全局配置导出/导入 | 包含所有规则集、设置项、列表配置的完整备份   |

**English**

| Action                      | Description                                                           |
| --------------------------- | --------------------------------------------------------------------- |
| Export Rule Set             | Save current rule set (including screenshot settings) as a JSON file  |
| Import Rule Set             | Load a rule set from a JSON file; optionally apply immediately        |
| Backup Custom Rules         | Export all custom rule sets as a single JSON backup                   |
| Global Config Export/Import | Full backup including all rule sets, settings, and list configuration |

---

## 注意事项 / Notes

**中文**

- 抓取间隔建议 ≥ 5 秒，避免对目标网站造成压力
- 列表模式使用前建议先测试选择器和字段命中情况
- 部分站点需要登录后才能抓取完整数据
- 列表翻页最多 200 页，超限自动停止
- 使用全页面截图时，页面会短暂被 Debugger 附加，部分有反爬机制的站点可能会检测到
- 截图功能要求标签页在采集时保持激活状态（插件会自动切换）

**English**

- Set scrape interval ≥ 5 seconds to avoid overloading the target site
- Always test selectors and field matching before starting a full list scrape
- Some sites require you to be logged in to scrape full product data
- List pagination is capped at 200 pages; scraping stops automatically after that
- Full-page screenshot mode briefly attaches a Debugger to the tab; some anti-bot systems may detect this
- Screenshot capture requires the tab to be active; the extension switches to it automatically
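
The 200-page cap and optional stop conditions can be pictured with the loop below. It is a sketch under assumptions: `collectPages`, `loadPage`, and `shouldStop` are hypothetical names, and a real scraper would load pages asynchronously and wait for the scrape interval between them.

```javascript
const MAX_PAGES = 200; // cap stated in the notes above

// Walk list pages until there is no next page, a stop condition fires, or the
// cap is reached.
function collectPages(loadPage, shouldStop = () => false) {
  const items = [];
  let pageNumber = 0;
  let stoppedBy = "no-next-page";
  while (true) {
    if (pageNumber >= MAX_PAGES) { stoppedBy = "page-cap"; break; }
    pageNumber += 1;
    const page = loadPage(pageNumber); // expected shape: { items: [], hasNext: boolean }
    items.push(...page.items);
    if (shouldStop(items, pageNumber)) { stoppedBy = "stop-condition"; break; }
    if (!page.hasNext) break;
  }
  return { items, pages: pageNumber, stoppedBy };
}

// A fake site with endless pagination stops at the cap.
const endless = (n) => ({ items: [`sku-${n}`], hasNext: true });
console.log(collectPages(endless).pages); // 200
// A stop condition can end the walk earlier.
console.log(collectPages(endless, (items) => items.length >= 30).stoppedBy); // stop-condition
```
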

Apr 27, 2026

permissions

Before: activeTab, tabs, storage, scripting, downloads, offscreen, sidePanel

After: activeTab, tabs, storage, scripting, downloads, offscreen, sidePanel, debugger

Permissions & access

Permissions: activeTab, tabs, storage, scripting, downloads, offscreen, sidePanel, debugger
Host access: <all_urls>
