Add crawler-reverse skill

2026-03-10 08:58:27 +08:00
commit 8cbf3a4844
5 changed files with 383 additions and 0 deletions
--- a/crawler-reverse/LICENSE
+++ b/crawler-reverse/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
--- a/crawler-reverse/README.md
+++ b/crawler-reverse/README.md
@@ -0,0 +1,143 @@
+# crawler-reverse
+
+中文 | [English](#english)
+
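+A reusable skill package for **OpenClaw-style skill repositories**, for use **only with proper authorization** in web traffic capture and analysis, frontend JS obfuscation troubleshooting, request-signature tracing, anti-bot workflow mapping, and browser-assisted reverse engineering.
+
+## What this skill can do
+
+Use `crawler-reverse` when you need any of the following:
+
+- analyzing a page's request chain
+- finding where `sign`, `token`, `timestamp`, `nonce`, or custom headers are generated
+- comparing browser requests with script requests
+- troubleshooting obfuscated frontend JS logic related to requests
+- analyzing Cookie / localStorage / sessionStorage / Header dependencies
+- reproducing an observed request flow and producing a minimal validation script
+
+## Safety boundary
+
+This skill is **only for legally authorized work, legitimate testing, debugging your own systems, teaching demonstrations, or analysis that has been explicitly approved**.
+
+**It should not be used for:**
+
+- unauthorized access
+- bypassing logins, permissions, paywalls, captchas, or rate limits
+- credential stuffing or account abuse
+- large-scale collection without authorization
+- providing ways to evade security controls for offensive abuse
+
+If the scope of authorization is unclear, confirm it before continuing.
+
+## Repository contents
+
+- `SKILL.md`: main skill instructions
+- `skill.json`: basic metadata, usable for indexing/registration
+- `examples/example.md`: example prompts and usage
+- `LICENSE`: MIT license
+
+## Recommended usage
+
+Typical analysis workflow:
+
+1. Reproduce the user action in a browser
+2. Observe XHR / fetch / websocket / document requests
+3. Identify dynamic parameters
+4. Trace where those parameters are generated
+5. Compare browser requests with script requests
+6. Produce a minimal validation script
+
+## Recommended companion tools
+
+This skill works well together with the following tools:
+
+- browser automation / browser inspection tools
+- local file readers
+- shell / grep / ripgrep
+- small Python / JavaScript validation scripts
+
+## Installation
+
+Copy this directory into your OpenClaw-compatible skills directory, or add this GitHub repository as a custom skill source according to your OpenClaw configuration.
+
+## Skill summary
+
+- **Name:** crawler-reverse
+- **Category:** web-analysis / reverse-engineering / debugging
+- **Primary output:** request-chain analysis, parameter-origin explanation, safe reproduction steps
+
+## Notes
+
+This repository was generated using a **generic GitHub skill repository layout**. If it later needs to fit a particular OpenClaw skill registry or a specific format, it can be adjusted further.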
+
+---
+
+## English
+
+A reusable OpenClaw-style skill package for **authorized** web traffic analysis, JS deobfuscation support, request-signature tracing, anti-bot workflow inspection, and browser-assisted reverse engineering.
+
+### What this skill is for
+
+Use `crawler-reverse` when you need to:
+
+- inspect a page's request chain
+- locate where `sign`, `token`, `timestamp`, `nonce`, or custom headers are generated
+- compare browser requests with script requests
+- analyze obfuscated frontend JS related to requests
+- understand cookie / localStorage / sessionStorage / header dependencies
+- reproduce an observed request flow with a minimal script
+
+### Safety boundary
+
+This skill is intended **only for analysis that is authorized, defensive, educational, performed on systems you own, or explicitly permitted**.
+
+It must **not** be used for:
+
+- unauthorized access
+- bypassing authentication, paywalls, permissions, captchas, or rate limits
+- credential stuffing / account abuse
+- large-scale scraping beyond what you are authorized to do
+- evasion of security controls for abusive purposes
+
+If authorization is unclear, ask first.
+
+### Package contents
+
+- `SKILL.md` — full skill instructions
+- `skill.json` — basic metadata for registry/indexing
+- `examples/example.md` — example invocation patterns
+- `LICENSE` — MIT
+
+### Suggested usage
+
+Typical workflow:
+
+1. Reproduce the user action in a browser
+2. Observe XHR / fetch / websocket / document requests
+3. Identify dynamic parameters
+4. Trace where they are generated
+5. Compare browser and script requests
+6. Produce a minimal validation script
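Step 5 of the workflow above (comparing browser and script requests) can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `diff_headers` and the sample header values are hypothetical and are not part of this package.

```python
def diff_headers(browser: dict[str, str], script: dict[str, str]) -> dict[str, dict]:
    """Compare two header maps, treating header names case-insensitively."""
    b = {name.lower(): value for name, value in browser.items()}
    s = {name.lower(): value for name, value in script.items()}
    return {
        # Sent by the browser but absent from the script request.
        "missing_in_script": {k: b[k] for k in sorted(b.keys() - s.keys())},
        # Sent by the script but never by the browser.
        "extra_in_script": {k: s[k] for k in sorted(s.keys() - b.keys())},
        # Present in both, with different values: (browser, script).
        "different": {k: (b[k], s[k]) for k in sorted(b.keys() & s.keys()) if b[k] != s[k]},
    }


# Hypothetical values standing in for one captured browser request
# and the equivalent request built by a script.
browser_headers = {
    "User-Agent": "Mozilla/5.0 (example)",
    "X-Csrf-Token": "abc123",
    "Accept": "application/json",
}
script_headers = {
    "user-agent": "python-urllib/3.12",
    "accept": "application/json",
}

report = diff_headers(browser_headers, script_headers)
for section, values in report.items():
    print(section, values)
```

Headers that show up under `missing_in_script` are usually the first candidates to trace back into the frontend code.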
+
+### Recommended tools
+
+This skill is designed to pair well with tools such as:
+
+- browser automation / browser inspection tools
+- local file readers
+- shell / grep / ripgrep
+- small Python or JavaScript validation scripts
+
+### Install
+
+Copy this folder into your OpenClaw-compatible skills directory, or add it as a GitHub-hosted custom skill source depending on your OpenClaw setup.
+
+### Skill summary
+
+- **Name:** crawler-reverse
+- **Category:** web-analysis / reverse-engineering / debugging
+- **Primary output:** request-chain analysis, parameter-origin explanation, safe reproduction steps
+
+### Publishing note
+
+This package was generated in a generic GitHub skill-repo layout so it can be adapted to a specific OpenClaw registry format later if needed.
--- a/crawler-reverse/SKILL.md
+++ b/crawler-reverse/SKILL.md
@@ -0,0 +1,178 @@
+---
+name: crawler-reverse
+description: "Use this skill for authorized web traffic analysis, JS obfuscation troubleshooting, request-signature tracing, replay debugging, anti-bot workflow inspection, cookie/token/header origin analysis, and browser-assisted reverse engineering. Refuse or redirect high-risk unauthorized misuse such as bypassing access controls, abusive scraping, credential abuse, or evasion for harmful purposes."
+metadata:
+  { "emoji": "🕷️", "category": "web-analysis", "authoring_format": "generic-openclaw" }
+---
+
+# Crawler Reverse
+
+Use this skill for **authorized** browser/network analysis, request tracing, frontend JS reverse engineering, and anti-bot workflow debugging.
+
+## When to use
+
+Use this skill when the task involves:
+
+- analyzing a page's request chain
+- locating where signature parameters are generated
+- understanding how token / cookie / header values are produced
+- tracing request-building logic inside frontend JavaScript
+- comparing browser behavior with Python / JS script behavior
+- replaying an observed request for validation
+- isolating anti-bot request differences
+
+## Goals
+
+Break the problem into layers:
+
+1. **Entry point**
+   - user action
+   - button click
+   - route change
+   - form submit
+   - lazy load
+
+2. **Network activity**
+   - XHR
+   - fetch
+   - document
+   - websocket
+   - static resources
+
+3. **Dynamic inputs**
+   - query
+   - body
+   - header
+   - cookie
+   - localStorage
+   - sessionStorage
+
+4. **Generation logic**
+   - signature
+   - timestamp
+   - nonce
+   - random values
+   - encryption
+   - serialization
+   - compression
+
+5. **Reproduction**
+   - minimal script
+   - browser automation flow
+   - diff against browser traffic
+
+## Recommended workflow
+
+### A. Confirm authorization and scope first
+
+Clarify:
+
+- target site/system
+- whether the user has permission or a legitimate testing purpose
+- whether the focus is page behavior, API flow, signature generation, or login/session analysis
+
+If the request is clearly about unauthorized access, bypassing protections, mass abuse, or harmful evasion, do not provide an operational bypass.
+
+### B. Observe page behavior and requests
+
+1. open the page
+2. reproduce the relevant user action
+3. record the important requests
+4. capture:
+   - URL
+   - method
+   - status
+   - headers
+   - payload
+   - response structure
+   - page state before/after
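One possible way to capture the fields listed above is to export a HAR file from the browser's developer tools and summarize it. The sketch below assumes the standard HAR 1.2 layout (`log.entries[].request` / `.response`); the helper name `summarize_har` and the inline sample capture are illustrative and not part of this skill.

```python
def summarize_har(har: dict) -> list[dict]:
    """Return one summary record per entry of a HAR 1.2 capture."""
    records = []
    for entry in har.get("log", {}).get("entries", []):
        request, response = entry["request"], entry["response"]
        records.append({
            "url": request["url"],
            "method": request["method"],
            "status": response["status"],
            # HAR stores headers as a list of {"name": ..., "value": ...} objects.
            "headers": {h["name"].lower(): h["value"] for h in request.get("headers", [])},
            # postData is only present for requests that carry a body.
            "payload": request.get("postData", {}).get("text"),
            "response_type": response.get("content", {}).get("mimeType"),
        })
    return records


# Tiny inline capture standing in for a file exported from devtools,
# which would normally be loaded with json.load().
sample_har = {
    "log": {
        "entries": [
            {
                "request": {
                    "method": "POST",
                    "url": "https://example.test/api/list?page=1",
                    "headers": [{"name": "X-Sign", "value": "deadbeef"}],
                    "postData": {"mimeType": "application/json", "text": "{\"q\": \"demo\"}"},
                },
                "response": {"status": 200, "content": {"mimeType": "application/json"}},
            }
        ]
    }
}

records = summarize_har(sample_har)
for record in records:
    print(record["method"], record["status"], record["url"])
```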
+
+### C. Trace dynamic parameter origins
+
+Prioritize searching for:
+
+- `sign`
+- `token`
+- `timestamp`
+- `nonce`
+- `secret`
+- `encrypt`
+- `signature`
+- `authorization`
+- custom `x-` headers
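A minimal sketch of searching a source file for the keywords listed above, using only the Python standard library. The function name `find_keyword_hits` and the sample JavaScript are hypothetical; in practice a shell search tool such as ripgrep does the same job across many files.

```python
import re

KEYWORDS = ["sign", "token", "timestamp", "nonce", "secret",
            "encrypt", "signature", "authorization"]

# Whole-word keyword matches, plus anything that looks like a custom x- header name.
PATTERN = re.compile(
    r"\b(?:" + "|".join(KEYWORDS) + r")\b|\bx-[a-z0-9-]+",
    re.IGNORECASE,
)


def find_keyword_hits(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, matched text lowercased, stripped line) for every hit."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            hits.append((line_no, match.group(0).lower(), line.strip()))
    return hits


# Hypothetical snippet standing in for a downloaded frontend bundle.
sample_js = """function buildRequest(params) {
  const timestamp = Date.now();
  const nonce = Math.random().toString(36).slice(2);
  headers["X-App-Sign"] = makeSign(params, timestamp, nonce);
  return fetch("/api/list", { headers });
}"""

hits = find_keyword_hits(sample_js)
for line_no, keyword, text in hits:
    print(f"{line_no}: [{keyword}] {text}")
```

Whole-word matching keeps the hit list short, at the cost of missing identifiers such as `makeSign`; loosening the pattern is a trade-off to make per bundle.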
+
+Methods:
+
+- search source files for keywords
+- inspect page variables/functions in-browser
+- trace upward from the request call site
+- compare multiple requests to find changing fields
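The last method, comparing multiple requests to find changing fields, might look like the sketch below. The helper names and the two sample requests are hypothetical; only query parameters and headers are compared here, and bodies or cookies could be flattened the same way.

```python
from urllib.parse import parse_qsl, urlsplit


def flatten_request(url: str, headers: dict[str, str]) -> dict[str, str]:
    """Flatten query parameters and headers into one 'kind.name' -> value map."""
    fields = {
        f"query.{name}": value
        for name, value in parse_qsl(urlsplit(url).query, keep_blank_values=True)
    }
    fields.update({f"header.{name.lower()}": value for name, value in headers.items()})
    return fields


def changing_fields(first: dict[str, str], second: dict[str, str]) -> dict[str, tuple]:
    """Return every field whose value differs between two flattened requests."""
    keys = sorted(set(first) | set(second))
    return {k: (first.get(k), second.get(k)) for k in keys if first.get(k) != second.get(k)}


# Two hypothetical captures of the same user action, taken a few seconds apart.
req_a = flatten_request(
    "https://example.test/api/list?page=1&ts=1700000000&sign=aaa111",
    {"Accept": "application/json"},
)
req_b = flatten_request(
    "https://example.test/api/list?page=1&ts=1700000005&sign=bbb222",
    {"Accept": "application/json"},
)

diff = changing_fields(req_a, req_b)
for name, (before, after) in diff.items():
    print(name, before, "->", after)
```

Fields that stay constant across captures (`page`, `Accept` in this sample) can usually be set aside, which narrows the tracing work to the values that actually change.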
+
+### D. Check common anti-bot points
+
+Look for:
+
+- cookie-bound sessions
+- CSRF tokens
+- dynamic headers
+- encrypted / wrapped request bodies
+- timestamps or random values that feed into signatures
+- temporary tokens from websocket or bootstrap APIs
+- parameters assembled after render
+- localStorage/sessionStorage/memory dependencies
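When debugging a system you own, one way to check whether a timestamp and nonce take part in a signature is to write the suspected recipe down and test it against an observed value. Everything in this sketch is an assumption made for illustration: the recipe (MD5 over sorted parameters, timestamp, nonce and a shared key), the field names, and the sample values are hypothetical and do not describe any particular site.

```python
import hashlib


def candidate_signature(params: dict[str, str], timestamp: str, nonce: str, secret: str) -> str:
    """Hypothetical recipe: MD5 over sorted key=value pairs, timestamp, nonce and a key."""
    canonical = "&".join(f"{name}={params[name]}" for name in sorted(params))
    material = f"{canonical}&ts={timestamp}&nonce={nonce}&key={secret}"
    return hashlib.md5(material.encode("utf-8")).hexdigest()


def hypothesis_holds(observed: str, params: dict[str, str],
                     timestamp: str, nonce: str, secret: str) -> bool:
    """True if the candidate recipe reproduces the observed signature exactly."""
    return candidate_signature(params, timestamp, nonce, secret) == observed


# Stand-in for a signature captured from your own test environment. It is
# generated here with the same recipe so that the example is self-contained.
params = {"page": "1", "q": "demo"}
observed = candidate_signature(params, "1700000000", "n0nce", "test-key")

matches = hypothesis_holds(observed, params, "1700000000", "n0nce", "test-key")
mismatch = hypothesis_holds(observed, params, "1700000000", "other", "test-key")
print("recipe reproduces the observed value:", matches)
print("still matches with a different nonce:", mismatch)
```

If changing the nonce or timestamp flips the result, that input participates in the signature; if the result is unaffected, the hypothesis needs revising.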
+
+### E. Produce a safe, testable output
+
+Prefer output that includes:
+
+- key request list
+- explanation of parameter origins
+- summary of generation logic
+- minimal validation steps
+- if appropriate and safe, a minimal verification script
+
+## Recommended tool pairing
+
+Useful companions include:
+
+- browser automation / browser inspection tools
+- local text/file readers
+- shell search tools
+- short Python / JavaScript validation scripts
+
+## Output template
+
+- **Target page/action:**
+- **Key requests:**
+- **Suspicious/dynamic parameters:**
+- **Evidence and origin hypothesis:**
+- **Open questions:**
+- **Suggested next step:**
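If the report is produced by a script, the template above can be filled in programmatically. The class `AnalysisReport` and its field names are hypothetical; only the bullet labels come from the template.

```python
from dataclasses import dataclass, field


@dataclass
class AnalysisReport:
    """Holds the fields of the output template; attribute names are illustrative."""
    target: str
    key_requests: list[str] = field(default_factory=list)
    dynamic_parameters: list[str] = field(default_factory=list)
    evidence: str = ""
    open_questions: list[str] = field(default_factory=list)
    next_step: str = ""

    def to_markdown(self) -> str:
        def joined(items: list[str]) -> str:
            return ", ".join(items) if items else "(none)"

        return "\n".join([
            f"- **Target page/action:** {self.target}",
            f"- **Key requests:** {joined(self.key_requests)}",
            f"- **Suspicious/dynamic parameters:** {joined(self.dynamic_parameters)}",
            f"- **Evidence and origin hypothesis:** {self.evidence or '(none)'}",
            f"- **Open questions:** {joined(self.open_questions)}",
            f"- **Suggested next step:** {self.next_step or '(none)'}",
        ])


report = AnalysisReport(
    target="search page, submit query",
    key_requests=["POST /api/list"],
    dynamic_parameters=["ts", "sign"],
    evidence="sign changes whenever ts changes",
    next_step="trace the call site that sets sign",
)
markdown = report.to_markdown()
print(markdown)
```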
+
+## Safety boundary
+
+### Allowed
+
+- authorized API/page analysis
+- debugging your own system
+- tracing frontend parameter generation
+- local validation scripts
+- reproducing observed browser behavior for diagnosis
+
+### Not allowed
+
+- unauthorized bypass of access controls
+- bypassing captcha/paywall/permission systems with abusive intent
+- abusive scraping at scale
+- credential abuse
+- operational evasion guidance for harmful misuse
+
+If intent or authorization is unclear, ask before proceeding.
+
+## Practical reminders
+
+- start by replaying an already observed legitimate request
+- identify the smallest changing fields first
+- compare at least two requests when analyzing signatures
+- for large bundles, center analysis around the actual request trigger path
+- if a visible browser helps, use a visible-browser workflow
--- a/crawler-reverse/examples/example.md
+++ b/crawler-reverse/examples/example.md
@@ -0,0 +1,21 @@
+# Example Usage
+
+## Example prompts
+
+- Analyze the request chain of this page and identify where the signature is generated.
+- Compare the browser request and my Python request and explain why the script fails.
+- Help me trace where this token/header comes from in the frontend.
+- Inspect this page's JS and find the request-building logic.
+- Reproduce this observed API call with a minimal validation script.
+
+## Expected outputs
+
+- key request list
+- dynamic field explanation
+- request diff summary
+- likely signature origin
+- next debugging steps
+
+## Safety reminder
+
+Use only for systems you own, are authorized to test, or are analyzing for legitimate educational/defensive purposes.
--- a/crawler-reverse/skill.json
+++ b/crawler-reverse/skill.json
@@ -0,0 +1,20 @@
+{
+  "name": "crawler-reverse",
+  "version": "1.0.0",
+  "title": "Crawler Reverse",
+  "description": "Authorized web traffic analysis, JS obfuscation troubleshooting, request-signature tracing, anti-bot workflow inspection, and browser-assisted reverse engineering.",
+  "category": "web-analysis",
+  "tags": [
+    "reverse-engineering",
+    "web-analysis",
+    "anti-bot",
+    "request-signature",
+    "debugging",
+    "browser"
+  ],
+  "author": "Generated with ChatGPT",
+  "license": "MIT",
+  "entry": "SKILL.md",
+  "repository_format": "generic-openclaw",
+  "safe_use_only": true
+}
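A registry or indexer could sanity-check a manifest like the one above before loading the skill. This is a sketch only: the set of required keys and the `safe_use_only` check are assumptions, since no OpenClaw schema is given here, and the sample manifest is an abbreviated copy of `skill.json`.

```python
import json

# Assumed minimum set of keys; no official schema is referenced in this package.
REQUIRED_KEYS = {"name", "version", "description", "license", "entry"}


def check_manifest(text: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    manifest = json.loads(text)
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - manifest.keys())]
    if not isinstance(manifest.get("tags", []), list):
        problems.append("tags must be a list")
    if manifest.get("safe_use_only") is not True:
        problems.append("safe_use_only should be true")
    return problems


# Abbreviated copy of skill.json (description shortened for the example).
manifest_text = """{
  "name": "crawler-reverse",
  "version": "1.0.0",
  "description": "Authorized web traffic analysis.",
  "license": "MIT",
  "entry": "SKILL.md",
  "tags": ["debugging"],
  "safe_use_only": true
}"""

problems = check_manifest(manifest_text)
print(problems)
```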