muzi_skills/crawler-reverse/SKILL.md
2026-03-10 08:58:27 +08:00

---
name: crawler-reverse
description: "Use this skill for authorized web traffic analysis, JS obfuscation troubleshooting, request-signature tracing, replay debugging, anti-bot workflow inspection, cookie/token/header origin analysis, and browser-assisted reverse engineering. Refuse or redirect high-risk unauthorized misuse such as bypassing access controls, abusive scraping, credential abuse, or evasion for harmful purposes."
metadata:
  { "emoji": "🕷️", "category": "web-analysis", "authoring_format": "generic-openclaw" }
---
# Crawler Reverse
Use this skill for **authorized** browser/network analysis, request tracing, frontend JS reverse engineering, and anti-bot workflow debugging.
## When to use
Use this skill when the task involves:
- analyzing a page's request chain
- locating where signature parameters are generated
- understanding how token / cookie / header values are produced
- tracing request-building logic inside frontend JavaScript
- comparing browser behavior with Python / JS script behavior
- replaying an observed request for validation
- isolating anti-bot request differences
## Goals
Break the problem into layers:
1. **Entry point**
   - user action
   - button click
   - route change
   - form submit
   - lazy load
2. **Network activity**
   - XHR
   - fetch
   - document
   - websocket
   - static resources
3. **Dynamic inputs**
   - query
   - body
   - header
   - cookie
   - localStorage
   - sessionStorage
4. **Generation logic**
   - signature
   - timestamp
   - nonce
   - random values
   - encryption
   - serialization
   - compression
5. **Reproduction**
   - minimal script
   - browser automation flow
   - diff against browser traffic
## Recommended workflow
### A. Confirm authorization and scope first
Clarify:
- target site/system
- whether the user has permission or a legitimate testing purpose
- whether the focus is page behavior, API flow, signature generation, or login/session analysis
If the request is clearly about unauthorized access, bypassing protections, mass abuse, or harmful evasion, do not provide an operational bypass.
### B. Observe page behavior and requests
1. open the page
2. reproduce the relevant user action
3. record the important requests
4. capture:
   - URL
   - method
   - status
   - headers
   - payload
   - response structure
   - page state before/after
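
If the recorded traffic is exported from the browser's network panel as a HAR file, most of the capture fields above can be pulled out with a short script. The following is a minimal sketch, assuming the standard HAR layout (`log.entries[].request` / `response`). The function name `summarize_har` and the inline `sample_har` data are illustrative only, and page state before/after is not part of a HAR export, so it still has to be noted by hand.

```python
from typing import Any, Dict, List


def summarize_har(har: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Reduce a parsed HAR document to one small dict per recorded request."""
    rows = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        response = entry.get("response", {})
        post_data = request.get("postData") or {}
        rows.append(
            {
                "url": request.get("url"),
                "method": request.get("method"),
                "status": response.get("status"),
                # HAR stores headers as a list of {"name": ..., "value": ...} objects.
                "headers": {
                    h["name"].lower(): h["value"] for h in request.get("headers", [])
                },
                "payload": post_data.get("text"),
                "response_type": response.get("content", {}).get("mimeType"),
            }
        )
    return rows


# Tiny hypothetical capture so the sketch runs without a real export.
# With a real file: har = json.load(open("capture.har", encoding="utf-8"))
sample_har = {
    "log": {
        "entries": [
            {
                "request": {
                    "method": "POST",
                    "url": "https://example.test/api/list",
                    "headers": [{"name": "X-Sign", "value": "abc123"}],
                    "postData": {"mimeType": "application/json", "text": '{"page": 1}'},
                },
                "response": {"status": 200, "content": {"mimeType": "application/json"}},
            }
        ]
    }
}

for row in summarize_har(sample_har):
    print(row["method"], row["status"], row["url"], sorted(row["headers"]))
```

Lower-casing header names keeps later comparisons stable, since header names are case-insensitive.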
### C. Trace dynamic parameter origins
Prioritize searching for:
- `sign`
- `token`
- `timestamp`
- `nonce`
- `secret`
- `encrypt`
- `signature`
- `authorization`
- custom `x-` headers
Methods:
- search source files for keywords
- inspect page variables/functions in-browser
- trace upward from the request call site
- compare multiple requests to find changing fields
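
The keyword search can be sketched as a small scanner over downloaded source text. This is an illustrative helper, not part of any tool: `find_keyword_hits` and `sample_js` are hypothetical names, and plain substring matching will also flag unrelated identifiers such as `assign` or `design`, so each hit is a lead to inspect rather than a finding.

```python
import re
from typing import List, Tuple

KEYWORDS = [
    "sign", "token", "timestamp", "nonce",
    "secret", "encrypt", "signature", "authorization",
]
# Longest keywords first so "signature" is reported whole rather than as "sign".
_ALTERNATION = "|".join(sorted(KEYWORDS, key=len, reverse=True))
# The second branch picks up custom x- header names such as X-Sign.
PATTERN = re.compile(r"(?:%s)|\bx-[a-z0-9-]+" % _ALTERNATION, re.IGNORECASE)


def find_keyword_hits(source: str, context: int = 30) -> List[Tuple[int, str, str]]:
    """Return (line_number, matched_text, snippet) for each hit in a source string."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            start = max(match.start() - context, 0)
            end = min(match.end() + context, len(line))
            hits.append((line_no, match.group(0), line[start:end].strip()))
    return hits


# Hypothetical frontend snippet standing in for a real bundle.
sample_js = """
function buildHeaders(params, ts) {
  var nonce = Math.random().toString(16).slice(2);
  return {"X-Sign": md5(sortQuery(params) + ts + nonce), "X-Nonce": nonce};
}
"""

for line_no, word, snippet in find_keyword_hits(sample_js):
    print(line_no, word, "|", snippet)
```

For minified bundles, where a single line can hold the whole file, the `context` window keeps each snippet readable.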
### D. Check common anti-bot points
Look for:
- cookie-bound sessions
- CSRF tokens
- dynamic headers
- encrypted / wrapped request bodies
- timestamp/random participation in signatures
- temporary tokens from websocket or bootstrap APIs
- parameters assembled after render
- localStorage/sessionStorage/memory dependencies
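
When sorting observed values into the categories above, a rough shape check can hint at what a field is before any code is traced. The sketch below is only a heuristic under stated assumptions: `classify_value` and the sample values are hypothetical, and a 32-character hex string merely has the length of an MD5 digest, which proves nothing about how it was produced.

```python
import re
from typing import Dict

HEX_DIGEST_LENGTHS = {
    32: "MD5-length hex",
    40: "SHA-1-length hex",
    64: "SHA-256-length hex",
}


def classify_value(value: str) -> str:
    """Guess what kind of dynamic value a string is, judging by its shape only."""
    if re.fullmatch(r"\d{10}", value):
        return "epoch seconds?"
    if re.fullmatch(r"\d{13}", value):
        return "epoch milliseconds?"
    if re.fullmatch(r"[0-9a-fA-F]+", value) and len(value) in HEX_DIGEST_LENGTHS:
        return HEX_DIGEST_LENGTHS[len(value)]
    # JWTs are three base64url segments; the header segment usually starts with "eyJ".
    if re.fullmatch(r"eyJ[\w-]+\.[\w-]+\.[\w-]*", value):
        return "JWT-shaped"
    return "unclassified"


# Hypothetical values as they might appear in one captured request.
observed: Dict[str, str] = {
    "query.ts": "1700000000",
    "header.x-sign": "5d41402abc4b2a76b9719d911017c592",
    "header.authorization": "eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxIn0.c2ln",
    "query.page": "1",
}

for name, value in observed.items():
    print(f"{name}: {classify_value(value)}")
```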
### E. Produce a safe, testable output
Prefer output that includes:
- key request list
- explanation of parameter origins
- summary of generation logic
- minimal validation steps
- if appropriate and safe, a minimal verification script
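
A minimal verification script might test a generation-logic hypothesis against one observed value. The sketch below assumes a system you own or are authorized to test, where the signing key is already known to you. The canonical form (sorted, URL-encoded parameters), the three candidate schemes, and every name and value here are assumptions for illustration, not a claim about how any particular site signs requests.

```python
import hashlib
import hmac
from typing import Dict, List
from urllib.parse import urlencode


def candidate_signatures(params: Dict[str, str], secret: str) -> Dict[str, str]:
    """Compute a few commonly seen signing schemes over one canonical string."""
    canonical = urlencode(sorted(params.items()))
    return {
        "sha256(canonical + secret)": hashlib.sha256(
            (canonical + secret).encode()
        ).hexdigest(),
        "sha256(secret + canonical)": hashlib.sha256(
            (secret + canonical).encode()
        ).hexdigest(),
        "hmac-sha256(secret, canonical)": hmac.new(
            secret.encode(), canonical.encode(), hashlib.sha256
        ).hexdigest(),
    }


def matching_schemes(params: Dict[str, str], secret: str, observed_sign: str) -> List[str]:
    """Return the names of candidate schemes that reproduce the observed signature."""
    return [
        name
        for name, value in candidate_signatures(params, secret).items()
        if hmac.compare_digest(value, observed_sign.lower())
    ]


# Hypothetical test values; in practice these come from one observed request
# against your own system, together with the key you configured there.
params = {"page": "1", "ts": "1700000000", "nonce": "ab12"}
secret = "demo-secret"
observed_sign = hmac.new(
    secret.encode(), urlencode(sorted(params.items())).encode(), hashlib.sha256
).hexdigest()

print(matching_schemes(params, secret, observed_sign))
```

An empty result means none of the hypotheses hold, which is itself useful evidence to record under open questions.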
## Recommended tool pairing
Useful companions include:
- browser automation / browser inspection tools
- local text/file readers
- shell search tools
- short Python / JavaScript validation scripts
## Output template
- **Target page/action:**
- **Key requests:**
- **Suspicious/dynamic parameters:**
- **Evidence and origin hypothesis:**
- **Open questions:**
- **Suggested next step:**
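
If findings are collected programmatically, the template can be filled from a small record type. This is a sketch only: `AnalysisReport` and its field names are hypothetical and simply mirror the six template entries above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AnalysisReport:
    target: str
    key_requests: List[str] = field(default_factory=list)
    dynamic_parameters: List[str] = field(default_factory=list)
    evidence: List[str] = field(default_factory=list)
    open_questions: List[str] = field(default_factory=list)
    next_step: str = ""

    def to_markdown(self) -> str:
        """Render the report using the same bullet labels as the template."""

        def joined(items: List[str]) -> str:
            return "; ".join(items) if items else "(none yet)"

        return "\n".join(
            [
                f"- **Target page/action:** {self.target}",
                f"- **Key requests:** {joined(self.key_requests)}",
                f"- **Suspicious/dynamic parameters:** {joined(self.dynamic_parameters)}",
                f"- **Evidence and origin hypothesis:** {joined(self.evidence)}",
                f"- **Open questions:** {joined(self.open_questions)}",
                f"- **Suggested next step:** {self.next_step or '(none yet)'}",
            ]
        )


# Hypothetical findings used only to show the rendered shape.
report = AnalysisReport(
    target="search page, next-page click",
    key_requests=["POST /api/list"],
    dynamic_parameters=["query.ts", "header.x-sign"],
    evidence=["x-sign changes whenever ts changes"],
)
print(report.to_markdown())
```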
## Safety boundary
### Allowed
- authorized API/page analysis
- debugging your own system
- tracing frontend parameter generation
- local validation scripts
- reproducing observed browser behavior for diagnosis
### Not allowed
- unauthorized bypass of access controls
- bypassing CAPTCHA, paywall, or permission systems with abusive intent
- abusive scraping at scale
- credential abuse
- operational evasion guidance for harmful misuse
If intent or authorization is unclear, ask before proceeding.
## Practical reminders
- start by replaying an already observed legitimate request
- identify the smallest changing fields first
- compare at least two requests when analyzing signatures
- for large bundles, center analysis around the actual request trigger path
- if a visible browser helps, use a visible-browser workflow
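
Comparing two captures of the same action is easy to automate. The following sketch flattens the query string and headers of each request into one dictionary and reports only the fields whose values differ. The helper names and the two sample requests are hypothetical, and body or cookie fields could be added to the flattened view in the same way.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import parse_qsl, urlsplit


def flatten_request(url: str, headers: Dict[str, str]) -> Dict[str, str]:
    """Merge query parameters and headers into one flat, prefixed dictionary."""
    fields = {}
    for key, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        fields[f"query.{key}"] = value
    for key, value in headers.items():
        fields[f"header.{key.lower()}"] = value
    return fields


def changing_fields(
    first: Dict[str, str], second: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return fields whose values differ, including fields present on one side only."""
    return {
        key: (first.get(key), second.get(key))
        for key in sorted(first.keys() | second.keys())
        if first.get(key) != second.get(key)
    }


# Two hypothetical captures of the same user action, a few seconds apart.
request_a = flatten_request(
    "https://example.test/api/list?page=1&ts=1700000000&sign=aaa",
    {"X-Nonce": "n1", "Accept": "application/json"},
)
request_b = flatten_request(
    "https://example.test/api/list?page=1&ts=1700000005&sign=bbb",
    {"X-Nonce": "n2", "Accept": "application/json"},
)

for name, (before, after) in changing_fields(request_a, request_b).items():
    print(f"{name}: {before} -> {after}")
```

Fields that stay constant across captures (here `page` and `Accept`) drop out, which leaves the smallest set of changing inputs to trace first.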