Xiaohongshu Automation Skills: High Compliance Risk, Extract Reusable Browser Components Only
Xiaohongshu (XHS/RED) is one of China's most important content-driven social platforms, with a reported 300M+ MAU. Automation needs center on data scraping, content posting, and account management. Unlike X/Twitter, which offers an official API, XHS has no public API for these operations, so all automation relies on reverse-engineered interfaces and carries significantly higher compliance risk than on other platforms.
Leading Open Source Projects
NanmiCoder/MediaCrawler (~18k stars): a multi-platform crawler framework supporting XHS, Douyin, Weibo, Bilibili, and Zhihu. It is built on Playwright with fingerprint spoofing and scrapes notes, comments, and user data. It is listed as Apache-2.0 licensed, but the project carries a disclaimer explicitly warning users that they bear all legal risk themselves.
ReaJason/xhs (~2k stars): a Python SDK wrapping the XHS Web API, with support for login, posting, and search. It is relatively lightweight but likewise depends on reverse-engineered interfaces.
Commercial SaaS: Chanmama, Xinhong, Qiangua, and others, with monthly fees ranging from hundreds to thousands of RMB, offering data analysis plus automated placement. These providers obtain data through commercial partnerships, which gives them better compliance standing than open-source scrapers.
Core Capabilities
| Capability | Open Source | Commercial |
|---|---|---|
| Data scraping | MediaCrawler core | Chanmama/Xinhong |
| Auto-posting | Partial support (high ban risk) | Prohibited by platform |
| Comments/DMs | Extremely risky | Compliance red line |
| Account matrix management | Community solutions | MCN tools |
| SEO/keyword optimization | Tag + content analysis | Core selling point |
| Data dashboard | Must be self-built | Standard feature |
Compliance Risk Analysis
This is the most critical section. The compliance risk of XHS automation far exceeds that of the other platforms considered here:
1. Anti-scraping ToS violation: the XHS User Agreement explicitly prohibits automated scraping. Violations can trigger civil lawsuits in which the platform claims economic damages.
2. Criminal law risk: large-scale scraping of user data may constitute "Illegal Acquisition of Computer Information System Data" (Criminal Law Article 285). Multiple crawler teams have been prosecuted in 2024-2025, with sentences ranging from months to years.
3. Personal Information Protection Law (PIPL): scraping content that contains personal information (usernames, avatars, locations, consumption records) can violate PIPL. Public visibility is not a blanket exemption: already-public personal information may be processed only within a reasonable scope, and bulk collection beyond that scope requires the individual's consent.
4. Account ban risk: the platform's anti-scraping defenses are upgraded continuously (device fingerprinting, behavioral analysis, risk-control models). Ban rates for automated accounts are extremely high, and a single ban may permanently blacklist the associated devices.
5. Enforcement trends: enforcement has tightened in recent years, and several criminal cases against crawler operators have ended in convictions. MediaCrawler has added a disclaimer to its homepage warning users that they bear the legal consequences themselves.
Comparison with Our Existing Skills
| Dimension | xurl (Twitter) | boss-channel-run | browser-use-setup | XHS Automation |
|---|---|---|---|---|
| Platform | X/Twitter | BOSS Zhipin | General browser | Xiaohongshu |
| Core operation | Post/interact | CDP deployment | Browser control | Scrape/post |
| API method | Official API | Platform interface | Playwright | Reverse-engineered Web API |
| Compliance risk | Medium (has API) | Low (commercial partnership) | Low | High (no official API) |
Key difference: xurl is built on the official API, so its compliance risk is controllable. XHS has no official API, so all automation runs through unsanctioned, reverse-engineered channels.
Borrowing Value and Recommendation
Do NOT build XHS automation as a standalone skill; extract the reusable components instead.
Extractable generic capabilities:
- Playwright fingerprint spoofing: MediaCrawler's anti-detection approach (browser fingerprints, Canvas noise, WebGL parameters) can be reused in browser-use-setup as a generic browser anti-detection capability
- Cookie/session management: the QR-code login plus cookie pool approach is reusable in any browser automation scenario that requires a logged-in session
- Data structuring models: the note/user/comment data model design carries over to data collection on other content platforms
- Multi-platform crawler architecture: the modular design is a useful reference for adding platforms in the future
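
To make the "data structuring models" item concrete, here is a minimal sketch of a platform-neutral note/user/comment model. It is not MediaCrawler's actual schema: every class and field name below (`Author`, `Comment`, `Note`, `to_record`, the `platform` label) is a hypothetical choice made for illustration, using only the Python standard library.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Author:
    platform: str       # free-form platform label (hypothetical convention)
    author_id: str
    display_name: str


@dataclass
class Comment:
    comment_id: str
    author: Author
    text: str
    created_at: datetime
    parent_id: str | None = None  # None marks a top-level comment


@dataclass
class Note:
    note_id: str
    author: Author
    title: str
    body: str
    created_at: datetime
    tags: list[str] = field(default_factory=list)
    like_count: int = 0
    comments: list[Comment] = field(default_factory=list)

    def top_level_comments(self) -> list[Comment]:
        """Return comments that are not replies to another comment."""
        return [c for c in self.comments if c.parent_id is None]

    def to_record(self) -> dict:
        """Flatten the note into a JSON-friendly dict for storage or export."""
        return {
            "platform": self.author.platform,
            "note_id": self.note_id,
            "author_id": self.author.author_id,
            "title": self.title,
            "tags": list(self.tags),
            "like_count": self.like_count,
            "comment_count": len(self.comments),
            "created_at": self.created_at.isoformat(),
        }


# Usage with made-up sample data
author = Author(platform="example", author_id="u1", display_name="Demo")
when = datetime(2025, 1, 1, tzinfo=timezone.utc)
note = Note(note_id="n1", author=author, title="Hello", body="...",
            created_at=when, tags=["demo"])
note.comments.append(Comment("c1", author, "first", when))
note.comments.append(Comment("c2", author, "reply", when, parent_id="c1"))
print(note.to_record())
```

Keeping the platform label on the author rather than in the class hierarchy is one way to let the same model serve several content platforms; a per-platform adapter would then only need to map its raw payload into these three types.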
Recommended path for XHS data needs:
Prefer the APIs of compliant data service providers such as Chanmama or Xinhong. For posting automation, build only assistive creation tools (content generation followed by manual posting) and avoid fully automated pipelines. Content creation can be covered by the script-writer skill, with human confirmation required before anything is posted.
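
The "human confirmation before posting" step can be sketched as a small approval gate. This is an illustrative design under stated assumptions, not part of script-writer or any existing skill: `Draft`, `DraftStatus`, `review`, `export_for_manual_posting`, and `ask_on_console` are hypothetical names, and the gate deliberately stops at producing text for a person to paste, with no call to any platform.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class DraftStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Draft:
    title: str
    body: str
    status: DraftStatus = DraftStatus.PENDING


def review(draft: Draft, confirm: Callable[[Draft], bool]) -> Draft:
    """Record the reviewer's decision; `confirm` is the human-in-the-loop hook."""
    draft.status = DraftStatus.APPROVED if confirm(draft) else DraftStatus.REJECTED
    return draft


def export_for_manual_posting(draft: Draft) -> str:
    """Return text for a person to post by hand; refuse unapproved drafts."""
    if draft.status is not DraftStatus.APPROVED:
        raise PermissionError("draft has not been approved by a human reviewer")
    return f"{draft.title}\n\n{draft.body}"


def ask_on_console(draft: Draft) -> bool:
    """One possible `confirm` hook: show the draft and ask for an explicit 'y'."""
    print(f"--- {draft.title} ---\n{draft.body}")
    return input("Approve this draft? [y/N] ").strip().lower() == "y"


# Usage with a stand-in reviewer; in practice pass `ask_on_console`
approved = review(Draft("Spring picks", "Generated copy..."), confirm=lambda d: True)
print(export_for_manual_posting(approved))
```

Because `export_for_manual_posting` raises on anything that is not explicitly approved, a pending or rejected draft cannot slip through by default, which keeps the pipeline assistive rather than fully automated.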
Sources:
- MediaCrawler: https://github.com/NanmiCoder/MediaCrawler (Apache-2.0, ~18k stars)
- xhs Python SDK: https://github.com/ReaJason/xhs (MIT, ~2k stars)
- Chanmama: https://www.chanmama.com/ (Commercial SaaS)
- Xinhong: https://www.newrank.cn/ (Commercial SaaS)
- xurl: Hermes Agent built-in skill
- browser-use-setup: Hermes Agent built-in skill