docker安装firecrawl并使用
4/24/25About 1 min
docker安装firecrawl并使用
安装
- 将代码拉取到本地,仓库地址:https://github.com/mendableai/firecrawl
/firecrawl/apps/api目录下的配置拷贝一份cp .env.example .env- 返回根目录,运行
docker-compose up -d
API使用
- Scrape:Turn any url into clean data,这个用得最多
- Batch Scrape:Batch scrape multiple URLs,可以传多个url
- Crawl:Used to crawl a URL and all accessible subpages. This submits a crawl job and returns a job ID to check the status of the crawl.这个是抓取页面及子页面,返回jobid,然后用get请求去查询这个job
- Map:Used to map a URL and get urls of the website. This returns most links present on the website.返回这个链接中的其他链接,返回页面的结构。
scrape
curl --request POST \
--url http://127.0.0.1:3002/v1/scrape \
--header 'content-type: application/json' \
--data '{
"url": "https://juejin.cn/post/7413964058788216869",
"headers": {
"Authorization": "Bearer your_access_token_here",
"X-Custom-Auth": "custom_auth_value"
}
}'没有请求头的可以不设置Authorization,支持的格式有Available options:
markdown,html,rawHtml,links,screenshot,screenshot@fullPage,json,changeTracking
Batch Scrape
curl --request POST \
--url https://127.0.0.1:3002/v1/batch/scrape \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: application/json' \
--data '{
"urls": [
"<string>"
]
}'Crawl
curl --request POST \
--url http://127.0.0.1:3002/v1/crawl \
--header 'content-type: application/json' \
--data '{
"url": "https://juejin.cn",
"limit": 10,
"scrapeOptions":
{
"format": ["markdown"]
}
}'Map
curl --request POST \
--url https://127.0.0.1:3002/v1/map \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: application/json' \
--data '{
"url": "<string>",
"search": "<string>",
"ignoreSitemap": true,
"sitemapOnly": false,
"includeSubdomains": false,
"limit": 5000,
"timeout": 123
}'结合MCP使用
- 安装Firecrawl mcpServers,地址:https://github.com/mendableai/firecrawl-mcp-server/tree/main
- 手动安装:
npm install -g firecrawl-mcp - cline中配置使用mcp
{
"mcpServers": {
"mcp-server-firecrawl": {
"command": "npx",
"args": ["-y", "firecrawl-mcp"],
"env": {
"FIRECRAWL_API_KEY": "YOUR_API_KEY_HERE"
}
}
}
}注意:本地部署firecrawl的话,不用配置FIRECRAWL_API_KEY,需要FIRECRAWL_API_URL
