CatPtain committed on
Commit a4a8647 · verified · 1 Parent(s): 0637b75

Upload 17 files

Files changed (3)
  1. README.md +146 -148
  2. hf-deploy/Dockerfile +6 -13
  3. hf-deploy/server.js +14 -35
README.md CHANGED
@@ -1,149 +1,147 @@
- ---
- title: Page Shot
- emoji: 📈
- colorFrom: pink
- colorTo: blue
- sdk: docker
- pinned: false
- license: mit
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
-
- # Page Screenshot API
-
- A web service that captures screenshots of web pages using Puppeteer.
-
- ## Features
- - Web page screenshot capture
- - Customizable dimensions (width/height)
- - Adjustable image quality
- - Rate limiting for API protection
- - CORS enabled for cross-origin requests
-
- ## API Usage
-
- ### POST /screenshot
-
- Capture a screenshot of a web page.
-
- **Request Body:**
- ```json
- {
-   "url": "https://example.com",
-   "width": 1920,
-   "height": 1080,
-   "quality": 80
- }
- ```
-
- **Parameters:**
- - `url` (required): The URL of the webpage to capture
- - `width` (optional): Screenshot width in pixels (default: 1920, range: 100-4000)
- - `height` (optional): Screenshot height in pixels (default: 1080, range: 100-4000)
- - `quality` (optional): JPEG quality (default: 80, range: 1-100)
-
- **Response:**
- Returns the screenshot as a JPEG image.
-
- ### GET /
-
- Health check endpoint that returns API status.
-
- ## Example Usage
-
- ```bash
- curl -X POST https://your-app.railway.app/screenshot \
-   -H "Content-Type: application/json" \
-   -d '{"url": "https://example.com", "width": 1280, "height": 720}' \
-   --output screenshot.jpg
- ```
-
- ## Rate Limiting
-
- - 100 requests per 15 minutes per IP address
-
- ## Deployment
-
- This application can be deployed on various platforms:
- - Hugging Face Spaces (Docker)
- - Railway
- - Render.com
- - Vercel
-
- For detailed deployment instructions, see `DEPLOYMENT_GUIDE.md`.
-
- ## Railway Deployment Guide
-
- ### 1. Prepare for Deployment
- Make sure your project contains the following files:
- - `Dockerfile` - container configuration
- - `railway.toml` - Railway deployment configuration
- - `package.json` - dependencies and start script
-
- ### 2. Deploy to Railway
- There are two ways to deploy to Railway:
-
- #### Option 1: Connect via GitHub (recommended)
- 1. Push the code to a GitHub repository
- 2. Visit [Railway.app](https://railway.app)
- 3. Log in and click "New Project"
- 4. Select "Deploy from GitHub repo"
- 5. Select your repository
- 6. Railway automatically detects the Dockerfile and starts the deployment
-
- #### Option 2: Use the Railway CLI
- ```bash
- # Install the Railway CLI
- npm install -g @railway/cli
-
- # Log in to Railway
- railway login
-
- # Initialize the project
- railway init
-
- # Deploy
- railway up
- ```
-
- ### 3. Environment Variables
- Add the following in the Variables tab of the Railway console:
- - `NODE_ENV=production`
- - `PORT` (set automatically by Railway; no manual configuration needed)
-
- ### 4. Resource Configuration
- Recommended configuration:
- - CPU: 1 vCPU
- - Memory: 1GB RAM
-
- These settings are preset in `railway.toml`.
-
- ### 5. Custom Domain (optional)
- Under Settings > Domains in the Railway console you can:
- - Use the free subdomain provided by Railway
- - Bind your own domain
-
- ### 6. Monitoring and Logs
- - View deployment status in the Deployments tab of the Railway console
- - Monitor resource usage in the Metrics tab
- - Manage environment variables in the Variables tab
-
- ### Troubleshooting
- If the deployment fails, check:
- 1. Whether the Dockerfile syntax is correct
- 2. Whether the start script in package.json is correct
- 3. Whether all dependencies are installed
- 4. Whether memory usage exceeds the limit
-
- ### Post-Deployment Testing
- ```bash
- # Health check
- curl https://your-app.railway.app/
-
- # Screenshot test
- curl -X POST https://your-app.railway.app/screenshot \
-   -H "Content-Type: application/json" \
-   -d '{"url": "https://google.com"}' \
-   --output test-screenshot.jpg
 
+ ---
+ title: Page Screenshot API
+ emoji: 📸
+ colorFrom: blue
+ colorTo: green
+ sdk: docker
+ pinned: false
+ license: mit
+ ---
+
+ # Page Screenshot API
+
+ A web service that captures screenshots of web pages using Puppeteer.
+
+ ## Features
+ - Web page screenshot capture
+ - Customizable dimensions (width/height)
+ - Adjustable image quality
+ - Rate limiting for API protection
+ - CORS enabled for cross-origin requests
+
+ ## API Usage
+
+ ### POST /screenshot
+
+ Capture a screenshot of a web page.
+
+ **Request Body:**
+ ```json
+ {
+   "url": "https://example.com",
+   "width": 1920,
+   "height": 1080,
+   "quality": 80
+ }
+ ```
+
+ **Parameters:**
+ - `url` (required): The URL of the webpage to capture
+ - `width` (optional): Screenshot width in pixels (default: 1920, range: 100-4000)
+ - `height` (optional): Screenshot height in pixels (default: 1080, range: 100-4000)
+ - `quality` (optional): JPEG quality (default: 80, range: 1-100)
+
+ **Response:**
+ Returns the screenshot as a JPEG image.
+
+ ### GET /
+
+ Health check endpoint that returns API status.
+
+ ## Example Usage
+
+ ```bash
+ curl -X POST https://your-app.railway.app/screenshot \
+   -H "Content-Type: application/json" \
+   -d '{"url": "https://example.com", "width": 1280, "height": 720}' \
+   --output screenshot.jpg
+ ```
+
+ ## Rate Limiting
+
+ - 100 requests per 15 minutes per IP address
+
+ ## Deployment
+
+ This application can be deployed on various platforms:
+ - Hugging Face Spaces (Docker)
+ - Railway
+ - Render.com
+ - Vercel
+
+ For detailed deployment instructions, see `DEPLOYMENT_GUIDE.md`.
+
+ ## Railway Deployment Guide
+
+ ### 1. Prepare for Deployment
+ Make sure your project contains the following files:
+ - `Dockerfile` - container configuration
+ - `railway.toml` - Railway deployment configuration
+ - `package.json` - dependencies and start script
+
+ ### 2. Deploy to Railway
+ There are two ways to deploy to Railway:
+
+ #### Option 1: Connect via GitHub (recommended)
+ 1. Push the code to a GitHub repository
+ 2. Visit [Railway.app](https://railway.app)
+ 3. Log in and click "New Project"
+ 4. Select "Deploy from GitHub repo"
+ 5. Select your repository
+ 6. Railway automatically detects the Dockerfile and starts the deployment
+
+ #### Option 2: Use the Railway CLI
+ ```bash
+ # Install the Railway CLI
+ npm install -g @railway/cli
+
+ # Log in to Railway
+ railway login
+
+ # Initialize the project
+ railway init
+
+ # Deploy
+ railway up
+ ```
+
+ ### 3. Environment Variables
+ Add the following in the Variables tab of the Railway console:
+ - `NODE_ENV=production`
+ - `PORT` (set automatically by Railway; no manual configuration needed)
+
+ ### 4. Resource Configuration
+ Recommended configuration:
+ - CPU: 1 vCPU
+ - Memory: 1GB RAM
+
+ These settings are preset in `railway.toml`.
+
+ ### 5. Custom Domain (optional)
+ Under Settings > Domains in the Railway console you can:
+ - Use the free subdomain provided by Railway
+ - Bind your own domain
+
+ ### 6. Monitoring and Logs
+ - View deployment status in the Deployments tab of the Railway console
+ - Monitor resource usage in the Metrics tab
+ - Manage environment variables in the Variables tab
+
+ ### Troubleshooting
+ If the deployment fails, check:
+ 1. Whether the Dockerfile syntax is correct
+ 2. Whether the start script in package.json is correct
+ 3. Whether all dependencies are installed
+ 4. Whether memory usage exceeds the limit
+
+ ### Post-Deployment Testing
+ ```bash
+ # Health check
+ curl https://your-app.railway.app/
+
+ # Screenshot test
+ curl -X POST https://your-app.railway.app/screenshot \
+   -H "Content-Type: application/json" \
+   -d '{"url": "https://google.com"}' \
+   --output test-screenshot.jpg
  ```
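The README in this diff documents `POST /screenshot` as taking a JSON body and returning a JPEG. The following is a minimal client sketch of that contract, not code from the commit. It assumes Node 18+ (for the global `fetch`), and `BASE_URL`, `buildScreenshotRequest`, and `captureScreenshot` are hypothetical names introduced only for illustration.

```javascript
// Sketch of a client for the POST /screenshot endpoint described in the README.
// BASE_URL is the placeholder host from the README's curl example; replace it
// with the address of your own deployment.
const BASE_URL = 'https://your-app.railway.app';

// Build the arguments for fetch(): the JSON body carries `url` plus whichever
// of the optional `width`, `height`, and `quality` fields the caller supplies.
function buildScreenshotRequest(targetUrl, options = {}) {
  const body = { url: targetUrl };
  for (const key of ['width', 'height', 'quality']) {
    if (options[key] !== undefined) body[key] = options[key];
  }
  return {
    endpoint: `${BASE_URL}/screenshot`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    },
  };
}

// Send the request and return the JPEG bytes; any non-2xx status is an error.
async function captureScreenshot(targetUrl, options) {
  const { endpoint, init } = buildScreenshotRequest(targetUrl, options);
  const response = await fetch(endpoint, init);
  if (!response.ok) {
    throw new Error(`Screenshot request failed with status ${response.status}`);
  }
  return Buffer.from(await response.arrayBuffer());
}
```

Note that the README text and `hf-deploy/server.js` in this same commit disagree on limits: the README lists a 100-4000 pixel range and 100 requests per 15 minutes, while the server code rejects widths above 1600 and heights above 1200 and sets `max: 30`. A client targeting the HF Spaces build should probably stay within the stricter values.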
hf-deploy/Dockerfile CHANGED
@@ -1,22 +1,16 @@
- # Lightweight HF Spaces Dockerfile
+ # Minimal HF Spaces Dockerfile - fixed version
  FROM ghcr.io/puppeteer/puppeteer:21.5.2

- # Switch to the root user for installation
- USER root
-
- # Install additional fonts and dependencies
- RUN apt-get update && apt-get install -y \
-     fonts-liberation \
-     fonts-dejavu-core \
-     && rm -rf /var/lib/apt/lists/*
-
- # Set the working directory
+ # Set the working directory directly
  WORKDIR /usr/src/app

  # Copy the package files
  COPY package*.json ./

- # Install dependencies
+ # Switch to the root user to install dependencies
+ USER root
+
+ # Install Node.js dependencies
  RUN npm ci --only=production && npm cache clean --force

  # Copy the application code
@@ -28,7 +22,6 @@ USER pptruser
  # Set environment variables
  ENV NODE_ENV=production
  ENV PORT=7860
- ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/google-chrome-stable

  # Expose the port
  EXPOSE 7860
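This Dockerfile change drops `ENV PUPPETEER_EXECUTABLE_PATH`, and the `server.js` change in the same commit removes the branch that copied that variable into `executablePath`. The sketch below isolates that removed pattern as a pure function to make the before/after behavior easier to see. `buildLaunchOptions` is a hypothetical helper, not part of the commit, and the `args` list is shortened for illustration.

```javascript
// Hypothetical helper (not in the commit): derive Puppeteer launch options from
// an environment object, mirroring the branch this commit removes.
function buildLaunchOptions(env = process.env) {
  const options = {
    headless: 'new',
    args: ['--no-sandbox'], // shortened; the real handler passes more flags
  };
  // Only pin a browser binary when the variable is explicitly set. With the
  // variable unset, no executablePath is passed and Puppeteer picks its own.
  if (env.PUPPETEER_EXECUTABLE_PATH) {
    options.executablePath = env.PUPPETEER_EXECUTABLE_PATH;
  }
  return options;
}
```

With the `ENV` line gone from the image, a call such as `buildLaunchOptions({})` yields options without `executablePath`, which matches what the new `puppeteer.launch({...})` call in `server.js` passes.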
hf-deploy/server.js CHANGED
@@ -9,15 +9,15 @@ const PORT = process.env.PORT || 7860;

  // Middleware configuration - optimized for HF Spaces
  app.use(helmet({
-   contentSecurityPolicy: false // required for HF Spaces
+   contentSecurityPolicy: false
  }));
  app.use(cors());
  app.use(express.json({ limit: '10mb' }));

- // Rate limiting - adjusted for HF Spaces
+ // Rate limiting
  const limiter = rateLimit({
    windowMs: 15 * 60 * 1000,
-   max: 30, // lower the limit further
+   max: 30,
    message: {
      error: 'Too many requests, please try again later.'
    }
@@ -39,11 +39,10 @@ app.get('/', (req, res) => {
    });
  });

- // Screenshot API endpoint - enhanced error handling
+ // Screenshot API endpoint
  app.post('/screenshot', async (req, res) => {
    const { url, width = 1280, height = 720, quality = 75 } = req.body;

-   // Parameter validation
    if (!url) {
      return res.status(400).json({
        error: 'URL is required',
@@ -51,10 +50,8 @@ app.post('/screenshot', async (req, res) => {
      });
    }

-   // URL format validation
    try {
      const urlObj = new URL(url);
-     // Check the protocol
      if (!['http:', 'https:'].includes(urlObj.protocol)) {
        return res.status(400).json({
          error: 'Only HTTP and HTTPS URLs are supported'
@@ -66,7 +63,6 @@ app.post('/screenshot', async (req, res) => {
      });
    }

-   // Resolution validation - stricter limits for HF Spaces
    if (width < 100 || width > 1600 || height < 100 || height > 1200) {
      return res.status(400).json({
        error: 'Width must be 100-1600px, height must be 100-1200px for HF Spaces'
@@ -75,8 +71,10 @@ app.post('/screenshot', async (req, res) => {

    let browser;
    try {
-     // Launch the browser - HF Spaces-specific configuration
-     const browserOptions = {
+     console.log('Launching browser...');
+
+     // Optimized configuration for HF Spaces - use the Puppeteer image's default Chrome
+     browser = await puppeteer.launch({
        headless: 'new',
        args: [
          '--no-sandbox',
@@ -89,35 +87,21 @@ app.post('/screenshot', async (req, res) => {
          '--disable-extensions',
          '--disable-background-timer-throttling',
          '--disable-backgrounding-occluded-windows',
-         '--disable-renderer-backgrounding',
-         '--disable-features=TranslateUI',
-         '--disable-default-apps',
-         '--no-default-browser-check',
-         '--disable-background-networking'
+         '--disable-renderer-backgrounding'
        ]
-     };
-
-     // Use the system Chrome on HF Spaces
-     if (process.env.PUPPETEER_EXECUTABLE_PATH) {
-       browserOptions.executablePath = process.env.PUPPETEER_EXECUTABLE_PATH;
-     }
-
-     console.log('Launching browser...');
-     browser = await puppeteer.launch(browserOptions);
+     });

      const page = await browser.newPage();

-     // Set the viewport size
      await page.setViewport({
        width: parseInt(width),
        height: parseInt(height),
        deviceScaleFactor: 1
      });

-     // Set the user agent and other page options
      await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36');

-     // Block unnecessary resources to improve performance
+     // Block resources to improve performance
      await page.setRequestInterception(true);
      page.on('request', (req) => {
        const resourceType = req.resourceType();
@@ -130,18 +114,15 @@ app.post('/screenshot', async (req, res) => {

      console.log(`Navigating to: ${url}`);

-     // Navigate to the page - shorter timeout for HF Spaces
      await page.goto(url, {
-       waitUntil: 'domcontentloaded', // faster wait condition
-       timeout: 15000 // 15-second timeout
+       waitUntil: 'domcontentloaded',
+       timeout: 15000
      });

-     // Wait for the page to settle
      await page.waitForTimeout(1000);

      console.log('Taking screenshot...');

-     // Take the screenshot
      const screenshot = await page.screenshot({
        type: 'jpeg',
        quality: Math.max(10, Math.min(100, parseInt(quality))),
@@ -150,7 +131,6 @@ app.post('/screenshot', async (req, res) => {

      console.log(`Screenshot taken: ${screenshot.length} bytes`);

-     // Set the response headers
      res.set({
        'Content-Type': 'image/jpeg',
        'Content-Length': screenshot.length,
@@ -167,7 +147,6 @@ app.post('/screenshot', async (req, res) => {
        message: error.message
      };

-     // Provide a more helpful message based on the error type
      if (error.message.includes('timeout')) {
        errorResponse.suggestion = 'Try a simpler webpage or reduce timeout';
      } else if (error.message.includes('net::')) {
@@ -187,7 +166,7 @@ app.post('/screenshot', async (req, res) => {
    }
  });

- // HF Spaces demo page - improved version
+ // HF Spaces demo page
  app.get('/demo', (req, res) => {
    res.send(`
      <!DOCTYPE html>
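One detail in the handler above may be worth a follow-up: `width` and `height` are range-checked before `parseInt` is applied, and `quality` is clamped with `Math.max(10, Math.min(100, parseInt(quality)))`. If `quality` is not numeric, `parseInt` returns `NaN` and the clamp also evaluates to `NaN`. The sketch below shows one possible way to normalize these fields first. The helper names (`toInt`, `clampQuality`, `validateDimensions`) are hypothetical and not part of the commit; the bounds and defaults are copied from the handler.

```javascript
// Hypothetical helpers (not in the commit) that normalize the numeric request
// fields before they are validated or passed to Puppeteer.

// Parse a value as a base-10 integer, falling back to a default when the
// result is not a finite number (for example "abc", null, or undefined).
function toInt(value, fallback) {
  const parsed = Number.parseInt(value, 10);
  return Number.isFinite(parsed) ? parsed : fallback;
}

// Clamp JPEG quality into the 10-100 range that the handler uses.
function clampQuality(value, fallback = 75) {
  return Math.max(10, Math.min(100, toInt(value, fallback)));
}

// Apply the same bounds as the handler: width 100-1600, height 100-1200.
function validateDimensions(width, height) {
  const w = toInt(width, 1280);
  const h = toInt(height, 720);
  const ok = w >= 100 && w <= 1600 && h >= 100 && h <= 1200;
  return { ok, width: w, height: h };
}
```

In the handler, `validateDimensions(width, height)` could replace the inline comparison, with the returned `width` and `height` passed to `page.setViewport`, and `clampQuality(quality)` could replace the inline clamp. Whether to fall back to defaults or reject non-numeric input with a 400 is a design choice this sketch does not settle.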