Update app.py
app.py
CHANGED
@@ -426,6 +426,158 @@ def create_interface():
 - E-commerce product pages
 - Financial data sites (Yahoo Finance, MarketWatch)
 - Research papers and academic sites
+
+## 🧪 **Test Scenarios**
+
+### **1. News & Media Sites**
+```
+URL: https://www.bbc.com/news
+Query: Extract the top 5 news headlines with their summaries and create a table with columns: Headline, Category, Summary
+```
+
+```
+URL: https://edition.cnn.com
+Query: Find all breaking news items and organize them by topic/region in a structured format
+```
+
+### **2. Financial Data Sites**
+```
+URL: https://finance.yahoo.com/quote/AAPL
+Query: Extract Apple stock information including current price, daily change, market cap, and any financial metrics into a summary table
+```
+
+```
+URL: https://www.marketwatch.com/investing/stock/tsla
+Query: Create a table with Tesla's key financial metrics: price, change, volume, market cap, P/E ratio
+```
+
+### **3. E-commerce & Product Pages**
+```
+URL: https://www.amazon.com/dp/B08N5WRWNW
+Query: Extract product details including name, price, ratings, key features, and specifications in a structured format
+```
+
+```
+URL: https://www.ebay.com/itm/123456789
+Query: Extract item details, price, seller information, and shipping details into a comparison-ready table
+```
+
+### **4. Educational & Reference Sites**
+```
+URL: https://en.wikipedia.org/wiki/Artificial_intelligence
+Query: Extract the main definition, history timeline, and applications of AI. Create separate sections for each topic.
+```
+
+```
+URL: https://en.wikipedia.org/wiki/List_of_countries_by_population
+Query: Extract the population data table and create a new table showing top 10 most populous countries with their population and growth rate
+```
+
+### **5. Government & Official Statistics**
+```
+URL: https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports
+Query: Extract the latest COVID-19 statistics and create a summary table with key global figures
+```
+
+```
+URL: https://www.census.gov/quickfacts
+Query: Extract key demographic statistics for the United States and organize them into categories: Population, Economy, Geography
+```
+
+### **6. Technology & Business News**
+```
+URL: https://techcrunch.com
+Query: Find the latest startup funding news and create a table with: Company Name, Funding Amount, Investors, Industry
+```
+
+```
+URL: https://www.reuters.com/technology
+Query: Extract top technology news and summarize each story in 2-3 sentences with key points
+```
+
+### **7. Scientific & Research Sites**
+```
+URL: https://www.nature.com/articles
+Query: Extract recent scientific article titles, authors, and abstracts. Create a summary table organized by research field
+```
+
+```
+URL: https://pubmed.ncbi.nlm.nih.gov/trending
+Query: Find trending medical research topics and create a list with brief descriptions of each study's findings
+```
+
+### **8. Sports & Entertainment**
+```
+URL: https://www.espn.com/nba/standings
+Query: Extract NBA team standings and create a table with: Team, Wins, Losses, Win Percentage, Conference Position
+```
+
+```
+URL: https://www.imdb.com/chart/top
+Query: Extract the top 10 movies from IMDb's top 250 list with ratings, year, and brief description
+```
+
+### **9. Weather & Environmental Data**
+```
+URL: https://weather.com/weather/today
+Query: Extract current weather conditions and forecast data. Create a summary with temperature, conditions, and weekly outlook
+```
+
+### **10. Real Estate & Property**
+```
+URL: https://www.zillow.com/homes/for_sale
+Query: Extract property listings with prices, locations, square footage, and key features into a comparison table
+```
+
+## 🎯 **Quick Test Samples (Copy & Paste Ready)**
+
+### **Simple Test:**
+```
+URL: https://httpbin.org/html
+Query: Extract all text content and identify the page structure
+```
+
+### **Table Extraction Test:**
+```
+URL: https://www.w3schools.com/html/html_tables.asp
+Query: Find all HTML tables on this page and convert them to a structured format with proper headers
+```
+
+### **Complex Analysis Test:**
+```
+URL: https://www.sec.gov/edgar/browse/?CIK=320193
+Query: Extract Apple Inc.'s recent SEC filings and create a table with: Filing Date, Document Type, Description
+```
+
+### **International Site Test:**
+```
+URL: https://www.bbc.co.uk/weather
+Query: Extract UK weather information and create a regional breakdown of current conditions
+```
+
+## **Testing Tips:**
+
+1. **Start Simple**: Begin with basic sites like Wikipedia or news sites
+2. **Test Error Handling**: Try invalid URLs to see error messages
+3. **Check Timeouts**: Use slow-loading sites to test timeout handling
+4. **Verify Tables**: Test sites with different table structures
+5. **Content Variety**: Try different content types (news, data, products)
+
+## 🚨 **Sites That May Have Issues:**
+- Social media sites (require login)
+- Sites with heavy JavaScript (may have limited content)
+- Sites with aggressive bot protection
+- Password-protected pages
+
+## ✅ **Reliable Test Sites:**
+- Wikipedia (excellent for tables and structured content)
+- BBC News (good for text extraction)
+- Government sites (.gov domains)
+- W3Schools (great for HTML table testing)
+- HttpBin (perfect for testing basic functionality)
+
+Start with the simpler tests and gradually move to more complex scenarios to fully evaluate your tool's capabilities!
+
 """)
 
 # Event handlers
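The tips added in this change ("Start Simple", "Test Error Handling: Try invalid URLs") could be exercised with a small pre-flight check before samples are pasted into the interface. The following is a minimal sketch and not part of the commit: `Sample`, `QUICK_SAMPLES`, and `is_fetchable_url` are hypothetical names, the check only validates URL syntax with the standard library, and it makes no network requests, so it says nothing about how app.py itself reports errors or timeouts.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Sample:
    """One URL/query pair, mirroring the samples added in the diff."""
    url: str
    query: str


# Two of the quick samples from the diff, copied verbatim.
QUICK_SAMPLES = [
    Sample(
        "https://httpbin.org/html",
        "Extract all text content and identify the page structure",
    ),
    Sample(
        "https://www.w3schools.com/html/html_tables.asp",
        "Find all HTML tables on this page and convert them to a "
        "structured format with proper headers",
    ),
]


def is_fetchable_url(url: str) -> bool:
    """Return True if url is syntactically an absolute http(s) URL.

    This is a syntax check only (an assumption of this sketch); it does
    not confirm that the host exists or that the page will load.
    """
    try:
        parsed = urlparse(url.strip())
    except ValueError:
        # urlparse rejects some malformed inputs, e.g. unbalanced brackets.
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


if __name__ == "__main__":
    # Valid samples first, then deliberately invalid inputs for the
    # "Test Error Handling" tip.
    candidates = [s.url for s in QUICK_SAMPLES] + ["not a url", "https://"]
    for url in candidates:
        print(f"{url!r}: {is_fetchable_url(url)}")
```

Inputs that fail this check are reasonable candidates for the invalid-URL test, while inputs that pass still need a real request to confirm the tool's timeout and bot-protection behaviour.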