shukdevdatta123 committed on
Commit
982639c
·
verified ·
1 Parent(s): c01e49c

Update app.py

  1. app.py +152 -0
app.py CHANGED
@@ -426,6 +426,158 @@ def create_interface():
  - E-commerce product pages
  - Financial data sites (Yahoo Finance, MarketWatch)
  - Research papers and academic sites
+
+ ## 🧪 **Test Scenarios**
+
+ ### **1. News & Media Sites**
+ ```
+ URL: https://www.bbc.com/news
+ Query: Extract the top 5 news headlines with their summaries and create a table with columns: Headline, Category, Summary
+ ```
+
+ ```
+ URL: https://edition.cnn.com
+ Query: Find all breaking news items and organize them by topic/region in a structured format
+ ```
+
+ ### **2. Financial Data Sites**
+ ```
+ URL: https://finance.yahoo.com/quote/AAPL
+ Query: Extract Apple stock information including current price, daily change, market cap, and any financial metrics into a summary table
+ ```
+
+ ```
+ URL: https://www.marketwatch.com/investing/stock/tsla
+ Query: Create a table with Tesla's key financial metrics: price, change, volume, market cap, P/E ratio
+ ```
+
+ ### **3. E-commerce & Product Pages**
+ ```
+ URL: https://www.amazon.com/dp/B08N5WRWNW
+ Query: Extract product details including name, price, ratings, key features, and specifications in a structured format
+ ```
+
+ ```
+ URL: https://www.ebay.com/itm/123456789
+ Query: Extract item details, price, seller information, and shipping details into a comparison-ready table
+ ```
+
+ ### **4. Educational & Reference Sites**
+ ```
+ URL: https://en.wikipedia.org/wiki/Artificial_intelligence
+ Query: Extract the main definition, history timeline, and applications of AI. Create separate sections for each topic.
+ ```
+
+ ```
+ URL: https://en.wikipedia.org/wiki/List_of_countries_by_population
+ Query: Extract the population data table and create a new table showing top 10 most populous countries with their population and growth rate
+ ```
+
+ ### **5. Government & Official Statistics**
+ ```
+ URL: https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports
+ Query: Extract the latest COVID-19 statistics and create a summary table with key global figures
+ ```
+
+ ```
+ URL: https://www.census.gov/quickfacts
+ Query: Extract key demographic statistics for the United States and organize them into categories: Population, Economy, Geography
+ ```
+
+ ### **6. Technology & Business News**
+ ```
+ URL: https://techcrunch.com
+ Query: Find the latest startup funding news and create a table with: Company Name, Funding Amount, Investors, Industry
+ ```
+
+ ```
+ URL: https://www.reuters.com/technology
+ Query: Extract top technology news and summarize each story in 2-3 sentences with key points
+ ```
+
+ ### **7. Scientific & Research Sites**
+ ```
+ URL: https://www.nature.com/articles
+ Query: Extract recent scientific article titles, authors, and abstracts. Create a summary table organized by research field
+ ```
+
+ ```
+ URL: https://pubmed.ncbi.nlm.nih.gov/trending
+ Query: Find trending medical research topics and create a list with brief descriptions of each study's findings
+ ```
+
+ ### **8. Sports & Entertainment**
+ ```
+ URL: https://www.espn.com/nba/standings
+ Query: Extract NBA team standings and create a table with: Team, Wins, Losses, Win Percentage, Conference Position
+ ```
+
+ ```
+ URL: https://www.imdb.com/chart/top
+ Query: Extract the top 10 movies from IMDb's top 250 list with ratings, year, and brief description
+ ```
+
+ ### **9. Weather & Environmental Data**
+ ```
+ URL: https://weather.com/weather/today
+ Query: Extract current weather conditions and forecast data. Create a summary with temperature, conditions, and weekly outlook
+ ```
+
+ ### **10. Real Estate & Property**
+ ```
+ URL: https://www.zillow.com/homes/for_sale
+ Query: Extract property listings with prices, locations, square footage, and key features into a comparison table
+ ```
+
+ ## 🎯 **Quick Test Samples (Copy & Paste Ready)**
+
+ ### **Simple Test:**
+ ```
+ URL: https://httpbin.org/html
+ Query: Extract all text content and identify the page structure
+ ```
+
+ ### **Table Extraction Test:**
+ ```
+ URL: https://www.w3schools.com/html/html_tables.asp
+ Query: Find all HTML tables on this page and convert them to a structured format with proper headers
+ ```
+
+ ### **Complex Analysis Test:**
+ ```
+ URL: https://www.sec.gov/edgar/browse/?CIK=320193
+ Query: Extract Apple Inc.'s recent SEC filings and create a table with: Filing Date, Document Type, Description
+ ```
+
+ ### **International Site Test:**
+ ```
+ URL: https://www.bbc.co.uk/weather
+ Query: Extract UK weather information and create a regional breakdown of current conditions
+ ```
+
+ ## 🔍 **Testing Tips:**
+
+ 1. **Start Simple**: Begin with basic sites like Wikipedia or news sites
+ 2. **Test Error Handling**: Try invalid URLs to see error messages
+ 3. **Check Timeouts**: Use slow-loading sites to test timeout handling
+ 4. **Verify Tables**: Test sites with different table structures
+ 5. **Content Variety**: Try different content types (news, data, products)
+
+ ## 🚨 **Sites That May Have Issues:**
+ - Social media sites (require login)
+ - Sites with heavy JavaScript (may have limited content)
+ - Sites with aggressive bot protection
+ - Password-protected pages
+
+ ## ✅ **Reliable Test Sites:**
+ - Wikipedia (excellent for tables and structured content)
+ - BBC News (good for text extraction)
+ - Government sites (.gov domains)
+ - W3Schools (great for HTML table testing)
+ - HttpBin (perfect for testing basic functionality)
+
+ Start with the simpler tests and gradually move to more complex scenarios to fully evaluate your tool's capabilities!
+
  """)

  # Event handlers
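
Review note: the scenarios added in this commit all follow the same `URL:` / `Query:` two-line shape, so they could be pulled back out of the Markdown string and fed to the tool as a smoke test. The following is only a minimal sketch of that idea, not code from `app.py`. The names `parse_scenarios`, `is_fetchable_url`, and `SAMPLE` are hypothetical, and the URL check is a purely offline sanity check (scheme and host present), in the spirit of the "Test Error Handling" tip.

```python
import re
from urllib.parse import urlparse

# Hypothetical helpers for illustration; none of these names exist in app.py.
# Matches a "URL: ..." line immediately followed by a "Query: ..." line,
# tolerating leading indentation inside the Markdown string.
SCENARIO_RE = re.compile(
    r"^[ \t]*URL:[ \t]*(?P<url>\S+)[ \t]*\n[ \t]*Query:[ \t]*(?P<query>.+)$",
    re.MULTILINE,
)


def parse_scenarios(markdown: str) -> list[tuple[str, str]]:
    """Return (url, query) pairs for every URL/Query block in the text."""
    return [(m["url"], m["query"].strip()) for m in SCENARIO_RE.finditer(markdown)]


def is_fetchable_url(url: str) -> bool:
    """Offline check only: an http(s) scheme and a non-empty host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Two of the quick test samples from the commit, without the code fences.
SAMPLE = """\
### **Simple Test:**
URL: https://httpbin.org/html
Query: Extract all text content and identify the page structure

### **Table Extraction Test:**
URL: https://www.w3schools.com/html/html_tables.asp
Query: Find all HTML tables on this page and convert them to a structured format with proper headers
"""

if __name__ == "__main__":
    for url, query in parse_scenarios(SAMPLE):
        status = "ok" if is_fetchable_url(url) else "invalid"
        print(f"[{status}] {url} -> {query}")
```

Keeping the parser separate from any network call means the scenario list can be validated in CI without hitting the listed sites, several of which the commit itself flags as likely to block automated access.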