ghostai1 committed on
Commit cf0a3bb · verified · 1 Parent(s): 6ff6074

Update README.md

Files changed (1): README.md +111 -0
@@ -12,3 +12,114 @@ short_description: Cleans Data for Sagemaker/Azure Training
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Call Center Data Analysis

A data analysis tool for call center logs, running on the Hugging Face Spaces free tier. This demo performs after-the-fact analysis of call center data: it cleans the data, visualizes key statistics, and exports the results for downstream AI modeling in SageMaker or Azure AI. It draws on over 5 years of AI expertise and focuses on a real-world problem in enterprise CX workflows: mitigating junk data.

Features

Data Parsing and Cleaning: Processes large call center CSVs by removing rows with null values, duplicate rows, short entries, malformed queries, and invalid timestamps, preserving data integrity.

Statistical Visualization: Generates plots of the call duration distribution, satisfaction scores by agent, and query frequency by language using Matplotlib and Seaborn.

Export Options: Provides a downloadable cleaned CSV for SageMaker/Azure AI modeling and a PDF report summarizing data quality and statistics.

Gradio-Powered Interface: A responsive, dark-themed UI for viewing raw data, cleanup statistics, and visualizations, optimized for enterprise workflows.
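
The cleanup rules above can be sketched in a few lines. This is a minimal illustration, not the Space's actual app.py: it uses the standard-library csv module instead of Pandas so it runs without dependencies, and the column names (call_id, agent, language, query, timestamp), the 5-character minimum query length, the "no letters means malformed" rule, and the ISO 8601 timestamp format are all assumptions, since the README does not specify them.

```python
import csv
import io
import re
from datetime import datetime

# Hypothetical schema: the README does not publish the CSV's column names.
REQUIRED = ("call_id", "agent", "language", "query", "timestamp")
MIN_QUERY_LEN = 5                     # assumed threshold for a "short entry"
HAS_LETTER = re.compile(r"[A-Za-z]")  # assumed test for a "malformed query"


def is_valid_timestamp(value: str) -> bool:
    """Accept ISO 8601 timestamps only (an assumption about the log format)."""
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False


def clean_rows(rows):
    """Apply the cleanup rules in order and count how many rows each rule removes."""
    stats = {"nulls": 0, "duplicates": 0, "short": 0, "malformed": 0, "bad_timestamp": 0}
    seen = set()
    kept = []
    for row in rows:
        # Null check: any required field missing or blank.
        if any(not (row.get(col) or "").strip() for col in REQUIRED):
            stats["nulls"] += 1
            continue
        # Duplicate check: same values in all required columns (assumption).
        key = tuple(row[col].strip() for col in REQUIRED)
        if key in seen:
            stats["duplicates"] += 1
            continue
        seen.add(key)
        query = row["query"].strip()
        if len(query) < MIN_QUERY_LEN:
            stats["short"] += 1
            continue
        if not HAS_LETTER.search(query):
            stats["malformed"] += 1
            continue
        if not is_valid_timestamp(row["timestamp"].strip()):
            stats["bad_timestamp"] += 1
            continue
        kept.append(row)
    return kept, stats


# Usage with a tiny in-memory sample (illustrative data, not real logs).
sample = """call_id,agent,language,query,timestamp
1,ana,en,Reset my password please,2024-05-01T10:00:00
1,ana,en,Reset my password please,2024-05-01T10:00:00
2,ben,es,,2024-05-01T10:05:00
3,ben,es,hi,2024-05-01T10:06:00
4,cai,en,#### ????,2024-05-01T10:07:00
5,cai,en,Billing question about invoice,not-a-date
"""
kept, stats = clean_rows(csv.DictReader(io.StringIO(sample)))
print(len(kept), stats)
# 1 {'nulls': 1, 'duplicates': 1, 'short': 1, 'malformed': 1, 'bad_timestamp': 1}
```

Each rule runs in the order the README lists it, and every rejected row is counted exactly once, which is the kind of per-rule tally a cleanup-statistics panel could display. In Pandas, the same steps map roughly onto `dropna`, `drop_duplicates`, and `pd.to_datetime(..., errors="coerce")`.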

Setup

Clone this repository to a Hugging Face Space (free tier, public visibility).

Upload your call_center_logs.csv to the Space.

Populate requirements.txt with the dependencies listed under Core Stack, keeping them compatible with Python 3.9+ and CPU-only execution.

Deploy app.py and launch the Space with the Gradio SDK.
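
Before launching, it can help to confirm that the runtime and the uploaded CSV match what the app expects. The sketch below is hypothetical: `preflight` and `EXPECTED_COLUMNS` are illustrative names, and the column list is a placeholder because the README does not document the CSV schema.

```python
import csv
import sys
from pathlib import Path

# Hypothetical placeholder: the README does not list the CSV's columns.
EXPECTED_COLUMNS = {"call_id", "agent", "language", "query", "timestamp"}


def preflight(csv_path: str) -> list:
    """Return a list of problems; an empty list means the Space looks ready to launch."""
    problems = []
    if sys.version_info < (3, 9):
        problems.append("Python 3.9+ is required")
    path = Path(csv_path)
    if not path.is_file():
        return problems + [f"{csv_path} not found"]
    # Read only the header row; the rest of the file is not loaded.
    with path.open(newline="", encoding="utf-8") as f:
        header = next(csv.reader(f), [])
    missing = EXPECTED_COLUMNS - {h.strip() for h in header}
    if missing:
        problems.append("missing columns: " + ", ".join(sorted(missing)))
    return problems


if __name__ == "__main__":
    issues = preflight("call_center_logs.csv")
    print("ready" if not issues else "\n".join(issues))
```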

Usage

Click the "Analyze Data" button to process the call center logs.

View the raw data (first 50 rows), cleanup statistics, and statistical plots.

Download the cleaned CSV (cleaned_call_center_logs.csv) for SageMaker/Azure AI modeling.

Download the PDF report (data_analysis_report.pdf) summarizing the analysis.
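
The raw-data preview and the CSV download in the steps above amount to slicing the rows and writing the cleaned rows back out under the file name the README gives. A minimal sketch, assuming the cleaned data is a list of dicts; `preview` and `export_cleaned` are hypothetical helpers, and handing the returned path to a Gradio file output is an assumption about how app.py is wired, not something the README states. The PDF report (ReportLab) is not covered here.

```python
import csv
from pathlib import Path

PREVIEW_ROWS = 50  # the README shows the first 50 rows of raw data


def preview(rows, n=PREVIEW_ROWS):
    """Return the first n rows, as displayed in the raw-data view."""
    return rows[:n]


def export_cleaned(rows, out_dir=".", filename="cleaned_call_center_logs.csv"):
    """Write cleaned rows (a list of dicts) to CSV and return the file path.

    Assumption: app.py would pass this path to a Gradio file output for download.
    """
    path = Path(out_dir) / filename
    # Use the first row's keys as the header; all rows are assumed to share them.
    fieldnames = list(rows[0].keys()) if rows else []
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return str(path)
```

Writing with `newline=""` follows the csv module's guidance for files it writes, so downstream readers see consistent line endings.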

Technical Architecture

Core Stack:

Python 3.9+: Foundation for data processing and analysis.

Pandas: High-performance CSV parsing and data cleaning.

Matplotlib/Seaborn: Statistical visualization of call center metrics.

Gradio: Interactive UI for data analysis and export.

ReportLab/Pillow: PDF report generation with embedded plots.

Free Tier Optimization: Designed for CPU-only execution with a minimal memory footprint.

Extensibility: The cleaned CSV is structured for SageMaker (e.g., BERT-based intent classification) and Azure AI (e.g., custom ML models).
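
The free-tier goal and the visualization layer fit together: the three plots only need small aggregates, so the file can be streamed once instead of loaded whole. The following sketch uses only the standard library to compute a binned duration distribution, mean satisfaction per agent, and query counts per language in a single pass. The column names (duration_sec, satisfaction, agent, language) and the 60-second bin width are assumptions; with Pandas, a comparable effect comes from `pd.read_csv(..., chunksize=...)`.

```python
import csv
import io
from collections import Counter, defaultdict


def stream_rows(f):
    """Yield one row at a time so the whole file never sits in memory."""
    yield from csv.DictReader(f)


def aggregate(rows, bin_width=60):
    """One pass over the rows, producing the data behind the three plots.

    Hypothetical columns: duration_sec, satisfaction, agent, language.
    """
    duration_bins = Counter()     # call duration distribution (bin start -> count)
    sat_sum = defaultdict(float)  # satisfaction totals per agent
    sat_n = Counter()             # number of scored calls per agent
    languages = Counter()         # query frequency by language
    for row in rows:
        try:
            duration = float(row["duration_sec"])
            score = float(row["satisfaction"])
        except (KeyError, ValueError):
            continue  # skip rows the cleaning step should have removed
        duration_bins[int(duration // bin_width) * bin_width] += 1
        sat_sum[row["agent"]] += score
        sat_n[row["agent"]] += 1
        languages[row["language"]] += 1
    mean_sat = {agent: sat_sum[agent] / sat_n[agent] for agent in sat_n}
    return dict(duration_bins), mean_sat, dict(languages)


# Usage with an in-memory stand-in for call_center_logs.csv (illustrative data).
sample = io.StringIO(
    "agent,language,duration_sec,satisfaction\n"
    "ana,en,45,4\n"
    "ana,en,130,2\n"
    "ben,es,75,5\n"
)
bins, mean_sat, langs = aggregate(stream_rows(sample))
print(bins)      # {0: 1, 120: 1, 60: 1}
print(mean_sat)  # {'ana': 3.0, 'ben': 5.0}
print(langs)     # {'en': 2, 'es': 1}
```

The returned dictionaries stay small regardless of input size, so they can be passed to Matplotlib or Seaborn for plotting without holding the full CSV in memory.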

Purpose

This Space demonstrates proficiency in after-the-fact data analysis for call center environments: it addresses junk data and prepares clean data for AI modeling, in line with enterprise CX needs.