---
title: Callytics Demo
emoji: 🚀
colorFrom: green
colorTo: purple
sdk: gradio
sdk_version: 5.23.1
app_file: app.py
pinned: false
license: gpl-3.0
short_description: Callytics Demo
---
<div align="center">
<img src=".docs/img/CallyticsIcon.png" alt="CallyticsLogo" width="200">
[LinkedIn](https://linkedin.com/in/bunyaminergen)

# Callytics

`Callytics` is an advanced call analytics solution that leverages speech recognition and large language model (LLM)
technologies to analyze phone conversations from customer service and call centers. By processing both the audio and
the transcript of each call, it provides insights such as sentiment analysis, topic detection, conflict detection,
profanity detection, and summarization. These capabilities help businesses optimize customer interactions, identify
areas for improvement, and enhance overall service quality.
When an audio file is placed in the `.data/input` directory, the entire pipeline runs automatically, and the resulting
data is inserted into the database. A minimal sketch of this watch-folder pattern is shown below.
**Note**: _This is version `v1.1.0`; many new features will be added, models will be fine-tuned or trained from
scratch, and various optimization efforts will be applied. For more information, check out the
[Upcoming](#upcoming) section._

**Note**: _If you would like to contribute to this repository, please read
[CONTRIBUTING](.docs/documentation/CONTRIBUTING.md) first._
</div>
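
In the repository, running the pipeline as a service is handled by the shipped systemd unit
(`automation/service/callytics.service`). As an illustration only, here is a minimal sketch of the watch-folder idea
using the third-party `watchdog` package; `run_pipeline` and the suffix allow-list are hypothetical placeholders, not
the project's actual entry point:

```python
# Illustrative watch-folder sketch, NOT the project's actual entry point.
# Requires the third-party package: pip install watchdog
import time
from pathlib import Path

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

AUDIO_SUFFIXES = {".mp3", ".wav", ".flac"}  # hypothetical allow-list


def run_pipeline(audio_path: Path) -> None:
    """Hypothetical placeholder for the actual Callytics pipeline call."""
    print(f"Processing {audio_path} ...")


class InputHandler(FileSystemEventHandler):
    def on_created(self, event):
        path = Path(str(event.src_path))
        if not event.is_directory and path.suffix.lower() in AUDIO_SUFFIXES:
            run_pipeline(path)


if __name__ == "__main__":
    observer = Observer()
    observer.schedule(InputHandler(), path=".data/input", recursive=False)
    observer.start()
    try:
        while True:
            time.sleep(1)  # keep the process alive until Ctrl+C
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
```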
---

### Table of Contents

- [Prerequisites](#prerequisites)
- [Architecture](#architecture)
- [Math And Algorithm](#math-and-algorithm)
- [Features](#features)
- [Demo](#demo)
- [Installation](#installation)
- [File Structure](#file-structure)
- [Database Structure](#database-structure)
- [Datasets](#datasets)
- [Version Control System](#version-control-system)
- [Upcoming](#upcoming)
- [Documentations](#documentations)
- [License](#licence)
- [Links](#links)
- [Team](#team)
- [Contact](#contact)
- [Citation](#citation)
---

### Prerequisites

##### General

- `Python 3.11` _(or above)_

##### Llama

- `GPU (min 24GB)`
- `Hugging Face Credentials (Account, Token)`
- `Llama-3.2-11B-Vision-Instruct` _(or above)_

##### OpenAI

- `GPU (min 12GB)` _(for other processes such as `faster-whisper` & `NeMo`)_
- At least one of the following is required:
    - `OpenAI Credentials (Account, API Key)`
    - `Azure OpenAI Credentials (Account, API Key, API Base URL)`

---
### Architecture



---
### Math and Algorithm

This section describes the mathematical models and algorithms used in the project.

_**Note**: Only the mathematical concepts and algorithms specific to this repository, rather than the models used, are
covered in this section. Please refer to the `RESOURCES` under the [Documentations](#documentations) section for the
repositories and models utilized or referenced._
##### Silence Duration Calculation

Silence durations are derived from the time intervals between consecutive speech segments. Let

$$S = \{s_1, s_2, \ldots, s_n\}$$

represent _the set of silence durations (in seconds)_ between consecutive speech segments, and let

$$\text{factor} \in \mathbb{R}^{+}$$

be **a user-defined factor**. To determine a threshold that distinguishes _significant_ silence from trivial gaps, two
statistical methods can be applied:
**1. Standard Deviation-Based Threshold**

- _Mean_:

$$\mu = \frac{1}{n}\sum_{i=1}^{n}s_i$$

- _Standard Deviation_:

$$
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(s_i - \mu)^2}
$$

- _Threshold_:

$$
T_{\text{std}} = \sigma \cdot \text{factor}
$$
**2. Median + Interquartile Range (IQR) Threshold**

- _Median_:

_Let:_

$$ S = \{s_{(1)} \leq s_{(2)} \leq \cdots \leq s_{(n)}\} $$

be an ordered set.

_Then:_

$$
M = \text{median}(S) =
\begin{cases}
s_{(\frac{n+1}{2})}, & \text{if } n \text{ is odd}, \\[6pt]
\frac{s_{(\frac{n}{2})} + s_{(\frac{n}{2}+1)}}{2}, & \text{if } n \text{ is even}.
\end{cases}
$$

- _Quartiles_:

$$
Q_1 = s_{(\lfloor 0.25n \rfloor)}, \quad Q_3 = s_{(\lfloor 0.75n \rfloor)}
$$

- _IQR_:

$$
\text{IQR} = Q_3 - Q_1
$$

- _Threshold_:

$$
T_{\text{median\_iqr}} = M + (\text{IQR} \cdot \text{factor})
$$
**Total Silence Above Threshold**

Once the threshold $$T$$ (either $$T_{\text{std}}$$ or $$T_{\text{median\_iqr}}$$) is defined, we sum only those
silence durations that meet or exceed it:

$$
\text{TotalSilence} = \sum_{i=1}^{n} s_i \cdot \mathbf{1}(s_i \geq T)
$$

where $$\mathbf{1}(s_i \geq T)$$ is an indicator function defined as:

$$
\mathbf{1}(s_i \geq T) =
\begin{cases}
1 & \text{if } s_i \geq T, \\
0 & \text{otherwise}.
\end{cases}
$$
**Summary:**

- **Identify the silence durations:**

$$
S = \{s_1, s_2, \ldots, s_n\}
$$

- **Determine the threshold using either:**

_Standard deviation-based:_

$$
T = \sigma \cdot \text{factor}
$$

_Median+IQR-based:_

$$
T = M + (\text{IQR} \cdot \text{factor})
$$

- **Compute the total silence above this threshold:**

$$
\text{TotalSilence} = \sum_{i=1}^{n} s_i \cdot \mathbf{1}(s_i \geq T)
$$
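
For concreteness, here is a minimal Python sketch of the formulas above. The function names and the toy data are
illustrative rather than the project's actual API, and `statistics.quantiles` uses a slightly different quartile
convention than the floor-index definition given earlier:

```python
# Illustrative implementation of the silence-threshold math above.
import statistics


def std_threshold(silences: list[float], factor: float) -> float:
    """T_std = sigma * factor, with population standard deviation."""
    return statistics.pstdev(silences) * factor


def median_iqr_threshold(silences: list[float], factor: float) -> float:
    """T_median_iqr = median + IQR * factor."""
    q1, _, q3 = statistics.quantiles(silences, n=4)  # quartile cut points
    return statistics.median(silences) + (q3 - q1) * factor


def total_silence(silences: list[float], threshold: float) -> float:
    """Sum of the durations s_i with s_i >= threshold (the indicator sum)."""
    return sum(s for s in silences if s >= threshold)


silences = [0.2, 0.3, 0.25, 1.8, 0.4, 2.5]  # toy gaps between segments, seconds
T = std_threshold(silences, factor=1.0)
print(f"T_std = {T:.3f}s -> TotalSilence = {total_silence(silences, T):.1f}s")
```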
---

### Features

- [x] Speech Enhancement
- [x] Sentiment Analysis
- [x] Profanity Word Detection
- [x] Summary
- [x] Conflict Detection
- [x] Topic Detection
---

### Demo


---

### Installation

##### Linux/Ubuntu

```bash
sudo apt update -y && sudo apt upgrade -y
```

```bash
sudo apt install -y ffmpeg build-essential g++
```

```bash
git clone https://github.com/bunyaminergen/Callytics
```

```bash
cd Callytics
```

```bash
conda env create -f environment.yaml
```

```bash
conda activate Callytics
```
##### Environment

`.env` file sample:

```Text
# CREDENTIALS
# OPENAI
OPENAI_API_KEY=

# HUGGINGFACE
HUGGINGFACE_TOKEN=

# AZURE OPENAI
AZURE_OPENAI_API_KEY=
AZURE_OPENAI_API_BASE=
AZURE_OPENAI_API_VERSION=

# DATABASE
DB_NAME=
DB_USER=
DB_PASSWORD=
DB_HOST=
DB_PORT=
DB_URL=
```
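
As a sketch, these variables can be loaded in Python with the third-party `python-dotenv` package (the project may
wire up configuration differently):

```python
# Minimal .env loading sketch (assumes: pip install python-dotenv).
import os

from dotenv import load_dotenv

load_dotenv()  # reads the .env file from the current working directory

openai_key = os.getenv("OPENAI_API_KEY")
azure_key = os.getenv("AZURE_OPENAI_API_KEY")
db_url = os.getenv("DB_URL")

# At least one LLM credential must be present (see Prerequisites).
if not openai_key and not azure_key:
    raise RuntimeError("Set OPENAI_API_KEY or the AZURE_OPENAI_* credentials.")
```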
---

##### Database

_This section provides an example database and tables with a simple, well-structured design. If you create the tables
and columns with the same structure in your remote database, the code will run without errors. However, if you want to
change the database structure, you will also need to refactor the code._

**Note**: _Refer to the [Database Structure](#database-structure) section for the database schema and tables._

```bash
sqlite3 .db/Callytics.sqlite < src/db/sql/Schema.sql
```
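
To verify that the schema loaded, a quick check with the standard-library `sqlite3` module (the table names printed
depend entirely on `Schema.sql`):

```python
# Sanity check: list the tables created by Schema.sql.
import sqlite3

with sqlite3.connect(".db/Callytics.sqlite") as conn:
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()

print([name for (name,) in rows])
```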
##### Grafana

_This section explains how to install `Grafana` in your `local` environment. Since Grafana is a third-party
open-source monitoring application, you must handle its installation yourself and connect your database.
Alternatively, you can use `Grafana Cloud` instead of a local installation._
```bash
sudo apt update -y && sudo apt upgrade -y
```

```bash
sudo apt install -y apt-transport-https software-properties-common wget
```

```bash
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
```

```bash
echo "deb https://packages.grafana.com/oss/deb stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
```

```bash
sudo apt update -y
```

```bash
sudo apt install -y grafana
```

```bash
sudo systemctl start grafana-server
sudo systemctl enable grafana-server
sudo systemctl daemon-reload
```
Grafana should now be reachable in your browser at `http://localhost:3000`.
**SQLite Plugin**

```bash
sudo grafana-cli plugins install frser-sqlite-datasource
```

```bash
sudo systemctl restart grafana-server
```

```bash
sudo systemctl daemon-reload
```
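
Instead of adding the datasource through the Grafana UI, it can also be provisioned from a file. The snippet below is
a hedged sketch of Grafana's datasource provisioning format for the `frser-sqlite-datasource` plugin; the file name
and database path are placeholders, so verify the keys against the plugin's documentation before relying on it:

```yaml
# /etc/grafana/provisioning/datasources/callytics.yaml
# Sketch only: verify against the frser-sqlite-datasource plugin docs.
apiVersion: 1

datasources:
  - name: Callytics
    type: frser-sqlite-datasource
    jsonData:
      # Absolute path to the SQLite file created from Schema.sql
      path: /path/to/Callytics/.db/Callytics.sqlite
```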
---

### File Structure

```Text
.
├── automation
│   └── service
│       └── callytics.service
├── config
│   ├── config.yaml
│   ├── nemo
│   │   └── diar_infer_telephonic.yaml
│   └── prompt.yaml
├── .data
│   ├── example
│   │   └── LogisticsCallCenterConversation.mp3
│   └── input
├── .db
│   └── Callytics.sqlite
├── .docs
│   ├── documentation
│   │   ├── CONTRIBUTING.md
│   │   └── RESOURCES.md
│   └── img
│       ├── Callytics.drawio
│       ├── Callytics.gif
│       ├── CallyticsIcon.png
│       ├── Callytics.png
│       ├── Callytics.svg
│       └── database.png
├── .env
├── environment.yaml
├── .gitattributes
├── .github
│   └── CODEOWNERS
├── .gitignore
├── LICENSE
├── main.py
├── README.md
├── requirements.txt
└── src
    ├── audio
    │   ├── alignment.py
    │   ├── analysis.py
    │   ├── effect.py
    │   ├── error.py
    │   ├── io.py
    │   ├── metrics.py
    │   ├── preprocessing.py
    │   ├── processing.py
    │   └── utils.py
    ├── db
    │   ├── manager.py
    │   └── sql
    │       ├── AudioPropertiesInsert.sql
    │       ├── Schema.sql
    │       ├── TopicFetch.sql
    │       ├── TopicInsert.sql
    │       └── UtteranceInsert.sql
    ├── text
    │   ├── llm.py
    │   ├── model.py
    │   ├── prompt.py
    │   └── utils.py
    └── utils
        └── utils.py

19 directories, 43 files
```
---

### Database Structure



---

### Datasets

- [Callytics Speaker Verification Dataset *(CSVD)*](.data/groundtruth/speakerverification/DatasetCard.md)

---
### Version Control System

##### Releases

- [v1.0.0](https://github.com/bunyaminergen/Callytics/archive/refs/tags/v1.0.0.zip) _.zip_
- [v1.0.0](https://github.com/bunyaminergen/Callytics/archive/refs/tags/v1.0.0.tar.gz) _.tar.gz_
- [v1.1.0](https://github.com/bunyaminergen/Callytics/archive/refs/tags/v1.1.0.zip) _.zip_
- [v1.1.0](https://github.com/bunyaminergen/Callytics/archive/refs/tags/v1.1.0.tar.gz) _.tar.gz_

##### Branches

- [main](https://github.com/bunyaminergen/Callytics/tree/main)
- [develop](https://github.com/bunyaminergen/Callytics/tree/develop)

---
### Upcoming

- [ ] **Speech Emotion Recognition:** Develop a model to automatically detect emotions from speech data.
- [ ] **New Forced Alignment Model:** Train a forced alignment model from scratch.
- [ ] **New Vocal Separation Model:** Train a vocal separation model from scratch.
- [ ] **Unit Tests:** Add a comprehensive unit testing script to validate functionality.
- [ ] **Logging Logic:** Implement a more comprehensive and structured logging mechanism.
- [ ] **Warnings:** Add meaningful and detailed warning messages for better user guidance.
- [ ] **Real-Time Analysis:** Enable real-time analysis capabilities within the system.
- [ ] **Dockerization:** Containerize the repository to ensure seamless deployment and environment consistency.
- [ ] **New Transcription Models:** Integrate and test new transcription models such
  as [AIOLA’s Multi-Head Speech Recognition Model](https://venturebeat.com/ai/aiola-drops-ultra-fast-multi-head-speech-recognition-model-beats-openai-whisper/).
- [ ] **Noise Reduction Model:** Identify, test, and integrate a deep learning-based noise reduction model. Consider
  existing models such as **Facebook Research Denoiser**, **Noise2Noise**, and **Audio Denoiser CNN**. Write test
  scripts for evaluation and, if necessary, train a new model for optimal performance.
##### Considerations

- [ ] Detect the CSR's identity via voice recognition/identification instead of diarization and LLM.
- [ ] Transform the code structure into a pipeline for better modularity and scalability.
- [ ] Publish the repository as a Python package on **PyPI** for wider distribution.
- [ ] Convert the repository into a Linux package to support Linux-based systems.
- [ ] Implement a two-step processing workflow: perform **diarization** (speaker segmentation) first, then apply
  **transcription** for each identified speaker separately. This approach can improve transcription accuracy by
  leveraging speaker separation.
- [ ] Enable **parallel processing** for tasks such as diarization, transcription, and model inference to improve
  overall system performance and reduce processing time.
- [ ] Explore using **Docker Compose** for multi-container orchestration if required.
- [ ] Upload the models and relevant resources to **Hugging Face** for easier access, sharing, and community
  collaboration.
- [ ] Consider writing a **Command Line Interface (CLI)** to simplify user interaction and improve usability.
- [ ] Test the ability to use **different language models (LLMs)** for specific tasks, for instance, using **BERT**
  for profanity detection. Evaluate their performance and suitability for different use cases as a feature.
---

### Documentations

- [RESOURCES](.docs/documentation/RESOURCES.md)
- [CONTRIBUTING](.docs/documentation/CONTRIBUTING.md)
- [PRESENTATION](.docs/presentation/CallyticsPresentationEN.pdf)

---

### Licence

- [LICENSE](LICENSE)

---

### Links

- [Github](https://github.com/bunyaminergen/Callytics)
- [Website](https://bunyaminergen.com)
- [Linkedin](https://www.linkedin.com/in/bunyaminergen)

---

### Team

- [Bunyamin Ergen](https://www.linkedin.com/in/bunyaminergen)

---

### Contact

- [Mail](mailto:[email protected])

---

### Citation

```bibtex
@software{Callytics,
  author  = {Bunyamin Ergen},
  title   = {{Callytics}},
  year    = {2024},
  month   = {12},
  url     = {https://github.com/bunyaminergen/Callytics},
  version = {v1.1.0},
}
```

---