bunyaminergen commited on
Commit
896caf6
·
1 Parent(s): 7d90704
.data/example/LogisticsCallCenterConversation.mp3 DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:b8d26ffeb161abdf165a1f4700b627962e5b64ddb921daaa79d6373437824f84
3
- size 551372
 
 
 
 
.data/example/noisy/LookOncetoHearTargetSpeechHearingwithNoisyExamples.mp3 DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:7cc7bc2d31959e65d6856860038cb57cd5d8cba39380e24a25597afdf29ce813
3
- size 677902
 
 
 
 
.db/Callytics.sqlite DELETED
Binary file (53.2 kB)
 
.docs/documentation/CONTRIBUTING.md DELETED
@@ -1,606 +0,0 @@
1
- # Contributing to [Project]
2
-
3
- Thank you for your interest in contributing to this project! We’re excited to have you on board. This guide is designed
4
- to make the contribution process clear and efficient.
5
-
6
- ---
7
-
8
- ## Table of Contents
9
-
10
- 1. [How to Contribute?](#how-to-contribute)
11
- - [Reporting Issues](#reporting-issues)
12
- - [Suggesting Features](#suggesting-features)
13
- - [Coding Standards](#coding-standards)
14
- - [File Structure](#file-structure)
15
- - [Commit Message Guidelines](#commit-message-guidelines)
16
- - [Branches](#branches)
17
- - [File Naming Convention](#file-naming-convention)
18
- - [Versioning, Release, Tag](#versioning-release-tag)
19
-
20
- ---
21
-
22
- ## How to Contribute?
23
-
24
- ### Reporting Issues
25
-
26
- - If you find a bug, please open a new issue on the [GitHub Issues](https://github.com/[username]/[project-name]/issues)
27
- page.
28
- - Include the following details:
29
- - A clear and descriptive title.
30
- - Steps to reproduce the issue.
31
- - Expected vs. actual behavior.
32
- - Screenshots or error logs if applicable.
33
-
34
- ### Suggesting Features
35
-
36
- - Have a great idea for a new feature? Open a new **Issue** and describe your suggestion.
37
- - Explain how this feature will improve the project.
38
-
39
- ### Coding Standards
40
-
41
- ##### Import Order
42
-
43
- Follow the import order below to maintain consistency in the codebase:
44
-
45
- 1. **Standard library imports**
46
- 2. **Third-party imports**
47
- 3. **Local application/library specific imports**
48
-
49
- **Example:**
50
-
51
- ```python
52
- # Standard library imports
53
- import os
54
- import sys
55
-
56
- # Third-party imports
57
- import numpy as np
58
- import pandas as pd
59
-
60
- # Local imports
61
- from my_module import my_function
62
- ```
63
-
64
- > **INFO**
65
- >
66
- > *For more detail, please check*:
67
- > - [PEP 8 – Style Guide for Python Code | peps.python.org](https://peps.python.org/pep-0008/)
68
-
69
- ##### Docstring
70
-
71
- Use `NumPy` format for docstrings in both functions and classes:
72
-
73
- - Each function or class should have a docstring explaining its purpose, parameters, return values, and examples if
74
- applicable.
75
- - For classes, include a class-level docstring that describes the overall purpose of the class, any parameters for
76
- `__init__`, and details on attributes and methods.
77
- - If you include references (e.g., research papers, algorithms, or external resources) in your docstring, create a
78
- separate `References` section clearly listing these sources at the end of your docstring.
79
-
80
- Example for a function:
81
-
82
- ```python
83
- # Standard library imports
84
- from typing import Annotated
85
-
86
-
87
- def example_function(
88
- param1: Annotated[int, "Description of param1"],
89
- param2: Annotated[str, "Description of param2"]
90
- ) -> Annotated[bool, "Description of the return value"]:
91
- """
92
- Brief description of what the function does.
93
-
94
- Parameters
95
- ----------
96
- param1 : int
97
- Description of param1.
98
- param2 : str
99
- Description of param2.
100
-
101
- Returns
102
- -------
103
- bool
104
- Description of the return value.
105
-
106
- Examples
107
- --------
108
- >>> example_function(5, 'hello')
109
- True
110
- >>> example_function(0, '')
111
- False
112
-
113
- References
114
- ----------
115
- * Doe, John. "A Study on Example Functions." Journal of Examples, 2021.
116
- """
117
- return bool(param1) and bool(param2)
118
- ```
119
-
120
- Example for a class:
121
-
122
- ```python
123
- class MyClass:
124
- """
125
- MyClass demonstrates the use of docstrings with a separate References section.
126
-
127
- This class provides an example of structuring docstrings, including attributes,
128
- methods, usage examples, and a references section when external sources are cited.
129
-
130
- Parameters
131
- ----------
132
- param1 : str
133
- Description of `param1`, explaining its purpose and specific constraints.
134
- param2 : int, optional
135
- Description of `param2`. Defaults to 0.
136
-
137
- Attributes
138
- ----------
139
- attribute1 : str
140
- Description of `attribute1`, explaining its purpose and possible values.
141
- attribute2 : int
142
- Description of `attribute2`, outlining constraints or expected values.
143
-
144
- Methods
145
- -------
146
- example_method(param1, param2=None)
147
- Example method description.
148
-
149
- Examples
150
- --------
151
- Create an instance and use methods:
152
-
153
- >>> my_instance = MyClass("example", 5)
154
- >>> my_instance.example_method("sample")
155
-
156
- References
157
- ----------
158
- * Smith, Jane. "Guide to Effective Python Documentation." Python Publishing, 2020.
159
- * Hu, Jie, Li Shen, and Gang Sun. "Squeeze-and-Excitation Networks."
160
- IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.
161
- """
162
-
163
- def __init__(self, param1, param2=0):
164
- """
165
- Initializes the class.
166
-
167
- Parameters
168
- ----------
169
- param1 : str
170
- Explanation of `param1`.
171
- param2 : int, optional
172
- Explanation of `param2`. Defaults to 0.
173
- """
174
- self.attribute1 = param1
175
- self.attribute2 = param2
176
-
177
- @staticmethod
178
- def example_method(param1, param2=None):
179
- """
180
- Example method.
181
-
182
- Parameters
183
- ----------
184
- param1 : str
185
- Description of param1.
186
- param2 : int, optional
187
- Description of param2. Defaults to None.
188
-
189
- Returns
190
- -------
191
- bool
192
- Outcome of the method's action.
193
- """
194
- return bool(param1) and bool(param2)
195
- ```
196
-
197
- This structure ensures clear, consistent, and comprehensive documentation of classes and functions, including proper
198
- citation of external sources.
199
-
200
- ##### Type Annotation
201
-
202
- Add type annotations to all functions using `Annotated` with descriptions:
203
-
204
- Example:
205
-
206
- ```python
207
- # Standard library imports
208
- from typing import Annotated
209
-
210
-
211
- def calculate_area(
212
- radius: Annotated[float, "Radius of the circle"]
213
- ) -> Annotated[float, "Area of the circle"]:
214
- """
215
- Calculate the area of a circle given its radius.
216
-
217
- Parameters
218
- ----------
219
- radius : float
220
- Radius of the circle.
221
-
222
- Returns
223
- -------
224
- float
225
- Area of the circle.
226
-
227
- Examples
228
- --------
229
- >>> calculate_area(5)
230
- 78.53999999999999
231
- >>> calculate_area(0)
232
- 0.0
233
- """
234
- if not isinstance(radius, (int, float)):
235
- raise TypeError("Expected int or float for parameter 'radius'")
236
- if radius < 0:
237
- raise ValueError("Radius cannot be negative")
238
- return 3.1416 * radius ** 2
239
-
240
- ```
241
-
242
- ##### Type Check
243
-
244
- Add type check within functions to ensure the correctness of input parameters:
245
-
246
- Example:
247
-
248
- ```python
249
- # Standard library imports
250
- from typing import Annotated
251
-
252
- def add_numbers(
253
- a: Annotated[int, "First integer"],
254
- b: Annotated[int, "Second integer"]
255
- ) -> Annotated[int, "Sum of a and b"]:
256
- """
257
- Add two integers and return the result.
258
-
259
- Parameters
260
- ----------
261
- a : int
262
- First integer.
263
- b : int
264
- Second integer.
265
-
266
- Returns
267
- -------
268
- int
269
- The sum of `a` and `b`.
270
-
271
- Examples
272
- --------
273
- >>> add_numbers(2, 3)
274
- 5
275
- >>> add_numbers(-1, 5)
276
- 4
277
- """
278
- if not isinstance(a, int):
279
- raise TypeError("Expected int for parameter 'a'")
280
- if not isinstance(b, int):
281
- raise TypeError("Expected int for parameter 'b'")
282
- return a + b
283
- ```
284
-
285
- ##### Doctest
286
-
287
- Include doctest examples in docstrings using the `>>>` format:
288
-
289
- Example:
290
-
291
- ```python
292
- # Standard library imports
293
- from typing import Annotated
294
-
295
- def multiply(
296
- a: Annotated[int, "First integer"],
297
- b: Annotated[int, "Second integer"]
298
- ) -> Annotated[int, "Product of a and b"]:
299
- """
300
- Multiply two integers and return the result.
301
-
302
- Parameters
303
- ----------
304
- a : int
305
- First integer.
306
- b : int
307
- Second integer.
308
-
309
- Returns
310
- -------
311
- int
312
- The product of `a` and `b`.
313
-
314
- Examples
315
- --------
316
- >>> multiply(2, 3)
317
- 6
318
- >>> multiply(-1, 5)
319
- -5
320
-
321
- This is a doctest example.
322
- """
323
- if not isinstance(a, int):
324
- raise TypeError("Expected int for parameter 'a'")
325
- if not isinstance(b, int):
326
- raise TypeError("Expected int for parameter 'b'")
327
-
328
- return a * b
329
- ```
330
-
331
- > **INFO**
332
- >
333
- > *For more detail, please check*:
334
- > - [doctest — Test interactive Python examples — Python 3.13.1 documentation](https://docs.python.org/3/library/doctest.html)
335
-
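- A convenient way to exercise these examples is the standard `doctest` runner itself. The following is a minimal,
- illustrative sketch (not part of the project) showing how a module can run its own docstring examples:
-
- ```python
- # Standard library imports
- import doctest
-
- if __name__ == "__main__":
-     # Run every >>> example found in this module's docstrings.
-     results = doctest.testmod()
-     print(f"doctest: {results.attempted} attempted, {results.failed} failed")
- ```
-
- With this block in place, running the module directly also verifies its docstring examples.
-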
336
- ##### Main Execution
337
-
338
- Include the name-main idiom (`if __name__ == "__main__":`) in each module so that it can also be run as a script:
339
-
340
- Example:
341
-
342
- ```python
343
- # Standard library imports
344
- from typing import Annotated
345
-
346
- def example_function(
347
- x: Annotated[int, "An integer parameter"]
348
- ) -> Annotated[int, "The square of x"]:
349
- """
350
- Calculate the square of an integer.
351
-
352
- Parameters
353
- ----------
354
- x : int
355
- The integer to be squared.
356
-
357
- Returns
358
- -------
359
- int
360
- The square of `x`.
361
-
362
- Examples
363
- --------
364
- >>> example_function(5)
365
- 25
366
- >>> example_function(-3)
367
- 9
368
- """
369
- return x * x
370
-
371
- if __name__ == "__main__":
372
- value = 5
373
- result = example_function(value)
374
- print(f"The square of {value} is {result}.")
375
- ```
376
-
377
- ##### General
378
-
379
- - **Always Print Outputs to the Terminal**
380
- - Ensure that any significant results or status messages are displayed to the user via `print` statements.
381
- - Consider adding an optional parameter (e.g., `verbose=True`) that controls whether to print the outputs. This way,
382
- users can disable or enable printed outputs as needed.
383
-
384
- - **Reduce Code Complexity if It Does Not Disrupt the Flow**
385
- - Whenever possible, simplify or refactor functions, methods, and classes.
386
- - Clear, straightforward logic is easier to maintain and less error-prone.
387
-
388
- - **Keep Your Code Modular at All Times**
389
- - Break down larger tasks into smaller, reusable functions or modules.
390
- - Modular design improves readability, promotes code reuse, and simplifies testing and maintenance.
391
-
392
- - **Use Base Classes if Classes Become Too Complex**
393
- - If a class starts to grow unwieldy or complicated, consider extracting shared logic into a base (parent) class.
394
- - Child classes can inherit from this base class, reducing duplication and making the code more organized (a brief sketch follows below).
395
-
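- The sketch below ties the points above together: a hypothetical `BaseExporter` holds the shared logic and the
- optional `verbose` flag, while a child class only implements the part that differs. All names are illustrative and
- not part of the project.
-
- ```python
- # Standard library imports
- import json
-
-
- class BaseExporter:
-     """Base class holding shared logic; child classes only implement `render`."""
-
-     def __init__(self, verbose: bool = True):
-         self.verbose = verbose
-
-     def export(self, data: dict) -> str:
-         output = self.render(data)
-         if self.verbose:
-             # Print significant results to the terminal, as recommended above.
-             print(f"Exported {len(data)} fields")
-         return output
-
-     def render(self, data: dict) -> str:
-         raise NotImplementedError
-
-
- class JSONExporter(BaseExporter):
-     """Child class that only defines the format-specific part."""
-
-     def render(self, data: dict) -> str:
-         return json.dumps(data)
- ```
-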
396
- ### File Structure
397
-
398
- Follow the [Default Project Template](https://github.com/bunyaminergen/DefaultProjectTemplate)'s File Structure
399
-
400
- - Adhere to the predetermined file hierarchy and naming conventions defined in the Default Project Template.
401
- - Review the existing layout in the repository to ensure your contributions align with the project’s organization.
402
-
403
- ### Commit Message Guidelines
404
-
405
- - **Keep It Short and Concise:**
406
- - The subject line (summary) should typically not exceed **50 characters**.
407
- - If more details are needed, include them in the body of the message on a separate line.
408
-
409
- - **Use Present Tense and Imperative Mood:**
410
- - **Start your commit message with one of the following verbs** and then explain what you did:
411
- - `Add`
412
- - `Fix`
413
- - `Remove` or `Delete`
414
- - `Update`
415
- - `Test`
416
- - `Refactor`
417
- - Messages should use the present tense and imperative mood.
418
- - **Examples:**
419
- - `Add user authentication`
420
- - `Fix bug in payment processing`
421
- - `Remove unused dependencies`
422
-
423
- - **Separate Subject and Details:**
424
- - The first line (subject) should be short and descriptive.
425
- - Leave a blank line between the subject and the detailed description (if needed).
426
- - **Example:**
427
-
428
- ```text
429
- Fix login issue
430
-
431
- Updated the authentication service to handle null values in the session token.
432
- ```
433
-
434
- - **Mistakes to Avoid:**
435
- - **Vague Messages:**
436
- - *Bad Example:* `Fix stuff`, `Update files`, `Work done`.
437
- - **Combining Multiple Changes in One Commit:**
438
- - Avoid bundling unrelated changes into a single commit.
439
- - **Copy-Paste Descriptions:**
440
- - Ensure that the commit message is directly relevant to the change.
441
-
442
- - **Benefits of Good Commit Messages:**
443
- - A well-written commit history makes the project easier to understand.
444
- - It simplifies debugging and troubleshooting.
445
- - It improves collaboration within the team by providing clear and meaningful information.
446
-
447
- ### Branches
448
-
449
- To maintain consistency across all branches, follow these guidelines:
450
-
451
- - **Start with one of the following action keywords in lowercase:**
452
- - `add`
453
- - `fix`
454
- - `remove` or `delete`
455
- - `update`
456
- - `test`
457
- - `refactor`
458
- - Use hyphens (`-`) to separate words in the branch name.
459
- - Avoid special characters, spaces, or uppercase letters.
460
- - Keep branch names concise but descriptive.
461
-
462
- **Example Branch Names:**
463
-
464
- - `add-new-release`
465
- - `fix-critical-bug`
466
- - `remove-unused-dependencies`
467
- - `update-api-endpoints`
468
- - `test-api-performance`
469
- - `refactor-code-structure`
470
-
471
- Please push all development work to the `develop` branch. Once the work on your branch is finished, merge it into
472
- `develop` and then delete the branch to keep the repository clean.
473
-
474
- **Important:** Please only create branches that begin with the prefixes listed below. If you would like to propose a new
475
- prefix, kindly open an issue on GitHub.
476
-
477
- ##### Bug Branches
478
-
479
- Use the `bugfix/` prefix for bug fixes discovered during development or testing.
480
- Examples:
481
-
482
- - `bugfix/fix-typo-in-readme`
483
- - `bugfix/null-pointer-exception`
484
-
485
- ##### Feature Branches
486
-
487
- Use the `feature/` prefix for new features or enhancements.
488
- Examples:
489
-
490
- - `feature/add-login`
491
- - `feature/update-dashboard`
492
- - `feature/fix-bug-123`
493
-
494
- ##### Hotfix Branches
495
-
496
- Use the `hotfix/` prefix for critical fixes that need immediate attention in production.
497
- Example:
498
-
499
- - `hotfix/fix-security-issue`
500
-
501
- ##### Docfix Branches
502
-
503
- Use the `docfix/` prefix for changes regarding documentation.
504
- Example:
505
-
506
- - `docfix/add-readme-to-architecture-section`
507
-
508
- ##### Test Branches
509
-
510
- Use the `test/` prefix for branches that focus on writing or updating tests, or conducting specific test-related work.
511
- Examples:
512
-
513
- - `test/add-integration-tests`
514
- - `test/refactor-unit-tests`
515
- - `test/performance-testing`
516
-
517
- ##### Experiment Branches
518
-
519
- Use the `experiment/` prefix for experimental or proof-of-concept work.
520
- Example:
521
-
522
- - `experiment/improve-cache`
523
-
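- Purely as an illustration, a small helper that checks whether a branch name uses one of the allowed prefixes and the
- lowercase, hyphen-separated style described above (a hypothetical snippet, not part of the project tooling):
-
- ```python
- # Standard library imports
- import re
-
- ALLOWED_PREFIXES = ("bugfix/", "feature/", "hotfix/", "docfix/", "test/", "experiment/")
-
-
- def is_valid_branch(name: str) -> bool:
-     """Branch must use an allowed prefix and lowercase words separated by hyphens."""
-     if not name.startswith(ALLOWED_PREFIXES):
-         return False
-     suffix = name.split("/", 1)[1]
-     return re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", suffix) is not None
-
-
- print(is_valid_branch("feature/add-login"))   # True
- print(is_valid_branch("Feature/Add_Login"))   # False
- ```
-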
524
- ---
525
-
526
- ### File Naming Convention
527
-
528
- This section explains how packages and modules should be named in this project.
529
-
530
- ---
531
-
532
- ##### Package & Module Naming
533
-
534
- The following rules apply to **both** packages and modules. To simplify these guidelines, the term **"file"** will be
535
- used as a generic reference to either a package or a module name.
536
-
537
- - **Single or Concise Compound Words:**
538
- - Whenever possible, each file name should consist of **a single word** or **a short concatenation of words** (
539
- written together).
540
- - Use **lowercase letters** only, and **do not** use underscores (`_`).
541
- *(Although PEP 8 allows snake_case for modules, we do not prefer it in this project.)*
542
- - If more than one word is necessary, write them **together** (e.g., `datautils`, `webparser`).
543
-
544
- - **Consistent Single-Word Usage:**
545
- - Especially for package names, aim to keep them to **one word** whenever possible. If multiple words are necessary,
546
- the package name should remain **short and clear**.
547
-
548
- - **Parent (Package) and Sub-Component (Module) Logic:**
549
- - Use broader, more general names for packages (the “parent”).
550
- - For modules (the “child”), use more specific names within the package to reflect their functionality.
551
-
552
- - **Examples:**
553
- - **Packages**
554
- - `utils`
555
- - `model`
556
- - **Modules**
557
- - `gridsearch.py`
558
- - `convolution.py`
559
- - **Parent–Child (Directory Structure) Example**
560
- - `src/model/backbone.py`
561
- - `src/utils/log/manager.py`
562
-
563
- - **Bad Examples:**
564
- - **Packages**
565
- - `data_reader` *(underscores are discouraged)*
566
- - **Modules**
567
- - `grid_search.py` *(underscores are discouraged)*
568
- - **Parent–Child (Directory Structure) Example**
569
- - `src/train/training.py` *(names are too similar or redundant)*
570
-
571
- ---
572
-
573
- ##### Test Files
574
-
575
- 1. **Using the `test_` Prefix:**
576
- - For test files, prepend `test_` to the **module** name being tested.
577
- - **Example:** If the module name is `dataprocess.py`, then the test file should be named `test_dataprocess.py`.
578
-
579
- > **INFO**
580
- > *For more details, please check:*
581
- > - [PEP 8 – Style Guide for Python Code | peps.python.org](https://peps.python.org/pep-0008/#:~:text=Modules%20should%20have%20short%2C%20all,use%20of%20underscores%20is%20discouraged)
582
- > - [PEP 423 – Naming conventions and recipes related to packaging | peps.python.org](https://peps.python.org/pep-0423/#follow-pep-8-for-syntax-of-package-and-module-names)
583
-
584
- ---
585
-
586
- ### Versioning, Release, Tag
587
-
588
- - When numbering versions, releases, or tags, **only use prime numbers** (e.g., 3, 5, 7, 11, 13...).
589
- - **Do not** use 2 (even though it is prime) or any non-prime numbers (4, 6, 8, 9, 10...).
590
-
591
- #### Examples
592
-
593
- **Good Examples** (using only prime numbers, excluding 2):
594
-
595
- - **Tag**: `v3`
596
- - **Release**: `v3.5.7`
597
- - **Version**: `5.7.11`
598
-
599
- **Bad Examples**:
600
-
601
- - **Tag**: `v2` *(2 is prime but disallowed in this project)*
602
- - **Release**: `v1.2.3` *(1 is not prime and 2 is disallowed; only 3 is acceptable)*
603
- - **Version**: `4.6.8` *(4, 6, 8 are not prime)*
604
-
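- For illustration only, a small helper that checks whether a version string follows this prime-only rule (it is not
- part of the project tooling):
-
- ```python
- def is_prime(n: int) -> bool:
-     """Return True if `n` is a prime number."""
-     if n < 2:
-         return False
-     return all(n % d for d in range(2, int(n ** 0.5) + 1))
-
-
- def is_valid_version(version: str) -> bool:
-     """Every component must be prime, and 2 is explicitly disallowed."""
-     parts = [int(p) for p in version.lstrip("v").split(".")]
-     return all(is_prime(p) and p != 2 for p in parts)
-
-
- print(is_valid_version("5.7.11"))   # True
- print(is_valid_version("v1.2.3"))   # False
- ```
-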
605
- ---
606
-
 
.docs/documentation/RESOURCES.md DELETED
@@ -1,119 +0,0 @@
1
- # Resources
2
-
3
- ---
4
-
5
- ## GitHub
6
-
7
- - [NeMo](https://github.com/NVIDIA/NeMo)
8
- - [Llama](https://github.com/facebookresearch/llama)
9
- - [Demucs](https://github.com/facebookresearch/demucs)
10
- - [Whisper](https://github.com/openai/whisper)
11
- - [Whisper NeMo Diarization](https://github.com/MahmoudAshraf97/whisper-diarization)
12
- - [Text to speech alignment using CTC forced alignment](https://github.com/MahmoudAshraf97/ctc-forced-aligner)
13
- - [Utilities intended for use with Llama models.](https://github.com/meta-llama/llama-models/)
14
- - [Llama Recipes: Examples to get started using the Llama models from Meta](https://github.com/meta-llama/llama-recipes)
15
- - [timsainb/noisereduce: Noise reduction in python using spectral gating](https://github.com/timsainb/noisereduce/)
16
- - [pyannote/pyannote-audio: Neural building blocks for speaker diarization](https://github.com/pyannote/pyannote-audio)
17
- - [microsoft/DNS-Challenge: This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.](https://github.com/microsoft/DNS-Challenge)
18
- - [WenzheLiu-Speech/awesome-speech-enhancement: speech enhancement\speech seperation\sound source localization](https://github.com/WenzheLiu-Speech/awesome-speech-enhancement)
19
- - [nanahou/Awesome-Speech-Enhancement: A tutorial for Speech Enhancement researchers and practitioners. The purpose of this repo is to organize the world’s resources for speech enhancement and make them universally accessible and useful.](https://github.com/nanahou/Awesome-Speech-Enhancement)
20
- - [jonashaag/speech-enhancement: Collection of papers, datasets and tools on the topic of Speech Dereverberation and Speech Enhancement](https://github.com/jonashaag/speech-enhancement)
21
- - [yxlu-0102/MP-SENet: Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement](https://github.com/yxlu-0102/MP-SENet)
22
- - [Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement](https://yxlu-0102.github.io/MP-SENet/)
23
- - [## SUPERSEDED: THIS DATASET HAS BEEN REPLACED. ## Noisy speech database for training speech enhancement algorithms and TTS models](https://datashare.ed.ac.uk/handle/10283/1942)
24
-
25
- ---
26
-
27
- ## Web
28
-
29
- - [Llama](https://www.llama.com/)
30
- - [Download Llama](https://www.llama.com/llama-downloads/)
31
- - [Llama 3.2 Requirements](https://llamaimodel.com/requirements-3-2/)
32
- - [Average handle time (AHT): Formula and tips for improvement](https://www.zendesk.com/blog/average-handle-time/)
33
-
34
- ---
35
-
36
- ## Notebooks
37
-
38
- - [Hybrid Demucs Music Source Separation](https://colab.research.google.com/drive/1dC9nVxk3V_VPjUADsnFu8EiT-xnU1tGH)
39
-
40
- ---
41
-
42
- ## PyPI
43
-
44
- - [demucs](https://pypi.org/project/demucs/)
45
- - [MPSENet](https://pypi.org/project/MPSENet/)
46
-
47
- ---
48
-
49
- ## Errors
50
-
51
- - [`The file is already fully retrieved; nothing to do.`](https://github.com/facebookresearch/llama/issues/760)
52
-
53
- ---
54
-
55
- ## Paper
56
-
57
- - [Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation](https://arxiv.org/abs/2007.13975)
58
- - [MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra](https://arxiv.org/abs/2305.13686)
59
- - [FINALLY: fast and universal speech enhancement with studio-like quality](https://arxiv.org/abs/2410.05920)
60
- - [Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement](https://arxiv.org/abs/2308.08926)
61
- - [\[2410.08235\] A Recurrent Neural Network Approach to the Answering Machine Detection Problem](https://arxiv.org/abs/2410.08235)
62
-
63
- ---
64
-
65
- ## Youtube
66
-
67
- - [A Course on Speech Enhancement](https://www.youtube.com/playlist?list=PLO9nFIQB53_DU8o0fToNdNFdZuDxD9fAN)
68
- - [COMS 4995 Final on Speech Enhancement](https://www.youtube.com/watch?v=uRwlSh1FMzc&t=74s)
69
- - [Achieving Studio-Quality Speech with Generative AI](https://www.youtube.com/watch?v=UxbEjpLMU8s)
70
- - [How to Fix Bad Podcast Audio](https://www.youtube.com/watch?v=0mPkPQNHsZc)
71
- - [Speech Enhancement for Cochlear Implant Recipients Using Deep Complex Convolution Transformer With F](https://www.youtube.com/watch?v=i1qTgjMtS2Y)
72
- - [Vibravox: A Dataset of French Speech Captured with Body-conduction Audio Sensors](https://www.youtube.com/watch?v=4jiQdotz6qY)
73
- - [2024 종합설계 3팀 2차, Neural Network for Speech Enhancement](https://www.youtube.com/watch?v=yOfTYuc9FEQ)
74
- - [MIAI Deeptails Seminar : Generative Models as Data-driven Priors for Speech Enhancement](https://www.youtube.com/watch?v=XSLgUsgyzUA)
75
- - [Hardware Efficient Speech Enhancement With Noise Aware Multi Target Deep Learning](https://www.youtube.com/watch?v=qO6JqDUQlsI)
76
- - [Diffusion Models for Speech Enhancement | Julius Richter](https://www.youtube.com/watch?v=HMrs6YWDl5M)
77
- - [Speech Enhancement: Basics & Key Details](https://www.youtube.com/watch?v=5kItH2pq_3E)
78
- - [Guided Speech Enhancement Network (ICASSP 2023)](https://www.youtube.com/watch?v=JoDqXkAjlh4)
79
- - [VSANet: Real-time Speech Enhancement Based on Voice Activity Detection and Causal Spatial Attention](https://www.youtube.com/watch?v=GP39vFA2E48)
80
- - [Research intern talk: Unified speech enhancement approach for speech degradation & noise suppression](https://www.youtube.com/watch?v=_ggfv6eMIJs)
81
- - [Magnitude and phase spectrum with example](https://www.youtube.com/watch?v=MFOjUgafq0k)
82
- - [Deep Learning In Audio for Absolute Beginners: From No Experience & No Datasets to a Deployed Model](https://www.youtube.com/watch?v=sqrah49GUkI)
83
- - [Look Once to Hear: Target Speech Hearing with Noisy Examples](https://www.youtube.com/watch?v=V-XCfnjfQmM)
84
-
85
- ---
86
-
87
- ## Wikipedia
88
-
89
- - [Speech enhancement](https://en.m.wikipedia.org/wiki/Speech_enhancement)
90
-
91
- ---
92
-
93
- ## Hugging Face
94
-
95
- - [Models(asteroid)](https://huggingface.co/models?library=asteroid)
96
- - [cankeles/DPTNet_WHAMR_enhsingle_16k](https://huggingface.co/cankeles/DPTNet_WHAMR_enhsingle_16k)
97
- - [JacobLinCool/MP-SENet-VB](https://huggingface.co/JacobLinCool/MP-SENet-VB)
98
- - [JacobLinCool/MP-SENet-DNS](https://huggingface.co/JacobLinCool/MP-SENet-DNS)
99
- - [ENOT-AutoDL/MP-SENet](https://huggingface.co/ENOT-AutoDL/MP-SENet)
100
-
101
- ---
102
-
103
- ## Web
104
-
105
- - [Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation](https://paperswithcode.com/paper/dual-path-transformer-network-direct-context-1)
106
- - [The Audio Developer Conference - ADC is an annual event celebrating all audio development technologies, from music applications and game audio to audio processing and embedded systems.](https://audio.dev/)
107
- - [Look Once to Hear: Target Speech Hearing with Noisy Examples - CHI '24](https://programs.sigchi.org/chi/2024/program/content/147319)
108
- - [Reinforcement Learning Based Speech Enhancement for Robust Speech Recognition > Introduction | Class Central Classroom](https://www.classcentral.com/classroom/youtube-reinforcement-learning-based-speech-enhancement-for-robust-speech-recognition-131999)
109
- - [Sound classification with YAMNet TensorFlow Hub](https://www.tensorflow.org/hub/tutorials/yamnet)
110
- - [DEEP-VOICE: DeepFake Voice Recognition Dataset | Papers With Code](https://paperswithcode.com/dataset/deep-voice-deepfake-voice-recognition)
111
-
112
- ---
113
-
114
- ## Dataset
115
-
116
- - [VoiceBank+DEMAND](https://datashare.ed.ac.uk/handle/10283/1942)
117
- - [VoiceBank+DEMAND](https://drive.google.com/drive/folders/19I_thf6F396y5gZxLTxYIojZXC0Ywm8l)
118
-
119
- ---
 
.docs/img/Callytics.drawio DELETED
@@ -1,164 +0,0 @@
1
- <mxfile host="Electron" agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/25.0.1 Chrome/128.0.6613.186 Electron/32.2.6 Safari/537.36" version="25.0.1">
2
- <diagram name="Page-1" id="mQKUGW6_SND0Kw_IrtOi">
3
- <mxGraphModel dx="1829" dy="896" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="0" pageScale="1" pageWidth="1600" pageHeight="900" math="0" shadow="0">
4
- <root>
5
- <mxCell id="0" />
6
- <mxCell id="1" parent="0" />
7
- <mxCell id="5rAtEOWpySoy4L2wUreW-1" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;flowAnimation=1;" edge="1" parent="1" source="5rAtEOWpySoy4L2wUreW-2" target="5rAtEOWpySoy4L2wUreW-3">
8
- <mxGeometry relative="1" as="geometry">
9
- <Array as="points">
10
- <mxPoint x="70" y="453" />
11
- </Array>
12
- </mxGeometry>
13
- </mxCell>
14
- <mxCell id="5rAtEOWpySoy4L2wUreW-2" value="&lt;i&gt;&lt;b&gt;Input&lt;br&gt;(Audio File)&lt;/b&gt;&lt;/i&gt;" style="ellipse;whiteSpace=wrap;html=1;aspect=fixed;fillColor=default;strokeWidth=3;" vertex="1" parent="1">
15
- <mxGeometry x="20" y="230" width="105" height="105" as="geometry" />
16
- </mxCell>
17
- <mxCell id="5rAtEOWpySoy4L2wUreW-3" value="" style="shape=cylinder3;whiteSpace=wrap;html=1;boundedLbl=1;backgroundOutline=1;size=15;rotation=90;strokeWidth=3;fillColor=default;" vertex="1" parent="1">
18
- <mxGeometry x="857.33" y="-321.46" width="141.5" height="1578.78" as="geometry" />
19
- </mxCell>
20
- <mxCell id="5rAtEOWpySoy4L2wUreW-4" style="rounded=0;orthogonalLoop=1;jettySize=auto;html=1;flowAnimation=1;endArrow=none;endFill=0;" edge="1" parent="1" source="5rAtEOWpySoy4L2wUreW-5" target="5rAtEOWpySoy4L2wUreW-8">
21
- <mxGeometry relative="1" as="geometry" />
22
- </mxCell>
23
- <mxCell id="5rAtEOWpySoy4L2wUreW-5" value="&lt;div&gt;&lt;b&gt;Dialogue&amp;nbsp;&lt;br&gt;Detection&lt;/b&gt;&lt;/div&gt;" style="strokeWidth=2;html=1;shape=mxgraph.flowchart.stored_data;whiteSpace=wrap;spacingLeft=8;spacingRight=4;spacingBottom=3;direction=west;" vertex="1" parent="1">
24
- <mxGeometry x="160" y="422.5" width="100" height="60" as="geometry" />
25
- </mxCell>
26
- <mxCell id="5rAtEOWpySoy4L2wUreW-6" value="&lt;font style=&quot;font-size: 9px;&quot;&gt;&lt;i&gt;Pyannote&lt;/i&gt;&lt;/font&gt;" style="strokeWidth=2;html=1;shape=mxgraph.flowchart.document2;whiteSpace=wrap;size=0.25;" vertex="1" parent="1">
27
- <mxGeometry x="176" y="482.94" width="68" height="37" as="geometry" />
28
- </mxCell>
29
- <mxCell id="5rAtEOWpySoy4L2wUreW-7" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;endArrow=none;endFill=0;flowAnimation=1;" edge="1" parent="1" source="5rAtEOWpySoy4L2wUreW-8" target="5rAtEOWpySoy4L2wUreW-10">
30
- <mxGeometry relative="1" as="geometry" />
31
- </mxCell>
32
- <mxCell id="5rAtEOWpySoy4L2wUreW-8" value="&lt;b&gt;Speech Enhancement&lt;/b&gt;" style="strokeWidth=2;html=1;shape=mxgraph.flowchart.stored_data;whiteSpace=wrap;spacingLeft=8;spacingRight=4;spacingBottom=3;direction=west;" vertex="1" parent="1">
33
- <mxGeometry x="270" y="422.94" width="100" height="60" as="geometry" />
34
- </mxCell>
35
- <mxCell id="5rAtEOWpySoy4L2wUreW-9" value="&lt;font size=&quot;1&quot;&gt;&lt;i&gt;MP-SENet&lt;/i&gt;&lt;/font&gt;" style="strokeWidth=2;html=1;shape=mxgraph.flowchart.document2;whiteSpace=wrap;size=0.25;" vertex="1" parent="1">
36
- <mxGeometry x="286" y="482.94" width="68" height="37" as="geometry" />
37
- </mxCell>
38
- <mxCell id="5rAtEOWpySoy4L2wUreW-10" value="&lt;b&gt;Vocal&lt;/b&gt;&lt;div&gt;&lt;b&gt;Seperation&lt;/b&gt;&lt;/div&gt;" style="strokeWidth=2;html=1;shape=mxgraph.flowchart.stored_data;whiteSpace=wrap;spacingLeft=8;spacingRight=4;spacingBottom=3;direction=west;" vertex="1" parent="1">
39
- <mxGeometry x="380" y="422.5" width="100" height="60" as="geometry" />
40
- </mxCell>
41
- <mxCell id="5rAtEOWpySoy4L2wUreW-11" value="&lt;font style=&quot;font-size: 9px;&quot;&gt;&lt;i&gt;Demucs&lt;/i&gt;&lt;/font&gt;" style="strokeWidth=2;html=1;shape=mxgraph.flowchart.document2;whiteSpace=wrap;size=0.25;" vertex="1" parent="1">
42
- <mxGeometry x="396" y="482.5" width="68" height="37" as="geometry" />
43
- </mxCell>
44
- <mxCell id="5rAtEOWpySoy4L2wUreW-12" value="&lt;b&gt;Transcription&lt;/b&gt;" style="strokeWidth=2;html=1;shape=mxgraph.flowchart.stored_data;whiteSpace=wrap;spacingLeft=8;spacingRight=4;spacingBottom=3;direction=west;" vertex="1" parent="1">
45
- <mxGeometry x="494" y="422.94" width="100" height="60" as="geometry" />
46
- </mxCell>
47
- <mxCell id="5rAtEOWpySoy4L2wUreW-13" value="&lt;font style=&quot;font-size: 9px;&quot;&gt;&lt;i&gt;Faster Whisper&lt;/i&gt;&lt;/font&gt;" style="strokeWidth=2;html=1;shape=mxgraph.flowchart.document2;whiteSpace=wrap;size=0.25;" vertex="1" parent="1">
48
- <mxGeometry x="510" y="482.94" width="68" height="37" as="geometry" />
49
- </mxCell>
50
- <mxCell id="5rAtEOWpySoy4L2wUreW-14" value="&lt;b&gt;Forced Alignment&lt;/b&gt;" style="strokeWidth=2;html=1;shape=mxgraph.flowchart.stored_data;whiteSpace=wrap;spacingLeft=8;spacingRight=4;spacingBottom=3;direction=west;" vertex="1" parent="1">
51
- <mxGeometry x="610" y="422.5" width="100" height="60" as="geometry" />
52
- </mxCell>
53
- <mxCell id="5rAtEOWpySoy4L2wUreW-15" value="&lt;div style=&quot;&quot;&gt;&lt;i style=&quot;font-size: 9px; background-color: initial; line-height: 100%;&quot;&gt;ctc forced aligner&lt;/i&gt;&lt;/div&gt;" style="strokeWidth=2;html=1;shape=mxgraph.flowchart.document2;whiteSpace=wrap;size=0.25;align=center;" vertex="1" parent="1">
54
- <mxGeometry x="626" y="482.5" width="68" height="37" as="geometry" />
55
- </mxCell>
56
- <mxCell id="5rAtEOWpySoy4L2wUreW-16" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.93;entryY=0.5;entryDx=0;entryDy=0;entryPerimeter=0;endArrow=none;endFill=0;flowAnimation=1;" edge="1" parent="1" source="5rAtEOWpySoy4L2wUreW-12" target="5rAtEOWpySoy4L2wUreW-14">
57
- <mxGeometry relative="1" as="geometry" />
58
- </mxCell>
59
- <mxCell id="5rAtEOWpySoy4L2wUreW-17" value="&lt;b&gt;Diarization&lt;/b&gt;" style="strokeWidth=2;html=1;shape=mxgraph.flowchart.stored_data;whiteSpace=wrap;spacingLeft=8;spacingRight=4;spacingBottom=3;direction=west;" vertex="1" parent="1">
60
- <mxGeometry x="730" y="422.5" width="100" height="60" as="geometry" />
61
- </mxCell>
62
- <mxCell id="5rAtEOWpySoy4L2wUreW-18" value="&lt;font size=&quot;1&quot;&gt;&lt;i&gt;Nvidia Nemo&lt;/i&gt;&lt;/font&gt;" style="strokeWidth=2;html=1;shape=mxgraph.flowchart.document2;whiteSpace=wrap;size=0.25;" vertex="1" parent="1">
63
- <mxGeometry x="746" y="482.5" width="68" height="37" as="geometry" />
64
- </mxCell>
65
- <mxCell id="5rAtEOWpySoy4L2wUreW-19" value="&lt;b&gt;Speaker Role&lt;br&gt;Classification&lt;/b&gt;" style="strokeWidth=2;html=1;shape=mxgraph.flowchart.stored_data;whiteSpace=wrap;spacingLeft=8;spacingRight=4;spacingBottom=3;direction=west;" vertex="1" parent="1">
66
- <mxGeometry x="850" y="422.94" width="100" height="60" as="geometry" />
67
- </mxCell>
68
- <mxCell id="5rAtEOWpySoy4L2wUreW-20" value="&lt;font size=&quot;1&quot;&gt;&lt;i&gt;LLM&lt;/i&gt;&lt;/font&gt;" style="strokeWidth=2;html=1;shape=mxgraph.flowchart.document2;whiteSpace=wrap;size=0.25;" vertex="1" parent="1">
69
- <mxGeometry x="866" y="482.94" width="68" height="37" as="geometry" />
70
- </mxCell>
71
- <mxCell id="5rAtEOWpySoy4L2wUreW-21" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.93;entryY=0.5;entryDx=0;entryDy=0;entryPerimeter=0;endArrow=none;endFill=0;flowAnimation=1;" edge="1" parent="1" source="5rAtEOWpySoy4L2wUreW-17" target="5rAtEOWpySoy4L2wUreW-19">
72
- <mxGeometry relative="1" as="geometry" />
73
- </mxCell>
74
- <mxCell id="5rAtEOWpySoy4L2wUreW-22" value="&lt;b&gt;Sentiment&lt;/b&gt;&lt;div&gt;&lt;b&gt;Analysis&lt;/b&gt;&lt;/div&gt;" style="strokeWidth=2;html=1;shape=mxgraph.flowchart.stored_data;whiteSpace=wrap;spacingLeft=8;spacingRight=4;spacingBottom=3;direction=west;" vertex="1" parent="1">
75
- <mxGeometry x="970" y="422.94" width="100" height="60" as="geometry" />
76
- </mxCell>
77
- <mxCell id="5rAtEOWpySoy4L2wUreW-23" value="&lt;font size=&quot;1&quot;&gt;&lt;i&gt;LLM&lt;/i&gt;&lt;/font&gt;" style="strokeWidth=2;html=1;shape=mxgraph.flowchart.document2;whiteSpace=wrap;size=0.25;" vertex="1" parent="1">
78
- <mxGeometry x="986" y="485.44" width="68" height="37" as="geometry" />
79
- </mxCell>
80
- <mxCell id="5rAtEOWpySoy4L2wUreW-24" value="&lt;b&gt;Profanity Word Detection&lt;/b&gt;" style="strokeWidth=2;html=1;shape=mxgraph.flowchart.stored_data;whiteSpace=wrap;spacingLeft=8;spacingRight=4;spacingBottom=3;direction=west;" vertex="1" parent="1">
81
- <mxGeometry x="1090" y="422.5" width="100" height="60" as="geometry" />
82
- </mxCell>
83
- <mxCell id="5rAtEOWpySoy4L2wUreW-25" value="&lt;font size=&quot;1&quot;&gt;&lt;i&gt;LLM&lt;/i&gt;&lt;/font&gt;" style="strokeWidth=2;html=1;shape=mxgraph.flowchart.document2;whiteSpace=wrap;size=0.25;" vertex="1" parent="1">
84
- <mxGeometry x="1106" y="482.5" width="68" height="37" as="geometry" />
85
- </mxCell>
86
- <mxCell id="5rAtEOWpySoy4L2wUreW-26" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.93;entryY=0.5;entryDx=0;entryDy=0;entryPerimeter=0;flowAnimation=1;endArrow=none;endFill=0;" edge="1" parent="1" source="5rAtEOWpySoy4L2wUreW-22" target="5rAtEOWpySoy4L2wUreW-24">
87
- <mxGeometry relative="1" as="geometry" />
88
- </mxCell>
89
- <mxCell id="5rAtEOWpySoy4L2wUreW-27" value="&lt;b&gt;Summary&lt;/b&gt;" style="strokeWidth=2;html=1;shape=mxgraph.flowchart.stored_data;whiteSpace=wrap;spacingLeft=8;spacingRight=4;spacingBottom=3;direction=west;" vertex="1" parent="1">
90
- <mxGeometry x="1213" y="422.5" width="100" height="60" as="geometry" />
91
- </mxCell>
92
- <mxCell id="5rAtEOWpySoy4L2wUreW-28" value="&lt;font size=&quot;1&quot;&gt;&lt;i&gt;LLM&lt;/i&gt;&lt;/font&gt;" style="strokeWidth=2;html=1;shape=mxgraph.flowchart.document2;whiteSpace=wrap;size=0.25;" vertex="1" parent="1">
93
- <mxGeometry x="1229" y="482.5" width="68" height="37" as="geometry" />
94
- </mxCell>
95
- <mxCell id="5rAtEOWpySoy4L2wUreW-29" value="&lt;b&gt;Conflict Detection&lt;/b&gt;" style="strokeWidth=2;html=1;shape=mxgraph.flowchart.stored_data;whiteSpace=wrap;spacingLeft=8;spacingRight=4;spacingBottom=3;direction=west;" vertex="1" parent="1">
96
- <mxGeometry x="1330" y="422.5" width="100" height="60" as="geometry" />
97
- </mxCell>
98
- <mxCell id="5rAtEOWpySoy4L2wUreW-30" value="&lt;font size=&quot;1&quot;&gt;&lt;i&gt;LLM&lt;/i&gt;&lt;/font&gt;" style="strokeWidth=2;html=1;shape=mxgraph.flowchart.document2;whiteSpace=wrap;size=0.25;" vertex="1" parent="1">
99
- <mxGeometry x="1346" y="482.94" width="68" height="37" as="geometry" />
100
- </mxCell>
101
- <mxCell id="5rAtEOWpySoy4L2wUreW-31" value="&lt;b&gt;Topic Detection&lt;/b&gt;" style="strokeWidth=2;html=1;shape=mxgraph.flowchart.stored_data;whiteSpace=wrap;spacingLeft=8;spacingRight=4;spacingBottom=3;direction=west;" vertex="1" parent="1">
102
- <mxGeometry x="1450" y="422.94" width="100" height="60" as="geometry" />
103
- </mxCell>
104
- <mxCell id="5rAtEOWpySoy4L2wUreW-32" value="&lt;font size=&quot;1&quot;&gt;&lt;i&gt;LLM&lt;/i&gt;&lt;/font&gt;" style="strokeWidth=2;html=1;shape=mxgraph.flowchart.document2;whiteSpace=wrap;size=0.25;" vertex="1" parent="1">
105
- <mxGeometry x="1466" y="482.94" width="68" height="37" as="geometry" />
106
- </mxCell>
107
- <mxCell id="5rAtEOWpySoy4L2wUreW-33" value="&lt;b&gt;Storage&lt;/b&gt;" style="strokeWidth=2;html=1;shape=mxgraph.flowchart.stored_data;whiteSpace=wrap;spacingLeft=8;spacingRight=4;spacingBottom=3;direction=west;" vertex="1" parent="1">
108
- <mxGeometry x="1570" y="422.5" width="100" height="60" as="geometry" />
109
- </mxCell>
110
- <mxCell id="5rAtEOWpySoy4L2wUreW-34" value="&lt;font size=&quot;1&quot;&gt;&lt;i&gt;Database&lt;/i&gt;&lt;/font&gt;" style="strokeWidth=2;html=1;shape=mxgraph.flowchart.document2;whiteSpace=wrap;size=0.25;" vertex="1" parent="1">
111
- <mxGeometry x="1586" y="485.44" width="68" height="37" as="geometry" />
112
- </mxCell>
113
- <mxCell id="5rAtEOWpySoy4L2wUreW-35" value="&lt;b&gt;Monitoring&amp;nbsp;&lt;br&gt;&lt;/b&gt;&lt;div&gt;&lt;b&gt;&amp;amp;&amp;nbsp;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Alarms&lt;/b&gt;&lt;/div&gt;" style="verticalLabelPosition=middle;verticalAlign=middle;html=1;shape=mxgraph.basic.three_corner_round_rect;dx=6;whiteSpace=wrap;labelPosition=center;align=center;" vertex="1" parent="1">
114
- <mxGeometry x="1760" y="422.5" width="140" height="90" as="geometry" />
115
- </mxCell>
116
- <mxCell id="5rAtEOWpySoy4L2wUreW-36" value="&lt;font size=&quot;1&quot;&gt;&lt;i&gt;Grafana&lt;/i&gt;&lt;/font&gt;" style="strokeWidth=2;html=1;shape=mxgraph.flowchart.document2;whiteSpace=wrap;size=0.25;" vertex="1" parent="1">
117
- <mxGeometry x="1796" y="512.94" width="68" height="37" as="geometry" />
118
- </mxCell>
119
- <mxCell id="5rAtEOWpySoy4L2wUreW-37" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0;entryY=0.5;entryDx=0;entryDy=0;entryPerimeter=0;endArrow=classic;endFill=1;flowAnimation=1;" edge="1" parent="1" source="5rAtEOWpySoy4L2wUreW-3" target="5rAtEOWpySoy4L2wUreW-35">
120
- <mxGeometry relative="1" as="geometry" />
121
- </mxCell>
122
- <mxCell id="5rAtEOWpySoy4L2wUreW-38" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.93;entryY=0.5;entryDx=0;entryDy=0;entryPerimeter=0;flowAnimation=1;endArrow=none;endFill=0;" edge="1" parent="1" source="5rAtEOWpySoy4L2wUreW-10" target="5rAtEOWpySoy4L2wUreW-12">
123
- <mxGeometry relative="1" as="geometry" />
124
- </mxCell>
125
- <mxCell id="5rAtEOWpySoy4L2wUreW-39" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.93;entryY=0.5;entryDx=0;entryDy=0;entryPerimeter=0;endArrow=none;endFill=0;flowAnimation=1;exitX=0;exitY=0.5;exitDx=0;exitDy=0;exitPerimeter=0;" edge="1" parent="1" source="5rAtEOWpySoy4L2wUreW-14" target="5rAtEOWpySoy4L2wUreW-17">
126
- <mxGeometry relative="1" as="geometry">
127
- <mxPoint x="710" y="452.41" as="sourcePoint" />
128
- <mxPoint x="733" y="452.41" as="targetPoint" />
129
- </mxGeometry>
130
- </mxCell>
131
- <mxCell id="5rAtEOWpySoy4L2wUreW-40" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.93;entryY=0.5;entryDx=0;entryDy=0;entryPerimeter=0;endArrow=none;endFill=0;flowAnimation=1;" edge="1" parent="1">
132
- <mxGeometry relative="1" as="geometry">
133
- <mxPoint x="950" y="452.4" as="sourcePoint" />
134
- <mxPoint x="977" y="452.4" as="targetPoint" />
135
- </mxGeometry>
136
- </mxCell>
137
- <mxCell id="5rAtEOWpySoy4L2wUreW-41" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.93;entryY=0.5;entryDx=0;entryDy=0;entryPerimeter=0;flowAnimation=1;endArrow=none;endFill=0;" edge="1" parent="1">
138
- <mxGeometry relative="1" as="geometry">
139
- <mxPoint x="1190" y="452.87" as="sourcePoint" />
140
- <mxPoint x="1217" y="452.87" as="targetPoint" />
141
- </mxGeometry>
142
- </mxCell>
143
- <mxCell id="5rAtEOWpySoy4L2wUreW-42" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.93;entryY=0.5;entryDx=0;entryDy=0;entryPerimeter=0;endArrow=none;endFill=0;flowAnimation=1;" edge="1" parent="1">
144
- <mxGeometry relative="1" as="geometry">
145
- <mxPoint x="1430" y="452.84" as="sourcePoint" />
146
- <mxPoint x="1454" y="452.84" as="targetPoint" />
147
- </mxGeometry>
148
- </mxCell>
149
- <mxCell id="5rAtEOWpySoy4L2wUreW-43" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.93;entryY=0.5;entryDx=0;entryDy=0;entryPerimeter=0;endArrow=none;endFill=0;flowAnimation=1;" edge="1" parent="1">
150
- <mxGeometry relative="1" as="geometry">
151
- <mxPoint x="1313" y="452.45" as="sourcePoint" />
152
- <mxPoint x="1337" y="452.45" as="targetPoint" />
153
- </mxGeometry>
154
- </mxCell>
155
- <mxCell id="5rAtEOWpySoy4L2wUreW-44" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.93;entryY=0.5;entryDx=0;entryDy=0;entryPerimeter=0;endArrow=none;endFill=0;flowAnimation=1;" edge="1" parent="1">
156
- <mxGeometry relative="1" as="geometry">
157
- <mxPoint x="1550" y="452.45" as="sourcePoint" />
158
- <mxPoint x="1574" y="452.45" as="targetPoint" />
159
- </mxGeometry>
160
- </mxCell>
161
- </root>
162
- </mxGraphModel>
163
- </diagram>
164
- </mxfile>
 
.docs/img/Callytics.gif DELETED

Git LFS Details

  • SHA256: cccf7f416c745e58bb1106187e90157f0fb4c0b3307750a961ba92a9f2dae8b4
  • Pointer size: 131 Bytes
  • Size of remote file: 286 kB
.docs/img/Callytics.png DELETED

Git LFS Details

  • SHA256: 56954553615e7211d696f267621af9572e2e11eace167010f52df19872c49c6a
  • Pointer size: 130 Bytes
  • Size of remote file: 86.3 kB
.docs/img/Callytics.svg DELETED
.docs/img/CallyticsIcon.png DELETED

Git LFS Details

  • SHA256: f57d0f56528a07cb956c463335f2ed88db1e0dc5f309c64bb1452559cba92c3b
  • Pointer size: 131 Bytes
  • Size of remote file: 307 kB
.docs/img/callyticsDemo.gif DELETED

Git LFS Details

  • SHA256: 589b4b3cae9a70f71b7bd1e1be1b28eb8fe35ef5bb7e0801f3c596bef08d6e0d
  • Pointer size: 132 Bytes
  • Size of remote file: 6.1 MB
.docs/img/database.png DELETED

Git LFS Details

  • SHA256: 9adcab2d82324a8cdc246bf4179587f4b3f4a795822e224ec426b1db3db4eb80
  • Pointer size: 131 Bytes
  • Size of remote file: 183 kB
.docs/presentation/CallyticsPresentationEN.pdf DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:eeeb65cf46a72a930f855b561b403f6e9cd42b5fb178f8aa7fa0fa97c3d42afd
3
- size 16425335
 
 
 
 
.gitattributes CHANGED
@@ -33,9 +33,3 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
- # CallyticsDemo
37
- *.pdf filter=lfs diff=lfs merge=lfs -text
38
- *.mp3 filter=lfs diff=lfs merge=lfs -text
39
- *.wav filter=lfs diff=lfs merge=lfs -text
40
- *.png filter=lfs diff=lfs merge=lfs -text
41
- *.gif filter=lfs diff=lfs merge=lfs -text
 
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
 
 
 
 
 
 
.github/CODEOWNERS DELETED
@@ -1 +0,0 @@
1
- * @bunyaminergen
 
 
.gitignore CHANGED
@@ -4,5 +4,5 @@
4
  .agile
5
  .data/input
6
 
7
- # Extension
8
  *.env
 
4
  .agile
5
  .data/input
6
 
7
+ # File
8
  *.env
README.md DELETED
@@ -1,552 +0,0 @@
1
- ---
2
- title: Callytics Demo
3
- emoji: 🚀
4
- colorFrom: green
5
- colorTo: purple
6
- sdk: gradio
7
- sdk_version: 5.23.1
8
- app_file: app.py
9
- pinned: false
10
- license: gpl-3.0
11
- short_description: Callytics Demo
12
- ---
13
-
14
- <div align="center">
15
- <img src=".docs/img/CallyticsIcon.png" alt="CallyticsLogo" width="200">
16
-
17
- ![License](https://img.shields.io/github/license/bunyaminergen/Callytics)
18
- ![GitHub release (latest by date)](https://img.shields.io/github/v/release/bunyaminergen/Callytics)
19
- ![GitHub Discussions](https://img.shields.io/github/discussions/bunyaminergen/Callytics)
20
- ![GitHub Issues](https://img.shields.io/github/issues/bunyaminergen/Callytics)
21
-
22
- [![LinkedIn](https://img.shields.io/badge/LinkedIn-Profile-blue?logo=linkedin)](https://linkedin.com/in/bunyaminergen)
23
-
24
- # Callytics
25
-
26
- `Callytics` is an advanced call analytics solution that leverages speech recognition and large language models (LLMs)
27
- technologies to analyze phone conversations from customer service and call centers. By processing both the
28
- audio and text of each call, it provides insights such as sentiment analysis, topic detection, conflict detection,
29
- profanity word detection and summary. These cutting-edge techniques help businesses optimize customer interactions,
30
- identify areas for improvement, and enhance overall service quality.
31
-
32
- When an audio file is placed in the `.data/input` directory, the entire pipeline automatically starts running, and the
33
- resulting data is inserted into the database.
34
-
35
- **Note**: _This is only version `v1.1.0`; many new features will be added, models
36
- will be fine-tuned or trained from scratch, and various optimization efforts will be applied. For more information,
37
- you can check out the [Upcoming](#upcoming) section._
38
-
39
- **Note**: _If you would like to contribute to this repository,
40
- please read the [CONTRIBUTING](.docs/documentation/CONTRIBUTING.md) first._
41
-
42
- </div>
43
-
44
- ---
45
-
46
- ### Table of Contents
47
-
48
- - [Prerequisites](#prerequisites)
49
- - [Architecture](#architecture)
50
- - [Math And Algorithm](#math-and-algorithm)
51
- - [Features](#features)
52
- - [Demo](#demo)
53
- - [Installation](#installation)
54
- - [File Structure](#file-structure)
55
- - [Database Structure](#database-structure)
56
- - [Version Control System](#version-control-system)
57
- - [Upcoming](#upcoming)
58
- - [Documentations](#documentations)
59
- - [License](#licence)
60
- - [Links](#links)
61
- - [Team](#team)
62
- - [Contact](#contact)
63
- - [Citation](#citation)
64
-
65
- ---
66
-
67
- ### Prerequisites
68
-
69
- ##### General
70
-
71
- - `Python 3.11` _(or above)_
72
-
73
- ##### Llama
74
-
75
- - `GPU (min 24GB)` _(or above)_
76
- - `Hugging Face Credentials (Account, Token)`
77
- - `Llama-3.2-11B-Vision-Instruct` _(or above)_
78
-
79
- ##### OpenAI
80
-
81
- - `GPU (min 12GB)` _(for other processes such as `faster whisper` & `NeMo`)_
82
- - At least one of the following is required:
83
- - `OpenAI Credentials (Account, API Key)`
84
- - `Azure OpenAI Credentials (Account, API Key, API Base URL)`
85
-
86
- ---
87
-
88
- ### Architecture
89
-
90
- ![Architecture](.docs/img/Callytics.gif)
91
-
92
- ---
93
-
94
- ### Math and Algorithm
95
-
96
- This section describes the mathematical models and algorithms used in the project.
97
-
98
- _**Note**: The mathematical concepts and algorithms specific to this repository, rather than the models used, will be
99
- provided in this section. Please refer to the `RESOURCES` under the [Documentations](#documentations) section for the
100
- repositories and models utilized or referenced._
101
-
102
- ##### Silence Duration Calculation
103
-
104
- The silence durations are derived from the time intervals between speech segments. Let
105
-
106
- $$S = \{s_1, s_2, \ldots, s_n\}$$
107
-
108
- represent _the set of silence durations (in seconds)_ between consecutive speech segments.
109
-
110
- - **A user-defined factor**:
111
-
112
- $$\text{factor} \in \mathbb{R}^{+}$$
113
-
114
- To determine a threshold that distinguishes _significant_ silence from trivial gaps, two statistical methods can be
115
- applied:
116
-
117
- **1. Standard Deviation-Based Threshold**
118
-
119
- - _Mean_:
120
-
121
- $$\mu = \frac{1}{n}\sum_{i=1}^{n}s_i$$
122
-
123
- - _Standard Deviation_:
124
-
125
- $$
126
- \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(s_i - \mu)^2}
127
- $$
128
-
129
- - _Threshold_:
130
-
131
- $$
132
- T_{\text{std}} = \sigma \cdot \text{factor}
133
- $$
134
-
135
- **2. Median + Interquartile Range (IQR) Threshold**
136
-
137
- - _Median_:
138
-
139
- _Let:_
140
-
141
- $$ S = \{s_{(1)} \leq s_{(2)} \leq \cdots \leq s_{(n)}\} $$
142
-
143
- be an ordered set.
144
-
145
- _Then:_
146
-
147
- $$
148
- M = \text{median}(S) =
149
- \begin{cases}
150
- s_{\left(\frac{n+1}{2}\right)}, & \text{if } n \text{ is odd}, \\\\[6pt]
151
- \frac{s_{\left(\frac{n}{2}\right)} + s_{\left(\frac{n}{2}+1\right)}}{2}, & \text{if } n \text{ is even}.
152
- \end{cases}
153
- $$
154
-
155
- - _Quartiles:_
156
-
157
- $$
158
- Q_1 = s_{(\lfloor 0.25n \rfloor)}, \quad Q_3 = s_{(\lfloor 0.75n \rfloor)}
159
- $$
160
-
161
- - _IQR_:
162
-
163
- $$
164
- \text{IQR} = Q_3 - Q_1
165
- $$
166
-
167
- - **Threshold:**
168
-
169
- $$
170
- T_{\text{median\\_iqr}} = M + (\text{IQR} \times \text{factor})
171
- $$
172
-
173
- **Total Silence Above Threshold**
174
-
175
- Once the threshold
176
-
177
- $$T$$
178
-
179
- either
180
-
181
- $$T_{\text{std}}$$
182
-
183
- or
184
-
185
- $$T_{\text{median\\_iqr}}$$
186
-
187
- is defined, we sum only those silence durations that meet or exceed this threshold:
188
-
189
- $$
190
- \text{TotalSilence} = \sum_{i=1}^{n} s_i \cdot \mathbf{1}(s_i \geq T)
191
- $$
192
-
193
- where $$\mathbf{1}(s_i \geq T)$$ is an indicator function defined as:
194
-
195
- $$
196
- \mathbf{1}(s_i \geq T) =
197
- \begin{cases}
198
- 1 & \text{if } s_i \geq T \\
199
- 0 & \text{otherwise}
200
- \end{cases}
201
- $$
202
-
203
- **Summary:**
204
-
205
- - **Identify the silence durations:**
206
-
207
- $$
208
- S = \{s_1, s_2, \ldots, s_n\}
209
- $$
210
-
211
- - **Determine the threshold using either:**
212
-
213
- _Standard deviation-based:_
214
-
215
- $$
216
- T = \sigma \cdot \text{factor}
217
- $$
218
-
219
- _Median+IQR-based:_
220
-
221
- $$
222
- T = M + (\text{IQR} \cdot \text{factor})
223
- $$
224
-
225
- - **Compute the total silence above this threshold:**
226
-
227
- $$
228
- \text{TotalSilence} = \sum_{i=1}^{n} s_i \cdot \mathbf{1}(s_i \geq T)
229
- $$
230
-
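- For clarity, here is a minimal NumPy sketch of the computation above (illustrative only; the project's actual
- implementation may differ):
-
- ```python
- # Third-party imports
- import numpy as np
-
-
- def total_silence(durations, factor=1.5, method="std"):
-     """Sum the silence durations that meet or exceed the chosen threshold."""
-     s = np.asarray(durations, dtype=float)
-     if method == "std":
-         threshold = s.std() * factor
-     else:  # median + IQR
-         q1, q3 = np.percentile(s, [25, 75])
-         threshold = np.median(s) + (q3 - q1) * factor
-     return float(s[s >= threshold].sum())
-
-
- silences = [0.2, 0.3, 0.25, 2.5, 0.4, 3.1]
- print(total_silence(silences, method="std"))          # standard deviation threshold
- print(total_silence(silences, method="median_iqr"))   # median + IQR threshold
- ```
-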
231
- ---
232
-
233
- ### Features
234
-
235
- - [x] Speech Enhancement
236
- - [x] Sentiment Analysis
237
- - [x] Profanity Word Detection
238
- - [x] Summary
239
- - [x] Conflict Detection
240
- - [x] Topic Detection
241
-
242
- ---
243
-
244
- ### Demo
245
-
246
- ![callyticsDemo](.docs/img/callyticsDemo.gif)
247
-
248
- ---
249
-
250
- ### Installation
251
-
252
- ##### Linux/Ubuntu
253
-
254
- ```bash
255
- sudo apt update -y && sudo apt upgrade -y
256
- ```
257
-
258
- ```bash
259
- sudo apt install ffmpeg -y
260
- ```
261
-
262
- ```bash
263
- sudo apt install -y ffmpeg build-essential g++
264
- ```
265
-
266
- ```bash
267
- git clone https://github.com/bunyaminergen/Callytics
268
- ```
269
-
270
- ```bash
271
- cd Callytics
272
- ```
273
-
274
- ```bash
275
- conda env create -f environment.yaml
276
- ```
277
-
278
- ```bash
279
- conda activate Callytics
280
- ```
281
-
282
- ##### Environment
283
-
284
- `.env` file sample:
285
-
286
- ```Text
287
- # CREDENTIALS
288
- # OPENAI
289
- OPENAI_API_KEY=
290
-
291
- # HUGGINGFACE
292
- HUGGINGFACE_TOKEN=
293
-
294
- # AZURE OPENAI
295
- AZURE_OPENAI_API_KEY=
296
- AZURE_OPENAI_API_BASE=
297
- AZURE_OPENAI_API_VERSION=
298
-
299
- # DATABASE
300
- DB_NAME=
301
- DB_USER=
302
- DB_PASSWORD=
303
- DB_HOST=
304
- DB_PORT=
305
- DB_URL=
306
- ```
307
-
308
- ---
309
-
310
- ##### Database
311
-
312
- _This section provides an example database and tables with a simple, well-structured design. If you create the same
- tables and columns in your remote database, the code will run without errors. However, if you want to change the
- database structure, you will also need to refactor the code._
316
-
317
- *Note*: __Refer to the [Database Structure](#database-structure) section for the database schema and tables.__
318
-
319
- ```bash
320
- sqlite3 .db/Callytics.sqlite < src/db/sql/Schema.sql
321
- ```
322
-
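- To verify that the schema was created, a quick check with Python's built-in `sqlite3` module can be used (a minimal
- sketch; the exact table names depend on `Schema.sql`):
-
- ```python
- # Standard library imports
- import sqlite3
-
- with sqlite3.connect(".db/Callytics.sqlite") as conn:
-     rows = conn.execute(
-         "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
-     ).fetchall()
-     print([name for (name,) in rows])  # Lists the tables defined by Schema.sql
- ```
-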
323
- ##### Grafana
324
-
325
- _This section explains how to install `Grafana` in your local environment. Since Grafana is a third-party,
- open-source monitoring application, you must handle its installation yourself and connect your database. Of course,
- you can also use `Grafana Cloud` instead of a local installation._
328
-
329
- ```bash
330
- sudo apt update -y && sudo apt upgrade -y
331
- ```
332
-
333
- ```bash
334
- sudo apt install -y apt-transport-https software-properties-common wget
335
- ```
336
-
337
- ```bash
338
- wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
339
- ```
340
-
341
- ```bash
342
- echo "deb https://packages.grafana.com/oss/deb stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
343
- ```
344
-
345
- ```bash
346
- sudo apt install -y grafana
347
- ```
348
-
349
- ```bash
350
- sudo systemctl start grafana-server
351
- sudo systemctl enable grafana-server
352
- sudo systemctl daemon-reload
353
- ```
354
-
355
- ```bash
356
- # Grafana UI (default port): http://localhost:3000
357
- ```
358
-
359
- **SQLite Plugin**
360
-
361
- ```bash
362
- sudo grafana-cli plugins install frser-sqlite-datasource
363
- ```
364
-
365
- ```bash
366
- sudo systemctl restart grafana-server
367
- ```
368
-
369
- ```bash
370
- sudo systemctl daemon-reload
371
- ```
372
-
373
- ### File Structure
374
-
375
- ```Text
376
- .
377
- ├── automation
378
- │ └── service
379
- │ └── callytics.service
380
- ├── config
381
- │ ├── config.yaml
382
- │ ├── nemo
383
- │ │ └── diar_infer_telephonic.yaml
384
- │ └── prompt.yaml
385
- ├── .data
386
- │ ├── example
387
- │ │ └── LogisticsCallCenterConversation.mp3
388
- │ └── input
389
- ├── .db
390
- │ └── Callytics.sqlite
391
- ├── .docs
392
- │ ├── documentation
393
- │ │ ├── CONTRIBUTING.md
394
- │ │ └── RESOURCES.md
395
- │ └── img
396
- │ ├── Callytics.drawio
397
- │ ├── Callytics.gif
398
- │ ├── CallyticsIcon.png
399
- │ ├── Callytics.png
400
- │ ├── Callytics.svg
401
- │ └── database.png
402
- ├── .env
403
- ├── environment.yaml
404
- ├── .gitattributes
405
- ├── .github
406
- │ └── CODEOWNERS
407
- ├── .gitignore
408
- ├── LICENSE
409
- ├── main.py
410
- ├── README.md
411
- ├── requirements.txt
412
- └── src
413
- ├── audio
414
- │ ├── alignment.py
415
- │ ├── analysis.py
416
- │ ├── effect.py
417
- │ ├── error.py
418
- │ ├── io.py
419
- │ ├── metrics.py
420
- │ ├── preprocessing.py
421
- │ ├── processing.py
422
- │ └── utils.py
423
- ├── db
424
- │ ├── manager.py
425
- │ └── sql
426
- │ ├── AudioPropertiesInsert.sql
427
- │ ├── Schema.sql
428
- │ ├── TopicFetch.sql
429
- │ ├── TopicInsert.sql
430
- │ └── UtteranceInsert.sql
431
- ├── text
432
- │ ├── llm.py
433
- │   ├── model.py
434
- │ ├── prompt.py
435
- │ └── utils.py
436
- └── utils
437
- └── utils.py
438
-
439
- 19 directories, 43 files
440
- ```
441
-
442
- ---
443
-
444
- ### Database Structure
445
-
446
- ![Database Diagram](.docs/img/database.png)
447
-
448
-
449
- ---
450
-
451
- ### Version Control System
452
-
453
- ##### Releases
454
-
455
- - [v1.0.0](https://github.com/bunyaminergen/Callytics/archive/refs/tags/v1.0.0.zip) _.zip_
456
- - [v1.0.0](https://github.com/bunyaminergen/Callytics/archive/refs/tags/v1.0.0.tar.gz) _.tar.gz_
457
-
458
-
459
- - [v1.1.0](https://github.com/bunyaminergen/Callytics/archive/refs/tags/v1.1.0.zip) _.zip_
460
- - [v1.1.0](https://github.com/bunyaminergen/Callytics/archive/refs/tags/v1.1.0.tar.gz) _.tar.gz_
461
-
462
- ##### Branches
463
-
464
- - [main](https://github.com/bunyaminergen/Callytics/tree/main)
465
- - [develop](https://github.com/bunyaminergen/Callytics/tree/develop)
466
-
467
- ---
468
-
469
- ### Upcoming
470
-
471
- - [ ] **Speech Emotion Recognition:** Develop a model to automatically detect emotions from speech data.
472
- - [ ] **New Forced Alignment Model:** Train a forced alignment model from scratch.
473
- - [ ] **New Vocal Separation Model:** Train a vocal separation model from scratch.
474
- - [ ] **Unit Tests:** Add a comprehensive unit testing script to validate functionality.
475
- - [ ] **Logging Logic:** Implement a more comprehensive and structured logging mechanism.
476
- - [ ] **Warnings:** Add meaningful and detailed warning messages for better user guidance.
477
- - [ ] **Real-Time Analysis:** Enable real-time analysis capabilities within the system.
478
- - [ ] **Dockerization:** Containerize the repository to ensure seamless deployment and environment consistency.
479
- - [ ] **New Transcription Models:** Integrate and test new transcription models
480
- such as [AIOLA’s Multi-Head Speech Recognition Model](https://venturebeat.com/ai/aiola-drops-ultra-fast-multi-head-speech-recognition-model-beats-openai-whisper/).
481
- - [ ] **Noise Reduction Model:** Identify, test, and integrate a deep learning-based noise reduction model. Consider
482
- existing models like **Facebook Research Denoiser**, **Noise2Noise**, **Audio Denoiser CNN**. Write test scripts for
483
- evaluation, and if necessary, train a new model for optimal performance.
484
-
485
- ##### Considerations
486
-
487
- - [ ] Detect CSR's identity via Voice Recognition/Identification instead of Diarization and LLM.
488
- - [ ] Transform the code structure into a pipeline for better modularity and scalability.
489
- - [ ] Publish the repository as a Python package on **PyPI** for wider distribution.
490
- - [ ] Convert the repository into a Linux package to support Linux-based systems.
491
- [ ] Implement a two-step processing workflow: perform **diarization** (speaker segmentation) first, then apply
492
- **transcription** for each identified speaker separately. This approach can improve transcription accuracy by
493
- leveraging speaker separation.
494
- - [ ] Enable **parallel processing** for tasks such as diarization, transcription, and model inference to improve
495
- overall system performance and reduce processing time.
496
- - [ ] Explore using **Docker Compose** for multi-container orchestration if required.
497
- - [ ] Upload the models and relevant resources to **Hugging Face** for easier access, sharing, and community
498
- collaboration.
499
- - [ ] Consider writing a **Command Line Interface (CLI)** to simplify user interaction and improve usability.
500
- - [ ] Test the ability to use **different language models (LLMs)** for specific tasks. For instance, using **BERT** for
501
- profanity detection. Evaluate their performance and suitability for different use cases as a feature.
502
-
503
- ---
504
-
505
- ### Documentation
506
-
507
- - [RESOURCES](.docs/documentation/RESOURCES.md)
508
- - [CONTRIBUTING](.docs/documentation/CONTRIBUTING.md)
509
- - [PRESENTATION](.docs/presentation/CallyticsPresentationEN.pdf)
510
-
511
- ---
512
-
513
- ### Licence
514
-
515
- - [LICENSE](LICENSE)
516
-
517
- ---
518
-
519
- ### Links
520
-
521
- - [Github](https://github.com/bunyaminergen/Callytics)
522
- - [Website](https://bunyaminergen.com)
523
- - [Linkedin](https://www.linkedin.com/in/bunyaminergen)
524
-
525
- ---
526
-
527
- ### Team
528
-
529
- - [Bunyamin Ergen](https://www.linkedin.com/in/bunyaminergen)
530
-
531
- ---
532
-
533
- ### Contact
534
-
535
- - [Mail](mailto:[email protected])
536
-
537
- ---
538
-
539
- ### Citation
540
-
541
- ```bibtex
542
- @software{ Callytics,
543
- author = {Bunyamin Ergen},
544
- title = {{Callytics}},
545
- year = {2024},
546
- month = {12},
547
- url = {https://github.com/bunyaminergen/Callytics},
548
- version = {v1.1.0},
549
- }
550
- ```
551
-
552
- ---
 
app.py CHANGED
@@ -21,7 +21,6 @@ from src.audio.processing import AudioProcessor, Transcriber, PunctuationRestore
21
  from src.text.utils import Annotator
22
  from src.text.llm import LLMOrchestrator, LLMResultHandler
23
  from src.utils.utils import Cleaner
24
- from src.db.manager import Database
25
 
26
 
27
  async def main(audio_file_path: str):
@@ -48,11 +47,6 @@ async def main(audio_file_path: str):
48
  srt_output_path = ".temp/output.srt"
49
  config_path = "config/config.yaml"
50
  prompt_path = "config/prompt.yaml"
51
- db_path = ".db/Callytics.sqlite"
52
- db_topic_fetch_path = "src/db/sql/TopicFetch.sql"
53
- db_topic_insert_path = "src/db/sql/TopicInsert.sql"
54
- db_audio_properties_insert_path = "src/db/sql/AudioPropertiesInsert.sql"
55
- db_utterance_insert_path = "src/db/sql/UtteranceInsert.sql"
56
 
57
  # Configuration
58
  config = OmegaConf.load(config_path)
@@ -73,7 +67,6 @@ async def main(audio_file_path: str):
73
  llm_result_handler = LLMResultHandler()
74
  cleaner = Cleaner()
75
  formatter = Formatter()
76
- db = Database(db_path)
77
  audio_feature_extractor = Audio(audio_file_path)
78
 
79
  # Step 1: Detect Dialogue
@@ -165,7 +158,12 @@ async def main(audio_file_path: str):
165
  annotator.add_conflict(conflict_result)
166
 
167
  # Step 14: Topic Detection
168
- topics = db.fetch(db_topic_fetch_path)
 
 
 
 
 
169
  topic_result = await llm_handler.generate(
170
  "TopicDetection",
171
  user_input=ssm,
@@ -173,106 +171,16 @@ async def main(audio_file_path: str):
173
  )
174
  annotator.add_topic(topic_result)
175
 
176
- # Step 15: File/Audio Feature Extraction
177
- props = audio_feature_extractor.properties()
178
-
179
- (
180
- name,
181
- file_extension,
182
- absolute_file_path,
183
- sample_rate,
184
- min_frequency,
185
- max_frequency,
186
- audio_bit_depth,
187
- num_channels,
188
- audio_duration,
189
- rms_loudness,
190
- final_features
191
- ) = props
192
-
193
- rms_loudness_db = final_features["RMSLoudness"]
194
- zero_crossing_rate_db = final_features["ZeroCrossingRate"]
195
- spectral_centroid_db = final_features["SpectralCentroid"]
196
- eq_20_250_db = final_features["EQ_20_250_Hz"]
197
- eq_250_2000_db = final_features["EQ_250_2000_Hz"]
198
- eq_2000_6000_db = final_features["EQ_2000_6000_Hz"]
199
- eq_6000_20000_db = final_features["EQ_6000_20000_Hz"]
200
- mfcc_values = [final_features[f"MFCC_{i}"] for i in range(1, 14)]
201
-
202
  final_output = annotator.finalize()
203
 
204
- # Step 16: Total Silence Calculation
205
  stats = SilenceStats.from_segments(final_output["ssm"])
206
  t_std = stats.threshold_std(factor=0.99)
207
  final_output["silence"] = t_std
208
 
209
  print("Final_Output:", final_output)
210
 
211
- # Step 17: Database
212
- # Step 17.1: Insert File Table
213
- summary = final_output.get("summary", "")
214
- conflict_flag = 1 if final_output.get("conflict", False) else 0
215
- silence_value = final_output.get("silence", 0.0)
216
- detected_topic = final_output.get("topic", "Unknown")
217
-
218
- topic_id = db.get_or_insert_topic_id(detected_topic, topics, db_topic_insert_path)
219
-
220
- params = (
221
- name,
222
- topic_id,
223
- file_extension,
224
- absolute_file_path,
225
- sample_rate,
226
- min_frequency,
227
- max_frequency,
228
- audio_bit_depth,
229
- num_channels,
230
- audio_duration,
231
- rms_loudness_db,
232
- zero_crossing_rate_db,
233
- spectral_centroid_db,
234
- eq_20_250_db,
235
- eq_250_2000_db,
236
- eq_2000_6000_db,
237
- eq_6000_20000_db,
238
- *mfcc_values,
239
- summary,
240
- conflict_flag,
241
- silence_value
242
- )
243
-
244
- last_id = db.insert(db_audio_properties_insert_path, params)
245
- print(f"Audio properties inserted successfully into the File table with ID: {last_id}")
246
-
247
- # Step 17.2: Insert Utterance Table
248
- utterances = final_output["ssm"]
249
-
250
- for utterance in utterances:
251
- file_id = last_id
252
- speaker = utterance["speaker"]
253
- sequence = utterance["index"]
254
- start_time = utterance["start_time"] / 1000.0
255
- end_time = utterance["end_time"] / 1000.0
256
- content = utterance["text"]
257
- sentiment = utterance["sentiment"]
258
- profane = 1 if utterance["profane"] else 0
259
-
260
- utterance_params = (
261
- file_id,
262
- speaker,
263
- sequence,
264
- start_time,
265
- end_time,
266
- content,
267
- sentiment,
268
- profane
269
- )
270
-
271
- db.insert(db_utterance_insert_path, utterance_params)
272
-
273
- print("Utterances inserted successfully into the Utterance table.")
274
-
275
- # Step 18: Clean Up
276
  cleaner.cleanup(temp_dir, audio_file_path)
277
 
278
  return final_output
@@ -314,4 +222,4 @@ with gr.Blocks() as demo:
314
  )
315
 
316
  if __name__ == "__main__":
317
- demo.launch(server_name="0.0.0.0", server_port=7860)
 
21
  from src.text.utils import Annotator
22
  from src.text.llm import LLMOrchestrator, LLMResultHandler
23
  from src.utils.utils import Cleaner
 
24
 
25
 
26
  async def main(audio_file_path: str):
 
47
  srt_output_path = ".temp/output.srt"
48
  config_path = "config/config.yaml"
49
  prompt_path = "config/prompt.yaml"
 
 
 
 
 
50
 
51
  # Configuration
52
  config = OmegaConf.load(config_path)
 
67
  llm_result_handler = LLMResultHandler()
68
  cleaner = Cleaner()
69
  formatter = Formatter()
 
70
  audio_feature_extractor = Audio(audio_file_path)
71
 
72
  # Step 1: Detect Dialogue
 
158
  annotator.add_conflict(conflict_result)
159
 
160
  # Step 14: Topic Detection
161
+ topics = [
162
+ "Complaint",
163
+ "Technical Support",
164
+ "Billing",
165
+ "Order Status",
166
+ ]
167
  topic_result = await llm_handler.generate(
168
  "TopicDetection",
169
  user_input=ssm,
 
171
  )
172
  annotator.add_topic(topic_result)
173
 
174
  final_output = annotator.finalize()
175
 
176
+ # Step 15: Total Silence Calculation
177
  stats = SilenceStats.from_segments(final_output["ssm"])
178
  t_std = stats.threshold_std(factor=0.99)
179
  final_output["silence"] = t_std
180
 
181
  print("Final_Output:", final_output)
182
 
183
+ # Step 16: Clean Up
184
  cleaner.cleanup(temp_dir, audio_file_path)
185
 
186
  return final_output
 
222
  )
223
 
224
  if __name__ == "__main__":
225
+ demo.launch()
automation/service/callytics.service DELETED
@@ -1,19 +0,0 @@
1
- [Unit]
2
- Description=Callytics
3
- After=network.target
4
-
5
- [Service]
6
- Type=simple
7
- User=bunyamin
8
- EnvironmentFile=/home/bunyamin/Callytics/.env
9
- WorkingDirectory=/home/bunyamin/Callytics
10
- ExecStart=/bin/bash -c "source /home/bunyamin/anaconda3/etc/profile.d/conda.sh \
11
- && conda activate Callytics \
12
- && python /home/bunyamin/Callytics/main.py"
13
- Restart=on-failure
14
- RestartSec=5
15
- StandardOutput=journal
16
- StandardError=journal
17
-
18
- [Install]
19
- WantedBy=multi-user.target
 
 
 
 
environment.yaml DELETED
@@ -1,31 +0,0 @@
1
- name: Callytics
2
- channels:
3
- - defaults
4
- - conda-forge
5
- dependencies:
6
- - python=3.11
7
- - pip:
8
- - cython==3.0.11
9
- - nemo_toolkit[asr]>=2.dev
10
- - nltk==3.9.1
11
- - faster-whisper==1.1.0
12
- - demucs==4.0.1
13
- - deepmultilingualpunctuation @ git+https://github.com/oliverguhr/deepmultilingualpunctuation.git@5a0dd7f4fd56687f59405aa8eba1144393d8b74b
14
- - ctc-forced-aligner @ git+https://github.com/MahmoudAshraf97/ctc-forced-aligner.git@c7cc7ce609e5f8f1f553fbd1e53124447ffe46d8
15
- - openai==1.57.0
16
- - accelerate>=0.26.0
17
- - torch==2.5.1
18
- - pydub==0.25.1
19
- - omegaconf==2.3.0
20
- - python-dotenv==1.0.1
21
- - transformers==4.47.0
22
- - librosa==0.10.2.post1
23
- - soundfile==0.12.1
24
- - noisereduce==3.0.3
25
- - numpy==1.26.4
26
- - pyannote.audio==3.3.2
27
- - watchdog==6.0.0
28
- - scipy==1.14.1
29
- - IPython==8.30.0
30
- - pyyaml==6.0.2
31
- - MPSENet==1.0.3