husseinelsaadi committed
Commit 028190e · 1 Parent(s): 277abd1

read me updated

Files changed (1)
  1. readme.md +9 -275
readme.md CHANGED
@@ -1,275 +1,9 @@
- # Codingo - AI Powered Smart Recruitment System
-
- This repository contains the implementation of Codingo, an AI-powered online recruitment platform designed to automate and enhance the hiring process through a virtual HR assistant named LUNA.
-
- ## Project Overview
-
- Codingo addresses the challenges of traditional recruitment processes by offering:
- - Automated CV screening and skill-based shortlisting
- - AI-led interviews through the virtual assistant LUNA
- - Real-time cheating detection during assessments
- - Gamified practice tools for candidates
- - Secure administration interface for hiring managers
-
- ## Getting Started
-
- This guide outlines the development process, starting with local model training before moving to AWS deployment.
-
- ### Prerequisites
-
- - Python 3.8+
- - pip (Python package manager)
- - Git
-
- ### Development Process
-
- We'll implement the project in phases:
-
- #### Phase 1: Local Training and Feature Extraction (Current Phase)
-
- This initial phase focuses on building and training the model locally before AWS deployment.
-
- ### Project Structure
-
- ```
- Codingo/
- ├── backend/                  # Flask API backend
- │   ├── app.py                # Flask server
- │   ├── predict.py            # Predict using trained model
- │   ├── train_model.py        # Model training script
- │   ├── model/                # Trained model artifacts
- │   │   └── cv_classifier.pkl
- │   └── utils/
- │       ├── text_extractor.py # PDF/DOCX to text
- │       └── preprocessor.py   # Cleaning, tokenizing
- │
- ├── data/
- │   ├── training.csv          # Your training dataset
- │   └── raw_cvs/              # CV files (PDF/DOCX/TXT)
- │
- ├── notebooks/
- │   └── eda.ipynb             # Data exploration & feature work
- │
- ├── requirements.txt          # Python dependencies
- └── README.md                 # Project overview
- ```
-
- ## Step-by-Step Implementation Guide
-
- ### Step 1: Create Training Dataset
-
- Start by manually collecting ~50-100 CV-like text samples with position labels.
-
- **File:** `data/training.csv`
-
- Example format:
- ```
- text,position
- "Experienced in Python, Flask, AWS",Backend Developer
- "Built dashboards with React and TypeScript",Frontend Developer
- "ML projects using pandas, scikit-learn",Data Scientist
- ```
-
- ### Step 2: Train Model
-
- Implement a classifier using scikit-learn to predict job roles from CV text.
-
- **File:** `backend/train_model.py`
-
- ```python
- import os
-
- import joblib
- import pandas as pd
- from sklearn.feature_extraction.text import TfidfVectorizer
- from sklearn.linear_model import LogisticRegression
- from sklearn.pipeline import Pipeline
-
- # Load training data (run this script from the project root)
- df = pd.read_csv('data/training.csv')
-
- # Define model pipeline
- model = Pipeline([
-     ('tfidf', TfidfVectorizer(max_features=5000, ngram_range=(1, 2))),
-     ('classifier', LogisticRegression(max_iter=1000))
- ])
-
- # Train model
- model.fit(df['text'], df['position'])
-
- # Save model (backend/model/ matches the project structure above)
- os.makedirs('backend/model', exist_ok=True)
- joblib.dump(model, 'backend/model/cv_classifier.pkl')
-
- print("Model trained and saved successfully!")
- ```
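To see the pipeline's mechanics without assembling a real `data/training.csv`, here is a minimal self-contained sketch of the same TF-IDF + logistic-regression pipeline fitted on three toy rows (the texts and labels are illustrative only, not the project's dataset):

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy stand-in for data/training.csv
df = pd.DataFrame({
    "text": [
        "Experienced in Python, Flask, AWS",
        "Built dashboards with React and TypeScript",
        "ML projects using pandas, scikit-learn",
    ],
    "position": ["Backend Developer", "Frontend Developer", "Data Scientist"],
})

# Same pipeline shape as backend/train_model.py
model = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=5000, ngram_range=(1, 2))),
    ("classifier", LogisticRegression(max_iter=1000)),
])
model.fit(df["text"], df["position"])

# Predict a role for an unseen snippet
pred = model.predict(["Flask and AWS experience"])[0]
```

With this little data the prediction is only a smoke test, but it confirms the pipeline trains, serializes the vectorizer and classifier together, and returns one of the known labels.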
-
- ### Step 3: Test Prediction Locally
-
- Create a script to verify your model works correctly.
-
- **File:** `backend/predict.py`
-
- ```python
- import sys
-
- import joblib
-
-
- def predict_role(cv_text):
-     # Load the trained model (path relative to the project root)
-     model = joblib.load('backend/model/cv_classifier.pkl')
-
-     # Make prediction
-     prediction = model.predict([cv_text])[0]
-     confidence = max(model.predict_proba([cv_text])[0]) * 100
-
-     return {
-         'predicted_position': prediction,
-         'confidence': f"{confidence:.2f}%"
-     }
-
-
- if __name__ == "__main__":
-     if len(sys.argv) > 1:
-         # Get CV text from command line argument
-         cv_text = sys.argv[1]
-     else:
-         # Example CV text
-         cv_text = "Experienced Python developer with 5 years of experience in Flask and AWS."
-
-     result = predict_role(cv_text)
-     print(f"Predicted Position: {result['predicted_position']}")
-     print(f"Confidence: {result['confidence']}")
- ```
-
- ### Step 4: Add Text Extraction Utility
-
- Create utilities to extract text from PDF and DOCX files.
-
- **File:** `backend/utils/text_extractor.py`
-
- ```python
- import os
-
- import docx
- import fitz  # PyMuPDF
-
-
- def extract_text_from_pdf(path):
-     """Extract text from a PDF file."""
-     doc = fitz.open(path)
-     text = ""
-     for page in doc:
-         text += page.get_text()
-     doc.close()
-     return text.strip()
-
-
- def extract_text_from_docx(path):
-     """Extract text from a DOCX file."""
-     doc = docx.Document(path)
-     text = "\n".join(paragraph.text for paragraph in doc.paragraphs)
-     return text.strip()
-
-
- def extract_text(file_path):
-     """Extract text from a PDF, DOCX, or TXT file."""
-     extension = os.path.splitext(file_path)[1].lower()
-
-     if extension == '.pdf':
-         return extract_text_from_pdf(file_path)
-     elif extension == '.docx':
-         # python-docx only reads .docx; legacy binary .doc is not supported
-         return extract_text_from_docx(file_path)
-     elif extension == '.txt':
-         with open(file_path, 'r', encoding='utf-8') as f:
-             return f.read().strip()
-     else:
-         raise ValueError(f"Unsupported file extension: {extension}")
- ```
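The dispatch-on-extension logic above can be smoke-tested without installing PyMuPDF or python-docx by exercising just the `.txt` branch. A standard-library-only sketch (the helper name `extract_text_txt` is ours, mirroring the `.txt` case of `extract_text`):

```python
import os
import tempfile


def extract_text_txt(file_path):
    # Mirrors the '.txt' branch of extract_text(): dispatch on the
    # lowercased extension, then read and strip the file's contents.
    extension = os.path.splitext(file_path)[1].lower()
    if extension != ".txt":
        raise ValueError(f"Unsupported file extension: {extension}")
    with open(file_path, "r", encoding="utf-8") as f:
        return f.read().strip()


# Write a throwaway CV file and round-trip it through the extractor
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cv.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("Experienced Python developer\n")
    text = extract_text_txt(path)
```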
-
- ### Step 5: Add Flask API (Simple)
-
- Create a basic Flask API to accept CV uploads and return predictions.
-
- **File:** `backend/app.py`
-
- ```python
- import os
-
- import joblib
- from flask import Flask, request, jsonify
- from werkzeug.utils import secure_filename
-
- from utils.text_extractor import extract_text
-
- app = Flask(__name__)
-
- # Ensure directories exist (run this app from the backend/ directory)
- os.makedirs("../data/raw_cvs", exist_ok=True)
- os.makedirs("model", exist_ok=True)
-
- # Load the trained model
- model = joblib.load("model/cv_classifier.pkl")
-
-
- @app.route("/predict", methods=["POST"])
- def predict():
-     if 'file' not in request.files:
-         return jsonify({"error": "No file provided"}), 400
-
-     file = request.files["file"]
-     # Sanitize the user-supplied filename before writing to disk
-     file_path = os.path.join("../data/raw_cvs", secure_filename(file.filename))
-     file.save(file_path)
-
-     try:
-         text = extract_text(file_path)
-         prediction = model.predict([text])[0]
-         confidence = max(model.predict_proba([text])[0]) * 100
-
-         return jsonify({
-             "predicted_position": prediction,
-             "confidence": f"{confidence:.2f}%"
-         })
-     except Exception as e:
-         return jsonify({"error": str(e)}), 500
-
-
- if __name__ == "__main__":
-     app.run(debug=True)
- ```
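An endpoint of this shape can be exercised without starting a server by using Flask's built-in test client. A hedged sketch, not the repository's code: the `StubModel` below stands in for the joblib-loaded pipeline so the example runs without a trained `cv_classifier.pkl`, and its labels and probabilities are invented for illustration:

```python
import io

from flask import Flask, request, jsonify

app = Flask(__name__)


class StubModel:
    # Stand-in for the trained pipeline; always predicts one label.
    def predict(self, texts):
        return ["Backend Developer"] * len(texts)

    def predict_proba(self, texts):
        return [[0.9, 0.1]] * len(texts)


model = StubModel()


@app.route("/predict", methods=["POST"])
def predict():
    # Same request/response shape as the endpoint above, reading the
    # upload directly instead of saving it and extracting text.
    if "file" not in request.files:
        return jsonify({"error": "No file provided"}), 400
    text = request.files["file"].read().decode("utf-8")
    prediction = model.predict([text])[0]
    confidence = max(model.predict_proba([text])[0]) * 100
    return jsonify({
        "predicted_position": prediction,
        "confidence": f"{confidence:.2f}%",
    })


# Post a small in-memory "CV" through the test client
client = app.test_client()
resp = client.post(
    "/predict",
    data={"file": (io.BytesIO(b"Python, Flask, AWS"), "cv.txt")},
)
result = resp.get_json()
```

Against the real app, the equivalent check is a multipart POST of a CV file to `http://localhost:5000/predict`.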
-
- ### Step 6: Install Dependencies
-
- **File:** `requirements.txt`
-
- ```
- flask
- scikit-learn
- pandas
- joblib
- PyMuPDF
- python-docx
- ```
-
- Run: `pip install -r requirements.txt`
240
-
241
- ## Next Steps
242
-
243
- After completing Phase 1, we'll move to:
244
-
245
- 1. **Phase 2: Enhanced Model & NLP Features**
246
- - Implement BERT or DistilBERT for improved semantic understanding
247
- - Add skill extraction from CVs
248
- - Develop job-CV matching scoring
249
-
250
- 2. **Phase 3: Web Interface & Chatbot**
251
- - Develop user interface for admin and candidates
252
- - Implement LUNA virtual assistant using LangChain
253
- - Add interview scheduling functionality
254
-
255
- 3. **Phase 4: Video Interview & Proctoring**
256
- - Add video interview capabilities
257
- - Implement cheating detection using computer vision
258
- - Develop automated scoring system
259
-
260
- 4. **Phase 5: AWS Deployment**
261
- - Set up AWS infrastructure using Terraform
262
- - Deploy application to EC2/Lambda
263
- - Configure S3 for file storage
264
-
265
- ## Authors
266
-
267
- - Hussein El Saadi
268
- - Nour Ali Shaito
269
-
270
- ## Supervisor
271
- - Dr. Ali Ezzedine
272
-
273
- ## License
274
-
275
- This project is licensed under the MIT License - see the LICENSE file for details.
 
+ ---
+ title: Codingo
+ emoji: 🤖
+ colorFrom: indigo
+ colorTo: pink
+ sdk: docker
+ app_file: app.py
+ pinned: false
+ ---