# Codingo - AI-Powered Smart Recruitment System

This repository contains the implementation of Codingo, an AI-powered online recruitment platform designed to automate and enhance the hiring process through a virtual HR assistant named LUNA.

## Project Overview

Codingo addresses the challenges of traditional recruitment processes by offering:

- Automated CV screening and skill-based shortlisting
- AI-led interviews through the virtual assistant LUNA
- Real-time cheating detection during assessments
- Gamified practice tools for candidates
- Secure administration interface for hiring managers

## Getting Started

This guide outlines the development process, starting with local model training before moving on to AWS deployment.
### Prerequisites

- Python 3.8+
- pip (Python package manager)
- Git

### Development Process

We'll implement the project in phases:

#### Phase 1: Local Training and Feature Extraction (Current Phase)

This initial phase focuses on building and training the model locally before AWS deployment.

### Project Structure
```
Codingo/
├── backend/                  # Flask API backend
│   ├── app.py                # Flask server
│   ├── predict.py            # Predict using trained model
│   ├── train_model.py        # Model training script
│   ├── model/                # Trained model artifacts
│   │   └── cv_classifier.pkl
│   └── utils/
│       ├── text_extractor.py # PDF/DOCX to text
│       └── preprocessor.py   # Cleaning, tokenizing
├── data/
│   ├── training.csv          # Your training dataset
│   └── raw_cvs/              # CV files (PDF/DOCX/TXT)
├── notebooks/
│   └── eda.ipynb             # Data exploration & feature work
├── requirements.txt          # Python dependencies
└── README.md                 # Project overview
```
## Step-by-Step Implementation Guide

### Step 1: Create Training Dataset

Start by manually collecting ~50-100 CV-like text samples with position labels.

**File:** `data/training.csv`

Example format:

```
text,position
"Experienced in Python, Flask, AWS",Backend Developer
"Built dashboards with React and TypeScript",Frontend Developer
"ML projects using pandas, scikit-learn",Data Scientist
```
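
Before training, it can help to sanity-check the dataset. The sketch below simply confirms the CSV parses and shows how many examples each label has (it assumes the format above):

```python
import pandas as pd

# Load the dataset and inspect the label distribution
df = pd.read_csv('data/training.csv')
print(f"{len(df)} samples, {df['position'].nunique()} distinct positions")
print(df['position'].value_counts())
```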
### Step 2: Train Model

Implement a classifier using scikit-learn to predict job roles from CV text.

**File:** `backend/train_model.py`
```python
import os

import joblib
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Load training data
df = pd.read_csv('data/training.csv')

# Define model pipeline: TF-IDF features feeding a logistic regression classifier
model = Pipeline([
    ('tfidf', TfidfVectorizer(max_features=5000, ngram_range=(1, 2))),
    ('classifier', LogisticRegression(max_iter=1000))
])

# Train model
model.fit(df['text'], df['position'])

# Save model (create the output directory on first run)
os.makedirs('backend/model', exist_ok=True)
joblib.dump(model, 'backend/model/cv_classifier.pkl')
print("Model trained and saved successfully!")
```
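
Run the script from the repository root so the relative paths resolve: `python backend/train_model.py`. Before trusting the model, it is also worth measuring accuracy on a held-out split; the sketch below is one way to do that (same `data/training.csv` format assumed):

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

df = pd.read_csv('data/training.csv')

# Hold out 20% of the samples for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    df['text'], df['position'], test_size=0.2, random_state=42
)

model = Pipeline([
    ('tfidf', TfidfVectorizer(max_features=5000, ngram_range=(1, 2))),
    ('classifier', LogisticRegression(max_iter=1000))
])
model.fit(X_train, y_train)

# Per-class precision/recall/F1 on the held-out split
print(classification_report(y_test, model.predict(X_test)))
```

With only ~50-100 samples the numbers will be noisy, but they give a rough sense of whether the classifier has learned anything.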
### Step 3: Test Prediction Locally

Create a script to verify your model works correctly.

**File:** `backend/predict.py`
```python
import sys

import joblib

def predict_role(cv_text):
    # Load the trained model
    model = joblib.load('backend/model/cv_classifier.pkl')

    # Make prediction
    prediction = model.predict([cv_text])[0]
    confidence = max(model.predict_proba([cv_text])[0]) * 100

    return {
        'predicted_position': prediction,
        'confidence': f"{confidence:.2f}%"
    }

if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Get CV text from command line argument
        cv_text = sys.argv[1]
    else:
        # Example CV text
        cv_text = "Experienced Python developer with 5 years of experience in Flask and AWS."

    result = predict_role(cv_text)
    print(f"Predicted Position: {result['predicted_position']}")
    print(f"Confidence: {result['confidence']}")
```
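
Run it from the repository root, optionally passing CV text as an argument, e.g. `python backend/predict.py "Built dashboards with React and TypeScript"`. With no argument, it falls back to the built-in example text.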
### Step 4: Add Text Extraction Utility

Create utilities to extract text from PDF and DOCX files.

**File:** `backend/utils/text_extractor.py`
```python
import os

import docx
import fitz  # PyMuPDF

def extract_text_from_pdf(path):
    """Extract text from a PDF file."""
    doc = fitz.open(path)
    text = ""
    for page in doc:
        text += page.get_text()
    return text.strip()

def extract_text_from_docx(path):
    """Extract text from a DOCX file."""
    doc = docx.Document(path)
    text = "\n".join(paragraph.text for paragraph in doc.paragraphs)
    return text.strip()

def extract_text(file_path):
    """Extract text from a PDF, DOCX, or plain-text file."""
    extension = os.path.splitext(file_path)[1].lower()
    if extension == '.pdf':
        return extract_text_from_pdf(file_path)
    elif extension == '.docx':
        # python-docx reads .docx only; legacy .doc files are not supported
        return extract_text_from_docx(file_path)
    elif extension == '.txt':
        with open(file_path, 'r', encoding='utf-8') as f:
            return f.read().strip()
    else:
        raise ValueError(f"Unsupported file extension: {extension}")
```
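
A quick way to try the extractor from the repository root (the file name is a placeholder; this assumes a sample CV exists under `data/raw_cvs/`):

```python
import sys
sys.path.append('backend')  # make the utils package importable from the repo root

from utils.text_extractor import extract_text

# Print the first 500 characters of the extracted text
print(extract_text('data/raw_cvs/sample_cv.pdf')[:500])
```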
### Step 5: Add Flask API (Simple)

Create a basic Flask API to accept CV uploads and return predictions.

**File:** `backend/app.py`
```python
import os

import joblib
from flask import Flask, request, jsonify
from werkzeug.utils import secure_filename

from utils.text_extractor import extract_text

app = Flask(__name__)

# Ensure the upload directory exists (paths are relative to the working directory)
os.makedirs("data/raw_cvs", exist_ok=True)

# Load the trained model once at startup
model = joblib.load("model/cv_classifier.pkl")

@app.route("/predict", methods=["POST"])
def predict():
    if 'file' not in request.files:
        return jsonify({"error": "No file provided"}), 400

    file = request.files["file"]
    if file.filename == "":
        return jsonify({"error": "Empty filename"}), 400

    # Sanitize the filename before writing it to disk
    file_path = os.path.join("data/raw_cvs", secure_filename(file.filename))
    file.save(file_path)

    try:
        text = extract_text(file_path)
        prediction = model.predict([text])[0]
        confidence = max(model.predict_proba([text])[0]) * 100
        return jsonify({
            "predicted_position": prediction,
            "confidence": f"{confidence:.2f}%"
        })
    except Exception as e:
        return jsonify({"error": str(e)}), 500

if __name__ == "__main__":
    app.run(debug=True)
```
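
Start the server from the `backend/` directory so the relative `model/cv_classifier.pkl` path resolves: `cd backend && python app.py`. Note that uploads are then saved under `backend/data/raw_cvs/`, relative to the working directory.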
### Step 6: Install Dependencies

**File:** `requirements.txt`

```
flask
scikit-learn
pandas
joblib
PyMuPDF
python-docx
```

Run: `pip install -r requirements.txt`
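
With the server running, you can exercise the endpoint end to end. Below is a minimal sketch using the `requests` library (not in `requirements.txt`; install it with `pip install requests`; the CV file name is a placeholder):

```python
import requests

# Upload a CV and print the predicted role (Flask's default local address assumed)
with open('sample_cv.pdf', 'rb') as f:
    response = requests.post(
        'http://127.0.0.1:5000/predict',
        files={'file': ('sample_cv.pdf', f)}
    )
print(response.json())
```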
## Next Steps

After completing Phase 1, we'll move to:

1. **Phase 2: Enhanced Model & NLP Features**
   - Implement BERT or DistilBERT for improved semantic understanding
   - Add skill extraction from CVs
   - Develop job-CV matching scoring
2. **Phase 3: Web Interface & Chatbot**
   - Develop user interface for admin and candidates
   - Implement LUNA virtual assistant using LangChain
   - Add interview scheduling functionality
3. **Phase 4: Video Interview & Proctoring**
   - Add video interview capabilities
   - Implement cheating detection using computer vision
   - Develop automated scoring system
4. **Phase 5: AWS Deployment**
   - Set up AWS infrastructure using Terraform
   - Deploy application to EC2/Lambda
   - Configure S3 for file storage
## Authors

- Hussein El Saadi
- Nour Ali Shaito

## Supervisor

- Dr. Ali Ezzedine

## License

This project is licensed under the MIT License - see the LICENSE file for details.