Codingo - AI-Powered Smart Recruitment System
This repository contains the implementation of Codingo, an AI-powered online recruitment platform designed to automate and enhance the hiring process through a virtual HR assistant named LUNA.
Project Overview
Codingo addresses the challenges of traditional recruitment processes by offering:
- Automated CV screening and skill-based shortlisting
- AI-led interviews through the virtual assistant LUNA
- Real-time cheating detection during assessments
- Gamified practice tools for candidates
- Secure administration interface for hiring managers
Getting Started
This guide outlines the development process, starting with local model training before moving to AWS deployment.
Prerequisites
- Python 3.8+
- pip (Python package manager)
- Git
Development Process
We'll implement the project in phases:
Phase 1: Local Training and Feature Extraction (Current Phase)
This initial phase focuses on building and training the model locally before AWS deployment.
Project Structure
Codingo/
├── backend/                  # Flask API backend
│   ├── app.py                # Flask server
│   ├── predict.py            # Predict using trained model
│   ├── train_model.py        # Model training script
│   ├── model/                # Trained model artifacts
│   │   └── cv_classifier.pkl
│   └── utils/
│       ├── text_extractor.py # PDF/DOCX to text
│       └── preprocessor.py   # Cleaning, tokenizing
│
├── data/
│   ├── training.csv          # Your training dataset
│   └── raw_cvs/              # CV files (PDF/DOCX/txt)
│
├── notebooks/
│   └── eda.ipynb             # Data exploration & feature work
│
├── requirements.txt          # Python dependencies
└── README.md                 # Project overview
Step-by-Step Implementation Guide
Step 1: Create Training Dataset
Start by manually collecting ~50-100 CV-like text samples with position labels.
File: data/training.csv
Example format:
text,position
"Experienced in Python, Flask, AWS",Backend Developer
"Built dashboards with React and TypeScript",Frontend Developer
"ML projects using pandas, scikit-learn",Data Scientist
Step 2: Train Model
Implement a classifier using scikit-learn to predict job roles from CV text.
File: backend/train_model.py
import os

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
import joblib

# Load training data (run this script from the repository root)
df = pd.read_csv('data/training.csv')

# Define model pipeline: TF-IDF features followed by a logistic regression classifier
model = Pipeline([
    ('tfidf', TfidfVectorizer(max_features=5000, ngram_range=(1, 2))),
    ('classifier', LogisticRegression(max_iter=1000))
])

# Train model
model.fit(df['text'], df['position'])

# Save model (path matches the project structure: backend/model/)
os.makedirs('backend/model', exist_ok=True)
joblib.dump(model, 'backend/model/cv_classifier.pkl')
print("Model trained and saved successfully!")
Step 3: Test Prediction Locally
Create a script to verify your model works correctly.
File: backend/predict.py
import joblib
import sys

def predict_role(cv_text):
    # Load the trained model (path matches the project structure: backend/model/)
    model = joblib.load('backend/model/cv_classifier.pkl')

    # Make prediction and report the top class probability as a confidence score
    prediction = model.predict([cv_text])[0]
    confidence = max(model.predict_proba([cv_text])[0]) * 100

    return {
        'predicted_position': prediction,
        'confidence': f"{confidence:.2f}%"
    }

if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Get CV text from command line argument
        cv_text = sys.argv[1]
    else:
        # Example CV text
        cv_text = "Experienced Python developer with 5 years of experience in Flask and AWS."

    result = predict_role(cv_text)
    print(f"Predicted Position: {result['predicted_position']}")
    print(f"Confidence: {result['confidence']}")
Step 4: Add Text Extraction Utility
Create utilities to extract text from PDF, DOCX, and plain-text files.
File: backend/utils/text_extractor.py
import os

import fitz  # PyMuPDF
import docx

def extract_text_from_pdf(path):
    """Extract text from a PDF file."""
    doc = fitz.open(path)
    text = ""
    for page in doc:
        text += page.get_text()
    doc.close()
    return text.strip()

def extract_text_from_docx(path):
    """Extract text from a DOCX file."""
    doc = docx.Document(path)
    text = "\n".join(paragraph.text for paragraph in doc.paragraphs)
    return text.strip()

def extract_text(file_path):
    """Extract text from a PDF, DOCX, or TXT file."""
    extension = os.path.splitext(file_path)[1].lower()
    if extension == '.pdf':
        return extract_text_from_pdf(file_path)
    elif extension == '.docx':
        # Note: python-docx reads .docx only; legacy .doc files are not supported
        return extract_text_from_docx(file_path)
    elif extension == '.txt':
        with open(file_path, 'r', encoding='utf-8') as f:
            return f.read().strip()
    else:
        raise ValueError(f"Unsupported file extension: {extension}")
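You can smoke-test the extractor from the repository root, for example with python -c "from backend.utils.text_extractor import extract_text; print(extract_text('data/raw_cvs/sample_cv.pdf')[:300])" (sample_cv.pdf is a placeholder for any file you have placed in data/raw_cvs/).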
Step 5: Add a Simple Flask API
Create a basic Flask API to accept CV uploads and return predictions.
File: backend/app.py
from flask import Flask, request, jsonify
from werkzeug.utils import secure_filename
from utils.text_extractor import extract_text
import joblib
import os

app = Flask(__name__)

# Load the trained model. Start the server as `python backend/app.py` from the
# repository root so this relative path and the upload directory resolve correctly.
model = joblib.load("backend/model/cv_classifier.pkl")

# Ensure the upload directory exists
os.makedirs("data/raw_cvs", exist_ok=True)

@app.route("/predict", methods=["POST"])
def predict():
    if 'file' not in request.files:
        return jsonify({"error": "No file provided"}), 400

    file = request.files["file"]
    # Sanitize the uploaded filename before writing it to disk
    file_path = os.path.join("data/raw_cvs", secure_filename(file.filename))
    file.save(file_path)

    try:
        text = extract_text(file_path)
        prediction = model.predict([text])[0]
        confidence = max(model.predict_proba([text])[0]) * 100
        return jsonify({
            "predicted_position": prediction,
            "confidence": f"{confidence:.2f}%"
        })
    except Exception as e:
        return jsonify({"error": str(e)}), 500

if __name__ == "__main__":
    app.run(debug=True)
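Once the model is trained and the server is running, you can exercise the endpoint with a small client script. A minimal sketch using the requests library (not listed in requirements.txt; install it separately) and a placeholder filename:

import requests

# Hypothetical smoke test for the /predict endpoint; assumes the Flask app is
# running locally on the default port and that this sample file exists
with open('data/raw_cvs/sample_cv.pdf', 'rb') as f:
    response = requests.post('http://127.0.0.1:5000/predict', files={'file': f})

print(response.status_code)
print(response.json())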
Step 6: Install Dependencies
File: requirements.txt
flask
scikit-learn
pandas
joblib
PyMuPDF
python-docx
Run: pip install -r requirements.txt
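With dependencies installed, the Phase 1 workflow is: run python backend/train_model.py once to produce backend/model/cv_classifier.pkl, optionally spot-check it with python backend/predict.py "some CV text", then start the API with python backend/app.py. Run all commands from the repository root so the relative paths resolve.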
Next Steps
After completing Phase 1, we'll move to:
Phase 2: Enhanced Model & NLP Features
- Implement BERT or DistilBERT for improved semantic understanding
- Add skill extraction from CVs
- Develop job-CV matching scoring (a rough similarity sketch follows this list)
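Phase 2 is future work, but the matching idea can be prototyped with the tools already in requirements.txt. A rough sketch assuming nothing beyond scikit-learn (a transformer-based embedding would replace the TF-IDF step later); match_score and its arguments are illustrative names, not part of the current codebase:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def match_score(job_description, cv_text):
    """Return a rough 0-100 similarity score between a job posting and a CV."""
    vectorizer = TfidfVectorizer(stop_words='english')
    vectors = vectorizer.fit_transform([job_description, cv_text])
    return float(cosine_similarity(vectors[0], vectors[1])[0][0]) * 100

print(match_score(
    "Backend developer with Python, Flask and AWS experience",
    "Experienced in Python, Flask, AWS"
))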
Phase 3: Web Interface & Chatbot
- Develop user interface for admin and candidates
- Implement LUNA virtual assistant using LangChain
- Add interview scheduling functionality
Phase 4: Video Interview & Proctoring
- Add video interview capabilities
- Implement cheating detection using computer vision
- Develop automated scoring system
Phase 5: AWS Deployment
- Set up AWS infrastructure using Terraform
- Deploy application to EC2/Lambda
- Configure S3 for file storage
Authors
- Hussein El Saadi
- Nour Ali Shaito
Supervisor
- Dr. Ali Ezzedine
License
This project is licensed under the MIT License - see the LICENSE file for details.