
Codingo - AI-Powered Smart Recruitment System

This repository contains the implementation of Codingo, an AI-powered online recruitment platform designed to automate and enhance the hiring process through a virtual HR assistant named LUNA.

Project Overview

Codingo addresses the challenges of traditional recruitment processes by offering:

  • Automated CV screening and skill-based shortlisting
  • AI-led interviews through the virtual assistant LUNA
  • Real-time cheating detection during assessments
  • Gamified practice tools for candidates
  • Secure administration interface for hiring managers

Getting Started

This guide outlines the development process, starting with local model training before moving to AWS deployment.

Prerequisites

  • Python 3.8+
  • pip (Python package manager)
  • Git

Development Process

We'll implement the project in phases:

Phase 1: Local Training and Feature Extraction (Current Phase)

This initial phase focuses on building and training the model locally before AWS deployment.

Project Structure

Codingo/
├── backend/                     # Flask API backend
│   ├── app.py                   # Flask server
│   ├── predict.py               # Predict using trained model
│   ├── train_model.py           # Model training script
│   ├── model/                   # Trained model artifacts
│   │   └── cv_classifier.pkl
│   └── utils/
│       ├── text_extractor.py    # PDF/DOCX to text
│       └── preprocessor.py      # Cleaning, tokenizing
│
├── data/
│   ├── training.csv             # Your training dataset
│   └── raw_cvs/                 # CV files (PDF/DOCX/txt)
│
├── notebooks/
│   └── eda.ipynb                # Data exploration & feature work
│
├── requirements.txt             # Python dependencies
└── README.md                    # Project overview

Step-by-Step Implementation Guide

Step 1: Create Training Dataset

Start by manually collecting ~50-100 CV-like text samples with position labels.

File: data/training.csv

Example format:

text,position
"Experienced in Python, Flask, AWS",Backend Developer
"Built dashboards with React and TypeScript",Frontend Developer
"ML projects using pandas, scikit-learn",Data Scientist

Step 2: Train Model

Implement a classifier using scikit-learn to predict job roles from CV text.

File: backend/train_model.py

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
import joblib
import os

# Load training data
df = pd.read_csv('data/training.csv')

# Define model pipeline
model = Pipeline([
    ('tfidf', TfidfVectorizer(max_features=5000, ngram_range=(1, 2))),
    ('classifier', LogisticRegression(max_iter=1000))
])

# Train model
model.fit(df['text'], df['position'])

# Ensure the output directory exists, then save the model
os.makedirs('backend/model', exist_ok=True)
joblib.dump(model, 'backend/model/cv_classifier.pkl')

print("Model trained and saved successfully!")

Step 3: Test Prediction Locally

Create a script to verify your model works correctly.

File: backend/predict.py

import joblib
import sys


def predict_role(cv_text):
    # Load the trained model
    model = joblib.load('backend/model/cv_classifier.pkl')

    # Make prediction
    prediction = model.predict([cv_text])[0]
    confidence = max(model.predict_proba([cv_text])[0]) * 100

    return {
        'predicted_position': prediction,
        'confidence': f"{confidence:.2f}%"
    }


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Get CV text from command line argument
        cv_text = sys.argv[1]
    else:
        # Example CV text
        cv_text = "Experienced Python developer with 5 years of experience in Flask and AWS."

    result = predict_role(cv_text)
    print(f"Predicted Position: {result['predicted_position']}")
    print(f"Confidence: {result['confidence']}")

Step 4: Add Text Extraction Utility

Create utilities to extract text from PDF, DOCX, and plain-text files.

File: backend/utils/text_extractor.py

import fitz  # PyMuPDF
import docx
import os

def extract_text_from_pdf(path):
    """Extract text from PDF file."""
    doc = fitz.open(path)
    text = ""
    for page in doc:
        text += page.get_text()
    return text.strip()

def extract_text_from_docx(path):
    """Extract text from DOCX file."""
    doc = docx.Document(path)
    text = "\n".join([paragraph.text for paragraph in doc.paragraphs])
    return text.strip()

def extract_text(file_path):
    """Extract text from either PDF or DOCX."""
    extension = os.path.splitext(file_path)[1].lower()
    
    if extension == '.pdf':
        return extract_text_from_pdf(file_path)
    elif extension == '.docx':
        # python-docx reads .docx only; legacy .doc files are not supported
        return extract_text_from_docx(file_path)
    elif extension == '.txt':
        with open(file_path, 'r', encoding='utf-8') as f:
            return f.read().strip()
    else:
        raise ValueError(f"Unsupported file extension: {extension}")
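As a quick check, you can run the extractor over whatever files you have collected in data/raw_cvs. A small sketch, assuming it is run from the repository root:

import os
from backend.utils.text_extractor import extract_text

# Try every file in the raw CV directory and report what was extracted
for name in os.listdir('data/raw_cvs'):
    path = os.path.join('data/raw_cvs', name)
    try:
        text = extract_text(path)
        print(f"{name}: {len(text)} characters extracted")
    except ValueError as e:
        print(f"{name}: skipped ({e})")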

Step 5: Add Flask API (Simple)

Create a basic Flask API to accept CV uploads and return predictions.

File: backend/app.py

from flask import Flask, request, jsonify
from werkzeug.utils import secure_filename
from utils.text_extractor import extract_text
import joblib
import os

app = Flask(__name__)

# Ensure the upload directory exists before accepting files
os.makedirs("data/raw_cvs", exist_ok=True)

# Load the trained model (run train_model.py from the repository root first)
model = joblib.load("backend/model/cv_classifier.pkl")

@app.route("/predict", methods=["POST"])
def predict():
    if 'file' not in request.files:
        return jsonify({"error": "No file provided"}), 400
        
    file = request.files["file"]
    filename = secure_filename(file.filename)
    if not filename:
        return jsonify({"error": "Invalid filename"}), 400
    file_path = f"data/raw_cvs/{filename}"
    file.save(file_path)

    try:
        text = extract_text(file_path)
        prediction = model.predict([text])[0]
        confidence = max(model.predict_proba([text])[0]) * 100
        
        return jsonify({
            "predicted_position": prediction,
            "confidence": f"{confidence:.2f}%"
        })
    except Exception as e:
        return jsonify({"error": str(e)}), 500

if __name__ == "__main__":
    app.run(debug=True)
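With the server running (python backend/app.py from the repository root), you can exercise the endpoint with curl. The file path below is a placeholder; substitute any PDF, DOCX, or TXT file you have:

curl -X POST -F "file=@data/raw_cvs/sample_cv.pdf" http://127.0.0.1:5000/predict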

Step 6: Install Dependencies

File: requirements.txt

flask
scikit-learn
pandas
joblib
PyMuPDF
python-docx

Run: pip install -r requirements.txt
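Once dependencies are installed, the Phase 1 pipeline runs from the repository root in this order:

python backend/train_model.py   # train and save the classifier
python backend/predict.py       # quick local sanity check
python backend/app.py           # start the Flask API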

Next Steps

After completing Phase 1, we'll move to:

  1. Phase 2: Enhanced Model & NLP Features

    • Implement BERT or DistilBERT for improved semantic understanding
    • Add skill extraction from CVs
    • Develop job-CV matching scoring (a rough baseline sketch follows this list)
  2. Phase 3: Web Interface & Chatbot

    • Develop user interface for admin and candidates
    • Implement LUNA virtual assistant using LangChain
    • Add interview scheduling functionality
  3. Phase 4: Video Interview & Proctoring

    • Add video interview capabilities
    • Implement cheating detection using computer vision
    • Develop automated scoring system
  4. Phase 5: AWS Deployment

    • Set up AWS infrastructure using Terraform
    • Deploy application to EC2/Lambda
    • Configure S3 for file storage
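As a rough illustration of the Phase 2 job-CV matching score (not the final design, which will use BERT-style embeddings), cosine similarity over TF-IDF vectors is one plausible baseline:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def match_score(job_description, cv_text):
    """Return a 0-100 similarity score between a job posting and a CV."""
    vectorizer = TfidfVectorizer(ngram_range=(1, 2))
    vectors = vectorizer.fit_transform([job_description, cv_text])
    return cosine_similarity(vectors[0], vectors[1])[0][0] * 100

print(match_score("Backend role: Python, Flask, AWS",
                  "Experienced in Python, Flask, AWS"))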

Authors

  • Hussein El Saadi
  • Nour Ali Shaito

Supervisor

  • Dr. Ali Ezzedine

License

This project is licensed under the MIT License - see the LICENSE file for details.