# Codingo - AI-Powered Smart Recruitment System

This repository contains the implementation of Codingo, an AI-powered online recruitment platform designed to automate and enhance the hiring process through a virtual HR assistant named LUNA.

## Project Overview

Codingo addresses the challenges of traditional recruitment processes by offering:

- Automated CV screening and skill-based shortlisting
- AI-led interviews through the virtual assistant LUNA
- Real-time cheating detection during assessments
- Gamified practice tools for candidates
- Secure administration interface for hiring managers

## Getting Started

This guide outlines the development process, starting with local model training before moving on to AWS deployment.
### Prerequisites

- Python 3.8+
- pip (Python package manager)
- Git

### Development Process

We'll implement the project in phases:

#### Phase 1: Local Training and Feature Extraction (Current Phase)

This initial phase focuses on building and training the model locally before AWS deployment.

### Project Structure
```
Codingo/
├── backend/                  # Flask API backend
│   ├── app.py                # Flask server
│   ├── predict.py            # Predict using trained model
│   ├── train_model.py        # Model training script
│   ├── model/                # Trained model artifacts
│   │   └── cv_classifier.pkl
│   └── utils/
│       ├── text_extractor.py # PDF/DOCX to text
│       └── preprocessor.py   # Cleaning, tokenizing
├── data/
│   ├── training.csv          # Your training dataset
│   └── raw_cvs/              # CV files (PDF/DOCX/TXT)
├── notebooks/
│   └── eda.ipynb             # Data exploration & feature work
├── requirements.txt          # Python dependencies
└── README.md                 # Project overview
```
## Step-by-Step Implementation Guide

### Step 1: Create Training Dataset

Start by manually collecting ~50-100 CV-like text samples with position labels.

**File:** `data/training.csv`

Example format:

```
text,position
"Experienced in Python, Flask, AWS",Backend Developer
"Built dashboards with React and TypeScript",Frontend Developer
"ML projects using pandas, scikit-learn",Data Scientist
```
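
Before training, it can help to sanity-check the dataset. The sketch below simply confirms the CSV parses and shows how many examples each label has (it assumes the format above):

```python
import pandas as pd

# Load the dataset and inspect the label distribution
df = pd.read_csv('data/training.csv')
print(f"{len(df)} samples, {df['position'].nunique()} distinct positions")
print(df['position'].value_counts())
```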
### Step 2: Train Model

Implement a classifier using scikit-learn to predict job roles from CV text.

**File:** `backend/train_model.py`
```python
import os

import joblib
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Load training data
df = pd.read_csv('data/training.csv')

# Define model pipeline: TF-IDF features feeding a logistic regression classifier
model = Pipeline([
    ('tfidf', TfidfVectorizer(max_features=5000, ngram_range=(1, 2))),
    ('classifier', LogisticRegression(max_iter=1000))
])

# Train model
model.fit(df['text'], df['position'])

# Save model (create the output directory on first run)
os.makedirs('backend/model', exist_ok=True)
joblib.dump(model, 'backend/model/cv_classifier.pkl')
print("Model trained and saved successfully!")
```
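
Run the script from the repository root so the relative paths resolve: `python backend/train_model.py`. Before trusting the model, it is also worth measuring accuracy on a held-out split; the sketch below is one way to do that (same `data/training.csv` format assumed):

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

df = pd.read_csv('data/training.csv')

# Hold out 20% of the samples for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    df['text'], df['position'], test_size=0.2, random_state=42
)

model = Pipeline([
    ('tfidf', TfidfVectorizer(max_features=5000, ngram_range=(1, 2))),
    ('classifier', LogisticRegression(max_iter=1000))
])
model.fit(X_train, y_train)

# Per-class precision/recall/F1 on the held-out split
print(classification_report(y_test, model.predict(X_test)))
```

With only ~50-100 samples the numbers will be noisy, but they give a rough sense of whether the classifier has learned anything.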
### Step 3: Test Prediction Locally

Create a script to verify your model works correctly.

**File:** `backend/predict.py`
```python
import sys

import joblib

def predict_role(cv_text):
    # Load the trained model
    model = joblib.load('backend/model/cv_classifier.pkl')

    # Make prediction
    prediction = model.predict([cv_text])[0]
    confidence = max(model.predict_proba([cv_text])[0]) * 100

    return {
        'predicted_position': prediction,
        'confidence': f"{confidence:.2f}%"
    }

if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Get CV text from command line argument
        cv_text = sys.argv[1]
    else:
        # Example CV text
        cv_text = "Experienced Python developer with 5 years of experience in Flask and AWS."

    result = predict_role(cv_text)
    print(f"Predicted Position: {result['predicted_position']}")
    print(f"Confidence: {result['confidence']}")
```
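
Run it from the repository root, optionally passing CV text as an argument, e.g. `python backend/predict.py "Built dashboards with React and TypeScript"`. With no argument, it falls back to the built-in example text.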
### Step 4: Add Text Extraction Utility

Create utilities to extract text from PDF and DOCX files.

**File:** `backend/utils/text_extractor.py`
```python
import os

import docx
import fitz  # PyMuPDF

def extract_text_from_pdf(path):
    """Extract text from a PDF file."""
    doc = fitz.open(path)
    text = ""
    for page in doc:
        text += page.get_text()
    return text.strip()

def extract_text_from_docx(path):
    """Extract text from a DOCX file."""
    doc = docx.Document(path)
    text = "\n".join(paragraph.text for paragraph in doc.paragraphs)
    return text.strip()

def extract_text(file_path):
    """Extract text from a PDF, DOCX, or plain-text file."""
    extension = os.path.splitext(file_path)[1].lower()
    if extension == '.pdf':
        return extract_text_from_pdf(file_path)
    elif extension == '.docx':
        # python-docx reads .docx only; legacy .doc files are not supported
        return extract_text_from_docx(file_path)
    elif extension == '.txt':
        with open(file_path, 'r', encoding='utf-8') as f:
            return f.read().strip()
    else:
        raise ValueError(f"Unsupported file extension: {extension}")
```
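
A quick way to try the extractor from the repository root (the file name is a placeholder; this assumes a sample CV exists under `data/raw_cvs/`):

```python
import sys
sys.path.append('backend')  # make the utils package importable from the repo root

from utils.text_extractor import extract_text

# Print the first 500 characters of the extracted text
print(extract_text('data/raw_cvs/sample_cv.pdf')[:500])
```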
### Step 5: Add Flask API (Simple)

Create a basic Flask API to accept CV uploads and return predictions.

**File:** `backend/app.py`
```python
import os

import joblib
from flask import Flask, request, jsonify
from werkzeug.utils import secure_filename

from utils.text_extractor import extract_text

app = Flask(__name__)

# Ensure the upload directory exists (paths are relative to the working directory)
os.makedirs("data/raw_cvs", exist_ok=True)

# Load the trained model once at startup
model = joblib.load("model/cv_classifier.pkl")

@app.route("/predict", methods=["POST"])
def predict():
    if 'file' not in request.files:
        return jsonify({"error": "No file provided"}), 400

    file = request.files["file"]
    if file.filename == "":
        return jsonify({"error": "Empty filename"}), 400

    # Sanitize the filename before writing it to disk
    file_path = os.path.join("data/raw_cvs", secure_filename(file.filename))
    file.save(file_path)

    try:
        text = extract_text(file_path)
        prediction = model.predict([text])[0]
        confidence = max(model.predict_proba([text])[0]) * 100
        return jsonify({
            "predicted_position": prediction,
            "confidence": f"{confidence:.2f}%"
        })
    except Exception as e:
        return jsonify({"error": str(e)}), 500

if __name__ == "__main__":
    app.run(debug=True)
```
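
Start the server from the `backend/` directory so the relative `model/cv_classifier.pkl` path resolves: `cd backend && python app.py`. Note that uploads are then saved under `backend/data/raw_cvs/`, relative to the working directory.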
### Step 6: Install Dependencies

**File:** `requirements.txt`

```
flask
scikit-learn
pandas
joblib
PyMuPDF
python-docx
```

Run: `pip install -r requirements.txt`
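
With the server running, you can exercise the endpoint end to end. Below is a minimal sketch using the `requests` library (not in `requirements.txt`; install it with `pip install requests`; the CV file name is a placeholder):

```python
import requests

# Upload a CV and print the predicted role (Flask's default local address assumed)
with open('sample_cv.pdf', 'rb') as f:
    response = requests.post(
        'http://127.0.0.1:5000/predict',
        files={'file': ('sample_cv.pdf', f)}
    )
print(response.json())
```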
## Next Steps

After completing Phase 1, we'll move to:

1. **Phase 2: Enhanced Model & NLP Features**
   - Implement BERT or DistilBERT for improved semantic understanding
   - Add skill extraction from CVs
   - Develop job-CV matching scoring
2. **Phase 3: Web Interface & Chatbot**
   - Develop user interface for admin and candidates
   - Implement LUNA virtual assistant using LangChain
   - Add interview scheduling functionality
3. **Phase 4: Video Interview & Proctoring**
   - Add video interview capabilities
   - Implement cheating detection using computer vision
   - Develop automated scoring system
4. **Phase 5: AWS Deployment**
   - Set up AWS infrastructure using Terraform
   - Deploy application to EC2/Lambda
   - Configure S3 for file storage
## Authors

- Hussein El Saadi
- Nour Ali Shaito

## Supervisor

- Dr. Ali Ezzedine

## License

This project is licensed under the MIT License - see the LICENSE file for details.