<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>spaCy NER Training Guide</title>
    <link
      rel="stylesheet"
      href="https://maxcdn.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css"
    />
    <style>
      body {
        background-color: #121212;
        font-family: "Poppins", sans-serif;
        color: #e0e0e0;
        margin: 0;
        padding: 0;
      }
      h1,
      h2 {
        color: #007bff;
      }
      .step {
        margin-bottom: 30px;
        border: 1px solid #007bff;
        border-radius: 5px;
        padding: 20px;
        background-color: #1e1e1e;
      }
      .btn-primary {
        color: #fff;
        background-color: #007bff;
        border: 1px solid #007bff;
      }
      .btn-primary:hover {
        background-color: transparent;
        border: 1px solid #007bff;
      }
    </style>
  </head>
  <body>
    <div class="container">
      <h1>spaCy NER Model Training Guide</h1>

      <div class="step">
        <h2>Step 1: Upload Your Resume File</h2>
        <p>
          Upload a resume or other document for text extraction. Supported
          formats include:
        </p>
        <ul>
          <li>PDF</li>
          <li>DOCX (Word Document)</li>
          <li>RTF (Rich Text Format)</li>
          <li>ODT (OpenDocument Text)</li>
          <li>PNG, JPG, JPEG (Image Formats)</li>
          <li>JSON</li>
        </ul>
        <p>
          Make sure your file is in one of the supported formats before
          uploading. The system extracts and processes the text from your
          document automatically.
        </p>
        <a href="{{ url_for('index') }}" class="btn btn-primary"
          >Proceed to Upload</a
        >
      </div>

      <div class="step">
        <h2>Step 2: Preview and Edit Extracted Text</h2>
        <p>
          After you upload your document, a preview of the extracted text is
          shown. Edit the text there to correct extraction errors or remove
          unwanted content. When you are satisfied, click "Next" to proceed to
          Named Entity Recognition (NER) annotation.
        </p>
        <a href="{{ url_for('text_preview') }}" class="btn btn-primary"
          >Proceed to Text Preview</a
        >
      </div>

      <div class="step">
        <h2>Step 3: Annotate Named Entities</h2>
        <p>
          In this step, you review the Named Entity Recognition (NER) results
          generated from your text. You can add new entity labels, select the
          relevant text for each label, and make manual adjustments. Once you
          have annotated the text with the appropriate labels, save your
          annotations and export the data in JSON format for model training.
        </p>
        <p>
          Note: the following labels are available: ABOUT, CERTIFICATE,
          COMPANY, CONTACT, COURSE, DOB, EMAIL, EXPERIENCE, HOBBIES, INSTITUTE,
          JOB_TITLE, LANGUAGE, LAST_QUALIFICATION_YEAR, LINK, LOCATION, PERSON,
          PROJECTS, QUALIFICATION, SCHOOL, SKILL, SOFT_SKILL, UNIVERSITY,
          YEARS_EXPERIENCE.
        </p>
        <p>Instructions:</p>
        <ul>
          <li>Click "Begin!" to load the extracted text.</li>
          <li>
            Highlight sections of the text and assign them to the available
            labels.
          </li>
          <li>Add new labels if necessary.</li>
          <li>
            When you are done, click "Export" to download your annotations as a
            JSON file.
          </li>
        </ul>
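        <p>
          If you want to sanity-check an exported file before moving on, the
          checks involved might look like the sketch below. It is illustrative
          only: it assumes each record is a dictionary with a
          <code>text</code> string and an <code>entities</code> list of
          <code>[start, end, label]</code> triples, which may not match this
          tool's actual export layout, and
          <code>find_annotation_problems</code> is a hypothetical helper, not
          part of the application.
        </p>
<pre style="color: #e0e0e0">
```python
# Illustrative only: the record layout used here is an assumption,
# not the documented export format of this tool.
ALLOWED_LABELS = {
    "ABOUT", "CERTIFICATE", "COMPANY", "CONTACT", "COURSE", "DOB", "EMAIL",
    "EXPERIENCE", "HOBBIES", "INSTITUTE", "JOB_TITLE", "LANGUAGE",
    "LAST_QUALIFICATION_YEAR", "LINK", "LOCATION", "PERSON", "PROJECTS",
    "QUALIFICATION", "SCHOOL", "SKILL", "SOFT_SKILL", "UNIVERSITY",
    "YEARS_EXPERIENCE",
}


def find_annotation_problems(record, allowed_labels=ALLOWED_LABELS):
    """Return a list of human-readable problems found in one annotated record."""
    text = record["text"]
    problems = []
    for start, end, label in record["entities"]:
        if label not in allowed_labels:
            problems.append(f"unknown label {label!r}")
        # Offsets are character positions with an exclusive end, like Python slices.
        if not (len(text) >= end > start >= 0):
            problems.append(f"span {start}-{end} is outside the text")
        elif text[start:end] != text[start:end].strip():
            problems.append(f"span {start}-{end} has leading or trailing whitespace")
    return problems


sample = {
    "text": "Jane Doe, Python developer at Acme Corp",
    "entities": [
        [0, 8, "PERSON"],
        [10, 16, "SKILL"],
        [17, 26, "ROLE"],      # "ROLE" is not in the label list
        [29, 39, "COMPANY"],   # starts one character early, on a space
    ],
}
print(find_annotation_problems(sample))
```
</pre>
        <p>
          For the sample record, the helper reports the unknown
          <code>ROLE</code> label and the span that begins on a space. Both
          are worth fixing in the annotation screen before you export.
        </p>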
        <a href="{{ url_for('ner_preview') }}" class="btn btn-primary"
          >Proceed to NER Annotation</a
        >
      </div>

      <div class="step">
        <h2>Step 4: Save and Format JSON Data</h2>
        <p>
          Upload the annotated JSON file from the previous step. The system
          processes and reformats the file so that it is compatible with the
          spaCy model training process. After formatting, you can proceed to
          the model training step.
        </p>
        <p>Instructions:</p>
        <ul>
          <li>
            Upload the JSON file you downloaded after the annotation step.
          </li>
          <li>Click "Process" to reformat the file.</li>
          <li>
            Once processing is complete, click "Next" to proceed with training.
          </li>
        </ul>
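        <p>
          The kind of reformatting involved can be sketched as follows. This
          is not the application's actual code. It assumes, hypothetically,
          that each exported record holds a <code>text</code> string and an
          <code>entities</code> list of <code>[start, end, label]</code>
          triples, and it produces
          <code>(text, {"entities": [...]})</code> pairs, a layout commonly
          used for spaCy NER training data. The helper name
          <code>to_training_example</code> is made up for illustration.
        </p>
<pre style="color: #e0e0e0">
```python
import json


def to_training_example(record):
    """Convert one annotated record into a (text, {"entities": [...]}) pair."""
    text = record["text"]
    cleaned = []
    for start, end, label in record["entities"]:
        # Shrink the span so that it does not begin or end with whitespace.
        while end > start and text[start].isspace():
            start += 1
        while end > start and text[end - 1].isspace():
            end -= 1
        if end > start:
            cleaned.append((start, end, label))

    # Visit longer spans first, then drop any span that overlaps a kept one,
    # because a spaCy document cannot hold overlapping entities.
    cleaned.sort(key=lambda span: (span[0] - span[1], span[0]))
    kept = []
    for start, end, label in cleaned:
        if all(k_start >= end or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, label))
    return text, {"entities": sorted(kept)}


# Hypothetical export content, inlined here so the example is self-contained.
raw = json.loads("""
[{"text": "Jane Doe works at Acme Corp",
  "entities": [[0, 9, "PERSON"], [0, 4, "PERSON"], [18, 27, "COMPANY"]]}]
""")
training_data = [to_training_example(record) for record in raw]
print(training_data)
```
</pre>
        <p>
          In the sample, the first span loses its trailing space, and the
          shorter overlapping <code>PERSON</code> span is dropped in favor of
          the longer one. Keeping the longest span is a design choice of this
          sketch; other policies, such as keeping the first span, are equally
          possible.
        </p>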
        <a href="{{ url_for('json_file') }}" class="btn btn-primary"
          >Proceed to Save JSON</a
        >
      </div>

      <div class="step">
        <h2>Step 5: Train the NER Model</h2>
        <p>
          In this final step, you convert the formatted JSON data into the
          spaCy format and begin training the NER model. You can customize the
          training by choosing the number of epochs (full passes over the
          training data) and by setting a version name for the trained model.
        </p>
        <p>Guidelines:</p>
        <ul>
          <li>
            Number of epochs: each epoch is one pass over the training data.
            More epochs give the model more opportunities to learn from the
            data, but too many can lead to overfitting. Start with 10 epochs as
            a balanced default.
          </li>
          <li>
            Model versioning: provide a version name for this training session
            so that you can keep track of different versions of the model.
          </li>
        </ul>
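        <p>
          To make the overfitting trade-off concrete, the sketch below picks
          an epoch count from per-epoch scores measured on held-out data. The
          scores are made-up numbers, not output from this tool, and
          <code>best_epoch</code> and <code>patience</code> are hypothetical
          names. The application itself simply trains for the number of epochs
          you enter.
        </p>
<pre style="color: #e0e0e0">
```python
def best_epoch(validation_scores, patience=3):
    """Return the 1-based epoch with the best validation score.

    Scanning stops once the score has failed to improve for `patience`
    consecutive epochs, which is a simple form of early stopping.
    """
    best_score = float("-inf")
    best_index = 0
    for index, score in enumerate(validation_scores, start=1):
        if score > best_score:
            best_score, best_index = score, index
        elif index - best_index >= patience:
            break
    return best_index


# Made-up scores on held-out data for 10 epochs (illustration only).
scores = [0.41, 0.58, 0.66, 0.71, 0.74, 0.73, 0.75, 0.74, 0.73, 0.72]
print(best_epoch(scores))  # 7
```
</pre>
        <p>
          In this made-up run, the score peaks at epoch 7 and then declines,
          which is the pattern to watch for: further epochs keep fitting the
          training data without improving results on unseen text.
        </p>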
        <p>
          Once the training is complete, you can download the latest version of
          the trained model for use in production.
        </p>
        <a href="{{ url_for('spacy_file') }}" class="btn btn-primary"
          >Proceed to Model Training</a
        >
      </div>
    </div>

    <script src="https://code.jquery.com/jquery-3.5.1.slim.min.js"></script>
    <script src="https://cdn.jsdelivr.net/npm/@popperjs/[email protected]/dist/umd/popper.min.js"></script>
    <script src="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/js/bootstrap.min.js"></script>
  </body>
</html>