<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>spaCy NER Training Guide</title>
    <link
      rel="stylesheet"
      href="https://maxcdn.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css"
    />
    <style>
      body {
        background-color: #121212;
        font-family: "Poppins", sans-serif;
        color: #e0e0e0;
        margin: 0;
        padding: 0;
      }
      h1,
      h2 {
        color: #007bff;
      }
      .step {
        margin-bottom: 30px;
        border: 1px solid #007bff;
        border-radius: 5px;
        padding: 20px;
        background-color: #1e1e1e;
      }
      .btn-primary {
        color: #fff;
        background-color: #007bff;
        border: 1px solid #007bff;
      }
      .btn-primary:hover {
        background-color: transparent;
        border: 1px solid #007bff;
      }
    </style>
  </head>
  <body>
    <div class="container">
      <h1>spaCy NER Model Training Guide</h1>

      <div class="step">
        <h2>Step 1: Upload Your Resume File</h2>
        <p>
          Upload a resume or document file for text extraction. Supported
          formats include:
        </p>
        <ul>
          <li>PDF</li>
          <li>DOCX (Word Document)</li>
          <li>RTF (Rich Text Format)</li>
          <li>ODT (OpenDocument Text)</li>
          <li>PNG, JPG, JPEG (Image Formats)</li>
          <li>JSON</li>
        </ul>
        <p>
          Make sure your file is in one of these formats before uploading. The
          system extracts and processes the text from your document
          automatically.
        </p>
        <a href="{{ url_for('index') }}" class="btn btn-primary"
          >Proceed to Upload</a
        >
      </div>

      <div class="step">
        <h2>Step 2: Preview and Edit Extracted Text</h2>
        <p>
          After you upload your document, a preview of the extracted text is
          shown. You can edit the text here to correct extraction errors or
          remove unwanted content. Once you're satisfied, click "Next" to
          proceed to Named Entity Recognition (NER) annotation.
        </p>
        <a href="{{ url_for('text_preview') }}" class="btn btn-primary"
          >Proceed to Text Preview</a
        >
      </div>

      <div class="step">
        <h2>Step 3: Annotate Named Entities</h2>
        <p>
          In this step, you review the Named Entity Recognition (NER) results
          generated from your text. You can add new entity labels, select the
          text that belongs to each label, and correct annotations manually.
          Once the text is annotated with the appropriate labels, save your
          annotations and export them in JSON format for model training.
        </p>
        <p>Instructions:</p>
        <ul>
          <li>Click "Begin!" to load the extracted text.</li>
          <li>
            Highlight sections of the text and assign them to the available
            labels.
          </li>
          <li>Add new labels if necessary.</li>
          <li>
            Once done, click "Export" to download your annotations as a JSON
            file.
          </li>
        </ul>
        <a href="{{ url_for('ner_preview') }}" class="btn btn-primary"
          >Proceed to NER Annotation</a
        >
      </div>

      <div class="step">
        <h2>Step 4: Save and Format JSON Data</h2>
        <p>
          Upload the annotated JSON file from the previous step. The system
          reformats the file so that it is compatible with the spaCy model
          training process. After formatting, you can proceed to the model
          training step.
        </p>
        <p>Instructions:</p>
        <ul>
          <li>
            Upload the JSON file you downloaded after the annotation step.
          </li>
          <li>Click "Process" to reformat the file.</li>
          <li>
            Once processing is complete, click "Next" to proceed with training.
          </li>
        </ul>
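        <p>
          The reformatting step can be pictured with a short Python sketch. It
          assumes, purely for illustration, that the exported file is a list
          of records with a <code>text</code> field and an
          <code>annotations</code> list of <code>start</code>,
          <code>end</code>, and <code>label</code> values. The actual export
          schema of this tool may differ, and the function name
          <code>to_spacy_format</code> is hypothetical. The output uses the
          (text, {"entities": [...]}) pair layout that spaCy's training
          utilities accept.
        </p>
<pre class="text-light"><code class="language-python">import json


def to_spacy_format(records):
    """Convert exported records into (text, {"entities": [...]}) pairs."""
    training_data = []
    for record in records:
        text = record["text"]
        # Sort spans by offset so overlaps can be detected in a single pass.
        spans = sorted(record["annotations"], key=lambda s: (s["start"], s["end"]))
        entities = []
        last_end = 0
        for span in spans:
            start, end = span["start"], span["end"]
            # Keep a span only if it starts at or after the previous kept span's
            # end, starts before its own end, and ends inside the text. This
            # drops empty, out-of-range, and overlapping spans.
            if start in range(last_end, end) and end in range(len(text) + 1):
                entities.append((start, end, span["label"]))
                last_end = end
        training_data.append((text, {"entities": entities}))
    return training_data


# Hypothetical export with one overlapping span (SURNAME inside NAME).
exported = json.loads(
    '[{"text": "Jane Doe worked at Acme Corp.", "annotations": ['
    '{"start": 0, "end": 8, "label": "NAME"}, '
    '{"start": 5, "end": 8, "label": "SURNAME"}, '
    '{"start": 19, "end": 28, "label": "COMPANY"}]}]'
)
print(to_spacy_format(exported))
</code></pre>
        <p>
          In this sketch the overlapping SURNAME span is dropped, because
          spaCy's NER component does not accept overlapping entities. Keeping
          the first span in sort order is only one possible policy.
        </p>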
        <a href="{{ url_for('json_file') }}" class="btn btn-primary"
          >Proceed to Save JSON</a
        >
      </div>

      <div class="step">
        <h2>Step 5: Train the NER Model</h2>
        <p>
          In this final step, you convert the formatted JSON data into the
          spaCy format and begin training the NER model. You can customize the
          training by choosing the number of epochs (passes over the training
          data) and setting a version name for the trained model.
        </p>
        <p>Guidelines:</p>
        <ul>
          <li>
            Number of epochs: Each epoch is one full pass over the training
            data. More epochs give the model more opportunities to learn from
            the data, but too many can lead to overfitting. 10 epochs is a
            reasonable starting point.
          </li>
          <li>
            Model versioning: Provide a version name for this training session
            so you can keep track of different versions of the model.
          </li>
        </ul>
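        <p>
          For readers curious about what happens behind this step, the
          following is a minimal sketch of an epoch-based training loop using
          spaCy v3's Python API. It assumes training data as a list of
          (text, {"entities": [(start, end, label), ...]}) pairs. The function
          names <code>train_ner</code> and <code>versioned_model_dir</code>,
          the batch size of 8, and the "ner-" directory prefix are
          illustrative assumptions, not this application's actual code.
        </p>
<pre class="text-light"><code class="language-python">import random
from pathlib import Path


def versioned_model_dir(base_dir, version):
    """Build an output directory such as models/ner-v1 (hypothetical layout)."""
    return Path(base_dir) / ("ner-" + version)


def train_ner(training_data, n_epochs=10, output_dir=None, seed=0):
    """Train a blank English NER pipeline and return (nlp, loss per epoch)."""
    # Imported inside the function so this file loads without spaCy installed.
    import spacy
    from spacy.training import Example

    random.seed(seed)
    nlp = spacy.blank("en")
    nlp.add_pipe("ner")
    examples = [
        Example.from_dict(nlp.make_doc(text), annotations)
        for text, annotations in training_data
    ]
    # initialize() infers the entity labels from the examples.
    optimizer = nlp.initialize(get_examples=lambda: examples)
    history = []
    for epoch in range(n_epochs):
        random.shuffle(examples)  # present the data in a new order each epoch
        losses = {}
        for batch in spacy.util.minibatch(examples, size=8):
            nlp.update(batch, sgd=optimizer, losses=losses)
        history.append(losses.get("ner", 0.0))
    if output_dir is not None:
        nlp.to_disk(output_dir)  # save the trained pipeline for later loading
    return nlp, history
</code></pre>
        <p>
          A call such as
          <code>train_ner(data, n_epochs=10, output_dir=versioned_model_dir("models", "v1"))</code>
          would train for 10 epochs and save the result under a versioned
          directory. Watching the per-epoch loss values is one simple way to
          judge whether additional epochs are still helping.
        </p>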
        <p>
          Once the training is complete, you can download the latest version of
          the trained model for use in production.
        </p>
        <a href="{{ url_for('spacy_file') }}" class="btn btn-primary"
          >Proceed to Model Training</a
        >
      </div>
    </div>

    <script src="https://code.jquery.com/jquery-3.5.1.slim.min.js"></script>
    <script src="https://cdn.jsdelivr.net/npm/@popperjs/[email protected]/dist/umd/popper.min.js"></script>
    <script src="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/js/bootstrap.min.js"></script>
  </body>
</html>