| <html lang="en"> | |
| <head> | |
| <meta charset="UTF-8" /> | |
| <meta name="viewport" content="width=device-width, initial-scale=1.0" /> | |
| <title>spaCy NER Training Guide</title> | |
| <link | |
| rel="stylesheet" | |
| href="https://maxcdn.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css" | |
| /> | |
| <style> | |
| body { | |
| background-color: #121212; | |
| font-family: "Poppins", sans-serif; | |
| color: #e0e0e0; | |
| margin: 0; | |
| padding: 0; | |
| } | |
| h1, | |
| h2 { | |
| color: #007bff; | |
| } | |
| .step { | |
| margin-bottom: 30px; | |
| border: 1px solid #007bff; | |
| border-radius: 5px; | |
| padding: 20px; | |
| background-color: #1e1e1e; | |
| } | |
| .btn-primary { | |
| color: #fff; | |
| background-color: #007bff; | |
| border: 1px solid #007bff; | |
| } | |
| .btn-primary:hover { | |
| background-color: transparent; | |
| border: 1px solid #007bff; | |
| } | |
| </style> | |
| </head> | |
| <body> | |
| <div class="container"> | |
| <h1>spaCy NER Model Training Guide</h1> | |
| <div class="step"> | |
| <h2>Step 1: Upload Your Resume File</h2> | |
| <p> | |
| Upload a resume or document file for text extraction. Supported | |
| formats include: | |
| </p> | |
| <ul> | |
| <li>PDF</li> | |
| <li>DOCX (Word Document)</li> | |
| <li>RTF (Rich Text Format)</li> | |
| <li>ODT (Open Document Text)</li> | |
| <li>PNG, JPG, JPEG (Image Formats)</li> | |
| <li>JSON</li> | |
| </ul> | |
| <p> | |
| Ensure that your file is in one of the supported formats before | |
| uploading. The system will extract and process the text from your | |
| document automatically. | |
| </p> | |
| <a href="{{ url_for('index') }}" class="btn btn-primary" | |
| >Proceed to Upload</a | |
| > | |
| </div> | |
| <div class="step"> | |
| <h2>Step 2: Preview and Edit Extracted Text</h2> | |
| <p> | |
| After uploading your document, you will be shown a preview of the | |
| extracted text. This preview allows you to edit the text if needed to | |
| correct any extraction errors or remove unwanted content. Once you're | |
| satisfied, click "Next" to proceed to Named Entity Recognition (NER) | |
| annotations. | |
| </p> | |
| <a href="{{ url_for('text_preview') }}" class="btn btn-primary" | |
| >Proceed to Text Preview</a | |
| > | |
| </div> | |
| <div class="step"> | |
| <h2>Step 3: Annotate Named Entities</h2> | |
| <p> | |
| In this step, you review the Named Entity Recognition (NER) results | |
| generated from your text. You can add new entity labels, assign | |
| selected text to each label, and correct annotations manually. Once | |
| the text is annotated with the appropriate labels, save your | |
| annotations and export the data in JSON format for model training. | |
| </p> | |
| <p>Instructions:</p> | |
| <ul> | |
| <li>Click "Begin!" to load the extracted text.</li> | |
| <li> | |
| Highlight sections of the text and assign them to the available | |
| labels. | |
| </li> | |
| <li>Add new labels if necessary.</li> | |
| <li> | |
| Once done, click "Export" to download your annotations as a JSON | |
| file. | |
| </li> | |
| </ul> | |
| <a href="{{ url_for('ner_preview') }}" class="btn btn-primary" | |
| >Proceed to NER Annotation</a | |
| > | |
| </div> | |
| <div class="step"> | |
| <h2>Step 4: Save and Format JSON Data</h2> | |
| <p> | |
| Upload the annotated JSON file you exported in the previous step. The | |
| system reformats the file so that it is compatible with the spaCy | |
| model training process. After formatting, you can proceed to the | |
| model training step. | |
| </p> | |
| <p>Instructions:</p> | |
| <ul> | |
| <li> | |
| Upload the JSON file you downloaded after the annotation step. | |
| </li> | |
| <li>Click "Process" to reformat the file.</li> | |
| <li> | |
| Once processing is complete, click "Next" to proceed with training. | |
| </li> | |
| </ul> | |
| <a href="{{ url_for('json_file') }}" class="btn btn-primary" | |
| >Proceed to Save JSON</a | |
| > | |
| </div> | |
| <div class="step"> | |
| <h2>Step 5: Train the NER Model</h2> | |
| <p> | |
| In this final step, you will convert the formatted JSON data into the | |
| spaCy format and begin training the NER model. You can customize the | |
| training by choosing the number of epochs (full passes over the | |
| training data) and setting a version name for the trained model. | |
| </p> | |
| <p>Guidelines:</p> | |
| <ul> | |
| <li> | |
| Number of epochs: More epochs give the model more passes over the | |
| training data, but too many can lead to overfitting. Start with 10 | |
| epochs as a reasonable default. | |
| </li> | |
| <li> | |
| Model versioning: Provide a version name for this training session, | |
| so you can keep track of different versions of the model. | |
| </li> | |
| </ul> | |
| <p> | |
| Once the training is complete, you can download the latest version of | |
| the trained model for use in production. | |
| </p> | |
| <a href="{{ url_for('spacy_file') }}" class="btn btn-primary" | |
| >Proceed to Model Training</a | |
| > | |
| </div> | |
| </div> | |
| <script src="https://code.jquery.com/jquery-3.5.1.slim.min.js"></script> | |
| <script src="https://cdn.jsdelivr.net/npm/@popperjs/[email protected]/dist/umd/popper.min.js"></script> | |
| <script src="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/js/bootstrap.min.js"></script> | |
| </body> | |
| </html> | |
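
Step 4 of the guide says the exported annotations are reformatted for compatibility with spaCy training, but the page does not show the file layout. The sketch below illustrates one way such a step could work. The input schema (a list of records with `text` and `entities` keys, each entity carrying `start`, `end`, and `label`) and the function name `to_spacy_examples` are assumptions made for illustration, not the application's actual export format. The output uses spaCy's commonly used `(text, {"entities": [(start, end, label)]})` tuple layout.

```python
import json


def to_spacy_examples(records):
    """Convert annotation records into (text, {"entities": [...]}) tuples.

    The record layout is hypothetical: {"text": str, "entities":
    [{"start": int, "end": int, "label": str}, ...]}.
    """
    examples = []
    for record in records:
        text = record["text"]
        spans = []
        for ent in record.get("entities", []):
            start, end, label = ent["start"], ent["end"], ent["label"]
            # Skip empty spans and spans that fall outside the text.
            if not (0 <= start < end <= len(text)):
                continue
            spans.append((start, end, label))

        # Sort by start offset, then drop any span that overlaps an
        # earlier one, since NER entity spans must not overlap.
        spans.sort()
        kept = []
        last_end = 0
        for start, end, label in spans:
            if start >= last_end:
                kept.append((start, end, label))
                last_end = end

        examples.append((text, {"entities": kept}))
    return examples


if __name__ == "__main__":
    # A made-up export with one resume sentence and two labeled spans.
    exported = json.loads("""
    [
      {"text": "Jane Doe worked at Acme Corp.",
       "entities": [{"start": 0, "end": 8, "label": "NAME"},
                    {"start": 19, "end": 28, "label": "COMPANY"}]}
    ]
    """)
    for text, annotations in to_spacy_examples(exported):
        print(text, annotations)
```

Dropping overlapping spans keeps the earliest one and is only one possible policy. A real pipeline might instead report such spans back to the annotator for correction.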