KoELECTRA-small-v3-privacy-ner

This model is a fine-tuned version of monologg/koelectra-small-v3-discriminator on a synthesized Korean personal-information (privacy) NER dataset. It achieves the following results on the evaluation set:

  • f1 = 0.9998728608843798
  • loss = 0.05310981854414328
  • precision = 0.9999237126509853
  • recall = 0.9998220142897098
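As a quick consistency check, if the reported f1 is the standard F1 score (an assumption, since the card does not name the metric implementation), it should equal the harmonic mean of the reported precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the numbers above. The helper name harmonic_f1 is hypothetical.

```python
# Metrics copied from the evaluation results above
precision = 0.9999237126509853
recall = 0.9998220142897098
reported_f1 = 0.9998728608843798


def harmonic_f1(p: float, r: float) -> float:
    """Hypothetical helper: harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


computed_f1 = harmonic_f1(precision, recall)
print(f"computed F1 = {computed_f1:.12f}")
print(f"reported F1 = {reported_f1:.12f}")
```

On these inputs the recomputed and reported values agree to well under 1e-9. Because precision is slightly higher than recall, the model produced slightly fewer false positives than false negatives on the evaluation set.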

Model description

Tagging scheme: BIO

  • -B (begin): the token is the first token of an entity
  • -I (inside): the token is inside an entity, after its first token
  • O (outside): the token is not part of any entity
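BIO labels are assigned per token, so whole entities have to be recovered by merging consecutive tags. The sketch below assumes the suffix label form shown in the example output further down (PER-B, PER-I, O). This matters in practice: the aggregation strategies built into the Transformers token-classification pipeline look for B-/I- prefixes, so suffix-form labels may not be grouped as expected, and a small manual decoder avoids relying on that. The function bio_to_spans is a hypothetical helper, not part of this model or of Transformers.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Hypothetical helper: merge TAG-B / TAG-I / O labels into (type, start, end) spans.

    Indices are token positions; `end` is exclusive.
    """
    spans: List[Tuple[str, int, int]] = []
    current_type: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        if tag == "O":
            # An O tag closes any open span
            if current_type is not None:
                spans.append((current_type, start, i))
                current_type = None
            continue
        ent_type, _, position = tag.rpartition("-")  # "PER-B" -> ("PER", "-", "B")
        if position == "B" or ent_type != current_type:
            # A -B tag, or an -I tag of a different type, starts a new span
            if current_type is not None:
                spans.append((current_type, start, i))
            current_type, start = ent_type, i
        # Otherwise this is an -I tag continuing the open span of the same type
    if current_type is not None:
        spans.append((current_type, start, len(tags)))
    return spans


print(bio_to_spans(["PER-B", "PER-I", "O", "LOC-B", "LOC-I", "LOC-I"]))
```

An -I tag that appears without a preceding -B of the same type is treated here as the start of a new span; stricter decoders drop such tags instead.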

Tag set for 12 types of Korean personal information

Category (tag): definition
  • PERSON (PER): Korean personal name
  • LOCATION (LOC): Korean address
  • RESIDENT REGISTRATION NUMBER (RRN): Korean resident registration number
  • EMAIL (EMA): email address
  • ID (ID): general login ID
  • PASSWORD (PWD): general login password
  • ORGANIZATION (ORG): affiliated organization
  • PHONE NUMBER (PHN): phone number
  • CARD NUMBER (CRD): card number
  • ACCOUNT NUMBER (ACC): bank account number
  • PASSPORT NUMBER (PSP): passport number
  • DRIVER'S LICENSE NUMBER (DLN): driver's license number
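With twelve entity types, two positions each, plus O, a BIO label set built from this table has 25 labels. The sketch below builds such a list together with the label2id / id2label dictionaries a token-classification head uses. The ordering and the helper build_label_list are hypothetical and for illustration only; the authoritative mapping for this checkpoint is stored in its config and can be read from model.config.id2label after loading.

```python
# Tag codes and categories from the table above
TAGS = {
    "PER": "PERSON",
    "LOC": "LOCATION",
    "RRN": "RESIDENT REGISTRATION NUMBER",
    "EMA": "EMAIL",
    "ID": "ID",
    "PWD": "PASSWORD",
    "ORG": "ORGANIZATION",
    "PHN": "PHONE NUMBER",
    "CRD": "CARD NUMBER",
    "ACC": "ACCOUNT NUMBER",
    "PSP": "PASSPORT NUMBER",
    "DLN": "DRIVER'S LICENSE NUMBER",
}


def build_label_list(tags):
    """Hypothetical helper: O first, then TAG-B and TAG-I for each code (suffix form)."""
    labels = ["O"]
    for code in tags:
        labels += [f"{code}-B", f"{code}-I"]
    return labels


LABELS = build_label_list(TAGS)
label2id = {label: i for i, label in enumerate(LABELS)}
id2label = {i: label for label, i in label2id.items()}

print(len(LABELS), LABELS[:5])
```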

How to use

You can use this model with the Transformers pipeline for NER.

from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

# Load the fine-tuned tokenizer and token-classification model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("amoeba04/koelectra-small-v3-privacy-ner")
model = AutoModelForTokenClassification.from_pretrained("amoeba04/koelectra-small-v3-privacy-ner")
ner = pipeline("ner", model=model, tokenizer=tokenizer)

# "Last week, Mr. Hong Gil-dong attended an IT conference held at the Teheran-ro 101 building in Gangnam-gu, Seoul."
example = "지난주, 홍길동 씨는 서울특별시 강남구에 위치한 테헤란로 101빌딩에서 진행된 IT 컨퍼런스에 참석했습니다."
ner_results = ner(example)  # list of dicts, one per token with a non-O label
print(ner_results)

Output (shown schematically, with each tagged token replaced by its predicted label):
"PER-B, PER-B 씨는 LOC-BLOC-ILOC-I LOC-ILOC-I LOC-ILOC-I LOC-ILOC-I LOC-ILOC-ILOC-I에서 진행된 IT 컨퍼런스에 참석했습니다."
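A common next step for a privacy NER model is to redact what it finds. The sketch below masks every character covered by a predicted token, using the "start"/"end" character offsets that the Transformers token-classification pipeline includes in each result dict. Those offsets depend on a fast tokenizer and may be None otherwise, in which case the code skips the entry. The function mask_entities, its min_score threshold, and the sample results are hypothetical illustrations, not real output from this model.

```python
from typing import Dict, List


def mask_entities(text: str, entities: List[Dict], mask_char: str = "*",
                  min_score: float = 0.0) -> str:
    """Hypothetical helper: replace characters covered by predicted tokens with mask_char.

    `entities` is expected to look like the list returned by the "ner" pipeline:
    dicts with "start"/"end" character offsets (end exclusive) and a "score".
    """
    chars = list(text)
    for ent in entities:
        start, end = ent.get("start"), ent.get("end")
        # Skip entries without offsets or with a confidence below the threshold
        if start is None or end is None or ent.get("score", 1.0) < min_score:
            continue
        for i in range(start, end):
            chars[i] = mask_char
    return "".join(chars)


# Hypothetical pipeline-style results for illustration (not real model output)
sample_text = "홍길동 씨는 IT 컨퍼런스에 참석했습니다."
sample_entities = [
    {"entity": "PER-B", "score": 0.99, "start": 0, "end": 1},
    {"entity": "PER-I", "score": 0.98, "start": 1, "end": 3},
]
print(mask_entities(sample_text, sample_entities))
```

In real use, the list passed in would be ner_results from the example above, together with the same input string.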

Training and evaluation data

์ž์ฒด ์ œ์ž‘ํ•œ ํ•œ๊ตญ์ธ ๊ฐœ์ธ์ •๋ณด ํŒจํ„ด ๊ธฐ๋ฐ˜ ๊ฐœ์ฒด๋ช… ์ธ์‹ (NER) ๋ฐ์ดํ„ฐ์…‹

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 512
  • eval_batch_size: 1024
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 1
  • mixed_precision_training: Native AMP
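With lr_scheduler_type linear and no warmup listed, the learning rate starts at 5e-05 and decays linearly to zero over the single epoch. The sketch below spells out that schedule; it mirrors the formula of Transformers' get_linear_schedule_with_warmup, which the Trainer uses for the linear scheduler type. The helper names are hypothetical, the dataset size used for the step count is hypothetical because the card does not report it, and the sketch assumes a single device with no gradient accumulation.

```python
import math


def linear_lr(step: int, total_steps: int, base_lr: float = 5e-5,
              warmup_steps: int = 0) -> float:
    """Hypothetical helper: LR at `step` for linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0, total_steps - step) / max(1, total_steps - warmup_steps)


def total_training_steps(num_examples: int, batch_size: int = 512, epochs: int = 1) -> int:
    """Hypothetical helper: optimizer steps, counting a final partial batch as a step."""
    return math.ceil(num_examples / batch_size) * epochs


# Hypothetical dataset size; the card does not report it
steps = total_training_steps(100_000)
for s in (0, steps // 2, steps):
    print(s, linear_lr(s, steps))
```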

Framework versions

  • Transformers 4.40.0
  • Pytorch 2.2.1+cu118
  • Datasets 2.19.0
  • Tokenizers 0.19.1
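To reproduce these results, it can help to confirm that the local environment matches the versions above. The standard-library sketch below compares installed distribution versions against this list. PyTorch is installed under the distribution name torch, and the +cu118 suffix marks a CUDA 11.8 build, so a CPU build or a build for a different CUDA version is reported as a mismatch even at the same release. The helper compare_versions is hypothetical.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions listed in this card, keyed by pip distribution name
EXPECTED = {
    "transformers": "4.40.0",
    "torch": "2.2.1+cu118",
    "datasets": "2.19.0",
    "tokenizers": "0.19.1",
}


def compare_versions(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Hypothetical helper: return {package: (expected, installed)} for every mismatch.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, want in expected.items():
        try:
            have: Optional[str] = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None
        if have != want:
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    for name, (want, have) in compare_versions(EXPECTED).items():
        print(f"{name}: card lists {want}, installed {have or 'not installed'}")
```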

Authors and Acknowledgments

This model was developed by the KNU Vision & Learning (KVL) Lab at Kyungpook National University.

For more information about our work, please visit our website.

Model size: 14.1M params (Safetensors; tensor types I64, F32)

Model tree: amoeba04/koelectra-small-v3-privacy-ner, fine-tuned from monologg/koelectra-small-v3-discriminator