Named Entity Tagger for an Indonesian corpus, built with nltk's ClassifierBasedTagger. It classifies the parts of a sentence that are named entities: person names, locations, organizations, times, and so on.
Python dependencies:
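As a rough illustration of how a classifier-based tagger sees each token, here is a minimal sketch of a feature detector in the `(tokens, index, history)` form that nltk's ClassifierBasedTagger accepts through its `feature_detector` argument. The function name `ne_features`, the chosen features, and the POS tags in the sample sentence are assumptions made for this example; they are not taken from `main.py`.

```python
def ne_features(tokens, index, history):
    """Build a feature dict for the token at `index`.

    tokens  -- list of (word, pos) pairs for one sentence (assumed input shape)
    index   -- position of the token being classified
    history -- IOB labels already predicted for tokens[:index]
    """
    word, pos = tokens[index]

    # Left context, with sentinel values at the start of the sentence.
    if index > 0:
        prev_word = tokens[index - 1][0].lower()
        prev_pos = tokens[index - 1][1]
        prev_iob = history[index - 1]
    else:
        prev_word = prev_pos = prev_iob = "<START>"

    # Right context, with sentinel values at the end of the sentence.
    if index + 1 < len(tokens):
        next_word = tokens[index + 1][0].lower()
        next_pos = tokens[index + 1][1]
    else:
        next_word = next_pos = "<END>"

    return {
        "word": word.lower(),
        "pos": pos,
        "is_capitalized": word[:1].isupper(),
        "has_digit": any(ch.isdigit() for ch in word),
        "prev_word": prev_word,
        "prev_pos": prev_pos,
        "prev_iob": prev_iob,
        "next_word": next_word,
        "next_pos": next_pos,
    }


# Hypothetical POS-tagged fragment; the tags are illustrative only.
sentence = [("utang", "NN"), ("Telkom", "NNP"), ("sebesar", "IN"), ("Rp", "NNP")]
features = ne_features(sentence, 1, ["O"])
```

Capitalization and the previous IOB label are typical signals for named entities, but which features actually help on an Indonesian corpus is something to check experimentally.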
- Sastrawi stemmer
- CRFTagger (nltk)
python main.py
- Fork it!
- Create your feature branch:
git checkout -b my-new-feature
- Commit your changes:
git commit -am 'Add some feature'
- Push to the branch:
git push origin my-new-feature
- Submit a pull request :D
The result of the most recent training run was not very good. For the sentence
Per semester pertama 2004, total utang jangka panjang Telkom sebesar Rp 20,648 triliun.
tag result = (S
(org P/NN s/NNP)
p/NNP
2/CD
t/FW
(org u/FW)
(org j/FW)
(org p/FW)
(loc T/NNP s/NNP)
(loc R/NNP 2/CD)
t/NND)
# output:
[Per semester] [pertama] [2004,] [total] [utang] [jangka] [panjang] [Telkom sebesar] [Rp 20,648] [triliun.]
org            -         -       -       org     org      org       loc              loc         -
# expected:
[Per] [semester] [pertama] [2004,] [total] [utang] [jangka] [panjang] [Telkom] [sebesar] [Rp] [20,648] [triliun.]
-     -          -         -       -       -       -        -         org      -         -    -        -
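To put a number on the gap between the output and the expectation above, the two chunkings can be compared as entity spans. The sketch below uses exact-match span scoring (a span counts only if its boundaries and label both match), which is an assumption of this example and not something the project itself is known to compute; the helper name `entity_spans` is likewise hypothetical.

```python
def entity_spans(chunks):
    """Convert (text, label) chunks into a set of (start, end, label) spans.

    Positions count whitespace-separated tokens; chunks labelled "-"
    carry no entity and are skipped.
    """
    spans = set()
    position = 0
    for text, label in chunks:
        length = len(text.split())
        if label != "-":
            spans.add((position, position + length, label))
        position += length
    return spans


# Chunks and labels copied from the output / expected listing above.
predicted = [
    ("Per semester", "org"), ("pertama", "-"), ("2004,", "-"), ("total", "-"),
    ("utang", "org"), ("jangka", "org"), ("panjang", "org"),
    ("Telkom sebesar", "loc"), ("Rp 20,648", "loc"), ("triliun.", "-"),
]
expected = [
    ("Per", "-"), ("semester", "-"), ("pertama", "-"), ("2004,", "-"),
    ("total", "-"), ("utang", "-"), ("jangka", "-"), ("panjang", "-"),
    ("Telkom", "org"), ("sebesar", "-"), ("Rp", "-"), ("20,648", "-"),
    ("triliun.", "-"),
]

pred_spans = entity_spans(predicted)
gold_spans = entity_spans(expected)
correct = pred_spans & gold_spans

precision = len(correct) / len(pred_spans) if pred_spans else 0.0
recall = len(correct) / len(gold_spans) if gold_spans else 0.0
print(f"predicted={len(pred_spans)} expected={len(gold_spans)} "
      f"correct={len(correct)} precision={precision:.2f} recall={recall:.2f}")
```

On this sentence the tagger proposes six entities while only one is expected, and none of the proposed spans matches it: "Telkom" is merged with "sebesar" and labelled loc instead of org.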
Named Entity Extraction with Python (most of this project follows this tutorial)
http://nlpforhackers.io/named-entity-extraction/
NETagger training data
POS Tagger & NER for Indonesian with Python
https://yudiwbs.wordpress.com/2018/02/20/pos-tagger-bahasa-indonesia-dengan-pytho/
https://yudiwbs.wordpress.com/2018/02/18/ner-named-entity-recognition-bahasa-indonesia-dengan-stanford-ner/
Sastrawi Stemmer Python
https://github.com/har07/PySastrawi