INTRODUCTION TO SOME TOOLS FOR NATURAL
LANGUAGE PROCESSING AND DATA MINING
TRẦN MAI VŨ
VIETNAMESE NLP TOOLS
JVnTextPro: http://sourceforge.net/projects/jvntextpro/
Sentence Segmentation, Sentence Tokenization, Word Segmentation, POS Tagging
VnToolkit: http://www.loria.fr/~lehong/softwares.php
A software for automatically extracting LTAGs* from treebanks.
An automatic tagger for Vietnamese texts
A tokenizer for automatic word segmentation of Vietnamese texts
A sentence detector for automatically detecting sentences in Vietnamese texts
VLSP Tools: http://vlsp.vietlp.org:8080/demo/?page=resources
Vietnamese Chunking
(*) Lexicalized Tree Adjoining Grammars
NLP TOOLS
LingPipe: http://alias-i.com/lingpipe/
GATE - General Architecture for Text Engineering: http://gate.ac.uk/
Mallet - Machine Learning for Language Toolkit:
http://mallet.cs.umass.edu/
MinorThird: http://sourceforge.net/projects/minorthird/
OpenNLP: http://opennlp.sourceforge.net/
PREPROCESSING TOOLS
TextCat - Java Text Categorizing Library: http://textcat.sourceforge.net/
HTML Parser: http://htmlparser.sourceforge.net/
CyberNeko HTML Parser: http://nekohtml.sourceforge.net/
Crawler4J: http://code.google.com/p/crawler4j/
Lucene: http://lucene.apache.org/
OTHER TOOLS
SVM-Light Support Vector Machine: http://svmlight.joachims.org/
CRF: http://crf.sourceforge.net/
Text Clustering Toolkit: http://mlg.ucd.ie/tct
A Java Implementation of Latent Dirichlet Allocation (LDA) using Gibbs
Sampling for Parameter Estimation and Inference:
http://jgibblda.sourceforge.net/
DATA MINING TOOLS
Weka - Machine Learning Software in Java:
http://sourceforge.net/projects/weka/
RapidMiner - Data Mining, ETL, OLAP, BI:
http://sourceforge.net/projects/yale/
RSES - Rough Set Exploration System: http://logic.mimuw.edu.pl/~rses/
ONTOLOGY TOOLS
The Protégé Ontology Editor and Knowledge Acquisition System:
http://protege.stanford.edu/
Jena Semantic Web Framework: http://jena.sourceforge.net/