AN INTRODUCTION TO SOME TOOLS FOR NATURAL LANGUAGE
PROCESSING AND DATA MINING
TRẦN MAI VŨ
VIETNAMESE NLP TOOLS
• JVnTextPro: http://sourceforge.net/projects/jvntextpro/
  - Sentence Segmentation, Sentence Tokenization, Word Segmentation, POS Tagging
• VnToolkit: http://www.loria.fr/~lehong/softwares.php
  - A tool for automatically extracting LTAGs* from treebanks
  - An automatic tagger for Vietnamese texts
  - A tokenizer for automatic word segmentation of Vietnamese texts
  - A sentence detector for automatically detecting sentences in Vietnamese texts
• VLSP Tools: http://vlsp.vietlp.org:8080/demo/?page=resources
  - Vietnamese Chunking
(*) Lexicalized Tree Adjoining Grammars
NLP TOOLS
• LingPipe: http://alias-i.com/lingpipe/
• GATE - General Architecture for Text Engineering: http://gate.ac.uk/
• Mallet - Machine Learning for Language Toolkit: http://mallet.cs.umass.edu/
• MinorThird: http://sourceforge.net/projects/minorthird/
• OpenNLP: http://opennlp.sourceforge.net/
PREPROCESSING TOOLS
• TextCat - Java Text Categorizing Library: http://textcat.sourceforge.net/
• HTML Parser: http://htmlparser.sourceforge.net/
• CyberNeko HTML Parser: http://nekohtml.sourceforge.net/
• Crawler4J: http://code.google.com/p/crawler4j/
• Lucene: http://lucene.apache.org/
OTHER TOOLS
• SVM-Light Support Vector Machine: http://svmlight.joachims.org/
• CRF (Conditional Random Fields): http://crf.sourceforge.net/
• Text Clustering Toolkit: http://mlg.ucd.ie/tct
• JGibbLDA - A Java Implementation of Latent Dirichlet Allocation (LDA) using Gibbs Sampling for Parameter Estimation and Inference: http://jgibblda.sourceforge.net/
DATA MINING TOOLS
• Weka - Machine Learning Software in Java: http://sourceforge.net/projects/weka/
• RapidMiner - Data Mining, ETL, OLAP, BI: http://sourceforge.net/projects/yale/
• RSES - Rough Set Exploration System: http://logic.mimuw.edu.pl/~rses/
ONTOLOGY TOOLS
• The Protégé Ontology Editor and Knowledge Acquisition System: http://protege.stanford.edu/
• Jena Semantic Web Framework: http://jena.sourceforge.net/