Keyphrase Extraction in Scientific Publications


Web IR / NLP Group (WING) Architecture
Min-Yen Kan
School of Computing
National University of Singapore
Projects
Funded
• CSIDM (CAS, China): Aobo, CSIDM Interns
• ForeCite (Expires Oct 2010): Kaz, Emma, Thang
Proposed
• Data Cleaning in the Cloud (UCI)
• Text Mining Clinical Articles (Duke-NUS / UCI)
– Shreyasee, Justin
• Text Mining Scientific Articles (Global Asia Institute)
• ForeCite2
WING, NUS
Research Topics
DL, IR/MM/HCI, NLP
• Yee Fan Tan – Record Matching in Digital Libraries
• Jin Zhao – Math Equation IR
• Kazunari Sugiyama – Recommender Systems in Digital Libraries
• Minh Thang Luong – ForeCite
• Jesse Gozali – Phototaking Behavior
• Ziheng Lin – Rhetorical Discourse Analysis
• Cong Duy Vu Hoang – Related Work Summarization
• Jun Ping Ng – Logic in Question Answering
• Aobo Wang – Crowdsourcing for Machine Translation
• Emma Thuy Dung Nguyen – ForeCite
• Shihong Huang, Wai Hong Loh – Tooltip translator for Firefox
Incoming Staff (4 UROP, 1 Intern):
• Shomir Wilson (Intern) – Mention Detection in Scientific Articles, w/ Jin
• Shawn Tan (UROP) – Continuing PARCELS, w/ Jesse
• Tamisa Huangsiri, Low Wee Hung (UROP) – CSIDM Firefox, w/ Aobo, Jun Ping
• Yipeng Huang (UROP) – Cloud Data Cleaning, w/ Yee Fan, Jin
Responsibilities (to be discussed)
• Kaz: Non-CSIDM UROP guidance
• Yee Fan: None (Thesis Writing!!)
• Jin: RPNLPIR / Meeting and Room Bookings
• Ziheng: Publication Page / Joomla / Social
• Jesse: RoR / FC / CSX
• Aobo: RoR / Web System Admin
• Jun Ping: System Admin Lead
Cluster Architecture
Systems named after Singapore’s highways
Fixed IP
– CTE – RAID drive host, LDAP host, source code repository
– AYE – webserver, mailserver, mailman, virtual host on ECP
DHCP (.ddns.)
– ECP – LDAP backup
– PIE – compute server
Windows Server (.ddns)
– KPE
– KJE
– BKE
– SLE
OS support
All *nix group machines run CentOS 5
• stable Linux Enterprise distribution
• all mount cte’s raid drive, plus other automounts
Future
• use rsync to sync all binaries across machines
• expand RAID to encompass disks over different machines for more space (more SAN-like)
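The planned rsync step could be sketched as follows. This is only an illustration, not the group's actual script: the source path is the Linux binaries mount listed on the RAID slide, the host names are the *nix machines from the cluster slide, and the destination directory `/usr/local/rpnlpir/` is a hypothetical placeholder. The commands are built with `--dry-run` by default so nothing is copied until the plan has been checked.

```python
import shlex

SOURCE_DIR = "/mnt/rpnlpir-Linux/"    # binaries directory named on the RAID slide
TARGET_HOSTS = ["aye", "ecp", "pie"]  # *nix hosts from the cluster slide (cte holds the source)
DEST_DIR = "/usr/local/rpnlpir/"      # hypothetical local destination, not from the slides


def rsync_command(host, source=SOURCE_DIR, dest=DEST_DIR, dry_run=True):
    """Build the rsync argument list that mirrors `source` onto `host`."""
    cmd = ["rsync", "-a", "--delete"]  # archive mode; remove files that vanished at the source
    if dry_run:
        cmd.append("--dry-run")        # report what would change without copying
    cmd += [source, f"{host}:{dest}"]
    return cmd


def plan(hosts=TARGET_HOSTS):
    """Return one shell-quoted command line per target host."""
    return [shlex.join(rsync_command(h)) for h in hosts]


if __name__ == "__main__":
    for line in plan():
        print(line)
```

Passing a built list to `subprocess.run` with `dry_run=False` would perform the real copy; the trailing slash on the source makes rsync copy the directory's contents rather than the directory itself.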
RAID setup
• Currently 5.0 TB in RAID 5?
• ext3 mounted to cte
– /mnt/homes – home directories
– /mnt/rpnlpir-indep – machine-independent data (datasets)
– /mnt/rpnlpir-Linux – binaries
– /mnt/rpnlpir-Windows – binaries
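As a quick sanity check on the capacity figure: RAID 5 spends one disk's worth of space on parity, so usable capacity is (number of disks − 1) × disk size. The slide does not state the disk count or size, so the 6 × 1 TB configuration below is purely a hypothetical example that happens to yield 5.0 TB.

```python
def raid5_usable_tb(n_disks: int, disk_tb: float) -> float:
    """Usable capacity of a RAID 5 array: one disk's worth of space holds parity."""
    if n_disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return (n_disks - 1) * disk_tb


if __name__ == "__main__":
    # Hypothetical layout; the real array's disk count is not given on the slide.
    print(raid5_usable_tb(6, 1.0))
```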
Future
• DB server coming online for Rails applications
Webserver (aye.comp.nus.edu.sg)
• Apache
• Virtual hosts (wing.comp, linc.comp, opac.comp)
• Hosts Tomcat for Java servlets
• Hosts gmond (Ganglia monitor)
• Runs webalizer for stats
• Hosts Ruby on Rails apps (Trung’s myror script; to be deprecated soon)
• Hosts web service server (router for web service calls)
Web Services
• Our infrastructure is tuned to expose many services and demos as web services.
• External calls go to port 4000
• List of web services at http://wing.comp.nus.edu.sg/~forecite/
• Calls are handled by the WebServiceServer (WSS) Ruby code.
• Directory for web services is currently /home/forecite/services/
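The routing idea can be illustrated with a small sketch. The real WSS is Ruby code whose protocol and layout are not described on the slide, so everything here is an assumption: the sketch supposes each service lives in an entry named after it under the services directory, and `resolve_service` is a hypothetical helper that maps a requested name to that entry while rejecting names that would escape the directory.

```python
from pathlib import Path

SERVICES_DIR = Path("/home/forecite/services")  # directory named on the slide


def resolve_service(name: str, base: Path = SERVICES_DIR) -> Path:
    """Map a service name to its entry under `base` (assumed one entry per service)."""
    # Refuse empty names and anything that could climb out of the services directory.
    if not name or "/" in name or "\\" in name or name in {".", ".."}:
        raise ValueError(f"invalid service name: {name!r}")
    return base / name


if __name__ == "__main__":
    print(resolve_service("demo"))  # "demo" is a made-up service name
```

A router listening on port 4000 would call a helper like this on the requested name before dispatching, so a malformed request cannot reach files outside the services directory.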
Joomla
• For our website
• Administration by admin@wing, PhD students
Customizations
• Forum integration (phpbb)
– Forum has contact information for all staff
– Forum user DB not yet synched with shadow passwords in LDAP
• RPNLPIR (resource list)
• Blog
Mailing List
• mailman runs on aye
• lists are also reachable on wing (alias for aye)
• both local and international mailing lists are hosted here
LDAP
• Keeps logins, UIDs and GIDs synched
• Main server on cte
• Backup on aye
• Needs to be robust in case of failure of LDAP server
• Local root for all machines must be maintained
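The robustness requirement amounts to trying the main server first and falling back to the backup. A minimal sketch of that ordering follows, assuming the short host names `cte` and `aye` resolve on the local network and that LDAP listens on the standard port 389 (neither detail is stated on the slide). It only checks TCP reachability; it does not perform an LDAP bind.

```python
import socket

LDAP_PORT = 389           # standard LDAP port; assumed, not stated on the slide
SERVERS = ["cte", "aye"]  # main server, then backup, as listed above


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(servers, port=LDAP_PORT, probe=tcp_probe):
    """Return the first server that answers the probe, or None if none does."""
    for host in servers:
        if probe(host, port):
            return host
    return None


if __name__ == "__main__":
    print(first_reachable(SERVERS))
```

When this returns `None`, both directory servers are unreachable, which is the case the local root account on each machine is kept for.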
RPNLPIR (Research Project for NLP / IR)
• Common team account
• Keeps software repository mirrored by a web page listing
• Keeps CVS repo in ~/CVSDir
• Keeps git repo in ~/repo
• Accessible to all group members