GoldenBullet IC - DEMO


Intelligent Classifier
User-friendly Semi-Automatic
Product Classification System
STI Innsbruck & Excogito
People
1. Supervision: Marcus Spies
2. People: Sigurd Harand, Christian Leibold
3. Contact person: Christian Leibold,
[email protected]
4. Industrial cooperation with Excogito, Maksym Korotkiy
© 2002 - 2007 STI Innsbruck & Excogito. All Rights Reserved.
Outline
1. Context: Product Classification Problem
2. Project intro, positioning and objectives
3. Workflow driven approach
4. GoldenBullet shooting market
   a) Improved Software architecture
      • Java XML Registries
      • User taxonomies
   b) Improved (re-)usability and quality
5. Conclusions and Future
6. Online Demo
Product Classification Problem
1. E-Catalogs contain thousands of cryptic product
descriptions
1. CAREPAQ BUREAU PROSIGNIA3YRS/SITE/J+1/TEL
2. TRAINING ACT/ASEEXCEPT TRU64UNIX and OPENVMS
3. ….
2. Businesses have to deal with thousands of e-catalogs
3. Classification standards have tens of thousands of
product categories (21192 in UNSPSC 8.04)
4. The result: high manual classification effort is required
GB IC Positioning and Objectives
• many standards (e.g. UNSPSC, eCl@ss, ebXML, GPC, …),
– ~20,000 classes,
– millions of products
• Current state of the art: outsourcing to low-wage countries, or use of
(counterproductive) low-quality software tools with 25% failure rates
• The GoldenBullet 2 research prototype offered a unique "semi-automatic"
function that supports classification through manual intervention and,
by "learning", achieves a classification level of 95% and speeds up the
process by up to 60 times
• Developing GB IC into a marketable product will create innovative
added value and help to reduce outsourcing of labor.
Project intro
1. Project won ProIT funding (cooperation between transIT and CAST)
2. Duration: 1st September 2007 - 31st August 2008
3. Objectives:
• Submission of a debugged, robust and marketable GB IC Prototype
• Extended Usability and Robustness
• Extended Reusability
4. Completed tasks & Status:
• Worked out contract for handling IPR between stakeholders (UIBK, Excogito
NL, BvW Global Pty)
• Including foundation regulations for marketing and selling
• 1st report with deliverable of the technical specification accepted by CAST
and transIT
• Cooperation with industrial partner Excogito
Workflow Driven Approach
1. GoldenBullet semi-automatically classifies product
descriptions into a standard (e.g. UNSPSC) by employing
1. NLP techniques to preprocess descriptions (stemming)
2. Clustering methods to generate representative sub-sets of the e-catalog
(currently k-means)
3. Machine learning techniques to train the system and automatically
generate ranked classification options (currently Naïve Bayes)
2. The user approves or corrects the proposed
classification
3. GoldenBullet constantly learns from the user choices
and updates the classification options
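The loop described above (preprocess, propose ranked options, learn from the user's decision) can be sketched in a few lines. This is only an illustration under stated assumptions, not the GoldenBullet implementation: the class name NaiveBayesClassifier, the toy suffix stemmer, the category labels and the sample descriptions are all hypothetical, and the model is a plain multinomial Naïve Bayes with add-one smoothing.

```python
import math
import re
from collections import Counter, defaultdict


def stem(token):
    """Toy stemmer (assumption): strips a few common English suffixes."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(description):
    """Lowercase, split on non-alphanumeric characters, then stem each token."""
    return [stem(t) for t in re.findall(r"[a-z0-9]+", description.lower())]


class NaiveBayesClassifier:
    """Multinomial Naive Bayes that can be updated one example at a time."""

    def __init__(self):
        self.class_counts = Counter()             # category -> number of examples
        self.token_counts = defaultdict(Counter)  # category -> token frequencies
        self.vocabulary = set()

    def learn(self, description, category):
        """Record one approved or corrected classification."""
        tokens = tokenize(description)
        self.class_counts[category] += 1
        self.token_counts[category].update(tokens)
        self.vocabulary.update(tokens)

    def rank(self, description, top_n=3):
        """Return up to top_n categories, most probable first."""
        tokens = tokenize(description)
        total = sum(self.class_counts.values())
        scores = {}
        for category, count in self.class_counts.items():
            counts = self.token_counts[category]
            # Add-one smoothing; the extra 1 reserves mass for unseen tokens.
            denominator = sum(counts.values()) + len(self.vocabulary) + 1
            log_prob = math.log(count / total)
            for token in tokens:
                log_prob += math.log((counts[token] + 1) / denominator)
            scores[category] = log_prob
        return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical training data and category labels.
classifier = NaiveBayesClassifier()
classifier.learn("laser printer toner cartridge", "Printer supplies")
classifier.learn("inkjet printer ink cartridge", "Printer supplies")
classifier.learn("unix administration training course", "Training")

options = classifier.rank("toner cartridges for laser printers")
# The user approves the first option, and the system learns from that choice.
classifier.learn("toner cartridges for laser printers", options[0])
```

Because learn only increments counters, every approval or correction immediately changes the next ranking, which is the "constantly learns" behaviour the workflow relies on.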
Architecture
Mapping the workflow to
functional modules:
• Separation of concerns
• Workflow support to be
implemented in the GUI
Architecture
Enhanced Usability and Robustness:
- Provide sort and search functions for catalogue AND classification
schema
- Multi-language GUI and contextual help-system
- Support of catalogue sizes of up to 10^6
- Action logging enables undo / redo for classification and user
workflow
- Implementation of strategies for the avoidance of over-fitting
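The action-logging item in the list above (undo / redo for classification) can be illustrated with two stacks of recorded actions. This is a minimal sketch under assumptions: the class ClassificationLog, its method names and the sample identifiers are hypothetical and are not taken from the GB IC code base.

```python
class ClassificationLog:
    """Keeps category assignments and an action log that supports undo and redo."""

    def __init__(self):
        self.assignments = {}  # product id -> category
        self._undo_stack = []  # entries: (product id, previous category, new category)
        self._redo_stack = []

    def assign(self, product_id, category):
        """Assign a category and log the action so that it can be undone."""
        previous = self.assignments.get(product_id)
        self._apply(product_id, category)
        self._undo_stack.append((product_id, previous, category))
        self._redo_stack.clear()  # a new action invalidates the redo history

    def undo(self):
        """Revert the most recent action; return False if there is none."""
        if not self._undo_stack:
            return False
        product_id, previous, category = self._undo_stack.pop()
        self._apply(product_id, previous)
        self._redo_stack.append((product_id, previous, category))
        return True

    def redo(self):
        """Re-apply the most recently undone action; return False if there is none."""
        if not self._redo_stack:
            return False
        product_id, previous, category = self._redo_stack.pop()
        self._apply(product_id, category)
        self._undo_stack.append((product_id, previous, category))
        return True

    def _apply(self, product_id, category):
        # A category of None means "not classified yet".
        if category is None:
            self.assignments.pop(product_id, None)
        else:
            self.assignments[product_id] = category


# Hypothetical product id and category labels.
log = ClassificationLog()
log.assign("item-1", "Printer supplies")
log.assign("item-1", "Training")
log.undo()  # item-1 is "Printer supplies" again
```

Storing the previous value with each action keeps undo independent of how the assignment was produced, whether by the user or by an automatic proposal.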
Architecture
Enhanced reusability:
- Software can be deployed in a Java Enterprise Edition Application
Server (e.g. Tomcat, all major vendors)
- The Java EE XML Registry is used for storing and accessing
classification schema data
- Enables customer catalogue taxonomies to be stored and
exchanged over a common format.
- Documentation (SW Design, User guide, Feature list), JUnit, JavaDoc
Conclusions and Future
1. GoldenBullet is a semi-automatic product classification
system that offers significant reduction of e-catalog
classification effort
2. GoldenBullet IC considerably improves (re-)usability
and robustness of the system
3. In the future we aim at:
1. Implementation & validation of the technical
specification
2. Generation of awareness (transIT)
3. Evaluation of further (possibly new) options of
marketable exploitation
Online Demo
- Questions so far?
- http://www.gbclass.com
Thank you!
Further Questions?
Backup
The following slides are provided in case
no internet connection is available or the
demo is not reachable
GoldenBullet IC GUI Outline
1. Wizards
1. Data Import/Export
2. Simple and Expert Training
3. Classification
2. E-Catalog and UNSPSC Browsers
“CI” Style
GoldenBullet IC has an integrated GUI style and a consistently
designed, brand-like interface.
- Recognition as a product
- Usability through commonly used symbols
Data Import/Export Wizards
E-Catalog Browser
Expert Training
An automatically created representative sub-catalog is provided to the user
for semi-automatic classification
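One way such a representative sub-catalog could be derived with k-means is to cluster the descriptions and keep the item closest to each cluster centre. The sketch below is an assumption about how this might look, not the GB IC algorithm: it uses plain bag-of-words vectors, a deterministic start (evenly spaced items instead of random seeds) and hypothetical function names and sample data, and it expects k to be no larger than the number of descriptions.

```python
import re
from collections import Counter


def vectorize(descriptions):
    """Turn each description into a bag-of-words count vector."""
    token_lists = [re.findall(r"[a-z0-9]+", d.lower()) for d in descriptions]
    vocabulary = sorted({t for tokens in token_lists for t in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append([float(counts[t]) for t in vocabulary])
    return vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(vectors, k, iterations=20):
    """Plain k-means; returns (cluster label per vector, centroids)."""
    # Deterministic initialisation (assumption): evenly spaced items as seeds.
    step = len(vectors) / k
    centroids = [list(vectors[int(i * step)]) for i in range(k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach every vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def representative_subset(descriptions, k):
    """Pick, for each non-empty cluster, the description nearest its centroid."""
    vectors = vectorize(descriptions)
    labels, centroids = k_means(vectors, k)
    chosen = []
    for c in range(k):
        members = [i for i, label in enumerate(labels) if label == c]
        if members:
            best = min(members, key=lambda i: squared_distance(vectors[i], centroids[c]))
            chosen.append(descriptions[best])
    return chosen


# Hypothetical mini-catalog.
catalog = [
    "laser printer toner",
    "laser printer toner black",
    "laser printer drum",
    "unix training course",
    "unix training course advanced",
    "openvms training course",
]
sub_catalog = representative_subset(catalog, k=2)
```

The user then only has to classify the few selected items by hand, and those decisions can serve as training data for the rest of the catalog.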
Classification
Automatically created classification options are proposed to the user for
approval
UNSPSC Browser
The Browser allows the user to locate an appropriate UNSPSC category
and manually assign it to a product description
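Browsing works because UNSPSC codes are hierarchical: an 8-digit code consists of four 2-digit levels (segment, family, class, commodity), and a parent category is written by zero-padding the remaining digits. A minimal sketch of how a browser could derive the path to a category follows; the function name unspsc_path is hypothetical and the sample code is used purely as an illustrative value.

```python
LEVELS = ("segment", "family", "class", "commodity")


def unspsc_path(code):
    """Return the (level, code) pairs from the segment down to the given code."""
    if len(code) != 8 or not (code.isascii() and code.isdigit()):
        raise ValueError("expected an 8-digit UNSPSC code, got %r" % (code,))
    path = []
    for depth, level in enumerate(LEVELS, start=1):
        # Keep the first 2 * depth digits and pad the rest with zeros.
        path.append((level, code[: 2 * depth].ljust(8, "0")))
    return path


for level, ancestor in unspsc_path("43211503"):
    print(level, ancestor)
```

A tree view can expand these four codes in order to lead the user to the category that should be assigned.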