From OSM-L to JAVA - BYU Data Extraction Research Group


From OSM-L to JAVA

Cui Tao, Yihong Ding

Overview of OSM

OSM
• OSM (Object-oriented Systems Model)
  - Used for system analysis, specification, design, implementation, and evaluation
  - Structural components: object sets and relationship sets
    • Object set: generalization/specialization
    • Relationship set: n-ary relationships, cardinality constraints
  - Usually shown graphically
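As a rough illustration of these structural components, the following Java sketch models object sets, relationship sets, and cardinality constraints. The class names (Cardinality, ObjectSet, Connection, RelationshipSet) are hypothetical and do not come from the presentation; generalization/specialization is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Participation constraint such as 0..1 or 1..*; a negative max stands for "*". */
final class Cardinality {
    final int min;
    final int max; // -1 encodes the unbounded "*"

    Cardinality(int min, int max) { this.min = min; this.max = max; }

    /** True if an object may take part in the relationship set this many times. */
    boolean allows(int count) {
        return count >= min && (max < 0 || count <= max);
    }

    @Override public String toString() {
        return min + ".." + (max < 0 ? "*" : String.valueOf(max));
    }
}

/** An object set; the lexical flag is an assumption used to separate printable values from object identifiers. */
final class ObjectSet {
    final String name;
    final boolean lexical;

    ObjectSet(String name, boolean lexical) { this.name = name; this.lexical = lexical; }
}

/** One connection of a relationship set: an object set plus its participation constraint. */
final class Connection {
    final ObjectSet objectSet;
    final Cardinality cardinality;

    Connection(ObjectSet objectSet, Cardinality cardinality) {
        this.objectSet = objectSet;
        this.cardinality = cardinality;
    }
}

/** An n-ary relationship set: a name and two or more connections. */
final class RelationshipSet {
    final String name;
    final List<Connection> connections;

    RelationshipSet(String name, List<Connection> connections) {
        if (connections.size() < 2) {
            throw new IllegalArgumentException("a relationship set needs at least two connections");
        }
        this.name = name;
        this.connections = Collections.unmodifiableList(new ArrayList<>(connections));
    }

    /** Renders the set in an OSM-L-like form, e.g. "Car [0..1] has Year [1..*]". */
    String describe() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < connections.size(); i++) {
            Connection c = connections.get(i);
            if (i == 1) sb.append(name).append(' '); // relationship name follows the first object set
            sb.append(c.objectSet.name).append(" [").append(c.cardinality).append("]");
            if (i < connections.size() - 1) sb.append(' ');
        }
        return sb.toString();
    }
}
```

Keeping the connections in a list rather than as two fixed fields is what lets the same class represent n-ary relationship sets.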

Sample OSM for Cars (Graphic Version)

[Figure: OSM diagram. Car is connected by "has" relationship sets to Year, Make, Model, Mileage, Price, and Feature; PhoneNr "is for" Car; PhoneNr "has" Extension. Each connection carries a participation constraint (0..1, 0..*, or 1..*).]

OSM-L and Ontology
• OSM-L: a textual language for representing OSM application models.
• Ontology: a program written in OSM-L that provides the database schema, relationship sets, and a knowledge base to the extractor.
• For each application domain, a new ontology must be written according to the user's request.

Car-Ads Ontology

Car [->object];
Car [0..1] has Year [1..*];
Car [0..1] has Make [1..*];
Car [0..1] has Model [1..*];
Car [0..1] has Mileage [1..*];
Car [0..*] has Feature [1..*];
Car [0..1] has Price [1..*];
PhoneNr [1..*] is for Car [0..*];
PhoneNr [0..1] has Extension [1..*];
Year matches [4] constant
  { extract "\d{2}";
    context "([^\$\d]|^)[4-9]\d,[^\d]";
    substitute "^" -> "19"; },
…
End;
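To see how the Year data frame might behave, here is a minimal Java sketch that applies its three rules with java.util.regex. The order of application (context first, then extract inside the matched context, then substitute) is an assumption, and the class and method names are hypothetical; only the three regular expressions come from the ontology.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class YearDataFrame {
    // Context rule from the ontology: a two-digit number starting with 4-9,
    // not preceded by '$' or a digit, followed by a comma and a non-digit.
    private static final Pattern CONTEXT = Pattern.compile("([^\\$\\d]|^)[4-9]\\d,[^\\d]");

    // Extraction rule from the ontology: the two digits inside the matched context.
    private static final Pattern EXTRACT = Pattern.compile("\\d{2}");

    /** Returns every year recognized in the ad text, expanded to four digits. */
    static List<String> extractYears(String ad) {
        List<String> years = new ArrayList<>();
        Matcher context = CONTEXT.matcher(ad);
        while (context.find()) {
            Matcher extract = EXTRACT.matcher(context.group());
            if (extract.find()) {
                // Substitution rule from the ontology: "^" -> "19" prefixes the century.
                years.add(extract.group().replaceFirst("^", "19"));
            }
        }
        return years;
    }

    public static void main(String[] args) {
        System.out.println(extractYears("Subaru SW 89, auto, $1900")); // prints [1989]
    }
}
```

The context rule is what keeps a price such as "$90, firm" from being read as a year: the leading "$" fails the `[^\$\d]` test, so no context matches and nothing is extracted.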

Data Extraction

Information Exchange

[Figure: Information Exchange between a Source and a Target, labeled with Information Extraction and Schema Matching and the annotation "Leverage this … … to do this".]

Extracting Pertinent Information from Documents

Recognition and Extraction

Car Year Make Model Mileage Price PhoneNr
0001 1989 Subaru SW $1900 (363)835-8597
0002 1998 Elandra (336)526-5444
0003 1994 HONDA ACCORD EX 100K (336)526-1081

Car Feature
0001 Auto
0001 AC
0002 Black
0002 4 door
0002 tinted windows
0002 Auto
0002 pb
0002 ps
0002 cruise
0002 am/fm
0002 cassette stero
0002 a/c
0003 Auto
0003 jade green
0003 gold

OSM
• Object Set (nonlexical or lexical): { object name, data frame }
• Relationship Set: { connection, { object set }, constraint }
• Data frame: { extraction rule, context rule, substitution rule, keyword }

[Figure: system structure. Labels: Structure, Schema Generation Interface, Schema (relational database tables), Table-Insertion Interface (insert methods) implemented by the Schema, Matching Process, Retrieved Data, Database Population Interface.]

Parser and Symbol Table
• Generate the parse tree
• Design the structure of the symbol table
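One possible shape for such a symbol table is sketched below in Java. It is a simplification under stated assumptions: it recognizes only binary relationship-set declarations such as "Car [0..1] has Year [1..*];" with a regular expression instead of building a full parse tree, and it silently skips every other statement (object-set declarations, data frames). All class and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OsmlSymbolTable {
    /** One parsed binary relationship-set declaration. */
    static final class Relationship {
        final String from, fromCard, name, to, toCard;

        Relationship(String from, String fromCard, String name, String to, String toCard) {
            this.from = from; this.fromCard = fromCard; this.name = name;
            this.to = to; this.toCard = toCard;
        }
    }

    // Matches e.g. "Car [0..1] has Year [1..*]" or "PhoneNr [1..*] is for Car [0..*]".
    private static final Pattern DECL = Pattern.compile(
        "(\\w+)\\s*\\[([^\\]]+)\\]\\s+(.+?)\\s+(\\w+)\\s*\\[([^\\]]+)\\]");

    /** Object-set name -> every relationship set that object set takes part in. */
    final Map<String, List<Relationship>> byObjectSet = new LinkedHashMap<>();

    static OsmlSymbolTable parse(String source) {
        OsmlSymbolTable table = new OsmlSymbolTable();
        for (String stmt : source.split(";")) {
            Matcher m = DECL.matcher(stmt.trim());
            if (!m.matches()) continue; // not a binary relationship set; ignored in this sketch
            Relationship r = new Relationship(
                m.group(1), m.group(2), m.group(3), m.group(4), m.group(5));
            table.byObjectSet.computeIfAbsent(r.from, k -> new ArrayList<>()).add(r);
            table.byObjectSet.computeIfAbsent(r.to, k -> new ArrayList<>()).add(r);
        }
        return table;
    }
}
```

Indexing each relationship under both of its object sets means later stages can ask "what does Car take part in?" without rescanning the source.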

Data Extraction

Extraction Rules: define the expected pattern of the string to extract.

Context Rules: define the context constraints on the target pattern.

Substitution Rules: define the substitutions to apply, where applicable.

Keywords: define keywords that resolve ambiguity when it arises.
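The role of keywords can be illustrated with a small Java sketch. It assumes that an extracted value is ambiguous between several object sets (for example, a bare number that could be a Mileage or a Price) and that the ambiguity is resolved by looking for a keyword within a fixed character window around the value. The window approach, the class name, and the example keywords are all assumptions for illustration; the presentation does not list the actual keywords.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

final class KeywordDisambiguator {
    private final Map<String, Pattern> keywords = new LinkedHashMap<>();
    private final int window; // characters inspected on each side of the value

    KeywordDisambiguator(int window) { this.window = window; }

    /** Registers a (hypothetical) keyword pattern for an object set. */
    KeywordDisambiguator keyword(String objectSet, String regex) {
        keywords.put(objectSet, Pattern.compile(regex, Pattern.CASE_INSENSITIVE));
        return this;
    }

    /** Returns the first object set whose keyword occurs near ad[start, end), or null if none does. */
    String classify(String ad, int start, int end) {
        String nearby = ad.substring(Math.max(0, start - window),
                                     Math.min(ad.length(), end + window));
        for (Map.Entry<String, Pattern> e : keywords.entrySet()) {
            if (e.getValue().matcher(nearby).find()) return e.getKey();
        }
        return null;
    }
}
```

With a window of 8 and the keywords `\bmiles?\b` for Mileage and `\basking\b|\bobo\b` for Price, the value "100K" in "94 HONDA ACCORD, 100K miles, asking 4500 obo" is classified as Mileage and "4500" as Price.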

Knowledge Representation
• Current knowledge base
  - Static
  - Needs peripheral programs
• Our predicating knowledge base
  - Functional
  - Adaptive
  - Object-oriented

Schema Generation
• Domain
• Attribute
• Relation
• Constraint

Schema Generation

if (!existTable("Car"))
    createStatement(createTable("createCar"));

createCar = "create table Car(
    ObjNr char(4) primary key,
    VIN char(4) unique,
    Make char(10),
    :
    PhoneNr char(20)
)";

Schema Generation

if (!existTable("Feature"))
    createStatement(createTable("createFeature"));

createFeature = "create table Feature(
    ObjNr char(4) primary key,
    Feature char(20)
)";

Schema Generation

if (!existTable("Extension"))
    createStatement(createTable("createExtension"));

createExtension = "create table Extension(
    PhoneNr char(14) primary key,
    Extension char(3)
)";
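The check-then-create pattern on these slides could be written against JDBC roughly as follows. This is a sketch, not the presenters' code: SchemaGenerator, createTable, and createIfMissing are hypothetical names, the first column is simply assumed to be the primary key, and the table-existence test relies on DatabaseMetaData.getTables, whose handling of identifier case differs between databases.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class SchemaGenerator {
    /** Builds a CREATE TABLE statement; the first column is assumed to be the primary key. */
    static String createTable(String table, Map<String, String> columns) {
        StringJoiner cols = new StringJoiner(", ", "create table " + table + "(", ")");
        boolean first = true;
        for (Map.Entry<String, String> col : columns.entrySet()) {
            cols.add(col.getKey() + " " + col.getValue() + (first ? " primary key" : ""));
            first = false;
        }
        return cols.toString();
    }

    /** Creates the table only if the database does not already report one with that name. */
    static void createIfMissing(Connection conn, String table, Map<String, String> columns)
            throws SQLException {
        // Note: some databases store unquoted names in upper case, so the
        // name passed here may need adjusting for a given driver.
        try (ResultSet rs = conn.getMetaData().getTables(null, null, table, null)) {
            if (rs.next()) return; // table already exists
        }
        try (Statement st = conn.createStatement()) {
            st.executeUpdate(createTable(table, columns));
        }
    }

    public static void main(String[] args) {
        Map<String, String> extension = new LinkedHashMap<>();
        extension.put("PhoneNr", "char(14)");
        extension.put("Extension", "char(3)");
        // prints: create table Extension(PhoneNr char(14) primary key, Extension char(3))
        System.out.println(createTable("Extension", extension));
    }
}
```

Building the statement from a column map avoids the trailing-comma problem that hand-written SQL strings tend to pick up.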

Insert Data

Data Record Table: Data.attribute, Data.value, Data.objNr

• Collect all the values available for each object
• Find out the position of each insert value
• Insert values for each object
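The three steps above can be sketched in Java as follows. The Data class mirrors the record fields named on the slide (attribute, value, objNr); everything else, including the TableInserter name, the assumption that ObjNr is the first column, and the use of plain SQL strings, is hypothetical. Production code would normally bind values through a PreparedStatement instead of concatenating them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TableInserter {
    /** One extracted data record: which object, which attribute, what value. */
    static final class Data {
        final String objNr, attribute, value;

        Data(String objNr, String attribute, String value) {
            this.objNr = objNr; this.attribute = attribute; this.value = value;
        }
    }

    /** Builds one INSERT per object; columns lists the attributes that follow ObjNr. */
    static List<String> insertStatements(String table, List<String> columns, List<Data> records) {
        // Step 1: collect the values available for each object.
        Map<String, String[]> rows = new LinkedHashMap<>();
        for (Data d : records) {
            // Step 2: the position of a value is the index of its attribute in the column list.
            int pos = columns.indexOf(d.attribute);
            if (pos < 0) continue; // attribute is stored in another table
            rows.computeIfAbsent(d.objNr, k -> new String[columns.size()])[pos] = d.value;
        }
        // Step 3: emit one insert per object, with null for attributes that were not found.
        List<String> statements = new ArrayList<>();
        for (Map.Entry<String, String[]> row : rows.entrySet()) {
            StringBuilder sql = new StringBuilder(
                "insert into " + table + " values ('" + row.getKey() + "'");
            for (String v : row.getValue()) {
                sql.append(", ").append(v == null ? "null" : "'" + v.replace("'", "''") + "'");
            }
            statements.add(sql.append(")").toString());
        }
        return statements;
    }
}
```

For the records (0001, Year, 1989), (0001, Make, Subaru), (0003, Year, 1994), and (0001, Price, $1900) with columns Year, Make, Price, this yields `insert into Car values ('0001', '1989', 'Subaru', '$1900')` and `insert into Car values ('0003', '1994', null, null)`.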

Populate Database