XML: An Overview

Random Thoughts on
Research Methods
in CS/CIS
CSCI 6530
July 1, 2010
Kwok-Bun Yue
University of Houston-Clear Lake
Random
• Random: not organized.
7/1/2010
Bun Yue: [email protected], http://dcm.uhcl.edu/yue
slide 2
Merriam-Webster
• Research
– 1 : careful or diligent search
– 2 : studious inquiry or examination;
especially : investigation or experimentation
aimed at the discovery and interpretation of
facts, revision of accepted theories or laws in
the light of new facts, or practical application
of such new or revised theories or laws
– 3 : the collecting of information about a
particular subject
For what?
• Finding new things: facts, theories,
processes, tools, relationships,
techniques.
• Solving problems
Why Research?
• Solving problems.
• Enhancing understanding.
• Career enhancement.
• Curiosity and fun.
• …
Research Methods
• Discipline dependent.
– E.g. medical research: double blind test
with control.
• Scientific methods.
• Empirical methods.
Starting Research
• What do you need to start your
research?
– Talk! Talk! Talk!
– Think! Think! Think!
– Read! Read! Read!
Asking Questions
• ASK! ASK! ASK!
Not Asking Questions
• Easy
• Comfortable
• Familiar
• …
Asking is crucial
• Get a context of the problem from many
angles.
• Organize your thoughts.
• Model and refine your understanding.
• Discover new information and insight.
Intellectual Curiosity
• A key for deep understanding, important
discovery and … fun.
• Should not always be output-driven:
‘down’ time is needed.
• Recommended reading: Surely
You're Joking, Mr. Feynman!
(Adventures of a Curious
Character) by Richard Feynman.
Keeping an open mind
• Keep an open mind as long as possible.
– Do not jump to the first solution that you
have come up with.
Research in Physics
• Scientific Methods:
1. Observe, ask questions and understand.
2. Form a hypothesis and a model.
3. Make (precise) predictions using the
hypothesis.
4. Test the predictions.
Questions in Physics
• Fundamental questions: e.g.
– Can the four fundamental forces be unified:
theory of everything?
– Where does our universe come from?
– What are elementary particles made of?
Results in Physics
• Theories: e.g.
– Superstring theory.
– Big bang theory
– Quarks
• New facts.
Validations in Physics
• Test the predictions made by theories
through experiments.
• E.g.: Big bang theory predicts the
abundance of light elements.
– Positive results: add confidence.
– Negative results: reject theory.
Questions in Computing
• Much more diverse. Have aspects from
most other areas: engineering, science,
humanities, …
• Can create your own ‘universe’ (vs.
economics, for example).
Results in CS
• New theories, algorithms, processes,
methods, facts, etc.
• New models, problems and application
areas.
Validations
• Direct validation
• Theoretical analysis
• Simulation
• Benchmarking
• Statistical methods
• …
Planning: Goals
• Output oriented incentives can be too
‘far away’.
• Setting plans and goals.
– Create a detailed plan of steps and
benchmarks.
– Small goals every step.
– Consider input-oriented goals.
Early Web Business Model
Build Websites -> Attract Huge Traffic -> Something happens -> Rich!
Thesis
Understand Problem -> Design and Implement Solution -> Good thing happens -> Done!
Detailed Plan
• Create a road map to the final goals
with enough detail.
– Preparation.
– Planning
– Risk Management
• Recommended reading: Ed Viesturs,
“No Shortcuts to the Top: Climbing the
World's 14 Highest Peaks”
Areas of My Research Interest
• Internet Computing
• XML and semi-structured data
• CS and IS education
• Concurrent Programming
(Older) XML Projects
• Storage of XML in relational database
(Used as an example)
• XML Metrics
Storing XML in RDB
• Advantages:
– Mature database technologies.
– May be queried by
• XML technology: e.g. XPath, XQuery.
• RDB technology: e.g. SQL.
• Disadvantages:
– impedance mismatch: XML and relations
are different data models.
Related Issues
• Effective mapping of XML DTDs (~
ordered tree model) to relational
schemas.
• Mapping of XML queries (e.g. XQuery)
to RDB queries (e.g. SQL).
• Mapping of RDB query results back to
XML format.
Related Work and Context
• Mapping
– With or without schemas for XML.
– With or without user input.
• Schemas for XML:
– Document Type Definition (DTD)
– XML Schema
• We consider mapping with DTD and
without user input.
Naïve Mapping
• An XML element is mapped to a
relation.
Example 1a:
XML:
<a><b><c><d>hello</d></c></b></a>
-> Relations: a, b, c and d.
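A minimal sketch of the naïve mapping in Example 1a, assuming each relation holds one row per element occurrence. The column names id, parent_id and text and the function name naive_map are illustrative assumptions, not part of the original slides.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import count


def naive_map(xml_text):
    """Map every XML element to a row in a relation named after its tag."""
    relations = defaultdict(list)  # relation name -> list of rows
    ids = count(1)                 # surrogate keys (an assumption of this sketch)

    def visit(elem, parent_id):
        row_id = next(ids)
        text = (elem.text or "").strip() or None
        relations[elem.tag].append(
            {"id": row_id, "parent_id": parent_id, "text": text}
        )
        for child in elem:
            visit(child, row_id)

    visit(ET.fromstring(xml_text), None)
    return dict(relations)


relations = naive_map("<a><b><c><d>hello</d></c></b></a>")
for name, rows in relations.items():
    print(name, rows)
```

Four nested elements give four relations, each holding a single row that points to its parent row.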
Problems of Naïve Mapping
• Many relations.
• Inefficient queries: multiple joins per query.
Example 1b:
XPath Query: //a
SQL Query: need to join the relations a, b,
c and d.
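Example 1b can be made concrete with an in-memory SQLite database. This is only a sketch: the table layout (an id and a parent_id per relation, plus a text column on d) is an assumption chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY, parent_id INTEGER REFERENCES a(id));
    CREATE TABLE c (id INTEGER PRIMARY KEY, parent_id INTEGER REFERENCES b(id));
    CREATE TABLE d (id INTEGER PRIMARY KEY, parent_id INTEGER REFERENCES c(id),
                    text TEXT);
    INSERT INTO a VALUES (1);
    INSERT INTO b VALUES (1, 1);
    INSERT INTO c VALUES (1, 1);
    INSERT INTO d VALUES (1, 1, 'hello');
""")

# Rebuilding the content of <a> means walking down through b, c and d:
# three joins across four relations for a single nested value.
rebuild_a = """
    SELECT a.id, d.text
    FROM a
    JOIN b ON b.parent_id = a.id
    JOIN c ON c.parent_id = b.id
    JOIN d ON d.parent_id = c.id
"""
rows = conn.execute(rebuild_a).fetchall()
print(rows)
```

The number of joins grows with the nesting depth of the document, which is the cost that inlining tries to avoid.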
Inlining Algorithms
• First proposed by Shanmugasundaram
et al.
• Expanded by Lu, Lee, Chu and others.
• Extended in various directions by
various researchers, e.g.,
– Preserving XML element orders.
– Preserving XML constraints.
• Extensions are not considered here.
Basic Idea of Inlining
Algorithms
• Inline child element into the relation for
the parent element when appropriate.
• Different inlining algorithms differ in
inlining criteria.
Example 1c: XML:
<a><b><c><d>hello</d></c></b></a>
Inlined Relation: a.
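A small sketch of the idea behind Example 1c. It works on a document instance rather than on a DTD, and it assumes every child tag occurs at most once under its parent, so it illustrates the effect of inlining rather than any published inlining algorithm. The function name inline_row and the underscore-joined column names are hypothetical.

```python
import xml.etree.ElementTree as ET


def inline_row(elem, prefix=""):
    """Flatten the non-repeated descendants of elem into columns of one row."""
    row = {}
    text = (elem.text or "").strip()
    if text:
        row[prefix or "value"] = text
    for child in elem:
        # Assumption: each child tag appears at most once under its parent,
        # so a column name built from the path is unambiguous.
        path = f"{prefix}_{child.tag}" if prefix else child.tag
        row.update(inline_row(child, path))
    return row


root = ET.fromstring("<a><b><c><d>hello</d></c></b></a>")
relation_a = [inline_row(root)]
print(relation_a)
```

The whole chain collapses into a single row of relation a with one column, b_c_d, so the value can be read without any join.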
Inlining Algorithms
• Child elements & attributes may be
inlined.
• Child elements may not have their own
relations.
• Results in fewer relations.
• In general, more inlining -> fewer joins.
Inlining Algorithm Structure
1. Simplification of DTD.
2. Generation of DTD graphs.
3. Generation of relational schemas.
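The three steps can be sketched as a tiny pipeline. Everything here is an illustrative assumption: the dictionary encoding of the DTD, the single simplification rule (treating "+" as "*"), and the inlining criterion (an element gets its own relation if it is a root, is shared by several parents, or is reached through a "*" edge). It is loosely modeled on shared-style inlining, is not the algorithm developed in this work, and assumes a non-recursive DTD.

```python
from collections import Counter

# Hypothetical DTD: element -> list of (child, quantifier),
# where the quantifier is "1", "?", "*" or "+".
dtd = {
    "book": [("title", "1"), ("author", "+")],
    "author": [("name", "1"), ("email", "?")],
    "title": [], "name": [], "email": [],
}


def simplify(dtd):
    """Step 1 (one illustrative rule only): treat 'one or more' as 'zero or more'."""
    return {parent: [(child, "*" if q == "+" else q) for child, q in children]
            for parent, children in dtd.items()}


def build_graph(dtd):
    """Step 2: the DTD graph as a list of edges plus each element's in-degree."""
    edges = [(p, c, q) for p, children in dtd.items() for c, q in children]
    in_degree = Counter(c for _, c, _ in edges)
    return edges, in_degree


def generate_schema(dtd, edges, in_degree):
    """Step 3: decide which elements get relations and inline the rest."""
    starred = {c for _, c, q in edges if q == "*"}
    own = {e for e in dtd
           if in_degree[e] == 0 or in_degree[e] > 1 or e in starred}

    def columns(elem, prefix):
        cols = []
        for child, _ in dtd[elem]:
            if child in own:
                continue  # reached through its own relation's parent_id instead
            name = f"{prefix}_{child}" if prefix else child
            cols.append(name)
            cols.extend(columns(child, name))
        return cols

    return {e: ["id", "parent_id"] + columns(e, "") for e in sorted(own)}


simple = simplify(dtd)
edges, in_degree = build_graph(simple)
schema = generate_schema(simple, edges, in_degree)
print(schema)
```

With this input only book and author receive relations; title, name and email become columns of their parents.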
Our work
• Improved on simplification of DTD and
generation of DTD graphs.
• Constructed a new aggressive inlining
algorithm.
• Student: Alakappan.
Internet Computing
• Web bias (older project)
• Web 2.0 framework (IS project)
• Content Management Software (CMS):
Joomla (CS/IS Education)
• Mashup: Yahoo Pipes (CS/IS Education)
Measuring Web Bias
• Search engines dominate how information
is accessed.
• Search results have major social, political and
commercial consequences.
• Are search engines biased?
• How biased are they?
Previous Work
• To measure bias, results should be compared
to a norm.
• The norm may be from human experts.
• Mowshowitz and Kawaguchi: the
average search result of a collection of
popular search engines as the norm.
Mowshowitz and Kawaguchi
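[Diagram: each search engine SE1 … SEn returns a URL set URLS1 … URLSn. The union of these sets forms the NORM URLS, which give the NORM URL Vector. Each URLSi gives URL Vectori, which is compared with the NORM URL Vector to obtain Biasi.]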
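A rough sketch of the pipeline in the diagram. It assumes that a URL vector counts how often each URL is returned and that bias is one minus the cosine similarity between an engine's vector and the norm vector; both are simplifying assumptions made for illustration, and the engine names and URLs are made up.

```python
import math
from collections import Counter


def url_vector(result_lists):
    """Count how often each URL appears across the given result lists."""
    counts = Counter()
    for urls in result_lists:
        counts.update(urls)
    return counts


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u)  # a Counter returns 0 for missing keys
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if not norm_u or not norm_v:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical result lists for three search engines.
results = {
    "SE1": ["u1", "u2", "u3"],
    "SE2": ["u1", "u2", "u4"],
    "SE3": ["u1", "u5", "u6"],
}

norm = url_vector(results.values())  # union of all results -> norm URL vector
bias = {name: 1 - cosine(url_vector([urls]), norm)
        for name, urls in results.items()}
for name, value in bias.items():
    print(name, round(value, 3))
```

An engine whose results overlap little with the pooled results (SE3 here) ends up with a larger bias value than one that agrees with the pool.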
Limitations
• Based on URL Vector -> cannot
measure bias quality.
Our Approach
• Use Kleinberg’s HITS algorithm to create
clusters, authorities and hubs of the result
norm URLs.
• Use them as norm clusters, authorities and
hubs.
• Measure distances between norms and
individual results as bias.
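For readers unfamiliar with HITS, here is a minimal sketch of its hub and authority iteration on a made-up link graph. It covers only the scoring step; the clustering and the distance measure used in this project are not shown, and the graph, the function name hits and the iteration count are illustrative assumptions.

```python
import math


def hits(links, iterations=50):
    """Iterate hub and authority scores for a graph given as page -> linked pages."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages linking to the node.
        auth = {n: sum(hub[s] for s, targets in links.items() if n in targets)
                for n in nodes}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub score: sum of the authority scores of the pages the node links to.
        hub = {n: sum(auth[t] for t in links.get(n, ())) for n in nodes}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


# Hypothetical link graph: p1..p3 are result pages that link to pages a and b.
links = {"p1": ["a", "b"], "p2": ["a"], "p3": ["a", "b"], "a": [], "b": []}
hub, auth = hits(links)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

Pages that are linked to by many good hubs rise as authorities, and pages that link to many good authorities rise as hubs.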
Our Approach
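[Diagram: each search engine SE1 … SEn returns a URL set URLS1 … URLSn. The union of these sets forms the NORM URLS, which give the NORM Cluster and then the NORM Cluster Vector. Each URLSi gives URL Vectori and then Cluster Vectori, which is compared with the NORM Cluster Vector to obtain Biasi.]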
Recent Projects
• Web 2.0 framework:
– A model and framework to study Web 2.0
technologies, implications and trends.
– Collaborator: Mr. Tracy Gate.
– Publications: Pre-ICIS Workshop and
Communications of AIS.
CMS: Joomla
• Question: Using CMS/Joomla for
capstone project.
• Methodology: projects and surveys.
• Collaborators:
– Capstone project teams.
– Industrial mentor: Dilhar DeSilva
• Publication: Journal of Information
Systems Education.
End User Programming
• Use of Yahoo Pipes in constructing
Web mashups.
• Methodology: projects and surveys.
• Collaborators: students in the XML
class in Summer 2009.
• Publication: Journal of Information
Systems Education.
Ongoing projects
• Google Wave as a communication and
collaboration tool in capstone projects
and software project management.
• Collaborators: capstone project
students.
• Publications: under preparation.
Open Source Software
• Use of OSS in educational institutes.
• Methodology: meta-analysis.
• Collaborators: two master's students.
Other recent projects
• Assessment
• Scholarship
• Student Response Systems
Interested?
• Come and talk with me.
Conclusions
• Good time to do applied computing
research in the Web, XML and other
areas.
• Style: hands-on supervision +
publications.
• Don't forget to donate a scholarship to
the School if your future research leads
to a windfall.
Questions?
• Any Questions?
• Thanks!