JCDL Presentation


Research 101 (or How
to Graduate Quickly)?
IST 501
Fall 2014
Dongwon Lee, Ph.D.
Justification

I am probably a qualified person to give a talk on this topic… because
  I’m still STRUGGLING to publish
  Yes, I still do get rejections :)
  I’m still learning from failures
What’s being presented here is purely my suggestion (+ some other colleagues’)
Take it or leave it: up to you !!
TOC






How to start?
How to find research problems?
How to write research papers?
How/Where to submit?
Ethics
Misc.
What is “Ph.D.”?
http://gizmodo.com/5613794/what-is-exactly-a-doctorate
Publish or Perish

has to be written first
has to be validated as novel
has to be published
To get a good job, you have to have many & good papers…
The Goal of Research Papers


Disseminate your ideas to others so that people appreciate/use/cite them
Graduate… Of course
  MS: need to write a thesis to graduate…
  Ph.D.: “Publish or Perish”
Without good publications…
  No good job, no good career
  And possibly no good life either
GPA: nobody cares about a Ph.D. student’s GPA
  Maintain about 2.0/4.0
Where to Start?

Given that you have acquired basic theory/knowledge/tools from classes and books…
Next, first thing to learn:
  Read others’ papers
  Critique and evaluate them
Which paper to read?
  Start from good ones
    Classical ones
    Ones from good journals or conferences
Where to Start: eg, Databases
DB Conferences/Symposiums/Workshops (81)
ADB, ADBIS, ADBT, ADC, ARTDB, Berkeley Workshop, BNCOD, CDB, CIDR, CIKM,
CISM, CISMOD, COMAD, COODBSE, CoopIS, DAISD, DANTE, DASFAA, DaWaK,
DBPL, DBSEC, DDB, DDW, DEXA, DIWeB, DMDW, DMKD, DNIS, DOLAP, DOOD,
DPDS, DS, EDBT, EDS, EFIS/EFDBS, ER, EWDW, FODO, FoIKS, FQAS, Future
Databases, GIS, HPTS, IADT, ICDE, ICDM, ICDT, ICOD, IDA, IDC(W), IDEAL,
IDEAS, IDS, IGIS, IWDM, IW-MMDBMS, JCDKB, KDD, KR, KRDB, LID, MDA/MDM,
MFDBS, MLDM, MSS, NLDB, OODBS, OOIS, PAKDD, PKDD, PODS, RIDE, RIDS,
RTDB, SBBD, SDM-SIAM, Semantics in Databases, SIGMOD, SSD, SSDBM, SWDB,
TDB, TSDM, UIDIS, VDB, VLDB, WebDB, WIDM, WISE, XP, XSym
DB Journals (19)
ACM TODS, ACM TOIS, DKE, Data Base, DMKD, DPD, IEEE Data Eng. Bulletin,
IEEE TKDE, Info. Processing and Management, Info. Processing Letters, Info.
Sciences, Info. Systems, J. of Cooperative Info. Systems, J. of Database
Management, JIIS, KAIS, SIGKDD Explorations, SIGMOD Record, VLDB J.
The list excludes Information Retrieval and Digital Library
Where to Start?

Some good ones:
  DB: SIGMOD, VLDB, ICDE, EDBT
  DB Theory: PODS, ICDT
  Data Mining: KDD, ICDM, SDM, PKDD
  Modeling: ER
  Information Retrieval: SIGIR, CIKM, ECIR
  Digital Library: JCDL, ECDL, CIKM
  Web: WWW, WSDM
  Social: ICWSM, WebSci
Reference Chase

Don’t fall into the “Exponential Reference Chase” trap
[Figure labels: “Papers to read in queue”, “References that you use”]
Symptoms

After chasing relevant works that are increasing super-exponentially fast, you might feel…
  All relevant problems are ALREADY studied by someone else
    Other fields have 1000+ years of history: Mathematics, Art, …
  Problem is too BROAD for me to tackle
    Divide-n-conquer
Finding DARN Research Problems?

Easy but unhelpful answer:
  Read and think and read and think and…
Subjective but MAYBE-helpful answer:
  MAP approach
  MATRIX approach
  DELTA approach
  DROP approach
What I call M2D2
1. MAP Approach


To start research, you initially have to read a lot of papers anyway
While reading those, why don’t you analyze and summarize what you’ve read in your own words?
  Good for a survey paper: a MAP for future readers
  To be publishable, your survey must have a novel viewpoint, taxonomy, comprehensive analysis, or all of them
  Good target: ACM Computing Surveys, Communications of the ACM, IEEE Computer, JASIST, …
2. MATRIX Approach




Now, you have read a lot of papers
Draw a MATRIX on a specific problem, and map the papers that you read to cells of the matrix
At the end, a non-filled cell is the missing work that no one has done
But wait… first make sure that:
  The hole is worthwhile to fill in
  Doable (good as my dissertation topic?)
  Value (what’s good?)
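The bookkeeping behind the MATRIX approach can be sketched in a few lines of Python, assuming each paper you have read can be assigned to one (row, column) cell. The row labels, column labels, and paper names below are hypothetical placeholders for illustration, not data from these slides.

```python
from itertools import product


def find_empty_cells(rows, cols, papers):
    """Return the (row, col) cells that no paper has been mapped to.

    `papers` maps a paper title to the (row, col) cell it addresses.
    """
    filled = set(papers.values())
    return [cell for cell in product(rows, cols) if cell not in filled]


# Hypothetical placeholders for illustration only.
rows = ["direction-1", "direction-2"]
cols = ["aspect-a", "aspect-b", "aspect-c"]
papers = {
    "paper-1": ("direction-1", "aspect-a"),
    "paper-2": ("direction-1", "aspect-b"),
    "paper-3": ("direction-2", "aspect-a"),
}

for cell in find_empty_cells(rows, cols, papers):
    print("candidate gap:", cell)
```

An empty cell is only a candidate: the checks listed above (worthwhile, doable, valuable) still have to be made by hand.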
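Eg, XML-Relational Conversion Problem
Around 2000
[Matrix figure. Rows: XML to Relational, Relational to XML. Columns: Schema, Constraint, Query, View, Triggers, Security, Top-K, Temporal, Spatial. “O” marks a cell where work already exists, with paper counts such as (40+) and (5+) in some cells; the exact placement of the marks did not survive extraction.]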
3. DELTA Approach

Arguably easiest…
Pick one paper of your interest
Read it a lot, more than 10 times
Find limitations and extend it by DELTA
Prove or demonstrate that:
  The limitation that you pointed out is valid
  Your suggestion improves on the problem by DELTA
Eg, “E.F. Codd’s relational model is insufficient to handle the semi-structured model because…”
The more well-known the work you choose, the harder it is to improve, but the better for your reputation…
The bigger the DELTA is, the better your paper gets
Eg, The optimal wedding problem

When a person has a chance to date K persons, the optimal wedding algorithm is:
  Date up to K/3 persons
  Let B be the best person among the first K/3, using a criterion C
  Start dating again from the (K/3+1)-th person, p
    If p is better than B using C, stop and marry p
    Otherwise, keep dating until the K-th person
How many ways can we improve this algo?
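The rule above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: criterion C is modeled as a numeric score (higher is better), and when nobody after the cutoff beats B the code settles for the K-th person, a case the slide leaves open. The function and parameter names are made up for illustration. The rule resembles the classical secretary (optimal stopping) problem, whose standard analysis places the cutoff near K/e rather than K/3.

```python
import random


def choose_partner(scores, cutoff_divisor=3):
    """Return the index of the person chosen by the cutoff rule.

    scores[i] is person i's value under criterion C (higher is better).
    """
    k = len(scores)
    if k == 0:
        raise ValueError("need at least one candidate")
    cutoff = k // cutoff_divisor
    # Phase 1: date the first K/3 persons and remember the best one, B.
    best_seen = max(scores[:cutoff], default=float("-inf"))
    # Phase 2: marry the first later person who beats B.
    for i in range(cutoff, k):
        if scores[i] > best_seen:
            return i
    # Assumption: if nobody beats B, settle for the K-th (last) person.
    return k - 1


def success_rate(k=30, trials=10_000, seed=0):
    """Estimate how often the rule picks the single best of k persons."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        scores = rng.sample(range(k), k)  # random order of distinct scores
        if scores[choose_partner(scores)] == k - 1:
            wins += 1
    return wins / trials


print(choose_partner([3, 1, 2, 5, 4, 0]))  # index 3: first person after the cutoff who beats B
print(success_rate())
```

Changing `cutoff_divisor`, the score distribution, or the fallback rule gives a quick way to experiment with variations of the algorithm.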
Possible DELTAs

Parameter fitting:
  How to determine K? Estimate?
  How to determine C? Comparison?
Scalability? K=10 vs. K=100,00? Sub-optimal?
Question the assumptions:
  Monogamy vs. Polygamy vs. N-gamy? (How to find the nth best spouse fast?)
  Data distribution? Uniform/Poisson/Scale-free
Application to another domain?
System building?
…
Which DELTA to Choose


Pick the DELTA that is the most significant
Some criteria are:
  Has practical value
    Has a motivating scenario as of NOW, or
    Is predicted to be useful in N years
  Non-trivial
  Hot topics:
    Streaming, XML, Sensor, …
4. DROP Approach
(adapted from J. Widom’s slides)
Pick a simple but fundamental assumption of existing theory/model/systems/methods
DROP it
Reconsider to see how the drop affects all aspects of the existing theory/model/systems/methods
Many Ph.D. theses
From http://www-db.stanford.edu/~widom/stream.ppt
Eg, Two Stanford DB Projects

The LORE Project
  Dropped assumption: “Data has a fixed schema declared in advance”
  Semi-structured data (→ XML)
The STREAM Project
  Dropped assumption: “First load data, then index it, then run queries”
  Continuous data streams (+ continuous queries)
From http://www-db.stanford.edu/~widom/stream.ppt
Facts on Paper Reviews
(adapted from J. Cho’s slides)
3-4 reviewers per paper
10-20% acceptance rate for top-tier venues
  Very competitive
Criteria
  Accept/Weak Accept
  Neutral
  Weak Reject/Reject
One reject kills a paper
  At least Accept, Weak Accept and Neutral
About Reviewers


10-15 papers per reviewer (for top conferences)
Reviewer cannot spend 5-10 hours per paper
  20 X 10 = 200 hours = (40 hours X 5) = 5 weeks!
  No reviewers can afford this
Give a good impression in 1-2 hours!
  Impression matters the most
  Content comes next!
  Reviewers do NOT get paid → no motivation to do extra work
WARNING: Of course, to start with, your main idea must be good to get into top-tier…
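The reviewer-load arithmetic on this slide can be checked with a short Python sketch. The function name is made up for illustration, and the 40-hour work week is the one implied by the slide’s “40 hours X 5”.

```python
def review_load(papers, hours_per_paper, hours_per_week=40):
    """Return (total_hours, weeks) for one reviewer's assignment."""
    total_hours = papers * hours_per_paper
    return total_hours, total_hours / hours_per_week


# The slide's figures: 20 papers at 10 hours each.
print(review_load(20, 10))  # (200, 5.0)

# A more realistic budget: 15 papers at 2 hours each.
print(review_load(15, 2))   # (30, 0.75)
```

At 1-2 hours per paper the load drops to under a week, which is why the first impression carries so much weight.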
Good Impression in 1-2 hours?
1. Good introduction
  Everyone reads it
  If not interesting, people stop reading
2. Easy to read
  People should understand what you say
  Easy to confuse, difficult to understand
3. Build excitement and a strong case
  What is good?
4. Broad reference
  Sometimes kills a paper
  Program committee members
Good Introduction Sells
Excerpt from “How to do good research, get it published” by Eamonn Keogh
How to Write an Introduction
1. Start with 5 bullets
  What’s the problem?
  Why is it interesting?
  …
  Bullet points are enough
2. 1-2 sentence answer to each question
3. Add more content
4. Spend enough time on introduction
Easy-to-Read Paper
1. You can always make it complicated later
2. Lots of examples
  Figures & Tables: figures speak !!
  Summary of notations
3. Define assumptions/models/architecture precisely
  Explicitly write down assumptions
  Input, output, property, goal function
4. Make a connection
  Why this experiment?
Paper Organization (10 pages)
1. Introduction (2 pages)
2. Related Work (half page)
3. Framework (2 pages)
4. Main Ideas (3 pages)
5. Experiments (2 pages)
6. Conclusion (half page)
7. References (half page)
Actual idea: only 3 pages!!!
Even a tiny idea can turn into a good paper if you DEVELOP it well
Short Main Idea

Watson & Crick’s Nature paper on the double-helical structure of DNA is only 1 page (+ 1 paragraph) long
Start Writing Early On…

Even if you feel you are NOT ready yet
  Your advisor will throw away your initial draft anyway
  Your initial submission will be rejected anyway
But you get:
  (Good or bad) experiences to learn from
  Writing sharpens your ideas and gives more ideas
  Writing can be improved only via writing
Where to Submit?

Top-down
  Aim at the best venue in the field
  If rejected, go to next-tier venue
  If rejected, go to next…
Bottom-up
  Aim at a workshop
  If accepted, work more and aim at a better one (symposium or 2nd-tier conference)
  After making sure that the ideas are mature enough, aim at the best conference or journal
How to assess venue quality?


What venues are reputable?
What can be said about questionable ones?
http://pdos.csail.mit.edu/scigen/
Avoid Known Fake Venues

From http://www.inesc-id.pt/~aml/trash.html (no longer available)
  IMCSE: International Multiconference in Computer Science and Computer Engineering
  WMSCI or SCI: World Multiconference on Systemics, Cybernetics and Informatics
  ICCCT: International Conference on Computing, Communications and Control Technologies
  PISTA: Conference on Politics and Information Systems: Technologies and Applications
  SSCCII: Symposium of Santa Caterina on Challenges in the Internet and Interdisciplinary Research
  CITSA: International Conference on Cybernetics and Information Technologies, Systems and Applications
  ISAS: International Conference on Information Systems Analysis and Synthesis
  CISCI: Conferencia Iberoamericana en Sistemas, Cibernética e Informática
  SIECI: Simposium Iberoamericano de Educación, Cibernética e Informática
  WCAC: World Congress in Applied Computing
  Any IPSI international conference or journal
  Any GESTS international conference or journal
  KCPR: International Conference on Knowledge Communication and Peer Reviewing
  International e-Conference on Computer Science
  …
Differences in Disciplines

Computer Science
  Peer-reviewed conferences
  Top conferences have 5-15% acceptance rate
  Specialized and small conferences (attendance of 500+)
  Often value conferences > journals
Pure Sciences (eg, Math, Physics)
  Pre-print at arXiv.org
  Rigorous reviews for journals
  Huge flagship conference (ICM 98 attracted ~4000)
Social Sciences
  Often value journals > conferences
  Conferences are mostly for gathering or short-abstract-based screening
  Rigorous reviews for journals
My Own Endeavor



Oracle, Where Shall I Submit My Papers?, Ergin Elmacioglu, Dongwon Lee, In Communications of the ACM (CACM), Vol. 52, No. 2, pages 115-118, February 2009
Measuring Conference Quality by Mining Program Committee Characteristics, Ziming Zhuang, Ergin Elmacioglu, Dongwon Lee, C. Lee Giles, In ACM/IEEE Joint Conf. on Digital Libraries (JCDL), pages 225-234, Vancouver, BC, Canada, June 2007
Studied a data mining technique to detect fake conferences
  Precision=93%, Recall=96%
  Used PC (Program Committee) as the main feature
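For readers unfamiliar with the two metrics, here is a minimal Python sketch of how precision and recall are computed from classification counts. It assumes “fake conference” is the positive class; the counts in the example are toy numbers chosen for illustration and are not the figures from the JCDL 2007 study.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Compute (precision, recall) for the positive class.

    true_pos:  fake venues correctly flagged as fake
    false_pos: reputable venues wrongly flagged as fake
    false_neg: fake venues the detector missed
    """
    flagged = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Toy counts for illustration only.
print(precision_recall(true_pos=9, false_pos=1, false_neg=3))  # (0.9, 0.75)
```

High precision means few reputable venues are wrongly flagged; high recall means few fake venues slip through.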
Classification with Decision Tree
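[Figure: a decision tree built from a training set. Internal nodes ask “PC has feature A?”, “PC has feature B?”, “PC has feature C?”, and “PC has feature D?”, with Yes/No branches (annotated p and q). Leaves are labeled “Reputable venue” or “Disreputable venue”; a new venue is classified as “Reputable?”.]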
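A decision tree of this kind can be sketched in Python as nested yes/no tests. The feature names A to D come from the figure, but the branch layout and leaf labels below are assumptions made for illustration; the tree learned in the actual study is not recoverable from this slide.

```python
# Hypothetical tree: each internal node tests one PC feature.
# The structure and leaf labels are illustrative assumptions.
TREE = {
    "feature": "A",
    "yes": {"feature": "B", "yes": "reputable", "no": "disreputable"},
    "no": {
        "feature": "C",
        "yes": "disreputable",
        "no": {"feature": "D", "yes": "reputable", "no": "disreputable"},
    },
}


def classify(pc_features, node=TREE):
    """Walk the tree using the set of features the venue's PC has."""
    while isinstance(node, dict):
        branch = "yes" if node["feature"] in pc_features else "no"
        node = node[branch]
    return node


print(classify({"A", "B"}))  # reputable
print(classify({"C"}))       # disreputable
```

In practice the tree is learned from a labeled training set of venues rather than written by hand.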
PSU’s MIT Emulation

Apr. 10, 2006, we generated 3 bogus papers using MIT SCIgen software:
  P1 by Ethan Patel
  P2 by Simon R. Hathaway
  P3 by Richard Zhang
[Figures: P1, P2]
Measuring Paper Authenticity

Indiana’s Inauthentic Paper Detector says:
  P1: 28.9% => inauthentic
  P2: 61.5% => authentic
  P3: 38% => inauthentic
Results of Our Experiment
Conference A and B
April 24 to May 1, 2006
  P1 was submitted to Conf A on April 24
  P2 was submitted to Conf B on April 26
  P3 was submitted to Conf A on May 1
May 15, 2006
  P1 and P2 accepted w/o reviews
  P3 rejected w/o reviews
  Asked for reviews or any rationale → no response
“Ethan Patel” made it !
Fabrication

From Wikipedia… “Fabrication, in the
context of scientific inquiry and academic
research, refers to the act of intentionally
falsifying research results, such as reported
in a journal article. Fabrication is considered
a form of scientific misconduct, and is
regarded as highly unethical. In some
jurisdictions, fabrication may be illegal…”
http://en.wikipedia.org/wiki/Fabrication_%28science%29
Plagiarism

From Wikipedia “… According to Diana
Hacker, "Three acts are plagiarism: (1) failing
to cite quotations and borrowed ideas, (2)
failing to enclose borrowed language in
quotation marks and (3) failing to put
summaries and paraphrases in your own
words…"
http://en.wikipedia.org/wiki/Plagiarism
Eg, Fabrication and Plagiarism

“Prominent Physicist Fired for Faking Data. Research: Bell Labs says scientist 'recklessly' misrepresented work on microprocessors…” (2002, LA Times)
  http://www.latimes.com/news/science/la-sciphysicist26sep26.story
“Constantinos V. Papadopoulos got caught plagiarizing at EUROPAR (1995)… 7 papers published and 8 under submission… all plagiarized from Technical Reports…”
  http://www.sics.se/europar95/plagiarism.html
NEVER, EVER, do these: professional suicide !!
Personal Research Log

Maintain a personal research log
  Sketch your research ideas in writing
  Update your ideas as time passes
  Occasionally go back to old writings
Prepare a short review for each paper that you read
  Summary
  Pros and cons
  Limitations or problems
If needed, contact authors and ask questions
  Usually authors are willing to discuss with their readers
Professional Society



Be a member of your professional society early on
Ask your advisor to support membership
Use the mentor program of societies
References (available at)
http://pike.psu.edu/resources/advice/










[2002] How to write a paper?, Junghoo Cho, UCLA
[1996] David Dill's Advice on Choosing an Advisor (or) How to Survive as a Grad Student, David Dill
[1996] How to Survive as a Graduate Student, Brian Noble, David Dill, Benli Pierce, Jay Sipelstein, Jonathan Shewchuk
[1997] How to Choose a Thesis Advisor, Michael C. Loui
[????] How to have your abstract rejected, Mary-Claire van Leunen and Richard Lipton
[1994] Dissertation Advice, Olin Shivers
[1999] Advice for Finishing that Damn Ph.D., Daniel M. Berry
[1999] So long, and thanks for the Ph.D.!, Ronald T. Azuma
[2001] How to Have a Bad Research Career, David A. Patterson
[2012] How to do good research, get it published, Eamonn Keogh, SIAM SDM Tutorial