PDSC: P2P Document Sharing Community

Team No. 4
R91922001 黃振修 PM
R91922020 羅婉琪 RD
B89902012 葉家齊 RD
R91725032 李宜儒 RD
R91922015 張燕君 QA
R91922028 張靜雯 QA
Introduction
The original idea came from research groups
such as the CML Laboratory of NTU.
People want to share their documents over the Internet
and need keyword search.
We therefore need a peer-to-peer mechanism for
document exchange to support knowledge
management, and full-text search to find and filter
shared resources before downloading.
Features
Peer-to-peer document sharing over the Internet.
Full-text keyword search with ranked results
within the community.
Direct document exchange by sending files to
and downloading files from other users.
A custom URL format developed for the project.
Ex: dsc://download/hostname/path/to/file
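The URL format above can be split into its parts with a few lines of code. This is a minimal sketch, assuming the three fields are an action, a hostname, and a file path in that order; the function name `parse_dsc_url` and the field names are hypothetical and do not come from the project.

```python
PREFIX = "dsc://"

def parse_dsc_url(url):
    """Split a dsc:// URL into (action, hostname, path).

    Assumed layout: dsc://<action>/<hostname>/<path/to/file>
    """
    if not url.startswith(PREFIX):
        raise ValueError("not a dsc:// URL: %r" % url)
    # Split at most twice so that slashes inside the file path are kept.
    parts = url[len(PREFIX):].split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError("incomplete dsc:// URL: %r" % url)
    action, hostname, path = parts
    return action, hostname, path

if __name__ == "__main__":
    print(parse_dsc_url("dsc://download/hostname/path/to/file"))
```

Splitting at most twice keeps the remainder of the URL intact as the file path, so nested folders need no special handling.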
Market Requirement
A simple application can be installed to connect to
the community.
Users can enter or leave the community at any time.
Users can share documents with each other.
Shared resources must be kept up to date.
It is easy to see what is available in the community.
Users can enter keywords to search the community
for documents.
Users can send files directly to each other.
Project Roadmap
Version 1.0
Basic functionality:
Version 2.0
Replicate multiple copies of documents in the community
Provide a central backup mechanism
Version 3.0
User management/authentication
User acknowledgment of document exchange
More document formats will be supported in the
future
Stage Goals
Stage 1: Community browsing
Stage 2: Search functionality
Stage 3: Download/send file functionality
Schedule Notes
5/3: 黃振修 should finish the document digest module
5/10: 葉家齊 should finish the architecture prototype and
server side protocol communication
5/12: 羅婉琪 should finish the client browsing
functionality
5/10: QA should finish document conversion testing (binary/code)
5/10: 李宜儒 should finish the Win32 file hook mechanism
5/13: Download/send file should be working
5/24: QA should finish document search testing
5/24: Search results should be working
5/28 ~ 6/4: Code freeze and final testing
Project Meetings
Two types of meetings are defined:
[PRJ]: Project meeting
[DEV]: Developing meeting
Meeting dates:
[PRJ] 4/15, Tue. R319 of CSIE building
[DEV] 4/23, Wed. R505 of CSIE building
[DEV] 4/28, Mon. R505 of CSIE building
[DEV] 4/29, Tue. R107 of CSIE building
[DEV] 4/29, Tue. R105 of CSIE building
[DEV] 5/6, Tue. R519 of CSIE building
[PRJ] 6/9, Mon. R503 of CSIE building
After 5/6, no formal meeting is held until
the final. Instead, several small meetings
are held among QAs and RDs; sometimes the
PM also calls RDs and QAs to cooperate.
Documentation
MRD: Market Requirement Document [PM]
PRD: Project Requirement Document [PM]
PED: Project Execution Document [PM]
PDD: Project Development Document [RD]
QAD: Quality Assurance Document [QA]
BTD: Bug Tracking Document [QA]
WDD: Working Discussion Document [PM]
User’s Manual [QA]
Development Tools
Microsoft VC++ 6.0
Borland C++ Builder
CVS for source control
Central FTP server for file exchange
InstallShield for the setup program
Architecture
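[Architecture diagram. Labels recovered from the figure: Kernel; Graphics User Interface; Protocol; API for GUI; Client; Host Lookup Thread; Server; Server Thread; Document Keyword Processor; Database; Local Shared File Database; Host Database; Task Database. Box grouping and connections are not recoverable.]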
See PDD for more detail
Technical Notes (1/2)
A pure peer-to-peer mechanism is implemented.
Each application embeds both the client and the server
(for efficiency).
When a search request is issued, the application
searches its own document collection and then
forwards the message to other computers.
Dynamic monitoring of the sharing folder.
Once documents in the sharing folder are
modified, the digest module re-digests them in real time, keeping the
community's information up to date.
See PDD for more detail
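The search flow described in these notes (search the local collection first, then forward the request to other computers) can be sketched in-process as follows. This is only an illustration: the `Peer` class, the `query_id` used to suppress duplicate requests, and the `ttl` hop limit are assumptions added to keep the forwarding from looping, and none of them is stated in the slides. The real application communicates over the network rather than by direct method calls.

```python
class Peer:
    """Hypothetical in-memory stand-in for one application instance."""

    def __init__(self, name, documents):
        self.name = name
        self.documents = documents  # {filename: extracted text}
        self.neighbors = []         # other Peer objects this peer knows
        self.seen = set()           # query ids already handled (assumption)

    def search(self, keyword, query_id, ttl=3):
        # Drop requests we already answered or that ran out of hops.
        if query_id in self.seen or ttl < 0:
            return []
        self.seen.add(query_id)
        # Step 1: search the local document collection.
        hits = [(self.name, filename)
                for filename, text in sorted(self.documents.items())
                if keyword.lower() in text.lower()]
        # Step 2: forward the request to the other peers.
        for neighbor in self.neighbors:
            hits.extend(neighbor.search(keyword, query_id, ttl - 1))
        return hits

if __name__ == "__main__":
    a = Peer("a", {"intro.doc": "P2P sharing"})
    b = Peer("b", {"notes.pdf": "meeting notes"})
    c = Peer("c", {"talk.ppt": "p2p search"})
    a.neighbors, b.neighbors, c.neighbors = [b], [c, a], [a]
    print(a.search("p2p", query_id="q1"))
```

Without some form of duplicate suppression, a request forwarded around a cycle of peers would never terminate, which is why the sketch records the ids of the requests it has already handled.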
Technical Notes (2/2)
Three main document formats are supported: MS Word,
MS PowerPoint, and PDF files. (No Chinese
support)
Digest is the technique used to extract a document's
feature vector. Searching is based on those digest
vectors.
A scoring algorithm rates each search match,
and the results are ranked by their points.
Digests of the shared documents are saved when
the program exits, so full initialization is needed
only the first time.
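The slides do not say how the digest vector is built or how points are assigned. As one possible reading, the sketch below uses a term-frequency vector as the digest and cosine similarity as the score; both choices are assumptions swapped in for illustration, and the function names `digest`, `score`, and `rank` are hypothetical.

```python
import math
import re
from collections import Counter

def digest(text):
    """Build a term-frequency vector (assumed digest format)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def score(query_vec, doc_vec):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * doc_vec[term] for term, count in query_vec.items())
    norm = (math.sqrt(sum(v * v for v in query_vec.values()))
            * math.sqrt(sum(v * v for v in doc_vec.values())))
    return dot / norm if norm else 0.0

def rank(query, digests):
    """Return names of matching documents, best score first."""
    query_vec = digest(query)
    scored = [(score(query_vec, vec), name) for name, vec in digests.items()]
    # Sort by descending score, then by name so ties are deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for points, name in scored if points > 0]

if __name__ == "__main__":
    digests = {
        "a.doc": digest("peer to peer document sharing"),
        "b.pdf": digest("full text search search ranking"),
        "c.ppt": digest("weather report"),
    }
    print(rank("document search", digests))
```

Because only the digests are compared at query time, the original files do not need to be reopened for each search.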
Demonstration
See QAD for more detail
Testing Plans
What is to be tested?
Platform
Network status
Command
File Conversion
Download/Upload
Where will it be tested?
Win32 environment, Windows 2000 OS
PIII 500 CPU, 256 MB RAM, 100 Mbps ethernet
See QAD for more detail
Testing Cases
Document format conversion (binary tools testing)
Document format conversion (integrated as
program module, test for robustness and accuracy)
P2P sharing community (tests the features
of the UI program)
The sharing module (tests digest/searching
and sharing folder monitoring)
Setup program (test for the installer’s functionality)
Performance report (memory usage, CPU
utilization, memory leak)
See BTD for more detail
Bug Tracking (1/2)
Empty document files may cause a fatal error
Solved by checking file completeness first.
Some PDF files may cause the conversion module
to get the wrong page number, causing a fatal error.
Check the validity of the page number first.
Duplicate entries in the list when browsing
Stupid bug
Getting the file list takes too long
Stupid bug
See BTD for more detail
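The first two fixes on this slide are both guard checks made before conversion. A minimal sketch of such guards follows, assuming that "completeness" can be approximated by a non-empty regular file and that a valid page number is a positive integer; the project's actual checks are not described, and both function names are hypothetical.

```python
import os

def is_convertible(path):
    """Reject missing or zero-length files before conversion (assumed check)."""
    return os.path.isfile(path) and os.path.getsize(path) > 0

def is_valid_page_count(page_count):
    """Accept only positive integer page counts (assumed check)."""
    return isinstance(page_count, int) and page_count >= 1

if __name__ == "__main__":
    print(is_convertible("no-such-file.pdf"))  # missing file is rejected
    print(is_valid_page_count(-3))             # negative count is rejected
```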
Bug Tracking (2/2)
Downloading/sending files is too slow
Stupid bug (sleeping in the sending loop)
Cannot get the file list or browse when clients use
DHCP
Not solved because of the time limit.
Keyword search is not applied recursively
to the sharing folder
Solved by writing the recursive code
Keyword search is too slow
Improve the algorithm
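The recursive-search fix listed above amounts to visiting every subfolder of the sharing folder instead of only its top level. A minimal sketch, assuming plain-text files stand in for the converted documents (the real system searches digests of Word, PowerPoint, and PDF files); the function name `find_matching_files` is hypothetical.

```python
import os

def find_matching_files(root, keyword):
    """Return relative paths of files under root whose text contains keyword."""
    keyword = keyword.lower()
    matches = []
    # os.walk descends into every subfolder of the sharing folder.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as f:
                    if keyword in f.read().lower():
                        matches.append(os.path.relpath(path, root))
            except OSError:
                continue  # skip files that cannot be read
    return sorted(matches)

if __name__ == "__main__":
    print(find_matching_files(".", "p2p"))
```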
Bug Statistics
[Chart: bug counts (vertical axis 0 to 8) at six checkpoints: 5/3, 5/7, 5/10, 5/24, 6/8, 6/9. Plotted values are not recoverable.]
Change Control History
Changed from client-server architecture to
peer-to-peer architecture [4/23]
Changed the document digest from full-text
to digest-vector based [5/6]
Decided to allow recursive sharing of subfolders
in the sharing folder [6/1]
Future Plan
Version 2.0
Replicate multiple copies of documents in the community
Provide a central backup mechanism
Version 3.0
User management/authentication
User acknowledgment of document exchange
Bug fixes and support for more document
formats
The END
Project Shipping Checklist:
Source Code
Include all surveyed components, CVS repository.
Development Document
MRD, PRD, PED, PDD, QAD, BTD, and WDD
User’s Manual
Presentation file
Install Program
Project CD with all the stuff