Internet Searching and Browsing in a Multilingual World

An Experiment on the Chinese Business
Intelligence Portal
Acknowledgment: NSF/NIJ Grant
Outline
• Motivation
• The Chinese Business Intelligence
Portal
– System Description
– Results of Usability Study
• Conclusions
Introduction
Motivation
• As the Internet grows in popularity worldwide,
more users want to access Web content in
their native languages
– The majority of the total global online population
(63.5%) lives in non-English-speaking areas
(Global-Reach, 2002)
– This population is expected to grow much faster
than the English-speaking population
• However, existing search engines may not
serve their needs, because most
technologies have been developed for
English-speaking users
This Presentation
• The following slides present our efforts in
creating and evaluating intelligent Web
portals that address the above needs
– Chinese business information serves as our
research testbed
• Through these studies, we aim to achieve a
better understanding of human interaction
and analysis with automated systems
developed for Internet searching and
browsing in a multilingual world
The Chinese Business
Intelligence Portal (CBizPort)
CBizPort
• The Chinese Business Intelligence Portal (CBizPort)
– Two versions of user interface: Simplified Chinese and
Traditional Chinese
– URLs
• Introduction: http://ai.bpa.arizona.edu/go/dl/cbizport.html
• Portal: http://ai17.bpa.arizona.edu:8080/big5biz/index.html
– Each version has the same user interface and provides the
same functions
• Encoding conversion
• Meta searching major Chinese information sources
• Summarization and categorization
• Providing links to major Chinese business Web resources
– The following slides show the system architecture and
screen shots of CBizPort
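The encoding conversion listed above can be illustrated with a minimal sketch. The slides do not say which encodings CBizPort converts between or how; Big5 and GB2312 are assumed here as typical encodings for Traditional and Simplified Chinese pages, and the `transcode` helper is a hypothetical name. The sketch only changes the byte encoding of the text. It does not map Traditional characters to their Simplified forms, which a full conversion would also need.

```python
def transcode(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Re-encode bytes from one character encoding to another via Unicode."""
    text = data.decode(source_encoding)
    return text.encode(target_encoding)


# "中文" ("Chinese language") is written identically in both scripts,
# so it can be represented in Big5 as well as in GB2312.
big5_bytes = "中文".encode("big5")
gb_bytes = transcode(big5_bytes, "big5", "gb2312")
utf8_bytes = transcode(big5_bytes, "big5", "utf-8")
```

A character that exists only in Traditional form would raise `UnicodeEncodeError` when encoded as GB2312, which is why a character-level mapping step would be needed in practice.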
[Screenshot of the CBizPort interface, with callouts:]
• Provides links to major Chinese business Web sites and resources
• Allows input of multiple key terms
• Provides both Simplified and Traditional Chinese versions of the user interface
• Meta searches 8 major information sources of Mainland China, Hong Kong, and Taiwan
[Screenshots: search page and result page, with callouts:]
• Categorizer: Web pages grouped by key phrases extracted by a mutual information algorithm (non-exclusive categorization)
• Summarizer: a two-sentence summary on the left, the original page on the right
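Non-exclusive categorization means that one page can appear under several key phrases. The sketch below illustrates only that grouping step. It assumes the key phrases have already been extracted (the mutual information algorithm itself is not shown in the slides and is not reproduced here), and the function name, page IDs, and phrases are hypothetical.

```python
from collections import defaultdict


def group_by_key_phrase(pages: dict[str, str], key_phrases: list[str]) -> dict[str, list[str]]:
    """Map each key phrase to every page whose text contains it.

    A page may be listed under several phrases (non-exclusive grouping).
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for page_id, text in pages.items():
        lowered = text.lower()
        for phrase in key_phrases:
            if phrase.lower() in lowered:
                groups[phrase].append(page_id)
    return dict(groups)


# Hypothetical pages and phrases, for illustration only.
pages = {
    "page1": "Trade barriers and export tariffs in Hong Kong",
    "page2": "Export tariffs affect manufacturing operations",
    "page3": "Manufacturing operations in mainland China",
}
phrases = ["export tariffs", "manufacturing operations"]
groups = group_by_key_phrase(pages, phrases)
```

Here `page2` is listed under both phrases, which is the behavior the callout describes.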
Evaluation of CBizPort
Objectives
1. To evaluate the performance of summarizer
as a preview function and categorizer as an
overview function
2. To compare CBizPort with regional Chinese
search engines to study its effectiveness
and usability
3. To evaluate, in comparison with existing
regional Chinese search engines, the
information quality obtained from CBizPort
and its capability of searching for cross-regional
business information
Experimental Design
• Searching and browsing were studied
• Scenario-based, culturally oriented tasks, e.g.,
– A search task (4 min): “Find two cities in mainland China
in which Motorola has set up its manufacturing operations”
– A browse task (5 min): “Describe, in a number of distinct
themes, the economic impacts of removing trade barriers
between mainland China and Taiwan towards Hong Kong”
• Theme identification method (Chen et al., 2001)
– Pilot test: 3 subjects used up all the time in most tasks, so
the study focused only on effectiveness, not efficiency
10 Tasks in the Experiment (1 hour)

                                                                              Subject's origin
Tool                 Setting                                                  Hong Kong   Taiwan      China
CBizPort             Basic searching (with neither summarizer nor analyzer)   SO1, BO1    SO2, BO2    SO3, BO3
CBizPort             Basic searching with summarizer only                     SM1, BM1    SM1, BM1    SM1, BM1
CBizPort             Basic searching with categorizer only                    SA1, BA1    SA1, BA1    SA1, BA1
Regional Chinese SE  General searching and browsing                           SG1, BG1    SG1, BG1    SG1, BG1
Regional Chinese SE  Cross-regional searching and browsing                    SC1, BC1    SC2, BC2    SC3, BC3

S = search task; B = browse task; O = basic searching (with neither summarizer nor analyzer); M =
basic searching with summarizer only; A = basic searching with categorizer only; G = general
searching and browsing; C = cross-regional searching and browsing. The same number signals the same
question across different regions.
(Random assignment of tasks is used for different settings)
Comparisons
[Diagram: CBizPort search and browse results, with or without the summarizer and with or without the categorizer, are compared with the search and browse results of a benchmark engine: Openfind, YahooHK, or Sina.com]
Subjects
• 30 subjects, 10 from each region, were recruited
– Rationale: equal influence of regional impacts
• Each subject used CBizPort and another search
tool according to his/her origin
Subject’s origin    Search tool    CBizPort version
Hong Kong           YahooHK        Traditional Chinese
Taiwan              Openfind       Traditional Chinese
Mainland China      Sina.com       Simplified Chinese
Experts
• Three experts, one from each region,
were recruited to provide answers to all
browse tasks
– First, the experts identified the set of relevant
answers (organized into themes) to a
browse task
– Then, they modified the answers by adding
those subjects’ responses that they
judged as relevant
– These two steps were repeated for all
the other browse tasks
Hypotheses
• Three sets of hypotheses were tested
– CBizPort’s Enhanced Analysis Capabilities
• Searching and browsing
• With or without summarizer/categorizer
– SE Performance Comparison
• Searching and browsing capabilities
• Individual settings and combination*
– Users’ Subjective Evaluation
• Information quality
• Cross-regional searching capability
• Overall satisfaction
– Auxiliary hypotheses: Performance across the three
regions is not significantly different
* We tried to mimic a situation in which each subject was allowed to
use both CBizPort and the benchmark search engine together to solve
the same problem
[Diagram labels: Benchmark SE, CBizPort, Experts’ answers]
Performance Measures
• Accuracy = Percentage of correct answers
• Precision = number of correct themes identified by
users / total number of themes identified by users
• Recall = number of correct themes identified by users /
total number of themes identified by an expert
• F value = 2*Recall*Precision / (Precision + Recall)
• Information quality: accessibility, appropriateness of
amount, believability, completeness, etc. (Wang &
Strong, 2002)
• Subjective evaluation: cross-regional searching
capability, overall satisfaction, protocol analysis, post-hoc
test (to study whether the three SEs yield
significantly different results)
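The browse-task measures defined above can be computed as in the following minimal sketch. It assumes themes can be compared as exact string matches in a set. In the study itself, relevance was judged by experts, so this matching rule and the function names are illustrative assumptions.

```python
def accuracy(num_correct: int, num_answers: int) -> float:
    """Accuracy = share of correct answers among all answers given."""
    return num_correct / num_answers if num_answers else 0.0


def browse_scores(user_themes: set[str], expert_themes: set[str]) -> tuple[float, float, float]:
    """Return (precision, recall, F value) for one browse task."""
    correct = len(user_themes & expert_themes)  # themes the user got right
    precision = correct / len(user_themes) if user_themes else 0.0
    recall = correct / len(expert_themes) if expert_themes else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_value = 2 * recall * precision / (precision + recall)
    return precision, recall, f_value


# Hypothetical example: the user names 4 themes, 2 of which are among
# the 8 themes identified by the expert.
user = {"t1", "t2", "x1", "x2"}
expert = {"t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8"}
precision, recall, f_value = browse_scores(user, expert)
```

In this example precision is 2/4, recall is 2/8, and the F value is their harmonic mean, 1/3.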
Accuracy of search tasks
[Bar chart: Accuracy. Bars: CBiz, CBiz+Summ, CBiz+Categ, Bench (gen), Bench (cross), Combined. Data labels, in extraction order rather than bar order: 65.00%, 40.00%, 36.67%, 35.00%, 28.33%, 25.00%]
Precision of browse tasks
[Bar chart: Precision. Bars: CBiz, CBiz+Summ, CBiz+Categ, Bench (gen), Bench (cross), Combined. Data labels, in extraction order rather than bar order: 76.80%, 66.37%, 58.65%, 51.05%, 53.33%, 55.67%]
Recall of browse tasks
[Bar chart: Recall. Bars: CBiz, CBiz+Summ, CBiz+Categ, Bench (gen), Bench (cross), Combined. Data labels, in extraction order rather than bar order: 43.08%, 26.06%, 26.56%, 22.86%, 25.78%, 21.83%]
F value of browse tasks
[Bar chart: Mean F value. Bars: CBiz, CBiz+Summ, CBiz+Categ, Bench (gen), Bench (cross), Combined. Data labels, in extraction order rather than bar order: 52.32%, 31.39%, 33.10%, 34.31%, 32.60%, 29.02%]
Information Quality
[Bar chart: Information Quality. Bars: CBiz(Present), CBiz(Coverage), CBiz(Usability), CBiz(all), Bench(Present), Bench(Cov), Bench(Usab), Bench(all). Data labels, in extraction order rather than bar order: 4.55, 4.49, 4.47, 4.40, 4.39, 4.37, 4.35, 4.28]
Users’ Subjective Evaluation
[Bar chart: Users' Subjective Evaluation. Bars: CBiz(cross), CBiz(satis), Bench(cross), Bench(satis). Data labels, in extraction order rather than bar order: 4.45, 4.37, 4.14, 4.03]
Subjects’ Verbal Comments
• Subjects liked summarizer and categorizer
– Subj.#15: “… good performance in summarization
and categorization, more focused results can be
found”; #26: “… very handy”; #6: “…useful tools to
enhance the searching ability” (11 subjects)
• CBizPort provides a wide coverage and
variety of searching options
– Subj.#2: “… Yahoo Search Engine is more limited
when search certain term in a specific region …
While CBizport can fulfill what Yahoo couldn’t
do.”; #4: “… more search engines to choose from”
(4 subjects)
Subjects’ Verbal Comments (2)
• Subjects are familiar with benchmark SEs
– Subj.#27: “I am familiar with the format of Openfind.
So that's the reason that I am more satisfied with it
than CBizPort.” (4 subjects)
• Benchmark SEs are not good at cross-regional information searching
– Subj.#15: “Sina gives many results but they are not
focused, and is poor at searching HK and Taiwan
results”; #5: “provide more accurate regional searching”
• CBizPort is user friendly but slow
– #3: “Yahoo not as precise as CBizPort”; #28: “…
easier to search” (7 subjects); “slow” (3 subjects)
Conclusions
• CBizPort’s summarizer and categorizer provide helpful
analysis capabilities for users’ search and browse tasks
– CBizPort’s searching and browsing performance is comparable to
that of regional Chinese search engines
• CBizPort can significantly augment the searching and
browsing ability of regional Chinese search engines, thus
improving human integration of regional information and
analysis
– Information quality, cross-regional searching capability and overall
satisfaction of CBizPort are comparable to those of regional
Chinese search engines
• CBizPort is better than regional Chinese search engines in
terms of analysis functions, cross-regional searching
capabilities and user-friendliness