Recent Developments at Yahoo! in Search & Mobile, and Future Challenges WikiMania’08, July 18, 2008 Alexandria, Egypt Usama Fayyad Chief Data Officer & Executive VP Yahoo!


Recent Developments at Yahoo!
in Search & Mobile, and Future Challenges
WikiMania’08, July 18, 2008
Alexandria, Egypt
Usama Fayyad
Chief Data Officer & Executive VP
Yahoo! Inc.
Research
Overview
• About Yahoo! and its business
• Yahoo! Mobile Philosophy
• OneSearch 2.0
• Challenges in Mobile Search
• Some words about search advertising
• Examples of Search Evolution at Yahoo!
• Concrete examples of the changes that are
relevant to Social Web
• Concluding thoughts
Globally, Internet Users Number Over 1 Billion
Internet Users in Millions, Worldwide Total:
2001: 506.3
2002: 602.4
2003: 702.4
2004: 787.5
2005: 868.3
2006: 974.3
2007: 1,076.7
[Chart also breaks the total down by region: United States, Western Europe, Asia/Pacific, Japan, Rest of World.]
Source: IDC, December 2003.
Yahoo! is the #1 Destination on the Web
• 73% of the U.S. Internet population uses Yahoo!
– Over 500 million users per month globally!
• Global network of content, commerce, media, search and access products
• 100+ properties including mail, TV, news, shopping, finance, autos, travel, games, movies, health, etc.
• 25 terabytes of data collected each day… and growing
More people visited Yahoo! in the past month than:
• Use coupons
• Vote
• Recycle
• Exercise regularly
• Have children living at home
• Wear sunscreen regularly
Representing thousands of cataloged consumer behaviors
Data is used to develop content, consumer, category and campaign insights for our key content partners and large advertisers
Sources: Mediamark Research, Spring 2004 and comScore Media Metrix, February 2005.
Yahoo! Data – A league of its own…
GRAND CHALLENGE PROBLEMS OF DATA PROCESSING: TRAVEL, CREDIT CARD PROCESSING, STOCK EXCHANGE, RETAIL, INTERNET
[Chart: Terabytes of Warehoused Data and Millions of Events Processed Per Day for SABRE, VISA, NYSE, Walmart, AT&T, Amazon, Korea Telecom, YSM, Y! Global, Y! Main warehouse, Y! Panama Warehouse, Y! LiveStor. Values appearing on the chart: 14,000; 5,000; 1,000; 500; 100; 94; 225; 49; 120; 50; 25; 2,000.]
Y! Data Challenge Exceeds others by 2 orders of magnitude
What About Yahoo! Mobile?
• Fast-growing initiative that is one of the company’s priorities for the future
• Great success in distribution
– Signed deals with 29 carriers, making it accessible to 600 million subscribers who are now under contract
– OneSearch is Yahoo!’s mobile search application, launched 13 months ago. Just launched OneSearch 2.0
• Marco Boerries, EVP of Mobile at Yahoo!: “No one has ever amassed that kind of distribution under that short period of time.”
Mobile Device Internet Penetration
Will Eclipse the PC
1 Billion people
across the world
use the Internet *
3.3 Billion people across the world
are mobile service subscribers
(that’s half the global population)**
* U.N. Telecommunications Agency, Sept 07 ** Informa, Nov 07
Yahoo!’s Global Mobile Reach
16.9 Million Unique Users Per Month In The U.S. Alone
Unique Users Per Month (mm):
Yahoo!: 16.9
Google: 12.1
MSN: 8.9
AOL: 8.6
Mobile Search Built for the Consumer
PC Search
=
Mobile Search
The Mobile Use Case is Different
Give me Answers, Entertainment, Images…
Y! oneSearch Changed the Game
Answers Instead of Web Links. Relevant, Complete Results
Yahoo! Mobile Approach to Search
• OneSearch is a special federated search engine
– Analyses Concept and Intent of the query against a
large collection of “vertical” backends
• Web, News, Images, Finance, etc…
• UGC such as Wikipedia and Yahoo! Answers
– Aggregates results from the verticals and blends them to suit both the
user’s query and the device used for the query
• Goal is to minimize clicks by taking user to
results around tasks
• Query sources:
– Browsers: WAP/XHTML
– Java app interface for Yahoo! Go
– SMS text messaging for Yahoo! Mobile SMS
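The federated flow described above can be sketched in a few lines of Python. This is a minimal illustration, not Yahoo!'s implementation: the `Result` type, the cue-word table `INTENT_CUES`, the boost of 2.0, and the toy backends are all assumptions made for the example. Each vertical returns its own scored results, a crude intent step boosts verticals whose cue words appear in the query, and the blended list is cut to the number of results the device can show.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Result:
    vertical: str
    title: str
    relevance: float  # the backend's own score, assumed to lie in [0, 1]


Backend = Callable[[str], List[Result]]

# Hypothetical cue words standing in for real concept/intent analysis.
INTENT_CUES = {
    "news": {"news", "headline", "headlines"},
    "finance": {"stock", "quote", "shares"},
    "images": {"photo", "photos", "picture", "pictures"},
}


def intent_weights(query: str, verticals: Iterable[str]) -> Dict[str, float]:
    """Boost a vertical when one of its cue words appears in the query."""
    words = set(query.lower().split())
    return {
        name: 2.0 if words & INTENT_CUES.get(name, set()) else 1.0
        for name in verticals
    }


def federated_search(query: str, backends: Dict[str, Backend],
                     max_results: int) -> List[Result]:
    """Query every vertical, blend by weighted score, trim to the device."""
    weights = intent_weights(query, backends)
    blended = []
    for name, backend in backends.items():
        for result in backend(query):
            blended.append((weights[name] * result.relevance, result))
    blended.sort(key=lambda pair: pair[0], reverse=True)
    return [result for _, result in blended[:max_results]]


# Toy backends that return one canned result each.
def web_backend(query: str) -> List[Result]:
    return [Result("web", f"Web page about {query}", 0.6)]


def news_backend(query: str) -> List[Result]:
    return [Result("news", f"Latest headlines: {query}", 0.5)]


def finance_backend(query: str) -> List[Result]:
    return [Result("finance", f"Quote for {query}", 0.4)]


backends = {"web": web_backend, "news": news_backend,
            "finance": finance_backend}
top_two = federated_search("yhoo stock", backends, max_results=2)
```

With these toy numbers, the word "stock" lifts the finance result above the web result, and a small `max_results` stands in for the limited space on a phone screen.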
Approach
• Be as Open as possible on interfaces
• Fundamentally believe the mobile OS market will remain
fragmented from a platforms perspective for quite a while
– Windows Mobile only reached 30M users after more than 7 years
of effort
• Provide an environment that lets users program to one
target platform, and let Yahoo! bear the effort of making it run
on a wide range of devices
• Focus on the highest value apps for users today involving
access to on-line world (less on client apps)
• Return results and not links
Yahoo! Mobile Products
Yahoo! Home Page
Yahoo! Go 3.0
Yahoo! oneSearch
Yahoo! onePlace
Yahoo! oneConnect
* M:Metrics, October 2008
**All Yahoo! Mobile services are free. Check with your wireless carrier about data plan charges that may apply.
OneSearch 2.0
• OneSearch is being opened up to all publishers and content
owners so they can write rich metadata that will be returned as
part of results, rather than just a link
– Similar to Yahoo!’s Search Monkey service for the Internet.
• More about this later…
• Three new major upgrades
– Search Assist: The search box will predict what you are typing.
– Voice input: Users can search by speaking into the device instead of
typing (provided by Vlingo)
– The search box will be integrated into the home screen of the
phones.
OneSearch 2.0
Better answers
Turning web search results into answers
Unlocking the power of the semantic web
Providing more relevant content
OneSearch 2.0
Easier, faster input
Predictive text completion
Contextual recommendations
Easier input
OneSearch 2.0
Speak your search
Search for anything
Personalized to your voice
Voice input
OneSearch 2.0
Persistent 1-click access
Gateway to the Internet
Supports text & voice
Always there
Internet Use on Mobile vs. PC
Mobile Use
• Today, we believe Internet use on PC is about 10x that of Mobile
• Mobile is growing faster, in all regions
• There are more than 3x as many mobile subscribers today as Internet users globally
– But most phones are not data capable yet
• The world today:
– We are learning from the web, and attempting to figure out what
makes sense for mobile users
– Trying to work with smartphone users, as they represent the
early adopters
Classical web search user needs
• Informational (~25%) – want to learn about something
– Example: Low hemoglobin
• Navigational (~40%) – want to go to that page
– Example: United Airlines
• Transactional (~35%) – want to do something (web-mediated)
– Access a service: Mendocino weather
– Downloads: Mars surface images
– Shop: Nikon CoolPix
• Gray areas
– Find a good hub: Car rental Finland
– Exploratory search “see what’s there”
Broder 2002, A Taxonomy of Web Search
What about on Mobile
• No good classification
• Several studies that cover
– Query frequency distribution
– Words per query
– Characters per query
• Categorization by query type into traditional categories:
– Adult and Entertainment, Autos, Consumer Goods, Finance,
Government & politics, Sports, Technology, Travel, etc…
• Best known studies by
– Kamvar and Baluja (2006 and 2007)
– Yi, Maghoul, and Pedersen (2008)
• Good quantitative statistics, little on qualitative purpose-driven
analysis (early days still)
What do We Believe about Mobile Queries
• We believe it is a different distribution than the query
distribution for PC users
– Bias towards shorter queries
• Data contradicts that: 2.6 words per query, same # chars as PC
– Difficulty of query entry is a significant hurdle
– Much higher location-based activity
– Much more task-oriented than exploration or research
• Notifications add a whole new “push” dimension
– Trigger alerts (stocks, news, auctions)
– Location-based (geo-driven)
– Event-based (calendar entries such as travel alerts, flight delays,
etc.)
• Can learn much more about user intent, and hence eventually
more promising for advertising
Implications and Challenges
• Task-orientation
• Specialized content packaging
• Locality Inference from queries
• Locality Inference from device (LBS)
• Minimize typing and round-trips: get results, not just links
– Less room to display SERP + other accessories
• Monetization strategies to fund this model still not decided
– Advertising
– Subscription to “premium services”
– Revenue share on “leads”
– Pay per usage of special high-value areas
In the meantime, the web, and Search are evolving…
Even Larger Challenges
• Modeling Social Media and use of mobile in social settings on
the go
– Understanding UGC
– Classifying, categorizing, organizing UGC and folksonomy
• A different problem of search -- Semantics of content are
critical, especially if we are to target
– Intent
– Task-orientation
– Motion dimension (distance to target of search)
– Push and notifications
– Understanding the physical world (common sense): what is
close? Business hours? Holidays?
• Web Content growing, changing, diversifying, fragmenting
• Truly leveraging the notification abilities and finding new
everyday uses – far more versatile a space than PC
• Long-term memory (state) for long-running tasks and queries
A Tale of Two Search Engines
Algorithmic results = Audience (-$)
Advertisements = Monetization (+$)
Algorithmic vs. Ad Search
• Analogous to classical separation of editorial vs
commercial content
• Technical underpinnings:
– Some commonalities (IR, ML)
– Many differences (incentives, spam, mechanism design)
The two engines
[Figure: architecture of the two engines. Labels: User, Search, Web spider, Indexer, The Web, Indexes, Ad indexes. The example results page for the query “miele” shows Sponsored Links (CG Appliance Express, www.cgappliance.com; Miele Vacuum Cleaners, www.vacuums.com; Miele Vacuum Cleaners, www.best-vacuum.com) next to Web results 1 - 10 of about 7,310,000 (www.miele.com, www.miele.co.uk, www.miele.de, www.miele.at).]
1995: The Yahoo! Directory
• Apply human expertise and editorial to organize web sites
• What worked
– Practical, Navigable
– Trustworthy, Authoritative
• What didn’t
– Scalability
– Granularity
– Etc.
1995 : Altavista (Inktomi, Lycos, etc.)
• Automate the process of acquiring pages; use “information
retrieval” techniques to return pages that contain a particular
term
• What worked
– Scalable (query for “IBM” returns 40M pages)
– Simple
– Granular
• What didn’t
– Scalability a double-edged sword
– Ranking and relevance poor
– Not authoritative (spam, irrelevance, etc.)
c. 1999-2006: PageRank (Google, Yahoo)
• Use topology (link structure) of the web to confer authority
• What works
– Relevance is greatly improved
– Navigational query is born (query for “IBM” gets me to ibm.com)
• What doesn’t
– Homogeneity of results (no personalization) means no “subjective”
queries – webmasters vote by proxy for everyone – and their answer
is the only answer
– System easily “gamed” by spammers – leads to arms race
Meanwhile, On the Money Front…
• Sponsored search ranking: Goto.com (morphed into
Overture.com → Yahoo!)
– Your search ranking depended on how much you paid
– Auction for keywords: casino was expensive!
• 1998+: Link-based ranking pioneered by Google
– Blew away all early engines except Inktomi
– Great user experience in search of a business model
– Meanwhile Goto/Overture’s annual revenues were nearing $1 billion
• Result: Google added sponsored search “ads” to the side,
independent of search results
– 2003: Yahoo follows suit, acquiring Overture (for paid placement)
and Inktomi (for search)
• The Monetization Mechanisms… Conversion of marketplace
mechanisms in 2007
Search query
Ad
Questions for the audience
• Do you think an “average” user knows the difference between
sponsored search links and algorithmic search results?
• Do you think an “average” user knows there are sponsored links
on the page?
• Do you think a user knows where a sponsored link would
navigate to upon a click?
How it works
• Advertiser: “I want to bid $5 on canon camera”
• Advertiser: “I want to bid $2 on cannon camera”
• Bids go into the Ad Index used by the sponsored search engine
• Engine decides when/where to show this ad.
• A click takes the user to the advertiser’s landing page
• Engine decides how much to charge advertiser on a click.
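The two bids on this slide suggest the simplest possible ad index: a map from bid phrase to the advertisers bidding on it. The sketch below assumes exact-match lookup after lowercasing and whitespace normalization. `AdIndex`, `normalize`, and the advertiser name are hypothetical, and production systems generally support broader matching than this.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


class AdIndex:
    """Maps a normalized bid phrase to (advertiser, bid) pairs."""

    def __init__(self) -> None:
        self._by_phrase: Dict[str, List[Tuple[str, float]]] = defaultdict(list)

    def add_bid(self, advertiser: str, phrase: str, bid: float) -> None:
        self._by_phrase[normalize(phrase)].append((advertiser, bid))

    def retrieve(self, query: str) -> List[Tuple[str, float]]:
        """Return the bids whose phrase exactly matches the query."""
        return list(self._by_phrase.get(normalize(query), []))


# The two bids from the slide; the advertiser name is made up.
index = AdIndex()
index.add_bid("camera-shop", "canon camera", 5.00)
index.add_bid("camera-shop", "cannon camera", 2.00)
```

Bidding separately on the misspelling "cannon camera" is how an advertiser reaches that query under exact matching, which is presumably why the slide lists both phrases.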
Engine: Three sub-problems
1. Retrieve ads matching query (IR)
2. Order the ads (IR + Econ)
3. Pricing on a click-through (Econ)
Ads go in slots like these
Higher slots get more clicks
2. Order the ads
• Most generally, composite IR+Econ score … for
today’s talk, focus on Econ
• Original GoTo/Overture scheme:
– Order by bid
Economic ordering
• Bid and revenue ordering: two forms of ordering by an
econ score
• Does revenue ordering maximize revenue?
• No – advertisers react to ordering scheme, by changing
their bid behavior!
• Lahaie+Pennock ACM EC 2007
– Family of schemes bridging Bid and Revenue ordering
– Game-theoretic analysis
Edelman, Ostrovsky, Schwarz 2006
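A small sketch of the ordering schemes named on this slide. The `Ad` fields, the sample bids and click-through rates, and the exponent `s` are illustrative assumptions. Setting `s = 0` orders by bid, and `s = 1` orders by bid times estimated click-through rate (revenue ordering); values in between give a family bridging the two, in the spirit of the Lahaie and Pennock reference. The pricing rule shown, which charges each ad the smallest bid that would keep its position, is a second-price style assumption, since the slides do not state how the click price is set.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Ad:
    advertiser: str
    bid: float  # maximum price per click, in dollars
    ctr: float  # estimated click-through rate in (0, 1]; an assumed input


def rank_score(ad: Ad, s: float) -> float:
    """s = 0 gives bid ordering; s = 1 gives bid * CTR (revenue) ordering."""
    return ad.bid * ad.ctr ** s


def order_ads(ads: List[Ad], s: float) -> List[Ad]:
    """Highest score takes the top slot."""
    return sorted(ads, key=lambda ad: rank_score(ad, s), reverse=True)


def price_per_click(ranked: List[Ad], s: float,
                    reserve: float = 0.0) -> List[float]:
    """Charge each ad the smallest bid that still matches the ad below it.

    The last ad has nobody below it and pays the reserve price.
    """
    prices = []
    for position, ad in enumerate(ranked):
        if position + 1 < len(ranked):
            next_ad = ranked[position + 1]
            prices.append(rank_score(next_ad, s) / ad.ctr ** s)
        else:
            prices.append(reserve)
    return prices


# Made-up advertisers: A bids high but is rarely clicked, B the reverse.
ads = [Ad("A", 5.0, 0.02), Ad("B", 2.0, 0.10), Ad("C", 3.0, 0.04)]
by_bid = order_ads(ads, s=0)
by_revenue = order_ads(ads, s=1)
```

With these made-up numbers, bid ordering ranks the ads A, C, B, while revenue ordering ranks them B, C, A. The sketch holds bids fixed, so it cannot show the slide's main point that advertisers change their bids in response to the scheme.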
A new convergence
• Monetization and economic value an intrinsic
part of system design
– Not an afterthought
– Mistakes are costly!
• Computing meets humanities like never before –
sociology, economics, anthropology …
Towards Getting Things Done… vs. Searching
Example
Start
I want to book a vacation in Tuscany.
Finish
Loved the vacation, want to make that sweet Italian coffee at home
Trends in task complexity
• Dawn of search:
– Navigational queries
– Pockets of information
• Today:
– Increasing migration of content online
– New forms of media only available online
– Infrastructure for payments and reputation
sufficient for many users
Things to notice
• Long-running user goals
• Search as hub:
– start there
– return for resource discovery and at task
boundaries
– traverse the web broadly to complete task
• Web services integrated into task
Content Growth
Content trends
[Ramakrishnan and Tomkins 2007]
Metadata trends
[Ramakrishnan and Tomkins 2007]
Content Complexity
Content ownership
• Content consumption is fragmenting –
nobody owns more than 10% of worldwide page views (WW PVs)
• No single place will own all the content
• Best of breed processing will operate on
the web version (?)
• Value transitions to ecosystem
Content access is fragmenting
Content itself is fragmenting
Evolution of Social Media
• Although the “traditional notion” of portal and
web content is still attracting growing audiences
• The original notion of “publishing content” to
attract audiences is changing fast
– As people discover the fact that the Internet is an
Interactive Medium
– The uses of the Internet enter areas we could not
imagine a short time ago
• A new notion of “publishing” is fast emerging
– The opportunity of user-generated content
Challenges in social media
• How do we use these tags for better search?
• What’s the ratings and reputation system?
• How do you cope with spam?
• The bigger challenge: where else can you exploit
the power of the people?
• What are the incentive mechanisms?
The Search Interface
Evolution is starting
What does this mean for search?
• Few changes through 2005
• Entering period of massive change
to handle more complex content
• Rich media, aggregation, simple
task analysis, etc
• Moving beyond the stateless
query/response paradigm
• Personalization theory
Rich media and search assistance
Structured aggregation
Simple task-focused queries
Google Base
Open Ecosystems
Structured data on the Web
• Structured databases power a vast majority of pages on
the web
– Certainly e-commerce catalogs etc.
– But also user-generated content (e.g. blogs)
• Content owners open to exposing structure, but don’t see
how and why
– Microformats adoption at an all-time high
– Yet, it’s produced much more than is consumed
• Experiments with “pure” structured data aggregation have
met with mixed success
– Google Base, Freebase, even Co-op
What have we announced?
• Yahoo! Search Monkey: API for publishers to
push metadata and structure to search engine
• Wide-ranging support for semantic web
standards
• Vocabulary to surface structure and semantics
• Community Tools to evolve standards and
vocabulary
Search as Killer App for Data Web
• Publishers and search engine collaborate
• Users see richer search experience
• Accomplish their tasks faster and more
effectively
• Example: abstracts surfacing structured
content
Search results of the future
yelp.com
Gawker
babycenter
New York Times
epicurious
LinkedIn
answers.com
webmd
Comprehensive support for
emerging semantic web standards ++
• Microformats
– hCard, hEvent, hReview, hAtom, XFN
– More as they get adopted
• RDFa and eRDF markup
• OpenSearch
– +extensions to return structured data
• Atom/RSS Feeds
– +extensions to embed structured data
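As a rough illustration of what consuming one of these microformats involves, the sketch below pulls text-valued hCard properties out of HTML with Python's standard `html.parser`. It is not SearchMonkey's implementation. `HCardParser`, `extract_hcards`, the sample markup, and the restriction to the `fn`, `org` and `tel` properties are assumptions made for the example, and void elements such as `<br>` inside a card are not handled.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

# Text-valued hCard properties handled by this sketch (a small subset).
HCARD_PROPERTIES = {"fn", "org", "tel"}


class HCardParser(HTMLParser):
    """Collects one dict per element carrying class="vcard"."""

    def __init__(self) -> None:
        super().__init__()
        self.cards: List[Dict[str, str]] = []
        self._card: Optional[Dict[str, str]] = None  # None outside a vcard
        self._depth = 0        # open-tag depth inside the current vcard
        self._prop: Optional[str] = None  # property being collected
        self._prop_depth = 0

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._card is None:
            if "vcard" in classes:
                self._card = {}
                self._depth = 1
            return
        # Limitation: a void element like <br> would unbalance this counter.
        self._depth += 1
        props = HCARD_PROPERTIES.intersection(classes)
        if props and self._prop is None:
            self._prop = sorted(props)[0]  # keep one property per element
            self._prop_depth = self._depth
            self._card.setdefault(self._prop, "")

    def handle_endtag(self, tag):
        if self._card is None:
            return
        if self._prop is not None and self._depth == self._prop_depth:
            self._card[self._prop] = self._card[self._prop].strip()
            self._prop = None
        self._depth -= 1
        if self._depth == 0:  # the vcard element itself just closed
            self.cards.append(self._card)
            self._card = None

    def handle_data(self, data):
        if self._prop is not None:
            self._card[self._prop] += data


def extract_hcards(html: str) -> List[Dict[str, str]]:
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.cards


SAMPLE_HTML = (
    '<div class="vcard"><span class="fn">Jane Doe</span>, '
    '<span class="org">Example Corp</span>, '
    '<span class="tel">+1-555-0100</span></div>'
    '<p class="fn">not inside a card</p>'
)
cards = extract_hcards(SAMPLE_HTML)
```

The point of the example is that the structure is already in the page markup: a consumer only needs to agree with the publisher on the class names.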
Vocabulary to surface structure
• ‘dataRSS’ provides a common framework for
embedding structured data
– Use with RDFa, eRDF or OpenSearch
– Preferred Vocabulary includes
• Atom, Dublin Core
• Creative Commons
• FOAF, GeoRSS, MediaRSS
• RDF, RDFS, RDF Review
• vCal, vCard
Community Tools
• We’re seeding the Vocabulary and Standards
Support
• We’ll evolve both of these with the help of the
Web Community
• Yahoo! Groups: used to collect contributor and
community suggestions, feedback, etc…
• Suggestions Board to vote on changes
Implications for publishers?
• Yahoo! open search platform does not modify
ranking
• Richer abstracts may provide more
information to users and draw higher
quality/quantity of clicks
• We want rich abstracts that give users a better
experience
– We don’t want misleading abstracts
The whole story
• User needs becoming more complex
• Content growing, changing, diversifying,
fragmenting
• Search responding by increase in
sophistication
• Value migrating to ecosystem
• Unlock the value by enabling interoperability –
expose semantics
Subjective Queries
The kinds of queries that rely on domain expertise…
• “Do you know a reputable plumber in Atlanta?”
• “Where is the cool nightlife in Soho?”
• “What political blogs do you think I’d enjoy reading?”
• “Where can I buy a cool pair of boots?”
These kinds of queries are ill-served by today’s search
engines, but are ironically the most valuable (i.e.
transactional queries.)
No definitive answer
Unverifiable answer
Community consensus
Incentives
Legitimate?
Where is the Science?
• Which questions are legitimate?
• What is the incentive system?
• How do we validate answers?
• What is the role of the community?
• What is the reputation system?
What are the challenges?
• Community of users
– Social system
• Incentives and reputations
– Economic system
• Poorly phrased, grammatically limited queries
– Language analysis
• Improving user experience from past data
– Data mining
Back to Business
These are early days…
Advertising: Brand and DR
Knowledge of users & their behavior throughout the purchase funnel can
grow brand & direct response revenue
Purchase funnel stages: Awareness, Consideration, Purchase
> $200B Brand Advertising Market
Most time & activity is in consideration & engagement, but there are limited
metrics & reach strategies
> $200B Direct Response Market
A question for the Audience:
Why is search-related advertising so powerful?
It is all about Inferring User Intent
• Users type an average of 2.8 keywords
– Note the nonsensical use of average
– Average query returns > 600K matches!
• We get an idea of intent
• Coupled with immediacy (recency)
– → an amazing matching engine
– → 10x to 100x click through rate over banner ads
Do I know this user’s intent?
Brand Ads and Search Ads Interact!
• Is a search ad strategy enough for a direct
marketer?
• Do brand ads play a role in search advertising?
• Harris Direct Case Study
Awareness
Consideration
Purchase
Case Study: Harris Direct
Viewing These Ads Had This Effect On:
• Aided Brand Awareness
– Up 7%
• Brand Favorability
– Up 32%
• Purchase Intent
– Up 15%
Case Study: Harris Direct
People who saw display ads were 61% more likely to search on related topics…
…and drove 139% more clicks on algorithmic and sponsored links…
…specifically driving 249% more sponsored search clicks…
…and driving 91% more activity on the HarrisDirect.com website.
Yahoo! Research
Inventing the new sciences of the Internet
New Science?
• The Internet touches all of our lives:
personal, commercial, corporate,
educational, government, etc…
• Yet many of the basic notions we talk
about:
– Search, Community, Personalization,
Engagement, Interactive Content, Information
Navigation, Computational Advertising
– Are not at all understood, or well-defined
– These are not disciplines that academia or any
industry research labs focus on…
Areas of Research
• Information Navigation and Advanced Search
– We are in the early days of search and retrieval
– Inferring intent
– New ways of extracting entities and objects
• Community:
– How do you know what to believe on the Internet?
– Trust models on-line and trust propagation
– What makes communities thrive? Wither?
– Social media, tagging, image and video sharing
• Microeconomics: a new generation of economics driven by
massive interactions
– Auction marketplaces
– The web as a new LEI of activities and economies
• Computational Advertising
– Targeting and matching sciences, Inferring user intent
– Pricing models (CPM, CPC, CPA, CPL, etc…)
– Large-scale optimization and yield management
Concluding Thoughts (1)
• The notion of “corpus” and publishing is changing
fundamentally
• We still do not have the basic sciences to understand what is
happening and what needs to happen to combine the new
capabilities
• The problem of mobile search is different, but poorly
understood
• The web is changing, content sources are fragmenting and
changing
– the source distribution is radically changing
– Publisher – consumer divide is becoming fuzzy
• Search engine interface is finally changing to adapt
– Much of the change came from worrying about mobile search
Concluding Thoughts (2)
• The view that Search is everything is LIMITED (at best)
– Economics of publishing and advertising
– Users do not differentiate ad and content
– Behavioral data is the most powerful
– “Nothing predicts behavior like behavior”
• Monetization and economic value an intrinsic part of system design
– Not an afterthought
– Mistakes are costly!
• Computing meets humanities like never before – sociology,
economics, anthropology …
• A more holistic view of Search and Information Navigation is
needed
Thank You! & Questions?
No time to cover today
• Micro-Economics of the Web
– Auction marketplaces
– Marketplace and Exchange Design
– The economics of Engineering IT Decisions
• Computational Advertising
– Targeting and matching sciences
– Inferring user intent
– Pricing models (CPM, CPC, CPA, CPL, etc…)
– Large-scale optimization and yield management