Good choices begin with accurate information


Michael Hunter
Reference Librarian
Hobart and William Smith Colleges
For Rochester Regional Library Council Member Libraries’ Staff
Sponsored by the Rochester Regional Library Council
Supported by Regional Bibliographic Databases and Resources Sharing (RBDB) funds granted by the New York State Library
2012
The Searchscape in 2012
Search and the social web
New services
Google
Other established services
Future directions
http://pewinternet.org/Reports/2011/Search-and-email/Report.aspx
http://www.pewinternet.org/Infographics/2010/Generations-2010-Summary.aspx
USC Annenberg School 2011 Digital Future Report (Highlights)
http://www.digitalcenter.org./pdf/2011_digital_future_final_release.pdf

Internet and Free Speech
 33% - Safe to voice political views online
 36% - Not safe to voice political views online
 70% - All should be free to criticize their government online
 55% - All should be free to voice political views online, even if extreme
Online Activity Data for Teens
http://www.pew.internet.org/Static-Pages/TrendData-for-Teens
[Bar chart: percentage of teens (axis 0-80) for each online activity: social networking sites; news; shopping; share original art, photos, music; health information]
Where college students begin their information search
Perceptions of Libraries 2010
www.oclc.org/reports/2010perceptions.htm
(2005 in white)
Search Engines in the East
http://searchengineland.com/yandex-expands-to-turkey-first-shots-fired-in-new-google-baidu-yandex-stand-off-93201
Web Search in 2012
Who’s crawling the Web?
 Google
 Bing (aka Yahoo!)
 Gigablast
 Blekko
 DuckDuckGo
 Exalead
 Baidu
 Yandex
Blekko
Limits results to your social web
Slashtags: collections of content-related URLs tagged by users
 Sign into Blekko with Facebook Connect and add “/likes” to your terms. Limits results to sites your Facebook community has “liked”

gifts/likes
new york sushi/likes
Facebook Social Graph data integrated into ranking factors
 Eliminates spam, content farm pages

Google, Bing
Social data in results displays

Option to let your network communities directly influence the ranking of your results
 Bing’s Facebook Liked Results
 Google’s Google+ and Social Search

Content being shared in real time
 Bing’s Social Search
 Google’s (former) Realtime Search
Do “social signals” impact
general web search results?

“Authoritative” people
 Twitter- a user’s “authority” based on the
number and “authority” of their followers

Bing
 Authoritative Twitter users add ranking
value to the pages included in their tweets

Google
 Google+ data likely to influence ranking
(Too soon to say)
Facebook’s “shared” or
“liked” pages

Bing
 Looks at links shared with “Everyone” and
shared via Facebook fan pages

Google
 Looks at links from fan pages
 Has no personal wall data from Facebook
 Uses Google+ for personal data,
recommended sites
Topsy – www.topsy.com
Real-time search of the social web
Results from Twitter and Google+
 Ranking factors include

 How often the page is cited in tweets
 “Influence” algorithm
○ “Who is listening to you?”
○ Dynamic process assigns influence score
based on
 Number of followers
 Their influence
 How often your tweets are re-tweeted
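The influence factors listed above (number of followers, their influence, how often tweets are re-tweeted) can be illustrated with a small iterative score. This is only a sketch: the source does not describe Topsy's actual algorithm, so the function name `influence_scores`, the damping value, the retweet weighting, and the sample data are all assumptions made for illustration.

```python
def influence_scores(followers, retweets, rounds=30, damping=0.85):
    """Toy influence score (hypothetical, not Topsy's real formula).

    followers: dict mapping user -> set of users who follow them
    retweets:  dict mapping user -> how often their tweets were re-tweeted
    """
    users = list(followers)

    # Count how many accounts each user follows, so that a follower's
    # influence is split among everyone they listen to.
    following_count = {u: 0 for u in users}
    for fans in followers.values():
        for fan in fans:
            following_count[fan] = following_count.get(fan, 0) + 1

    # Iterate: a user's score depends on the scores of their followers.
    scores = {u: 1.0 for u in users}
    for _ in range(rounds):
        scores = {
            user: (1 - damping) + damping * sum(
                scores.get(fan, 0.0) / following_count[fan]
                for fan in followers[user]
            )
            for user in users
        }

    # Assumed weighting: every 100 retweets doubles the base score.
    return {u: s * (1.0 + retweets.get(u, 0) / 100.0) for u, s in scores.items()}


# Made-up example: bob and carol follow alice; carol follows bob.
followers = {"alice": {"bob", "carol"}, "bob": {"carol"}, "carol": set()}
retweets = {"alice": 50, "bob": 5, "carol": 0}
scores = influence_scores(followers, retweets)
```

In this toy data alice ends up with the highest score because she has the most followers, one of whom is himself followed, and she is re-tweeted most often.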
Topsy – www.topsy.com
Real-time search of the social web
Unlike other real-time search engines, ranking is based on a deep archive of social media
 Trending metrics used in ranking

 What’s viral right now?
Experts Search locates authoritative Twitter users on topics of your choice
 Advanced search filters

 Site/domain
 Language (10)
 Twitter user
 Date, time posted
Twitter’s Archive
Coming soon ??!!??





Began in March 2006
Great potential primary source archive
Twitter only offers last 14 days at search.twitter.com
Access via Google ended 7/2/11
LC purchased entire archive; not yet available.
http://blogs.loc.gov/loc/2010/04/how-tweet-it-is-library-acquires-entire-twitter-archive/
Quora – quora.com
Question and answer service
Requires account (free)
 Registered users

 Ask/answer questions
 Manage topic/subtopic structures

Your Quora homepage provides a feed of topics or answerers you choose to follow
Quora – quora.com
Question and answer service
Must choose a topic to begin a search
 Option to log in and connect with your Facebook or Twitter networks
“Helps Quora surface content that is relevant to you”

Chacha – chacha.com
Question and answer service
Largest SMS question/answer service
Answers provided by live “Guides” if not found in Chacha’s database
 Guides

 Generalists and specialists earn $.10-$.20 per completed task, depending on the “category of the question”
 Transcribers for voice-transmitted questions earn $.02-$.03 per completed task

Option to log in and connect with your Facebook, Twitter, Google or Yahoo social networks
Freebase – www.freebase.com
An open, Creative Commons licensed repository of structured data of almost 22 million entities (a single person, place, or thing, often connected together as a graph)
 Search begins in “Topics”
 “Tables” are rich in factual text information
 Powerful search capabilities

Zanran – zanran.com





Finds “semi-structured” numerical data on the web
Charts, graphs, tables in Excel, PDF, HTML and other formats
Retrieves both static and dynamic data
Search limits by date, location, format, domain and URL
Data in thumbnails with link to source URL
Trapit – trap.it
“Personal discovery engine”
 Users create keyword or URL-based “traps” which search ca. 100k “high quality” content sites
 News aggregators and content farms excluded
 Option to log in via Facebook or Twitter to utilize data from your network(s)
 Over time Trapit increasingly personalizes results

Voxalead
http://labs.exalead.com/project/voxalead






Experimental full text search of video and audio content
Utilizes speech-to-text technology
Currently includes major world news sources
Sources in 9 languages, including Arabic and Chinese
Locates transcript; search for your terms by accessing the transcript itself
Machine transcription can be problematic
Key advances in Google Search

1997 – Backrub becomes Google
 Ranking based on number and quality of incoming links
2003 – Google gives more weight to links from authoritative sites
 2005 – Option for personalized results
 2005 – “Bigdaddy” update for more comprehensive web crawling
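“Ranking based on number and quality of incoming links” is the idea behind PageRank. The following is a simplified sketch of that idea on a made-up four-page web. The function name, the toy link graph, and the iteration count are assumptions for illustration, and Google's production ranking uses many more signals than this.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank.

    links: dict mapping page -> list of pages it links to
    Returns a dict mapping page -> rank; the ranks sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its links,
                # so a link from a highly ranked page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical web: pages a, b and d all link to c; c links back to a.
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

Page c ranks highest because it has the most incoming links. Page a has only one incoming link, but it comes from the well-ranked c, so a still outranks b and d, which have none.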

Key advances in Google Search
2007 – Universal search yields links to any medium in results page
 2009 – Real-time search displays results from Twitter and blogs as they are published
 2011 – Panda update to combat spam
 2011 – QDF update (Query Deserves Freshness)

Panda
Lowers the ranking of low-quality sites with “all the right keywords” that appear high in results listings.
 Incorporates data on searcher-blocked sites
 Affects results from content farms and scraper sites in particular

“Content farms”





Companies that employ freelance writers to create content specifically designed to rank high in SE results
Ads are embedded or linked to this content
“Deliver as much as possible as cheaply as possible”
Demand Media alone publishes over 1 m articles a month
Can subvert a SE’s “organic” search algorithms
Scraper sites
Sites with low levels of original content that pull (scrape) content from other sites without permission and manipulate their pages to rank high in results listings
 Pay-per-click advertising abounds
 Quality and quantity of inbound links are generally poor

QDF - Fresher results
New QDF algorithm identifies topics that are likely to need up-to-the-minute results and ranks results accordingly
 Affects searches for

 Recent events or hot topics
 Regularly recurring events
 Frequent updates, product reviews
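One way to picture freshness-aware ranking is a score that decays with a page's age, applied only when the query is judged to be time-sensitive. The sketch below is purely illustrative: the function name, the two-day half-life, and the 50/50 blend are assumptions, not a description of Google's algorithm.

```python
from datetime import datetime, timedelta


def freshness_adjusted_score(relevance, published, now,
                             time_sensitive, half_life_days=2.0):
    """Blend a relevance score with recency for time-sensitive queries.

    All weights here are hypothetical, chosen only to show the effect.
    """
    if not time_sensitive:
        # Ordinary queries: age does not matter.
        return relevance
    age_days = max((now - published).total_seconds() / 86400.0, 0.0)
    # Recency halves every `half_life_days` days.
    recency = 0.5 ** (age_days / half_life_days)
    return relevance * (0.5 + 0.5 * recency)


now = datetime(2012, 1, 15, 12, 0)
old_page = {"relevance": 0.9, "published": now - timedelta(days=30)}
new_page = {"relevance": 0.7, "published": now - timedelta(hours=12)}

# Hot topic: the newer page overtakes the more "relevant" older page.
hot_old = freshness_adjusted_score(old_page["relevance"], old_page["published"], now, True)
hot_new = freshness_adjusted_score(new_page["relevance"], new_page["published"], now, True)

# Ordinary topic: scores are left unchanged.
plain_old = freshness_adjusted_score(old_page["relevance"], old_page["published"], now, False)
plain_new = freshness_adjusted_score(new_page["relevance"], new_page["published"], now, False)
```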
Universal search ranking
Universal search – sections in search results from Google’s News, Image, Video, Shopping, Local verticals
 Current ranking within these sections

 YouTube #1 in video results
 Google Maps #1 in map results
 Google Product Search #1 in shopping
 Google Blogger #1 in image results
May be an issue in the antitrust reviews Google is undergoing
Verbatim tool - The “+” is gone
Search is limited to exactly what you type
 In More search tools at the left of results page
 Customary (default) “improvements”

 suggest spelling corrections and alternative spellings
 personalize your search by using information such as sites you’ve visited before
 include synonyms of your search terms
 find results that match similar terms
 search for words with the same stem
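The difference between the default “improvements” and a verbatim search can be sketched with a toy matcher. Everything here is an assumption for illustration: the synonym table, the crude suffix-stripping stemmer, and the function names are hypothetical and far simpler than what a real search engine does.

```python
# Hypothetical synonym table used only for this example.
SYNONYMS = {"car": {"automobile"}}


def crude_stem(word):
    """Very rough stemmer: strips a trailing 'ing' or 's'."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def matches(query, document, verbatim=False):
    """Return True if every query term is found in the document.

    verbatim=True  -> only the exact words typed will match
    verbatim=False -> synonyms and words with the same stem also match
    """
    doc_words = document.lower().split()
    for term in query.lower().split():
        if verbatim:
            candidates = {term}
            comparable = set(doc_words)
        else:
            candidates = {crude_stem(term)}
            candidates |= {crude_stem(s) for s in SYNONYMS.get(term, ())}
            comparable = {crude_stem(w) for w in doc_words}
        if not candidates & comparable:
            return False
    return True


default_hit = matches("car", "used automobiles for sale")
verbatim_hit = matches("car", "used automobiles for sale", verbatim=True)
```

With the default behavior the query “car” matches a page that only says “automobiles”; with the verbatim option it does not.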
Searching by image





Image is analyzed for shape, texture, color and other attributes
Possible matches returned based on similar attributes
Results may be further refined by adding words in the search box
Use an image from the web or upload one
In Google Images, click on the camera in the search box
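Matching images on “similar attributes” can be illustrated with the simplest such attribute, color. The sketch below compares coarse color histograms. It is an assumption-laden toy: the function names and sample pixel data are made up, and it ignores the shape and texture analysis mentioned above.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Summarize an image as the share of pixels in each coarse color bin.

    pixels: list of (r, g, b) tuples with values 0-255
    """
    step = 256 // bins_per_channel
    counts = {}
    for r, g, b in pixels:
        key = (r // step, g // step, b // step)
        counts[key] = counts.get(key, 0) + 1
    total = len(pixels)
    if total == 0:
        return {}
    return {key: n / total for key, n in counts.items()}


def similarity(hist_a, hist_b):
    """Histogram intersection: 1.0 means identical color distribution."""
    return sum(min(share, hist_b.get(key, 0.0)) for key, share in hist_a.items())


# Made-up "images": mostly red, mostly red again, and entirely green.
query_image = [(250, 10, 10)] * 90 + [(10, 10, 250)] * 10
red_candidate = [(240, 20, 20)] * 80 + [(20, 20, 240)] * 20
green_candidate = [(10, 250, 10)] * 100

query_hist = color_histogram(query_image)
sim_red = similarity(query_hist, color_histogram(red_candidate))
sim_green = similarity(query_hist, color_histogram(green_candidate))
```

The mostly red candidate scores close to the query while the green one shares no color bins with it, so it would be ranked last.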
Sort images by subject
Allows searching for general classes or types of image, e.g., mountains, ducks, cars from the 1940s.
 Algorithms identify relationships through pixel values, patterns, colors and other visual attributes in combination with text related to the image

Google’s mobile search
Search by speaking
 Search nearby
 Search for places open now
 Get interactive weather
 Search using your camera (upload
the image)

Google+
plus.google.com
Newest social network (requires a Google account)
 Launched 9/19/11 (access to Twitter ended 7/2/11)
 Currently over 170 m users

 Facebook currently over 800 m active users
Offers “hangouts” – video chat rooms within the social network
 Businesses and organizations allowed 11/9/11

Google+
“Google+1” allows user to give a site a vote of approval
 Web search results include +1 votes, sometimes location-based
 Current access to content:

 http://www.google.com/search?q=site:plus.google.com+searchterm+searchterm+searchterm
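The search URL pattern above can be built programmatically. This short sketch uses Python's standard `urllib.parse.urlencode`; the helper name `google_plus_search_url` and the sample terms are made up for illustration.

```python
from urllib.parse import urlencode


def google_plus_search_url(*terms):
    """Build a Google query restricted to plus.google.com."""
    query = " ".join(("site:plus.google.com",) + terms)
    # safe=":" keeps the colon in "site:" readable; spaces become "+".
    return "http://www.google.com/search?" + urlencode({"q": query}, safe=":")


url = google_plus_search_url("library", "instruction")
print(url)
```

Running this prints `http://www.google.com/search?q=site:plus.google.com+library+instruction`, which follows the same pattern as the URL on the slide.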
Google Books – the collection

Google has scanned over 12 million titles (U.S. libraries estimated to hold 42 million titles)
 2 million public domain titles (pre-1923)
 2 million scanned with publisher’s permission
 8 million remain
○ “Orphans” (out of print, in copyright)
○ Copyrights not renewed (pub. 1923-1963)

Proposed agreement – Google would pay $125 m to authors of scanned, copyrighted books
Proposed agreement

Public domain works
 Free to read, download, print

Out-of-print, in-copyright (orphans)
 To read, download and/or print
 $$$ institutional license (colleges, universities, research centers)
 $$$ consumer license (individuals)
 Free – public access license through ONE terminal in public libraries – read only ($$$ to print or download)

In-print copyrighted – depends on
agreement with publisher
Judge Denny Chin
3/22/11
Second Circuit Court of Appeals, NY NY
Proposed agreement gives Google an unfair competitive advantage and breaks antitrust laws
 Urged an agreement that would only include works whose copyright owners have agreed, rather than the current plan which would require authors to “opt out” of selling their books online through Google.

Googleopoly: Antitrust allegations

Active Investigations 2010-2011
 Texas – Is Google using its ad business
to boost ranking? (Buy an ad, improve
your ranking)
 EU – Allegations that Google is keeping
competitors out of its rankings
 US Senate Antitrust Subcommittee to
investigate possible unfair treatment of
competitors in ranking and ability to
purchase advertising.
Content Removal Requests from Governments: January-June, 2011
http://www.google.com/transparencyreport/governmentrequests/removals/

Country          Requests
Brazil           224
Germany          125
United States    92
South Korea      88
Taiwan           69
India            68
UK               65
China            3 (YouTube blocked 6 mo.)
User Data Requests from Governments: Jan.-June, 2011
http://www.google.com/transparencyreport/governmentrequests/userdata/

Country    Requests    % Compliance Rate
US         5950        93
India      1739        70
France     1300        48
UK         1273        64
Germany    1060        67
Italy      934         60
Brazil     703         87
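The request counts and compliance percentages listed above can be combined to estimate how many requests were complied with in each country. This is a rough illustration only: the percentages are already rounded in the source, so the resulting counts are approximations, and the helper name is made up.

```python
# (country, requests, % compliance rate) as listed above.
USER_DATA_REQUESTS = [
    ("US", 5950, 93),
    ("India", 1739, 70),
    ("France", 1300, 48),
    ("UK", 1273, 64),
    ("Germany", 1060, 67),
    ("Italy", 934, 60),
    ("Brazil", 703, 87),
]


def estimated_complied(requests, percent):
    """requests * percent / 100, rounded to the nearest whole request.

    Integer arithmetic avoids floating-point rounding surprises.
    """
    return (requests * percent + 50) // 100


estimates = {
    country: estimated_complied(requests, percent)
    for country, requests, percent in USER_DATA_REQUESTS
}
total_requests = sum(requests for _, requests, _ in USER_DATA_REQUESTS)
total_complied = sum(estimates.values())

for country, count in estimates.items():
    print(f"{country:8} ~{count} requests complied with")
print(f"Total    ~{total_complied} of {total_requests}")
```

For these seven countries the estimate comes to roughly 10,072 of 12,959 requests, with the US alone accounting for about 5,534.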
Binghoo!






Yahoo! now using Bing search results
Maintains other Yahoo services independently
Bing’s real-time search results from Twitter continue; Facebook added this year
Users encouraged to bring “liked by your friends” to Bing results via Facebook
Reward program – credits given for searching and exploring Bing search features
Rewards include gift cards, Facebook credits, Hulu subscription
SE success rate – Experian Hitwise
“% of search queries that result in a visit to a website”
July, 2011
[Bar chart: success rate by search engine, vertical axis 0-90%]
Curating the web with Blekko
http://blekko.com (still in beta!)

Human/crawler service
 Blekko (human) editors create “topic” and “built-in” tags used to label content in the Blekko crawler database.
 Registered users can create their own tags for any site in the Blekko database for a personal, searchable web
 Small database of 3 b web pages
 50 m queries in 5th month
 “AdSpam” algorithm blocked over 1.5 m sites to date
Blekko: Under the hood

3 search options
 Web results
 Slashtags (human/expert curation)
 Likes (Facebook friends’ curation)
Adding a slashtag limits the search to those sites so tagged
 Note: adding multiple Blekko “topic” slashtags limits the search to sites which have ALL the tags
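The ALL-tags behavior noted above amounts to intersecting the site lists behind each slashtag. The sketch below shows that logic on invented data; the tag-to-site table, the result records, and the function name are hypothetical and do not reflect Blekko's actual implementation.

```python
# Hypothetical curated lists: slashtag -> sites carrying that tag.
SLASHTAG_SITES = {
    "health": {"nih.gov", "mayoclinic.com", "who.int"},
    "news": {"nytimes.com", "bbc.co.uk", "who.int"},
}


def search_with_slashtags(results, slashtags):
    """Keep only results from sites that carry ALL the given slashtags."""
    allowed = None
    for tag in slashtags:
        sites = SLASHTAG_SITES.get(tag, set())
        # Each additional tag narrows the allowed set (set intersection).
        allowed = sites if allowed is None else allowed & sites
    if allowed is None:
        # No slashtag given: plain web results, nothing filtered.
        return list(results)
    return [r for r in results if r["site"] in allowed]


web_results = [
    {"site": "who.int", "title": "Influenza fact sheet"},
    {"site": "nih.gov", "title": "Flu research"},
    {"site": "nytimes.com", "title": "Flu season arrives"},
    {"site": "example.com", "title": "Miracle flu cure"},
]

health_only = search_with_slashtags(web_results, ["health"])
health_and_news = search_with_slashtags(web_results, ["health", "news"])
```

One tag keeps the two health sites; adding a second tag leaves only the single site that appears on both lists, which is why stacking topic slashtags can shrink results sharply.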

Other independent services to watch…
DuckDuckGo.com
 Exalead (www.exalead.fr)
 WolframAlpha.com
 Subject directories

 ipl2.org (ipl + lii)
 botw.org (strong in international sites)
 europeana.eu (European digital collections)
 intute.ac.uk (no new content since 7/11)
Is a universal notion of relevancy possible for today’s web searchers?
What are searchers looking for and how do engines enable (or hinder) them?
Modalities of Search
“Three Mindsets of Search” - About.com 2011
http://www.advertiseonabout.com/wp-content/uploads/2011/09/3-Mindsets-of-Search-PR.pdf

“Answer me” 48%
 Give me exactly what I ask for…

“Educate me” 26%
 Broad, contextual understanding, with
multiple perspectives

“Inspire me” 28%
 Fun, “browsy” search; Looking for
surprises, travel, etc.
Can any single search service
based on full-text retrieval
satisfy all these modalities,
regardless of how many ranking
algorithms come into play?
Social networks and Search
“Three Mindsets of Search” - About.com 2011

How would you define “an expert”?
 96% include “self-taught” allies with experience and motivation
 89% include the wisdom of crowds in social or outside networks
 87% include traditional credentialed experts
Growing Trends…
Social networks as sources for information and search engine ranking data
 Facebook as e-commerce hub (stores and companies’ fan pages)
 Content packaged in video and audio
 Mobile search apps

 May only search for one thing but it does that very well

Development of niche engines for different search modalities (Return of vertical engines?)
Michael Hunter
Reference Librarian
Hobart and William Smith Colleges
Geneva, NY 14456
(315) 781-3014