Good choices begin with accurate information
Michael Hunter
Reference Librarian
Hobart and William Smith Colleges
For Rochester Regional Library Council
Member Libraries’ Staff
Sponsored by the
Rochester Regional Library Council
Supported by Regional Bibliographic Databases and Resources Sharing (RBDB) funds
granted by the
New York State Library 2012
The Searchscape in 2012
Search and the social web
New services
Google
Other established services
Future directions
http://pewinternet.org/Reports/2011/Search-andemail/Report.aspx
http://www.pewinternet.org/Infographics/2010/Generations-2010-Summary.aspx
USC Annenberg School 2011 Digital Future Report
(Highlights)
http://www.digitalcenter.org./pdf/2011_digital_future_final_release.pdf
Internet and Free Speech
33% - Safe to voice political views
online
36% - Not safe to voice political views
online
70% - All should be free to criticize
their government online
55% - All should be free to voice
political views online, even if extreme
Online Activity Data for Teens
http://www.pew.internet.org/Static-Pages/TrendData-for-Teens
[Bar chart, vertical axis 0 to 80. Categories: social networking sites; news; shopping; share original art, photos, music; health information]
Where college students begin their information search
Perceptions of Libraries 2010
www.oclc.org/reports/2010perceptions.htm
(2005 in white)
Search Engines in the East
http://searchengineland.com/yandex-expands-to-turkey-first-shots-firedin-new-google-baidu-yandex-stand-off-93201
Web Search in 2012
Who’s crawling the Web?
Google
Bing (aka Yahoo!)
Gigablast
Blekko
DuckDuckGo
Exalead
Baidu
Yandex
Blekko
Limits results to your social web
Slashtags: collections of content-related
URLs tagged by users
Sign into Blekko with Facebook Connect
and add “/likes” to your terms. Limits
results to sites your Facebook
community has “liked”
gifts /likes
new york sushi /likes
Facebook Social Graph data integrated
into ranking factors
Eliminates spam, content farm pages
Google, Bing
Social data in results displays
Option to let your network communities
directly influence the ranking of
your results
Bing’s Facebook Liked Results
Google’s Google+ and Social Search
Content being shared in real time
Bing’s Social Search
Google’s (former) Realtime Search
Do “social signals” impact
general web search results?
“Authoritative” people
Twitter: a user’s “authority” is based on the
number and “authority” of their followers
Bing
Authoritative Twitter users add ranking
value to the pages included in their tweets
Google
Google+ data likely to influence ranking
(Too soon to say)
Facebook’s “shared” or
“liked” pages
Bing
Looks at links shared with “Everyone” and
shared via Facebook fan pages
Google
Looks at links from fan pages
Has no personal wall data from Facebook
Uses Google+ for personal data,
recommended sites
Topsy – www.topsy.com
Real-time search of the social web
Results from Twitter and Google+
Ranking factors include
How often the page is cited in tweets
“Influence” algorithm
○ “Who is listening to you?”
○ Dynamic process assigns influence score
based on
Number of followers
Their influence
How often your tweets are re-tweeted
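The influence factors listed above can be sketched as a small iterative score. This is only an illustration under stated assumptions, not Topsy's actual algorithm: the function name `influence_scores`, the `damping` value, and the way the re-tweet rate is folded in are all hypothetical choices made for the example.

```python
from collections import Counter

def influence_scores(followers, retweet_rate, rounds=20, damping=0.85):
    """followers maps each user to the list of users who follow them.
    retweet_rate maps a user to the share of their tweets that get re-tweeted (0..1)."""
    users = set(followers) | {f for fans in followers.values() for f in fans}
    # How many accounts each user follows (used to split the influence they pass on).
    following_count = Counter(f for fans in followers.values() for f in fans)
    score = {u: 1.0 for u in users}
    for _ in range(rounds):
        new = {}
        for u in users:
            # Each follower passes on a share of their own influence.
            inherited = sum(score[f] / following_count[f] for f in followers.get(u, []))
            new[u] = (1 - damping) + damping * inherited
        score = new
    # Assumed adjustment: boost users whose tweets are re-tweeted often.
    return {u: score[u] * (1 + retweet_rate.get(u, 0.0)) for u in users}

scores = influence_scores(
    {"alice": ["bob", "carol"], "bob": ["carol"], "carol": []},
    {"alice": 0.5},
)
```

In this toy network "alice" ends up with the highest score because she has the most followers, one of them is himself followed, and her tweets are re-tweeted.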
Topsy – www.topsy.com
Real-time search of the social web
Unlike other real-time search engines,
ranking is based on a deep archive of
social media
Trending metrics used in ranking
What’s viral right now?
Experts Search locates authoritative
Twitter users on topics of your choice
Advanced search filters
Site/domain
Language (10)
Twitter user
Date, time posted
Twitter’s Archive
Coming soon ??!!??
Began in March, 2006
Great potential primary source
archive
Twitter only offers last 14 days at
search.twitter.com
Access via Google ended 7/2/11
LC purchased entire archive; not yet
available.
http://blogs.loc.gov/loc/2010/04/how-tweet-itis-library-acquires-entire-twitter-archive/
Quora – quora.com
Question and answer service
Requires account (free)
Registered users
Ask/answer questions
Manage topic/subtopic structures
Your Quora homepage provides a
feed of topics or answerers you
choose to follow
Quora – quora.com
Question and answer service
Must choose a topic to begin a
search
Option to login and connect with
your Facebook or Twitter networks
“Helps Quora surface content that
is relevant to you”
Chacha – chacha.com
Question and answer service
Largest SMS question/answer service
Answers provided by live “Guides” if not
found in Chacha’s database
Guides
Generalists and specialists earn $.10-$.20 per
completed task, depending on the “category of
the question”
Transcribers for voice-transmitted questions
earn $.02-$.03 per completed task
Option to log in and connect with your
Facebook, Twitter, Google or Yahoo social
networks
Freebase – www.freebase.com
An open, Creative Commons
licensed repository of structured
data on almost 22 million entities
(each a single person, place, or thing),
connected together as a graph
Search begins in “Topics”
“Tables” are rich in factual text
information
Powerful search capabilities
Zanran – zanran.com
Finds “semi-structured” numerical data
on the web
Charts, graphs, tables in Excel, PDF, HTML
and other formats
Retrieves both static and dynamic data
Search limits by date, location, format,
domain and URL
Data in thumbnails with link to source
URL
Trapit – trap.it
“Personal discovery engine”
Users create keyword or URL-based
“traps” which search ca. 100k “high
quality” content sites
News aggregators and content
farms excluded
Option to login via Facebook or
Twitter to utilize data from your
network(s)
Over time Trapit increasingly
personalizes results
Voxalead
http://labs.exalead.com/project/voxalead
Experimental full text search of video
and audio content
Utilizes speech-to-text technology
Currently includes major world news
sources
Sources in 9 languages, including Arabic
and Chinese
Locates transcript; search for your terms
by accessing the transcript itself
Machine transcription can be problematic
Key advances in Google Search
1997 – Backrub becomes Google
Ranking based on number and quality
of incoming links
2003 – Google gives more weight to
links from authoritative sites
2005 – Option for personalized
results
2005 – “Bigdaddy” update for more
comprehensive web crawling
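Ranking by the number and quality of incoming links can be illustrated with the classic published PageRank iteration. The sketch below is a minimal version under assumptions: the function name `pagerank`, the damping factor of 0.85, and the three-page toy web are illustrative only, and Google's production ranking uses many more signals.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline rank.
        new = {p: (1 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its rank among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank

ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

Here page "c" ranks highest because two pages link to it, and "a" outranks "b" because its single incoming link comes from the high-ranking "c": quality of links matters as well as quantity.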
Key advances in Google Search
2007 – Universal search yields
links to any medium in results page
2009 – Real-time search displays
results from Twitter and blogs as
they are published
2011 – Panda update to combat
spam
2011 – QDF update (Query
Deserves Freshness)
Panda
Reduces low-quality sites with “all
the right keywords” that appear
high in results listings.
Incorporates data on searcher-blocked sites
Affects results from content farms
and scraper sites in particular
“Content farms”
Companies that employ freelance writers
to create content specifically designed to
rank high in SE results
Ads are embedded or linked to this
content
“Deliver as much as possible as cheaply as
possible”
Demand Media alone publishes over 1 m
articles a month
Can subvert a SE’s “organic” search
algorithms
Scraper sites
Sites with low levels of original
content that pull (scrape) content
from other sites without permission
and manipulate their pages to rank
high in results listings
Pay-per-click advertising abounds
Quality and quantity of inbound
links are generally poor
QDF - Fresher results
New QDF algorithm identifies topics
that are likely to need up-to-the-minute results and ranks results
accordingly
Affects searches for
Recent events or hot topics
Regularly recurring events
Frequent updates, product reviews
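One way to picture "query deserves freshness" ranking: when a query is flagged as time-sensitive, each result's relevance is discounted by its age. Google has not published its method, so everything in this sketch is an assumption made for illustration, including the function name `freshness_ranked`, the `relevance` and `published` fields, and the two-day half-life.

```python
from datetime import datetime

def freshness_ranked(results, query_is_hot, now, half_life_days=2.0):
    """Order results by relevance, discounting older pages when the query is hot."""
    def score(result):
        if not query_is_hot:
            return result["relevance"]
        age_days = (now - result["published"]).total_seconds() / 86400
        # Assumed decay: the score halves every half_life_days.
        return result["relevance"] * 0.5 ** (age_days / half_life_days)
    return sorted(results, key=score, reverse=True)

results = [
    {"url": "old-overview", "relevance": 0.9, "published": datetime(2012, 1, 1)},
    {"url": "new-report", "relevance": 0.6, "published": datetime(2012, 1, 9)},
]
now = datetime(2012, 1, 10)
hot_order = [r["url"] for r in freshness_ranked(results, True, now)]
normal_order = [r["url"] for r in freshness_ranked(results, False, now)]
```

For a hot query the newer page moves ahead of the more relevant but older one; for an ordinary query the order is decided by relevance alone.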
Universal search ranking
Universal search – sections in search
results from Google’s News, Image,
Video, Shopping, Local verticals
Current ranking within these sections
YouTube #1 in video results
Google Maps #1 in map results
Google Product Search #1 in shopping
Google Blogger #1 in image results
May be an issue in the anti-trust
reviews Google is undergoing
Verbatim tool -
The “+” is gone
Search is limited to exactly what you type
Found under “More search tools” at the
left of the results page
Customary (default) “improvements”
suggest spelling corrections and alternative
spellings
personalize your search by using information
such as sites you’ve visited before
include synonyms of your search terms
find results that match similar terms
search for words with the same stem
Searching by image
Image is analyzed for shape, texture,
color and other attributes
Possible matches returned based on
similar attributes
Results may be further refined by
adding words in the search box
Use an image from the web or upload
one
In Google Images, click on the
camera in the search box
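Matching images on a visual attribute can be sketched with color alone. The example below compares coarse color histograms; it is a simplified stand-in, not Google's method, and it ignores shape and texture. The function names, the bin count, and the representation of an image as a list of (R, G, B) tuples are assumptions made for the illustration.

```python
def color_histogram(pixels, bins=4):
    """Summarize an image (a list of (r, g, b) tuples, 0-255) as color proportions."""
    hist = [0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each channel to one of `bins` coarse levels.
        idx = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[idx] += 1
    total = len(pixels) or 1
    return [count / total for count in hist]

def similarity(hist_a, hist_b):
    """Histogram intersection: 1.0 for identical color mixes, 0.0 for no overlap."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))

red = color_histogram([(250, 10, 10)] * 4)
dark_red = color_histogram([(200, 30, 30)] * 4)
blue = color_histogram([(10, 10, 250)] * 4)
```

With these toy images, the two reds fall into the same color bin and score as similar, while red and blue share no bins at all.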
Sort images by subject
Allows searching for general
classes or types of image, e.g.,
mountains, ducks, cars from the
1940s.
Algorithms identify relationships
through pixel values, patterns,
colors and other visual attributes in
combination with text related to
the image
Google’s mobile search
Search by speaking
Search nearby
Search for places open now
Get interactive weather
Search using your camera (upload
the image)
Google+
plus.google.com
Newest social network (requires a
Google account)
Launched 9/19/11 (access to
Twitter ended 7/2/11)
Currently over 170 m users
Facebook currently over 800 m active
users
Offers “hangouts” – video chat
rooms within the social network
Businesses and organizations
allowed 11/9/11
Google+
“Google+1” allows user to give a site
a vote of approval
Web search results include +1 votes,
sometimes location-based
Current access to content:
http://www.google.com/search?q=site:plus.google.com+searchterm+searchterm+searchterm
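The search URL pattern above can be built programmatically. This is a small sketch using Python's standard library; the helper name `google_plus_search_url` is made up for the example, and the URL pattern is simply the one shown on the slide.

```python
from urllib.parse import urlencode

def google_plus_search_url(*terms):
    """Build a Google query restricted to plus.google.com for the given terms."""
    query = " ".join(("site:plus.google.com",) + terms)
    # Spaces become "+"; the colon after "site" is kept readable.
    return "http://www.google.com/search?" + urlencode({"q": query}, safe=":")

url = google_plus_search_url("library", "instruction")
```

Terms containing spaces or punctuation are percent-encoded automatically, which is the main reason to build the URL this way rather than by hand.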
Google Books – the collection
Google has scanned over 12 million titles
(U.S. libraries estimated to hold 42
million titles)
2 million public domain titles (pre-1923)
2 million scanned with publisher’s
permission
8 million remain
○ “Orphans” (out of print, in copyright)
○ Copyrights not renewed (pub. 1923-1963)
Proposed agreement – Google would pay
$125 m to authors of scanned, copyrighted
books
Proposed agreement
Public domain works
Free to read, download, print
Out-of-print, in-copyright (orphans)
To read, download and/or print
$$$ institutional license (colleges,
universities, research centers)
$$$ consumer license (individuals)
Free – public access license through
ONE terminal in public libraries – read
only
($$$ to print or download)
In-print copyrighted – depends on
agreement with publisher
Judge Denny Chin
3/22/11
Second Circuit Court of Appeals, NY NY
Proposed agreement gives Google an
unfair competitive advantage and
breaks antitrust laws
Urged an agreement that would only
include works whose copyright owners
have agreed, rather than the current
plan which would require authors to
“opt out” of selling their books online
through Google.
Googleopoly: Antitrust allegations
Active Investigations 2010-2011
Texas – Is Google using its ad business
to boost ranking? (Buy an ad, improve
your ranking)
EU – Allegations that Google is keeping
competitors out of its rankings
US Senate Antitrust Subcommittee to
investigate possible unfair treatment of
competitors in ranking and ability to
purchase advertising.
Content Removal Requests from
Governments: January- June, 2011
http://www.google.com/transparencyreport/governmentrequests/removals/
Country          Requests
Brazil           224
Germany          125
United States    92
South Korea      88
Taiwan           69
India            68
UK               65
China            3 (YouTube blocked 6 mo.)
User Data Requests from
Governments: Jan.- June, 2011
http://www.google.com/transparencyreport/governmentrequests/userdata/
Country    Requests    % Compliance Rate
US         5950        93
India      1739        70
France     1300        48
UK         1273        64
Germany    1060        67
Italy      934         60
Brazil     703         87
Binghoo!
Yahoo! now using Bing search results
Maintains other Yahoo services
independently
Bing’s real-time search results from
Twitter continue; Facebook added this
year
Users encouraged to bring “liked by your
friends” to Bing results via Facebook
Reward program – credits given for
searching and exploring Bing search
features
Rewards include gift cards, Facebook
credits, Hulu subscription
SE success rate – Experian Hitwise
“% of search queries that result in a visit to a website”
July, 2011
[Bar chart, vertical axis 0% to 90%]
Curating the web with Blekko
http://blekko.com (still in beta!)
Human/crawler service
Blekko (human) editors create “topic” and
“built-in” tags used to label content in the
Blekko crawler database.
Registered users can create their own tags for
any site in the Blekko database for a personal,
searchable web
Small database of 3 billion web pages
50 million queries in 5th month
“AdSpam” algorithm blocked over 1.5 million sites to date
Blekko: Under the hood
3 search options
Web results
Slashtags (human/expert curation)
Likes (Facebook friends’ curation)
Adding a slashtag limits the search
to those sites so tagged
Note: adding multiple Blekko
“topic” slashtags limits the search
to sites which have ALL the tags
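The slashtag behavior described above amounts to a set intersection: each tag names a set of sites, and adding several tags keeps only the sites that carry all of them. The sketch below is a toy model, not Blekko's implementation; the function names, the sample sites and tags, and the convention of writing tags as space-separated "/tag" tokens are assumptions for the example.

```python
def parse_query(raw):
    """Split a query into plain terms and slashtags (tokens starting with '/')."""
    terms, tags = [], []
    for token in raw.split():
        if token.startswith("/"):
            tags.append(token[1:].lower())
        else:
            terms.append(token.lower())
    return terms, tags

def search(raw, pages, tagged_sites):
    """pages is a list of (site, text); tagged_sites maps a tag to a set of sites."""
    terms, tags = parse_query(raw)
    allowed = None
    for tag in tags:
        sites = tagged_sites.get(tag, set())
        # Each extra slashtag narrows the allowed sites to those with ALL tags.
        allowed = set(sites) if allowed is None else allowed & sites
    hits = []
    for site, text in pages:
        if allowed is not None and site not in allowed:
            continue
        if all(term in text.lower() for term in terms):
            hits.append(site)
    return hits

pages = [
    ("a.org", "Solar power basics"),
    ("b.com", "Solar power deals"),
    ("c.edu", "Solar power research"),
]
tagged_sites = {"green": {"a.org", "c.edu"}, "science": {"b.com", "c.edu"}}
both_tags = search("solar power /green /science", pages, tagged_sites)
```

With one tag the search covers every site carrying that tag; with both tags only "c.edu" remains, since it is the only site tagged with both.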
Other independent services to watch…
DuckDuckGo.com
Exalead (www.exalead.fr)
WolframAlpha.com
Subject directories
ipl2.org (ipl + lii)
botw.org (strong in international sites)
europeana.eu (European digital collections)
intute.ac.uk (no new content since 7/11)
Is a universal notion of
relevancy possible for
today’s web searchers?
What are searchers looking
for and how do engines
enable (or hinder) them?
Modalities of Search
“Three Mindsets of Search” - About.com 2011
http://www.advertiseonabout.com/wpcontent/uploads/2011/09/3-Mindsets-of-Search-PR.pdf
“Answer me” 48%
Give me exactly what I ask for…
“Educate me” 26%
Broad, contextual understanding, with
multiple perspectives
“Inspire me” 28%
Fun, “browsy” search; Looking for
surprises, travel, etc.
Can any single search service
based on full-text retrieval
satisfy all these modalities,
regardless of how many ranking
algorithms come into play?
Social networks and Search
“Three Mindsets of Search” - About.com 2011
How would you define “an expert”?
96% include “self-taught” allies with
experience and motivation
89% include the wisdom of crowds
in social or outside networks
87% include traditional credentialed
experts
Growing Trends…
Social networks as sources for
information and search engine ranking data
Facebook as e-commerce hub (stores
and companies’ fanpages)
Content packaged in video and audio
Mobile search apps
Each may search for only one thing, but
does that very well
Development of niche engines for
different search modalities (Return of
vertical engines?)
Michael Hunter
Reference Librarian
Hobart and William Smith Colleges
Geneva, NY 14456
(315) 781-3014