Search and the ‘Net @ 2013
Michael Hunter
Reference Librarian
Hobart and William Smith Colleges
For Rochester Regional Library Council
Member Libraries’ Staff
Sponsored by the
Rochester Regional Library Council
Supported by Regional Bibliographic Databases and Resources Sharing (RBDB) funds
granted by the
New York State Library 2012
For today . . .
The Searchscape
Entity-based Search
New Services and Tools
The Social Web
Bing, Blekko, DuckDuckGo, Exalead
News from Google
A Privacy Primer
Trends and Future Directions
Linklist
http://people.hws.edu/hunter/searchnet13links.htm
America at the Digital Turning Point
Center for the Digital Future – USC Annenberg School for Communication
www.digitalcenter.org/pdf/CDF_10_year_digital_turning_point.pdf
Longitudinal study over 10 years
Over 2,000 US households surveyed each year
“…online behavior changes relentlessly.”
“…constant social connection, unlimited access to
information, and unprecedented abilities to
purchase.”
“…online technology creates extraordinary
demands on our time, major concerns about
privacy, and fundamental questions about the
proliferation of the digital realm…”
America at the Digital Turning Point
Selected highlights
Americans view the Internet as an important
information source, yet many Internet users
do not trust much of the information (there)
Our privacy is lost.
Most printed daily newspapers will be gone in
about five years.
The sheer overwhelming nature of technology
may be reaching a critical point.
Because of online technology, work is
increasingly a 24/7 experience.
America at the Digital Turning Point
Time spent face-to-face with family in the
household since the Internet
The Web Worldwide
Data from the International Telecommunication Union, 2011
Total Population – ca. 7 b.
Connected to the Web – ca. 2 b.
Mobile subscriptions – ca. 6 b.
Global: 5,981,000,000
Developed nations: 1,461,000,000
Developing nations: 4,520,000,000
Mobile subscriptions forecast for 2017
9 b. with 5 b. mobile broadband connections
World Internet Project
www.worldinternetproject.net
By using the Internet, people like you can better
understand politics – 2009 reporting countries
New Top Level Domains
(ICANN 1/11/12)
.com domains almost exhausted for new
website names
“Someone got there first”
New businesses must pay domain brokers for
an address or register a new one with unnatural, insignificant words
Now possible to purchase a unique TLD
(.mycompany or .ourtrademark or
.ourbrand)
Fee - $185,000 with waiting period of 2 years.
Domain Registration
Currently unrestricted:
.com
.info
.net
.org
Currently require proof of eligibility
.edu
.gov
.xxx
.coop
.int
.aero
.mil
.museum
.asia
Search engines and satisfaction
mdgadvertising.com (data from Pew Research)
How often do you actually find the information
you’re looking for with search engines?
Entity-based Search:
Google’s Knowledge Graph
Bing’s Satori
Entity-based search
The back end
How search engines worked until now
Matched query terms to terms in their crawler-created database
Results refined
Linkage patterns
Popularity
Personalization
Other (?????)
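The term-matching model above can be sketched roughly as follows. This is a minimal illustration, assuming a tiny in-memory index; the sample pages, the popularity scores, and the function names are all hypothetical and do not reflect any engine's actual implementation.

```python
from collections import defaultdict

# Hypothetical crawled pages: id -> (text, popularity score)
PAGES = {
    "a": ("jaguar cars and dealers", 5),
    "b": ("jaguar habitat in the rainforest", 9),
    "c": ("apollo moon missions", 7),
}

def build_index(pages):
    """Map each term to the set of page ids whose text contains it."""
    index = defaultdict(set)
    for page_id, (text, _popularity) in pages.items():
        for term in text.lower().split():
            index[term].add(page_id)
    return index

def search(query, pages, index):
    """Return ids of pages containing every query term, most popular first."""
    terms = query.lower().split()
    if not terms:
        return []
    # Plain text matching: a page qualifies only if it contains all terms
    matches = set.intersection(*(index.get(t, set()) for t in terms))
    # Refinement step: order the matches by the (assumed) popularity score
    return sorted(matches, key=lambda pid: pages[pid][1], reverse=True)

INDEX = build_index(PAGES)
print(search("jaguar", PAGES, INDEX))
```

A query for "jaguar" returns both the car page and the animal page; text matching alone cannot tell which one the searcher meant.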
Ambiguous terms abound
“kings”
“jaguar”
“Apollo”
Can a system know????
“Charles Dickens”
This searcher wants information about and
books by him
“Frank Lloyd Wright”
This searcher wants information about and
pictures of buildings designed by him
The basics….
Entity database seeded with a large “bag of
nouns” and supplemented with nouns from
web crawls identified through natural
language processing
These nouns are mapped to another
database of information related and/or
relevant to those nouns through n.l.p. beyond
simple text matches
Results can be customized based on click
responses from previous anonymous searches
for that query
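A rough sketch of the basics above, assuming a hand-built entity table and made-up click counts. The names ENTITIES, CLICKS, and lookup are hypothetical; real systems build these mappings with natural language processing at a far larger scale.

```python
# Hypothetical entity database: noun -> candidate entities with related information
ENTITIES = {
    "jaguar": [
        {"type": "animal", "related": ["habitat", "big cats"]},
        {"type": "car maker", "related": ["models", "dealers"]},
    ],
    "charles dickens": [
        {"type": "author", "related": ["biography", "books"]},
    ],
}

# Hypothetical click counts from earlier anonymous searches: (query, type) -> clicks
CLICKS = {("jaguar", "car maker"): 120, ("jaguar", "animal"): 80}

def lookup(query):
    """Return candidate entities for a query, most-clicked interpretation first."""
    key = query.lower().strip()
    candidates = ENTITIES.get(key, [])
    return sorted(
        candidates,
        key=lambda entity: CLICKS.get((key, entity["type"]), 0),
        reverse=True,
    )

for entity in lookup("Jaguar"):
    print(entity["type"], entity["related"])
```

The related information comes from the entity record rather than from pages that happen to contain the search term, which is the difference from simple text matching.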
Yahoo Research paper - 2009
http://research.yahoo.com/files/pods09-woc.pdf
Extract structured data (addresses, prices,
item #, etc.) from web documents and
associate it with an entity
Link relationships between entities
An actor to his films and other actors he has
worked with
Discover categorizing information in the
document’s content
Subject headings
Reviews (positive or negative)
Type of food served
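As a small illustration of the first step (extracting structured data and associating it with an entity), the sketch below pulls prices and ZIP codes out of a page with regular expressions. The business name, the sample text, and the two patterns are assumptions made for this example; the Yahoo paper describes much more sophisticated extraction.

```python
import re

# Simplified patterns, assumed for this sketch only
PRICE = re.compile(r"\$\d+(?:\.\d{2})?")   # e.g. $14.50 or $3
ZIP_CODE = re.compile(r"\b\d{5}\b")        # five-digit US ZIP code

def extract(entity, text):
    """Attach the structured values found in a document to a named entity."""
    return {
        "entity": entity,
        "prices": PRICE.findall(text),
        "zip_codes": ZIP_CODE.findall(text),
    }

# Hypothetical web document about a hypothetical restaurant
TEXT = "Luigi's Pizza, 12 Main St, Geneva, NY 14456. Large pie $14.50, slice $3."
RECORD = extract("Luigi's Pizza", TEXT)
print(RECORD)
```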
The front end
Google’s Knowledge Graph:
Focused on questions and answers
Contextual box for ambiguous terms with
short descriptions
Bing’s Satori:
Focused on potential “actions” associated
with the entity
Searchers for a rock band usually want to buy
a recording, find lyrics or get tickets
“Snapshot” panel – entity-based results from
the social web (yours and others’)
Benefits of entity-based search
Greater predictability of searcher
satisfaction
Discovers related information that does
not contain the search term(s)
Disambiguates many terms
Colocates related information from
across the Web in a variety of filetypes
The Long Tail
http://searchengineland.com/search-illustrated-b2b-long-tail-seo-13237
Future challenges- the “long tail”
Entities are now limited to the most popular
topics
Currently no way to map complex queries to
an entity or entity group
“volcanic eruptions in the 18th century”
“Lady Gaga concerts in a warm location”
Currently limited to English only
Including more entities in English and other
languages will greatly increase processing
and impact response time
New Services and Tools
Realtime, Metas and Collaboration
http://marketingland.com/new-social-discovery-engine-bottlenose-aims-to-take-over-real-time-exploration-17024
Bottlenose: A realtime meta
Launched 8/12 (public beta)
Homepage access via login to your social
network (gives Bottlenose access also)
Click into Social Search tab and search a
category with no login (11/27/12)
Searches all the major social networks
Events, trending topics and people
Tabs to sort, organize and display
Mobile apps available
Terrier – www.smartfp7.eu
Open source Research Project of the EU
based at U. of Glasgow
Real-time information about the “real world”
Current traffic conditions at a specific
intersection
My friends’ favorite bar right now
“Smart Cities” concept
Physical spaces covered in an array of
intelligent sensors which communicate and
can be searched for information
Zuula: a multi-meta
Web search includes Google, Bing, Yahoo,
Gigablast, Exalead, Alexa, EntireWeb, Mahalo,
Mojeek
Unique sources and settings available for
each type of search:
Web
News
Images
Tags
Blogs
Jobs
Tab through results from each source engine
Polymeta
Web search includes Google, Bing, Ask, Yahoo,
Exalead
Source selection available for each search type
Web, News, Images, Videos, Twitter, Blogs
Twitter search is limited to top 50 containing your
search terms
Faceted and graphed results available
Related results from other search types appear
to the right
Searchteam.com
Search engine with wiki-like, real-time
collaborative work spaces
“Collective knowledge from your trusted social
network circles”
Web sites, Videos (YouTube), Images,
Reference (Wikipedia), Educational,
Books and Articles (Amazon)
Faceted results and suggested searches
Related main topics
Subtopics
Related searches (suggested)
Searchteam.com
SearchSpaces
Organize and share links
Online forum for collaborative searching with
friends
Small database
Educational tab not inclusive of all .edu
domains
Results counts unreliable
GapVis
nrabinowitz.github.com/gapvis/index.html
Maps occurrences of geographic places in
texts
Currently includes public domain texts of
Graeco-Roman literature
Project of classical scholars and visualization
designers in the US and UK
In beta
The Shape of Today’s Social Web
Why search the social web???
Public responses/attitudes/primary sources
Breaking news
Trending topics and people
Latest product reviews
Companies and competition
Security, technology topics (latest virus, etc.)
Locate individuals and their networks
Who they follow, who follows them
People interested in a topic/hobby
Monitor collaborations
Social Networks in the Egyptian Revolution
1/25/11-2/11/11
Enabling protesters to become citizen journalists
Mining Today’s Social Web:
The trust factors
People you don’t know
Wikipedia
Human-created databases, directories
“I need a few good sites on solar
energy”
Mahalo, Ipl2.org
Q&A Services
“How do I repair my garage door
opener?”
Yahoo Answers, Answers.com, Mahalo Answers
Mining Today’s Social Web:
The trust factors
People you follow
Twitter-human created Tweets
“What’s the buzz on Beyonce?”
People you know
Post a question to friends and family
“What type of Mac should I buy?”
Facebook, LinkedIn, Google+, Bing (login via
Facebook)
Tumblr
Microblogging platform; requires free account
Allows users to post multimedia and other
content to a “tumblog”
Search options
www.tumblr.com - posts searchable by author-supplied tags only; no keyword search
Tumblow.com- offers keyword search
Google site search- more comprehensive than
tumblow
Site:www.tumblr.com +search term(s)
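The site-search pattern above can also be turned into a link. This is a minimal sketch, assuming Google's public results URL of the form https://www.google.com/search?q=...; the helper name site_search_url is hypothetical.

```python
from urllib.parse import urlencode

def site_search_url(site, terms):
    """Build a Google results URL restricted to one site."""
    query = f"site:{site} {terms}"
    # urlencode percent-encodes the colon and turns spaces into "+"
    return "https://www.google.com/search?" + urlencode({"q": query})

print(site_search_url("www.tumblr.com", "solar energy"))
```

The same helper works for any other site that lacks a good keyword search of its own.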
Twittermining
Some tweets are more “authoritative” than
others…
Access to unfiltered, real-time perspective on
what people are thinking and doing
Authority (and usefulness) of a tweet depends on
Who sent it
The number and “authority” of their followers
When it was sent
Documents/sites it refers to
Twittermining Tools
Twitter.com
Requires a (free) account
Only the latest 2 weeks available
Searchable by hashtag (#)
Author-designated keyword or significant term
or phrase
#rochester
#jobs
#marketing
Twittermining Tools
Discover Tab (access via your account)
Launched 5/12
Offers Personalized content based on your
Twitter activity
Favorites, follows, retweets, and more by
people you follow
Who to follow -Twitter accounts suggested for
you based on who you follow
Browse categories (<25) and
people/organizations heavily associated with
the categories
Twittermining Tools
https://twitter.com/search-advanced
No account required
Only the latest 2 weeks available
Advanced search features
Booleans
Hashtag
Language limit
Author search (tweets from or to)
“Near this place”
Attitude – positive, negative, question
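The advanced search form combines these features into a single query string, which can also be typed directly into the search box. The sketch below assembles such a string. The operator spellings (OR, #, lang:, from:, near:, and the :) :( ? attitude markers) follow Twitter's published search operators as an assumption and may change; build_query and its parameters are hypothetical names.

```python
# Attitude filters expressed as the markers the search box accepts (assumed)
ATTITUDE = {"positive": ":)", "negative": ":(", "question": "?"}

def build_query(words=(), any_of=(), hashtag=None, lang=None,
                from_user=None, near=None, attitude=None):
    """Assemble a Twitter search string from advanced-search style options."""
    parts = list(words)                        # all of these words
    if any_of:
        parts.append(" OR ".join(any_of))      # Boolean OR between alternatives
    if hashtag:
        parts.append("#" + hashtag.lstrip("#"))
    if lang:
        parts.append(f"lang:{lang}")           # language limit
    if from_user:
        parts.append("from:" + from_user.lstrip("@"))  # author search
    if near:
        parts.append(f'near:"{near}"')         # "Near this place"
    if attitude:
        parts.append(ATTITUDE[attitude])
    return " ".join(parts)

print(build_query(words=["library"], hashtag="rochester", lang="en", attitude="question"))
```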
Twittermining Tools
Storify.com
Users build social stories, bringing together
media scattered across the Web into a coherent
narrative
Access material shared with and by you and
public posts
Postings, status updates, photos, videos,
podcasts from Twitter, Facebook, YouTube,
Flickr, Instagram and more
Discover others with similar interests
Requires (free) account, via Facebook or Twitter
What does/could searching the
social web provide your library’s
users?
Established Services:
Bing, Blekko, DuckDuckGo, Exalead
The Fallacy of the Superior Search Engine
Conrad Saam*
Is there a difference in the quality of search
results from Google and Bing?
Data set of 100 difficult queries
“clean crayon off an led t.v. screen”
“Who was Kim Jong Un’s mother?”
“wii new release rumors”
*http://searchengineland.com/google-fails-to-trounce-bing-again-the-fallacy-of-the-superior-search-engine-revisited-107238
The Fallacy of the Superior Search Engine
Evaluative factors
Timeliness
One-click access to information
Volume of content
Lack of spam
Authoritative sites appear in first 3 results
The winner???
Google 296, Bing 274
“Bing needs to be a much better search
engine than Google to make it worth the
switch”
October, 2012
Microsoft’s Bing
Redesigned 6/8/12
Social search results now located in the new
Social Sidebar (Facebook-based)
When logged in through Facebook
Ask friends
Friends who might know
People who know
Feed of questions you’ve asked your FB
friends through Bing
Without a FB login Sidebar results come from
public posts
What Bing is NOW
Travel- Price Predictor
Video- Hover and get a preview
Music: Artists – All content related to the artist
(entity-based search)
Events – FanSnap (meta for ticket purchasing)
Shopping – Hottest deals on the web right now
Maps – Malls and Airports added
Everywhere – Xbox, Mobile, iPad
Curating the web with Blekko
http://blekko.com (still in beta!)
Human/crawler service
Blekko (human) editors create “topic” and “built-in”
slashtags used to label content in the Blekko crawler
database.
Registered users can create their own tags for any
site in the Blekko database for a personal,
searchable web
Slashtags help refine results and eliminate spam
Small but well curated database
“AdSpam” algo blocked 1.5 m. sites in the first 6
months
Blekko: Under the hood
3 search options
Web results
Slashtags (human/expert curation)
Likes (Facebook friends’ curation)
Adding a slashtag limits the search to those
sites so tagged
Note: adding multiple Blekko “topic” slashtags
limits the search to sites which have ALL the
tags
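The ALL-tags rule in the note above is a set intersection. The sketch below shows the idea with made-up data; the tag names, the example domains, and the function names are hypothetical and are not taken from Blekko's database.

```python
# Hypothetical slashtag index: tag -> sites carrying that tag
SLASHTAGS = {
    "health": {"clinic.example", "agency.example", "portal.example"},
    "gov": {"agency.example", "services.example"},
}

def sites_for(tags):
    """Return the sites that carry ALL of the given slashtags."""
    sets = [SLASHTAGS.get(tag.lstrip("/"), set()) for tag in tags]
    return set.intersection(*sets) if sets else set()

def filter_results(results, tags):
    """Keep only results from sites tagged with every slashtag in the query."""
    if not tags:
        return list(results)  # no slashtag means no restriction
    allowed = sites_for(tags)
    return [r for r in results if r["site"] in allowed]

RESULTS = [
    {"site": "clinic.example", "title": "Flu symptoms"},
    {"site": "agency.example", "title": "Flu vaccine schedule"},
]
print(filter_results(RESULTS, ["/health", "/gov"]))
```

Each added tag can only shrink the result set, which is why stacking several topic slashtags quickly narrows a search.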
Blekko this year
Slashtags now automatically added to
searches in 500 broad categories based on
aggregated anonymous search behavior.
For suggested slashtags: search term/
Adding /monte gives you results from 3
engines; sources revealed only after you
select the most relevant results set
Received substantial investment from major
Russian search engine Yandex
DuckDuckGo – http://ddg.gg
Home and search results pages redesigned
Related “Search Suggestions” on results
pages
“Goodies” – user-supplied questions with
answers in 20 broad categories
Entertainment
Food & Drink
Travel
Programming
Sysadmin
Web Design
Exalead –
http://exalead.com/search
Enterprise search company based in France
with free web search as product demo
Advanced search options appear as questions
Database well maintained
Faceted search results
Used by several of the major metaengines
The Year at Google
Personalization and Social Networks
in Google Results: A Timeline
2005 – Sites you visited given a boost (Opt-in
via Google account)
2009 – Sites your IP address visited given a
boost by default (Opt-out possible)
2009 – Sites mentioned by your personal
social network given a boost, but separated
from main results (Opt-in)
2011 – Social network results blended with
main results (Opt-in)
Personalization and Social Networks:
2012 – Search Plus Your World
Boosts in results ranking
Based on IP search behavior (Opt-out)
Based on personal search behavior (Opt-in)
Based on your social networks (Opt-in)
Based on Google+ public posts (Default;
multiple steps needed to opt-out)
Based on your private Google+ network posts
(Opt-in)
IP-based personalization
To permanently opt-out go to Search
Settings
To opt-out on a per-search basis use the
toggle (top right)
Personalization based on your personal
search behavior is still opt-in
Google+
plus.google.com
Google’s social network (requires a Google
account)
Launched 9/19/11 (access to Twitter ended
7/2/11)
Currently over 400 m users, 100 m active
on a monthly basis
Facebook currently over 1.01 b. active users
Offers “hangouts” –video chat rooms within
the social network
Businesses and organizations allowed
Google+
“Google+1” allows a Google+ member to give a
site a vote of approval
Web search results include +1 votes,
sometimes location-based
Best access to content is through Google:
site:plus.google.com
search term(s)
Social Networks and Results: Users Respond
A distraction and concerns about privacy
Search Lesson Plans and
Common Core Standards
Part of Google’s search education initiative
5 main topics with beginner, intermediate and
advanced levels
Picking the right search terms
Understanding search results
Narrowing a search to get the best results
Searching for evidence for research tasks
Evaluating credibility of sources
google.com/insidesearch/searcheducation/lessons.html
Search Lesson Plans
Focus is using Google, but adaptable to other
sources
Each plan lists Common Core Standards
addressed
Include illustrative slides and suggested
assessments of student work
“A Google-a-day challenge” questions with
answers
Good strategies for deep web searching in
Advanced Level of Lesson #1
AAP Lawsuit settled
2005 – Association of American Publishers
and McGraw-Hill, Pearson, Penguin, John
Wiley, Simon & Schuster allege copyright
violation in the Library scanning project
2012- Google settles with publishers, who
may now remove their books or journals from
the Library project
Authors Guild suit remains unsettled
Content Removal Requests 1/12 – 6/12
Top 6 countries
Country   Total Requests
US        4167
UK        3193
Brazil    2310
Turkey    2084
Germany   1903
France    1250
A Privacy Primer
SHARING USER INFORMATION HAS
BECOME THE INDUSTRY NORM
Search engines and privacy
Google’s policy for its account-based services
New unified privacy policy in effect 3/1/12
User profiles and individual search behavior
will be shared among all Google services that
require a login
Account holders cannot opt-out of this
sharing
Separate privacy policies still in effect for
Google Books and Chrome
Google’s policy for services not
requiring an account
Covers Search, YouTube
IP-based personalization in effect since 2009
“We will not combine DoubleClick cookie
information with personally identifiable
information unless we have your opt-in
consent”
Remarketing or retargeting in
the Google ad network
Company and other websites tag visitors with
an IP-based (personally anonymous) cookie
When you visit other sites in Google’s ad
network you will see ads from sites you have
visited before based on these cookies
How to opt-out of remarketing/retargeting in
your browser
Turn off Web history
Clear/Remove Web history
Accept no cookies
Bing’s privacy policy
For MS services that require a Windows Live ID
“…information collected through one MS service
may be combined with information obtained
through other Microsoft services.”
Signing into one service may automatically sign
you into other Microsoft services
To opt-out
Use separate browsers for each MS service you
access
Sign in and out of your accounts throughout the
day to de-couple specific activities
DuckDuckGo
Does not collect or share personal
information
No browser cookies stored
No personally identifiable or IP-based search
histories stored
No IP addresses stored
Very comprehensive with high-quality search
results
Current Trends and
Future Directions
Search Engine Trends in 2012
Reversal in transparency at the major
services
Increasing personalization as the norm
Explosion of social network influence
Stronger anti-competitive allegations
Modest Bing marketshare gains
“The nature of the Internet is undergoing a
paradigm shift” – Matthew Berk (Zyxt Labs)
http://zyxt.com/post/26851542949/study-of-1-3-billion-urls-22-of-web-pages-reference
2012 study of 1.3 billion URLs
22% of web pages contain Facebook URLs
Among 500 m. hardcoded links to Facebook
only 3.5 m. are unique
URLs from Common Crawl (open repository of
web crawl data that can be accessed and
analyzed by everyone)
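To put the study's figures in proportion, the arithmetic can be worked through as below. This sketch assumes the 22% share applies to the full 1.3 billion URL sample, which the slide implies but does not state outright; the variable names are illustrative.

```python
TOTAL_URLS = 1_300_000_000       # URLs in the study
FACEBOOK_SHARE = 0.22            # share of pages containing Facebook URLs
HARDCODED_LINKS = 500_000_000    # hardcoded links to Facebook
UNIQUE_LINKS = 3_500_000         # distinct link targets among them

# Pages in the sample that reference Facebook (assumes 22% of the whole sample)
pages_with_facebook = round(TOTAL_URLS * FACEBOOK_SHARE)

# Fraction of the hardcoded Facebook links that point to distinct targets
unique_ratio = UNIQUE_LINKS / HARDCODED_LINKS

print(f"{pages_with_facebook:,} pages reference Facebook")
print(f"{unique_ratio:.1%} of hardcoded Facebook links are unique")
```

Under that assumption, roughly 286 million pages reference Facebook, while fewer than 1 in 100 of the hardcoded links is unique: most pages point to the same small set of Facebook addresses.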
“The Internet is shifting….” – M. Berk
from unstructured to structured content
Structured content can be parsed and
formatted into any other type of content
Unstructured content- static html
from websites to entities
Nodes in social and other networks that
contain or link to websites and other content
from links to connection
Growth of business and personal presence on
the social web
In the future
Mobile search will continue to grow rapidly
Entity-based search will continue to develop
Personalization will grow but more slowly as
users better understand the consequences
Social networks will continue as powerful
tools for grassroots political movements
Web access and web search will attract more
government scrutiny worldwide
Thank You and
Enjoy Your Searching!
Michael Hunter
Reference Librarian
Hobart and William Smith Colleges
Geneva, NY 14456
(315) 781-3014