Search and the ‘Net @ 2013
Michael Hunter
Reference Librarian
Hobart and William Smith Colleges
For Rochester Regional Library Council
Member Libraries’ Staff
Sponsored by the
Rochester Regional Library Council
Supported by Regional Bibliographic Databases and Resources Sharing (RBDB) funds
granted by the
New York State Library 2012
For today . . .
 The Searchscape
 Entity-based Search
 New Services and Tools
 The Social Web
 Bing, Blekko, DuckDuckGo, Exalead
 News from Google
 A Privacy Primer
 Trends and Future Directions
 Linklist
http://people.hws.edu/hunter/searchnet13links.htm
America at the Digital Turning Point
Center for the Digital Future – USC Annenberg School for Communication
www.digitalcenter.org/pdf/CDF_10_year_digital_turning_point.pdf
 Longitudinal study over 10 years
 Over 2,000 US households surveyed each year
 “…online behavior changes relentlessly.”
 “…constant social connection, unlimited access to
information, and unprecedented abilities to
purchase.”
 “…online technology creates extraordinary
demands on our time, major concerns about
privacy, and fundamental questions about the
proliferation of the digital realm…”
America at the Digital Turning Point
Selected highlights
 Americans view the Internet as an important
information source, yet many Internet users
do not trust much of the information (there)
 Our privacy is lost.
 Most printed daily newspapers will be gone in
about five years.
 The sheer overwhelming nature of technology
may be reaching a critical point.
 Because of online technology, work is
increasingly a 24/7 experience.
America at the Digital Turning Point
Time spent face-to-face with family in the
household since the Internet
The Web Worldwide
Data from the International Telecommunication Union, 2011
 Total Population – ca. 7 b.
 Connected to the Web – ca. 2 b.
 Mobile subscriptions – ca. 6 b.
Global: 5,981,000,000
Developed nations: 1,461,000,000
Developing nations: 4,520,000,000
 Mobile subscriptions forecast for 2017
9 b. with 5 b. mobile broadband connections
World Internet Project
www.worldinternetproject.net
By using the Internet, people like you can better
understand politics – 2009 reporting countries
New Top Level Domains
(ICANN 1/11/12)
 .com domains almost exhausted for new
website names
 “Someone got there first”
 New businesses must pay domain brokers for
an address or register a new one with unnatural, insignificant words
 Now possible to purchase a unique TLD
(.mycompany or .ourtrademark or
.ourbrand)

Fee - $185,000 with waiting period of 2 years.
Domain Registration
 Currently unrestricted:




.com
.info
.net
.org
 Currently require proof of eligibility



.edu
.gov
.xxx
.coop
.int
.aero
.mil
.museum
.asia
Search engines and satisfaction
mdgadvertising.com (data from Pew Research)
How often do you actually find the information
you’re looking for with search engines?
Entity-based Search:
Google’s Knowledge Graph
Bing’s Satori
Entity-based search
The back end How S.E’s worked until now


Matched query terms to terms in their crawlercreated database
Results refined





Linkage patterns
Popularity
Personalization
Other (?????)
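The matching-then-refining process on this slide can be pictured with a small sketch. It assumes a made-up three-page "crawl" and invented inbound-link counts standing in for linkage and popularity; the names PAGES, INBOUND_LINKS, build_index and search are hypothetical, and real engines weigh many more signals than this.

```python
from collections import defaultdict

# Hypothetical mini "crawl": page address -> page text.
PAGES = {
    "zoo.example/jaguar": "jaguar big cat habitat rainforest",
    "cars.example/jaguar": "jaguar cars luxury sedan dealer",
    "nfl.example/jaguars": "jacksonville jaguars football schedule",
}
# Hypothetical popularity signal (inbound link counts), invented for this sketch.
INBOUND_LINKS = {
    "zoo.example/jaguar": 12,
    "cars.example/jaguar": 40,
    "nfl.example/jaguars": 25,
}

def build_index(pages):
    """Map each term to the set of pages whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    return index

def search(query, index, links):
    """Return pages containing every query term, most-linked first."""
    terms = query.lower().split()
    if not terms:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(matches, key=lambda url: links.get(url, 0), reverse=True)

INDEX = build_index(PAGES)
print(search("jaguar", INDEX, INBOUND_LINKS))
```

Note that the page about the Jacksonville Jaguars is never returned for "jaguar": exact term matching cannot tell that the two words are related, which is the gap entity-based search tries to close.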
Ambiguous terms abound
“kings”
“jaguar”
“Apollo”
Can a system know????
 “Charles Dickens”

This searcher wants information about and
books by him
 “Frank Lloyd Wright”

This searcher wants information about and
pictures of buildings designed by him
The basics…
 Entity database seeded with a large “bag of
nouns” and supplemented with nouns from
web crawls identified through natural
language processing
 These nouns are mapped to another
database of information related and/or
relevant to those nouns through n.l.p. beyond
simple text matches
 Results can be customized based on click
responses from previous anonymous searches
for that query
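A rough sketch of the idea, under stated assumptions: the entity store, the click counts and the function name lookup below are all invented for illustration and do not describe how Google or Bing actually store entities. It shows an ambiguous noun mapping to several entities, each carrying related information that need not contain the query term, with the order adjusted by clicks from earlier anonymous searches.

```python
# Hypothetical entity store: a noun maps to one or more entities.
ENTITIES = {
    "jaguar": [
        {"id": "jaguar_animal", "type": "animal",
         "related": ["Panthera onca", "rainforest"]},
        {"id": "jaguar_cars", "type": "company",
         "related": ["XJ sedan", "Coventry"]},
    ],
    "charles dickens": [
        {"id": "charles_dickens", "type": "author",
         "related": ["Oliver Twist", "Great Expectations"]},
    ],
}

# Hypothetical click counts: (query, entity id) -> clicks from past searches.
CLICKS = {("jaguar", "jaguar_cars"): 70, ("jaguar", "jaguar_animal"): 30}

def lookup(query):
    """Return candidate entities for a query, most-clicked first."""
    key = query.strip().lower()
    candidates = ENTITIES.get(key, [])
    return sorted(candidates,
                  key=lambda e: CLICKS.get((key, e["id"]), 0),
                  reverse=True)

for entity in lookup("Jaguar"):
    print(entity["id"], entity["related"])
```

A query with no entry in the store, such as "apollo" here, returns an empty list, which mirrors the point that entities are currently limited to the most popular topics.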
Yahoo Research paper - 2009
http://research.yahoo.com/files/pods09-woc.pdf
 Extract structured data (addresses, prices,
item #, etc.) from web documents and
associate it with an entity
 Link relationships between entities

An actor to his films and other actors he has
worked with
 Discover categorizing information in the
document’s content



Subject headings
Reviews (positive or negative)
Type of food served
The front end Google’s Knowledge Graph:
 Focused on questions and answers
 Contextual box for ambiguous terms with
short descriptions
 Bing’s Satori:
 Focused on potential “actions” associated
with the entity

Searchers for a rock band usually want to buy
a recording, find lyrics or get tickets
 “Snapshot” panel – entity-based results from
the social web (yours and others’)
Benefits of entity-based search
 Greater predictability of searcher
satisfaction
 Discovers related information that does
not contain the search term(s)
 Disambiguates many terms
 Colocates related information from
across the Web in a variety of filetypes
The Long Tail
http://searchengineland.com/search-illustrated-b2b-long-tail-seo-13237
Future challenges- the “long tail”
 Entities are now limited to the most popular
topics
 Currently no way to map complex queries to
an entity or entity group


“volcanic eruptions in the 18th century”
“Lady Gaga concerts in a warm location”
 Currently limited to English only
 Including more entities in English and other
languages will greatly increase processing
and impact response time
New Services and Tools
Realtime, Metas and Collaboration
http://marketingland.com/new-social-discovery-engine-bottlenose-aims-to-take-over-real-time-exploration-17024
Bottlenose: A realtime meta
 Launched 8/12 (public beta)
 Homepage access via login to your social
network (gives Bottlenose access also)
 Click into Social Search tab and search a
category with no login (11/27/12)
 Searches all the major social networks
 Events, trending topics and people
 Tabs to sort, organize and display
 Mobile apps available
Terrier – www.smartfp7.eu
 Open source Research Project of the EU
based at U. of Glasgow
 Real-time information about the “real world”


Current traffic conditions at a specific
intersection
My friends’ favorite bar right now
 “Smart Cities” concept

Physical spaces covered in an array of
intelligent sensors which communicate and
can be searched for information
Zuula: a multi-meta
 Web search includes Google, Bing, Yahoo,
Gigablast, Exalead, Alexa, EntireWeb, Mahalo,
Mojeek
 Unique sources and settings available for
each type of search:

Web
News
Images
Tags
Blogs
Jobs
 Tab through results from each source engine
Polymeta
 Web search includes Google, Bing, Ask, Yahoo,
Exalead
 Source selection available for each search type


Web News Images Videos Twitter Blogs
Twitter search is limited to the top 50 results containing your
search terms
 Faceted and graphed results available
 Related results from other search types appear
to the right
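What a metasearch engine does with results from several sources can be sketched as follows. This is only an illustration: the engine names and result lists are invented, merge_results is a hypothetical helper, and neither Zuula nor Polymeta is known to merge results this way. The sketch interleaves the ranked lists and removes duplicates while remembering which engines returned each address.

```python
from itertools import zip_longest

def merge_results(results_by_engine):
    """Interleave ranked URL lists and drop duplicates.

    Returns (url, [engines that returned it]) pairs in merged order.
    """
    merged = {}
    ranked_lists = [
        [(engine, url) for url in urls]
        for engine, urls in results_by_engine.items()
    ]
    # Take the first result from every engine, then the second, and so on.
    for tier in zip_longest(*ranked_lists):
        for item in tier:
            if item is None:  # this engine has run out of results
                continue
            engine, url = item
            merged.setdefault(url, []).append(engine)
    return list(merged.items())

# Invented result lists for two made-up engines.
sample = {
    "engine_a": ["a.example", "b.example"],
    "engine_b": ["b.example", "c.example", "d.example"],
}
for url, engines in merge_results(sample):
    print(url, engines)
```

An address returned by more than one engine appears once, with every source listed, which is one simple way to support tabbing through results by source engine.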
Searchteam.com
 Search engine with wiki-like, real-time
collaborative work spaces
 “Collective knowledge from your trusted social
network circles”
 Web sites Videos (YouTube) Images
Reference (Wikipedia) Educational
Books and Articles (Amazon)
 Faceted results and suggested searches
 Related main topics
 Subtopics
 Related searches (suggested)
Searchteam.com
 SearchSpaces


Organize and share links
Online forum for collaborative searching with
friends
 Small database
 Educational tab not inclusive of all .edu
domains
 Results counts unreliable
GapVis
nrabinowitz.github.com/gapvis/index.html
 Maps occurrences of geographic places in
texts
 Currently includes public domain texts of
Graeco-Roman literature
 Project of classical scholars and visualization
designers in the US and UK
 In beta
The Shape of Today’s Social Web
Why search the social web???
 Public responses/attitudes/primary sources




Breaking news
Trending topics and people
Latest product reviews
Companies and competition
 Security, technology topics (latest virus, etc.)
 Locate individuals and their networks


Who they follow, who follows them
People interested in a topic/hobby
 Monitor collaborations
Social Networks in the Egyptian Revolution
1/25/11-2/11/11
Enabling protesters to become citizen journalists
Mining Today’s Social Web:
The trust factors
 People you don’t know


Wikipedia
Human-created databases, directories
“I need a few good sites on solar
energy”


Mahalo, Ipl2.org
Q&A Services
“How do I repair my garage door
opener?”

Yahoo Answers, Answers.com, Mahalo Answers
Mining Today’s Social Web:
The trust factors
 People you follow

Twitter-human created Tweets
“What’s the buzz on Beyonce?”
 People you know

Post a question to friends and family
“What type of Mac should I buy?”

Facebook, LinkedIn, Google+, Bing (login via
Facebook)
Tumblr
 Microblogging platform; requires free account
 Allows users to post multimedia and other
content to a “tumblog”
 Search options



www.tumblr.com: posts searchable by author-supplied tags only; no keyword search
Tumblow.com: offers keyword search
Google site search: more comprehensive than
Tumblow

Site:www.tumblr.com +search term(s)
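For anyone who wants to build such a site-restricted search as a link, a minimal sketch follows. The function name site_search_url and the sample search terms are made up for illustration; the sketch simply places the site: operator and the terms in the q parameter of a Google search address.

```python
from urllib.parse import urlencode

def site_search_url(site, terms):
    """Build a Google search address limited to one site."""
    query = f"site:{site} {terms}"
    return "https://www.google.com/search?" + urlencode({"q": query})

print(site_search_url("www.tumblr.com", "library instruction"))
# prints https://www.google.com/search?q=site%3Awww.tumblr.com+library+instruction
```

The same pattern should work for any site that lacks a good keyword search of its own, for example by passing plus.google.com as the site.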
Twittermining
Some tweets are more “authoritative” than
others…
 Access to unfiltered, real-time perspective on
what people are thinking and doing
 Authority (and usefulness) of a tweet depends on




Who sent it
The number and “authority” of their followers
When it was sent
Documents/sites it refers to
Twittermining Tools
 Twitter.com



Requires a (free) account
Only the latest 2 weeks available
Searchable by hashtag (#)
Author-designated keyword or significant term
or phrase



#rochester
#jobs
#marketing
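When working with a saved batch of tweets, hashtags can be pulled out with a short pattern match. The sketch below is an assumption-laden simplification: the tweets are invented, the helper names are hypothetical, and the pattern (a # followed by letters, digits or underscores) is looser than Twitter's own rules for what counts as a hashtag.

```python
import re

# Simplified pattern: "#" followed by letters, digits or underscores.
HASHTAG = re.compile(r"#(\w+)")

def hashtags(tweet):
    """Return the hashtags in a tweet, lowercased, without the # sign."""
    return [tag.lower() for tag in HASHTAG.findall(tweet)]

def tweets_tagged(tweets, tag):
    """Keep only the tweets that carry the given hashtag."""
    wanted = tag.lstrip("#").lower()
    return [t for t in tweets if wanted in hashtags(t)]

# Invented sample tweets.
TWEETS = [
    "Hiring a reference librarian in #Rochester #jobs",
    "New #marketing workshop next week",
    "Snow again in Rochester",
]
print(tweets_tagged(TWEETS, "#rochester"))
```

The third sample tweet mentions Rochester but is not returned, because hashtag search relies on the author having added the tag.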
Twittermining Tools
 Discover Tab (access via your account)





Launched 5/12
Offers Personalized content based on your
Twitter activity
Favorites, follows, retweets, and more by
people you follow
Who to follow -Twitter accounts suggested for
you based on who you follow
Browse categories (<25) and
people/organizations heavily associated with
the categories
Twittermining Tools
 https://twitter.com/search-advanced



No account required
Only the latest 2 weeks available
Advanced search features






Booleans
Hashtag
Language limit
Author search (tweets from or to)
“Near this place”
Attitude – positive, negative, question
Twittermining Tools
 Storify.com





Users build social stories, bringing together
media scattered across the Web into a coherent
narrative
Access material shared with and by you and
public posts
Postings, status updates, photos, videos,
podcasts from Twitter, Facebook, YouTube,
Flickr, Instagram and more
Discover others with similar interests
Requires (free) account, via Facebook or Twitter
What does/could searching the
social web provide your library’s
users?
Established Services:
Bing, Blekko, DuckDuckGo, Exalead
The Fallacy of the Superior Search Engine
Conrad Saam*
 Is there a difference in the quality of search
results from Google and Bing?

Data set of 100 difficult queries
“clean crayon off an led t.v. screen”
“Who was Kim Jong Un’s mother?”
“wii new release rumors”
*http://searchengineland.com/google-fails-to-trounce-bing-again-the-fallacy-of-the-superior-search-engine-revisited-107238
The Fallacy of the Superior Search Engine
 Evaluative factors





Timeliness
One-click access to information
Volume of content
Lack of spam
Authoritative sites appear in first 3 results
 The winner???
Google 296, Bing 274
 “Bing needs to be a much better search
engine than Google to make it worth the
switch”
October, 2012
Microsoft’s Bing
 Redesigned 6/8/12
 Social search results now located in the new
Social Sidebar (Facebook-based)
 When logged in through Facebook




Ask friends
Friends who might know
People who know
Feed of questions you’ve asked your FB
friends through Bing
 Without a FB login Sidebar results come from
public posts
What Bing is NOW
 Travel- Price Predictor
 Video- Hover and get a preview
 Music: Artists – All content related to the artist
(entity-based search)
 Events – FanSnap (meta for ticket purchasing)
 Shopping – Hottest deals on the web right now
 Maps – Malls and Airports added
 Everywhere – Xbox, Mobile, iPad
Curating the web with Blekko
http://blekko.com (still in beta!)
 Human/crawler service
 Blekko (human) editors create “topic” and “built-in”
slashtags used to label content in the Blekko crawler
database.
 Registered users can create their own tags for any
site in the Blekko database for a personal,
searchable web
 Slashtags help refine results and eliminate spam
 Small but well curated database
 “AdSpam” algo blocked 1.5 m. sites in the first 6
months
Blekko: Under the hood
 3 search options



Web results
Slashtags (human/expert curation)
Likes (Facebook friends’ curation)
 Adding a slashtag limits the search to those
sites so tagged
 Note: adding multiple Blekko “topic” slashtags
limits the search to sites which have ALL the
tags
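The effect of stacking slashtags can be shown with a small sketch. The site-to-tag assignments and the function name sites_for are invented for illustration and are not Blekko's data or code; the point is only that each added topic tag narrows the pool to sites carrying ALL of the tags.

```python
# Hypothetical tag assignments, invented for this sketch.
SITE_TAGS = {
    "nih.gov": {"health", "science"},
    "webmd.com": {"health"},
    "nature.com": {"science"},
}

def sites_for(slashtags):
    """Return the sites that carry every one of the given slashtags."""
    wanted = set(slashtags)
    # "wanted <= tags" is true when every wanted tag is among the site's tags.
    return sorted(site for site, tags in SITE_TAGS.items() if wanted <= tags)

print(sites_for(["health"]))             # ['nih.gov', 'webmd.com']
print(sites_for(["health", "science"]))  # ['nih.gov']
```

In practical terms, two topic slashtags behave like a Boolean AND on the tags, so adding more of them can shrink an already small database to very few sites.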
Blekko this year
 Slashtags now automatically added to
searches in 500 broad categories based on
aggregated anonymous search behavior.
 For suggested slashtags: search term/
 Adding /monte gives you results from 3
engines; sources revealed only after you
select the most relevant results set
 Received substantial investment from major
Russian search engine Yandex
DuckDuckGo – http://ddg.gg
 Home and search results pages redesigned
 Related “Search Suggestions” on results
pages
 “Goodies” – user-supplied questions with
answers in 20 broad categories
Entertainment
Food & Drink
Travel
Programming
Sysadmin
Web Design
Exalead –
http://exalead.com/search
 Enterprise search company based in France
with free web search as product demo
 Advanced search options appear as questions
 Database well maintained
 Faceted search results
 Used by several of the major metaengines
The Year at Google
Personalization and Social Networks
in Google Results: A Timeline
 2005 – Sites you visited given a boost (Opt-in
via Google account)
 2009 – Sites your IP address visited given a
boost by default (Opt-out possible)
 2009 – Sites mentioned by your personal
social network given a boost, but separated
from main results (Opt-in)
 2011 – Social network results blended with
main results (Opt-in)
Personalization and Social Networks:
2012 – Search Plus Your World
 Boosts in results ranking





Based on IP search behavior (Opt-out)
Based on personal search behavior (Opt-in)
Based on your social networks (Opt-in)
Based on Google+ public posts (Default;
multiple steps needed to opt-out)
Based on your private Google+ network posts
(Opt-in)
IP-based personalization
 To permanently opt-out go to Search
Settings
 To opt-out on a per-search basis use the
toggle (top right)
 Personalization based on your personal
search behavior is still opt-in
Google+
plus.google.com
 Google’s social network (requires a Google
account)
 Launched 9/19/11 (access to Twitter ended
7/2/11)
 Currently over 400 m users, 100 m active
on a monthly basis

Facebook currently over 1.01 b. active users
 Offers “hangouts” –video chat rooms within
the social network
 Businesses and organizations allowed
Google+
 “Google+1” allows Google+ member to give a
site a vote of approval
 Web search results include +1 votes,
sometimes location-based
 Best access to content is through Google:

site:plus.google.com
search term(s)
Social Networks and Results: Users Respond
A distraction and concerns about privacy
Search Lesson Plans and
Common Core Standards
 Part of Google’s search education initiative
 5 main topics with beginner, intermediate and
advanced levels
Picking the right search terms
 Understanding search results
 Narrowing a search to get the best results
 Searching for evidence for research tasks
 Evaluating credibility of sources
 google.com/insidesearch/searcheducation/lessons.html

Search Lesson Plans
 Focus is using Google, but adaptable to other
sources
 Each plan lists Common Core Standards
addressed
 Include illustrative slides and suggested
assessments of student work
 “A Google-a-day challenge” questions with
answers
 Good strategies for deep web searching in
Advanced Level of Lesson #1
AAP Lawsuit settled
 2005 – Association of American Publishers
and McGraw-Hill, Pearson, Penguin, John
Wiley, Simon & Schuster allege copyright
violation in the Library scanning project
 2012- Google settles with publishers, who
may now remove their books or journals from
the Library project
 Authors Guild suit remains unsettled
Content Removal Requests 1/12 – 6/12
Top 6 countries
Country   Total Requests
US        4167
UK        3193
Brazil    2310
Turkey    2084
Germany   1903
France    1250
A Privacy Primer
SHARING USER INFORMATION HAS
BECOME THE INDUSTRY NORM
Search engines and privacy
Google’s policy for its account-based services
 New unified privacy policy in effect 3/1/12
 User profiles and individual search behavior
will be shared among all Google services that
require a login
 Account holders cannot opt-out of this
sharing
 Separate privacy policies still in effect for
Google Books and Chrome
Google’s policy for services not
requiring an account
 Covers Search, YouTube
 IP-based personalization in effect since 2009
 “We will not combine Double-Click cookie
information with personally identifiable
information unless we have your opt-in
consent”
Remarketing or retargeting in
the Google ad network
 Company and other websites tag visitors with
an IP-based (personally anonymous) cookie
 When you visit other sites in Google’s ad
network you will see ads from sites you have
visited before based on these cookies
 How to opt-out of remarketing/retargeting in
your browser



Turn off Web history
Clear/Remove Web history
Accept no cookies
Bing’s privacy policy
 For MS services that require a Windows Live ID
 “…information collected through one MS service
may be combined with information obtained
through other Microsoft services.”
 Signing into one service may automatically sign
you into other Microsoft services
 To opt-out
 Use separate browsers for each MS service you
access
 Sign in and out of your accounts throughout the
day to de-couple specific activities
DuckDuckGo
 Does not collect or share personal
information
 No browser cookies stored
 No personally identifiable or IP-based search
histories stored
 No IP addresses stored
 Very comprehensive with high-quality search
results
Current Trends and
Future Directions
Search Engine Trends in 2012
 Reversal in transparency at the major
services
 Increasing personalization as the norm
 Explosion of social network influence
 Stronger anti-competitive allegations
 Modest Bing marketshare gains
“The nature of the Internet is undergoing a
paradigm shift” – Matthew Berk (Zyxt Labs)
http://zyxt.com/post/26851542949/study-of-1-3-billion-urls-22-of-web-pages-reference
 2012 study of 1.3 billion URLs



22% of web pages contain Facebook URLs
Among 500 m. hardcoded links to Facebook
only 3.5 m. are unique
URLs from Common Crawl (open repository of
web crawl data that can be accessed and
analyzed by everyone)
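The kind of counting behind these figures can be sketched on a tiny scale. Everything below is illustrative: the four sample pages are invented, facebook_link_stats is a hypothetical helper, and the study's actual method is not described on the slide. The sketch reports the share of pages with at least one Facebook link, the total number of such links, and how many of them are unique.

```python
from urllib.parse import urlparse

def is_facebook(link):
    """True when the link's host is facebook.com or one of its subdomains."""
    host = urlparse(link).netloc.lower()
    return host == "facebook.com" or host.endswith(".facebook.com")

def facebook_link_stats(pages):
    """pages maps a page address to the list of links found on that page.

    Returns (share of pages with a Facebook link, total links, unique links).
    """
    pages_with_fb = 0
    all_fb_links = []
    for links in pages.values():
        fb = [link for link in links if is_facebook(link)]
        if fb:
            pages_with_fb += 1
        all_fb_links.extend(fb)
    share = pages_with_fb / len(pages) if pages else 0.0
    return share, len(all_fb_links), len(set(all_fb_links))

# Invented sample crawl of four pages.
SAMPLE = {
    "a.example": ["https://www.facebook.com/sharer.php", "https://b.example/"],
    "b.example": ["https://www.facebook.com/sharer.php"],
    "c.example": ["https://c.example/about"],
    "d.example": [],
}
print(facebook_link_stats(SAMPLE))  # (0.5, 2, 1)
```

Applied to the figures on the slide, 3.5 m. unique links out of 500 m. is 0.7%, meaning the same few Facebook addresses are repeated across a very large number of pages.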
“The Internet is shifting….” – M. Berk
 from unstructured to structured content
Structured content can be parsed and
formatted into any other type of content
 Unstructured content- static html
 from websites to entities
 Nodes in social and other networks that
contain or link to websites and other content
 from links to connection
 Growth of business and personal presence on
the social web

In the future
 Mobile search will continue to grow rapidly
 Entity-based search will continue to develop
 Personalization will grow but more slowly as
users better understand the consequences
 Social networks will continue as powerful
tools for grassroots political movements
 Web access and web search will attract more
government scrutiny worldwide
Thank You and
Enjoy Your Searching!
Michael Hunter
Reference Librarian
Hobart and William Smith Colleges
Geneva, NY 14456
(315) 781-3014