Transcript Slide 1
Search and the ‘Net @ 2013
Michael Hunter
Reference Librarian
Hobart and William Smith Colleges
For Rochester Regional Library Council Member Libraries’ Staff
Sponsored by the Rochester Regional Library Council
Supported by Regional Bibliographic Databases and Resources Sharing (RBDB) funds granted by the New York State Library 2012

For today . . .
The Searchscape
Entity-based Search
New Services and Tools
The Social Web
Bing, Blekko, DuckDuckGo, Exalead
News from Google
A Privacy Primer
Trends and Future Directions

Linklist
http://people.hws.edu/hunter/searchnet13links.htm

America at the Digital Turning Point
Center for the Digital Future – USC Annenberg School for Communication
www.digitalcenter.org/pdf/CDF_10_year_digital_turning_point.pdf
Longitudinal study over 10 years
Over 2,000 US households surveyed each year
“…online behavior changes relentlessly.”
“…constant social connection, unlimited access to information, and unprecedented abilities to purchase.”
“…online technology creates extraordinary demands on our time, major concerns about privacy, and fundamental questions about the proliferation of the digital realm…”

America at the Digital Turning Point
Selected highlights
  Americans view the Internet as an important information source, yet many Internet users do not trust much of the information (there)
  Our privacy is lost.
  Most printed daily newspapers will be gone in about five years.
  The sheer overwhelming nature of technology may be reaching a critical point.
  Because of online technology, work is increasingly a 24/7 experience.

America at the Digital Turning Point
Time spent face-to-face with family in the household since the Internet

The Web Worldwide
Data from the International Telecommunication Union, 2011
Total population – ca. 7 b.
Connected to the Web – ca. 2 b.
Mobile subscriptions – ca. 6 b.
  Global: 5,981,000,000
  Developed nations: 1,461,000,000
  Developing nations: 4,520,000,000
Mobile subscriptions forecast for 2017: 9 b., with 5 b.
mobile broadband connections

World Internet Project
www.worldinternetproject.net
By using the Internet, people like you can better understand politics – 2009 reporting countries

New Top Level Domains (ICANN 1/11/12)
.com domains almost exhausted for new website names
  “Someone got there first”
  New businesses must pay domain brokers for an address or register a new one with unnatural, insignificant words
Now possible to purchase a unique TLD (.mycompany or .ourtrademark or .ourbrand)
  Fee - $185,000 with waiting period of 2 years.

Domain Registration
Currently unrestricted: .com .info .net .org
Currently require proof of eligibility: .edu .gov .xxx .coop .int .aero .mil .museum .asia

Search engines and satisfaction
mdgadvertising.com (data from Pew Research)
How often do you actually find the information you’re looking for with search engines?

Entity-based Search:
Google’s Knowledge Graph
Bing’s Satori

Entity-based search
The back end

How search engines worked until now
Matched query terms to terms in their crawler-created database
Results refined
  Linkage patterns
  Popularity
  Personalization
  Other (?????)
Ambiguous terms abound
  “kings” “jaguar” “Apollo”

Can a system know????
“Charles Dickens”
  This searcher wants information about and books by him
“Frank Lloyd Wright”
  This searcher wants information about and pictures of buildings designed by him

The basics….
Entity database seeded with a large “bag of nouns” and supplemented with nouns from web crawls identified through natural language processing
These nouns are mapped to another database of information related and/or relevant to those nouns through n.l.p. beyond simple text matches
Results can be customized based on click responses from previous anonymous searches for that query

Yahoo Research paper - 2009
http://research.yahoo.com/files/pods09-woc.pdf
Extract structured data (addresses, prices, item #, etc.)
from web documents and associate it with an entity
Link relationships between entities
  An actor to his films and other actors he has worked with
Discover categorizing information in the document’s content
  Subject headings
  Reviews ((: or ):)
  Type of food served

The front end
Google’s Knowledge Graph:
  Focused on questions and answers
  Contextual box for ambiguous terms with short descriptions
Bing’s Satori:
  Focused on potential “actions” associated with the entity
    Searchers for a rock band usually want to buy a recording, find lyrics or get tickets
  “Snapshot” panel – entity-based results from the social web (yours and others’)

Benefits of entity-based search
Greater predictability of searcher satisfaction
Discovers related information that does not contain the search term(s)
Disambiguates many terms
Colocates related information from across the Web in a variety of filetypes

The Long Tail
http://searchengineland.com/search-illustratedb2b-long-tail-seo-13237

Future challenges - the “long tail”
Entities are now limited to the most popular topics
Currently no way to map complex queries to an entity or entity group
  “volcanic eruptions in the 18th century”
  “Lady Gaga concerts in a warm location”
Currently limited to English only
Including more entities in English and other languages will greatly increase processing and impact response time

New Services and Tools
Realtime, Metas and Collaboration

http://marketingland.com/new-social-discovery-engine-bottlenoseaims-to-take-over-real-time-exploration-17024
Bottlenose: A realtime meta
Launched 8/12 (public beta)
Homepage access via login to your social network (gives Bottlenose access also)
Click into Social Search tab and search a category with no login (11/27/12)
Searches all the major social networks
Events, trending topics and people
Tabs to sort, organize and display
Mobile apps available

Terrier – www.smartfp7.eu
Open source
Research Project of the EU based at U.
of Glasgow
Real-time information about the “real world”
  Current traffic conditions at a specific intersection
  My friends’ favorite bar right now
“Smart Cities” concept
  Physical spaces covered in an array of intelligent sensors which communicate and can be searched for information

Zuula: a multi-meta
Web search includes Google, Bing, Yahoo, Gigablast, Exalead, Alexa, EntireWeb, Mahalo, Mojeek
Unique sources and settings available for each type of search:
  Web, News, Images, Tags, Blogs, Jobs
Tab through results from each source engine

Polymeta
Web search includes Google, Bing, Ask, Yahoo, Exalead
Source selection available for each search type
  Web, News, Images, Videos, Twitter, Blogs
Twitter search is limited to top 50 containing your search terms
Faceted and graphed results available
Related results from other search types appear to the right

Searchteam.com
Search engine with wiki-like, real-time collaborative work spaces
“Collective knowledge from your trusted social network circles”
  Web sites
  Videos (YouTube)
  Images
  Reference (Wikipedia)
  Educational
  Books and Articles (Amazon)
Faceted results and suggested searches
  Related main topics
  Subtopics
  Related searches (suggested)

Searchteam.com
SearchSpaces
  Organize and share links
  Online forum for collaborative searching with friends
Small database
Educational tab not inclusive of all .edu domains
Results counts unreliable

GapVis
nrabinowitz.github.com/gapvis/index.html
Maps occurrences of geographic places in texts
Currently includes public domain texts of Graeco-Roman literature
Project of classical scholars and visualization designers in the US and UK
In beta

The Shape of Today’s Social Web

Why search the social web???
Public responses/attitudes/primary sources
Breaking news
Trending topics and people
Latest product reviews
Companies and competition
Security, technology topics (latest virus, etc.)
Locate individuals and their networks
  Who they follow, who follows them
  People interested in a topic/hobby
Monitor collaborations

Social Networks in the Egyptian Revolution
1/25/11-2/11/11
Enabling protesters to become citizen journalists

Mining Today’s Social Web: The trust factors
People you don’t know
  Wikipedia
  Human-created databases, directories
    “I need a few good sites on solar energy”
    Mahalo, Ipl2.org
  Q&A Services
    “How do I repair my garage door opener?”
    Yahoo Answers, Answers.com, Mahalo Answers

Mining Today’s Social Web: The trust factors
People you follow
  Twitter – human-created Tweets
    “What’s the buzz on Beyonce?”
People you know
  Post a question to friends and family
    “What type of Mac should I buy?”
  Facebook, LinkedIn, Google+, Bing (login via Facebook)

Tumblr
Microblogging platform; requires free account
Allows users to post multimedia and other content to a “tumblog”
Search options
  www.tumblr.com - posts searchable by author-supplied tags only; no keyword search
  Tumblow.com - offers keyword search
  Google site search - more comprehensive than Tumblow
    site:www.tumblr.com +search term(s)

Twittermining
Some tweets are more “authoritative” than others…
Access to unfiltered, real-time perspective on what people are thinking and doing
Authority (and usefulness) of a tweet depends on
  Who sent it
  The number and “authority” of their followers
  When it was sent
  Documents/sites it refers to

Twittermining Tools
Twitter.com
Requires a (free) account
Only the latest 2 weeks available
Searchable by hashtag (#)
  Author-designated keyword or significant term or phrase
  #rochester #jobs #marketing

Twittermining Tools
Discover Tab (access via your account)
Launched 5/12
Offers
  Personalized content based on your Twitter activity
  Favorites, follows, retweets, and more by people you follow
  Who to follow - Twitter accounts suggested for you based on who you follow
  Browse categories (<25) and people/organizations heavily associated with the categories.
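
The Google site search pattern listed under the Tumblr search options (the site: operator followed by search terms) can be sketched in a few lines of Python. This is only an illustration: the function names `site_query` and `search_url` are hypothetical, and the base URL with its `q` parameter is an assumption about Google's public search page, not something stated in the slides.

```python
from urllib.parse import quote_plus

# Assumption: Google's public search page takes the query in the "q" parameter.
GOOGLE_SEARCH = "https://www.google.com/search?q="


def site_query(site, terms):
    """Combine the site: operator with ordinary search terms (hypothetical helper)."""
    return f"site:{site} {terms.strip()}"


def search_url(query, base=GOOGLE_SEARCH):
    """URL-encode the query so the result can be pasted into a browser."""
    return base + quote_plus(query)


if __name__ == "__main__":
    query = site_query("www.tumblr.com", "solar energy")
    print(query)
    print(search_url(query))
```

The same helper would cover any other site-limited search, for example passing plus.google.com as the site.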
Twittermining Tools
https://twitter.com/search-advanced
No account required
Only the latest 2 weeks available
Advanced search features
  Booleans
  Hashtag
  Language limit
  Author search (tweets from or to)
  “Near this place”
  Attitude – positive, negative, question

Twittermining Tools
Storify.com
Users build social stories, bringing together media scattered across the Web into a coherent narrative
Access material shared with and by you and public posts
  Postings, status updates, photos, videos, podcasts from Twitter, Facebook, YouTube, Flickr, Instagram and more
Discover others with similar interests
Requires (free) account, via Facebook or Twitter

What does/could searching the social web provide your library’s users?

Established Services:
Bing, Blekko, DuckDuckGo, Exalead

The Fallacy of the Superior Search Engine
Conrad Saam*
Is there a difference in the quality of search results from Google and Bing?
Data set of 100 difficult queries
  “clean crayon off an led t.v. screen”
  “Who was Kim Jong Un’s mother?”
  “wii new release rumors”
*http://searchengineland.com/google-fails-to-trouncebing-again-the-fallacy-of-the-superior-search-enginerevisited-107238

The Fallacy of the Superior Search Engine
Evaluative factors
  Timeliness
  One-click access to information
  Volume of content
  Lack of spam
  Authoritative sites appear in first 3 results
The winner???
Google 296, Bing
274
“Bing needs to be a much better search engine than Google to make it worth the switch”
October, 2012

Microsoft’s Bing
Redesigned 6/8/12
Social search results now located in the new Social Sidebar (Facebook-based)
When logged in through Facebook
  Ask friends
  Friends who might know
  People who know
  Feed of questions you’ve asked your FB friends through Bing
Without a FB login
  Sidebar results come from public posts

What Bing is NOW
Travel - Price Predictor
Video - Hover and get a preview
Music:
  Artists – All content related to the artist (entity-based search)
  Events – FanSnap (meta for ticket purchasing)
Shopping – Hottest deals on the web right now
Maps – Malls and Airports added
Everywhere – Xbox, Mobile, iPad

Curating the web with Blekko
http://blekko.com (still in beta!)
Human/crawler service
Blekko (human) editors create “topic” and “built-in” slashtags used to label content in the Blekko crawler database.
Registered users can create their own tags for any site in the Blekko database for a personal, searchable web
Slashtags help refine results and eliminate spam
Small but well curated database
“AdSpam” algo blocked 1.5 m. sites in the first 6 months

Blekko: Under the hood
3 search options
  Web results
  Slashtags (human/expert curation)
  Likes (Facebook friends’ curation)
Adding a slashtag limits the search to those sites so tagged
Note: adding multiple Blekko “topic” slashtags limits the search to sites which have ALL the tags

Blekko this year
Slashtags now automatically added to searches in 500 broad categories based on aggregated anonymous search behavior.
For suggested slashtags: search term/
Adding /monte gives you results from 3 engines; sources revealed only after you select the most relevant results set
Received substantial investment from major Russian search engine Yandex

DuckDuckGo – http://ddg.gg
Home and search results pages redesigned
Related “Search Suggestions” on results pages
“Goodies” – user-supplied questions with answers in 20 broad categories
  Entertainment
  Food & Drink
  Travel
  Programming
  Sysadmin
  Web Design

Exalead – http://exalead.com/search
Enterprise search company based in France with free web search as product demo
Advanced search options appear as questions
Database well maintained
Faceted search results
Used by several of the major metaengines

The Year at Google

Personalization and Social Networks in Google Results: A Timeline
2005 – Sites you visited given a boost (Opt-in via Google account)
2009 – Sites your IP address visited given a boost by default (Opt-out possible)
2009 – Sites mentioned by your personal social network given a boost, but separated from main results (Opt-in)
2011 – Social network results blended with main results (Opt-in)

Personalization and Social Networks: 2012 – Search Plus Your World
Boosts in results ranking
  Based on IP search behavior (Opt-out)
  Based on personal search behavior (Opt-in)
  Based on your social networks (Opt-in)
  Based on Google+ public posts (Default; multiple steps needed to opt out)
  Based on your private Google+ network posts (Opt-in)
IP-based personalization
  To permanently opt out, go to Search Settings
  To opt out on a per-search basis, use the toggle (top right)
Personalization based on your personal search behavior is still opt-in

Google+
plus.google.com
Google’s social network (requires a Google account)
Launched 9/19/11 (access to Twitter ended 7/2/11)
Currently over 400 m users, 100 m active on a monthly basis
Facebook currently over 1.01 b.
active users
Offers “hangouts” – video chat rooms within the social network
Businesses and organizations allowed

Google+
“Google+1” allows a Google+ member to give a site a vote of approval
Web search results include +1 votes, sometimes location-based
Best access to content is through Google:
  site:plus.google.com search term(s)

Social Networks and Results: Users Respond
A distraction and concerns about privacy

Search Lesson Plans and Common Core Standards
Part of Google’s search education initiative
5 main topics with beginner, intermediate and advanced levels
  Picking the right search terms
  Understanding search results
  Narrowing a search to get the best results
  Searching for evidence for research tasks
  Evaluating credibility of sources
google.com/insidesearch/searcheducation/lessons.html

Search Lesson Plans
Focus is using Google, but adaptable to other sources
Each plan lists Common Core Standards addressed
Include illustrative slides and suggested assessments of student work
“A Google-a-day challenge” questions with answers
Good strategies for deep web searching in Advanced Level of Lesson #1

AAP Lawsuit settled
2005 – Association of American Publishers and McGraw-Hill, Pearson, Penguin, John Wiley, Simon & Schuster allege copyright violation in the Library scanning project
2012 – Google settles with publishers, who may now remove their books or journals from the Library project
Authors Guild suit remains unsettled

Content Removal Requests 1/12 – 6/12
Top 6 countries
Country    Total Requests
US         4167
UK         3193
Brazil     2310
Turkey     2084
Germany    1903
France     1250

A Privacy Primer
SHARING USER INFORMATION HAS BECOME THE INDUSTRY NORM

Search engines and privacy

Google’s policy for its account-based services
New unified privacy policy in effect 3/1/12
User profiles and individual search behavior will be shared among all Google services that require a login
Account holders cannot opt out of this sharing
Separate privacy policies still in effect for Google Books and Chrome

Google’s
policy for services not requiring an account
Covers Search, YouTube
IP-based personalization in effect since 2009
“We will not combine DoubleClick cookie information with personally identifiable information unless we have your opt-in consent”

Remarketing or retargeting in the Google ad network
Company and other websites tag visitors with an IP-based (personally anonymous) cookie
When you visit other sites in Google’s ad network you will see ads from sites you have visited before based on these cookies
How to opt out of remarketing/retargeting in your browser
  Turn off Web history
  Clear/Remove Web history
  Accept no cookies

Bing’s privacy policy
For MS services that require a Windows Live ID
“…information collected through one MS service may be combined with information obtained through other Microsoft services.”
Signing into one service may automatically sign you into other Microsoft services
To opt out
  Use separate browsers for each MS service you access
  Sign in and out of your accounts throughout the day to de-couple specific activities

DuckDuckGo
Does not collect or share personal information
No browser cookies stored
No personally identifiable or IP-based search histories stored
No IP addresses stored
Very comprehensive with high-quality search results

Current Trends and Future Directions

Search Engine Trends in 2012
Reversal in transparency at the major services
Increasing personalization as the norm
Explosion of social network influence
Stronger anti-competitive allegations
Modest Bing marketshare gains

“The nature of the Internet is undergoing a paradigm shift” – Matthew Berk (Zyxt Labs)
http://zyxt.com/post/26851542949/study-of-1-3-billion-urls22-of-web-pages-reference
2012 study of 1.3 billion URLs
22% of web pages contain Facebook URLs
Among 500 m. hardcoded links to Facebook only 3.5 m. are unique
URLs from Common Crawl (open repository of web crawl data that can be accessed and analyzed by everyone)

“The Internet is shifting….” – M.
Berk
from unstructured to structured content
  Structured content can be parsed and formatted into any other type of content
  Unstructured content - static html
from websites to entities
  Nodes in social and other networks that contain or link to websites and other content
from links to connection
  Growth of business and personal presence on the social web

In the future --
Mobile search will continue to grow rapidly
Entity-based search will continue to develop
Personalization will grow but more slowly as users better understand the consequences
Social networks will continue as powerful tools for grassroots political movements
Web access and web search will attract more government scrutiny worldwide

Thank You and Enjoy Your Searching!
Michael Hunter
Reference Librarian
Hobart and William Smith Colleges
Geneva, NY 14456
(315) 781-3014
[email protected]