www.aryansexport.com

By
BHARATH B S
4VV05CS009
http://powerpointpresentationon.blogspot.com
Agenda
•Definition
•Background
•Types
•Applications
•Constructing CAPTCHAs
•Breaking CAPTCHAs
•Issues with CAPTCHAs
•Conclusion
Intro
•CAPTCHA: Completely Automated Public Turing test to tell Computers and Humans Apart
•Invented at CMU by Luis von Ahn, Manuel Blum, et al.
•A program that runs a challenge-response test to separate humans from computers
•Generic CAPTCHAs distort letters and
numbers
•Distorted characters are presented to user
•User has to recognize the distorted letters
•If the guessed letters are correct, the user is inferred to be a human and allowed access
•Otherwise, the user is presumed to be a bot and denied access
•Humans can read the distorted and noisy text
•Current OCR software generally cannot read it
Background
•Why was CAPTCHA needed?
• Sabotage of online polls
• Spam emails
• Abuse of free online accounts
• Tampering with rankings on recommendation systems (like eBay, Amazon)
•AltaVista first used a crude CAPTCHA on its sites
•Resulted in a 95% reduction in spam
•Yahoo partnered with CMU to counter these threats in its Messenger chat service
•Luis von Ahn and Manuel Blum of CMU coined the term CAPTCHA in 2000
•What is a Turing test?
o Proposed by Alan Turing
o Tests a machine’s level of intelligence
o A human judge questions two participants, one of which is a machine, without knowing which is which
o If the judge can’t tell which is the machine, the machine passes the test
o CAPTCHA employs a reverse Turing test: judge = CAPTCHA program, participant = user
o If the user passes the CAPTCHA, the user is presumed to be human
Types of CAPTCHAs
•Text based:
• Simple, normal language questions:
▪ What is the sum of three and thirty-five?
▪ If today is Saturday, what is the day after tomorrow?
▪ Which of mango, table, water is a fruit?
o Very effective, but needs a large question bank
o Users with cognitive disabilities may find it hard
• Gimpy:
o Designed by Yahoo and CMU
o Picks 10 random words from a dictionary, distorts them, and fills the image with noise
o User has to recognize at least 3 words
o If the user is correct, access is granted
• EZ-Gimpy:
o A modified version of Gimpy
o Yahoo used this version in Messenger
o Shows only 1 distorted word
o The word is drawn from a small dictionary, so it is prone to dictionary-based attacks
o Not a good implementation, already broken by automated recognition
• MSN’s Passport service CAPTCHAs:
o Provided for Microsoft’s MSN services
o Use 8 characters
o Warping is used to distort the characters
o Considered a very strong implementation, designed to be segmentation-resistant
•Graphic based CAPTCHAs:
• BONGO:
o Named after M. M. Bongard, a pattern recognition expert
o User has to solve a pattern recognition problem
o Has to identify the characteristic that distinguishes two sets of figures
o Then say which set a given figure belongs to
• PIX:
o Uses a large database of labelled images
o Shows a set of images; the user has to recognize the feature common to them
o E.g., “Pick the common characteristic among the following four pictures”; answer: “Aeroplane”
•Audio CAPTCHAs:
o Consist of a downloadable audio clip
o User listens and enters the spoken word
o Helps visually impaired users
o Offered, e.g., by Google’s audio-enabled CAPTCHA
o Not popular
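The simple-question style of text CAPTCHA listed under "Text based" can be sketched in a few lines. This is only an illustration under stated assumptions: the names `number_to_words`, `make_sum_question` and `check_answer` are made up for this example, and it covers a single question template, whereas the slide points out that a real deployment needs a large, varied question bank.

```python
import random

# Number words for 0..99; hypothetical helper tables for this example only.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0..99, e.g. 35 -> 'thirty-five'."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def make_sum_question(rng=random):
    """Return (question_text, expected_answer) for a random addition."""
    a = rng.randint(0, 99)
    b = rng.randint(0, 99)
    question = (f"What is the sum of {number_to_words(a)} "
                f"and {number_to_words(b)}?")
    return question, a + b


def check_answer(expected, submitted):
    """Accept the answer only if it parses as the expected integer."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        return False
```

Spelling the numbers out as words, rather than digits, is what keeps the question from being answered by a trivial arithmetic parser; it is also part of what makes this style harder for some users.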
Applications
•Protect online polls
•Prevent Web registration abuse, protect
passwords from brute-force attack
•Prevent comment spam and spam emails
•E-Ticketing, prevent scalping
•Verify digitized books: reCAPTCHA
o Used in the Google Books project
o Two words are shown; the program knows the first word
o If the user enters the first word correctly, it assumes that the second, unknown word has also been entered correctly
o The second word becomes “known”
•Help advance AI knowledge
• CAPTCHAs are built on hard AI problems
• A win-win scenario:
o If a CAPTCHA is broken by a bot, a hard AI problem is solved
o If it is not yet broken, the current implementation is able to withstand attacks
• Thus AI knowledge is advanced whenever a CAPTCHA is broken
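The known-word / unknown-word idea described under reCAPTCHA can be modelled with a small toy class. This is a sketch of the idea only, not reCAPTCHA's actual implementation: the class name `TwoWordChallenge`, the image ids and the `votes_needed` parameter are assumptions for illustration. With `votes_needed=1` it behaves as the slide describes; a higher value requires several users to agree before an unknown word is promoted to "known".

```python
from collections import Counter, defaultdict


def normalize(text):
    """Compare answers case-insensitively and ignore surrounding spaces."""
    return text.strip().lower()


class TwoWordChallenge:
    """Toy model: one word is known, the other is being digitized."""

    def __init__(self, known_answers, votes_needed=1):
        self.known_answers = dict(known_answers)  # image id -> verified text
        self.votes_needed = votes_needed          # assumed agreement threshold
        self.votes = defaultdict(Counter)         # image id -> guess counts

    def submit(self, known_id, known_guess, unknown_id, unknown_guess):
        """Return True if the user passes; record the unknown-word guess."""
        expected = normalize(self.known_answers[known_id])
        if normalize(known_guess) != expected:
            return False  # control word wrong: discard the other guess too
        guess = normalize(unknown_guess)
        self.votes[unknown_id][guess] += 1
        if self.votes[unknown_id][guess] >= self.votes_needed:
            # Enough matching answers: the unknown word becomes "known".
            self.known_answers[unknown_id] = guess
        return True
```

Only the control word decides whether the user passes; the guess for the unknown word is trusted because the same user got the control word right.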
Constructing CAPTCHAs
•Things to keep in mind:
o Don’t store CAPTCHA solution in Web page’s
metadata
o A CAPTCHA is no good if it doesn't distort
o Need a large database of different CAPTCHA
questions
o Avoid repetition of questions
•CAPTCHA Logic:
• Generate the question
• Persist the correct answer
• Present the question to the user
• Evaluate the answer; if incorrect, start again and generate a different CAPTCHA
• If correct, allow access to the user
•Embeddable CAPTCHAs:
o Available freely, just embed code into Web
page’s HTML, from e.g., www.recaptcha.net
o No maintenance
•Custom CAPTCHAs:
o Fit the theme of the page
o Better protected from spammers
o Can be written in any language, e.g., Perl, .NET, ASP, JavaScript
•Guidelines:
o Accessibility
o Image security
o Script security
o Security after widespread adoption
o Custom implementation or a general
CAPTCHA?
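The steps listed under "CAPTCHA Logic" (generate, persist, present, evaluate, regenerate on failure) can be sketched as follows. This is a minimal sketch under stated assumptions: `CaptchaSession` is a hypothetical in-memory stand-in for server-side session storage, the six-character alphabet is arbitrary, and the step that renders the text as a distorted image is left out.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits


class CaptchaSession:
    """In-memory stand-in for server-side session storage (an assumption)."""

    def __init__(self, length=6):
        self.length = length
        self._answers = {}  # challenge id -> expected text, server side only

    def generate(self):
        """Generate a question and persist its answer.

        Returns (challenge_id, text). In a real page only a distorted image
        of `text` would be sent to the browser, never the text itself.
        """
        challenge_id = secrets.token_hex(8)
        text = "".join(secrets.choice(ALPHABET) for _ in range(self.length))
        self._answers[challenge_id] = text
        return challenge_id, text

    def evaluate(self, challenge_id, response):
        """Return (passed, new_challenge).

        Each challenge is removed on first use, so a question is never
        repeated. On failure a different challenge is generated.
        """
        expected = self._answers.pop(challenge_id, None)
        given = response.strip().upper()
        if expected is not None and secrets.compare_digest(
                expected.encode("utf-8"), given.encode("utf-8")):
            return True, None
        return False, self.generate()
```

Keeping the answer only in server-side storage follows the first guideline above (no solution in the page's metadata), and discarding each challenge after one attempt follows the guideline on avoiding repeated questions.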
Breaking CAPTCHAs
•Cracking CAPTCHAs through programs
o Convert the CAPTCHA into greyscale
o Detect patterns in the image corresponding to characters
o Or, read that user’s session files to learn the CAPTCHA word
▪ Countermeasure: only store a hash of the CAPTCHA word in session files
•Greg Mori and Jitendra Malik have broken text CAPTCHAs, e.g., EZ-Gimpy
o To break this CAPTCHA:
▪ Segmentation: locate possible letters in the image
▪ Construct a graph of consistent letters
▪ Find plausible words from the graph and use scores to rank them
▪ E.g., roll = 11.94, profit = 9.42 (better match)
•Social engineering to break CAPTCHAs:
o Spammer encounters a CAPTCHA
o That CAPTCHA is copied to another site
o Humans are baited, e.g., with free MP3s
o To get those MP3s, users are told to solve the copied CAPTCHA
o The solution is routed to the spammer
▪ Countermeasure: set a time-to-live period for each question
•CAPTCHA cracking as a business:
o Firms offer CAPTCHA cracking service in
exchange for money
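Two countermeasures named in this section, storing only a hash of the CAPTCHA word and giving each question a time-to-live, can be combined in one small sketch. The names `make_record`, `verify`, `SERVER_KEY` and the 120-second `TTL_SECONDS` are assumptions for illustration. The sketch also swaps the plain hash for a keyed hash (HMAC-SHA256), because a plain hash of a short word is easy to brute-force by anyone who can read the session file.

```python
import hashlib
import hmac
import secrets
import time

TTL_SECONDS = 120                     # assumed time-to-live per question
SERVER_KEY = secrets.token_bytes(32)  # hypothetical secret, kept out of session files


def _digest(answer):
    """Keyed hash of the normalized answer (HMAC-SHA256, hex encoded)."""
    normalized = answer.strip().lower().encode("utf-8")
    return hmac.new(SERVER_KEY, normalized, hashlib.sha256).hexdigest()


def make_record(answer, now=None):
    """Build what goes into the session file: digest and expiry, no plaintext."""
    issued = time.time() if now is None else now
    return {"digest": _digest(answer), "expires_at": issued + TTL_SECONDS}


def verify(record, response, now=None):
    """Reject expired questions first, then compare digests in constant time."""
    current = time.time() if now is None else now
    if current > record["expires_at"]:
        return False
    return hmac.compare_digest(_digest(response), record["digest"])
```

A short time-to-live narrows the window in which a copied CAPTCHA can be relayed to a third party and its solution sent back, although it does not close that window entirely.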
Issues with CAPTCHAs
•Usability issues:
o W3C guidelines call for the Web to be accessible to all people
o Some CAPTCHAs are inaccessible to people with visual or cognitive impairments
•Compatibility issues:
o JavaScript may need to be enabled in browsers
Summary
•CAPTCHAs are an effective way to
counter bots and reduce spam
•They serve a dual purpose: they also help advance AI knowledge
•Applications are varied, from stopping bots to character recognition and pattern matching
•Some issues remain with current implementations