(The fun and the fear of…) Finding Information


Private Communications
Issues in the Social Web era
CS315 – Web Search and Data Mining
The AOL search data release
By the way, …
… search companies log your searches …
Privacy concerns
Data is often collected silently
 Web allows large quantities of data to be collected inexpensively and unobtrusively
Data from multiple sources may be merged
 Non-identifiable information can become identifiable when merged
Data collected for business purposes may be used in civil and criminal proceedings
Users given no meaningful choice
 Few sites offer alternatives
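
The merging point can be made concrete with a small sketch. Everything below is invented for illustration: the records, the field names, and the helper `link_records` are hypothetical, and the only claim is that two harmless-looking sources can identify a person once they are joined on shared fields.

```python
# Hypothetical data, invented for illustration only.
# Source 1: a "non-identifiable" search log keyed by a random cookie ID.
search_log = [
    {"cookie_id": "c-1001", "zip": "02481", "birth_year": 1970, "query": "diabetes symptoms"},
    {"cookie_id": "c-1002", "zip": "10001", "birth_year": 1985, "query": "cheap flights"},
]

# Source 2: a customer list collected for ordinary business purposes.
customers = [
    {"name": "Pat Example", "zip": "02481", "birth_year": 1970},
    {"name": "Sam Sample", "zip": "94105", "birth_year": 1992},
]


def link_records(log, people):
    """Attach a name to each log entry whose (zip, birth_year) matches exactly one person."""
    linked = []
    for entry in log:
        key = (entry["zip"], entry["birth_year"])
        matches = [p for p in people if (p["zip"], p["birth_year"]) == key]
        if len(matches) == 1:  # a unique match re-identifies the entry
            linked.append((matches[0]["name"], entry["query"]))
    return linked


linked = link_records(search_log, customers)
print(linked)
```

Neither list alone ties a name to a query; the join on the shared fields does.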
Browser Chatter
Browsers chatter about
 IP address, domain name, organization
 Referring page
 Platform: O/S, browser
 What information is requested
  URLs and search terms
 Cookies
To anyone who might be listening
 End servers
 System administrators
 Internet Service Providers
 Other third parties
  Advertising networks
 Anyone who might subpoena log files later
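
As a rough sketch of how much one request gives away, the function below summarizes a request the way a log reader might. The function name `describe_request` and the IP address are placeholders. The header values are copied from the sample request later in this transcript, and the sketch assumes the referring search engine carries the search terms in a `q` parameter, as that sample does.

```python
from urllib.parse import urlsplit, parse_qs


def describe_request(client_ip, headers):
    """Summarize what a server (or anyone reading its logs) learns from one request."""
    referer = urlsplit(headers.get("Referer", ""))
    referer_params = parse_qs(referer.query)
    return {
        "ip_address": client_ip,
        "referring_host": referer.netloc,
        # Assumption: the referring search engine puts the terms in "q".
        "search_terms": referer_params.get("q", []),
        "platform": headers.get("User-Agent", ""),
        "cookies": headers.get("Cookie", ""),
    }


# 192.0.2.10 is a documentation-only address, used here as a placeholder.
info = describe_request(
    "192.0.2.10",
    {
        "Referer": "http://www.google.com/search?hl=en&q=s2000",
        "User-Agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)",
    },
)
print(info)
```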
Cookies 101
Cookies can be useful
 Used like a staple to attach multiple parts of a form together
 Used to identify you when you return to a web site so you don’t have to remember a password
 Used to help web sites understand how people use them
Cookies can do unexpected things
 Used to profile users and track their activities, especially across web sites
How cookies work – the basics
A cookie stores a small string of characters
A web site asks your browser to “set” a cookie
Whenever you return to that site your browser sends the cookie back automatically
First visit to site: site to browser, “Please store cookie xyzzy”
Later visits: browser to site, “Here is cookie xyzzy”
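
The set-then-replay exchange can be imitated in a few lines. This is a toy model, not how any real browser or server is written: the `Browser` and `Site` classes are hypothetical, and the cookie jar is just a dictionary keyed by site name.

```python
class Site:
    """Toy web site that sets a cookie on the first visit and records replays."""

    def __init__(self, name):
        self.name = name
        self.replayed = []  # cookies the browser has sent back

    def handle_request(self, cookie):
        if cookie is None:
            return "xyzzy"          # first visit: "Please store cookie xyzzy"
        self.replayed.append(cookie)  # later visits: "Here is cookie xyzzy"
        return None


class Browser:
    """Toy browser that stores one cookie per site and sends it back automatically."""

    def __init__(self):
        self.jar = {}  # site name -> stored cookie

    def visit(self, site):
        new_cookie = site.handle_request(self.jar.get(site.name))
        if new_cookie is not None:
            self.jar[site.name] = new_cookie


browser = Browser()
site = Site("x.com")
browser.visit(site)  # first visit: cookie is set
browser.visit(site)  # later visit: cookie is replayed
print(browser.jar, site.replayed)
```

The user does nothing in either step, which is why the slide says the cookie comes back "automatically".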
How cookies work – advanced
Cookies are only sent back to the “site” that set them – but this may be any host in domain
Sites setting cookies indicate path, domain, and expiration for cookies
 Example: “Send me with any request to x.com until 2008”
 Example: “Send me with requests for index.html on y.x.com for this session only”
Cookies can store user info or a database key that is used to look up user info
 User info in the cookie: User=Joe, Email=Joe@x.com, Visits=13
 Database key in the cookie: User=4576904309 (the database holds Users …, Email …, Visits …)
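
A minimal sketch of the scoping rules follows, under stated assumptions: the `Cookie` class and `should_send` are hypothetical names, "until 2008" is read as an expiry of 1 January 2008, and the path test is a plain prefix check, which is simpler than what real browsers implement.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Cookie:
    name: str
    value: str
    domain: str
    path: str
    expires: Optional[datetime]  # None means "for this session only"


def should_send(cookie, host, path, now):
    """Simplified scope check: domain suffix, path prefix, and expiration."""
    domain_ok = host == cookie.domain or host.endswith("." + cookie.domain)
    path_ok = path.startswith(cookie.path)
    not_expired = cookie.expires is None or now < cookie.expires
    return domain_ok and path_ok and not_expired


# "Send me with any request to x.com until 2008"
wide = Cookie("User", "Joe", "x.com", "/", datetime(2008, 1, 1))
# "Send me with requests for index.html on y.x.com for this session only"
narrow = Cookie("User", "4576904309", "y.x.com", "/index.html", None)

now = datetime(2007, 6, 1)
print(should_send(wide, "y.x.com", "/shop/cart", now))
print(should_send(narrow, "x.com", "/index.html", now))
```

The first cookie goes to any host under x.com, which is the "any host in domain" caveat from the slide; the second is confined to one page on one host.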
Cookie terminology
Cookie Replay – sending a cookie back to a site
Session cookie – cookie replayed only during current browsing session
Persistent cookie – cookie replayed until expiration date
First-party cookie – cookie associated with the site the user requested
Third-party cookie – cookie associated with an image, ad, frame, or other content from a site with a different domain name that is embedded in the site the user requested
 Browser interprets third-party cookie based on domain name, even if both domains are owned by the same company
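
The first-party versus third-party distinction comes down to comparing domain names, which can be sketched as below. The helper names are hypothetical, and taking the last two labels of the host name as the "site" is a deliberate simplification; real browsers consult a list of public suffixes, so this would misjudge names such as example.co.uk.

```python
from urllib.parse import urlsplit


def site_of(url):
    """Naive 'site' of a URL: the last two labels of its host name."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def is_third_party(page_url, resource_url):
    """True when embedded content comes from a different domain than the page."""
    return site_of(page_url) != site_of(resource_url)


page = "http://www.example.com/index.html"
print(is_third_party(page, "http://images.example.com/logo.gif"))
print(is_third_party(page, "http://ads.example.net/banner.gif"))
```

The check looks only at names, so example.com and example.net count as different parties even if one company owns both.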
“Web bugs”
Invisible “images” (1-by-1 pixels, transparent) embedded in web pages that cause referrer info and cookies to be transferred
Also called web beacons, clear gifs, tracker gifs, etc.
Work just like banner ads from ad networks, but you can’t see them unless you look at the page source
Also embedded in HTML formatted email messages, MS Word documents, etc.
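
"Looking at the page source" can be automated. The sketch below uses Python's standard `html.parser` to list images declared as 1 by 1 pixels. The class name `WebBugFinder`, the sample page, and its URLs are made up, and the heuristic is an assumption: it misses images sized through CSS and may flag harmless spacer images.

```python
from html.parser import HTMLParser


class WebBugFinder(HTMLParser):
    """Collect the src of every <img> declared as 1 by 1 pixels."""

    def __init__(self):
        super().__init__()
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        if attributes.get("width") == "1" and attributes.get("height") == "1":
            self.suspects.append(attributes.get("src"))


# Made-up page: one visible banner and one invisible tracking image.
page = """
<html><body>
<img src="http://ads.example.net/banner.gif" width="468" height="60">
<img src="http://tracker.example.net/clear.gif?page=home" width="1" height="1">
</body></html>
"""

finder = WebBugFinder()
finder.feed(page)
print(finder.suspects)
```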
Ad networks
Search Service: search for medical information; Random Ad; ad network: set cookie
CD Store: buy CD; Medical Ad; ad network: replay cookie
Ad company can get your name & address from CD order and link them to your search
(This is NOT how Google Ads work)
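
The linking step in this scenario can be simulated with a toy ad server. The `AdNetwork` class is hypothetical and stands in for no real product. It assumes each embedding site reveals some detail of the page view to the ad server, for example through the referring URL.

```python
import itertools


class AdNetwork:
    """Hypothetical ad server embedded in many unrelated sites."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.log = {}  # cookie id -> list of (site, details) observations

    def serve_ad(self, cookie_id, site, details):
        """Handle one ad request and return the cookie id the browser should hold."""
        if cookie_id is None:  # first contact: set cookie
            cookie_id = "id-%d" % next(self._ids)
        self.log.setdefault(cookie_id, []).append((site, details))
        return cookie_id  # on later contacts the same id is replayed

    def profile(self, cookie_id):
        """Everything observed under one cookie id, across all sites."""
        return self.log.get(cookie_id, [])


network = AdNetwork()
cookie = network.serve_ad(None, "Search Service", "search: medical information")
cookie = network.serve_ad(cookie, "CD Store", "order: name and address")
print(network.profile(cookie))
```

Because both sites embed the same ad server, one cookie id ends up tying the search to the name and address on the order.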
What ad networks may know…
Personal data:
 Email address
 Full name
 Mailing address (street, city, state, and Zip code)
 Phone number
Transactional data:
 Details of plane trips
 Search phrases used at search engines
 Health conditions
“It was not necessary for me to click on the banner ads for information to be sent to DoubleClick servers.”
– Richard M. Smith
Offline data goes online…
•The Stop and Shop grocery store began posting purchase information for customers who had frequent shopper cards
•The Cranor family’s 25 most frequent grocery purchases (sorted by nutritional value)!
Spyware
Spyware: Software that employs a user's Internet connection, without their knowledge or explicit permission, to collect info
 Most products use a pseudonymous but unique ID
Over 800 known freeware and shareware products contain spyware, for example:
 Beeline Search Utility
 GoZilla Download Manager
 Comet Cursor
Often difficult to uninstall!
Anti-Spyware Sites:
 http://grc.com/oo/spyware.htm
 http://www.adcop.org/smallfish
 http://www.spychecker.com
 http://cexx.org/adware.htm
Devices that monitor you
Creative Labs Nomad JukeBox
 Music transfer software reports all uploads to Creative Labs.
 http://www.nomadworld.com
Sony eMarker
 Lets you figure out the artist and title of songs you hear on the radio, and keeps a personal log of all the music you like on the eMarker Web site.
 http://www.emarker.com
Sportbrain
 Monitors daily workout. Custom phone cradle uploads data to company Web site for analysis.
 http://www.sportbrain.com/
:CueCat
 Keeps personal log of advertisements you’re interested in.
 http://www.crq.com/cuecat.html
See http://www.privacyfoundation.org/
HTTP request that sets a Cookie
Web Site which uses Cookies
Web Page Request Header
GET /models/model_overview.asp?ModelName=S2000 HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-shockwave-flash, */*
Referer: http://www.google.com/search?hl=en&q=s2000
Accept-Language: en-us
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
Host: automobiles.honda.com
Proxy-Connection: Keep-Alive
Web Page Response Header
HTTP/1.1 200 OK
Server: Microsoft-IIS/5.0
Date: Wed, 05 Oct 2005 19:51:07 GMT
X-Powered-By: ASP.NET
P3P: policyref="http://www.honda.com/w3c/privacy.xml", CP="IDC DSP COR ADMi DEVi TAIa PSAi PSDi IVAi CONi OUR SAMi IND PHY ONL COM NAV STA"
pragma: no-cache
cache-control: no-store
Content-Length: 1435
Content-Type: text/html
Expires: Sat, 18 Jan 1997 17:36:16 GMT
Set-Cookie: bhCookieSaveSess=1; path=/
Set-Cookie: bhCookieSess=1; path=/
Set-Cookie: bhCookiePerm=1; expires=Fri, 07-Oct-2005 19:51:06 GMT; path=/
Set-Cookie: BrowserInfo=VBScript=True&BrowserOS=Win&Crawler=False&BrowserVer=6&BrowserName=IE; path=/
Cache-control: private
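
The response above sets both kinds of cookie defined earlier. As a rough sketch, the code below splits two of its Set-Cookie values by hand and labels each one. The helper `parse_set_cookie` is a hypothetical name, and treating "has an expires attribute" as the only sign of a persistent cookie is a simplification.

```python
def parse_set_cookie(header_value):
    """Split one Set-Cookie value into name, value, and lower-cased attributes."""
    parts = [part.strip() for part in header_value.split(";") if part.strip()]
    name, _, value = parts[0].partition("=")
    attributes = {}
    for part in parts[1:]:
        key, _, attr_value = part.partition("=")
        attributes[key.strip().lower()] = attr_value.strip()
    return name, value, attributes


# Values copied from the response header above.
set_cookie_headers = [
    "bhCookieSess=1; path=/",
    "bhCookiePerm=1; expires=Fri, 07-Oct-2005 19:51:06 GMT; path=/",
]

kinds = {}
for header in set_cookie_headers:
    name, value, attributes = parse_set_cookie(header)
    # Simplification: no expiration date means a session cookie.
    kinds[name] = "persistent" if "expires" in attributes else "session"
print(kinds)
```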
HTTP request after the cookie is set
Web Site which uses Cookies
Request Header after cookie is set
GET /models/model_overview.asp?ModelName=S2000&bhcp=1 HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-shockwave-flash, */*
Accept-Language: en-us
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
Host: automobiles.honda.com
Proxy-Connection: Keep-Alive
Cookie: bhCookieSaveSess=1; bhCookieSess=1; bhCookiePerm=1; BrowserInfo=VBScript=True&BrowserOS=Win&Crawler=False&BrowserVer=6&BrowserName=IE; bhResults=bhjs=1; bhPrevResults=bhjs=1
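
To see what the browser is replaying here, the Cookie header can be split into its name and value pairs. This is a hand-rolled sketch with a hypothetical helper name. It also assumes the BrowserInfo value can be read as a query string, which its `key=value&key=value` shape suggests but the source does not state.

```python
from urllib.parse import parse_qs


def parse_cookie_header(header_value):
    """Split a Cookie header into a dict, cutting each pair at its first '='."""
    cookies = {}
    for pair in header_value.split(";"):
        name, _, value = pair.strip().partition("=")
        if name:
            cookies[name] = value
    return cookies


# Value copied from the request header above.
cookie_header = (
    "bhCookieSaveSess=1; bhCookieSess=1; bhCookiePerm=1; "
    "BrowserInfo=VBScript=True&BrowserOS=Win&Crawler=False&BrowserVer=6&BrowserName=IE; "
    "bhResults=bhjs=1; bhPrevResults=bhjs=1"
)

cookies = parse_cookie_header(cookie_header)
# Assumption: BrowserInfo is formatted like a URL query string.
browser_info = parse_qs(cookies["BrowserInfo"])
print(sorted(cookies))
print(browser_info)
```

Every cookie set in the earlier response comes back here, along with two more (bhResults, bhPrevResults) that were not in that response header.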