Understanding Big Data - University of Scranton

By: Paul Kenosky








Big Data Defined
Big Data Challenges
Increase in Technology
Characteristics of Big Data
Fraud Detection
Social Media
Hadoop
BigInsights

Understanding Big Data
 Big Data applies to information that can't be
processed or analyzed using traditional processes
or tools.

Wikipedia
 Big data is the term for a collection of data sets so
large and complex that it becomes difficult to
process using on-hand database management
tools or traditional data processing applications.



Businesses face big data challenges more and
more in today's world.
They are overloaded with information that
could be beneficial to the organization.
However, they do not know how to make use
of the raw and unstructured data.

Interconnectivity:
 More and more systems, people, and technology
are becoming interconnected

Inexpensive:
 Integrated circuits are continually becoming
cheaper to produce and buy
 This allows intelligence to be added to many
devices for which it once seemed too costly
Example: railway cars have hundreds of sensors.
 Sensors can track things such as the conditions
experienced by the rail car, the state of individual
parts, and GPS-based data for shipment tracking
 As technology advances, these rail cars are
becoming more sophisticated, and sensors are added to
collect data on parts that are prone to wear, so they
can be replaced before they fail
 Data is also collected from the rails, railroad crossing sensors,
weather patterns that cause rail movements, cargo
location, and cargo arrival and departure times
 Processing all this data using a traditional relational
system would be impractical if not impossible
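
The idea of replacing worn parts before they fail can be sketched in a few lines. This is only an illustration: the Reading record, the wear score between 0.0 and 1.0, and the 0.8 threshold are assumptions made for the example, not details from the slides.

```python
from dataclasses import dataclass

@dataclass
class Reading:
    car_id: str
    part: str
    wear: float  # hypothetical wear score: 0.0 (new) to 1.0 (failed)

WEAR_THRESHOLD = 0.8  # assumed cutoff for scheduling a replacement

def parts_to_replace(readings, threshold=WEAR_THRESHOLD):
    """Return sorted (car_id, part) pairs whose latest reading meets the threshold."""
    latest = {}
    for r in readings:  # later readings overwrite earlier ones for the same part
        latest[(r.car_id, r.part)] = r.wear
    return sorted(key for key, wear in latest.items() if wear >= threshold)

readings = [
    Reading("car-17", "wheel bearing", 0.55),
    Reading("car-17", "brake pad", 0.83),
    Reading("car-42", "coupler", 0.40),
    Reading("car-17", "wheel bearing", 0.91),
]
flagged = parts_to_replace(readings)
print(flagged)
```

With only four readings this runs anywhere. The difficulty described above comes from doing the same thing for hundreds of sensors per car, across every car, rail, and crossing.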



Volume:
 The amount of data being stored today is increasing at an
overwhelming rate
 Booking a flight, posting to Facebook, sending a
text, and more all generate data

Variety:
 Represents all types of data

Velocity:
 How quickly data arrives and is stored and analyzed

Transactions
 Online auctions, insurance claims
 A big data platform can present opportunities to
increase detection success
 Patterns of fraud can come and go in hours, days,
or weeks.
 If the data behind a fraud detection model is not available
with low latency, the damage is already done by the
time a new pattern is discovered

An estimated 20% of the available information that
could be useful for fraud detection is actually being used

Why not load the other 80 percent of the data into the
traditional analytic warehouse?
 Too expensive

Would it not pay for itself?
 How can we be sure this new information will be valuable
before making a costly business decision?
 Use BigInsights to provide an elastic and cost-effective
repository to establish which of the remaining 80 percent
of the information is useful for fraud modeling.




IBM teamed up with a large credit card issuer
to improve its fraud detection model.
They discovered they could improve the
speed of detection and get more accurate
results using the new model.
A process that once took three weeks was
reduced to just a few hours.
They also found that about half of the unused 80%
was actually beneficial information that could
be used.
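
The percentages from the fraud detection slides can be combined into one figure. The snippet below is just that arithmetic, treating the estimates of 20%, 80%, and "about half" as exact for illustration.

```python
used = 0.20                    # share of useful information already in use
unused = 1.0 - used            # the remaining 80 percent
useful_unused = 0.5 * unused   # "about half" of the 80 percent proved beneficial
total_useful = used + useful_unused

print(f"Usable share of available information: {total_useful:.0%}")
```

Under these assumptions, roughly three times as much of the available information becomes usable for fraud modeling as before.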
Organizations can apply a Big Data usage pattern to
social media to find out what is being said about
the company and its competitors
 This information can be used to significantly
improve decision making
 IBM has built a solution to accelerate an
organization's use of this pattern, called Cognos Consumer
Insights (CCI)
 CCI allows an organization to see what people
are saying, how topics are trending in social
media, and other things that affect the
organization
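
To make "how topics are trending" concrete, here is a toy sketch that counts mentions of a keyword per day. It does not show how CCI works; the posts, dates, the "Acme" brand, and the mentions_by_day helper are all made up for the example.

```python
from collections import Counter

def mentions_by_day(posts, keyword):
    """Count, per day, the posts whose text contains the keyword (case-insensitive)."""
    counts = Counter()
    for day, text in posts:
        if keyword.lower() in text.lower():
            counts[day] += 1
    return dict(sorted(counts.items()))

# Hypothetical (day, text) pairs standing in for a social media feed.
posts = [
    ("2014-03-01", "Loving the new Acme packaging"),
    ("2014-03-01", "Flight delayed again"),
    ("2014-03-02", "acme changed their box, not a fan"),
    ("2014-03-02", "Why did ACME switch packaging?"),
]
trend = mentions_by_day(posts, "acme")
print(trend)
```

A rising count from one day to the next is the simplest signal that a topic is trending. It shows what is being said, not why.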




Although you can find out what people are
saying, a more important question
is why they are saying it and behaving
in this way.
An organization needs to look beyond the social media
data to answer that question.
Sales, promotions, loyalty programs,
merchandising mix, competitor actions, and
even weather can come into play.




A company introduced a different kind of
packaging for one of its products.
Customers gave negative feedback on
the new packaging.
Months later the company discovered the
problem and switched to an
eco-friendly package.
This in turn increased sales and customer
satisfaction.
One of the book's authors is a prolific Facebook
poster
 Traveling on airlines is essential to his job, and
after a number of flight delays he posted his
frustration with the airline on his Facebook
wall
 The airline found these posts on his Facebook
wall and contacted him
 Although the book doesn't mention what the airline
did to compensate him or fix the problem, it does
show one thing: the company was
listening






Hadoop is a top-level Apache project and is open
source
It is designed to scan through large data sets and produce
its results through a highly scalable, distributed batch
processing system
Data is stored redundantly in multiple places across
the cluster
The programming model is built to expect failures,
and it resolves them automatically by running
portions of the program on various servers.
Hardware components might fail, but thanks to this
redundancy Hadoop can provide fault tolerance
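
The redundancy idea can be illustrated with a toy, single-process sketch. It is not Hadoop code: real Hadoop distributes storage and processing across machines, while here the block names, node names, and the failed node are all hypothetical. Each block of input is stored on two nodes, one node is marked as failed, and a word count still completes by reading the surviving replica.

```python
from collections import Counter

# Hypothetical cluster: every block of input lives on more than one node.
BLOCKS = {
    "block-0": {"text": "big data is big", "replicas": ["node1", "node2"]},
    "block-1": {"text": "data is everywhere", "replicas": ["node2", "node3"]},
}
FAILED_NODES = {"node1"}  # pretend this node's hardware has failed

def map_block(block):
    """Map step: count words in one block on the first replica that is still alive."""
    for node in block["replicas"]:
        if node not in FAILED_NODES:
            return node, Counter(block["text"].split())
    raise RuntimeError("all replicas of this block are lost")

def word_count(blocks):
    """Reduce step: merge the per-block counts into one result."""
    total = Counter()
    nodes_used = []
    for name in sorted(blocks):
        node, partial = map_block(blocks[name])
        nodes_used.append(node)
        total += partial
    return total, nodes_used

counts, nodes_used = word_count(BLOCKS)
print(dict(counts), nodes_used)
```

The job never notices that node1 is down, because block-0 is simply processed from its copy on node2.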





Hadoop can be complex to install, configure, and
administer
IBM takes this complexity away with the BigInsights
installer
BigInsights makes it simpler for people to use Hadoop
and build big data applications.
It enhances this open source technology to withstand
the demands of your enterprise, adding
administrative, discovery, development, provisioning,
and security features, along with best-in-class
analytical capabilities from IBM Research.
The result is that you get a more developed and user-friendly
solution for complex, large scale analytics.
http://www01.ibm.com/software/data/infosphere/biginsights/index.html
 http://en.wikipedia.org/wiki/Big_data
 http://www.decalsplanet.com/item-10485black-pot-of-gold.html
 http://drshocker.blogspot.com/2007_03_01_archive.html
 http://www.mytinyphone.com/wallpaper/31448/
 https://www.facepunch.com/showthread.php?t=1332655



Short YouTube video that explains Big Data
Some interesting stories the speaker went
over


Bats flying around airports
The bats produced noise in the airports' data, and the airports filtered this
noise out
 Weather patterns
 Airplane movement

15 years later scientists got together
 Airports had been collecting data on bat migration
 and throwing this data away

One man's garbage is another man's treasure

Gates Foundation
 Goal: eradicate polio in Nigeria

Satellite maps
 Found villages no one knew of
 The government did not know these people were there
 No maps showed these villages
The Gates Foundation gave out GPS phones to polio eradication
workers
 Combining satellites, vaccines, and cell phones is not
something that comes to mind when thinking of big
data
 Problems are caused by misinformation or by getting the
information too late



http://motherboard.vice.com/blog/big-dataexplained-brilliantly-in-one-short-video
http://www.netanimations.net/Movingvampire-bat-and-Dracula-blood-suckinganimations.htm
http://www.nbcnews.com/id/37086846#.Uxd7-YXpbYg