vanets.vuse.vanderbilt.edu


Memcached
Jenna Careccia
Matt Chambers
John Francis
Patrick McGannon
Richard Whalen
TutorialCachingStory
This is a story of Caching
Written by: Dormando
Edited by: Brian Moon and Emufarmers
http://code.google.com/p/memcached/wiki/TutorialCachingStory
Two plucky adventurers, Programmer and Sysadmin, set out
on a journey. Together they make websites. Websites with
webservers and databases.
Users from all over the Internet talk to the webservers and ask
them to make pages for them. The webservers ask the
databases for junk they need to make the pages. Programmer
codes, Sysadmin adds webservers and database servers.
One day the Sysadmin realizes that their database is sick! It's
spewing bile and red stuff all over! Sysadmin declares it has a
fever, a load average of 20! Programmer asks Sysadmin,
"well, what can we do?"
Sysadmin says, "I heard about this great thing called
memcached. It really helped livejournal!" "Okay, let's try it!"
says the Programmer.
Our plucky Sysadmin eyes his webservers, of which he has
six. He decides to use three of them to run the 'memcached'
server.
Sysadmin adds a gigabyte of ram to each webserver, and
starts up memcached with a limit of 1 gigabyte each. So he
has three memcached instances, each can hold up to 1
gigabyte of data.
So the Programmer and the Sysadmin step back and behold
their glorious memcached!
"So now what?" they say, "it's not DOING anything!" The
memcacheds aren't talking to anything and they certainly don't
have any data. And NOW their database has a load of 25!
Our adventurous Programmer grabs the pecl/memcache client
library manual, which the plucky Sysadmin has helpfully
installed on all SIX webservers. "Never fear!" he says. "I've got
an idea!"
He takes the IP addresses and port numbers of the THREE
memcacheds and adds them to an array in PHP.
Then he makes an object, which he cleverly calls
'$memcache'.
Now Programmer thinks. He thinks and thinks and thinks. "I
know!" he says. "There's this thing on the front page that runs
SELECT *
FROM hugetable
WHERE timestamp > lastweek
ORDER BY timestamp ASC
LIMIT 50000;
and it takes five seconds!"
"Let's put it in memcached," he says. So he wraps his code for
the SELECT and uses his $memcache object. His code asks:
Are the results of this select in memcache?
If not, run the query,
take the results,
and PUT it in memcache!
Like so:
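The wrapper can be sketched in Python (the story uses PHP's pecl/memcache, so here a plain dict stands in for the $memcache object, and run_query is a stub for the five-second SELECT):

```python
# Cache-aside sketch: a dict stands in for the memcached cluster and
# run_query() for the slow SELECT; both are illustrative stand-ins.
cache = {}

def run_query():
    # Stand-in for: SELECT * FROM hugetable WHERE timestamp > lastweek
    #               ORDER BY timestamp ASC LIMIT 50000;
    return ["row1", "row2", "row3"]

def get_front_page_data():
    key = "huge_data_for_front_page"
    data = cache.get(key)      # are the results of this select in memcache?
    if data is None:
        data = run_query()     # if not, run the query,
        cache[key] = data      # and PUT it in memcache!
    return data
```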
Programmer pushes code. Sysadmin sweats. BAM! DB load is
down to 10! The website is pretty fast now. So now, the
Sysadmin puzzles, "What the HELL just happened!?"
"I put graphs on my memcacheds! I used cacti, and this is
what I see! I see traffic to one memcached, but I made three
:(."
So, the Sysadmin quickly learns the ascii protocol and telnets
to port 11211 on each memcached and asks it:
"Hey,
'get huge_data_for_front_page'
are you there?"
The first memcached does not answer...
The second memcached does not answer...
The third memcached, however, spits back a huge glob of
crap into his telnet session! There's the data! Only one
memcached has the key that the Programmer cached!
Puzzled, he asks on the mailing list. They all respond in
unison, "It's a distributed cache! That's what it does!" But what
does that mean?
Still confused, and a little scared for his life, the Sysadmin
asks the Programmer to cache a few more things.
"Let's see what happens. We're curious folk. We can figure
this one out," says the Sysadmin.
"Well, there is another query that is not slow, but is run 100 times
per second. Maybe that would help," says the Programmer. So he
wraps that up like he did before. Sure enough, the server load
drops to 8!
So the Programmer codes more and more things get cached.
He uses new techniques. "I found them on the list and the faq!
What nice blokes," he says.
The DB load drops; 7, 5, 3, 2, 1!
"Okay," says the Sysadmin, "let's try again." Now he looks at
the graphs. ALL of the memcacheds are running! All of them
are getting requests! This is great! They're all used!
So again, he takes keys that the Programmer uses and looks for
them on his memcached servers. 'get this_key' 'get that_key' But
each time he does this, he only finds each key on one
memcached!
Now WHY would it do this, he thinks? And he puzzles all
night. That's silly! Don't you want the keys to be on all the
memcacheds?
"But wait", he thinks "I gave each memcached 1 gigabyte of
memory, and that means, in total, I can cache three gigabytes of my
database, instead of just ONE! Oh man, this is great," he thinks.
"This'll save me a ton of cash. Brad Fitzpatrick, I love your ass!"
"But hmm, the next problem, and this one's a puzzler, this
webserver right here, this one running memcached it's old, it's
sick and needs to be upgraded.
But in order to do that I have to take it offline! What will
happen to my poor memcache cluster? Eh, let's find out," he
says, and he shuts down the box.
Now he looks at his graphs. "Oh noes, the DB load, it's gone up in
stride! The load isn't one, it's now two. Hmm, but still tolerable. All
of the other memcacheds are still getting traffic. This ain't so bad.
Just a few cache misses, and I'm almost done with my work."
So he turns the machine back on, and puts memcached back
to work. After a few minutes, the DB load drops again back
down to 1, where it should always be.
"The cache restored itself! I get it now. If it's not available it
just means a few of my requests get missed. But it's not
enough to kill me. That's pretty sweet."
So, the Programmer and Sysadmin continue to build websites.
They continue to cache. When they have questions, they ask
the mailing list or read the faq again. They watch their graphs.
And all live happily ever after.
What is it?
Memcached is an in-memory key-value store for
small chunks of arbitrary data (strings, objects)
from results of database calls, API calls, or page
rendering.
Memcached introduction (video)
History
● Created for LiveJournal.com, an early
blogging platform.
● Brad Fitzpatrick was the primary developer.
● Originally implemented in Perl, but converted
to C for speed reasons.
What is it composed of?
• Client software, which is given a list of available memcached servers.
• A client-based hashing algorithm, which chooses a server based on the "key" input.
• Server software, which stores your values with their keys into an internal hash table.
• Server algorithms, which determine when to throw out old data (if out of memory), or reuse memory.
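The second bullet is what surprises the Sysadmin in the story. A rough Python sketch of that client-side step (server addresses are hypothetical, and real clients typically use a more elaborate scheme such as consistent hashing rather than plain modulo):

```python
import zlib

# Hypothetical server list, mirroring the story's three memcacheds.
servers = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]

def pick_server(key: str) -> str:
    # Classic modulo hashing: hash the key, take it mod the number of
    # servers. Each key therefore lives on exactly one server, which
    # is why each key is found on only one memcached.
    h = zlib.crc32(key.encode("utf-8"))
    return servers[h % len(servers)]
```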
Popularity & Integration
Because of its popularity, memcached support is built into common
web frameworks such as Django and Rails.
A simple entry in a settings file is enough; Django then handles the
interaction with memcached and the serialization of objects automatically.
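In Django, for example, that settings-file entry can look like the following sketch (host and port are illustrative):

```python
# settings.py fragment (hypothetical host/port). With this in place,
# Django routes its cache framework through memcached and serializes
# cached objects itself.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",
    }
}
```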
Before Memcached ...
function get_foo(int userid) {
    data = db_select("SELECT * FROM users WHERE userid = ?", userid);
    return data;
}
... and After Memcached (pt. 1)
function get_foo(int userid) {
    /* first try the cache */
    data = memcached_fetch("userrow:" + userid);
    if (!data) {
        /* not found : request database */
        data = db_select("SELECT * FROM users WHERE userid = ?", userid);
        /* then store in cache until next get */
        memcached_add("userrow:" + userid, data);
    }
    return data;
}
... and After Memcached (pt. 2)
function update_foo(int userid, string dbUpdateString) {
    /* first update database */
    result = db_execute(dbUpdateString);
    if (result) {
        /* database update successful : fetch data to be stored in cache */
        data = db_select("SELECT * FROM users WHERE userid = ?", userid);
        /* the previous line could also look like:
           data = createDataFromDBString(dbUpdateString); */
        /* then store in cache until next get */
        memcached_set("userrow:" + userid, data);
    }
}
Brad Fitzpatrick on Memcached at
LiveJournal
“Our current setup is way faster than MySQL even on memory. MySQL has
to parse queries, form an optimizer plan, seek around in b-tree indexes,
keeps lots of metadata, etc... And if you do use the MySQL in-memory
tables, they have to be fixed-size records and only up to 2 GB of cache.
Our setup lets us scale to any amount of memory (add more machines to the
pool), and if one dies, there's no pain.. only lost a small fraction, and soon
the hit rates are up again, because those requests are now evenly hashed
against the remaining alive machines… We have 14 GB of cache online
right now and we're getting about an 85% cache hit rate. So basically the
databases aren't doing anything but writes.”
http://lj-dev.livejournal.com/539656.html
Client Software
• Given list of memcached servers
• Responsible for hashing keys to select servers
• May compress data sent to servers
• Available in multiple languages, including
  o C/C++
  o PHP
  o Java
  o Python
  o Ruby
Failover or Failure?
• Client software either uses failover or failure to handle unavailable servers
• Failover
  o If server is unavailable, reassign keys to next available server
  o Problem - when server comes back online, client will retrieve old data
• Failure
  o Treat requests to unavailable server as cache misses
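A small Python sketch of the two policies, using simple modulo hashing and hypothetical server names, shows why failover can hand back stale data:

```python
import zlib

def pick(servers, key):
    # Modulo hashing over whatever server list we are given.
    return servers[zlib.crc32(key.encode()) % len(servers)]

def get(key, servers, alive, mode):
    """Sketch of the two client policies for a down server.

    mode="failure":  a request to a dead server is just a cache miss.
    mode="failover": rehash the key onto the remaining live servers.
    The catch with failover: when the dead server returns, the key
    hashes back to it, and the client may read a stale, pre-outage
    value that server still holds.
    """
    home = pick(servers, key)
    if home in alive:
        return ("hit-or-miss on", home)
    if mode == "failure":
        return ("miss", None)
    return ("hit-or-miss on", pick(alive, key))
```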
Server Software
• Stores your values with their keys into an internal hash table
• Servers are generally unaware of each other
• Least Recently Used (LRU) cache by default
• Free space lazily reclaimed
• Listens on TCP and UDP port 11211 by default
• Defaults to 4 worker threads per server
Key-Value Store Properties
Items are made up of:
• a key
• an expiration time
• optional flags
• raw data (memcached is type independent and stores values as raw, uninterpreted bytes; serialization is the client's job)
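As a sketch (field and method names here are illustrative, not memcached's internal structure):

```python
from dataclasses import dataclass
import time

@dataclass
class Item:
    # The four parts of an item named above.
    key: str
    expires_at: float   # expiration time (0 = never expires)
    flags: int          # optional client-defined flags
    data: bytes         # raw bytes; memcached never interprets them

    def expired(self, now=None):
        now = time.time() if now is None else now
        return self.expires_at != 0 and now >= self.expires_at
```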
Implementation Details
- Written in C as a single-process, asynchronous I/O, event-based daemon
  (originally single-threaded; modern versions use a small pool of worker threads)
- All commands are designed to be O(1)
Server Configuration
• Not usually exposed directly to the internet
  o Memcached has limited built-in security
• Each server has about the same amount of dedicated RAM
  o Lessens impact of an individual server going down
• Assign slightly less memory than available to avoid swapping
• Typical layouts
  o Memcached running on webservers
  o Memcached running on dedicated hosts
Server Configuration (cont'd)
o Memcached running on webservers
  - Benefit - memory spread out more, so loss of a single server has less impact
  - Disadvantage - increased webserver memory usage could cause memcached to use swap
o Memcached running on dedicated hosts
  - Benefit - only memcached is using memory, so memory swapping less likely
  - Disadvantage - more memory concentrated on each server, so loss of server has more impact
Server Software Internals
• RAM allocated for memcached is partitioned into 1 MB pages
• Each page is permanently assigned to a "slab" class when needed
• Each slab class specifies a chunk size
  o A page assigned to a slab class will be divided into chunks of the size specified by the slab class
• Key/value pair placed in slab class with closest fit
  o Can think of each slab class as an individual LRU cache
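A toy model of the closest-fit rule (these chunk sizes are illustrative; the real sizes come from memcached's configurable growth factor):

```python
# Illustrative chunk sizes, in bytes, smallest to largest.
SLAB_CLASSES = [96, 120, 152, 192, 304, 480, 752, 1184]

def slab_class_for(item_size: int) -> int:
    # An item goes into the smallest class whose chunk still fits it
    # ("closest fit"); the unused tail of the chunk is wasted space.
    for chunk in SLAB_CLASSES:
        if item_size <= chunk:
            return chunk
    raise ValueError("item larger than largest chunk size")
```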
AWS ElastiCache
Amazon Web Services for scalable memcached and Redis hosting.
Advantages of AWS ElastiCache
● Simple to set up and use
○ Maintaining your own memcached cluster can be a
big pain.
● Elastic
● All of the common AWS advantages (Reliability, Cost
Effective, etc.)
● Protocol compatibility with memcached or Redis.
How ElastiCache Works
● Deploy nodes as memcached endpoints.
Each node has a size similar to EC2 instance sizes.
● Can scale to as many as you need
elastically.
● Each node has an endpoint which is a
separate URL for the node.
Setting ElastiCache up with EC2. (w/ demo)
● Go to ElastiCache in the AWS Management Console and choose “Launch
Cache Cluster.”
● Fill in the settings to your preference and start the cluster.
● Now, in the ElastiCache console, go to the nodes and find the “endpoint.”
Save this for later.
● On the EC2 instance, install telnet, then run: telnet endpointaddress node-port.
● Common error: the cache node and the EC2 instance must be in the same
security group, and the node port must be opened in that security group's
rules.
● Now, one can use basic commands to test the cache.
● Source:
http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/Getting
Demo
API - Common Read Operations
• addServer - add new host to the memcached cluster
• fetch / fetchAll - request next or all items from an asynchronous request
• get - get an item from the cache or return false if the item does not exist
• getDelayed - request items to be retrieved asynchronously
• getResultCode - get code from last operation
• getResultMessage - get message describing last result code
• getStats - get server stats
API - Common Write Operations
• add - add item to cache or return false if item already exists
• append / prepend - modify existing string value with a new string; compression not supported
• cas - compare and swap value if no other client has modified the value; useful for concurrent access
• decrement / increment - modify existing numeric value
• delete - remove item from cache, optionally marking it to not be recacheable for a certain time
• flush - remove all items from the cache
• replace - replace an existing value or return false if item does not already exist
• set - set a new value or replace an existing value
• touch - modify expiration time of an existing value
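The differing guarantees of add, set, replace, get, and delete can be illustrated with a tiny in-memory stand-in for a client (not a real memcached client):

```python
# Toy in-memory stand-in, just to show the semantics of the
# operations listed above.
class FakeMemcache:
    def __init__(self):
        self.store = {}

    def add(self, key, value):
        # add fails if the key already exists
        if key in self.store:
            return False
        self.store[key] = value
        return True

    def set(self, key, value):
        # set is an unconditional write
        self.store[key] = value
        return True

    def replace(self, key, value):
        # replace fails unless the key already exists
        if key not in self.store:
            return False
        self.store[key] = value
        return True

    def get(self, key):
        # a missing key reads back as False, mirroring the slide
        return self.store.get(key, False)

    def delete(self, key):
        return self.store.pop(key, None) is not None
```

add only succeeds for new keys, replace only for existing ones, and set always succeeds; a missing key reads back as false, which is why application code checks the get result before falling back to the database.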
Client code
Time the query
Put the query in cache
Time retrieval from cache
memcache location could be:
● localhost
● another server
● a group of servers
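A Python sketch of that experiment, with a dict standing in for the memcache location and a sleep standing in for the slow query:

```python
import time

# Stand-in for the memcached cluster (localhost, another server, or a
# group of servers in a real client).
cache = {}

def slow_query():
    time.sleep(0.05)            # pretend this is the expensive SELECT
    return ["row"] * 3

def timed(fn, *args):
    # Return the function's result and how long it took.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

def cached_query():
    data = cache.get("front_page")
    if data is None:
        data = slow_query()
        cache["front_page"] = data
    return data

# First call pays for the query; the second is served from cache.
_, cold = timed(cached_query)
_, warm = timed(cached_query)
```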