Facebook trapped in MySQL ‘fate worse than death’
Posted on 07/07/2011 8:55:49 PM PDT by TenthAmendmentChampion
According to database pioneer Michael Stonebraker, Facebook is operating a huge, complex MySQL implementation equivalent to "a fate worse than death," and the only way out is to bite the bullet and rewrite everything.
Not that it's necessarily Facebook's fault, though. Stonebraker says the social network's predicament is all too common among web startups that start small and grow to epic proportions.
During an interview this week, Stonebraker explained to me that Facebook has split its MySQL database into 4,000 shards in order to handle the site's massive data volume, and is running 9,000 instances of memcached in order to keep up with the number of transactions the database must serve. I'm checking with Facebook to verify the accuracy of those numbers, but Facebook's history with MySQL is no mystery.
The oft-quoted statistic from 2008 is that the site had 1,800 servers dedicated to MySQL and 805 servers dedicated to memcached, although multiple MySQL shards and memcached instances can run on a single server. Facebook even maintains a "MySQL at Facebook" page dedicated to updating readers on the progress of its extensive work to make the database scale along with the site...
(Excerpt) Read more at gigaom.com ...
Shocking. NOT. No replacement for Oracle.
Flame-retardant equipment deployed here for what will be an incendiary thread. And it's ess-que-ell not sequel. :-) Harrumph!
I think it means spaghetti code.
But in query form.
I run MySQL and have dealt a little with memcached for a ticketing database. The problems Facebook might have depend on which storage engine they run, likely InnoDB, whose tables are huge relational glop files. I run MyISAM tables for speed and because I’m only dealing with a gigabyte of data.
But yes, switching to another database would require a very bug-prone rewrite that has to work around many database quirks. The language (SQL) is not implemented consistently from one database maker to the next, and the connectors are inconsistent too. They’d also be stuck with an Oracle or SQL Server license fee out the wazoo.
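To make the dialect point concrete, here is a minimal sketch of one well-known quirk: fetching the first N rows. The helper name `first_n_rows` and the table and column names are made up for illustration. The three SQL forms are the common ones for MySQL (`LIMIT`), SQL Server (`TOP`), and older Oracle releases (`ROWNUM`), but check your own server version before relying on them.

```python
def first_n_rows(dialect: str, table: str, order_by: str, n: int) -> str:
    """Build a 'first N rows' query for one of three SQL dialects.

    Hypothetical helper for illustration only: identifiers are pasted in
    directly, so it assumes trusted table and column names.
    """
    if dialect == "mysql":
        return f"SELECT * FROM {table} ORDER BY {order_by} LIMIT {n}"
    if dialect == "sqlserver":
        return f"SELECT TOP ({n}) * FROM {table} ORDER BY {order_by}"
    if dialect == "oracle":
        # Classic ROWNUM idiom: sort in a subquery first, then cut the rows.
        return (
            f"SELECT * FROM (SELECT * FROM {table} ORDER BY {order_by}) "
            f"WHERE ROWNUM <= {n}"
        )
    raise ValueError(f"unknown dialect: {dialect}")


for name in ("mysql", "sqlserver", "oracle"):
    print(name, "->", first_n_rows(name, "tickets", "created_at", 10))
```

Same question, three spellings. Multiply that by every query in a big application and the rewrite cost becomes clear.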
Basically yes but a bit more complex. Easily translated: you do not run REALLY large operations on freeware. You get what you pay for. Oracle is the best. Become one with the Borg.
IOW....it would be a ‘hacker’s delight’???
So rather than update, patches are applied to the output to make changes when served. And it's working ok for the moment, but it is swiftly reaching the horizon of functionality. At a growth rate of 52,000 items a year, we've at most two years left of the database before it becomes impossibly bogged down, without throwing extensive hardware at the problem.
And we're a tiny company of just ten people.
ROFL!!!! That alone is contentious enough to get us through several pages of debate/flame wars!
For the record, I've always referred to it as 'sequel'. Not passionate about it, just seems to roll off the tongue a bit easier spoken that way.
The problem is common, and has nothing to do with MySQL, PostgreSQL, or Oracle.
It has to do with lazy barstids that didn't do the work up front and became frantic barstids, trying to keep the thing working.
It's a common engineering problem. Systems Engineering, not just for aerospace.
Someone has to ask (and answer) "what is the growth path?"
I'm doing engineering on a start-up now. And we're good on growth path until I sell out or die. And then I don't care.
What's the backend RDBMS here on FR?
Of course, Perl is the glue but I wonder if the database access is via the DBI layer or if it uses the native APIs?
Facebook started their enterprise using a database called MySQL. (SQL is an acronym meaning Structured Query Language.)
They didn't see the potential of their enterprise and stuck with MySQL and when the thing expanded exponentially they were caught unawares. Call it a lack of vision.
Now, their database isn't beefy enough (supposedly) to handle all the daily transactions it must deal with. They are breaking the thing down into smaller chunks so it doesn't slow to a crawl and piss off the kiddies who simply MUST get their Facebook fix every 5 minutes.
Or something like that.
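For anyone wondering what "breaking the thing down into smaller chunks" looks like in practice, here is a minimal sketch of hash-based shard routing. It is an assumption for illustration, not Facebook's actual scheme: the function name `shard_for` is made up, and the 4,000 figure is just the unverified number quoted in the article.

```python
import hashlib

NUM_SHARDS = 4000  # figure quoted in the article; unverified


def shard_for(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard number in the range [0, num_shards).

    Hashing first spreads sequential IDs evenly across shards; the
    modulo then picks the database that holds that user's rows.
    """
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Every query for a given user goes to the same shard.
print(shard_for(42) == shard_for(42))
```

The catch with a plain modulo is that changing the shard count remaps most users, which is one reason a setup like this is painful to grow or rewrite later.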
They should have migrated to Oracle a long time ago.
I manage an Informix database of up to 1.5 million items, with large associated blob items, and it runs like a charm on a modest HP Unix server.
Makes me wonder about the tiny little MySQL stuff I play with on the side.
Heh! Just wait until we start on brace and indentation styles (K&R vs. Allman)! And of course, there's always vi vs. emacs -- but that battle has been won since vim came along. Perl vs. Python. C# vs. Java...
:-) * 10e6
>>Stonebraker explained to me that Facebook has split its MySQL database into 4,000 shards in order to handle the site's massive data volume, and is running 9,000 instances of memcached in order to keep up with the number of transactions the database must serve.<<
Over a year ago I read that FB had (at that time) 30,000 servers, and hosted more pix than all other sites combined. As the old saying goes, it's not that the dancing bear dances well, it's that he dances at all.
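The memcached half of that setup usually follows the cache-aside pattern: look in the cache first, and only hit the database on a miss. Here is a minimal sketch, with a plain dict standing in for memcached and a callback standing in for the MySQL query. The class name `CacheAside` and its methods are hypothetical, not any real client's API.

```python
class CacheAside:
    """Read-through helper: check the cache, fall back to the database."""

    def __init__(self, load_from_db):
        self._cache = {}           # stand-in for a memcached client
        self._load = load_from_db  # stand-in for a MySQL query
        self.db_reads = 0          # counts how often the database was hit

    def get(self, key):
        if key in self._cache:
            return self._cache[key]  # cache hit: no database work
        self.db_reads += 1
        value = self._load(key)      # cache miss: go to the database
        self._cache[key] = value
        return value

    def invalidate(self, key):
        """Drop a key after the underlying row changes."""
        self._cache.pop(key, None)


profiles = CacheAside(lambda user_id: {"id": user_id, "name": f"user{user_id}"})
profiles.get(7)
profiles.get(7)
print(profiles.db_reads)  # 1: the second read was served from the cache
```

The hard part in real life is invalidation: every write path has to remember to drop or update the cached copy, or readers see stale data.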
Oracle can scale like you wouldn’t believe.
Oracle would be nothing without the original DB designers from Digital Equipment Corporation.
Oracle the best?
Yeah, the best salesmen!
What???!! Guess the little boy-geniuses were not so smart after all?
Ack! That takes me back to newsgroup flamewars back before anything but dialup existed, and the web was just a spark waiting to become an RFC. FTP, gopher, talk... The slow old days.
Not new to people who have been around SQL and database design.
What people forget about SQL relational databases is that they were a replacement for how data was organized in the 70’s on mainframes: indexed files.
Because most kiddies today have never seen an ISAM or VSAM file, and wouldn’t know one if it were to leap up and bite them in their pampered buttocks, they have no perspective on what SQL relational databases were designed to do.
The brutal truth is pretty low-tech for most of today’s kids: SQL and RDBMS were meant to fill an organized file with data in a way that enabled the programmer to generate fast reports with queries on keyed and non-keyed fields - in other words, doing the grunt work of the business world that had outgrown such platforms as the System 34 and 36 and languages like RPG-II and RPG-III.
That’s it. As SQL progressed, transactions were added, as were some pretty powerful constructs for joins, etc. Pretty cool stuff if you’re crunching reams of nice tabular data. Let’s say you want to write up a database of auto registrations, medical records etc - RDBMS do that pretty well.
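The keyed versus non-keyed distinction still shows up in any SQL engine today. A minimal sketch using Python's built-in `sqlite3` module (chosen only because it runs anywhere; the table, index, and helper names are invented for the example): the same kind of query either walks an index or scans the whole table, depending on whether the field is keyed.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column is the readable detail


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE registrations (plate TEXT, owner TEXT, color TEXT)")
conn.execute("CREATE INDEX idx_plate ON registrations (plate)")  # the keyed field
conn.executemany(
    "INSERT INTO registrations VALUES (?, ?, ?)",
    [("ABC123", "Smith", "red"), ("XYZ789", "Jones", "blue")],
)

# Keyed lookup: the planner can use the index on plate.
keyed_plan = query_plan(
    conn, "SELECT * FROM registrations WHERE plate = ?", ("ABC123",)
)
# Non-keyed lookup: there is no index on color, so every row gets read.
unkeyed_plan = query_plan(
    conn, "SELECT * FROM registrations WHERE color = ?", ("red",)
)

print(keyed_plan)
print(unkeyed_plan)
```

The exact wording of the plan text varies between SQLite versions, but the first should mention the index and the second a scan.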
But applications like social networks..... they don’t have the nice, orderly data in columns, marching down the greenbar fanfold paper. There’s all these networks, circularities, odd bits of data with bizarre relationships that don’t fit the relational database models - not even remotely. Oracle won’t solve this problem; it will merely move the wall out a tad before they hit it.
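A small sketch of why network-shaped data is awkward in tables, again using Python's built-in `sqlite3` with a made-up `friendship` table: a "friends of friends" query needs the table joined to itself, and every extra hop in the network means yet another self-join. This is an illustration of the general problem, not a claim about how Facebook stores its graph.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friendship (user_id INTEGER, friend_id INTEGER)")

# Friendship is mutual, so each edge is stored in both directions.
edges = [(1, 2), (2, 3), (3, 4), (1, 5)]
conn.executemany(
    "INSERT INTO friendship VALUES (?, ?)",
    edges + [(b, a) for a, b in edges],
)


def friends_of_friends(conn, user_id):
    """People two hops away, excluding the user and their direct friends."""
    rows = conn.execute(
        """
        SELECT DISTINCT f2.friend_id
        FROM friendship AS f1
        JOIN friendship AS f2 ON f2.user_id = f1.friend_id
        WHERE f1.user_id = ?
          AND f2.friend_id <> ?
          AND f2.friend_id NOT IN (
              SELECT friend_id FROM friendship WHERE user_id = ?
          )
        ORDER BY f2.friend_id
        """,
        (user_id, user_id, user_id),
    ).fetchall()
    return [row[0] for row in rows]


print(friends_of_friends(conn, 1))  # [3]
```

Two hops is one self-join; three hops is two, and the intermediate result sets balloon as the network grows, which is the wall the post above is talking about.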
Kids need to crack open some books, read a bunch of code and learn some things from people who have been there, done that. But the current dot-bomb VC/startup system doesn’t think about rewarding people who do their homework and get things right. It rewards people who are the first with the crappiest. These problems aren’t new. The reason why outfits like Facebook use MySQL or any SQL-based relational DB is because they don’t know any better, they haven’t got the brains to think through the problem they have and are trying to solve. They just grab some “free” software off an FTP site and use a big enough hammer until their square peg gets broached into the round hole.
And why don’t they know any better? Because these snot-nosed twerps wasted their years in colleges fooling around with rubbish like C++, arguably the worst language to come along since.... well, since forever. I can’t think of a worse programming language, actually. Add to this that schools like Harvard waste undergrad time on such nonsense as “intelligent machines” and “privacy and technology,” both of which are simply graduate level subject areas. As far as I can see, things like databases, database schema, grunt-work business data processing... are all too mundane to receive any treatment in the CS department at Harvard, Zuckerberg’s school.
So Facebook has a DB problem. Eh, OK. Gives me another reason to drop my account. Zuckerberg’s perverted security models were my first biggest reason.
The problem with SQL, whatever the server, is that the implementations, i.e. the schemas of the tables are rarely relational, due to (1) the skills of the Indian programmers, (2) the limitations of the relational theory.
Been there, done that. Irrelational databases.
Informix, wow, didn’t even know it still existed. I did some Informix projects back in the mid 80’s. Worked great.
>>And it’s ess-que-ell not sequel.<<
I was a DB2 programmer back in the day and EVERYONE called it ess-que-ell. Then I was at a job interview at Microsoft about 6 years ago and one of the guys that interviewed me used both pronunciations, but for specific things. When we talked of mainframes and DB2, he pronounced it the old way. When dealing with Microsoft software, he called it Sequel. Who owns Microsoft ess-que-ell server? :-)
All that for ten people? Wow.
I worked at a pharmacy benefits company with a comprehensive system that managed cardholders, drugs, plans, prior authorizations, etc. I wrote the data dictionary for the system which consisted of about 200-250 tables for various purposes. Some of the logic was kludgey, some of it was pretty ingenious. It was created in Oracle and scalability never seemed to be a problem. I would guess they managed about a million cardholders all told.
The solution is IBM’s DB2...
This is actually a very interesting thread.
I know a little SQL, learned it on MySQL and Windows SQL 2000. Hell, I can barely pass a basic SQL course at the local Community College. But I passed.
Rock on you guys, you are my heroes. Fags ;) Y’all are so far over my head I bet you have nose bleeds.
+1, you win the interwebs.
Oops, sorry my reply was meant for Kingu’s post number 10.
Otherwise known as "curry code" and for good reason. Make sure you don't take IMS off their computers or they would be really lost.
Great explanation, thanks! I got the gist of the story, that Facebook uses MySQL, which is freeware and was never designed for hundreds of millions of users creating billions of transactions; however, I didn’t know that Facebook ran on such a tiny platform!!
A few years ago I worked for a huge financial company, and learned that a key list of company information was kept on the hard drive of one employee’s desktop computer, in a spreadsheet!
There are big differences in how things are done between MySQL and MS SQL Server, so I had to either write a portability layer (work that earns me no extra income) or pick a winner and be done. I picked MS SQL Server Express, and so far it seems to work well. If some customers need more performance they can always send a check to Steve Ballmer. MySQL is free, but on the other hand what you have is all that you have - and FB people realized that.
Their migration to a better RDBMS would probably require not just syntax changes that reflect SQL dialects. The whole FB code is probably awful. They have a lot of work to do. On the other hand, they probably can pay for it. The good news is that now they have a better idea about how it should have been written to begin with - and they can order it written this way.
Personally, I use Cache almost every day and as far as I am concerned it is the greatest database and programming language ever...just don't know why it doesn't catch on.
There is no doubt that the internet, from basic IRC (web chat) up through these massive social networking sites, has had an overwhelming effect on society worldwide.
I know, in my own case, it resulted in a massive and total reordering, as well as unimagined changes in my life.
I have two Facebook accounts that I rarely look at, but my girlfriend also has two, and the computer is on them 24/7, as well as playing Facebook games such as Farmville.
I sometimes wonder when the entire internet might crash from overload, and as with many disasters, it might happen from an innocuous component or link somewhere in the world.
It might be a case where all the king’s horses and all the king’s men won’t be able to put it back together again.
I didn’t know who to ping about this! So glad you were interested and I didn’t even reach out to you.
Your post explains it all very well.
I have seen it all myself.
Relational databases. Ah, the final solution to all data scheme problems.
+1 to your post!