As in any other field, if we want to win we should look at the success stories and case studies around us. In this article I would like to talk about optimization, speeding things up, and successfully handling large amounts of traffic and lots of database queries.
How many times have you bought a domain name and hosting and started a website in the hope that your idea would rock? I did it a few times without real success. Now the question is: WHY?
Suppose you want to start a social website or app: it's not enough to code the whole thing and turn it on! Imagine that your website goes viral within a few hours and your code cannot handle that much traffic. The whole system crashes and you have to rework it. That takes long enough to lose your users' interest, and it will be hard to win them back when you start again.
The key to a successful start is to think carefully about the technology you use, your hardware, the possibility of high traffic from day one, and how quickly you can upgrade your machines. In a world where you can find almost everything as open source, it's all about researching the right technologies so that you avoid future trouble.
I did some research into Facebook's technologies and I would like to share it with you. Let's look at the results.
HipHop for PHP
PHP is relatively slow compared to code that runs natively on a server. To address this, Facebook created HipHop, a tool for speeding up PHP applications. It transforms PHP code into C++, which is then compiled for better performance. Prebuilt packages of HipHop for PHP are available for Ubuntu, Debian and CentOS, and it can also be integrated with the Zend Framework.
Memcache - Distributed caching. Hit your database less. Speed up your website.
Memcache allows you to take memory from parts of your system where you have more than you need and make it accessible to areas where you have less than you need. It's free and open source, and client libraries are available for PHP, Python and other languages.
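A common way to use Memcache is the cache-aside pattern: check the cache first and query the database only on a miss. The sketch below assumes a hypothetical `InMemoryCache` class that stands in for a real memcached client (client libraries such as pymemcache expose similar `get`/`set` calls). It also assumes a hypothetical `load_from_db` function that represents the expensive query.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class InMemoryCache:
    """Hypothetical stand-in for a memcached client: get/set with a time-to-live."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired entries behave like a cache miss
            return None
        return value

    def set(self, key: str, value: Any, ttl: float = 60.0) -> None:
        self._store[key] = (value, time.monotonic() + ttl)


def get_user(cache: InMemoryCache, user_id: int,
             load_from_db: Callable[[int], dict]) -> dict:
    """Cache-aside read: try the cache first and hit the database only on a miss."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    user = load_from_db(user_id)  # the expensive query we want to avoid repeating
    cache.set(key, user, ttl=300)
    return user
```

Calling `get_user` twice with the same id runs `load_from_db` only once, as long as the cached entry has not expired. Every further request for that user is then served from memory instead of the database.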
BigPipe - Say Goodbye to Frames
BigPipe was created by Facebook to decompose web pages into small chunks called pagelets and pipeline them through the web server and the browser for better performance. For example, BigPipe splits the Facebook homepage into several pagelets (chat, news feed, ads, menu, and so on) that are generated and rendered in parallel. It also gives users a website that keeps working even when some parts are deactivated or broken.
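The core idea can be sketched as a generator with two stages. It first yields a page skeleton with empty placeholders. It then generates the pagelets in parallel and yields one chunk per pagelet as soon as that pagelet is ready. This is an illustration, not Facebook's implementation. The pagelet renderers and the client-side `onPageletArrive` JavaScript function are assumptions.

```python
import json
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterator


def bigpipe_response(pagelets: Dict[str, Callable[[], str]]) -> Iterator[str]:
    """Yield an HTML skeleton first, then one chunk per pagelet as soon as it is ready."""
    # 1. Skeleton with empty placeholders: the browser can start rendering at once.
    placeholders = "".join(f'<div id="{name}"></div>' for name in pagelets)
    yield f"<html><body>{placeholders}\n"
    # 2. Generate all pagelets in parallel and flush each one in completion order.
    with ThreadPoolExecutor() as pool:
        futures = {pool.submit(render): name for name, render in pagelets.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                html = future.result()
            except Exception:
                html = ""  # a broken pagelet stays empty; the rest of the page still works
            payload = json.dumps({"id": name, "html": html})
            # Hypothetical client-side hook that injects the HTML into its placeholder.
            yield f"<script>onPageletArrive({payload});</script>\n"
    yield "</body></html>\n"
```

A WSGI application could return this generator, with each chunk encoded to bytes, as its response body. Each chunk would then be sent to the browser as soon as it is produced, so a slow pagelet such as ads does not hold back the news feed.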
Cassandra
Cassandra was originally developed at Facebook and is now maintained by the Apache Software Foundation. It is designed to handle very large amounts of data spread across many commodity servers, which makes it a great NoSQL alternative.
Cassandra is fully distributed with no single point of failure (SPOF), multi-master and multi-datacenter, and linearly scalable. It supports datasets larger than memory, is fully durable, and offers integrated caching, tunable consistency and best-in-class performance.
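Two of these properties can be sketched in a few lines. First, Cassandra places data on a ring of nodes using consistent hashing, so data is spread across nodes without a central coordinator. Second, its tunable consistency lets each request choose how many replicas must respond. The sketch below is a simplification under stated assumptions: one token per node, MD5 as the hash, and replicas on the next nodes clockwise (roughly what Cassandra's SimpleStrategy does). It is not Cassandra's code.

```python
import bisect
import hashlib
from typing import List


def token(key: str) -> int:
    """Place a key or node name on the ring by hashing it (MD5 here for simplicity)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Simplified token ring: one token per node, no virtual nodes or datacenter awareness."""

    def __init__(self, nodes: List[str], replication_factor: int = 3) -> None:
        self.replication_factor = replication_factor
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str) -> List[str]:
        """Start at the first node whose token is >= the key's token, then walk clockwise."""
        start = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        count = min(self.replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


def quorum(replication_factor: int) -> int:
    """A majority of replicas, e.g. 2 out of 3."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int, read_acks: int, write_acks: int) -> bool:
    """The replicas that answer a read and a write are guaranteed to overlap when R + W > N."""
    return read_acks + write_acks > replication_factor
```

With a replication factor of 3, reading and writing at QUORUM (2 replicas each) satisfies 2 + 2 > 3. Reading and writing at ONE (1 + 1) is faster but does not guarantee that a read sees the latest write.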
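Scribe
Scribe, also created at Facebook, is a flexible logging system for aggregating log data streamed in real time from a large number of servers. It is designed to be scalable and extensible without client-side modifications.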
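A Scribe-style logging pipeline runs a local agent on each server. The agent receives log entries (a category plus a message), batches them and forwards them to central servers. If the upstream server is unreachable, it keeps the entries buffered. The sketch below assumes a hypothetical `forward` function that returns `True` when a batch is delivered. It is not Scribe's API.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

LogEntry = Tuple[str, str]  # (category, message)


class LogAgent:
    """Hypothetical local agent: batch entries and retry the whole batch if forwarding fails."""

    def __init__(self, forward: Callable[[List[LogEntry]], bool], batch_size: int = 100) -> None:
        self.forward = forward
        self.batch_size = batch_size
        self.pending: List[LogEntry] = []

    def log(self, category: str, message: str) -> None:
        self.pending.append((category, message))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> bool:
        """Send buffered entries upstream; keep them if the upstream server is unavailable."""
        if not self.pending:
            return True
        if self.forward(list(self.pending)):
            self.pending.clear()
            return True
        return False


def group_by_category(entries: List[LogEntry]) -> Dict[str, List[str]]:
    """What a central server might do: store messages separately per category."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for category, message in entries:
        grouped[category].append(message)
    return dict(grouped)
```

Because the application only calls `log`, the batching, buffering and forwarding behind it can change without touching client code.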
Haystack - efficient storage of billions of photos
Haystack is the high-performance photo storage and retrieval system Facebook uses for every photo uploaded to the site, each of which is kept in four different resolutions. Haystack's performance is critical. Facebook serves more than 1 million photos every second, and because every upload is stored in four sizes, Haystack has to hold four images for each photo.
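Facebook's published description of Haystack explains the key idea. Storing each image as a separate file costs extra disk reads for filesystem metadata. Haystack instead appends many images to one large volume file and keeps an in-memory index that maps each image to its offset and size, so reading an image takes a single seek. The following is a heavily simplified sketch of that idea. The class and method names are hypothetical, and real Haystack entries ("needles") also carry headers, checksums and deletion flags.

```python
import io
from typing import BinaryIO, Dict, Tuple


class HaystackVolume:
    """Hypothetical simplified volume: photos appended to one file plus an in-memory index."""

    def __init__(self, volume_file: BinaryIO) -> None:
        self.volume_file = volume_file
        # (photo_id, size_name) -> (offset, length). Keeping this in RAM means a read
        # needs no filesystem metadata lookups, just one seek and one read.
        self.index: Dict[Tuple[int, str], Tuple[int, int]] = {}

    def put(self, photo_id: int, size_name: str, data: bytes) -> None:
        """Append the image bytes to the end of the volume and record where they are."""
        offset = self.volume_file.seek(0, io.SEEK_END)
        self.volume_file.write(data)
        self.index[(photo_id, size_name)] = (offset, len(data))

    def get(self, photo_id: int, size_name: str) -> bytes:
        offset, length = self.index[(photo_id, size_name)]
        self.volume_file.seek(offset)
        return self.volume_file.read(length)
```

Storing the four resolutions of one photo is then four `put` calls with the same `photo_id` and different `size_name` values. For experiments, `io.BytesIO()` works as the volume; a real volume would be a file opened in binary read/write mode.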
Hadoop and Hive
Both are open-source Apache projects used by big services such as Yahoo, Facebook and Twitter. Hadoop makes it possible to run computations on massive amounts of data spread across clusters of machines. Hive lets you run SQL-like queries against data stored in Hadoop, which makes it easier for non-programmers to use.
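Hadoop's processing model is MapReduce, which has three phases. A map step turns each input record into key/value pairs. The framework then groups the pairs by key (the shuffle), and a reduce step combines each group. The single-process Python sketch below only simulates those phases for a word count, the classic example. It does not use Hadoop's API. In Hive, the same result could be expressed with a query along the lines of `SELECT word, COUNT(*) FROM words GROUP BY word`, assuming a hypothetical `words` table with one word per row. Classic Hive compiles such a query into MapReduce jobs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

For example, `word_count(["the cat", "The dog"])` returns `{'the': 2, 'cat': 1, 'dog': 1}`. On a real cluster, the map and reduce calls run on many machines at once, close to where the data is stored.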
Thrift
Created at Facebook and released as open source, Thrift is a cross-language framework that ties together the different programming languages used in a system, so that components written in them can talk to each other. This makes cross-language development easier.
Apache Thrift allows you to define data types and service interfaces in a simple definition file. Taking that file as input, the compiler generates code to be used to easily build RPC clients and servers that communicate seamlessly across programming languages. Instead of writing a load of boilerplate code to serialize, transport objects and invoke remote methods, you can get right down to business.
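To make that boilerplate concrete, the Python sketch below hand-writes the kind of client and server stubs the Thrift compiler generates from a definition file. The service definition in the comment uses Thrift's IDL syntax. Everything else (the JSON encoding, `serve_request`, `CalculatorClient`, the in-process transport) is a hypothetical illustration, not Thrift's generated code or wire format.

```python
import json
from typing import Any, Callable

# The equivalent Thrift definition file might declare:
#
#   service Calculator {
#     i32 add(1: i32 a, 2: i32 b)
#   }

ALLOWED_METHODS = {"add"}  # the methods declared by the service interface


class CalculatorHandler:
    """Server-side business logic: the only part you would write by hand with Thrift."""

    def add(self, a: int, b: int) -> int:
        return a + b


def serve_request(handler: Any, request: bytes) -> bytes:
    """Server stub: decode the call, invoke the declared method, encode the result."""
    call = json.loads(request)
    if call["method"] not in ALLOWED_METHODS:
        return json.dumps({"error": f"unknown method {call['method']}"}).encode()
    result = getattr(handler, call["method"])(*call["args"])
    return json.dumps({"result": result}).encode()


class CalculatorClient:
    """Client stub: turns a method call into a serialized request and decodes the reply."""

    def __init__(self, transport: Callable[[bytes], bytes]) -> None:
        self.transport = transport  # sends request bytes, returns response bytes

    def add(self, a: int, b: int) -> int:
        request = json.dumps({"method": "add", "args": [a, b]}).encode()
        response = json.loads(self.transport(request))
        if "error" in response:
            raise RuntimeError(response["error"])
        return response["result"]
```

For example, `CalculatorClient(lambda req: serve_request(CalculatorHandler(), req)).add(2, 3)` returns `5`. In a real deployment the transport would be a network connection, and the client and server could be written in different languages.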
Varnish
Facebook uses Varnish to serve photos and profile pictures, handling billions of requests every day. Like almost everything Facebook uses, Varnish is open source. Varnish is an HTTP accelerator that caches content and serves it lightning-fast, and it can also act as a load balancer. As a web application accelerator (a caching reverse proxy), it is installed in front of your web application, or any other server that speaks HTTP, and it can speed it up significantly.
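The two roles described above, caching and load balancing, can be sketched in Python as a reverse proxy. It answers repeated GET requests from its cache and forwards cache misses to backend servers in round-robin order. This only illustrates the idea. Varnish itself is configured in its own language (VCL), and the `Backend` callables here are hypothetical stand-ins for origin servers.

```python
import itertools
import time
from typing import Callable, Dict, List, Tuple

Backend = Callable[[str], bytes]  # hypothetical: fetch a URL path from an origin server


class CachingProxy:
    """Hypothetical reverse proxy: serve cached responses, spread misses across backends."""

    def __init__(self, backends: List[Backend], ttl: float = 120.0) -> None:
        self._next_backend = itertools.cycle(backends)  # simple round-robin load balancing
        self.ttl = ttl
        self._cache: Dict[str, Tuple[bytes, float]] = {}

    def handle_get(self, path: str) -> bytes:
        now = time.monotonic()
        hit = self._cache.get(path)
        if hit is not None and hit[1] > now:
            return hit[0]                      # cache hit: no backend is contacted
        body = next(self._next_backend)(path)  # cache miss: forward to the next backend
        self._cache[path] = (body, now + self.ttl)
        return body
```

Only the first request for a path within the TTL reaches a backend. This is why a cache in front of the application can absorb most of the traffic for popular content such as profile pictures.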
Gatekeeper - Building and testing
Facebook uses Gatekeeper to run different code for different sets of users (essentially, it introduces conditional branches into the code base). This makes it possible to release new features gradually, run A/B tests, enable certain features only for Facebook employees, and so on.
Gatekeeper also lets Facebook do "dark launches". A dark launch activates parts of a feature behind the scenes before it goes live. Users don't notice, because there are no corresponding UI elements yet. This acts as a real-world stress test and helps expose bottlenecks and other problem areas before a feature is officially launched. Dark launches are usually done two weeks before the actual launch.
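Gatekeeper itself is internal to Facebook, so the Python sketch below illustrates the general feature-gate technique rather than Facebook's implementation. It defines a hypothetical `Gate` with an employee override and a rollout percentage. A hashing function puts each user in a stable bucket, so the same user always gets the same answer. The `handle_request` function shows a dark launch: gated users exercise a new backend while the page they see does not change.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass
class User:
    user_id: int
    is_employee: bool = False


@dataclass
class Gate:
    """Hypothetical feature gate."""
    name: str
    rollout_percent: int = 0      # 0..100: share of regular users who see the feature
    allow_employees: bool = True  # employees can always try the feature first


def bucket(gate_name: str, user_id: int) -> int:
    """Map a user to 0..99 deterministically, so they stay in the same group across requests."""
    digest = hashlib.sha256(f"{gate_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def gate_passes(gate: Gate, user: User) -> bool:
    if gate.allow_employees and user.is_employee:
        return True
    return bucket(gate.name, user.user_id) < gate.rollout_percent


def handle_request(user: User, dark_gate: Gate, new_backend: Callable[[int], object]) -> str:
    page = "<div>existing page</div>"
    if gate_passes(dark_gate, user):
        # Dark launch: generate real load on the new backend but render no new UI.
        new_backend(user.user_id)
    return page
```

Raising `rollout_percent` step by step from 1 to 100 gives a gradual release. Hashing with the gate name means different gates select different subsets of users.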
Codemod is a tool/library to assist you with large-scale codebase refactors that can be partially automated but still require human oversight and occasional intervention.
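The "partially automated, human-approved" workflow can be sketched as a regex substitution that proposes each change and applies it only when a reviewer approves. The function and its `approve` callback are hypothetical illustrations, not Codemod's interface.

```python
import re
from typing import Callable, List


def codemod_lines(lines: List[str], pattern: str, replacement: str,
                  approve: Callable[[int, str, str], bool]) -> List[str]:
    """Propose a regex substitution on each line; apply it only if approve(...) says yes."""
    regex = re.compile(pattern)
    result: List[str] = []
    for number, line in enumerate(lines, start=1):
        proposed = regex.sub(replacement, line)
        if proposed != line and approve(number, line, proposed):
            result.append(proposed)  # reviewer accepted the change
        else:
            result.append(line)      # no match, or reviewer rejected it
    return result
```

For interactive use, `approve` could be something like `lambda n, old, new: input(f'{n}: {old!r} -> {new!r}? [y/n] ') == 'y'`. With that callback, the machine finds every candidate and a human decides on each one.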
Online Schema Change for MySQL is a small but complex utility that performs schema changes on MySQL without taking your cluster offline.
Phabricator is an open source collection of web applications which makes it easier to scale software companies. It was created at Facebook but is also used at many other companies such as Airtime, Asana, Dropbox, deviantART, MemSQL, Path, Quora, and more.
PHPEmbed makes embedding PHP truly simple by providing a simplified API built on top of the PHP SAPI.
PhpSh is an interactive shell for PHP, written mostly in Python.
Most of the technologies presented in this article are open source and free to use. It's just a matter of investing the time to understand them and get them working together. Build your system in a smart way!