Facebook Technologies

July 26, 2013 by Adrian Birsan

As in any other field, if we want to win, we should look at the success stories and case studies around us. In this article I would like to talk about optimization, speed, and successfully handling large amounts of traffic and lots of database queries.

How many times have you bought a domain name and a hosting plan and started a website in the hope that your idea would rock? I have done it a few times without real success. The question is: why?

Suppose you want to start a social website or app. It's not enough to code the whole thing and turn it on! Imagine that your website goes viral within a few hours and your code cannot handle the traffic. The whole system will crash and you will have to rework it. That will take long enough for your users to lose interest, and it will be hard to win them back on your second start.

The key to a successful start is to think carefully about the technology you use, your hardware, the possibility of high traffic from day one, and how quickly you can upgrade your machines. In a world where you can find almost everything as open source, it's all about researching the right technologies so that you avoid trouble later.

I did some research on the technologies Facebook uses and I would like to share it with you. Let's look at the results.

HipHop for PHP

PHP is relatively slow compared to code that runs natively on a server, so Facebook created HipHop, a tool that speeds up PHP applications by converting the PHP code into C++, which is then compiled for better performance. You can find prebuilt HipHop for PHP packages for Ubuntu, Debian, and CentOS, and it can be integrated with the Zend Framework.

Memcache

Distributed caching. Hit your database less. Speed up your website.

Memcache allows you to take memory from parts of your system where you have more than you need and make it accessible to areas where you have less than you need. It's free and open source, and client libraries are available for PHP, Python, and other languages.
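To make "hit your database less" concrete, here is a minimal Python sketch of the usual cache-aside pattern. It is only an illustration: TinyCache is an in-process stand-in for a real memcached client, and FakeDatabase, get_profile, and the key format are hypothetical names invented for this example.

```python
import time


class TinyCache:
    """In-process stand-in for a memcached client (assumption, not a real client)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            # Entry is too old: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl=None):
        # ttl is in seconds; None means the entry never expires.
        expires_at = time.monotonic() + ttl if ttl else None
        self._store[key] = (value, expires_at)


class FakeDatabase:
    """Hypothetical slow data source that counts how often it is queried."""

    def __init__(self):
        self.hits = 0

    def load_profile(self, user_id):
        self.hits += 1
        return {"id": user_id, "name": f"user{user_id}"}


def get_profile(cache, db, user_id):
    """Cache-aside lookup: try the cache first, fall back to the database."""
    key = f"profile:{user_id}"
    profile = cache.get(key)
    if profile is None:
        profile = db.load_profile(user_id)
        cache.set(key, profile, ttl=60)
    return profile


if __name__ == "__main__":
    cache, db = TinyCache(), FakeDatabase()
    get_profile(cache, db, 42)
    get_profile(cache, db, 42)
    print("database queries:", db.hits)
```

The second call for the same user is answered from the cache, so the database is queried only once. With a real memcached deployment the cache lives on separate servers and is shared by all of your web servers.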

BigPipe - Say Goodbye to Frames

BigPipe was created by Facebook with the goal of decomposing web pages into small chunks called pagelets and pipelining them through web servers and browsers for optimal performance. For example, BigPipe decomposes the Facebook homepage into a few pagelets (chat, news feed, ads, menu, and so on) that are retrieved in parallel. It also gives users a page that keeps working even when some parts are deactivated or broken.
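The pagelet idea can be sketched in a few lines of Python. This is not Facebook's implementation: the real BigPipe streams the chunks in a single HTTP response and uses JavaScript in the browser to place them. The sketch only shows the server-side part, rendering pagelets in parallel and tolerating a broken one. All function names and the HTML snippets are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def render_chat():
    return "<div id='chat'>chat</div>"


def render_newsfeed():
    return "<div id='newsfeed'>newsfeed</div>"


def render_ads():
    # Simulates a broken pagelet, e.g. an unreachable ad server.
    raise RuntimeError("ad server down")


# Hypothetical pagelets that together make up one page.
PAGELETS = {
    "chat": render_chat,
    "newsfeed": render_newsfeed,
    "ads": render_ads,
}


def render_page(pagelets):
    """Yield (name, html) pairs as each pagelet finishes rendering."""
    with ThreadPoolExecutor(max_workers=max(1, len(pagelets))) as pool:
        futures = {pool.submit(fn): name for name, fn in pagelets.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                yield name, future.result()
            except Exception:
                # A failing pagelet yields an empty chunk; the rest still renders.
                yield name, ""


if __name__ == "__main__":
    for name, html in render_page(PAGELETS):
        print(name, "->", html or "(skipped)")
```

Because each pagelet is independent, a slow or failing one does not hold back the others, which is the property the description above relies on.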

Apache Cassandra

Originally developed at Facebook and now an Apache Software Foundation project, Cassandra is designed to handle very large amounts of data spread out across many commodity servers. It's a solid NoSQL alternative.

Cassandra is fully distributed with no single point of failure (SPOF), supports multi-master and multi-datacenter replication, and scales linearly. It handles larger-than-memory datasets, is fully durable, and offers integrated caching, tunable consistency, and best-in-class performance.
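"Tunable consistency" means you choose, per read or write, how many replicas must respond. A read is guaranteed to see the latest write when the replicas read plus the replicas written exceed the replication factor. The small Python sketch below only does that arithmetic for three of Cassandra's consistency levels (ONE, QUORUM, ALL). The function names are hypothetical, and no Cassandra driver is involved.

```python
def replicas_required(level, replication_factor):
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A quorum is a strict majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def read_sees_latest_write(read_level, write_level, replication_factor):
    """True when the read set and the write set must overlap in one replica."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    for read_level, write_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(read_level, write_level, read_sees_latest_write(read_level, write_level, 3))
```

Lower levels such as ONE give faster responses at the cost of possibly stale reads, which is the trade-off being tuned.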

Scribe

Scribe is a flexible logging system designed to be scalable and extensible without client-side modification.

Haystack - efficient storage of billions of photos

Haystack is a high-performance photo storage and retrieval system. Facebook keeps four different resolutions of each uploaded photo, so every upload means four images for Haystack to store. Performance is critical: Facebook serves more than 1 million photos every second, and Haystack handles that load successfully.

Hadoop and Hive

Hadoop and Hive are open source Apache projects used by big services like Yahoo, Facebook, and Twitter. Hadoop makes it possible to perform calculations on massive amounts of data. Hive lets you run SQL-like queries against data in Hadoop, making it easier for non-programmers to use.

Thrift

Created at Facebook and released as open source, Thrift is a cross-language framework that ties together the different programming languages used in a system, letting them talk to each other. This makes cross-language development easier to sustain.

Apache Thrift allows you to define data types and service interfaces in a simple definition file. Taking that file as input, the compiler generates code that you can use to build RPC clients and servers that communicate seamlessly across programming languages. Instead of writing a load of boilerplate code to serialize and transport objects and invoke remote methods, you can get right down to business.

Varnish

Facebook uses Varnish to serve photos and profile pictures, handling billions of requests every day. Like almost everything Facebook uses, Varnish is open source. Varnish is an HTTP accelerator that caches content and serves it lightning-fast, and it can also act as a load balancer. As a web application accelerator, it is installed in front of your web application and can speed it up significantly.

Gatekeeper - Building and testing

Gatekeeper is used by Facebook to run different code for different sets of users (it basically introduces different conditions into the code base). This allows gradual releases of new features, A/B testing, activating certain features only for Facebook employees, and so on.

Gatekeeper also lets Facebook do "dark launches", which means activating elements of a certain feature behind the scenes before it goes live (users do not notice, since there are no corresponding UI elements). This acts as a real-world stress test and helps expose bottlenecks and other problem areas before a feature is officially launched. Dark launches are usually done two weeks before the actual launch.
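Gatekeeper itself is internal to Facebook, so the following Python sketch only illustrates the general idea of gating a feature per user. The gate format, the field names (rollout_percent, employees_only, is_employee), and the hashing scheme are all assumptions made for this example.

```python
import hashlib


def bucket(user_id, feature):
    """Map a user to a stable bucket from 0 to 99 for a given feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(feature, user, gates):
    """Decide whether `user` should see `feature` under the configured gates."""
    gate = gates.get(feature)
    if gate is None:
        return False  # Unknown features stay off.
    if gate.get("employees_only") and not user.get("is_employee"):
        return False
    # Gradual release: only users in the first N buckets get the feature.
    return bucket(user["id"], feature) < gate.get("rollout_percent", 0)


# Hypothetical gate configuration.
GATES = {
    "new_chat": {"rollout_percent": 10},
    "internal_tools": {"rollout_percent": 100, "employees_only": True},
}

if __name__ == "__main__":
    user = {"id": 12345, "is_employee": False}
    print("new_chat:", is_enabled("new_chat", user, GATES))
    print("internal_tools:", is_enabled("internal_tools", user, GATES))
```

Hashing the feature name together with the user ID keeps each user's answer stable between requests, and raising rollout_percent step by step gives a gradual release. A dark launch can be modeled the same way, with the gated code running the back-end work while showing no UI.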

Codemod is a tool/library to assist you with large-scale codebase refactors that can be partially automated but still require human oversight and occasional intervention.

Online Schema Change for MySQL is a small but complex utility that performs online schema changes for MySQL without taking your cluster offline.

Phabricator is an open source collection of web applications which makes it easier to scale software companies. It was created at Facebook but is also used at many other companies such as Airtime, Asana, Dropbox, deviantART, MemSQL, Path, Quora, and more.

PHPEmbed makes embedding PHP truly simple. It is a simplified API built on top of the PHP SAPI.

PhpSh is an interactive shell for PHP, written mostly in Python.

Conclusion:

Most of the technologies presented in this article are open source and free to use. It's just a matter of spending the time to understand them and get them to work together. Build your system in a smart way!

Resources:

1. GitHub, Facebook/HipHop-PHP - https://github.com/facebook/hiphop-php/wiki

2. Memcache - https://groups.google.com/forum/#!forum/memcached

3. BigPipe - https://github.com/garo/bigpipe

4. The Apache Cassandra Project - http://wiki.apache.org/cassandra/

5. Scribe - https://github.com/facebook/scribe

6. Haystack - technohandle.blogspot.ro/2012/11/facebook-haystack-image-datastore.html

7. Hadoop & Hive - http://hive.apache.org/

8. Thrift - http://thrift.apache.org/

9. Varnish - https://www.varnish-cache.org/

10. Gatekeeper - https://www.facebook.com/notes/facebook-engineering..

11. Codemod - https://github.com/facebook/codemod

12. Online Schema Change for MySQL - https://www.facebook.com/notes/mysql-at-facebook/online-sch

13. PHPEmbed - https://github.com/facebook/phpembed/blob/master/README

14. Phabricator - http://phabricator.org/

15. PhpSh - http://www.phpsh.org/

Adrian Birsan is a freelance web developer and pentester. Says he: "Technology has always been something which captivates me; I like computer security and software development. I am a pentester on my free time and also own a blog where I post useful information. I am a big supporter of Freedom of Speech and ... I play the guitar m/ " His blog can be found at http://softpill.eu/