This article describes the techniques used at Facebook.com to serve static resources such as CSS, JS and image files when someone accesses the site. If you are one of the developers at Facebook who worked on the related modules and you disagree with one or more aspects of this article, please drop me a message and I shall correct it appropriately. The article aims to present a perspective on how to handle web static resources, based on how they are handled at facebook.com. Thank you for reading further.
Back in February 2004?
Well, like most other startups, Facebook launched in February 2004 with the usual approach of serving CSS & JS files as independent, separate files. According to Wikipedia, Zuckerberg wrote the software for the Facemash website (Facebook's predecessor) when he was in his second year of college, and the website launched in October 2003. A few months later, in January 2004, Zuckerberg began writing the code for a new website, known as ‘theFacebook’, which launched in February 2004.
The following is what the code looked like for a few years after the launch:
However, as Facebook grew, this way of managing CSS & JS files had to change for the following reasons:
- Management nightmare: It was difficult to manage all the CSS & JS files across various web pages, because the right files had to be included in the right pages in the right order. Errors started creeping in, for example in the form of unneeded resource files included on one or more pages.
- Performance issue: A separate HTTP request had to be made for every CSS & JS file, so each page triggered a large number of HTTP requests.
Most startup websites adopt the strategy described above because, in the initial days, one is less concerned with performance bottlenecks and management issues than with validating the idea in general. Fair enough!
Haste System & Other Optimization Techniques from 2007!
The CSS & JS files started being managed by what is called the Haste system. As per the documentation, the Haste system scans the directories, reads the package.json file for configuration changes, gathers the dependencies and updates a map of static resources for the given web page. This solved the issue of manually including the right files in the right pages in the right order. The following sample code shows how the Haste system managed the dependencies and bundled them by updating the map with the bundled data:
Along with the Haste system, the following additional optimization techniques were adopted around the same time:
- Bundling all the JS and CSS files into a single JS file and a single CSS file and sending those over.
- Loading the resource files at the end of page rendering.
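In place of that sample, here is a minimal sketch of the idea behind such a resource map. It is not Haste's actual code or API: the `MANIFEST` dictionary, the file names, and the functions `resolve` and `build_resource_map` are hypothetical, and the manifest stands in for the dependency information a Haste-like system would gather by scanning directories and reading package configuration. Dependencies are resolved depth-first so that every file appears after the files it needs, and the ordered result is recorded per page.

```python
from typing import Dict, List

# Hypothetical manifest: resource name -> resources it depends on.
# A Haste-like system would build this by scanning directories and
# reading per-package configuration.
MANIFEST: Dict[str, List[str]] = {
    "base.css": [],
    "dom.js": [],
    "event.js": ["dom.js"],
    "feed.js": ["event.js", "dom.js"],
    "feed.css": ["base.css"],
}


def resolve(entry_points: List[str], manifest: Dict[str, List[str]]) -> List[str]:
    """Return every resource needed by entry_points, dependencies first."""
    ordered: List[str] = []
    visiting: set = set()  # resources on the current DFS path (cycle detection)
    done: set = set()      # resources already placed in `ordered`

    def visit(name: str) -> None:
        if name in done:
            return
        if name in visiting:
            raise ValueError(f"dependency cycle at {name}")
        if name not in manifest:
            raise KeyError(f"unknown resource {name}")
        visiting.add(name)
        for dep in manifest[name]:
            visit(dep)
        visiting.remove(name)
        done.add(name)
        ordered.append(name)

    for entry in entry_points:
        visit(entry)
    return ordered


def build_resource_map(
    pages: Dict[str, List[str]], manifest: Dict[str, List[str]]
) -> Dict[str, Dict[str, List[str]]]:
    """Map each page to its ordered JS and CSS files."""
    resource_map: Dict[str, Dict[str, List[str]]] = {}
    for page, entries in pages.items():
        files = resolve(entries, manifest)
        resource_map[page] = {
            "js": [f for f in files if f.endswith(".js")],
            "css": [f for f in files if f.endswith(".css")],
        }
    return resource_map


if __name__ == "__main__":
    print(build_resource_map({"home": ["feed.js", "feed.css"]}, MANIFEST))
```

With a map like this, a page only names its entry points (`feed.js`, `feed.css`) and the map supplies the complete, correctly ordered file list, which is exactly the bookkeeping that was error-prone when done by hand.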
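The two techniques can be sketched together as follows, assuming the ordered file lists for a page are already known (for example, from a dependency map). The file contents, the URLs `/static/all.css` and `/static/all.js`, and the functions `bundle` and `render_page` are hypothetical. The page then costs one CSS request and one JS request instead of one request per file, and the script tag is emitted after the page markup.

```python
from typing import Dict, List

# Hypothetical file contents, keyed by file name.
SOURCES: Dict[str, str] = {
    "dom.js": "var dom = {};",
    "event.js": "var events = {};",
    "base.css": "body{margin:0}",
    "feed.css": ".feed{color:#333}",
}


def bundle(files: List[str], sources: Dict[str, str], sep: str) -> str:
    """Concatenate already-ordered files into a single bundle string."""
    return sep.join(sources[name] for name in files)


def render_page(body_html: str, css_url: str, js_url: str) -> str:
    """Emit one stylesheet link and one script tag for the whole page."""
    return (
        "<html><head>"
        f'<link rel="stylesheet" href="{css_url}">'
        "</head><body>"
        f"{body_html}"
        # The script tag comes after the page markup, so the browser can
        # render the content before it fetches and runs the JS bundle.
        f'<script src="{js_url}"></script>'
        "</body></html>"
    )


if __name__ == "__main__":
    # "\n;\n" between JS files guards against a file missing its final ";".
    js_bundle = bundle(["dom.js", "event.js"], SOURCES, "\n;\n")
    css_bundle = bundle(["base.css", "feed.css"], SOURCES, "\n")
    print(js_bundle)
    print(css_bundle)
    print(render_page("<div>feed</div>", "/static/all.css", "/static/all.js"))
```

Bundling only works because the files are concatenated in dependency order; a bundle built in the wrong order fails in the same way as wrongly ordered script tags.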
Finally, Static Resources Delivered from the Database!
As Facebook grew further across geographies, it started delivering web pages from thousands of web servers. As a result, the following challenges started to appear:
- How can static resources (CSS, JS, images) be released to these thousands of servers so that all users get the latest copies? There is always some lag in a release, and a resource version mismatch could become a critical issue.
- Version management of these static resources, in general.
- How can users always get fresh copies of static resources without having to clear their browser cache? A user might be served the web page, with the most up-to-date resource file paths, by a server that already has the new release. When the browser then requests those resource files, the request could land on a web server to which the latest files have not yet been pushed. The user would end up with a stale resource file for the latest page, and thus a poisoned cache.
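The race described in the last point can be reproduced in a toy model. This is not Facebook's code: `Server`, `fetch`, `versioned_url`, the file names and the CSS contents are hypothetical; a server returns `None` (a 404) for a file it has not received yet, and the browser caches any successful response by URL alone. With a fixed URL such as `/feed.css`, a request that lands on a lagging server caches the old file under the URL the new page uses. If the URL instead encodes the file content (here a truncated SHA-1 hash, an assumption made for illustration), the lagging server simply has no file under the new name, so nothing stale is cached.

```python
import hashlib
from typing import Dict, Optional

OLD_CSS = ".feed{color:red}"   # hypothetical previous release
NEW_CSS = ".feed{color:blue}"  # hypothetical new release


class Server:
    """A web server holding whichever static files have been pushed to it."""

    def __init__(self, files: Dict[str, str]) -> None:
        self.files = dict(files)

    def get(self, path: str) -> Optional[str]:
        # None models a 404: the file has not reached this server yet.
        return self.files.get(path)


def fetch(cache: Dict[str, str], server: Server, url: str) -> Optional[str]:
    """Browser-side fetch: use the cache if possible, else ask the server."""
    if url in cache:
        return cache[url]
    body = server.get(url)
    if body is not None:
        cache[url] = body  # successful responses are cached by URL alone
    return body


def versioned_url(content: str) -> str:
    """Build a URL that changes whenever the file content changes."""
    digest = hashlib.sha1(content.encode("utf-8")).hexdigest()[:8]
    return "/static/" + digest + ".css"


if __name__ == "__main__":
    # Fixed URL: the new page asks for /feed.css, but the request hits a
    # server that still has the old release.
    cache: Dict[str, str] = {}
    old_server = Server({"/feed.css": OLD_CSS})
    new_server = Server({"/feed.css": NEW_CSS})
    fetch(cache, old_server, "/feed.css")
    print(fetch(cache, new_server, "/feed.css"))  # old CSS: poisoned cache

    # Versioned URL: the lagging server has no file under the new name.
    cache = {}
    url = versioned_url(NEW_CSS)
    old_server = Server({versioned_url(OLD_CSS): OLD_CSS})
    new_server = Server({url: NEW_CSS})
    print(fetch(cache, old_server, url))  # None: nothing stale is cached
    print(fetch(cache, new_server, url))  # new CSS
```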
The following is a pictorial representation of the problem: users end up with stale copies of resources in their browser cache and, thus, do not get a consistent look and feel of the page:
In the figure above, you can see that when the user requests the resource files, some of them may not yet have been pushed to the servers that receive those requests. This leads to what is termed a poisoned cache: stale resources cached for the new version of the page. To fix this issue, Facebook moved to the following technique:
- Publish all the static resources to the database before pushing the updated web pages.
- Have a PHP file, named rsrc.php, query the database to get the appropriate version.
- Reference the static resources in the web page with links like the following:
In the above example, note the file rsrc.php, the version number v2, and the CSS files with cryptic names. The diagram below represents how the request is processed and how the resource files are delivered from the database.
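The publish-then-serve flow can be sketched as follows, under stated assumptions: an in-memory SQLite table stands in for Facebook's database, the cryptic file name is assumed to be a truncated SHA-1 of the file content, and `v2` is treated as a fixed path segment after `/rsrc.php/`. `open_store`, `publish` and `serve` are hypothetical names; the real rsrc.php is written in PHP, and its schema and naming scheme are not described in this article.

```python
import hashlib
import sqlite3
from typing import Optional, Tuple

PREFIX = "/rsrc.php/v2/"  # assumed URL shape: rsrc.php + version + name


def open_store() -> sqlite3.Connection:
    """Create the (hypothetical) resource table in an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE resources (name TEXT PRIMARY KEY, content_type TEXT, body TEXT)"
    )
    return db


def publish(db: sqlite3.Connection, body: str, ext: str) -> str:
    """Store a resource under a content-derived name and return its URL."""
    name = hashlib.sha1(body.encode("utf-8")).hexdigest()[:10] + "." + ext
    content_type = "text/css" if ext == "css" else "application/javascript"
    # INSERT OR IGNORE: publishing identical content twice is a no-op.
    db.execute(
        "INSERT OR IGNORE INTO resources VALUES (?, ?, ?)", (name, content_type, body)
    )
    db.commit()
    return PREFIX + name


def serve(db: sqlite3.Connection, path: str) -> Tuple[int, Optional[str], Optional[str]]:
    """Handle a request path; return (status, content type, body)."""
    if not path.startswith(PREFIX):
        return 404, None, None
    row = db.execute(
        "SELECT content_type, body FROM resources WHERE name = ?",
        (path[len(PREFIX):],),
    ).fetchone()
    if row is None:
        return 404, None, None
    return 200, row[0], row[1]


if __name__ == "__main__":
    db = open_store()
    # 1. Publish the resource before the page that references it is released.
    url = publish(db, ".feed{color:blue}", "css")
    # 2. The page embeds `url`; any web server can answer it from the database.
    print(url, serve(db, url))
```

Because a resource is in the shared store before any page refers to it, every web server that renders the new page can answer the request, and because the name changes whenever the content changes, browsers can cache each URL for a long time without ever receiving stale content.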