How Big MNCs Like Google, Facebook, and Instagram Store, Manage, and Manipulate Thousands of Terabytes of Data with High Speed and High Efficiency

Mohd. Ajmal Khan
2 min readSep 16, 2020

Ever wondered how big MNCs like Google, Facebook, etc. handle that much data?

To understand this, let us take Facebook as an example.

Facebook revealed some big, big stats on big data to a few reporters at its HQ today, including that its system processes 2.5 billion pieces of content and 500+ terabytes of data each day. It’s pulling in 2.7 billion Like actions and 300 million photos per day, and it scans roughly 105 terabytes of data each half-hour.

This much data per day gave birth to a new problem known as Big Data. Let me clearly state that Big Data is not a technology but a problem of this world: how to store this much data, how to access it quickly, and how to minimize the cost of doing both.

If you are thinking, “we can just build one big storage unit” — yes, it’s possible, but think about the cost of building one. Extremely high, right? So we can’t choose this option.

Even if we did build one “big hard disk” (storage unit), there would still be the problem of accessing the data: as we are all aware, reading from a single disk is slow. So how can we solve that?

These problems are technically known as Volume and Velocity.

No machine available today can both store this much data and access it at high speed, all at a low cost.

So, to solve the problem of Big Data, a new concept was introduced: the distributed system. One well-known product of this concept is Hadoop.

A distributed system is a system in which components are located on different networked computers, which can communicate and coordinate their actions by passing messages to one another. The components interact with one another in order to achieve a common goal.
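To make the definition above concrete, here is a toy sketch in Python: independent “nodes” that share no memory and coordinate only by passing messages. The `Node` class and the coordinator/worker names are hypothetical, purely for illustration; in a real system these would run on separate networked machines, not in one process.

```python
class Node:
    """A stand-in for one machine in a distributed system."""

    def __init__(self, name):
        self.name = name
        self.inbox = []  # messages received from other nodes

    def send(self, other, message):
        # Nodes never touch each other's state directly;
        # they only append messages to each other's inboxes.
        other.inbox.append((self.name, message))

    def work(self):
        # Sum every list of numbers this node was asked to handle.
        return sum(sum(nums) for _, nums in self.inbox)


coordinator = Node("coordinator")
workers = [Node(f"worker-{i}") for i in range(2)]

# The coordinator splits one job into messages for the workers...
coordinator.send(workers[0], [1, 2, 3])
coordinator.send(workers[1], [4, 5])

# ...and each worker replies with its partial result.
for w in workers:
    w.send(coordinator, [w.work()])

# Combining the partial results gives the final answer.
total = coordinator.work()  # 1+2+3+4+5 = 15
```

The key point is the common goal: no single node ever saw all the data, yet together they computed the full result.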

Suppose we need to store 205 GB of data, but no single machine we own has that much space. Using the distributed-system concept, we can distribute the data across several slave devices (usually servers). The broad principle is to take a task (like an individual search), break it down into smaller tasks, have hundreds if not thousands of individual computers chew away at those smaller tasks, put the results together, and serve them up to the user.

This solves both problems of Big Data: the access time is reduced (velocity), and the storage problem is taken care of (volume).

The whole group of machines is treated as one cluster, and we can build thousands of such clusters. This is known as distributed-system clustering.
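The split/work/combine principle described above can be sketched in plain Python. This is not Hadoop itself — the “workers” here are just function calls standing in for machines — but the three steps (divide the data, let each worker process only its chunk, merge the partial results) are the same idea MapReduce is built on. The function names and sample data are invented for illustration.

```python
from collections import Counter

def split(records, n_workers):
    """Divide the records into roughly equal chunks, one per worker."""
    return [records[i::n_workers] for i in range(n_workers)]

def worker(chunk):
    """Each worker counts words only in its own small chunk."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts

def combine(partials):
    """Merge the partial counts into one final answer."""
    return sum(partials, Counter())

# Pretend these lines are a huge log file spread over three machines.
logs = ["alice likes posts", "bob likes photos", "alice likes photos"]
partials = [worker(chunk) for chunk in split(logs, 3)]
totals = combine(partials)
# totals["likes"] == 3, totals["alice"] == 2
```

Because each chunk is processed independently, the work can happen on hundreds of machines at once, which is exactly how the access-time (velocity) problem gets solved.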

SOME BENEFITS:

  • Resource Sharing
  • Openness
  • Scalability
  • Fault Tolerance
