Let’s come to the point: we use our phones and laptops every single day for all kinds of tasks, including managing social media accounts on more than one platform. This year has changed everything by moving so much of life online, and it has set off a huge technology boom. Companies such as Facebook, Snapchat, Google and many more have billions of active users and collect data from every single one of them. So the question is: where do these companies store this huge amount of data?
Before discussing anything else, we should look at some facts and stats.
These numbers say it all. A large company can store approximately 500 terabytes of new data every day, which is a huge figure. Not only that, the figure is growing rapidly, because the number of users is increasing at the same pace.
- In 2014, there were 2.4 billion internet users. That number grew to 3.4 billion by 2016, another 300 million users were added in 2017, and now in 2020 there are more than 4 billion internet users, an 83% increase in the number of internet users.
- Over 2.5 quintillion bytes of data are created every single day, and it’s only going to grow from there.
- By 2020, it’s estimated that 1.7MB of data will be created every second for every person.
- 340,000 tweets are sent per minute.
- More than 570 new websites are created every minute.
- Every minute 24 hours of video is uploaded to YouTube. More video content is uploaded to YouTube in a 60-day period than the three major U.S. television networks created in 60 years.
Why is data important for business?
Suppose you upload a photo to a social media platform to share with your network, and it gets deleted an hour later. That would be discouraging for any user, since we expect our data to stay in our profile or account permanently. Whether it is Facebook, Instagram or any other site, you can always see the data you have stored since the day you created your account. So reliably stored data is what builds a customer’s trust in a company. Data is also important to the company because it helps them:
- To find new customers
- To track social media interaction with brands
- To improve customer retention rate
- To capture market trends and customer inclination
- To predict future trends and requirements
Problems faced when storing Data
Since the amount of data is so huge, a few problems arise when storing it:
- Volume: Storing this much data needs a lot of storage. We can’t rely on the small hard disk drives (HDDs) or solid state drives (SSDs) that we generally use in our local machines and laptops, which have a storage capacity of around 1 TB. We are talking about exabytes and zettabytes, and it is hard to build a single storage device of that size. Storage solution companies like Dell EMC, IBM, etc. can build very large storage systems, but these bring further disadvantages in cost and velocity.
- Cost: Higher-capacity storage devices take more work to build and are therefore highly expensive. For comparison, a 1 TB hard disk for a computer generally costs around 3–5K depending on the manufacturer, so the price climbs steeply for a much higher capacity storage device.
- Velocity: We also don’t want to wait for anything, since time is money. We want to get the information we search for as fast as we can. If a Google search took a day to fetch and display its results, no one would use it. So we need a device or architecture that can fetch information within seconds. RAM generally loads data much faster than a hard disk or SSD, but it is volatile, whereas hard disks and SSDs store data persistently.
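To get a feel for the volume problem, here is a small back-of-the-envelope sketch in Python. It assumes decimal (SI) units and the 1 TB drive size mentioned above; the function name `drives_needed` is made up for this illustration.

```python
# Decimal (SI) storage units, expressed in bytes.
TB = 10**12
EB = 10**18
ZB = 10**21


def drives_needed(total_bytes: int, drive_bytes: int = TB) -> int:
    """Return how many drives of drive_bytes each are needed for total_bytes."""
    # Ceiling division: a partly filled drive still counts as a whole drive.
    return -(-total_bytes // drive_bytes)


print(drives_needed(EB))  # 1000000 one-terabyte drives for one exabyte
print(drives_needed(ZB))  # 1000000000 one-terabyte drives for one zettabyte
```

So a single exabyte already needs a million laptop-sized drives, which is why one big device is not a realistic answer.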
Together, these problems are commonly referred to as “BIG DATA”.
To solve the Big Data problem, a concept known as “DISTRIBUTED STORAGE” was created.
What actually is Distributed Storage?
This can be explained by taking an easy example.
Suppose we have a 40 GB file and four servers (machines), each with 10 GB of storage. Obviously, we can’t store the file on any single server. What we can do is split the file into 10 GB chunks and store each chunk on a different server.
This method solves the volume problem, and the data is still stored persistently. If we write the chunks in parallel, the velocity problem is solved as well. For more velocity, we can add more slave nodes/servers.
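The 40 GB example can be sketched in a few lines of Python. This is only a scaled-down illustration, not how any real product does it: 40 bytes stand in for the 40 GB file, each server is just a dictionary entry holding 10 bytes, and the names `split_into_chunks`, `distribute` and `server-1` to `server-4` are invented for the sketch.

```python
from typing import Dict, List


def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Cut data into consecutive pieces of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def distribute(chunks: List[bytes], servers: List[str]) -> Dict[str, bytes]:
    """Place one chunk on each server, in order (an assumed placement rule)."""
    if len(chunks) > len(servers):
        raise ValueError("not enough servers to hold one chunk each")
    return dict(zip(servers, chunks))


# Scaled-down stand-in: 40 bytes play the role of the 40 GB file,
# and each server holds 10 bytes instead of 10 GB.
file_data = bytes(range(40))
servers = ["server-1", "server-2", "server-3", "server-4"]

placement = distribute(split_into_chunks(file_data, 10), servers)

# Reading the chunks back in server order rebuilds the original file.
rebuilt = b"".join(placement[name] for name in servers)
print(len(placement), rebuilt == file_data)  # 4 True
```

No single server ever holds more than its 10-unit share, yet the whole file can still be put back together.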
It can be technically defined as:
A distributed storage system is infrastructure that can split data across multiple physical servers, and often across more than one data center. It typically takes the form of a cluster of storage units, with a mechanism for data synchronization and coordination between cluster nodes.
The most common topology used for implementing Distributed Storage is a Master-Slave, or multi-node, cluster.
The master is the main server that manages all the other servers, which are known as slaves. In Hadoop, the master is also known as the Name Node and a slave as a Data Node.
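To make the division of roles concrete, here is a toy sketch in Python. It is not the code of any real tool: the `Master` and `Slave` classes, the round-robin placement and the in-memory dictionaries are all assumptions made for illustration. The point is that the master only remembers where each chunk lives, while the slaves hold the actual bytes.

```python
from typing import Dict, List


class Slave:
    """A storage server: it only holds the chunks it is given."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.chunks: Dict[int, bytes] = {}


class Master:
    """Keeps metadata only: which slave holds which chunk of the file."""

    def __init__(self, slaves: List[Slave]) -> None:
        self.slaves = slaves
        self.locations: Dict[int, Slave] = {}

    def write(self, data: bytes, chunk_size: int) -> None:
        for index, start in enumerate(range(0, len(data), chunk_size)):
            # Round-robin placement is an assumption made for this sketch.
            slave = self.slaves[index % len(self.slaves)]
            slave.chunks[index] = data[start:start + chunk_size]
            self.locations[index] = slave

    def read(self) -> bytes:
        # Look up each chunk's slave in order and stitch the file together.
        return b"".join(
            self.locations[i].chunks[i] for i in sorted(self.locations)
        )


slaves = [Slave(f"slave-{n}") for n in range(1, 5)]
master = Master(slaves)
master.write(b"abcdefghijklmnopqrstuvwxyz", chunk_size=4)

print(master.read())
print({i: s.name for i, s in master.locations.items()})
```

In this simplified version the data passes through the master on its way to the slaves; the sketch only aims to show the bookkeeping, not a realistic data path.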
Some of the tools which can be used to implement Distributed Storage are:
These are some common tools currently used in industry to overcome the Big Data problem.
I will discuss some of these technologies in upcoming blogs. Till then, have a read of this one.