Big Data Basics

Source Credits : IBM documentation, IBM Skills Network 

Understanding the Difference between Small and Big Data


Big Data Lifecycle


Did you Know ?



The V's of Big Data:

1. Velocity : Speed at which data arrives
2. Volume : Increase in the amount of data stored over time.
3. Variety : Diversity of data
4. Veracity : Certainty of data, i.e. its quality and accuracy
5. Value : Creates value when collected, processed and stored correctly.

Impact of Big Data : Day to day life use cases



 
Linear vs Parallel Processing :

  • In a typical analytics cycle, a computer stores data, moves it from storage into compute capacity (which includes memory), and writes the important results back to storage once they are computed. 
  • With Big Data, you have more data than will fit on a single computer. 
  • Parallelism or Parallel processing can best be understood by comparing it to Linear processing. 
  • Linear processing: the traditional method of computing, in which the problem statement is broken into a set of instructions that are executed sequentially until all of them complete successfully. If an error occurs in any one instruction, the entire sequence is re-executed from the beginning once the error has been resolved. Linear processing is therefore best suited to minor computing tasks and is inefficient and time-consuming for complex problems such as Big Data. 
  • Parallel processing: the problem statement is likewise broken down into a set of executable instructions, which are then distributed to multiple execution nodes of equal processing power and executed in parallel. Because the instructions run on separate execution nodes, an error can be fixed and the affected instructions re-executed locally, independently of the other instructions. 
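
The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular Big Data framework works: the helper names (process_chunk, split, run_linear, run_parallel) are hypothetical, the "work" is a toy sum of squares, and threads on one machine stand in for the separate execution nodes described above.

```python
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk):
    """Hypothetical unit of work: sum the squares of one chunk."""
    return sum(x * x for x in chunk)


def split(data, n_chunks):
    """Split data into at most n_chunks slices of roughly equal size."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def run_linear(chunks):
    """Linear processing: handle the chunks one after another."""
    total = 0
    for chunk in chunks:
        total += process_chunk(chunk)
    return total


def run_parallel(chunks, max_workers=4):
    """Parallel processing: hand each chunk to its own worker.

    Threads are an assumption standing in for separate execution nodes.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(process_chunk, c) for c in chunks]
        results = []
        for chunk, future in zip(chunks, futures):
            try:
                results.append(future.result())
            except Exception:
                # Only the failed chunk is re-run (once, in the main thread);
                # the results of the other chunks are kept.
                results.append(process_chunk(chunk))
    return sum(results)


data = list(range(1_000))
chunks = split(data, 4)
linear_total = run_linear(chunks)
parallel_total = run_parallel(chunks)
print(linear_total == parallel_total)
```

Both functions produce the same total. The difference is that run_parallel computes each chunk independently, so a failure in one chunk only requires re-running that chunk rather than the whole sequence.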

Advantages of Parallel Processing :


Data Scaling :
