ETL vs ELT

ETL:

[Image not reproduced. Source: Google Images]

  1. Raw data is extracted and transformed before being loaded (saved) into the target store.
  2. Suitable for systems that only require simple data models.
  3. The transformation work in ETL takes place in a specialized engine, and it often involves using staging tables to temporarily hold data as it is being transformed and ultimately loaded to its destination.
  4. Typical transformations include basic data cleaning, deduplicating data, and reformatting the contents of individual fields.
  5. Stream-oriented approach, with an emphasis on speed and throughput.
  6. ETL can help with data privacy and compliance by removing sensitive data before it arrives in analytical data models.
  7. Often, the three ETL phases are run in parallel to save time. For example, while data is being extracted, a transformation process could be working on data already received and prepare it for loading, and a loading process can begin working on the prepared data, rather than waiting for the entire extraction process to complete.
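
The sketch below illustrates the ETL flow described above. It is a minimal example under stated assumptions, not a production pipeline: Python's built-in sqlite3 in-memory database stands in for the destination, the records and the "ssn" column are hypothetical, and the "transformation engine" is simply the Python process itself. The three stages are chained generators, so each record is transformed and loaded as soon as it is extracted, a single-threaded approximation of the overlapping phases in point 7 rather than true parallelism. Staging tables are omitted for brevity.

```python
import sqlite3
from typing import Dict, Iterable, Iterator

# Hypothetical raw source records; "ssn" stands in for a sensitive field.
RAW_ROWS = [
    {"id": "1", "name": "  Alice ", "email": "ALICE@Example.com", "ssn": "111-11-1111"},
    {"id": "2", "name": "Bob", "email": "bob@example.com ", "ssn": "222-22-2222"},
    {"id": "1", "name": "Alice", "email": "alice@example.com", "ssn": "111-11-1111"},  # duplicate id
]

def extract(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Extract: yield raw records one at a time (a real source might be a file or an API)."""
    for row in rows:
        yield dict(row)

def transform(records: Iterable[Dict[str, str]]) -> Iterator[Dict[str, object]]:
    """Transform outside the target store: clean, deduplicate, reformat, drop sensitive data."""
    seen_ids = set()
    for rec in records:
        rec_id = int(rec["id"])              # reformat: string -> integer key
        if rec_id in seen_ids:               # deduplicate on the id field
            continue
        seen_ids.add(rec_id)
        yield {
            "id": rec_id,
            "name": rec["name"].strip(),             # basic cleaning
            "email": rec["email"].strip().lower(),   # normalize field contents
            # "ssn" is deliberately not copied, so it never reaches the target
        }

def load(records: Iterable[Dict[str, object]], conn: sqlite3.Connection) -> int:
    """Load: insert already-transformed records into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    count = 0
    for rec in records:
        conn.execute("INSERT INTO customers (id, name, email) VALUES (:id, :name, :email)", rec)
        count += 1
    conn.commit()
    return count

conn = sqlite3.connect(":memory:")
# Generators are lazy, so extract, transform and load interleave record by record.
loaded = load(transform(extract(RAW_ROWS)), conn)
print(loaded, conn.execute("SELECT id, name, email FROM customers ORDER BY id").fetchall())
```

Running it prints `2 [(1, 'Alice', 'alice@example.com'), (2, 'Bob', 'bob@example.com')]`: the duplicate record is dropped and the sensitive column never exists in the destination table.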

ELT:

[Image not reproduced. Source: Microsoft]


  1. Data is loaded before it is transformed, i.e. the transformation occurs in the target data store.
  2. Instead of using a separate transformation engine, the processing capabilities of the target data store are used to transform data. This simplifies the architecture by removing the transformation engine from the pipeline.
  3. ELT only works well when the target system is powerful enough to transform the data efficiently.
  4. Scaling the target data store also scales the ELT pipeline's performance.
  5. Suitable for constructing complex models.
  6. ELT might use optimized storage formats like Parquet, which stores row-oriented data in a columnar fashion and provides optimized indexing.
  7. Well suited to the cloud, where the target data store can be scaled on demand.
  8. Azure Data Factory can be used to create and schedule data-driven workflows (pipelines) that use compute services such as Azure HDInsight Hadoop and Azure Databricks.
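
For contrast, the sketch below shows the ELT order of operations under stated assumptions: an in-memory sqlite3 database is a stand-in for a powerful target store such as a cloud data warehouse, and the table names and records are hypothetical. The raw rows are loaded unchanged into a landing table, and the cleaning, deduplication and reformatting are then expressed in SQL and executed by the target store's own engine, so no separate transformation engine sits in the pipeline. It does not reflect how Azure Data Factory or any specific warehouse implements ELT.

```python
import sqlite3

# Hypothetical raw records, loaded as-is (no cleaning before load).
RAW_ROWS = [
    {"id": "1", "name": "  Alice ", "email": "ALICE@Example.com", "ssn": "111-11-1111"},
    {"id": "2", "name": "Bob", "email": "bob@example.com ", "ssn": "222-22-2222"},
    {"id": "1", "name": "Alice", "email": "alice@example.com", "ssn": "111-11-1111"},  # duplicate id
]

conn = sqlite3.connect(":memory:")

# Load: copy the raw data into a landing table in the target store without transforming it.
conn.execute("CREATE TABLE raw_customers (id TEXT, name TEXT, email TEXT, ssn TEXT)")
conn.executemany(
    "INSERT INTO raw_customers (id, name, email, ssn) VALUES (:id, :name, :email, :ssn)",
    RAW_ROWS,
)

# Transform: the target store's own SQL engine cleans, deduplicates and reshapes the data.
# The first-loaded row (lowest rowid) is kept for each id; the ssn column is not selected.
conn.executescript("""
CREATE TABLE customers AS
SELECT CAST(id AS INTEGER) AS id,
       TRIM(name)          AS name,
       LOWER(TRIM(email))  AS email
FROM raw_customers
WHERE rowid IN (
    SELECT MIN(rowid) FROM raw_customers GROUP BY CAST(id AS INTEGER)
);
""")

print(conn.execute("SELECT id, name, email FROM customers ORDER BY id").fetchall())
```

The cleaned `customers` table matches the ETL result, but the unmodified `raw_customers` table, including the sensitive `ssn` column, still lives in the target store. That is the privacy trade-off behind point 6 of the ETL list.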
