ETL vs ELT
ETL :
- Raw data is retrieved and transformed before being saved.
- Suitable for systems that only require simple models
- The transformation work in ETL takes place in a specialized engine, and it often involves using staging tables to temporarily hold data as it is being transformed and ultimately loaded to its destination.
- Transformation typically involves basic data cleaning, deduplicating records, and reformatting the contents of individual fields (a small sketch of such a step follows this list).
- Stream-oriented approach, with an emphasis on speed and throughput.
- ETL can help with data privacy and compliance by removing sensitive data before it arrives in analytical data models.
- Often, the three ETL phases are run in parallel to save time: while data is being extracted, a transformation process can work on data already received and prepare it for loading, and a loading process can begin working on the prepared data rather than waiting for the entire extraction to complete (sketched below with a simple queue-based pipeline).
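As a rough illustration of the transformation step, the sketch below cleans a batch of extracted records, deduplicates them, reformats individual fields, and loads the result into a staging table. The record layout, column names, and the use of an in-memory SQLite database as the staging area are hypothetical stand-ins for whatever engine and schema a real pipeline would use.

    import sqlite3

    # Hypothetical raw records pulled in by the extract step.
    raw_rows = [
        {"id": "1", "email": " Alice@Example.com ", "amount": "10.50"},
        {"id": "1", "email": " Alice@Example.com ", "amount": "10.50"},  # duplicate
        {"id": "2", "email": "BOB@example.com", "amount": "7"},
    ]

    def transform(rows):
        """Basic cleaning, deduplication, and reformatting of individual fields."""
        seen, cleaned = set(), []
        for row in rows:
            email = row["email"].strip().lower()           # reformat a field
            record = (int(row["id"]), email, float(row["amount"]))
            if record in seen:                             # deduplicate
                continue
            seen.add(record)
            cleaned.append(record)
        return cleaned

    # Load the transformed batch into a temporary staging table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging_orders (id INTEGER, email TEXT, amount REAL)")
    conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", transform(raw_rows))
    print(conn.execute("SELECT * FROM staging_orders").fetchall())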
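The overlap between phases can be sketched as a producer/consumer pipeline in which extract, transform, and load run in separate threads connected by queues, so each stage starts working as soon as the previous one hands over a record. This is only a toy illustration of the idea; the data, the strip-whitespace transformation, and the print-as-load step are placeholders.

    import queue
    import threading

    extracted, transformed = queue.Queue(), queue.Queue()

    def extract():
        # Pretend these rows arrive one by one from a source system.
        for i in range(5):
            extracted.put({"id": i, "value": f" item-{i} "})
        extracted.put(None)                 # sentinel: extraction finished

    def transform():
        while (row := extracted.get()) is not None:
            transformed.put({"id": row["id"], "value": row["value"].strip()})
        transformed.put(None)               # pass the sentinel downstream

    def load():
        while (row := transformed.get()) is not None:
            print("loading", row)           # stand-in for a database write

    threads = [threading.Thread(target=step) for step in (extract, transform, load)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()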
ELT :
- Data is stored before being transformed, i.e., the transformation occurs in the target data store.
- Instead of using a separate transformation engine, the processing capabilities of the target data store are used to transform data. This simplifies the architecture by removing the transformation engine from the pipeline (see the first sketch after this list).
- ELT only works well when the target system is powerful enough to transform the data efficiently.
- Scaling the target data store also scales the ELT pipeline's performance.
- Suitable for constructing complex models.
- ELT might use optimized storage formats such as Parquet, which stores row-oriented data in a columnar layout and provides optimized indexing (see the Parquet sketch after this list).
- Suitable for the cloud.
- Azure Data Factory can be used to create and schedule data-driven workflows, using compute services such as Azure HDInsight (Hadoop) and Azure Databricks for processing.
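As a minimal sketch of transforming inside the target store: raw rows are loaded untouched, and a SQL statement executed by the store itself produces the cleaned table. An in-memory SQLite database and the raw_orders/orders table names stand in here for a real warehouse or analytical engine.

    import sqlite3

    conn = sqlite3.connect(":memory:")

    # Load: raw data lands in the target store untransformed.
    conn.execute("CREATE TABLE raw_orders (id TEXT, email TEXT, amount TEXT)")
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?)",
        [("1", " Alice@Example.com ", "10.50"), ("2", "BOB@example.com", "7")],
    )

    # Transform: the target store's own engine does the work, here via plain SQL.
    conn.execute("""
        CREATE TABLE orders AS
        SELECT CAST(id AS INTEGER)  AS id,
               LOWER(TRIM(email))   AS email,
               CAST(amount AS REAL) AS amount
        FROM raw_orders
    """)
    print(conn.execute("SELECT * FROM orders").fetchall())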
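A small sketch of the Parquet point, assuming pandas with a Parquet engine such as pyarrow is installed: row-oriented records are written to a columnar Parquet file, and a reader can then pull back only the columns it needs. The file name and column names are arbitrary examples.

    import pandas as pd

    # Row-oriented records as they might arrive from a source system.
    df = pd.DataFrame([
        {"id": 1, "email": "alice@example.com", "amount": 10.5},
        {"id": 2, "email": "bob@example.com", "amount": 7.0},
    ])

    # Written to Parquet, the data is laid out column by column on disk.
    df.to_parquet("orders.parquet")

    # The columnar layout lets a reader pull back only the columns it needs.
    amounts = pd.read_parquet("orders.parquet", columns=["amount"])
    print(amounts)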