Understanding Azure Data Factory
- Pipelines
- Activities
- Datasets
- Linked services
- Data Flows
- Integration Runtimes
A data factory might have one or more pipelines. A pipeline is a logical grouping of activities that performs a unit of work. Together, the activities in a pipeline perform a task. For example, a pipeline can contain a group of activities that ingests data from an Azure blob, and then runs a Hive query on an HDInsight cluster to partition the data.
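As a sketch of how this looks in practice, a pipeline is authored as a JSON document. The snippet below is a minimal, hypothetical pipeline containing a single Copy activity; all names (the pipeline, datasets, and activity) are illustrative, not taken from any real deployment:

```json
{
  "name": "BlobToSqlPipeline",
  "properties": {
    "activities": [
      {
        "name": "CopyBlobToSql",
        "type": "Copy",
        "inputs": [
          { "referenceName": "BlobInputDataset", "type": "DatasetReference" }
        ],
        "outputs": [
          { "referenceName": "SqlOutputDataset", "type": "DatasetReference" }
        ],
        "typeProperties": {
          "source": { "type": "BlobSource" },
          "sink": { "type": "SqlSink" }
        }
      }
    ]
  }
}
```

The `inputs` and `outputs` arrays reference datasets by name, which is how the activity is wired to the data it reads and writes.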
Activities
Activities represent a processing step in a pipeline. For example, you might use a copy activity to copy data from one data store to another data store. Similarly, you might use a Hive activity, which runs a Hive query on an Azure HDInsight cluster, to transform or analyze your data. Data Factory supports three types of activities: data movement activities, data transformation activities, and control activities.
Datasets
Datasets represent data structures within the data stores; they simply point to or reference the data you want to use in your activities as inputs or outputs.
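For illustration, a dataset pointing at a folder of text files in Azure Blob storage might look like the hypothetical JSON below (the dataset, linked service, and folder names are made up for the example):

```json
{
  "name": "BlobInputDataset",
  "properties": {
    "type": "AzureBlob",
    "linkedServiceName": {
      "referenceName": "AzureStorageLinkedService",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "folderPath": "inputcontainer/raw",
      "format": { "type": "TextFormat" }
    }
  }
}
```

Note that the dataset itself holds no credentials; it delegates connectivity to the linked service it references.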
Linked services
Linked services are much like connection strings: they define the connection information that Data Factory needs to connect to external resources. For example, an Azure Storage linked service specifies a connection string to connect to an Azure Storage account.
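A minimal sketch of such a linked service follows; the name is illustrative and the placeholders `<account>` and `<key>` stand in for real credentials (in practice these are usually kept in Azure Key Vault rather than inline):

```json
{
  "name": "AzureStorageLinkedService",
  "properties": {
    "type": "AzureStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
    }
  }
}
```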
Integration Runtime

The integration runtime (IR) is the compute infrastructure that Azure Data Factory uses to run its activities. There are three types: the Azure integration runtime, the self-hosted integration runtime, and the Azure-SSIS integration runtime.

Triggers

- Schedule trigger: invokes a pipeline on a wall-clock schedule. It supports periodic and advanced calendar options.
- Event-based trigger: invokes a pipeline in response to an event, such as a blob being created or deleted.
- Tumbling window trigger: operates on a periodic interval while also retaining state. It creates contiguous, non-overlapping time windows and works well for processing historical dates. When a tumbling window trigger has a self-dependency, specifying an offset becomes compulsory.
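To make the first of these concrete, a schedule trigger that runs a pipeline once every hour could be defined roughly as follows; the trigger and pipeline names are hypothetical:

```json
{
  "name": "HourlyTrigger",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Hour",
        "interval": 1,
        "startTime": "2023-01-01T00:00:00Z",
        "timeZone": "UTC"
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "BlobToSqlPipeline",
          "type": "PipelineReference"
        }
      }
    ]
  }
}
```

The `recurrence` block is where the wall-clock schedule lives; swapping `frequency` and `interval` (for example, `"Day"` and `1`) changes the cadence without touching the pipeline itself.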
Data flows
Data flows are a feature of Azure Data Factory that lets data engineers develop data transformation logic graphically, without writing code. The resulting data flows are executed as activities within Azure Data Factory pipelines that use scaled-out Spark clusters, so they run on their own execution cluster for scaled-out data processing. ADF internally handles all the code translation, Spark optimization, and execution of the transformations. Data flow activities can be operationalized via the existing Data Factory scheduling, control-flow, and monitoring capabilities.
There are two types of Data flows:
- Mapping Data Flow
- Wrangling Data Flow
A pipeline run is an instance of the pipeline execution. Pipeline runs are typically instantiated by passing the arguments to the parameters that are defined in pipelines. The arguments can be passed manually or within the trigger definition.
Variables can be used inside of pipelines to store temporary values and can also be used in conjunction with parameters to enable passing values between pipelines, data flows, and other activities.
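Parameters and variables both appear at the top of the pipeline definition. The hypothetical snippet below sketches one of each (names and defaults are illustrative); at run time an argument for `sourceFolder` can be supplied manually or in the trigger definition:

```json
{
  "name": "ParameterizedPipeline",
  "properties": {
    "parameters": {
      "sourceFolder": { "type": "String", "defaultValue": "input/" }
    },
    "variables": {
      "processedCount": { "type": "Integer", "defaultValue": 0 }
    },
    "activities": []
  }
}
```

Within activities, the parameter is read with the expression `@pipeline().parameters.sourceFolder`, while the variable is read with `@variables('processedCount')` and updated using a Set Variable activity.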
ADF security
Data can be copied using more than 80 connectors, between different applications in Azure and beyond. These may be Software-as-a-Service applications such as Dynamics 365 or Salesforce, on-premises data stores such as SQL Server or Oracle, or cloud data stores such as Azure SQL Database or Amazon S3.
Thus Data Factory makes it possible to copy data from an application in one environment to a different application in any other environment. While copying, it can convert file formats (for example, zipping or unzipping files) and map the columns between the source and the destination.
In addition to copying data, transforming the data is also possible. Previously, the only way to do this was to use external services such as HDInsight, Databricks, or SQL Server stored procedures. In 2019, Azure Data Factory added the Data Flow activity. This Data Flow activity is much like a Data Flow in SSIS and supports a wide range of transformations.
Both copying and transforming data are now possible from the same interface in Data Factory. This makes Data Factory a complete ETL and data ingestion tool.