What is the difference between Gen1 and Gen2?
Microsoft built Data Lake Storage Gen1 as a hierarchical file system that is compatible with Hadoop HDFS, so Gen1 offers the advantages of HDFS.
Microsoft has another service called Blob Storage, which has its own advantages.
Blob storage + Data Lake Gen1 = Azure Data Lake Storage Gen2
An Azure Data Lake Storage Gen2 account is built on top of Blob Storage, so the two look very similar. The only setting that distinguishes them is the Hierarchical namespace property: if we enable it, the storage account becomes a Data Lake Storage Gen2 account.
Wherever you see hierarchical namespace, you should know that it relates to Data Lake Storage Gen2.
What does Hierarchical namespace mean? What is the difference between Blob Storage and Gen2?
Consider a container created inside a Data Lake Storage account. If you can create a directory inside it, and further directories inside that one, you get a folder structure within the container. This is called a hierarchical folder structure, or hierarchical namespace. Blob Storage has no real folders: its namespace is flat, and a "folder" there is only a prefix in the blob name. True directories are available only in Data Lake Storage.
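To make the difference concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not the Azure API: the function names and sample paths are made up for illustration. It assumes a flat namespace behaves like a dictionary keyed by the full blob name, and a hierarchical namespace behaves like nested dictionaries.

```python
"""Toy model contrasting a flat namespace with a hierarchical one."""


def rename_prefix_flat(blobs: dict, old: str, new: str) -> int:
    """Rename a virtual folder in a flat namespace.

    Every blob whose name starts with the prefix must be rewritten.
    Returns the number of blobs touched.
    """
    touched = 0
    for name in list(blobs):  # copy the keys, the dict is modified in the loop
        if name.startswith(old + "/"):
            blobs[new + name[len(old):]] = blobs.pop(name)
            touched += 1
    return touched


def rename_dir_hierarchical(tree: dict, old: str, new: str) -> int:
    """Rename a top-level directory in a hierarchical namespace.

    Only the single directory entry changes. Returns the entries touched.
    """
    tree[new] = tree.pop(old)
    return 1


# Hypothetical sample data: the same four files in both models.
flat = {
    "raw/2024/a.csv": b"1",
    "raw/2024/b.csv": b"2",
    "raw/2025/c.csv": b"3",
    "curated/d.csv": b"4",
}
tree = {
    "raw": {"2024": {"a.csv": b"1", "b.csv": b"2"}, "2025": {"c.csv": b"3"}},
    "curated": {"d.csv": b"4"},
}

flat_ops = rename_prefix_flat(flat, "raw", "landing")
tree_ops = rename_dir_hierarchical(tree, "raw", "landing")
print(flat_ops, tree_ops)  # 3 1
```

In the flat model the cost of renaming a "folder" grows with the number of blobs under it, while in the hierarchical model it is a single operation on the directory entry. This is one reason directory-heavy analytics workloads benefit from the hierarchical namespace.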
Data Lake Storage is designed to store massive amounts of data for big data analytics. As an example, consider Contoso Life Sciences, a cancer research center, which analyzes petabytes of genetic data, patient data, and records of related sample data. Data Lake Storage Gen2 reduces computation times, making the research faster and less expensive.
To ingest data into your data lake system, use Azure Data Factory, Apache Sqoop, Azure Storage Explorer, the AzCopy tool, PowerShell, or Visual Studio. To use the File Upload feature to import files larger than two gigabytes, use PowerShell or Visual Studio. AzCopy supports a maximum file size of one terabyte and automatically splits data files that exceed 200 gigabytes.
In Data Lake Storage Gen1, data engineers query data by using the U-SQL language. In Gen2, use the Azure Blob Storage API or the Azure Data Lake Storage (ADLS) API.
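As a rough illustration of the two APIs, the sketch below builds the two endpoints that a Gen2 account exposes: the Blob endpoint (`blob.core.windows.net`) and the Data Lake endpoint (`dfs.core.windows.net`). The account name `contosolake` and the helper names are hypothetical. The `list_paths` function assumes the `azure-storage-file-datalake` package is installed and that you pass a valid credential; it is not exercised here.

```python
# Host suffixes for the two APIs that a Gen2 account answers on.
HOSTS = {
    "blob": "blob.core.windows.net",  # Azure Blob Storage API
    "dfs": "dfs.core.windows.net",    # Azure Data Lake Storage (ADLS) API
}


def endpoint(account: str, api: str) -> str:
    """Return the HTTPS endpoint of a storage account for the given API."""
    return f"https://{account}.{HOSTS[api]}"


def list_paths(account: str, file_system: str, directory: str, credential) -> list:
    """List paths under a directory through the ADLS API (needs the Azure SDK)."""
    # Imported lazily so this module still loads without the SDK installed.
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        account_url=endpoint(account, "dfs"), credential=credential
    )
    file_system_client = service.get_file_system_client(file_system)
    return [path.name for path in file_system_client.get_paths(path=directory)]


# "contosolake" is a made-up account name used only for illustration.
blob_url = endpoint("contosolake", "blob")
dfs_url = endpoint("contosolake", "dfs")
print(blob_url)
print(dfs_url)
```

Both endpoints point at the same underlying data, so existing Blob tooling keeps working while directory-aware operations go through the ADLS endpoint.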
Because Data Lake Storage supports Azure Active Directory ACLs, security administrators can control data access by using the familiar Active Directory security groups. Role-based access control (RBAC) is available in both Gen1 and Gen2. Built-in security groups include read-only users, write-access users, and full-access users.
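The idea of group-based access can be sketched as a small in-memory model. This is not the Azure ACL implementation: the class, the group names, and the permissions mapped to each group are assumptions chosen to mirror the three groups mentioned above.

```python
from dataclasses import dataclass, field

# Assumed permissions per group; the real service defines its own semantics.
GROUP_PERMISSIONS = {
    "read-only": {"read"},
    "write-access": {"read", "write"},
    "full-access": {"read", "write", "execute"},
}


@dataclass
class AccessControlList:
    """Maps a security group name to the permissions granted on one path."""

    entries: dict = field(default_factory=dict)

    def grant(self, group: str, permissions) -> None:
        """Grant (or replace) the permissions held by a group."""
        self.entries[group] = set(permissions)

    def allows(self, user_groups, permission: str) -> bool:
        """A user is allowed if any of their groups holds the permission."""
        return any(
            permission in self.entries.get(group, set()) for group in user_groups
        )


# Build an ACL for one hypothetical directory.
acl = AccessControlList()
for group, permissions in GROUP_PERMISSIONS.items():
    acl.grant(group, permissions)

analyst_groups = ["read-only"]       # hypothetical user memberships
engineer_groups = ["write-access"]

print(acl.allows(analyst_groups, "read"))    # True
print(acl.allows(analyst_groups, "write"))   # False
print(acl.allows(engineer_groups, "write"))  # True
```

Because permissions attach to groups rather than to individual users, an administrator changes a person's access by changing group membership, without touching the ACL on the data.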
Enable the firewall to limit traffic to only Azure services. Data Lake Storage automatically encrypts data at rest, protecting data privacy.
Reference and Credits : Microsoft