Azure Databricks
Setting up all of these Spark components can be a lot of work for any organization: there are many machines to configure, keep up to date, and maintain. This is where Databricks comes into the picture.
Databricks wraps Spark in a management solution that takes away the need to manage and maintain all of those servers and instances.
Instead, you simply specify which version of the platform you need, what compute and memory resources you want, and which VM SKU to use, and Databricks manages the rest, dynamically allocating and provisioning additional instances and taking away the burden of managing the individual worker and driver nodes.
Additionally, Databricks brings some new features to Spark.
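As an illustration, a cluster definition submitted to the Databricks Clusters REST API reduces to little more than a runtime version, a VM SKU, and an autoscaling range; the cluster name and node type below are illustrative values, not defaults:

```json
{
  "cluster_name": "etl-cluster",
  "spark_version": "13.3.x-scala2.12",
  "node_type_id": "Standard_DS3_v2",
  "autoscale": {
    "min_workers": 2,
    "max_workers": 8
  }
}
```

Given this spec, Databricks provisions the underlying VMs and scales the worker count between the two bounds as load changes.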
Azure Databricks is a fully managed, cloud-based big data and machine learning platform.
Combined with the enterprise-grade scalability and security of the Microsoft Azure platform, Azure Databricks makes it simple to run large-scale Spark workloads.
The Azure and Databricks teams have worked together to make it a managed, first-party application on Azure, so Databricks is natively integrated with Azure and its services.
This means the Azure SLA of 99.95% uptime also applies to Azure Databricks. This is a big deal for companies, since the Databricks service is fully backed by Microsoft and the Azure SLA.
Databricks is fully integrated with Azure Active Directory and offers role-based access control, which means there is no need to manage users separately. Unified billing is also possible: we pay for Databricks, for storage, for the VMs, the disks, and all the other resources created as part of the cluster through a single bill, which is very helpful for organizations.
We already know Data Factory is a very popular Azure service for ingesting data into a cloud architecture. Kafka or Event Hubs can be very useful for ingesting streaming data into a Data Lake, and Data Lake is the first choice of many organizations for storing data: it is cheap, fast, supports a hierarchical namespace, and can be integrated with Databricks and other Hadoop-ecosystem technologies.
Databricks and Data Lake integrate very well. Databricks can access and process data in the Data Lake without moving it: the data stays in the Data Lake, and Databricks does not change its location.

The role of Databricks is to process, transform, clean, munge, and wrangle the data from the Data Lake or other sources as required, and finally push it into a more structured store, such as a SQL database, a SQL data warehouse, an analytics service, or Cosmos DB, depending on the application architecture. From that model-and-serve layer, reporting tools can then present the data to the business.
Within your Azure Databricks workspace, you can quickly and easily create notebooks and begin writing code in Python, SQL, Scala, or R.