HDInsight is built on top of Azure and supports popular open-source frameworks such as Hadoop, Spark, Hive, and HBase. It also provides integrations with other Azure services such as Azure Data Lake Storage and Azure Data Factory. HDInsight is a cost-effective solution that allows users to process large amounts of data quickly and easily, without the need for expensive hardware or the hassle of managing a Hadoop cluster.
Features
Azure HDInsight supports batch processing, data warehousing, IoT, and data science. As mentioned earlier, it includes Apache Hadoop, Spark, HBase, Kafka, Storm, and Interactive Query.
The Hadoop ecosystem includes Apache Hive, HBase, Spark, and Kafka. Hadoop stores data on disk in the Hadoop Distributed File System (HDFS), whereas Spark keeps working data in memory. This difference can make Spark up to 100 times faster for workloads that fit in memory.
HBase is a NoSQL database built on Hadoop. It's commonly used for search engines, and it offers automatic failover. Kafka is an open-source platform that's used to compose data pipelines. It offers message queue functionality, which allows users to publish or subscribe to real-time data streams.
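The publish/subscribe idea can be sketched with a small in-memory model. This is only an illustration of the pattern, not the Kafka client API: the `ToyBroker` class, its method names, and the topic and subscriber names are all hypothetical, and a real Kafka deployment adds partitions, replication, and durable storage.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ToyBroker:
    """In-memory stand-in for a message broker (hypothetical, not the Kafka API)."""

    def __init__(self) -> None:
        # topic name -> append-only list of messages
        self._topics: Dict[str, List[str]] = defaultdict(list)
        # (subscriber, topic) -> index of the next unread message
        self._offsets: Dict[Tuple[str, str], int] = defaultdict(int)

    def publish(self, topic: str, message: str) -> int:
        """Append a message to the topic log and return its offset."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def poll(self, subscriber: str, topic: str) -> List[str]:
        """Return the messages this subscriber has not read yet, then advance its offset."""
        log = self._topics[topic]
        start = self._offsets[(subscriber, topic)]
        self._offsets[(subscriber, topic)] = len(log)
        return log[start:]


if __name__ == "__main__":
    broker = ToyBroker()
    broker.publish("sensor-readings", "temp=21.5")
    broker.publish("sensor-readings", "temp=21.7")
    # Each subscriber tracks its own position, so both see every message.
    print(broker.poll("dashboard", "sensor-readings"))
    print(broker.poll("archiver", "sensor-readings"))
```

Because every subscriber keeps its own offset into the same log, adding a new consumer never takes messages away from an existing one, which is what makes this model convenient for composing pipelines.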
Storm is a distributed, real-time streaming analytics solution. It supports common programming languages like Java, C#, and Python.
Finally, Interactive Query, which on HDInsight is based on Apache Hive LLAP, uses in-memory caching so that Hive queries run faster and more interactively.
Data engineers use Hive to run ETL operations on the data they're ingesting, or they orchestrate Hive queries in Azure Data Factory.
In Hadoop, you use Java and Python to process big data with MapReduce. The mapper consumes and analyzes input data, then emits tuples that the reducer can analyze. Finally, the reducer runs summary operations to create a smaller, combined result set.
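The mapper and reducer roles described above can be sketched in plain Python as a word count. This is a single-process illustration, not Hadoop code: the function names are hypothetical, and the `shuffle` step stands in for the grouping by key that the framework performs between the map and reduce phases.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Consume one input line and emit a (word, 1) tuple per word."""
    for word in line.lower().split():
        yield (word, 1)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key (done by the framework in a real cluster)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key: str, values: List[int]) -> Tuple[str, int]:
    """Summarize all values for one key into a single count."""
    return (key, sum(values))


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    emitted = (pair for line in lines for pair in mapper(line))
    return dict(reducer(key, values) for key, values in shuffle(emitted).items())


if __name__ == "__main__":
    print(word_count(["big data on Azure", "big clusters on demand"]))
```

The reducer's output has one entry per distinct word, so it is typically much smaller than the stream of tuples the mapper emits.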
Spark processes streams by using Spark Streaming. For machine learning, it uses the 200 preloaded Anaconda libraries with Python. It uses GraphX for graph computations. Developers can remotely submit jobs to Spark and monitor them.
For running queries, Hadoop supports the Pig Latin and HiveQL languages, and in Spark, data engineers use Spark SQL.
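To give a feel for the kind of query involved, here is a minimal sketch of a per-key aggregation. It runs against Python's built-in `sqlite3` module purely as a stand-in so that it works without a cluster; it is not Hive or Spark. The `events` table, its columns, and the `events_per_device` function are made up for illustration. The `SELECT` statement itself uses only `COUNT`, `AVG`, `GROUP BY`, and `ORDER BY`, which HiveQL and Spark SQL also accept.

```python
import sqlite3
from typing import Iterable, List, Tuple


def events_per_device(rows: Iterable[Tuple[str, float]]) -> List[Tuple[str, int, float]]:
    """Load (device, reading) rows into a temporary table and aggregate per device."""
    conn = sqlite3.connect(":memory:")  # stand-in engine, not Hive or Spark
    conn.execute("CREATE TABLE events (device TEXT, reading REAL)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    query = """
        SELECT device, COUNT(*) AS n, AVG(reading) AS avg_reading
        FROM events
        GROUP BY device
        ORDER BY device
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [("a", 1.0), ("a", 3.0), ("b", 5.0)]
    for device, n, avg_reading in events_per_device(sample):
        print(device, n, avg_reading)
```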
For security, Hadoop supports encryption, Secure Shell (SSH), shared access signatures, and Azure Active Directory security.
Source: Microsoft