Revision as of 17:31, 28 November 2023
Apache Spark
Apache Spark[1] is an open-source, multi-language unified engine for large-scale data analytics, data science, and machine learning on single-node machines or clusters. Originally developed at UC Berkeley, Spark is written in Scala.
Spark supports a wide range of workloads, from batch processing and interactive querying to real-time analytics, machine learning, and graph processing. Unlike many platforms that offer limited options or require users to learn a platform-specific language, Spark supports all leading data analytics languages, including R, SQL, Python, Scala, and Java.
Use Apache Spark in Jupyter through PySpark
The Spark Python API (PySpark) exposes the Spark programming model to Python. PySpark supports most Apache Spark features, such as Spark SQL, DataFrames, MLlib, Spark Core, and Streaming. PySpark allows users to write Spark applications using the Python API and provides the ability to work with Spark's Resilient Distributed Datasets (RDDs). PySpark also allows Python to interface with JVM objects through the Py4J library.
This page introduces basic big data analysis using PySpark.