Mike Sun | WiseWithData

RDDs vs DataFrames vs DataSets: The Three Data Structures of Spark

by Mike Sun | May 20, 2020 | Apache Spark, Apache Spark Cafe, Java, Python, R, Scala

RDD, DataFrame, and Dataset are the three most common data structures in Spark, and they make processing very large data easy and convenient. Because Spark uses lazy evaluation, these data structures are not computed right away when they are created,...
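Lazy evaluation is easier to see than to describe. In Spark, transformations such as `rdd.map(...)` or `rdd.filter(...)` only record a step in the lineage. Nothing is computed until an action such as `rdd.collect()` or `rdd.count()` runs. The sketch below is a toy model of that idea in plain Python so it runs without a Spark installation. The class `LazyDataset` and the helper `square` are hypothetical illustrations, not part of Spark's API.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple


class LazyDataset:
    """Hypothetical toy model of Spark-style lazy evaluation (not Spark's API).

    Transformations (map, filter) only append a step to the lineage.
    Actions (collect, count) replay the whole lineage over the source data.
    """

    def __init__(self, source: Iterable[Any],
                 lineage: Optional[List[Tuple[str, Callable]]] = None):
        self._source = list(source)
        # Each lineage entry is a ("map" | "filter", function) pair.
        self._lineage = list(lineage) if lineage else []

    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        # Transformation: return a new dataset with one more recorded step.
        return LazyDataset(self._source, self._lineage + [("map", fn)])

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        # Transformation: also just recorded, nothing is evaluated here.
        return LazyDataset(self._source, self._lineage + [("filter", pred)])

    def collect(self) -> List[Any]:
        # Action: only now are the recorded steps executed, element by element.
        result = []
        for item in self._source:
            keep = True
            for kind, fn in self._lineage:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                result.append(item)
        return result

    def count(self) -> int:
        # Action: re-runs the lineage, since this toy model has no caching.
        return len(self.collect())


calls = []


def square(x: int) -> int:
    calls.append(x)  # record that the function actually ran
    return x * x


squares = LazyDataset(range(5)).map(square).filter(lambda x: x % 2 == 0)
print(calls)              # [] : defining the transformations computed nothing
print(squares.collect())  # [0, 4, 16]
print(calls)              # [0, 1, 2, 3, 4]
```

As in Spark without `cache()` or `persist()`, every action in this sketch recomputes the lineage from the source. Calling `squares.collect()` again would run `square` five more times.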

The Rise in Popularity of Apache Spark

by Mike Sun | May 20, 2020 | Apache Spark, Apache Spark Cafe

Since the end of 2014, the number of Google searches comparing Apache Spark to Hadoop has grown. What draws experts in Big Data, Data Science, and Data Analysis to Apache Spark (Spark)? Spark is a fast and expressive cluster computing...