by Mike Sun | May 20, 2020 | Apache Spark, Apache Spark Cafe, Java, Python, R, Scala
RDD, DataFrame, and Dataset are the three most common data structures in Spark, and they make processing very large data easy and convenient. Because Spark uses lazy evaluation, these data structures are not executed right away when they are created,...
by Bryan Chuinkam | May 20, 2020 | Apache Spark, Apache Spark Cafe, Python
What is Koalas? Koalas is an implementation of the pandas DataFrame API on top of Apache Spark. Pandas is the go-to Python library for data analysis, while Apache Spark is becoming the go-to for big data processing. Koalas allows you to leverage the simplicity of pandas...
by Mike Sun | May 20, 2020 | Apache Spark, Apache Spark Cafe
Since the end of 2014, there has been an increase in the number of Google searches comparing Apache Spark to Hadoop. What brings experts in Big Data, Data Science, and Data Analysis to Apache Spark (Spark)? Spark is a fast and expressive cluster computing...
by Andrea Bacqué | May 14, 2020 | Apache Spark, Customer Experience, Python, SAS, Solutions
The future of data science can be confusing: there are so many options when it comes to advanced analytics. Big data initiatives have been my focus for decades now, and I think I have an idea of where things are headed. Of course, this is just my opinion, but...
by Andrea Bacqué | May 4, 2020 | Apache Spark, Python, SAS, Solutions
As many are pointing out, the pandemic is causing major global distribution shifts. For those working with advanced analytics, machine learning, and artificial intelligence programs, this pandemic is undermining confidence in the historical data used with predictive...
by Bryan Chuinkam | Apr 27, 2020 | Apache Spark, Apache Spark Cafe, Python, SAS, Solutions
I would like to introduce you to SPROCKET SearchParty, the first step in the SPROCKET automated migration solution. What is SPROCKET SearchParty? SPROCKET SearchParty plays an important role in planning your SAS-to-PySpark code conversion. It identifies and...