Total 12 Posts

Apache Spark

Data+AI Summit 2022 - Top Announcements and Recap

Data+AI Summit 2022 [https://databricks.com/dataaisummit/] is the world’s largest gathering among…

Jul 07, 2022 3 min read

Theo LEBRUN

Data

Apache Spark 3.0

Databricks recently announced the release of Apache Spark 3.0 [https://databricks.com/blog/2020/…

Jun 23, 2020 3 min read

Theo LEBRUN

Apache Spark

Transient Cluster on AWS

This post demonstrates a cost-effective, automated solution for running Spark jobs on an EMR cluster daily using CloudWatch, Lambda, EMR, S3, and SNS.…

Jun 03, 2019 6 min read

Sripriya Rajanna

Apache Spark

Performance Tweaking Apache Spark

Apache Spark Streaming applications need to be monitored frequently to be certain that they are…

Jun 26, 2017 5 min read

Jeannine Stark

Data Streaming

Incrementally loaded Parquet files

In this post, I explore how you can leverage Parquet [https://parquet.apache.org/] when…

May 17, 2017 4 min read

Alexis Seigneurin

Apache Spark

MongoDB and Apache Spark - Getting started tutorial

MongoDB and Apache Spark are two popular Big Data technologies. In my previous post [https:…

May 03, 2017 6 min read

Raphael Brugier

Apache Spark

Introduction to the MongoDB connector for Apache Spark

MongoDB is one of the most popular NoSQL databases. Its unique capabilities to store document-oriented…

Mar 31, 2017 3 min read

Raphael Brugier

Apache Spark

Spark Summit East 2017 - A summary

I attended Spark Summit East 2017 last week. This two-day conference - February 8th…

Feb 21, 2017 4 min read

Alexis Seigneurin

Apache Spark

A tour of Databricks Community Edition: a hosted Spark service

With the recent announcement [https://databricks.com/blog/2016/02/17/introducing-databricks-community-edition-apache-spark-for-all.html] of the…

Apr 13, 2016 6 min read

Raphael Brugier

Apache Spark

Testing strategy for Spark Streaming - Part 2 of 2

In a previous post [https://test-ippon.ghost.io/testing-strategy-apache-spark-jobs/], we’ve seen why it’s…

Mar 30, 2016 6 min read

Raphael Brugier

Apache Spark

Testing strategy for Apache Spark jobs - Part 1 of 2

Like any other application, Apache Spark jobs deserve good testing practices and coverage. Indeed, the…

Mar 11, 2016 6 min read

Raphael Brugier

Apache Spark

Applying Data Science with Apache Spark Coding Dojo

This week, at the power plant (Ippon Technologies USA headquarters), we had the pleasure of…

Aug 28, 2015 3 min read

Kile Niklawski

Apache Spark