How AWS Glue Data Quality Helps You Achieve Compliance For Your Data Lake With Confidence

The "recent" creation of data lakes by thousands of organizations also created a…


Aug 09, 2023 4 min read

Theo LEBRUN

AWS Glue

An Introduction to Delta Lake: The Open-Source Storage Layer for Big Data

Delta Lake is an open-source storage framework that enables building a Lakehouse architecture with compute…


Jul 11, 2023 5 min read

Theo LEBRUN

Data

Streamline your Data Transformations by Running dbt Directly on Databricks using Jobs

Running dbt (data build tool) on Databricks is a great alternative to dbt Cloud if…


Apr 26, 2023 4 min read

Theo LEBRUN

Data

Boost the Performance of Your Databricks Jobs and Queries

Databricks is doing a lot of optimization and caching by default to have jobs and…


Mar 10, 2023 4 min read

Theo LEBRUN

Data

Capture Data History With SCD2 Using Databricks Delta Live Tables

Delta Live Tables is a great way to build and manage reliable batch and streaming…


Feb 14, 2023 4 min read

Theo LEBRUN

Data+AI Summit 2022 - Top Announcements and Recap

Data+AI Summit 2022 [https://databricks.com/dataaisummit/] is the world’s largest gathering among…


Jul 07, 2022 3 min read

Theo LEBRUN

Data

Transform Data in your Warehouse using dbt, Airflow, and Redshift

Data Build Tool [https://www.getdbt.com/] (better known simply as "dbt"…


Apr 20, 2022 4 min read

Theo LEBRUN

AWS

Sync Two S3 Buckets Using CDK and a Lambda Layer Containing the AWS CLI

The AWS Command Line Interface (CLI) [https://aws.amazon.com/cli/] is a great tool…


Apr 08, 2021 4 min read

Theo LEBRUN

Cloud

Process CSVs from Amazon S3 using Apache Flink, JHipster, and Kubernetes

Apache Flink [https://flink.apache.org/] is one of the latest distributed Big Data frameworks…


Feb 04, 2021 6 min read

Theo LEBRUN

Data Streaming

Use Stargate by DataStax to effortlessly store and query your data

Stargate [https://stargate.io/] is one of the latest shiny tools from DataStax [https://www.…


Jan 15, 2021 5 min read

Theo LEBRUN

Cassandra

Saving and Analyzing Trending Topics on Twitter using AWS Athena, Lambda, and CDK

With more than 300 million active users, Twitter is still one of the more optimal…


Aug 11, 2020 5 min read

Theo LEBRUN

Twitter

Apache Spark 3.0

Databricks recently announced the release of Apache Spark 3.0 [https://databricks.com/blog/2020/…


Jun 23, 2020 3 min read

Theo LEBRUN

Apache Spark

Build an event sourcing system on AWS using DynamoDB and CDK

Over the past few years, event sourcing has become a popular pattern used in modern…


May 05, 2020 5 min read

Theo LEBRUN

AWS

AWS Cognito and JHipster for the LOVE of OAuth 2.0

OAuth 2.0 [https://oauth.net/2/] is a stateful security mechanism. OpenID Connect (OIDC)…


Feb 25, 2020 5 min read

Theo LEBRUN

JHipster

Deploying a JHipster app to AWS using Elastic Beanstalk

JHipster [https://www.jhipster.tech/] is a great development platform to help you bootstrap a…


Nov 07, 2019 5 min read

Theo LEBRUN

JHipster