Soumil Shah @UC_eOodxvwS_H7x2uLQa-svw@youtube.com

43K subscribers

I’m Soumil Nitin Shah, a Lead Data Engineer and Apache Hudi


23:35
Learn to Use ClickHouse to Move Data from Kafka Topics into ClickHouse Tables Real Time | Hands-On Labs
26:08
Building Universal Datalake with EMR Serverless Query with Snowflake|Athena|Spark Hands on Labs
05:11
Build Modern Data Lakes on AWS with EMR Serverless and Hudi Streamer: Interoperate with XTable
06:29
How to Use External Python Packages in a PySpark Job on EMR Serverless: A Beginner’s Guide
01:02
Apache Hudi August 2024 Newsletter
06:52
Developer Guide: How to Submit Iceberg PySpark(Python) Jobs to EMR Serverless (7.1.0) with AWS Glue
06:32
Developer Guide: How to Submit Hudi PySpark(Python) Jobs to EMR Serverless (7.1.0) with AWS Glue
07:59
How to load data from S3 into Snowflake using Snowflake Copy Command Hands on Labs
08:16
How to Consume Apache Hudi Tables in Snowflake, Iceberg, and Athena | Hands-On Labs
09:21
Use DeltaStreamer & JDBC to Pull Data from Snowflake into Iceberg, Delta, Hudi: Hands-on Labs
07:01
Quick Getting Started with Iceberg on Glue Notebook: Insert, Deletes, and Merge INTO Commands
03:02
What's Next on My Channel: Upcoming Projects, Exciting Topics, and Recent Updates!
06:39
How to use Kafka Connect S3SinkConnector with Minio | Dump Data from kafka topics to Minio Buckets
07:59
Use S3 as an External Volume in Snowflake along with X table to interoperate as Hudi|Iceberg|Delta
06:01
How to Use AWS S3 as an External Volume in Snowflake | Hands on guide
05:56
Personal Opinion: Which Table Format Do I Prefer? (Hudi, Iceberg, or Delta)
06:35
Insert | Update| Delete | TimeTravel|Schema Evolution|with Iceberg and Minio Requested by Saurabh
07:50
Using Bucket Index & Right Partitioning with Hudi for 660GB Tables & 4.4B Records on AWS Glue 4.0
09:32
What are some of the common Interview Questions for Apache Hudi
05:05
Learn How to Run the Apache X Table in Docker Environments with Rocky Linux (Hudi| IceBerg|Delta)
06:57
Understanding Apache Hudi's MERGE INTO Command with Minio and HiveMetaStore
11:01
Apache Hudi 1.0.0 leverages LSM trees to achieve faster writes and save storage.
14:43
My Journey into Apache Hudi: How and When I Started Learning and Tips for Beginners
05:29
Hudi JAR Compilation: Build & Compile Hudi JARs for Specific Spark Versions Using Docker Containers
06:23
Unlock Insights: A Step-by-Step Guide to LakeView Free Community Edition with AWS Glue
03:36
Milestone Achieved: 42,000 Subscribers! Thank You, Everyone!
09:03
Fast GeoSearch on Data Lakes: Efficiently Build Geo Search Using Hudi for Lightning-Fast Retrieval
07:24
Building Keyword Search in Hudi: Inverted Indexes, Record Level | Keyword Search in Datalakes
08:55
Storing Athena Query Metrics in Hudi for Advanced Analysis and Audit using AWS Glue
07:20
Using OpenAI Vector Embedding to Store Large Vectors in Hudi with MiniO for Cost-Effective AI Apps
08:23
Learn How to Use Apache Hudi Streamer with DataHUB An Open Source Metadata Platform
06:19
Getting Started with X-Table and Unity Catalog | Universal Datalakes | Hands on Labs
08:15
Hudi Using Spark SQL on AWS S3: Insert, Update, Deletes, Stored Procedures on AWS Glue Notebooks
07:03
How to Use Hudi Streamer on New EMR 7.1.0 Spark 3.5.1 and Hudi 0.14.1 | Hands-on Labs
04:29
How to Use Hudi Streamer with Hudi version 0.15.0 | Hands on Guide |
03:10
How to Execute Postgres Stored procedures in Spark | Hands on Guide
06:47
Learn How to Ingest Data from Hudi Incrementally hudi table changes into Postgres Using Spark
08:17
Universal Datalakes: Interoperability with Hudi, Iceberg, and Delta Tables with AWS Glue Notebooks
03:38
4 Different Ways to fetch Apache Hudi Commit time in Python and PySpark
05:41
OneTable to translate a Hudi table to Iceberg format and sync with Glue Catalog
04:30
Learn How to Run Apache X Table Sync Command on AWS Cloud Shell | Interoperate Hudi Iceberg delta
06:32
Learn How to Ingest XML files with AWS Glue into Hudi Datalakes | Step by Step guide
08:56
Hudi with Spark SQL for Beginners | Insert| Updates | Delete | incremental Query | Stored procedures
05:19
How we Utilized Hudi's Time Travel Query to Investigate Bid and Spend | Going Back in Time with Hudi
05:55
Hudi Cleaning Process | hoodie.keep.min.commits and hoodie.keep.max.commits Explained
02:58
AWS Glue Tutorial: How to Filter and Exclude S3 Files while reading as Glue Dynamic Frame
03:58
How to Read S3 Partitioned Data as Columns in AWS Glue DF
08:00
Multiple Spark Writers to Hudi tables | Hands on Labs
05:20
Learn How to Ingest data from pulsar Topic into Hudi with DeltaStreamer | Hands on Labs
03:46
Build Hudi Date Dimension in Minutes with Spark SQL Minio and Query with Trino
24:00
Hudi Streamer implementing Slowly Changing Dimension Type 2 and Query Real Time Trino | Hands on
04:43
Demo Video : Hudi Delta Streamer Implementing Slowly Changing Dimension and Query that using Trino
06:13
DeltaStreamer with incremental ETL and Broadcast Joins for Faster ETL
07:24
Learn How to use Cloudwatch metrics with Hudi AWS Glue Jobs
02:16
Tips to Feel Valued at Work: Overcoming Unappreciation
04:37
How to Use Spark 3.5.1 on Kubernetes running locally | Step by Step Guide using Helm
05:57
Learn how to Spinup Trino on Kubernetes running Locally on Windows | Mac machine | Simple Guide
01:04
Mastering ETL and Data Warehousing with AWS Glue
01:27
Mastering Elasticsearch Your Comprehensive Guide to Shards, Performance Tuning, and More
06:51
Unleashing the Power of Serverless: Serving Gold Hudi Tables with AWS Lambda