Explanation on Handling Duplicate Rows in PySpark | dropDuplicates() | dropDuplicates(column_name)
5,309 Views • Apr 30, 2024
To enhance your career as a Cloud Data Engineer, check trendytech.in/?src=youtube&sub=mockdec for curated courses developed by me.

I have trained over 20,000 professionals in the field of Data Engineering in the last 5 years.

This video covers the most commonly asked interview questions for data-based roles such as data analyst, data engineer, data scientist, or data manager.

Links to the free SQL and Python series developed by me are given below:
SQL Playlist - SQL tutorial for everyone by Sumit Si...
Python Playlist - Complete Python By Sumit Mittal Sir

Don't miss out - Subscribe to the channel for more such informative interviews and unlock the secrets to success in this thriving field!

Social Media Links :
LinkedIn - www.linkedin.com/in/bigdatabysumit/
Twitter - twitter.com/bigdatasumit
Instagram - www.instagram.com/bigdatabysumit/
Student Testimonials - trendytech.in/#testimonials

Tags
#mockinterview #bigdata #career #dataengineering #data #datascience #dataanalysis #productbasedcompanies #interviewquestions #apachespark #google #interview #faang #companies #amazon #walmart #flipkart #microsoft #azure #databricks #jobs
Metadata And Engagement

Views: 5,309
Genre: Education
Date of upload: Apr 30, 2024

Rating: 4.937 (4/250 LTDR)

98.43% of users liked the video.
1.57% of users disliked the video.
User score: 97.65 - Overwhelmingly Positive

RYD date created: 2024-05-11T12:53:40.990182Z

YouTube Comments - 8 Comments

Top Comments of this video!! :3

@sivahanuman4466

1 month ago

We can use row_number(), partitioned by the primary keys and ordered by audit_tmstamp, then keep only the rows where row_number = 1 and load the DataFrame into the target S3 location.

1 |
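
The row_number() approach in the comment above can be sketched without a Spark session. This is a plain-Python illustration of the same keep-one-row-per-key logic, not PySpark code. The field names `pk` and `value`, the sample rows, and the choice to keep the latest timestamp are assumptions (the comment does not say whether the ordering is ascending or descending). In PySpark the same idea is usually written with `Window.partitionBy(...).orderBy(...)` and `row_number()`.

```python
from itertools import groupby
from operator import itemgetter


def keep_latest_per_key(rows, key="pk", order_by="audit_tmstamp"):
    """Return one row per key: the row with the greatest order_by value."""
    # Order by timestamp descending first, then stable-sort by key.
    # Within each key the rows stay in descending timestamp order,
    # mirroring PARTITION BY key ORDER BY order_by DESC.
    ordered = sorted(rows, key=itemgetter(order_by), reverse=True)
    ordered.sort(key=itemgetter(key))
    # The first row of each group is the one that would get row_number = 1.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(key))]


# Hypothetical sample data: pk 1 appears twice with different timestamps.
rows = [
    {"pk": 1, "audit_tmstamp": "2024-04-01", "value": "a"},
    {"pk": 1, "audit_tmstamp": "2024-04-03", "value": "b"},
    {"pk": 2, "audit_tmstamp": "2024-04-02", "value": "c"},
]
deduped = keep_latest_per_key(rows)
```

Because the ordering column decides which row survives, this approach is deterministic in a way that dropping duplicates on a key alone is not.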

@harshadk4264

2 months ago

If we have a hashkey column in the source S3 bucket, then we can compare it with the Redshift table and exclude the matching rows in the WHERE clause.

1 |
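
The hash-key comparison in the comment above amounts to an anti-join: keep only the source rows whose hash key is not already present in the target. Below is a minimal plain-Python sketch, assuming a hypothetical `hashkey` field and in-memory lists standing in for the S3 source and the Redshift table. In PySpark the same filter is commonly expressed as a `left_anti` join.

```python
def exclude_already_loaded(source_rows, target_rows, key="hashkey"):
    """Return the source rows whose hash key is absent from the target."""
    # Collect the target's keys once so each lookup is O(1).
    loaded = {row[key] for row in target_rows}
    return [row for row in source_rows if row[key] not in loaded]


# Hypothetical sample data: two of the three source rows are already loaded.
source = [
    {"hashkey": "a1", "amount": 10},
    {"hashkey": "b2", "amount": 20},
    {"hashkey": "c3", "amount": 30},
]
target = [
    {"hashkey": "a1", "amount": 10},
    {"hashkey": "b2", "amount": 20},
]
new_rows = exclude_already_loaded(source, target)
```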

@saimanasa905

2 months ago

distinct is a wide transformation.

1 |

@gudiatoka

2 months ago

Drop duplicates keeps the first occurrence of duplicates; you can use the keep option.

|

@alexoldfield373

2 months ago

Wouldn't the first step be to understand why duplicate rows are occurring and then work backward from that reason?

1 |

@nirmalkumarr2235

2 months ago

Is distinct a narrow transformation???

1 |

@sundipperumallapalli

2 months ago

Attributes (column names) can be passed to dropDuplicates, whereas distinct does not accept any.

|
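
The difference described in the comment above can be illustrated in plain Python. This is a sketch of the semantics only, not Spark's implementation; the function names, field names, and sample rows are hypothetical. One caveat: this sketch keeps the first row it encounters for each key, whereas Spark does not guarantee which of the duplicate rows survives when a column subset is given.

```python
def distinct(rows):
    """Drop rows that are identical in every column (whole-row comparison)."""
    seen, out = set(), []
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            out.append(row)
    return out


def drop_duplicates(rows, subset=None):
    """Drop rows that repeat the values of the subset columns.

    With subset=None this compares whole rows, like distinct().
    """
    if subset is None:
        return distinct(rows)
    seen, out = set(), []
    for row in rows:
        fingerprint = tuple(row[col] for col in subset)
        if fingerprint not in seen:
            seen.add(fingerprint)
            out.append(row)
    return out


# Hypothetical sample data: one exact duplicate, one row differing only in city.
rows = [
    {"id": 1, "city": "Pune"},
    {"id": 1, "city": "Pune"},
    {"id": 1, "city": "Delhi"},
]
whole_row = distinct(rows)                 # compares every column
by_id = drop_duplicates(rows, subset=["id"])  # compares only the id column
```

PySpark's `dropDuplicates` takes only an optional list of column names; a `keep` argument belongs to pandas' `drop_duplicates`, not to the PySpark method.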

@knowitnow2425

2 months ago

Hi sir, I have worked as a Python Django developer for 1.8 years and have completed the Azure Data Engineer certification, but I don't have much knowledge of the data engineering field. I have now started learning PySpark and Databricks. Can you suggest career tips for learning the skills and searching for jobs?

|
