Talk Summary: Apache Spark is a general-purpose computing engine with in-memory computing capabilities. It can be used for a variety of workloads, such as batch processing, iterative algorithms, and stream processing. It is designed to be highly scalable and provides APIs in Scala, Java, Python, R, and SQL. It also integrates easily with other big data tools. During this workshop, we will discuss how to use Spark DataFrames for data analysis and how to work with different data sources such as Hive, CSV, and Parquet.
Bio: Neeraj is a Data Scientist at Expedia Group™. He has more than a decade of experience building software and currently works in the AI & Data Science team at Expedia Group™. He has delivered many training sessions and workshops, both internally and externally. Before joining Expedia Group™, he worked on various big data projects, worked directly with clients as a technical specialist, and migrated several ETL pipelines to Apache Spark. He also received a gold medal for finishing first in his batch as an undergraduate.