SQL processing in Apache Spark

Tudor Lăpușan - BigData/Hadoop DevOps @

Workshop room

12th November, 11:00-13:00

Apache Spark is the de facto framework for big data processing, but its complexity can make learning it a little intimidating.

The main role of Spark SQL is to reduce this complexity and let you run queries on big data with minimal learning effort. All you need to know is how to write SQL queries!

In this workshop you will get a short introduction to Apache Spark, its architecture and its data structures. After that, we will focus on Spark SQL:

  • Create tables
  • Investigate table schema
  • Write and run SQL queries
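
To give a feel for these three steps, here is a minimal PySpark sketch. It is not taken from the workshop notebook: the sample rows, the `people` view name and the `run_demo` function are made up for illustration, and it assumes PySpark and a Java runtime are installed locally.

```python
# Hypothetical sample data; the names and values are illustrative only.
SAMPLE_ROWS = [
    ("Ana", "Cluj-Napoca", 31),
    ("Mihai", "Timisoara", 27),
    ("Ioana", "Cluj-Napoca", 45),
]
COLUMNS = ["name", "city", "age"]

# The query runs against a temporary view called "people" (an assumed name).
QUERY = """
    SELECT city, COUNT(*) AS people, AVG(age) AS avg_age
    FROM people
    GROUP BY city
    ORDER BY city
"""


def run_demo():
    """Create a table, inspect its schema and run a SQL query on it."""
    # Imported here so this file can be loaded even where Spark is not installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")          # run Spark locally, using all cores
        .appName("spark-sql-sketch")
        .getOrCreate()
    )
    try:
        # 1. Create a table: register a DataFrame as a temporary view.
        people = spark.createDataFrame(SAMPLE_ROWS, COLUMNS)
        people.createOrReplaceTempView("people")

        # 2. Investigate the table schema.
        people.printSchema()
        spark.sql("DESCRIBE people").show()

        # 3. Write and run a SQL query.
        result = spark.sql(QUERY)
        result.show()
        return result.collect()
    finally:
        spark.stop()
```

Calling `run_demo()` from a Jupyter Notebook cell prints the schema and the aggregated result. A temporary view is used here because it needs no external storage; the workshop itself may create tables differently.
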
As a workshop participant, you can follow one of two approaches:
  • Install the project beforehand and run all the code examples on your laptop during the workshop. This requires some effort, especially if you are new to Python. The main benefit of this approach is that you will already have a working Apache Spark environment. Installation instructions: https://github.com/tlapusan/itdays-2019
  • Participate without installing the project. You will need to follow the code examples from my notebook. If you fall in love with Apache Spark, you can install the project and run the code examples at home, after the workshop. Because the workshop is limited to 2 hours, I would like to spend as much time as possible on code examples and very little on infrastructure issues.
Workshop tools: Apache Spark, Python, Jupyter Notebook

All the details regarding infrastructure setup and datasets will be provided a few days before the conference.
Target audience: people interested in big data, Apache Spark and Spark SQL, as well as data analysts and database developers.

Tudor Lăpușan

I'm passionate about Big Data/Machine Learning technologies and startups. I first heard about Apache Hadoop during my master's courses, and I have been fascinated by the Big Data world ever since.
My first big professional success came in 2012, when I introduced Apache Hadoop at the company I work for, Skobbler. I have been working on Big Data projects ever since. My current work involves designing and building scalable Big Data/Machine Learning projects.
Out of this passion, I started a BigData/DataScience community in my town, Cluj-Napoca, Romania, with the goals of meeting new passionate people, working together on cool projects and helping IT companies adopt Big Data technologies. So far we have held many meetups and workshops with many participants.