January 23 – 25
9:00 AM – 5:00 PM (Eastern Time)
600 Congress Avenue
Austin, Texas 78701
This course is designed for data engineers, analysts, and architects; software engineers; IT operations staff; and technical managers who want a thorough, hands-on overview of Apache Spark. It covers the same material as our three-day Apache Spark Programming course.
The course covers the core APIs for using Spark, fundamental mechanisms and basic internals of the framework, SQL and other high-level data access tools, as well as Spark’s streaming capabilities and machine learning APIs.
Each topic includes slide and lecture content along with hands-on use of Spark through an elegant web-based notebook environment. Inspired by tools like IPython/Jupyter, notebooks allow attendees to code jobs, data analysis queries, and visualizations using their own Spark cluster, accessed through a web browser. All class code is directly usable with pure open-source Spark or any commercial Spark distribution.
After taking this class you will be able to:
• Describe Spark’s fundamental mechanics
• Use the core Spark APIs to operate on data
• Articulate and implement typical use cases for Spark
• Build data pipelines with Spark SQL and DataFrames
• Analyze Spark jobs using the UIs and logs
• Create Streaming and Machine Learning jobs
Topics covered:
• Spark Overview
• RDD Fundamentals
• Spark SQL and DataFrames
• Spark Job Execution
• Cluster Architectures for Spark
• Intro to Spark Streaming
• Machine Learning Basics
All participants will need a laptop with an up-to-date version of Chrome or Firefox (Internet Explorer and Safari are not supported).
Apache, Apache Spark and Spark are trademarks of the Apache Software Foundation.