Apache Spark 101: A series on Apache Spark features.

What is Apache Spark?

Since 2009, Apache Spark has found a home at some of the largest organizations in the world: from governments and banks to game companies and IT giants like Facebook, Apple, and Microsoft. The platform has become a key element of big data processing frameworks around the world.

According to Databricks: “[Apache Spark] has quickly become the largest open source community in big data, with over 1000 contributors from 250+ organizations”, which makes it one of the most widely used open-source projects in the world.

But why? Let us start by looking at what it is.

Apache Spark is an open-source, distributed processing system used for big data workloads. It uses in-memory caching and optimized query execution to run fast analytic queries against data of any size. In short, it coordinates a cluster of computers and distributes work across them in a simple way. For example, if you have to process a large quantity of data and you have 10 computers with Apache Spark installed on each, every computer processes roughly a tenth of your data, resulting in processing times fast enough to change the game.
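To make that idea concrete, here is a minimal PySpark sketch (assuming the `pyspark` package is installed and a local or cluster Spark environment is available; the application name and data are illustrative, not from the original post). It splits a collection into 10 partitions, so each partition can be processed in parallel, much like the "ten computers, a tenth of the data each" example above.

```python
from pyspark.sql import SparkSession

# Start (or reuse) a Spark session; the app name is arbitrary.
spark = SparkSession.builder.appName("Spark101Example").getOrCreate()

# Distribute one million numbers across 10 partitions.
# On a cluster, each partition would live on a different worker.
numbers = spark.sparkContext.parallelize(range(1_000_000), numSlices=10)

# Each partition is transformed independently and in parallel;
# Spark then combines the partial results into a single total.
total = numbers.map(lambda x: x * 2).sum()
print(f"Sum of doubled values: {total}")

spark.stop()
```

Run locally this still works, just with less parallelism; the point is that the same code scales out to many machines without changes.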

Remember: in the big data business, timing is essential.
