Spark Structured Streaming - Introduction (1/3)

A brief introduction to Spark Structured Streaming

Pavan Kulkarni

10 minute read

Structured Streaming is a new way of looking at real-time stream processing. Built on top of the Spark SQL engine and exposing its abstractions through DataFrames and Datasets, Structured Streaming provides an alternative to the well-known Spark Streaming (DStream) API. Some of the main features of Structured Streaming are:
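As a rough illustration of the DataFrame-based model described above, here is a minimal sketch of a streaming word count in Scala. It assumes Spark 3.x with `spark-sql` on the classpath. The object and method names (`StructuredWordCountSketch`, `countWords`) are hypothetical. `MemoryStream` is Spark's internal in-memory test source, used here only so the example runs without Kafka or a socket; the point is that the query itself is written with ordinary DataFrame operations, and the engine runs it incrementally across micro-batches.

```scala
import org.apache.spark.sql.{SQLContext, SparkSession}
import org.apache.spark.sql.execution.streaming.MemoryStream
import org.apache.spark.sql.functions.{col, explode, split}

object StructuredWordCountSketch {

  // Hypothetical helper: feeds each inner Seq as one micro-batch and
  // returns the running word counts after the last batch.
  def countWords(spark: SparkSession, batches: Seq[Seq[String]]): Map[String, Long] = {
    import spark.implicits._ // provides the Encoder[String] that MemoryStream needs
    implicit val sqlCtx: SQLContext = spark.sqlContext

    // Assumption: MemoryStream (an internal test source) stands in for a
    // real source such as Kafka or a socket.
    val lines = MemoryStream[String]

    // Ordinary DataFrame operations on an unbounded input; the engine
    // executes them incrementally as new data arrives.
    val counts = lines.toDF()
      .select(explode(split(col("value"), " ")).as("word"))
      .groupBy("word")
      .count()

    // "complete" mode rewrites the whole result table on every trigger;
    // the memory sink exposes it as a temporary table named word_counts.
    val query = counts.writeStream
      .outputMode("complete")
      .format("memory")
      .queryName("word_counts")
      .start()

    try {
      batches.foreach { batch =>
        lines.addData(batch: _*)
        query.processAllAvailable() // block until this batch has been processed
      }
      spark.sql("SELECT word, count FROM word_counts")
        .collect()
        .map(row => row.getString(0) -> row.getLong(1))
        .toMap
    } finally {
      query.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("structured-streaming-sketch")
      .master("local[2]")
      .getOrCreate()
    try {
      // Two micro-batches; counts accumulate across them.
      val result = countWords(spark, Seq(Seq("spark streaming"), Seq("spark sql")))
      result.toSeq.sortBy(_._1).foreach { case (word, n) => println(s"$word: $n") }
    } finally {
      spark.stop()
    }
  }
}
```

Because the aggregation state is kept by the engine between triggers, "spark" is counted across both micro-batches even though each batch is added separately.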

Detailed Guide to Setting up Scalable Apache Spark Infrastructure on Docker - Standalone Cluster With History Server

This post is a complete guide to building a scalable Apache Spark cluster using Docker. We will see how to enable the History Server for log persistence.

Pavan Kulkarni

10 minute read

Being able to scale up and down is one of the key requirements of today's distributed infrastructure. By the end of this guide, you should have a fair understanding of how to set up Apache Spark on Docker, and we will see how to run a sample program.

Spark - Cassandra Data Processing (Scala)

Faster data processing from Cassandra by leveraging Apache Spark's in-memory and distributed processing capabilities

Pavan Kulkarni

38 minute read

We will look into the basics of processing data from Cassandra using Apache Spark. Processing data from a NoSQL database is very efficient when we use a distributed processing system like Spark with Scala.