Reliably utilizing Spark, S3 and Parquet: Everybody says ‘I love you’; not sure they know what that entails

Post after post has been written about the wonders of Spark and Parquet, and about how one can simply save RDDs/DataFrames in Parquet format to HDFS or S3. In many cases the job output is persisted to HDFS volumes located on the same machines as the Spark cluster. However, HDFS comes with a price: disk volume resources…
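As a minimal sketch of the "simply save" step the teaser refers to (assuming an already-configured SparkSession, the `hadoop-aws` S3A connector on the classpath, and a placeholder bucket name), writing a DataFrame out as Parquet might look like:

```scala
import org.apache.spark.sql.SparkSession

object ParquetToS3Sketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-to-s3-sketch")
      .getOrCreate()

    import spark.implicits._

    // Toy DataFrame standing in for real job output
    val df = Seq((1, "a"), (2, "b")).toDF("id", "value")

    // "my-bucket/path" is illustrative; the s3a:// scheme requires the
    // hadoop-aws connector plus AWS credentials in the environment.
    // Swap the URI for an hdfs:// path to target HDFS instead.
    df.write
      .mode("overwrite")
      .parquet("s3a://my-bucket/path/output")

    spark.stop()
  }
}
```

The one-liner simplicity is real, but as the post goes on to discuss, where that data lands (co-located HDFS vs. S3) carries operational trade-offs.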

Reporting Kafka Offsets To Datadog with Quantifind’s KafkaOffsetMonitor

There are multiple solutions for reporting Kafka offset information, such as the Kafka distribution's ConsumerOffsetChecker or LinkedIn's Burrow (which is more advanced, with built-in alerting and recovery logic). In my company we were already using Quantifind's KafkaOffsetMonitor. It's a very useful visualization tool for Kafka topics and their consumers. It visualizes the relation between…