HaVeDa - A Backend Development Blog | All kinds of things about my development stack: Java, Spring, Spark, MySQL and more!

Builders and JSON – A Lombok and Jackson Love Story

March 2, 2020 | March 3, 2020 | ywilkof

I’m a big Lombok fan. I have been using it in every project since I discovered it. One of my favorite uses is the automatic generation of the builder class that enables the builder pattern. In my opinion, builders make code more readable and fluent, as well as simplifying the initialization of scenarios for the tested class. However, when it came…

Druid, what the diff? Using Apache Druid to report top changes in a metric between two intervals

September 9, 2018 | September 12, 2018 | ywilkof

At our company we use Druid to power our time-based (as well as our non-time-based) analytics dashboard. We looked into several options, such as a relational database (Postgres), ClickHouse and Elasticsearch. Taking into account the pros and cons of each, we finally decided on Druid, as it was the leader in most query benchmarks. In fact,…

Reliably utilizing Spark, S3 and Parquet: Everybody says ‘I love you’; not sure they know what that entails

October 29, 2017 | October 30, 2017 | ywilkof | 5 Comments

Post after post has been written about the wonders of Spark and Parquet, and about how one can simply save RDDs/DataFrames in Parquet format to HDFS or S3. In many cases the job output is persisted to HDFS volumes that are located on the same machines as the Spark cluster. However, HDFS comes with a price: Disk volume resources…

Out of the Middle Ages: Use S3a File System with Spark (2.x), Hadoop (2.7.x) and AWS SDK (>1.7.4)

August 13, 2017 | August 15, 2017 | ywilkof | 1 Comment

Much has been said about the hardships entailed in combining Apache Spark, Hadoop libraries and Amazon’s AWS SDK. Take as an example reading from S3 storage using the s3a:// file system. If you’ve ever tried this setup, then you know it is not a straightforward task. While it seems like this should work out…

Reporting Kafka Offsets To Datadog with Quantifind’s KafkaOffsetMonitor

May 14, 2017 | ywilkof

There are multiple solutions for reporting Kafka offset information, such as the Kafka distribution’s ConsumerOffsetChecker or LinkedIn’s Burrow (which is more advanced and has built-in alerting and recovery logic). At my company we were already using Quantifind’s KafkaOffsetMonitor. It’s a very useful visualization tool for Kafka topics and their consumers. It visualizes the relation between…

Combining Spring Boot and Datadog StatsD Writer

September 18, 2016 | February 7, 2017 | ywilkof | 3 Comments

Are you monitoring your Spring Boot app? Spring Boot comes with the Actuator package, which brings a lot of tools that make application monitoring in production easier. It comes with an interesting metric export reader/writer feature, which helps in collecting metrics and publishing them to a desired output channel. Out of the box Actuator supports writers to…

MySQL, UTF-8 encoding and you!

April 8, 2016 | September 18, 2016 | ywilkof

Since MySQL 5.5, the world can enjoy the amazing world of non-Latin characters. Time to party! …Or is it? One may think that just by setting the charset and collation of the database, one is covered. That may sometimes be right, but not always. Let’s review the steps for resilient character persistence: Step one: Set…

Using MySQL tables for aggregations: Initialize your counters before you go-go

February 22, 2016 | March 20, 2016 | ywilkof

So far I have had a positive experience using MySQL for holding the results of aggregations. Firstly, it provides flexible and efficient indexing on the dimension columns. For me, it was a great improvement over Cassandra’s built-in counter tables (we were using v2.0.7), which do not offer much flexibility in querying dimensions independently due to their key-based partitioning. Secondly, MySQL provides…

Propagating environment variables when using Spark’s internal REST API to submit jobs

January 30, 2016 | February 22, 2016 | ywilkof

In an earlier post I introduced my client that wraps the calls to Spark’s REST API to submit jobs, instead of using the spark-submit script. This REST API, while not officially in the Spark documentation, works just fine and in almost the same way as the spark-submit script, and we are even using it in production. Lately,…

Win over Spark distribution’s dependency conflicts with SBT shading

December 27, 2015 | April 10, 2016 | ywilkof | 2 Comments

In our production environment we are currently using a Spark cluster in standalone mode (version 1.5.2). We reached a point where it was necessary to add the Amazon S3 Java SDK (1.10.39), following the advice given in this post. What should have been an easy task proved to be problematic – the Jackson dependency for…