Spark Structured Streaming with Blockchain Streams

Mageswaran D
4 min readMay 8, 2018

--

For people who just wanted some working prototype, hit my Git repo @

In recent times, if you are into IT world you can’t keep escape from hearing about Blockchains!

Here is the curated list of curated list of Awesome Blockchain links. Rest is up to you figure it out yourself ;)

Okay, so why we are talking about Blockchain when the title says Spark Structured Streaming? If you are guessing it , you are right!

Lets interface two of Blockchain streams with Spark Structured Streaming and see how that goes.

When any newbie comes into Streaming world, the first few keywords that pops are Kafka, Flume, Akka, Spark Streaming, Nifi, Sqoop name a few. Which includes interfacing the streams to read/write, a processing framework and an infrastructure to support storing the data and downstream analysis.

After a couple of Google hits and filtering, I ended up using Akka, Kafka and Spark Streaming framework

Whats the exercise?

A simple SBT based Scala project was put in place to explore the new Spark Structured Streaming APIs @ https://github.com/dhiraa/blockchain-streaming . The Git README covers how to run the example codes.

We are going to use two Binance streams using Websockets:

  • wss://stream.binance.com:9443/ws/xvgbtc@trade
  • wss://stream.binance.com:9443/ws/btcusdt@trade

You can test the stream using the websocket.org site, ​https://www.websocket.org/echo.html​. Give it a shot to see
what the data looks like!

And if you are wondering what is the schema of the streams hit below link, we are using the Trade schema.

What we wanted to calculate?

We now have two streams XVG/BTC and BTC/USDT and we wanted to calculate volume-weighted average price (VWAP) of XVG/USDT over a period of time and standard deviation of calculated VWAP. That is VWAP calculation is stateless, where as Std. deviation is with previous state information.

Wondering what is XVG/BTC?

XVG is what you wanted to buy and BTC is the commodity you pay to get XVG. Replace XVG with APPLE and BTC with CAD, with that you get APPLE/CAD, which implies you trading APPLE shares in terms of Canadian Dollars, to buy 10 shares of APPLE you have to pay some 100 CAD dollars, something like that!

Code Exploration (intermediate level):

Schema/Protocol:

Schema.scala captures the schema of the stream.

Initially Scala Json libraries were used and found to have some problems, so I had to go with some rude way of parsing the Json incoming stream.

Akka Binance Producer:

BinanceProducer.scala; Once again Akka is a black magic box! Shiny stuff , hard know how things work together.

Same goes here lucky I am I got some reference to start with and able to put things together to pull data from Binance streams.

So now we can get our streaming data. What next? processing off course!

Spark Streaming:

Wait you said structured streaming? We will reach there in a moment

Lets see how things work together with old tools.

Streaming.scala covers the whole logic of stateless and stateful calculations.

Source code has most of the explanation after each transformation.

One catch would be “updateStateByKey”. Consider it to be an recursive function which is called with new data and previous output of itself.

Now we know our streaming works!

Structured Streaming

StructuredStreaming.scala covers the exploration on the new APIs.

As the title goes, I restrict this post for doers than explaining how stuff works (which may be my next post).

Here we have explored how to:

Spark Structured Streaming has some limitaions like no support for multiple aggregation on streams.

To imitate the Streaming logic in Structured Streaming we need to have multiple aggregations, but wait we don't have support for it!

What to do?

Binance Stream → Akka → Kafka → Structured Streaming → First Aggregation → Out put to Kafka StreamRead in Structured Streaming Apply second aggregation (for you to try!!!)

Sounds good? May not be! But who cares we are not putting this in production any ways right :)

Thanks for reading.

Mageswaran.D (https://www.linkedin.com/in/mageswaran1989/)

Principal Engineer, Imaginea

--

--