Daniel Cook

Bitcat Software

Workshop Abstract

We’ve long assumed that to move money reliably from one account to another we need a transaction wrapping the debit from one account and the credit to the other. The blockchain has challenged this thinking, recording every transaction on a ledger in an append-only manner. Let’s take the ideas of Bitcoin but apply them with the original append-only king, Apache Hadoop, to build a bank that can scale its processing to millions of transactions per second and store petabytes of data. And for good measure we’ll look at Apache Kafka, the Hadoop Distributed File System, Spark and Accumulo along the way.

What you’ll learn from this workshop:

  • A basic understanding of the blockchain: how transactions are structured on the ledger and the concept of unspent transaction outputs, or UTXOs (see the ledger sketch below).
  • How to subscribe to a high-throughput stream of messages from an Apache Kafka topic, a technology often used in ETL pipelines (see the consumer sketch below).
  • The concepts of Bigtable-style storage for petabyte-scale databases with efficient random access. We’ll utilise Apache Accumulo to store the blockchain in the workshop (see the write sketch below).
  • How to perform analytics over a massive data warehouse, in our case leveraging Apache Spark to analyse the monetary transactions (see the Spark sketch below).

What you’ll need:

  • A Linux-based laptop or a MacBook.
  • Coding will be in Java, so you’ll need the latest JDK installed and an IDE such as IntelliJ or Eclipse.
  • Being happy programming in Java to at least a basic level.

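To give a flavour of what we’ll put together, here are a few short sketches in Java. They are illustrative only: every class name, topic name, table name and connection detail below is a placeholder rather than the workshop’s actual code. First, one way the UTXO idea could be modelled, where a transaction spends outputs of earlier transactions and creates new ones:

```java
import java.util.List;

// Illustrative model of the UTXO idea: a transaction spends outputs of
// earlier transactions (its inputs) and creates new outputs, which stay
// "unspent" until a later transaction references them in turn.
public class LedgerTransaction {

    // An input points at the output it spends: the earlier transaction's id
    // and the index of that output within the earlier transaction.
    public record Input(String previousTxId, int outputIndex) {}

    // An output locks an amount to an account until it is spent.
    public record Output(String account, long amountInPence) {}

    private final String txId;
    private final List<Input> inputs;
    private final List<Output> outputs;

    public LedgerTransaction(String txId, List<Input> inputs, List<Output> outputs) {
        this.txId = txId;
        this.inputs = inputs;
        this.outputs = outputs;
    }

    // The ledger is append only: a balance is never updated in place, it is
    // derived from the outputs addressed to an account that remain unspent.
    public long totalOutput() {
        return outputs.stream().mapToLong(Output::amountInPence).sum();
    }
}
```
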
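Next, a minimal consumer that subscribes to a Kafka topic of incoming transactions and polls it in a loop. The broker address, consumer group and topic name are assumptions made for the example:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class TransactionConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed local broker
        props.put("group.id", "bank-workshop");            // hypothetical consumer group
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // The topic name is a placeholder for whatever carries the transactions.
            consumer.subscribe(List.of("transactions"));

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("key=%s value=%s%n", record.key(), record.value());
                }
            }
        }
    }
}
```
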
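Writing into Accumulo follows the Bigtable model of rows, column families and qualifiers. Here is a sketch using the classic Connector and BatchWriter API; the instance name, ZooKeeper address, credentials, table name and row layout are all placeholders:

```java
import java.nio.charset.StandardCharsets;

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

public class BlockWriter {
    public static void main(String[] args) throws Exception {
        // Connection details are placeholders for the workshop environment.
        Connector connector = new ZooKeeperInstance("accumulo", "localhost:2181")
                .getConnector("root", new PasswordToken("secret"));

        BatchWriter writer =
                connector.createBatchWriter("blockchain", new BatchWriterConfig());

        // One row per transaction id; family "out" holds the outputs, with the
        // output index as the qualifier and "account:amount" as the value.
        Mutation mutation = new Mutation(new Text("tx-0001"));
        mutation.put(new Text("out"), new Text("0"),
                new Value("alice:5000".getBytes(StandardCharsets.UTF_8)));

        writer.addMutation(mutation);
        writer.close();
    }
}
```
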
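Finally, a small Spark job that totals the value moved per account. It assumes the transactions have already been landed in HDFS as Parquet with account and amount columns; the path and schema are illustrative:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.sum;

public class TransactionAnalytics {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("bank-analytics")
                .master("local[*]")  // local mode is fine for a workshop laptop
                .getOrCreate();

        // Placeholder path and schema for the exported transactions.
        Dataset<Row> transactions = spark.read().parquet("hdfs:///bank/transactions");

        // Total value moved per account, largest first.
        transactions.groupBy(col("account"))
                .agg(sum(col("amount")).alias("total"))
                .orderBy(col("total").desc())
                .show(20);

        spark.stop();
    }
}
```
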
Bio: Dan Cook – Bitcat Software

Dan is a Big Data consultant operating predominantly as a Technical Architect. He has led the development of the UK Hydrographic Office’s Hadoop-as-a-Service offering and built streaming software frameworks long before the rise of Apache Spark and Storm. More recently he has helped build a real-time alerting pipeline that processed in excess of 150,000 events per second utilising Apache Kafka and Spark Streaming.