So, you’ve heard the buzz about Big Data…
By Brian White

You’ve read a few articles about it. Maybe you’re already on the Big Data journey and have adopted some of the wide array of technologies. However, for many people, there is still a lot of myth and rumour around Big Data. Why is that?

One of the reasons is this: I’ve read more articles about Big Data than I care to remember, and the overriding theme of the vast majority is theory and concept. I like to deal in facts, experience and empirical data and, having worked on six Big Data projects, I have my share of battle scars. In this blog series, I will pass on some of that insight and provide some actual recommendations, not just hypotheses.


Questions, questions, and more questions

The sensible place to start for any Big Data implementation is by asking questions, and primarily questions about your business, not just the technology. What are the business outcomes that you want to achieve? What will differentiate you from your competitors? What is the one question about your business that you would like to answer? Having this enquiring mindset is a good place to start.


Once you’ve asked yourself these questions, it’s time to consider the IT aspect. There are so many different Big Data tools out there, many of them performing similar functions, with new ones being introduced at breakneck speed. That makes it difficult to navigate the landscape, let alone actually decide which of these tools to adopt. When I think back to my days as a developer, the complexity of getting JavaScript to work for both Internet Explorer and Netscape Navigator seems laughably simple in comparison!


Do you opt for Cloud or on-premises? Or hybrid? If you go Cloud, is it Amazon Web Services or Microsoft Azure? The mainstay of all things Big Data is Hadoop, so do you opt for a commercial distribution? That’s one of the easier questions to answer, and the answer is ‘Yes’. Nobody installs Hadoop directly from the Apache binaries any more, as several aghast engineers told me when I tried (unsuccessfully) to do it.


OK, you’ve decided it’s Hadoop, so is it Hortonworks, Cloudera or MapR? Do you run Spark on top of it? And what about NoSQL? Is it Cassandra, Couchbase, MongoDB or HBase? OK, you get the picture.


Be outcome-led, not technology-led

The point is that there is no off-the-shelf architecture to be had for a Big Data implementation. Primarily, it depends on the specific business outcomes that you want to achieve, and which Use Cases then fit within those desired outcomes. Be outcome-led, not technology-led.


So, once you’ve defined your target architecture, what then? Again, there’s no silver bullet in all this. Whilst it’s easy to spin up these environments to run a Proof of Concept, the real challenges come later. There are at least two commonly recognised pain points; they are data ingestion and… well, I’ll tell you about the other one next time.


One of the main reasons for the difficulties encountered with data ingestion is data governance, or more accurately, the lack of it. To some, the concept of a Data Lake is synonymous with a Data Dump; let’s just chuck everything in there and worry about the other stuff later. The sooner you can embed things like data classification, taxonomy, security and audit trail into your processes, the better. I for one will be watching with interest the Apache Atlas project, which seeks to address some of these issues.
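To make the idea of embedding governance into ingestion a little more concrete, here is a minimal sketch in Python. It is purely illustrative: the names `DatasetMetadata`, `GovernedLake` and the classification levels are assumptions invented for this example, not the API of Apache Atlas or of any Hadoop distribution. It shows a toy, in-memory "lake" that refuses a dataset unless it arrives with an owner and a recognised classification, and that records every attempt in an audit trail.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical classification levels; a real taxonomy would come from the business.
ALLOWED_CLASSIFICATIONS = {"public", "internal", "confidential"}


@dataclass
class DatasetMetadata:
    """Governance details that must accompany every dataset (illustrative only)."""
    name: str
    owner: str
    classification: str
    tags: list = field(default_factory=list)  # simple stand-in for a taxonomy


class GovernedLake:
    """Toy in-memory 'data lake' that rejects data lacking governance metadata."""

    def __init__(self):
        self.datasets = {}
        self.audit_trail = []

    def ingest(self, meta, records, user):
        # Collect every governance problem rather than stopping at the first.
        problems = []
        if not meta.owner:
            problems.append("missing owner")
        if meta.classification not in ALLOWED_CLASSIFICATIONS:
            problems.append(f"unknown classification: {meta.classification!r}")
        accepted = not problems

        # Audit both accepted and rejected attempts.
        self.audit_trail.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "dataset": meta.name,
            "accepted": accepted,
            "problems": problems,
        })

        if accepted:
            self.datasets[meta.name] = (meta, list(records))
        return accepted


lake = GovernedLake()
ok = lake.ingest(
    DatasetMetadata("sales", "finance-team", "internal", ["revenue"]),
    [{"id": 1}],
    user="analyst",
)
bad = lake.ingest(DatasetMetadata("dump", "", "whatever"), [{"x": 1}], user="analyst")
```

In this sketch the second call is rejected, because it has neither an owner nor a known classification, yet it still leaves an entry in `lake.audit_trail`. The design point is simply that the check happens at the door rather than being something to "worry about later".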


In summary, don’t make the mistake of thinking that Big Data, or a Data Lake, will magically solve all your data quality issues. However, done correctly, it will provide new and exciting insight for your organisation, and can serve as a catalyst to improve overall data quality at source.


Next time I will be talking about that second Big Data project pitfall…




Brian heads up AgilityWorks’ Big Data practice. He started his career as a Lotus Notes developer and is currently studying for a Master’s Degree in Big Data Analytics. Brian enjoys the incredible pace of technology change, helping our clients implement new ways of storing, managing and gaining insight from their data. Outside of work, his two young sons keep him busy, so he likes to keep fit. He’s completed the Ironman triathlon (and has the “M-Dot” tattoo to prove it!).
