Hi, welcome to the second of our Big Data Top Trumps series. We will see how SAP solutions and Open Source solutions stack up in the enterprise, where one may make sense over another, and also where it might make sense to use both. Next up is Scalability.
NoSQL (‘Not only SQL’) solutions are massively horizontally scalable: more machines can be added to the cluster, rather than adding RAM, disk or CPU to a single RDBMS server. The whole paradigm came into being specifically to solve the scalability problem, so, as you’d hope, it works pretty well. The proliferation of Open Source solutions also provides a highly cost-effective means of scaling as required. The self-describing nature of typical NoSQL data formats, such as JSON and XML, means that data can easily be ported between box, rack, data centre or region.
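To make the "self-describing" point concrete, here is a minimal Python sketch (the record fields are purely illustrative): a JSON payload carries its field names along with the values, so any machine that receives the bytes can interpret them without consulting a central schema.

```python
import json

# A self-describing record: the field names travel with the values,
# so no external schema is needed to make sense of it.
record = {"customer_id": 42, "name": "Alice", "region": "EMEA"}

# Serialise on one machine...
payload = json.dumps(record)

# ...and deserialise on any other box, rack, data centre or region.
restored = json.loads(payload)
print(restored["name"])  # -> Alice
```

This is exactly the property that makes shunting data between nodes in a cluster cheap: there is no schema migration to coordinate first.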
Hadoop is also NoSQL; it stores data as key/value pairs, which removes the need for data to adhere to a pre-defined schema. Other flavours of NoSQL include document-based storage, like Couchbase; this handles what is commonly termed ‘semi-structured data’, which encompasses JSON and XML. Columnar databases such as Cassandra work on the premise that it is more efficient to manage data in columns than in rows: in a large relational system a query may drag back lots of unneeded data with every row, so reading only the relevant columns improves efficiency. Graph databases like Neo4j operate on nodes and edges and perform well when dealing with entities and relationships, such as Person A knows Person B, who knows Person C; or Person A bought Product C and Product D, so might like Product E.
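The four flavours above can be sketched as plain Python data structures. This is purely illustrative (toy structures, not the actual storage engines), but it shows how differently each model shapes the same kind of data:

```python
# Key/value (Hadoop-style): opaque values addressed by a key,
# with no schema imposed on the value at all.
kv_store = {"order:1001": b"...raw bytes..."}

# Document (Couchbase-style): self-describing, semi-structured records.
document = {"order_id": 1001, "items": [{"sku": "A1", "qty": 2}]}

# Columnar (Cassandra-style): values grouped by column, so a query
# over one column never has to read the others.
columns = {
    "order_id": [1001, 1002, 1003],
    "total":    [9.99, 24.50, 3.75],
}

# Graph (Neo4j-style): nodes joined by named edges.
edges = [
    ("PersonA", "KNOWS", "PersonB"),
    ("PersonB", "KNOWS", "PersonC"),
    ("PersonA", "BOUGHT", "ProductC"),
]
# "Who does Person A know?" becomes a scan over the edges:
knows = [dst for src, rel, dst in edges if src == "PersonA" and rel == "KNOWS"]
print(knows)  # -> ['PersonB']
```

In the columnar case, totalling the `total` column touches one list of numbers rather than three whole rows; that is the efficiency argument in miniature.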
From a SAP perspective, this is a question of business need. For sure, there are organisations like Google that are handling a massive volume of operations and collecting huge amounts of data. But if my business sells biscuits, there is a real question about how “scalable” the data needs to be. Even if I want to include smart data from manufacturing equipment (packets of IoT data and so on), this can absolutely be catered for within the SAP product portfolio. For companies looking to capture petabytes of data, a co-existence strategy with, say, the SAP and Hadoop technology stacks makes absolute sense.
Before hitting Hadoop-scale data, SAP supports both scale-up (adding more power to a machine) and scale-out (adding more machines) architectures across both the SAP HANA and SAP IQ database platforms. The choice between them depends on the use case, and scale-out gets you past the hardware limits of a single server. So it really does come down to what you are trying to do: transactional systems favour the scale-up approach, while scale-out is better suited to analytics. As discussed in the previous blog in this series, SAP have taken the strategic decision to port all of their Enterprise solutions to the SAP HANA platform and to end support for non-HANA databases from 2025.
The overall cost of HANA appliances is also a consideration when looking at the architecture (though less of a factor in the cloud), so once you reach the scale-out option it is worth considering that co-existence strategy with Hadoop, as well as near-line storage via SAP IQ. The Hadoop / HANA coexistence story is also very promising: SAP VORA allows your Hadoop/Spark cluster to integrate natively with HANA, while HANA Smart Data Access can talk to Hadoop. The direction is all about where your analytics sits: on Hadoop, on HANA, or potentially on both, depending on the user.
In summary, both SAP and NoSQL solutions scale well, and the use case(s) should drive the decision; you’ll also want to think about cost, so look out for the next blog…