Machine Learning on Steroids: Looking Beyond the Clouds
6 SEPTEMBER 2018
In 1642, Blaise Pascal, a French teenager, invented one of the first mechanical calculators to help his tax collector father do his job more efficiently. Ever since, the world has been creating tools for capturing and analyzing more and more information. Today, there is no denying that the world has gone digital. We are in the age of machine learning and big data analytics, which are revolutionizing practically everything. In fact, according to Forbes Magazine, 2.5 quintillion bytes of data are created each day. This growing mountain of data requires a never-ending reinvention of tools and processes, as we strive for new ways to manage and extract value from it all.
In the history of analytics and business intelligence, the three necessary resources have always been data, subject matter expertise, and computing power. Today’s communication service providers have an abundance of data and subject matter experts. What’s lagging is the computing power. With machine learning, too much data paired with too little computing power equals extremely long training sessions. Enormous mountains of data are available, but there isn’t enough time to process them and learn all the insights buried inside. This leads to a paradox. In principle, more data means machine learning algorithms can be more complex, and more valuable. But in practice, simpler classifiers end up being used, because complex ones take too long to train. This means valuable information and insights are being left on the table, due to a computing power bottleneck.
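To make the bottleneck concrete, here is a rough back-of-the-envelope sketch in Python. Every number in it is hypothetical and chosen only to show how the arithmetic scales; the function name and parameters are illustrative assumptions, not measurements from any TEOCO system.

```python
def training_hours(records, passes, ops_per_record, ops_per_second):
    """Rough wall-clock estimate: total operations divided by throughput."""
    total_ops = records * passes * ops_per_record
    return total_ops / ops_per_second / 3600


# All figures below are hypothetical, picked only to illustrate the scaling.
RECORDS = 5_000_000_000   # records in the training set
THROUGHPUT = 1e12         # operations per second the platform can sustain

# A simple classifier: few passes over the data, little work per record.
simple_hours = training_hours(RECORDS, passes=10, ops_per_record=1e3,
                              ops_per_second=THROUGHPUT)

# A complex model: more passes and far more work per record.
complex_hours = training_hours(RECORDS, passes=50, ops_per_record=1e6,
                               ops_per_second=THROUGHPUT)

print(f"simple model:  {simple_hours:.3f} hours")
print(f"complex model: {complex_hours:.1f} hours")
print(f"slowdown:      {complex_hours / simple_hours:.0f}x")
```

With the data volume and the computing power held fixed, the extra cost of the complex model shows up entirely as training time. Under these assumed figures, the simple model finishes in under a minute while the complex one runs for days, which is why the simpler classifier tends to win in practice.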
It’s always been our belief that the ability to run more advanced queries in less time would open the door to an untapped wealth of valuable new insights. To tackle this challenge, TEOCO continually explores new technologies to support our business analysts and data scientists, and their most taxing machine learning algorithms.
In the age of cloud-based computing, with tools like Hadoop and massive data lakes, this may surprise some people, but database appliances are more important than ever. A database appliance is a prepackaged or pre-configured, balanced set of hardware and database management software. While data lakes may offer a rich source of information for data scientists and other consumers of big data, machine learning applications also occasionally need well-integrated, systematically cleansed, easy-to-access relational data, which can best be obtained from a database appliance. A “big data” solution is not limited to Hadoop-related technologies, and may be best served by a combination of Hadoop and relational technologies and tools. This has been our approach at TEOCO. What’s become evident, however, is that times have changed, and the traditional database appliance needs an update.
As Eric Burgener, research director for storage at IDC, points out, by 2020, 60-70% of Fortune 2000 companies will have at least one real-time — as opposed to batch-oriented — big data analytics workload that they consider mission-critical. This is a significant departure from the past, when very few big data analytics platforms were considered mission-critical. As service providers contend with the never-ending growth of data and their quest for information and insight, they need to rethink how data is captured, stored, accessed, and analyzed. Fortunately, necessity continues to be the mother of invention, and we believe that invention can be found by following the Yellow Brick Road.
The Database Appliance – Reinvented
At TEOCO we process billions of records each day for our customers, from upwards of 50 different sources, deployed across multiple applications. More and more of our customers are requesting real-time analytics. To support this, our analytics solutions require powerful big data analytics platforms, so we are always keeping an eye out for the latest tools and technologies. Lately, we’ve been exploring data analytics on flash memory accessed over NVMe (Non-Volatile Memory Express), a data storage protocol created to speed up the transfer of data, so that data moves directly from flash memory to the central processing unit. Thanks to an architecture that utilizes direct flash queries, this approach also needs a smaller footprint than is typically required, and it performs faster than both traditional and cloud-based systems.
TEOCO Selects Yellowbrick Data
This July, TEOCO announced that it had selected Yellowbrick Data as a key technology choice for its SmartHub Analytics platform. The Yellowbrick Database Appliance is a massively parallel processing analytic database designed to run exclusively on NVMe flash drives. In fact, it is the first analytic database built and optimized for flash memory from the ground up. Why is that important? Because retrofitting a data warehouse with high-performance SSDs doesn’t really provide much improvement. As Yellowbrick Data CEO and co-founder Neil Carson explains, “People took these database platforms that were running data warehouses and stuck high-performance SSDs on them. Since flash is denser, smaller, faster, and lighter than hard disk drives, they were hoping this would make their data warehouses ten to a hundred times faster. But it doesn’t work like that. At the end of the day, the core architecture of a traditional database is built around a spinning disk.”
Yellowbrick uses an all-memory architecture, which means that data moves directly from flash memory to the central processing unit. Its modular design allows customers to scale up to petabytes of data by adding analytic nodes on the fly. Because the appliance is designed specifically for high-level data ingest and processing, customers can run mixed workloads, including ad-hoc queries, large batch queries, business reports, ETL processes, and ODBC inserts, all simultaneously and without delay.
Another key advantage is that it delivers unprecedented analytic speed with a much smaller hardware footprint (97% smaller), and it utilizes PostgreSQL to ensure a more seamless integration. Yellowbrick is the first data appliance we’ve found that is capable of ingesting and analyzing petabytes of operational and historic data at such high speeds. So, why does all this matter? Basically, it allows our business analysts and data scientists to do more types of analysis, much faster than before. By running more complex algorithms and tests in less time, we can explore new areas of deep learning that were impossible in the past because of the limitations of yesterday’s technology. And amazingly, we can do all of this with a hardware footprint that is just a fraction of what this level of computing power would traditionally require. We’re pretty excited about it, and we think you will be too.
For more information on how TEOCO and Yellowbrick are working together to expand the frontiers of machine learning and data analysis for our customers, view the testimonial video here.