Feed: Yellowbrick Data.
Author: Joshua Miner
Yellowbrick recently hosted a webcast on how offloading work from overburdened legacy data warehouses to Yellowbrick can extend their life, reduce costs, and increase the performance of your analytic environment. Read on for a summary.
Market drivers
Legacy data warehouses struggle to meet modern demands, including:
- supporting more internal and external users
- providing simultaneous services for:
  - new analytic applications
  - ad hoc BI queries
  - operational dashboards
  - predictive analytics over deep historical data
  - real-time streaming of IoT and OLTP workloads
- curbing costs through deployment flexibility and consolidation
Yellowbrick Data Warehouse overview and landscape
Yellowbrick Data helps enterprises meet today’s analytic and data warehouse challenges with the following features:
- Always on and available
- Ad hoc SQL queries that do not impact operational workloads
- Correct answers on any schema
- Easily scales to petabytes of capacity in a compact footprint
- Fast performance for simultaneous, mixed workloads, including real-time inserts, batch jobs, and interactive applications
- Support for thousands of concurrent users, so organizations can add analysts and discover new optimizations
The chart below illustrates the current data warehouse landscape, sorting the primary vendors by different approaches:
Data warehousing approaches
Some vendors offer simplicity, with a scale-up data warehouse on a single server. Other vendors deliver more performance and capacity than single-server solutions by scaling out across many servers in a massively parallel processing (MPP) architecture. Many single-server customers migrate to scale-out solutions once their data volume exceeds what a single server can handle. Yellowbrick Data is currently the only solution on the market that offers organizations a data warehouse that delivers high performance and scales to petabytes of capacity. Notably, it is available either in the cloud or as a compact 6U on-premises system.
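To make the scale-up versus scale-out distinction concrete, here is a minimal Python sketch of how a scale-out (MPP) system can answer an aggregate query: rows are hash-partitioned across nodes, each node aggregates only its own shard, and a coordinator merges the partial results. It is a generic illustration, not a description of Yellowbrick's internals; the function names are hypothetical, and threads stand in for separate servers.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(rows, num_nodes):
    """Hash-partition (key, value) rows across num_nodes shards by key."""
    shards = [[] for _ in range(num_nodes)]
    for key, value in rows:
        # crc32 is stable across runs, unlike Python's salted hash() for str
        node = zlib.crc32(key.encode("utf-8")) % num_nodes
        shards[node].append((key, value))
    return shards


def local_aggregate(shard):
    """Work done on one node: sum values per key over its own shard only."""
    totals = Counter()
    for key, value in shard:
        totals[key] += value
    return totals


def scale_out_sum(rows, num_nodes=4):
    """Hypothetical MPP-style equivalent of SUM(value) ... GROUP BY key."""
    shards = partition(rows, num_nodes)
    # Threads stand in for separate servers scanning their shards in parallel.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_aggregate, shards))
    # Coordinator step: merge the per-node partial aggregates.
    result = Counter()
    for partial in partials:
        result.update(partial)
    return dict(result)


rows = [("tcp", 3), ("udp", 1), ("tcp", 2), ("icmp", 5)]
print(scale_out_sum(rows, num_nodes=2))
```

Because rows are partitioned by key, every key lands on exactly one node, so adding nodes splits the scan work without changing the answer.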
Data management evolution
The webcast discusses how data management has evolved in recent years.
- Enterprises began by consolidating key application data sets into a data warehouse on a single server, where they could run analytics on the data.
- Companies soon wanted insights from internet data, but its volume exceeded the capabilities of single-server solutions. This led to the creation of the data lake, along with processing frameworks such as MapReduce.
- Because data lake technologies are challenging to work with, particularly for business users, SQL abstraction technologies such as Hive and Impala emerged. While these SQL-as-a-layer technologies made Big Data more user-friendly, they add performance overhead, and their limited SQL surface area makes them unacceptable in many of today's competitive environments.
Yellowbrick provides a modern architecture for scalable SQL analytics
As the slide above illustrates, platforms like Yellowbrick Data give enterprises a solution to this problem. Enterprises can move high value data to Yellowbrick, giving these workloads access to sophisticated SQL analytics and the highest possible performance, while easing resource contention in their data lake to ensure that the rest of the business also runs at top speed.
Integration recommendations
Yellowbrick is compatible with the PostgreSQL dialect, which makes it easy to connect to existing ecosystem tools. Installation, deployment, and integration are fast and simple in almost any existing environment.
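Because Yellowbrick speaks the PostgreSQL dialect, a client should be able to reach it with a standard PostgreSQL connection string. This sketch assumes that the libpq keyword/value format applies unchanged; the helper name, host, database, user, and port (5432, the PostgreSQL default) are hypothetical placeholders rather than documented Yellowbrick settings.

```python
def libpq_conninfo(**params):
    """Build a libpq keyword/value connection string from keyword arguments.

    Follows the libpq quoting rules: single quotes and backslashes inside a
    value are backslash-escaped, and values that are empty, contain
    whitespace, or needed escaping are wrapped in single quotes.
    """
    parts = []
    for key, value in params.items():
        text = str(value)
        escaped = text.replace("\\", "\\\\").replace("'", "\\'")
        if text == "" or escaped != text or any(ch.isspace() for ch in text):
            escaped = f"'{escaped}'"
        parts.append(f"{key}={escaped}")
    return " ".join(parts)


# Hypothetical connection settings; substitute your own host, database, and user.
print(libpq_conninfo(
    host="yb.example.com",
    port=5432,
    dbname="analytics",
    user="analyst",
    password="s3cret pass",
))
```

A libpq-based driver such as psycopg2 accepts a string like this as the `dsn` argument of `psycopg2.connect()`.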
Some common integration cases for Yellowbrick include:
Loading
- Data ingest directly from SQL applications, via real-time streams from Kafka, or transformations with Spark.
- Bulk loading, using common connectors such as ODBC, JDBC, and ADO.NET, or Yellowbrick's 1 GB/s YBLOAD tool.
- Load and transform with Informatica, Attunity, Talend, Syncsort, and Spark ETL.
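Two small Python sketches relate to the loading paths above: a back-of-envelope estimate of bulk-load time at the quoted 1 GB/s, and a micro-batching helper of the kind a streaming consumer (for example, one reading from Kafka) might use to group records before each insert. Both are illustrative assumptions, not YBLOAD or Yellowbrick APIs; the function names are hypothetical, and the estimate assumes the rate is sustained for the whole load.

```python
def estimated_load_seconds(data_bytes, throughput_bytes_per_s=1e9):
    """Best-case bulk-load time, assuming the rate is sustained end to end.

    The 1e9 bytes/s default reads the quoted "1 GB/s" as decimal gigabytes;
    real loads also depend on network, source format, and file layout.
    """
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return data_bytes / throughput_bytes_per_s


def micro_batches(records, batch_size):
    """Group a stream of records into lists of at most batch_size items.

    A streaming consumer might hand each batch to a single multi-row INSERT
    instead of inserting row by row.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch


one_tb = 10**12
print(f"{estimated_load_seconds(one_tb) / 60:.1f} minutes")  # prints 16.7 minutes
print(list(micro_batches(range(5), batch_size=2)))  # prints [[0, 1], [2, 3], [4]]
```

At the quoted rate, 1 TB takes about 1,000 seconds in the best case; batching trades a little latency for far fewer round trips per row.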
Presentation
- Interactive applications. Yellowbrick enables organizations to build new analytical applications.
- Powerful BI analytics. Organizations can run ad hoc BI queries from applications like MicroStrategy, Tableau, and Business Objects, and support many more users without increasing their infrastructure footprint.
- Business-critical reporting. Prioritize responses and support multiple departments with workload management.
- Data mining with SAS, R, and Python.
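Workload management, as described for business-critical reporting, comes down to deciding which queued queries run first. This toy Python scheduler illustrates the idea with a priority queue built on `heapq`, running lower priority numbers first and preserving submission order within a priority. It is a conceptual sketch with hypothetical names, not Yellowbrick's workload management rules or configuration syntax.

```python
import heapq
import itertools


class PriorityQueryQueue:
    """Toy scheduler: lower priority number runs first; FIFO within a priority."""

    def __init__(self):
        self._heap = []
        # Monotonic counter breaks ties so equal priorities keep submission order
        # and the heap never has to compare the query strings themselves.
        self._seq = itertools.count()

    def submit(self, query, priority):
        heapq.heappush(self._heap, (priority, next(self._seq), query))

    def next_query(self):
        """Pop the highest-priority query, or return None if nothing is queued."""
        if not self._heap:
            return None
        _priority, _seq, query = heapq.heappop(self._heap)
        return query

    def __len__(self):
        return len(self._heap)


queue = PriorityQueryQueue()
queue.submit("ad hoc exploration", priority=3)
queue.submit("finance month-end report", priority=1)
queue.submit("operational dashboard", priority=2)
queue.submit("executive summary report", priority=1)

while queue:
    print(queue.next_query())
```

The two priority-1 reports run first, in the order they were submitted, followed by the dashboard and then the ad hoc query.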
When you should consider Yellowbrick
The webcast provides recommendations about when customers should consider using Yellowbrick, depending on their current environment.
Environment type | When to consider Yellowbrick | Benefits of Yellowbrick |
Single server | | |
MPP | | |
Pre-configured systems (Oracle or SAP) | | |
Cloud-only | | |
A customer example
Symphony RetailAI told us: “[Query] performance improvements we saw were from 3x to 10x…basically running them as is.” They also noted that Yellowbrick's ANSI-standard SQL made the system simple to learn: “We had six engineers touch the system and all found it very easy to use because there was a lot of commonality with the existing system we already had.”
Security application demonstration
The demonstration runs analytics on a NetFlow dataset captured over six months to detect intrusions.
The demonstration illustrates the following Yellowbrick capabilities:
- Integration with BI applications like Tableau and MicroStrategy.
- Fast performance: the demonstration shows the results of a Tableau query that ranked protocols by frequency. The query required a table scan of 8 billion rows and an inner join across 17 billion rows to match protocols to ports. It completed in just 11.5 seconds.
- High concurrency: JMeter runs hundreds of simultaneous queries against the system without slowdown.
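The protocol-frequency query in the demonstration is a scan, an inner join, and a grouped count. The webcast does not show the actual SQL, so this Python sketch uses a hypothetical schema (`flows`, `port_protocols`) and runs standard SQL against an in-memory SQLite database purely so it executes locally; on a PostgreSQL-compatible warehouse the query text would look much the same.

```python
import sqlite3

# Hypothetical schema: the webcast does not show the demo's actual tables or SQL.
SCHEMA = """
CREATE TABLE flows (src_ip TEXT, dst_ip TEXT, dst_port INTEGER, bytes INTEGER);
CREATE TABLE port_protocols (port INTEGER PRIMARY KEY, protocol TEXT);
"""

# Scan the flow records, inner-join each flow's port to a protocol name,
# then rank protocols by how many flows used them.
PROTOCOL_FREQUENCY = """
SELECT p.protocol, COUNT(*) AS flow_count
FROM flows AS f
INNER JOIN port_protocols AS p ON f.dst_port = p.port
GROUP BY p.protocol
ORDER BY flow_count DESC, p.protocol
"""


def build_demo_db():
    """Create a tiny in-memory stand-in for the NetFlow tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO port_protocols VALUES (?, ?)",
        [(22, "ssh"), (53, "dns"), (443, "https")],
    )
    conn.executemany(
        "INSERT INTO flows VALUES (?, ?, ?, ?)",
        [
            ("10.0.0.1", "10.0.0.9", 443, 1200),
            ("10.0.0.2", "10.0.0.9", 443, 800),
            ("10.0.0.3", "10.0.0.8", 53, 90),
            ("10.0.0.4", "10.0.0.7", 22, 400),
            ("10.0.0.5", "10.0.0.9", 443, 300),
            ("10.0.0.6", "10.0.0.8", 8080, 50),  # unmapped port: the inner join drops it
        ],
    )
    return conn


def protocol_frequency(conn):
    return conn.execute(PROTOCOL_FREQUENCY).fetchall()


if __name__ == "__main__":
    print(protocol_frequency(build_demo_db()))  # prints [('https', 3), ('dns', 1), ('ssh', 1)]
```

The inner join silently drops flows whose port has no protocol mapping, which is worth keeping in mind when reading frequency charts built on such a query.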
Importantly, this is all possible in an efficient solution that can be deployed in your data center or the cloud.
You can view the 23-minute on-demand webcast here.