Are you trying to build a truly scalable, self-service Data Science platform by integrating R with Netezza?
If so, the SHARE strategy will help you plan, stay focused, and make this initiative a success.
S – Situation, H – Hindrance
A – Action, R – Result
E – Evaluate
S – Situation
The last decade saw rapid advancement in cost-effective distributed computing and parallel processing technologies. This has enabled us to split massive data sets into smaller chunks and process them on several machines in parallel, which has opened new corridors of opportunity in Data Analytics. Running CPU-hungry, resource-intensive Predictive Analytics and Machine Learning algorithms is no longer a luxury of a few corporations, but a practical possibility for the majority. The advancement of Cloud technology and the integration of Big Data computing with the Cloud have made it feasible for even an individual to instantly spin up a cluster and run complex, resource-hungry algorithms for only a few dollars!
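To make the split-and-process idea concrete, here is a minimal sketch in Python. It is an illustration only, not how Netezza or any specific cluster works internally: the function names are hypothetical, and a local thread pool stands in for the separate machines a real distributed system would use. Each worker computes a partial result on its own chunk, and the partial results are then combined.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, n_chunks):
    """Split a list into n_chunks contiguous slices of near-equal size."""
    size, extra = divmod(len(values), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def partial_stats(chunk):
    """Work done independently on one chunk: return (count, sum)."""
    return len(chunk), sum(chunk)


def parallel_mean(values, n_workers=4):
    """Compute the mean of a non-empty list by combining per-chunk results.

    Assumption: threads stand in for the separate nodes of a real cluster.
    """
    chunks = split_into_chunks(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_stats, chunks))
    total_count = sum(count for count, _ in partials)
    total_sum = sum(subtotal for _, subtotal in partials)
    return total_sum / total_count


if __name__ == "__main__":
    print(parallel_mean(list(range(1, 101))))
```

The key design point is that only small partial results (a count and a sum per chunk) travel back to the coordinator, never the raw data.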
This turning point in technology makes it imperative for larger corporations, if they are to stay competitive, to base their Data Analytics and computing methodology on principles that will become the computing norms of the future. The computing architectures we design today must be built to withstand the challenges of the future. This explains the tremendous push in the IT landscape to build Data Analytics solutions that integrate open source technologies like R and Python with distributed, parallel processing and cloud technologies.
A – Action
Your project is driven by this exact need and by the motivation to be one of the early players in this space. Integrating open source R with Netezza gives you the opportunity to build massive distributed and parallel processing architectures that form the processing backbone for Data Analytics products. You would be building Data Analytics products on a solid computing infrastructure that can withstand the test of time and handle the challenges of the future.
H – Hindrance
A project of this nature naturally comes with many challenges. Since the project will most likely be born out of your own initiative and motivation, you will typically have neither direct funding nor the ability to devote regular, assigned hours to it. You have to be ready to invest your own time in this initiative after meeting your commitments on other projects.
Many hours after work need to be spent setting up the environments for the POC and understanding the maintenance and operational considerations, in order to study the practicality of this initiative. You have to give it time and a lot of patience before support from your teams and management pours in.
Be prepared to receive a lot of push-back, mainly in the form of certification and compliance concerns about Netezza and the R open source packages. The fear of uncharted territory will make many analysts try to steer you away from this initiative. At this stage, use the technology to prove its value in a few selected data science projects first, instead of trying to convince your clients to use it across the board.
At this phase, always remember the mantra below:
“The success of tomorrow's Big Data and Analytics products will be determined by how smartly we can integrate open source technologies like Python and R with distributed, cluster and cloud technologies.”
There is a real need for innovation and creativity in this space, and your project embraces this vision. You are using innovative ideas and technology, as well as creative methods, to give form to this vision. The other thing that makes your project unique is collaboration. Individuals from different backgrounds and business lines have to come together to make this project a reality. Database engineering, design and programming skills complement statistical, mathematical and analytics skills to make this project a success.
R – Result
If you have still not given up, you will soon start seeing a tremendous commercial impact from your project.
- Being able to push down R code and run the analytics inside the database engine frees your organization from having to invest heavily in other data analytics tools and applications. This translates directly into cost savings.
- Your organization can save thousands of dollars in licensing costs even if you move only a few of today's data analytics processes to open source R and Netezza.
- Your organization can achieve a more self-service model: analysts can perform visualization, data engineering, model building, prediction and model accuracy operations from a single RStudio interface instead of having to rely on multiple tools and applications. This contributes directly to increased productivity.
- Being able to push down analytics to a massively parallel processing environment will reduce processing time drastically. This gives analysts the luxury of working with full data sets if required, and lets them repeat the model building process several times to arrive at the most accurate model for the specific use case. The result is lower database resource usage and shorter processing times, along with more potential for data discovery.
- This is a truly scalable architecture. As processing needs grow, you can scale out by adding more Netezza processing nodes.
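The push-down idea in the points above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Netezza or R API: Python's built-in SQLite module stands in for the database engine, and the `sales` table and all function names are hypothetical. The first function pulls every row to the client and aggregates there; the second pushes the aggregation down so that only the small result leaves the database.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory table (SQLite is a stand-in for the warehouse)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 100.0), ("east", 300.0), ("west", 50.0), ("west", 150.0)],
    )
    return conn


def mean_by_region_client_side(conn):
    """Pull every row to the client, then aggregate in application code."""
    rows = conn.execute("SELECT region, amount FROM sales").fetchall()
    totals = {}
    for region, amount in rows:
        count, total = totals.get(region, (0, 0.0))
        totals[region] = (count + 1, total + amount)
    return {region: total / count for region, (count, total) in totals.items()}


def mean_by_region_pushed_down(conn):
    """Let the database engine aggregate; only one row per region comes back."""
    rows = conn.execute(
        "SELECT region, AVG(amount) FROM sales GROUP BY region"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    db = build_demo_db()
    print(mean_by_region_client_side(db))
    print(mean_by_region_pushed_down(db))
```

Both functions return the same answer, but the pushed-down version moves far less data, and on a parallel engine the aggregation itself can be spread across the processing nodes.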
E – Evaluate
This project uses some of the most reputable, reliable and disruptive technologies in the marketplace. The data processing backbone is built on Netezza, a specialized data warehousing and analytics appliance that comes with a robust collection of data engineering and analytics libraries. These libraries form the application building blocks of your project. The use of open source R provides access to R's robust analytical libraries and graphical capabilities. While your project embraces innovation and creativity, it retains the strong foundations that come from using reliable technologies.
The Netezza and R initiative is a Global Initiative – Technology without Borders
This project does not cater to the needs of any specific business line or department. Since your goal is to develop a smart and innovative data analytics processing infrastructure built to withstand the challenges of the future, any data analytics team in your organization that uses Netezza can benefit from this project.
By this time you have hopefully also realized how easily you can port this approach to other similar architectures, such as Jupyter, Apache Spark with MLlib, SparkR, etc.