AnalyticOps
Legacy content from the Open Data Group website.
What is AnalyticOps?
As data science investments expand and mature, expectations for the insights they produce are growing as well. However, most organizations lack the organizational structure and operational tools to maximize the benefit of these investments. AnalyticOps is, in essence, the lynchpin that connects data science, IT, and business teams. With AnalyticOps, companies are equipped to achieve the best ROI on their data science investments.
Benefits
- Clearly defined organizational responsibility for data science deployments
- Organizational efficiency, allowing data science and IT teams to focus on their own responsibilities
- Safe, repeatable processes for managing data science assets
- Greater returns on data science investments
Core Responsibilities
- Simplify the model deployment process and make deployment independent of model format and data stream types
- Scale compute up and out with minimal disruption to the flow of business
- Manage data science model lifecycles
- Ensure model quality on live streams of data is satisfactory
- Assess total model costs based on accuracy and computational resources
Tools
An Analytic Engine integrates into a system and deploys new and updated analytic models into operational workflows.
Analytic Engines streamline the AnalyticOps role, organizing all its capabilities for maximum value.
AnalyticOps: Part 1 - What is an Analytic Anyway?
Or is an “analytic” just a fancy term for a “business rule”?
Over the next few sections, I am going to tackle the subject of AnalyticOps, a relatively new function that organizations will have to implement and master to ensure maximum ROI on data science investments. I'll start with some basics of defining just what "an analytic" is and move through some key elements of implementing the tools (an Analytic Engine) and the competencies that make up AnalyticOps.
In the context of modern information processing systems, "an analytic" is jargon. Perhaps second only to "big data" in its overuse, "analytics" are on the minds of investors, executives, and laymen alike. So let's define the jargon. I always suggest folks keep in mind that jargon is neither real nor rigorous, so I like to put jargon in double quotes to point out that I'm not making a definition for you to argue with. With that said:
"An analytic" is a "process", "algorithm", or "technique" which is generally "mathematical" in essence which takes "data" or "information" as an input and outputs "actionable insights". Whoa that is a definition of jargon terminology that is full of jargon terminology. As is often the case, we see people using misunderstood words to describe further misunderstood concepts. Let's try a couple of different ways to outline a reasonable definition of "an analytic" by describing some typical properties, including what "an analytic" is NOT so much like, and providing a few possible examples.
First we should describe a few typical properties of "an analytic". Generally, an organization needs "an analytic" when it wants to apply a fairly rigorous, generally mathematical "analytical technique" to a tangible real-world problem or set of observations. Let's take a concrete example. What if you wanted to predict "reasonably well" where a pumpkin shot out of a cannon (e.g., a pumpkin chunker, https://www.punkinchunkin.com/) was going to land during the national competition in Arkansas? You'd start by transforming the physical space into X, Y, Z coordinates, apply some high school math to the initial conditions, and output your guess. Almost all "analytics" you'll run into in modern information processing follow the same steps as this simple example, so let's break it down a bit.
As an aside, most "analytics" you'll run into have an interesting history full of controversy, treachery, and triumph, and were developed over time by groups of smart scientists, mathematicians, and/or data scientists. Most of us regular Joes will spend our time "deploying" or "applying" analytics rather than discovering, creating, and/or inventing them. If you happen to know or run into the folks responsible for developing the analytics you are using, thank them and tell them how you are using them...you might be surprised how happy they are to hear that their hard work is useful. Asking them questions like "does everyone agree on your method?" might bring out fascinating stories.
In the example above, "the analytic" is the well-known math from high school physics, which takes force, acceleration, mass, and some squared terms and outputs a predicted path. Even though we have "an analytic" in our hands for pumpkin impact location predictions, there are still a few steps to "applying" or "deploying" our analytic at the Arkansas championship! First is the mapping of our real-world space and the governing forces in that big windy field in Arkansas to the idealized X, Y, Z coordinates, force metrics, etc. that our "analytic" requires to "compute" a guess or "prediction" in a repeatable way that gives us reasonably good confidence. When "deploying" or "applying" "an analytic", this process of mapping or transforming the tangible problem into the idealized world of the "analytic" is often called feature creation, featurization, feature space transformation, and probably twenty other jargon terms. The key point is that we are changing our initial tangible inputs from the specific situation into an idealized, usually numeric, "feature space" where the analytic can do its analytic duty in the most general way without getting bogged down in unimportant or prohibitively messy details.
This "feature space" is often very non intuitive and of high dimension, and therefore disorienting to those of us who did not actually develop the analytic. Our pumpkin example requires a math aptitude of at least geometry (X, Y, Z coordinates) to understand the feature space. Many modern information processing analytics are in very high dimensions (sometimes millions or more) and require some decent statistical background to truly understand. So, as you can see, one key property of most analytics is that they are fairly abstract. This disorienting abstractness really becomes of a sticking point between people who develop analytics and real world practitioners who have a job to do! "I have to chuck this pumpkin! Stop talking to me about vector spaces!" It should also be noted that the process of getting to the more generalized "feature space" from the all the dirty details of the specific problem such that the method is useful, is itself an analytic problem. We'll cover more of that detail in a later installment. For now, we are looking at the big picture and want to simply point out that an analytic "operates" in an idealized, usually numeric, often of high dimension, mathematical space; and when we deploy an analytic we need to deal with this.
Finally, once "the analytic" predicts a result, in our case the likely location of the pumpkin impact, we need to "transform" the answer back into the real farmland in Arkansas so we can be confident of building a winning chunker. We have to account for these "input/output" boundaries that do these "transforms". Whether these "transforms" are considered part of the "analytic" or not is a matter of taste, debate, and food fights... For our purposes we will abstract that out and say that "feature transforms" are "somewhat different" from "an analytic", or at the very least call them "analytics to be applied to the data earlier or later in the analytic pipeline".
Now we can consider the defining steps to achieve value from “an analytic”:
"specific inputs" -> "feature transforms" -> "analytics" -> "inverse transforms" -> "actionable outputs".
So how is that different than "business rules"? Analytics and business rules look strikingly similar in my experience; however, there are core elements that differentiate them and must be managed.
Business rules tend to be more human-understandable and more directly embedded into the specific data formats or information processing software used by a business. Not all "business rules" make sense as "analytics", but many "analytics" map into useful "business rules" as "actionable insights".
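As a rough illustration (the field names, weights, and thresholds here are entirely made up), the contrast often looks like this in code: the business rule is a readable threshold check wired to specific business fields, while the analytic is a fitted mathematical function whose output is then mapped back into such a rule.

import math

# A "business rule": human-readable and tied directly to specific business fields.
def flag_order_for_review(order):
    return order["amount_usd"] > 10000 and order["customer_age_days"] < 30

# An "analytic": a fitted mathematical function operating in an abstract feature space.
def fraud_score(features, weights=(0.8, -0.5, 1.2), bias=-2.0):
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))  # logistic score between 0 and 1

# The analytic's output is often translated back into a business rule
# (an "actionable insight"): flag anything scoring above a chosen threshold.
def flag_by_score(features, threshold=0.9):
    return fraud_score(features) > threshold

print(flag_order_for_review({"amount_usd": 12000, "customer_age_days": 10}))  # True
print(flag_by_score([1.0, 2.0, 0.5]))  # result depends on the made-up weights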
Notice that in this definition "analytics" or "an analytic" is a fairly abstract concept similar to "information", and might need to be managed as an abstraction in the way that "information" can be abstractly managed by "a database". Some of you youngsters might not realize that there was a time not too long ago when "databases" didn't exist. Computer people just mixed their "information" or "data" with the programs they were writing, and programming languages had different degrees of support for managing various types of data for specific types of problems. Languages like Fortran, RPG, COBOL/SNOBOL, and PL/I all pre-dated the concept of a "relational database" and had different ways of managing "information" or "data" in programs written for them.
Eventually the "relational database" emerged as an extremely useful abstraction that separated information from the programs that processed it, and quite quickly it became the dominant abstraction. Interfaces to relational databases became commonplace throughout the information processing world via the now well-known standard called Structured Query Language (SQL).
We face a similar situation today, where "analytics" are usually woven into general programming languages like Python or Scala. To blur the boundaries further, specialized tools like R and Julia are widely adopted, but they are not well integrated with modern data management stacks. Some folks use monolithic platforms like SAS to develop, manage, and deploy analytics. That works well if you like SAS, but it runs against the prevailing sentiment to disaggregate the stack where possible. Making matters even more complicated, analytics are often intimately combined with distributed data management systems like Apache Hadoop or Spark.
One might ask: are there standards for describing analytics abstractly, in a way that is analogous to how SQL can be used to describe information and interactions with databases? In fact there are two major ones: the Predictive Model Markup Language (PMML) and the Portable Format for Analytics (PFA). PFA is more modern and sophisticated than PMML, but both allow analytics to be managed as concrete assets, abstracted away from the systems and programs of which they are a part.
You can find out more about PFA here: http://dmg.org/pfa/
For an example, let’s take a very simple analytic: adding 100 to an input stream of type double:
input: double
output: double
action:
  - {+: [input, 100]}
Now that is not very pretty, and that is by design. PFA is intended to be easily generated and read by computers. Not surprisingly, there is a community building around PFA, and you can even find flavors that are easy for humans to read and write, like PrettyPFA (PPFA):
https://github.com/opendatagroup/hadrian/wiki/PrettyPFA-Reference
Here is the same analytic in PPFA:
metadata: {description: "read a value in, add 100"}
// denote input and output streams
input: double
output: double
// action describes what the engine will 'do', in this case
// add 100 to the input value
action:
{
  input + 100
}
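If you want to actually execute one of these documents, one route is Titus, the Python PFA implementation that lives in the opendatagroup/hadrian repository linked above. Below is a minimal sketch; the import path and call signatures are from memory and should be verified against that project's documentation.

# Sketch of scoring the "add 100" PFA document with Titus
# (assumed API: PFAEngine.fromYaml returns a list of engines).
from titus.genpy import PFAEngine

pfa_document = """
input: double
output: double
action:
  - {+: [input, 100]}
"""

engine, = PFAEngine.fromYaml(pfa_document)
print(engine.action(3.14))  # expected output: 103.14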
Here is something you can try at home. Read this: https://en.wikipedia.org/wiki/Trajectory_of_a_projectile
And build our pumpkin chunkin analytic (or series of analytics) in PFA or PrettyPFA!
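As a starting point for that exercise, here is a hedged Python sketch of the range formula from that article (flat ground plus an optional launch height); translating a function like this into a PFA or PrettyPFA action block is most of the work.

import math

def predicted_range(speed, angle_deg, height=0.0, g=9.81):
    # Drag-free projectile range for launch speed (m/s), launch angle (degrees),
    # and launch height (m) above the field. Returns downrange distance in meters.
    theta = math.radians(angle_deg)
    vx, vy = speed * math.cos(theta), speed * math.sin(theta)
    time_of_flight = (vy + math.sqrt(vy ** 2 + 2 * g * height)) / g
    return vx * time_of_flight

print(predicted_range(120.0, 45.0, height=2.0))  # roughly 1470 m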
Now that we've given a fuzzy, imperfect definition of "an analytic", it is hopefully clearer how their mathematical and more abstract nature differentiates them from business rules. We've also discussed some concrete ways and emerging standards to describe them that are not tied to any specific general-purpose computing language, information processing architecture, or analytic tool.
How is AnalyticOps different than DevOps or Data Science?
Digging into the details, AnalyticOps encompasses the day-to-day activities, concerns, and focus of the person or people responsible for deploying analytics on the IT infrastructure. The AnalyticOps function is accountable not to the data scientists nor to the IT infrastructure leader, but directly to the business units that depend on the analytics to make decisions. Let's take a look at the role of an AnalyticOps specialist, and perhaps you'll see that you, or someone on your team, is filling this role without even knowing it!
One of the primary functions of an AnalyticOps specialist involves taking the analytics from the data science team, in whatever form they use (R, Python, a spreadsheet of values, etc.) and deploying them into the live data of the business. This might include verifying that the analytics work properly with respect to the computing infrastructure, the data sources, and the data science metrics like predictive quality.
An AnalyticOps specialist will also manage scaling the verified and working analytics. Production implementations may need to utilize more data, with geographically distributed sources of data and compute resources. The AnalyticOps specialist takes care of these details, right down to optimizing an analytic's utilization of specific computers.
Ensuring operational safety of deployed analytics is yet another responsibility of the AnalyticOps specialist. For example, if a working but incorrect analytic gets into a pricing engine, an organization can lose millions of dollars in a matter of minutes. A proper production implementation, owned via the AnalyticOps role, can prevent such costly errors from ever seeing the light of day.
But certainly practitioners in the data science or DevOps space will ask: "Isn't this similar to what data scientists and DevOps people do?" Or, more bluntly, "I am doing that today, and we don't need someone else to manage that stuff." Certainly this is true today, but as the use of analytics matures, and abstractions like analytic deployment engines become more common, this must change. Separating out the specific tasks that are uniquely AnalyticOps allows an organization's data scientists to instead pursue new machine learning algorithms, feature exploration, and value-added analytic ideas. Similarly, DevOps folks will continue to focus on managing all that data and infrastructure, especially as Big Data continues its journey from disk (Apache Hadoop) to memory (Apache Spark) and on to network streams (Apache Storm).
For example, let's say the data scientists come up with a recurrent neural net analytic that is 6% more predictive on the validation sets than the currently deployed random forest. The AnalyticOps specialist is uniquely positioned not only to verify that the neural net is indeed more predictive on live data, but also to see that the new method is 75% more computationally expensive on the existing infrastructure. With this visibility into predictive quality and operational cost, the business might decide to stay with the random forest because the 6% improvement can't justify the cost that would be incurred.
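Putting entirely hypothetical dollar figures on that trade-off (and assuming, for simplicity, that business value scales linearly with predictive lift) shows how the AnalyticOps specialist can frame the decision:

# All figures below are made up for illustration; none come from a real deployment.
current_annual_value = 2000000   # value attributed to the random forest today ($/yr)
current_compute_cost = 400000    # infrastructure cost to run it ($/yr)

predictive_lift = 0.06           # neural net is 6% more predictive on live data
compute_overhead = 0.75          # ...and 75% more expensive to run

added_value = current_annual_value * predictive_lift   # $120,000/yr
added_cost = current_compute_cost * compute_overhead   # $300,000/yr

print("Net change from switching: $%s per year" % format(added_value - added_cost, ",.0f"))
# Prints a net loss of $180,000 per year, so the business stays with the random forest.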
Managing the life cycle of analytic assets--including their creation, updates, and cost histories--is another important role of the AnalyticOps specialist. Performing this role over an extended period of time allows the AnalyticOps specialist to make minor (and at times major) adjustments to deploy analytics in a safe, process-oriented, repeatable way. And the organization doubly benefits, as this work does not distract the data scientists from finding "the next valuable analytic" or pull the DevOps focus away from wiring in that cool new feature for the client/customer.
So in short, AnalyticOps might not be that sexy when compared to “data science”, but like many extremely valuable and interesting technical jobs, it’s right in the heart of one of the biggest areas of value creation the tech world has seen in decades. AnalyticOps is quite literally a lynchpin of the analytics trend as organizations continue to increase the number of analytics that are critical to their mission.