Where dataops fits

Enterprises today are increasingly injecting machine learning into a vast array of products and services, and dataops is an approach geared toward supporting the end-to-end needs of machine learning.
"For example, this style makes it more feasible for data scientists to have the support of software engineering to provide what is needed when models are handed over to operations during deployment," Ted Dunning and Ellen Friedman write in their book, Machine Learning Logistics.
"The dataops approach is not limited to machine learning," they add. "This style of organization is useful for any data-oriented work, making it easier to take advantage of the benefits offered by building a global data fabric."
They also note that dataops fits well with microservices architectures.

Dataops in practice

To make the most of dataops, enterprises must evolve their data management strategies to deal with data at scale and in response to real-world events as they happen, according to Dunning and Friedman.
Because dataops builds on devops, it requires cross-functional teams that cut across "skill guilds" such as operations, software engineering, architecture and planning, product management, data analysis, data development, and data engineering. Dataops teams should be managed in ways that ensure increased collaboration and communication among developers, operations professionals, and data experts.
Data scientists may also be included as key members of dataops teams, according to Dunning. "I think the most important thing to do here is to not stick with the more traditional Ivory Tower organization where data scientists live apart from dev teams," he says. "The most important step you can take is to actually embed data scientists in a devops team. When they live in the same room, eat the same meals, hear the same complaints, they will naturally gain alignment."
But Dunning also notes that data scientists may not need to be permanently embedded in a dataops team.
"Typically, there's a data scientist embedded in the team for a time," Dunning says. "Their capabilities and sensibilities begin to rub off. Someone on the team then takes on the role of data engineer and kind of a low-budget data scientist. The actual data scientist embedded in the team then moves along. It's a fluid situation."

How to build a dataops team

Most devops-based enterprises already have the nucleus of a dataops team on hand. Once they have identified projects that need data-intensive development, they need only add someone with data training to the team. Often that person is a data engineer rather than a data scientist. DataKitchen suggests organizations seek out dataops engineers who specialize in creating and implementing the processes that enable teamwork within data organizations. These individuals design the orchestrations that allow work to flow from development to production and ensure that hardware, software, data, and other resources are available on demand.
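To make the idea of orchestration concrete, here is a minimal sketch of an environment-aware pipeline in plain Python. It is an illustration only, assuming a simple extract-transform-publish flow; the step names and the run_pipeline helper are hypothetical and do not represent DataKitchen's product or any other tool's API.

```python
# Minimal sketch of an environment-aware pipeline orchestration (hypothetical,
# not any specific tool's API). Each step is an ordinary function; the
# orchestrator runs them in order and is parameterized by environment so the
# same pipeline definition can be promoted from development to production.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Step:
    name: str
    run: Callable[[dict], dict]   # takes the pipeline context, returns updates

def extract(ctx: dict) -> dict:
    # In a real pipeline this would pull from a warehouse or event stream.
    return {"rows": [{"id": 1, "value": 42}]}

def transform(ctx: dict) -> dict:
    return {"rows": [{**r, "value": r["value"] * 2} for r in ctx["rows"]]}

def publish(ctx: dict) -> dict:
    # The environment decides where results land; no code change is needed.
    target = "prod-db" if ctx["env"] == "prod" else "dev-db"
    print(f"writing {len(ctx['rows'])} rows to {target}")
    return {}

def run_pipeline(env: str, steps: List[Step]) -> dict:
    ctx = {"env": env}
    for step in steps:
        print(f"[{env}] running {step.name}")
        ctx.update(step.run(ctx))
    return ctx

if __name__ == "__main__":
    pipeline = [Step("extract", extract), Step("transform", transform), Step("publish", publish)]
    run_pipeline("dev", pipeline)   # the same definition is promoted via run_pipeline("prod", pipeline)
```

The point of the sketch is the design choice it embodies: the pipeline is defined once and parameterized by environment, so promoting work from development to production is a configuration change rather than a rewrite.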
Many teams are built of individuals with overlapping skill sets, or individuals may take on multiple roles within a dataops team, depending on expertise. According to Michele Goetz, vice president and principal analyst at Forrester, some of the key areas of expertise on dataops teams include:

- Databases
- Integration
- Data to process orchestration
- Data policy deployment
- Data and model integration
- Data security and privacy controls
Regardless of makeup, dataops teams must share a common goal: meeting the data-driven needs of the services they support.

Dataops roles

According to Goetz, dataops team members include:

- Data specialists, who support the data landscape and development best practices
- Data engineers, who provide ad hoc and system support to BI, analytics, and business applications
- Principal data engineers, who are developers working on product and customer-facing deliverables
Dataops salaries

Here are some of the most popular job titles related to dataops and the average salary for each position, according to data from PayScale:

Dataops tools

The following are some of the most popular dataops tools:

- Census: An operational analytics platform specialized for reverse ETL, the process of syncing data from a source of truth (like a data warehouse) to frontline systems like CRM, advertising platforms, etc. (see the sketch after this list)
- Databricks Lakehouse Platform: A data management platform that unifies data warehousing and AI use cases
- Datafold: A data quality platform for detecting and fixing data quality issues
- DataKitchen: A data observability and automation platform that orchestrates end-to-end multi-tool, multi-environment data pipelines
- Dbt: A data transformation tool for creating data pipelines
- Tengu: A dataops orchestration platform for data and pipeline management
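To illustrate what reverse ETL means in the Census item above, here is a minimal sketch, assuming a warehouse query and a generic CRM REST endpoint. The table name, the endpoint URL, and the sync_contacts helper are hypothetical and do not represent Census's actual API.

```python
# Minimal reverse-ETL sketch (hypothetical, not Census's API): read rows from
# the warehouse (the source of truth) and push them to a frontline system's
# REST endpoint, such as a CRM.
import json
import sqlite3                      # stands in for a real warehouse connection
from urllib import request

def fetch_contacts(conn: sqlite3.Connection) -> list[dict]:
    # Query the warehouse table that holds the cleaned, modeled customer data
    # (the table name is hypothetical).
    cur = conn.execute("SELECT email, lifetime_value FROM analytics_contacts")
    return [{"email": email, "lifetime_value": ltv} for email, ltv in cur.fetchall()]

def push_to_crm(contact: dict, crm_url: str = "https://crm.example.com/api/contacts") -> None:
    # Hypothetical CRM endpoint; a real sync would also handle auth, batching, and retries.
    req = request.Request(
        crm_url,
        data=json.dumps(contact).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        resp.read()

def sync_contacts(conn: sqlite3.Connection) -> None:
    # Reverse ETL: warehouse -> frontline system, one record at a time here
    # for simplicity.
    for contact in fetch_contacts(conn):
        push_to_crm(contact)
```

In practice such a sync would run on a schedule or be triggered by upstream pipeline completion, so frontline systems always reflect the latest modeled data in the warehouse.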