CIO Newsletter

Starting on a solid data foundation

Before choosing a platform for sharing data, an organization needs to understand what data it already has and strip it of errors and duplicates.

A big part of preparing data to be shared is an exercise in data normalization, says Juan Orlandini, chief architect and distinguished engineer at Insight Enterprises.

Data formats and data architectures are often inconsistent, and data might even be incomplete. "All of a sudden, you're trying to give this data to somebody who's not a data person," he says, "and it's really easy for them to draw erroneous or misleading insights from that data."
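The kind of cleanup described here can be illustrated with a minimal Python sketch. It is not a method attributed to Orlandini or Insight; the field names, sample records, and the two date formats are assumptions made up for the example. It normalizes inconsistent formats, drops duplicates, and flags incomplete records.

```python
from datetime import datetime

# Hypothetical records from two source systems with inconsistent formats.
RAW_RECORDS = [
    {"email": " Ana@Example.com ", "signup": "2022-03-01", "country": "us"},
    {"email": "ana@example.com", "signup": "03/01/2022", "country": "US"},
    {"email": "bo@example.com", "signup": "2022-04-15", "country": None},
]

# Date formats we assume the sources use; extend as needed.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(value):
    """Return an ISO date string, or None if no known format matches."""
    if not value:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize(record):
    """Bring one record into a single consistent shape."""
    country = record.get("country")
    return {
        "email": record["email"].strip().lower(),
        "signup": parse_date(record["signup"]),
        "country": country.upper() if country else None,
    }


def deduplicate(records):
    """Keep the first record seen for each normalized email."""
    seen = {}
    for rec in records:
        seen.setdefault(rec["email"], rec)
    return list(seen.values())


clean = deduplicate(normalize(r) for r in RAW_RECORDS)

# Records with missing values are flagged rather than silently shared.
incomplete = [r for r in clean if None in r.values()]
```

In this sketch the two "Ana" rows collapse into one record, and the record with no country is flagged as incomplete so it can be reviewed before anyone draws conclusions from it.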

Organizations often turn to outside help with data normalization because, if it is done incorrectly, a business might still be left with data quality issues and be unable to get as much use out of its data as intended.

As more companies use the cloud and cloud-native development, normalizing data has become more complicated.

"It might be in a NoSQL database, a graph database, or in all these other types of databases now available, and making those consistent becomes really challenging," Orlandini says.

Exercising tactful platform selection

In many cases, only IT has access to data and data intelligence tools in organizations that don't practice data democratization. So in order to make data accessible to all, new tools and technologies are required.

Of course, cost is a big consideration, says Orlandini, as well as deciding where to host the data, and having it available in a fiscally responsible way. An organization might also question if the data should be maintained on-premises due to security concerns in the public cloud. But Kevin Young, senior data and analytics consultant at consulting firm SPR, says organizations can first share data by creating a data lake on a storage service like Amazon S3 or Google Cloud Storage. "Members across the organization can add their data to the lake for all departments to consume," says Young. But without proper care, a data lake can end up disorganized and cluttered with unusable data. Most organizations don't end up with data lakes, says Orlandini. "They have data swamps," he says.

But data lakes aren't the only option for creating a centralized data repository.

Another is through a data fabric, an architecture and set of data services that provide a unified view of an organization's data, and enable integration from various sources on-premises, in the cloud and on edge devices.

A data fabric allows datasets to be combined, without the need to make copies, and can make silos less likely.

There are many data fabric offerings, such as IBM Cloud Pak for Data and SAP Data Intelligence, which were both named leaders in Forrester's Enterprise Data Fabric Q2 2022 report. But with many available options, it can be difficult to know which to choose.

The most important thing is to analyze and monitor data, says Amaresh Tripathy, global analytics leader at professional services firm Genpact.

"Many platforms are out there," he says. "Choose any platform that works for you, but it should be automated and visible." Also, the data should be easily accessible from a self-service platform that makes data analysis and reporting easy, even for people with no technical experience: "Like a portal where people can see all the data, what it means, what the metrics are, and where it's coming from," says Tripathy.

There's no perfect tool, and there's often a trade-off between how well a tool handles data lineage, data cataloging, and data quality. "Most organizations are trying to solve all three problems together," Tripathy adds. "Sometimes you over-index on one and don't get a very good value on another." So an organization should decide what's most important, he says. "They should know why they're doing it, which tool gives them the best bang for their buck on those three dimensions, and then make the appropriate decision."

When thinking about how to share data, an organization can also consider implementing a data mesh, which takes the opposite approach to data fabric. While data fabric manages multiple data sources from a single virtual centralized system, a data mesh is a form of enterprise data architecture that takes a decentralized approach and creates multiple domain-specific systems.

With a data mesh, organizations can help ensure data is properly handled by putting it in the hands of those who best understand it, says Chris McLellan, director of operations at Data Collaboration Alliance, a global nonprofit that helps people and organizations get full control of their data. It could be a person, such as the head of finance, or a group of people that are acting as data stewards.

"At its core, it's got this concept of data as a product," he says. "And a data product is something that can be owned and curated by someone with domain expertise."

Implementing a data mesh architecture allows an organization to put specific data sets in the hands of subject matter experts. "These people are closer to the regulations, the customer, and the end users," McLellan says. "They're closer to everything about that specific domain of information."
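To make the "data as a product" idea concrete, here is a small, hypothetical Python sketch. The `DataProduct` and `Mesh` classes, their fields, and the sample entries are illustrative assumptions, not part of any data mesh tool or of McLellan's description. The sketch shows only that each product carries its own domain and steward, and that domains publish for themselves rather than through a central data team.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """A dataset owned and curated by a domain team (illustrative only)."""
    name: str
    domain: str
    steward: str  # person or group accountable for the data
    description: str
    rows: list = field(default_factory=list)


class Mesh:
    """A thin registry; each domain publishes and maintains its own products."""

    def __init__(self):
        self._products = {}

    def publish(self, product):
        # The key includes the domain, so two domains can reuse a product name.
        self._products[(product.domain, product.name)] = product

    def find(self, domain, name):
        return self._products[(domain, name)]

    def by_domain(self, domain):
        return sorted(p.name for p in self._products.values() if p.domain == domain)


mesh = Mesh()
mesh.publish(DataProduct("invoices", "finance", "head of finance",
                         "Issued invoices", [{"id": 1, "amount": 120.0}]))
mesh.publish(DataProduct("campaigns", "marketing", "marketing data stewards",
                         "Campaign results"))

# A consumer can always see who is accountable for a dataset.
steward = mesh.find("finance", "invoices").steward
```

The registry here is deliberately minimal. In practice each domain would also choose its own storage and tooling, which this sketch does not attempt to model.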

Data mesh isn't linked to any specific tools, so individual teams can choose whichever ones best fit their needs, and there isn't the bottleneck of everything having to go through a central data team.

"You're seeing a decentralization not just of IT or app delivery, but also of data management and data governance," says McLellan, "which are good things because marketers know the laws around consumer protection better than the IT team, and finance knows finance regulations better than IT."

While there are many vendors selling data mesh, it's still a shiny new object, Forrester warns, and it has its challenges, including conflicts in how it's defined, the technologies it uses, and its value.

Training and change management

Once an architecture for data democratization is established, employees need to understand how to work with the new data processes. People can be given the right data, but even if they're trained as administrators or accountants, they're not necessarily going to understand what to do with it, says Insight's Orlandini. Data access is not sufficient in itself to make an organization data-driven. "You have to do some training," he says. "If you don't do it properly, you're going to have mixed success at best, or it might be a failure."

Some organizations have started their own in-house training programs to ensure employees understand how to interpret and properly handle data.

Genpact, for instance, introduced what it calls its DataBridge initiative last year to increase data literacy across the organization.

"Our intention was not to make 100,000 people citizen data scientists," says Tripathy. "We provide the awareness in the context of how they do their work." For example, an employee doing claims analysis doesn't need to learn all about anomaly detection; what they need to understand is what anomaly detection means for them. "You may or may not have all the skill sets to look at the data yourself, but you should be able to raise a question and seek help, and being able to ask that question in the right manner is the data-aware aspect of it," he adds.

Laying the security and compliance groundwork

Proper data governance needs to be implemented from the start to maintain the integrity of data and avoid costly penalties.

Along with IT leaders, security and compliance teams need to be part of the initial conversation, says Insight's Orlandini. "It's a big challenge, and a lot of organizations struggle with this," he says, adding that it's a prerequisite that a company's leadership understands exactly what it is offering to share, and makes sure it's being offered to the right people.

"We live in a highly regulated world where we have to be super careful," he says, "especially in industries like healthcare and finance where there are laws that have severe consequences if you let the wrong person have access to the wrong data."

There are also tools that help organizations with data masking and data obfuscation to avoid revealing personally identifiable information. "You can start getting insights without revealing PII data, HIPAA records, or any of those regulatory requirements that are out there," he continues. "There are also tools with attribute-based access controls where you actually tag data with very specific kinds of attributes (this has PII or HIPAA, whatever your attributes are) and then you only have access to the data with the right kind of attributes associated with it."

In this way, the data controls itself automatically, and it's available in a public cloud or hybrid environment with data in multiple locations, or even in private environments with strict compliance controls that can be put in place.
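As a rough illustration of the tagging idea, the Python sketch below combines attribute-based access control with simple masking. It does not reflect any specific product Orlandini refers to. The column names, the `COLUMN_TAGS` mapping, and the `mask` and `read_row` helpers are all hypothetical, and the deny-by-default rule for untagged columns is a design assumption.

```python
import hashlib

# Hypothetical column tags; a real deployment would define its own attributes.
COLUMN_TAGS = {
    "patient_name": {"PII"},
    "diagnosis": {"HIPAA"},
    "visit_count": set(),  # no sensitive attributes, readable by anyone
}


def mask(value):
    """Replace a value with a short one-way hash so it cannot be read directly.

    Caveat: an unsalted hash of a guessable value can be reversed by trying
    candidates, so this is a teaching aid, not a production masking scheme.
    """
    digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    return "masked:" + digest[:8]


def read_row(row, user_attributes):
    """Return the row with any column the user is not cleared for masked."""
    result = {}
    for column, value in row.items():
        required = COLUMN_TAGS.get(column)
        # Untagged columns are masked: deny by default (an assumption).
        allowed = required is not None and required <= user_attributes
        result[column] = value if allowed else mask(value)
    return result


row = {"patient_name": "Ana Diaz", "diagnosis": "flu", "visit_count": 3}

# A user cleared only for PII sees names and counts, but not diagnoses.
analyst_view = read_row(row, {"PII"})
```

Because the decision is driven by the tags on the data rather than by where the data lives, the same check could in principle run wherever the data is stored.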

Long-term benefits

Not only can data democratization help an enterprise speed up its data pipelines, it can also empower people to find new ways to solve problems through a better awareness of how to analyze and work with data.

Gartner says that by adopting data democratization, organizations can solve resource shortages, decrease bottlenecks, and enable business units to handle their own data requests more easily. By democratizing data, organizations can improve their decision-making by allowing more people to contribute to the analysis and interpretation of data; increase collaboration across teams within an organization; and enhance transparency, since more people have access to information, and can see how data-driven decisions are made.