AIoT Data Strategy
As part of their digital transformation initiatives, many companies are putting data strategies center stage. Most enterprise data strategies are a mixture of high-level vision, strategic principles, goal definitions, priority setting, data governance models, architecture tools, and best practices for managing semantics and deriving information from raw data.
Since both AI and IoT are also very much about data, every AIoT initiative should also adopt a data strategy. However, it is important to note that this data strategy must work on the level of an individual AIoT-enabled product or solution, not the entire enterprise (unless, of course, the enterprise is pretty much built around said product/solution). This section of the AIoT Framework proposes a structure for an AIoT Data Strategy and identifies the typical dependencies that must be managed.
The AIoT Data Strategy proposed by the AIoT Framework is designed to work well for AIoT product/solution initiatives in the context of a larger enterprise. Consequently, it focuses on supporting product/solution implementation and long-term evolution and tries to avoid replicating typical elements of an enterprise data strategy.
The AIoT Data Strategy has four main elements. The first is a prioritization framework that makes the relationship between use cases and their data needs visible. The second is the management of data-specific implementation aspects, including Data Lifecycle Management. The third is the set of Data Capabilities required to support the data strategy. The fourth is a lean and efficient Data Governance approach designed to work on the product/solution level.
Of course, each of these four elements of the AIoT Data Strategy has to be seen in the context of the enterprise that is hosting product/solution development: Enterprise Business Strategy must be well aligned with the use cases. Data-specific implementation projects frequently have to take cross-organization dependencies into consideration, e.g., if data are imported or exported across the boundaries of the current AIoT product/solution. Product/solution-specific data capabilities must be aligned with the existing enterprise capabilities. Product/solution-specific data governance always has to take existing enterprise-level governance into consideration.
Business Alignment & Prioritization
The starting point for business alignment and prioritization should be the actual use cases, which are defined and prioritized by business sponsors, or Epics that have been prioritized in the agile backlog. Sometimes, Epics might be too coarse-grained. In this case, Features can be used instead.
For each Use Case/Epic, an analysis from the data perspective should be completed:
- What are the actual data needs to support the Use Case/Epic?
- Which of these data are believed to be already available, and which must be newly acquired?
- How can the required data quality be ensured for the particular use case?
- What are potential financial aspects of the data acquisition?
- How do the use cases support the monetization side of things?
- Is this a case where the required data adds functional value to the use case, or is there a direct data monetization aspect to it?
- What are the relationships between the identified data and the other elements of the AIoT Data Strategy: Implementation & Data Lifecycle Management, specific capabilities applying to this particular kind of data, and Data Governance?
A key aspect of the analysis will be the Data Acquisition perspective. For data that can (at least theoretically) be acquired within the boundaries of the AIoT product/solution organization, the following questions should be answered:
- Is the required technical infrastructure already available?
- Does the team have the required capabilities and resources available?
- Especially in the case of AIoT data acquired via sensors:
  - Are new sensors required?
  - If so, what is the additional development and unit cost?
  - Is there an additional downstream cost from the asset/sensor line-fit point of view (i.e., additional manufacturing costs)?
  - What is the impact on the business plan?
  - What is the impact on the project plan?
  - What are the technical risks for new, unknown sensor technologies?
  - What are the required steps in terms of sourcing and procurement?
For data that need to be acquired from other business units, a number of additional questions will have to be answered:
- Is it technically feasible to access the data (availability of APIs, bandwidth, support of required data access frequency and volume, etc.)?
- Can the neighboring business unit support your requirements, not only in terms of technical access, but also in terms of project support and timelines?
- Are there costs involved in technical implementation and/or data access (internal billing)?
- Are there potential limitations or restrictions due to existing internal data governance guidelines, regional or organizational boundaries, etc.?
For data that have to be acquired from external partners or suppliers, there are typically a number of additional complexities that will have to be addressed:
- Technical feasibility across enterprise boundaries
- Legal framework required for data access
- SLA assurance
- Billing and cost management
Based on all of the above, the team should be able to assess the overall feasibility and costs/efforts involved on a per use case/per data item basis. This information is then used as part of the overall prioritization process.
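As a minimal sketch of how such a per use case/per data item assessment could feed into prioritization, the following example scores each data item by business value, feasibility, and acquisition cost. The class, field names, rating scales, and scoring formula are all hypothetical assumptions chosen for illustration; a real initiative would agree on its own criteria and weights with the business sponsors.

```python
from dataclasses import dataclass


@dataclass
class DataItemAssessment:
    """Assessment of one data item needed by one use case (all fields hypothetical)."""
    use_case: str
    data_item: str
    source: str              # "internal", "business_unit", or "external"
    business_value: int      # 1 (low) to 5 (high), rated by business sponsors
    feasibility: int         # 1 (hard) to 5 (easy), rated by the team
    acquisition_cost: float  # estimated cost in an arbitrary currency unit


def priority_score(a: DataItemAssessment, cost_weight: float = 0.001) -> float:
    """Illustrative score: value times feasibility, minus a cost penalty."""
    return a.business_value * a.feasibility - cost_weight * a.acquisition_cost


def rank(assessments: list[DataItemAssessment]) -> list[DataItemAssessment]:
    """Return the assessments ordered from highest to lowest score."""
    return sorted(assessments, key=priority_score, reverse=True)


# Made-up example entries, one per data source category discussed above.
assessments = [
    DataItemAssessment("Predictive maintenance", "vibration data", "internal", 5, 3, 4000.0),
    DataItemAssessment("Usage-based billing", "operating hours", "business_unit", 4, 4, 1000.0),
    DataItemAssessment("Weather-aware routing", "weather feed", "external", 2, 5, 2000.0),
]

ranked = rank(assessments)
for a in ranked:
    print(f"{a.use_case}: {a.data_item} ({a.source}) score={priority_score(a):.1f}")
```

A simple additive or multiplicative score like this is only a conversation aid. Its main benefit is that it makes the assumptions behind a ranking explicit and easy to challenge.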
Data Pipeline: Implementation & Data Lifecycle Management
Sometimes it can be difficult to separate data-specific implementation aspects from general implementation aspects. This is an issue that the AIoT Data Strategy needs to deal with to avoid redundant efforts. Typical data-specific implementation and Data Lifecycle Management aspects include the following:
- Data Ingestion: In our context, data ingestion should first be seen as moving data from outside of our organization's boundary to within. Second, technical aspects such as stream vs. batch processing need to be addressed. Typical data ingestion tasks also include cleansing and quality assurance.
- Storage: Depending on the business and technical requirements, data can be stored permanently or temporarily, structured or unstructured, with or without backup, with cache-only or with operational/transactional support, etc. This often needs to be addressed differently for different data types.
- Integration: Data integration is the process of merging data from different sources into a single, unified view. In the case of AIoT, this can be, for example, sensor data fusion done close to the sensors in the edge layer. It can also be a real-time data stream integration process, usually at a higher level of abstraction, or a batch-oriented integration process, typically further in the backend.
- Transformation: Many projects spend much time with data transformation, since this is often a prerequisite for data integration or further data processing. The approaches chosen usually vary widely depending on the format, structure, complexity, and volume of the data being transformed.
- Modeling: Data modeling is usually a key step toward dealing with the semantics of data and deriving information from raw data. There are different levels of data modeling, including conceptual, logical and physical levels. AI/ML models are another important type of model that builds on top of data models. However, these are usually mathematical/statistical models rather than data-structure-oriented ones.
- Validation: Data validation is the tool that helps ensure data quality, e.g., by applying data cleansing and validation checks. Data validation can use simple, local "validation rules" or "validation constraints" that check for correctness and meaningfulness (e.g., a date of birth cannot be in the future). In some cases, data validation can actually be much more complex, e.g., involving interactions with remote systems, or even AI/ML-based validation algorithms.
- Analysis: In many cases, data analysis is a key use case in its own right, distinct from, for example, transactional use of the data. Generally, data analysis supports the discovery of useful information and supports decision-making. Data analysis is a multifaceted topic. It is key that the required Data Capabilities are provided to support it.
- Access Control & Security: Finally, effectively ensuring confidentiality and secure handling of data must be part of every AIoT data strategy. This includes both IoT data coming from assets and data coming from users, other business units, or even external data sources. While security is sometimes dealt with on a different level, fine-grained data access control must usually be dealt with as part of the data strategy.
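The simple, local validation rules mentioned under Validation can be sketched as follows. This is an illustrative example only: the `Rule` class, the field names, and the temperature range are hypothetical assumptions. The "not in the future" rule mirrors the date-of-birth example above, applied here to a sensor reading.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass
class Rule:
    """A named local validation rule (hypothetical structure)."""
    name: str
    check: Callable[[dict], bool]  # returns True if the record passes


def make_rules(today: date) -> list[Rule]:
    """Build example rules; 'today' is passed in so results are reproducible."""
    return [
        # A measurement date cannot lie in the future.
        Rule("measured_on_not_in_future", lambda r: r["measured_on"] <= today),
        # Assumed plausible operating range of the sensor, in degrees Celsius.
        Rule("temperature_in_sensor_range", lambda r: -40.0 <= r["temperature_c"] <= 125.0),
    ]


def validate(record: dict, rules: list[Rule]) -> list[str]:
    """Return the names of all rules that the record violates."""
    return [rule.name for rule in rules if not rule.check(record)]


rules = make_rules(today=date(2024, 1, 1))
good = {"measured_on": date(2023, 12, 31), "temperature_c": 21.5}
bad = {"measured_on": date(2030, 1, 1), "temperature_c": 300.0}

print(validate(good, rules))  # []
print(validate(bad, rules))   # ['measured_on_not_in_future', 'temperature_in_sensor_range']
```

Returning the names of violated rules, rather than a single pass/fail flag, makes it easier to report data quality issues back to the data source. More complex validation, such as checks against remote systems or AI/ML-based checks, would not fit this simple local pattern.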
Finally, another key aspect of Implementation & Data Lifecycle Management is dealing with cross-organizational dependencies. While the earlier data acquisition phase might have already answered some of the high-level questions related to this topic, on the implementation level efficient stakeholder management is a key success factor. Often, earlier agreements with respect to technical data access or commercial conditions will have to be reviewed, revised or refined during the implementation phase. Some practitioners say that this can sometimes be more difficult in the case of cross-divisional data integration within one enterprise than across enterprise boundaries.
Data Capabilities and Resource Availability
Data-related capabilities can be important in a number of different areas, including:
- Skills: Data-related skills can span a number of areas, including specific data-processing technologies and mathematical, statistical, or algorithmic skills in AI/ML, etc.
- Technology: For an AIoT product/solution initiative, it is usually important that technical management agrees on a fixed set of technologies that cover most of the required use cases, e.g., batch vs. real-time processing, basic analytics vs. AI/ML, etc.
- Processes & Methods: Depending on the specific environment, this can also be a very important aspect. Data-related processes and methods can be specific to a certain analytics method, or they can be related to certain processes and methods defined by an enterprise organization as mandatory.
Depending on the project requirements, it is also important that specific capabilities be supported by appropriate resources. For example, if it is clear that an AIoT project will require the development of certain AI/ML algorithms, then the project management will have to ensure that this particular capability is supported by skilled resources that are available during the required time period. Managing the availability of such highly specialized resources is a topic that can be difficult to align with the pure agile project management paradigm and might require longer-term planning, involving alignment with HR or sourcing/procurement.
Data Governance
Finally, larger AIoT product/solution initiatives will require Data Governance as part of their Data Strategy. This Data Governance differs from the approach typically found at the enterprise level. It needs to be lightweight and pragmatic, covering basic aspects such as:
- Data & Trust Policies: How is this specific AIoT product/solution dealing with this topic? This is likely to be very use case specific, so the AIoT initiative will have to build on generic enterprise-level requirements but will have to add policies specific to its own use case.
- Data Architecture: It is not always clear if data architecture is a discipline on its own, or if this is simply one facet of the product/solution architecture. For example, the AIoT Framework has a dedicated viewpoint to support the combination of data and functionality.
- Data Lineage: Data lineage traces where data originate, what happens to them along the way, and where they move over time. Data lineage provides visibility and transparency and can help simplify root cause analysis in the data analytics process. Data Governance can either support the central documentation of data lineages or provide tools and best practices for implementation teams.
- Metadata Management and Data Catalog: Efficient management of metadata is a prerequisite for efficient data processing and analytics. Types of metadata include descriptive, structural and administrative. A data catalog can provide support for metadata management, together with other tools, such as search.
- Data Model Management: For many AIoT applications, centrally managing a high-level data model that describes key entities and their relationships, as well as dependencies on different use cases and components, can be of great help in creating transparency and improving alignment between different teams. The AIoT Framework proposes a lightweight AIoT Domain Model approach. In addition, the Data Governance team could also provide tooling and best practices for teams that need more detailed models in their areas. This can also be linked back to the Metadata Management and Data Catalog topics.
- API Management: In his famous "API Mandate", Jeff Bezos, then CEO of Amazon, declared: "All teams will henceforth expose their data and functionality through service interfaces." This executive-level support for an API-centric way of dealing with data exchange (and exposing component functionality) shows how important API management has become at the enterprise level. The success of an AIoT initiative will also depend strongly on it. If there is no enterprise-wide API infrastructure and management approach available, this is a key support element that must be provided and enforced by the Data Governance team.
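To make the Data Lineage aspect above more concrete, the following sketch shows one possible lightweight way for an implementation team to record where a dataset originates and which processing steps it passes through. The class names, fields, and example systems are hypothetical assumptions; the AIoT Framework does not prescribe a particular lineage format or tool.

```python
from dataclasses import dataclass, field


@dataclass
class LineageStep:
    """One processing step applied to the data (hypothetical structure)."""
    operation: str  # e.g. "ingest", "cleanse", "aggregate"
    system: str     # where the step was executed


@dataclass
class LineageRecord:
    """Origin and processing history of one dataset."""
    dataset: str
    origin: str
    steps: list[LineageStep] = field(default_factory=list)

    def add_step(self, operation: str, system: str) -> "LineageRecord":
        """Append a step and return self so that calls can be chained."""
        self.steps.append(LineageStep(operation, system))
        return self

    def trace(self) -> str:
        """Render the path from origin through all steps as one line."""
        path = [self.origin] + [f"{s.operation}@{s.system}" for s in self.steps]
        return " -> ".join(path)


record = LineageRecord(dataset="hourly_vibration_stats", origin="vibration_sensor")
record.add_step("ingest", "edge_gateway")
record.add_step("cleanse", "cloud_pipeline")
record.add_step("aggregate", "cloud_pipeline")

print(record.trace())
# vibration_sensor -> ingest@edge_gateway -> cleanse@cloud_pipeline -> aggregate@cloud_pipeline
```

Records like this could be stored alongside the metadata in a data catalog, which is one way to link lineage back to the Metadata Management topic.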
Finally, the Data Governance / Data Strategy team should give itself a set of KPIs by which it can measure its own success and the effectiveness and efficiency of the AIoT Data Strategy.