Data-Driven Process Optimisation - Open Research Challenges

Modern organizations routinely deploy process analytics, including process discovery and variant analysis techniques, both to gain insight into the reality of their operational processes and to identify process improvement opportunities. Process analytic approaches play a critical role in supporting the practice of Business Process Management and continuous process improvement by leveraging process-related data to identify performance bottlenecks, reduce costs, extract insights and optimize the utilization of available resources. Over the past two decades, problems such as automated process discovery, process conformance checking and process enhancement have been extensively studied. Common to all process discovery approaches is the idea of extracting, from an event log, the process design that best represents the executions recorded in that log.

Process mining techniques rely primarily on process logs, which do not always explicitly capture all the behaviour of past process executions. Such logs are susceptible to domain gaps, data bias (due to incompleteness) and quality issues (due to noise and erroneous data recordings). Many real-world processes are unstructured in nature, and for these processes most state-of-the-art process discovery algorithms produce hard-to-interpret, spaghetti-like models that fit the event log poorly. This reduces the usefulness of the discovered process model. Discovered models are often hard for a process analyst to interpret and are prone to under-fitting or over-fitting the given event logs, offering little support for improving process outcomes. A further challenge in process mining is the management of business process variants: contemporary business process management tools do not provide adequate support for modeling and managing them. In complex domains like healthcare, where improving clinical outcomes can directly impact patients' quality of life, this means that process analysts miss opportunities to understand the underlying process behaviour completely and, in turn, to extract higher-impact insights.

Common-sense reasoning has been highlighted as one of the major challenges for the process analytics research community. This follows a broader trend in AI research, where the need to solve complex tasks by incorporating knowledge and common-sense reasoning has been repeatedly highlighted. In machine learning research, recent interest in neuro-symbolic AI and data-centric AI techniques is based on the observation that we can make more theoretical progress in AI by reasoning about data instead of model architectures (e.g., number of layers or dimensions). Methods like data augmentation have gained popularity, allowing us to systematically engineer the data used to build intelligent systems. In process mining, while a lot of emphasis has been placed on analyzing and extracting process insights from the observed behaviour logged in event logs, the knowledge dimension associated with business processes has received very little attention. That is, current process mining techniques are self-contained and have minimal capacity to leverage and reason with prior knowledge. In practice, this means process mining algorithms cannot perform inference that goes beyond the knowledge implicitly recorded in the event logs. Traditional mining techniques focus on mining behaviour, and the resulting models inherently cannot represent all of the cascading hierarchical structure of complex real-world processes. These algorithms cannot reason about abstract relationships between the various objects involved in the process. For example, current process mining methods cannot draw the kind of easy inference that people make without direct training, such as "smoke is seen, so there must be a fire". This leads to an incomplete understanding of process behaviour, in which process analysts are left trying to abstract, simplify, and even leave out key relationships needed for a complete understanding.

The structural complexity, poor quality and incompleteness of available data are major challenges during process discovery. Process mining algorithms can extract a process model; however, because of these data quality challenges, some of the behaviour cannot be extracted accurately, which limits the mined model's utility.
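
To make the smoke-and-fire example concrete, the following is a minimal sketch of rule-based inference over background facts that are not part of any event log. The facts, the rule format and the names `FACTS`, `RULES` and `infer` are all hypothetical and chosen only for illustration; this is not a description of any existing process mining tool.

```python
# Hypothetical background facts as (subject, label, object) triples.
FACTS = {("smoke", "observed_in", "warehouse")}

# Hypothetical rules: a fact matching (premise subject, premise label, X)
# lets us derive (conclusion subject, conclusion label, X).
RULES = [
    (("smoke", "observed_in"), ("fire", "likely_in")),
]


def infer(facts, rules):
    """Apply the rules repeatedly until no new fact can be derived."""
    derived = set(facts)  # copy, so the input set is left untouched
    changed = True
    while changed:
        changed = False
        for (p_subj, p_label), (c_subj, c_label) in rules:
            for subj, label, obj in list(derived):
                if subj == p_subj and label == p_label:
                    new_fact = (c_subj, c_label, obj)
                    if new_fact not in derived:
                        derived.add(new_fact)
                        changed = True
    return derived


if __name__ == "__main__":
    for fact in sorted(infer(FACTS, RULES)):
        print(fact)
```

The derived fact about a likely fire never appears in the recorded observations; it follows only from combining an observation with prior knowledge, which is the capability the paragraph above describes as missing.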

Process knowledge has many faces: it differs from other forms of organizational knowledge in being highly contextual, sometimes tacit, and relevant to a particular domain. In this work we consider a knowledge-centric approach for assisting process analysts in understanding event data from multiple behavioural dimensions. Our framework addresses challenges associated with the variability and data quality of event logs. To explain our ideas we consider a real-world scenario where process mining techniques are applied.

The question for the process mining research community is this: how can we support the process discovery phase by leveraging knowledge graphs and, more recently, LLMs fine-tuned on organizational data?

A possible answer is a knowledge-graph-based system that can facilitate the analysis of process variants from context-enriched event logs. Such a system can also enable constraint-based filtering of atypical behaviour to improve the quality of the mined dependency graph. This line of thinking can also, in my opinion, help tackle the semantic incompleteness of real-world event logs and so improve the utility of extracted process models.
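
As a rough illustration of constraint-based filtering, the sketch below removes traces that violate an ordering constraint before counting directly-follows pairs. The event log, the `precedes` constraint and all function names are hypothetical, and the directly-follows counts are a deliberate simplification of a dependency graph, not the dependency measure of any particular discovery algorithm.

```python
from collections import Counter

# Hypothetical event log: case id -> ordered list of activities.
EVENT_LOG = {
    "case-1": ["Register", "Triage", "Blood Test", "Discharge"],
    "case-2": ["Register", "Triage", "Discharge"],
    "case-3": ["Register", "Blood Test", "Triage", "Discharge"],  # atypical
    "case-4": ["Register", "Triage", "Blood Test", "Discharge"],
}

# Hypothetical constraints, as they might be read from a knowledge graph:
# (a, "precedes", b) means that if both occur, a must come before b.
CONSTRAINTS = {("Triage", "precedes", "Blood Test")}


def satisfies(trace, constraints):
    """Check every 'precedes' constraint against first occurrences."""
    for before, label, after in constraints:
        if label != "precedes":
            continue  # other constraint types are ignored in this sketch
        if before in trace and after in trace:
            if trace.index(before) > trace.index(after):
                return False
    return True


def filter_log(log, constraints):
    """Keep only the cases whose trace satisfies all constraints."""
    return {cid: tr for cid, tr in log.items() if satisfies(tr, constraints)}


def dependency_counts(log):
    """Count how often one activity is directly followed by another."""
    counts = Counter()
    for trace in log.values():
        counts.update(zip(trace, trace[1:]))
    return counts


if __name__ == "__main__":
    filtered = filter_log(EVENT_LOG, CONSTRAINTS)
    for pair, n in sorted(dependency_counts(filtered).items()):
        print(pair, n)
```

With the hypothetical constraint applied, the atypical case is dropped, so the edge from "Blood Test" to "Triage" no longer appears in the counts. Whether a violating trace is noise or a genuine variant remains a judgment for the analyst.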

Over the past few years, Knowledge Graphs (KGs) have emerged as a compelling abstraction for organizing enterprise knowledge. Knowledge graphs employ a graph-based data model to capture knowledge in a concise and intuitive abstraction for a wide variety of domains. In its simplest form, a knowledge graph is a directed edge-labelled graph composed of nodes and edges. Nodes represent entities of interest, and edges capture potentially complex relations between the entities of a given domain. Formally, given a set of nodes $N$ and a set of labels $L$, a knowledge graph is a subset of the cross product $N \times L \times N$. Each member of this set is referred to as a triple, and labels capture the meaning of the relationships between the entities represented by nodes.
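
The formal definition can be sketched directly in code: a knowledge graph is a set of triples drawn from $N \times L \times N$. The node names, labels and helper functions below are hypothetical and serve only to illustrate the definition.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One member of N x L x N: (subject node, label, object node)."""
    subject: str
    label: str
    obj: str


# Hypothetical node set N and label set L for a small clinical domain.
NODES = {"Triage", "Nurse", "Emergency Department", "Blood Test"}
LABELS = {"performed_by", "located_in", "precedes"}

# A knowledge graph is then simply a set of triples.
KG = {
    Triple("Triage", "performed_by", "Nurse"),
    Triple("Triage", "located_in", "Emergency Department"),
    Triple("Triage", "precedes", "Blood Test"),
}


def is_valid_kg(kg, nodes, labels):
    """Check that every triple lies in nodes x labels x nodes."""
    return all(
        t.subject in nodes and t.label in labels and t.obj in nodes
        for t in kg
    )


def objects_of(kg, subject, label):
    """Return all nodes reachable from `subject` over edges named `label`."""
    return {t.obj for t in kg if t.subject == subject and t.label == label}


if __name__ == "__main__":
    print(is_valid_kg(KG, NODES, LABELS))
    print(objects_of(KG, "Triage", "performed_by"))
```

Representing the graph as a plain set keeps the sketch close to the set-theoretic definition; a real deployment would more likely use a dedicated graph store.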

Knowledge graphs are an effective tool for modeling interconnected, real-world scenarios while representing an ever-evolving substrate of knowledge within an organization. They can be applied to integrate, manage and extract value from diverse sources of data at large scale, presenting a unified view of enterprise knowledge. Knowledge acquisition tasks commonly performed to generate or extract implied information from a knowledge graph include knowledge graph completion, triple classification, entity recognition, and relation extraction.

We argue that knowledge and processes are interlinked and that knowledge should explicitly be made a key component of business process mining practice. From a process analytics perspective, knowledge graphs and LLMs offer rich semantics for knowledge representation and can be leveraged to model domain knowledge and process properties that are typically not captured in event logs. Knowledge graphs also provide structured relational knowledge between concepts, making them a good fit for tasks that require reasoning. These capabilities will allow process analysts to better understand process behaviour from event logs.

Process analytic approaches play a critical role in supporting the practice of business process management and continuous process improvement by leveraging process-related data to identify performance bottlenecks, extract insights about reducing costs and optimize the utilization of available resources. Process analytic techniques often have to contend with real-world settings where available logs are noisy or incomplete. Therefore, we need new solutions that leverage the powerful capabilities offered by LLMs and knowledge graphs. In comparison with unaided classic process mining techniques, this will lead to improved utility and reliability of extracted process models. I hope future researchers can investigate the challenges associated with process discovery and process variant analysis by considering a knowledge-graph and LLM-based approach that assists process analysts in understanding process execution behaviour from event logs.