Artifact Compatibility for Enabling Collaboration in the Artificial Intelligence Ecosystem

Different types of software components and data have to be combined to solve an artificial intelligence challenge. An emerging marketplace for these components will allow for their exchange and distribution. To facilitate and boost collaboration on the marketplace, a solution for finding compatible artifacts is needed. We propose a concept for defining compatibility on such a marketplace and suggest scenarios for how users can interact with it to support the different types of required compatibility. We also propose an initial architecture that derives from and implements the compatibility principles and makes the scenarios feasible. We matured our concept in focus group workshops and interviews with potential marketplace users from industry and academia. The results demonstrate the applicability of the concept in a real-world scenario.


Introduction
Despite the rapidly increasing number of available tools to manage data and cloud infrastructure offerings which can be used for collaboration purposes, Artificial Intelligence (AI) systems are still developed as "monolithic systems," where a single company owns the entire development chain [1]. This end-to-end control implies an unnecessarily large cost of ownership and a long time to market, creating an investment obstacle which might result in unexploited innovation potential and competitive advantage, particularly for SMEs. There are different reasons for the lack of collaboration between organizations, such as the absence of a trustworthy way to access and share data [1] and the technical issue of incompatible data formats [2]. The Bonseyes AI marketplace is an emerging concept that addresses the challenge of cross-organizational collaboration in the field of AI [1]. One of its main purposes is to facilitate and foster the exchange of AI artifacts, such as data and AI models, used in the process of AI system development. Bonseyes facilitates the exchange through a digital marketplace. It is a place where companies or organizations that need specific AI solutions state their needs in the form of AI challenges, and where others solve challenges, individually or collaboratively, by developing and trading or exchanging artifacts. The marketplace allows for specialization in the process of implementing AI solutions. Companies can specialize in solving a particular type of challenge, e.g., object detection or face recognition, or they can specialize in offering a particular type of artifact, e.g., data in the form of images of objects or faces, needed to solve the challenges.
Critical for the marketplace to become established is to facilitate efficient access to and exchange of artifacts, ensuring their compatibility, and at the same time to offer a reliable system that protects the rights of the artifact vendors. Companies and organizations without access to the vast amount of resources and expertise required to collect and process data would profit from the marketplace in that they would be able to find partners willing to work on their challenges more efficiently. Others specialized in delivering particular services in the process of AI systems development will be able to target customers and develop solutions more efficiently. A typical case of collaboration, and a business case, is a company that possesses data and requests expertise from data scientists to build AI models and algorithms that utilize the models.
In this article, we address the topic of compatibility of artifacts offered in the marketplace. In Section 2, we provide some background on the topic and define the terms that we use throughout. In Section 3 we state our method and validation procedure. In Section 4 we present our compatibility concept. In Section 5 we discuss our contribution by connecting the concept to related work. Section 6 concludes.

Background
The AI marketplace is a meeting place for the AI community. There, organizations or individuals can exchange and trade artifacts such as data and models that are used in the process of AI system development. Data, for example, can be numbers, text, images, or sound. Video data is a sequence of images or frames. Data can be static or dynamic. Depending on the context, e.g., in the case of video, dynamic data can also be called live or stream data. Other classifications of data types are also possible, differentiating continuous, discrete, binary, and categorical data. Static data is offered as datasets or archives. For example, images may be offered as a set of JPEG or PNG files. When a dataset is too large to be handled with standard software, e.g., statistical tools like SPSS, the data is termed big data [3].
Models are machine learning (ML) algorithms designed to work with and learn from the data and, in supervised machine learning, from the knowledge associated with the data, for example in the form of labels. The labels are sometimes also called annotations. Humans can classify data, and the class associated with the data will be its annotation.
The data enriched in this way can itself be an artifact and be registered in the marketplace. The procedure of learning is called training, and the trained model is used to predict the labels of data records the model has not yet seen. The procedure of predicting the annotations is called inference. Some examples of other AI products that can be offered in the marketplace are the tools that create and process the artifacts, for example, software solutions that convert one data type into another or one model type into another. Software that cleans the data by automatically dealing with missing values, or tools that benchmark models, also classify as AI products that can be offered on the marketplace, since they increase the efficiency of the AI system development process.
The artifacts and the tools can be combined and become parts of a so-called AI pipeline. A pipeline is used to solve a given challenge from the marketplace, e.g., find all the faces in an image and draw a bounding box around them.
We refer to organizations or individuals providing the challenges as challenge providers, and those who provide data as data providers. Humans who work with the artifacts we refer to as data scientists. The same organization or individual can have different roles. Thus, the marketplace is a meeting place for these different roles, offering them a platform to work collaboratively on a specific AI challenge.

Methodology
In our work, we follow the technology transfer model of Gorschek et al. [4]. The model suggests to 1) identify the problem or issue, 2) study the state of the art, 3) create a candidate solution, 4) validate the solution in academia, 5) validate the solution statically, for example through interviews, 6) validate the solution dynamically, for instance through pilot projects, and 7) release the solution. The authors stress that the model can and should be adapted according to industry needs.
Aims. The work at hand describes the outcome of the first two steps in the model, namely identifying the issue and studying the state of the art, and the beginning of the third step, which is formulating a candidate solution. It aims to report our findings and to outline the candidate solution. Section 4.1 presents the compatibility model. The scenarios for the usage of the marketplace and the enabling system architecture are described in Sections 4.2 and 4.3 respectively.
Method. We identified the challenge of compatibility in a pipeline in the course of the Bonseyes Project while conducting interviews and workshops with members of the AI community in industry and academia who participate in the project and are part of the Bonseyes consortium. We used these meetings to elicit the requirements for the marketplace. We also studied the literature on collaboration and licensing relevant to a pipeline in which distinct organizations create artifacts. Our initial thoughts resulted in a flow of actions and a system architecture similar to the ones presented in Fig. 2 and Fig. 3, but focused on licensing.
To obtain feedback on our ideas, we decided on a focus group workshop, which is a suitable method to fulfill this aim [5, p. 9-13]. We used the work of Kontio et al. [6] for further guidance on preparing and conducting the workshop. We held an online workshop with seven practitioners and researchers from industry and academia in March 2018. One of them was the first author of this paper, who moderated the session, and another was the third author, who is responsible for the system architecture of the marketplace. A researcher working on privacy and security concepts for the marketplace was also present. Industry was represented by a senior developer working on AI pipelines, a research manager and a developer both working on the concepts concerning the front-end of the marketplace, and a senior project manager on the consortium level. All industry participants were working in SMEs.
After the feedback, we matured the concepts in interviews. However, we saw the need for a second focus group workshop to validate the changes. It took place one week after the first workshop and had the same setup, except that two of the participants changed. Instead of the third author and the senior project manager, we had two researchers from different universities working on AI pipelines.
To validate and mature the new concepts, the first author also conducted five semi-structured one-on-one interviews: three between the focus group workshops and two shortly after them. He was guided by the recommendations of Hove and Anda [7]. The first and fourth interviews were conducted in person, while the rest were online. The first author discussed the new concept by presenting it and asking questions similar to the ones stated below (see Data Collection), but adapted towards the more general topic of artifact compatibility. The interviews, except the fourth one where exact definitions of terms were discussed, were not recorded, but notes of the obtained feedback and hints for improvement were taken.
The first interview was with a researcher outside the Bonseyes consortium, who later joined it. He was working in the field of tool support for sharing AI artifacts. The rest of the improvement and validation activities, such as the focus group meetings, the ensuing written correspondence, and four of the interviews, took place within the consortium.

Data collection.
In the first workshop, we presented our ideas in the form of the diagrams shown in Fig. 2 and Fig. 3 and discussed the following questions: 1. Have you experienced licensing compatibility problems in your AI projects? 2. Which artifacts cause them? 3. How did you solve them? 4. Does the suggested concept for checking license incompatibility support you in the process of managing license compatibility? 5. What would you change in this concept?
The session lasted one hour and was recorded. The discussion continued after the session in an e-mail thread for open questions. As suggested in [5, p. 13-15], we used the focus group to elicit input on how to mature our concept. We took notes during the session and listened to the recording afterward, extracting additional feedback that was not noted in the first place. The notes were shared with the participants to ensure correctness. We did not receive any objections.
We used the feedback from the focus group to create the concept of compatibility (see Section 4.1). In the first interviews following the focus group, the first author discussed the new concept by presenting it with questions like the ones stated above, but adapted towards the topic of artifact compatibility. The session lasted one hour.
The second interview was between the first and the third author, who was also part of the first focus group. It lasted 1.5 hours. During the interview, the marketplace system architecture (see Section 4.3) was discussed intensively. The discussion also continued in an e-mail thread after the interview.
The third interview was with a researcher who works in the field of applied AI in a medical research institute and uses deep learning technology to create applications for a hospital attached to his institute. The interview lasted an hour, discussing questions like the ones above. He was part of the second focus group.
The feedback from the interviews was considered, and changes to the concepts were applied. In the second focus group workshop, the updated concepts were presented and validation questions were stated. The session lasted one hour. We took notes and recorded the session. The notes were again shared without receiving objections. However, we received additional feedback on the compatibility concept from the research manager in the SME responsible for the front-end of the marketplace.
In addition, two more interviews were conducted: the first with the CEO of an SME, who is at the same time a project manager of the consortium, and the second with a senior developer in the same SME, who was part of both workshops. These additional interviews focused on the overall AI pipeline concept and the definition of terms used in it. The first interview lasted one hour and the second 1.5 hours. These discussions continued in e-mail and messaging threads.
The participants in the focus groups and the interview partners were selected so that they represent the developers of the marketplace as well as its future users. All of them were familiar with the context, since all were or are now part of the consortium.

Analyses and results
We aimed at maturing the concepts to the extent that they could be implemented in the marketplace and validated empirically. In the first focus group workshop, we discussed usage scenarios and the architecture of the marketplace. In the second, we discussed the updated version of the marketplace concepts and the compatibility model (see Section 4.1). We analyzed the results of the workshops and the interviews by going through our notes and the recordings and updated the concepts iteratively.
Overall, in the first workshop we got positive feedback on our concept. For example, the senior developer of the feedback-providing SME stated: "Overall I think the architecture you presented is going to match well our intended usages." During the workshop, he also noted that licensing is an issue and that the license of one artifact can influence how another artifact is used, e.g., models trained with a specific dataset are not allowed to be used with another dataset.
While discussing the last two questions, meant to gather feedback on the concept, which was at that time centered on licensing, we noticed that it did not address the full needs of the practitioners present at the workshop. For example, not only were the dependencies between licenses discussed, but also how these dependencies relate to different types of artifacts, and dependencies between an AI challenge and artifacts were mentioned as well. In retrospect, we realized that we needed to zoom out and look at the whole range of dependencies required to have compatible artifacts in a pipeline that solves a challenge; licensing was one part of the bigger picture of compatibility. These thoughts resulted in the compatibility concept described in Section 4.
In the first ensuing interview, the compatibility model was presented (see Section 4.1). The interviewed researcher gave positive feedback and stressed the need for artifact interoperability and the issues connected to it. The result from the second interview is captured in Section 4.3. The third interview confirmed the compatibility concept and offered input on the variety of AI challenges. In the second workshop, the participants confirmed the improvements and suggested additional improvement possibilities. In the following two interviews, the pipeline, tools, and artifacts needed to solve an AI challenge were discussed and terms were defined. Here are some of these definitions, for the terms used in the next section:

• AI Challenge: "A challenge specifies the requirements for a detection, location, classification, or regression capability offered by a target platform."
• AI Tool: "A software component that creates or processes an artifact."
• AI Artifact: "The product of the execution of a tool. It can be an output of the pipeline or an intermediate result that is processed by other tools."

Threats to validity. We realize that we discussed the concept with a small group of people and will need to involve larger groups in the discussion in the future to account for conclusion validity. Nevertheless, the participants in the focus groups and the interviews were chosen because of their expertise in the corresponding parts of the concept. We also tried to account for external validity by involving different groups of data scientists in our research, people from both industry and academia. We also interviewed a representative of the community who was at the time outside the consortium.

Compatibility Concept
We first present our compatibility model, a general conceptual model of the kinds of compatibility required for distinct organizations to collaboratively create a pipeline of AI artifacts. We then present a scenario for the usage of the AI marketplace that will be implemented. For that, we draw on the principles of the introduced model; the principles in that sense are the different compatibility types. Afterward, we present the system architecture of the marketplace that allows for such scenarios.

AI Artifact Compatibility Model
We take a closer look at three key elements that, from a data scientist's perspective, have to be compatible so that a pipeline of artifacts is formed (Fig. 1): challenges, artifacts, and licenses. There is a twofold compatibility between these elements: vertical and horizontal. Vertical compatibility addresses compatibility between a challenge and artifacts, and between artifacts and licenses. Horizontal compatibility ensures the compatibility between artifacts, and between licenses.

Challenge Category-Artifact Compatibility.
A pipeline aims to solve a specific AI challenge. That means that artifacts are combined and modified to match the specific challenge. The artifacts can be discovered more easily if they belong to the same category of challenges. Just as for data, there are different possibilities to categorize AI challenges: depending on whether the data is labeled or not, the challenges relate to supervised and unsupervised ML problems respectively; depending on whether the output of the prediction should be classes or a real number, the problems are classification and regression problems respectively [8, p. 14-18]. One can go further and differentiate between types of classification depending on the data, e.g., image and text classification, or be even more specific by dividing image classification into object or scene detection problems. The workshop participants preferred this last categorization type. From the data scientist's perspective, it makes sense to have this type of compatibility, since it would be one of the ways to search and filter the artifacts on the marketplace.
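The hierarchical categorization preferred by the workshop participants could, for instance, be represented as a nested taxonomy that the marketplace search uses to filter artifacts. The following is a minimal sketch; the concrete category names below the levels mentioned in the text are illustrative assumptions, not a prescribed taxonomy.

```python
# Illustrative sketch of a hierarchical challenge taxonomy for searching and
# filtering artifacts on the marketplace. Leaf names like "sentiment analysis"
# are assumptions added for illustration.

CHALLENGE_TAXONOMY = {
    "supervised": {
        "classification": {
            "image": ["object detection", "scene detection"],
            "text": ["sentiment analysis"],
        },
        "regression": {},
    },
    "unsupervised": {
        "clustering": {},
    },
}

def categories_of(path, taxonomy=CHALLENGE_TAXONOMY):
    """Walk the taxonomy along a category path and return the subtree,
    e.g. all challenge types registered under image classification."""
    node = taxonomy
    for key in path:
        node = node[key]
    return node
```

An artifact tagged with such a path can then be matched against a challenge in the same category, which is the discovery mechanism the compatibility type above calls for.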
Artifact-License Compatibility. We adapt Ferrante's definition of software licensing [9] to artifact licensing and define it as any procedure that lets an enterprise or user acquire and use an artifact on a machine or network according to the artifact owner's licensing agreement. Not every license is compatible with every artifact. Models, for example, if they are intended for the public, can be licensed under one of the GNU licenses.

License Compatibility. The difficulties bound to the compatibility between licenses in one artifact have already been mentioned. There is also plenty of literature concerned with the compatibility of free and open source licenses [11,12,13]. It gets even more challenging when the artifacts are connected in a cross-organizational pipeline. A typical example of how proprietary licenses affect the distribution of pipeline results are models that are trained on a proprietary set of data. In that case, the data would be one artifact provided by one company, and it would be connected to another artifact, a model from another company. Depending on the terms specified in the proprietary license of the data, the compiled versions of the model might not be eligible for inference on other datasets. The marketplace would need to detect such incompatibilities between artifacts and give notice to the data scientist that they are about to connect artifacts under incompatible licenses. In certain cases, it would even need to enforce restrictions defined by the licenses, e.g., when a license is bound to time limitations for the usage of an artifact.
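A detection mechanism of the kind described above could be sketched as a lookup over declared license relations, checked pairwise along the pipeline. This is only a conceptual sketch under assumptions: the license names and the compatibility table are illustrative and do not reflect actual legal compatibility rules or the marketplace implementation.

```python
# Hypothetical sketch of the license incompatibility check. Each license maps
# to the set of licenses that artifacts derived from or connected to it may
# carry; the table entries are illustrative assumptions, not legal advice.

COMPATIBLE_WITH = {
    "CC-BY-4.0": {"CC-BY-4.0", "GPL-3.0"},
    "GPL-3.0": {"GPL-3.0"},
    "Proprietary-NoRedistribution": set(),  # e.g., proprietary training data
}

def check_pipeline_licenses(artifact_licenses):
    """Return the (upstream, downstream) license pairs that conflict when
    artifacts are connected in pipeline order; an empty list means no
    incompatibility was detected."""
    conflicts = []
    for upstream, downstream in zip(artifact_licenses, artifact_licenses[1:]):
        if downstream not in COMPATIBLE_WITH.get(upstream, set()):
            conflicts.append((upstream, downstream))
    return conflicts
```

The marketplace would run such a check whenever a data scientist connects two artifacts and, on a non-empty result, issue the notice described above.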

Artifact Compatibility.
Here we refer to the more technical compatibility between artifacts, e.g., whether a certain model can work with a particular dataset. Thus, whether artifacts are compatible also depends on the tools that create and process them. Tools have to actively collaborate with one another, which is why in terms of the tools we speak of interoperability instead of compatibility. In that sense, we adapt Wileden and Kaplan's definition of software interoperability [14] and define tool interoperability as the ability of multiple tools to communicate and interact with one another, exchanging compatible artifacts. The aim here is to synchronize the inputs and the outputs of the tools. Interfaces need to be defined, for example by utilizing the REST or SOAP standards. And because the ML community is spread across many different domains of research, especially when it comes to applied ML, it will be essential to have not only syntactic but also semantic interoperability between the tools. For that purpose, an ontology would need to emerge. Ontologies allow for encoding knowledge in a computer-processable way, thus making it transferable within and across domains [15]. With our work, we aim to contribute to this goal.
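At the syntactic level, synchronizing the inputs and outputs of tools can be sketched as each tool declaring the artifact types it consumes and produces, with two tools being chainable only when these declarations match. The sketch below assumes hypothetical tool and artifact-type names; semantic interoperability via an ontology would go beyond this simple string match.

```python
# Minimal sketch of syntactic tool interoperability: a tool declares the
# artifact type it consumes and the one it produces, and two tools are
# interoperable when output and input types match. Names are illustrative.

from dataclasses import dataclass

@dataclass
class Tool:
    name: str
    consumes: str   # artifact type taken as input
    produces: str   # artifact type emitted as output

def interoperable(upstream: Tool, downstream: Tool) -> bool:
    """True if the downstream tool can consume what the upstream produces."""
    return upstream.produces == downstream.consumes

# Two hypothetical tools of a face-detection pipeline:
labeler = Tool("image-labeler", consumes="image/jpeg",
               produces="dataset/labeled-images")
trainer = Tool("model-trainer", consumes="dataset/labeled-images",
               produces="model/onnx")
```

A shared ontology of artifact types would replace the plain strings here, allowing tools from different domains to agree not only on the format but also on the meaning of the exchanged artifacts.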
In our research, we thematize the licensing of artifacts. We define tools as software components, and their licensing would be in accordance with the licensing of software. Being part of an AI pipeline poses some specific requirements on the licensing of tools; this is further discussed in Section 5.1, Related Work.

AI Marketplace Scenarios
There are many different utilization scenarios possible for the AI Marketplace. The scenarios are sequences of combinations of atomic use cases offered, or rather implemented, by the system. Such atomic use cases are, for example: register an artifact, access an artifact, search for an artifact, register a custom license, etc. The scenarios depend on the intention of the marketplace users, who most likely will be data scientists, and on the possibilities or use cases that the marketplace system implements. Figure 2 displays a flowchart diagram for the registration of an artifact on the marketplace. Following the steps in the diagram, we demonstrate the realization of the scenario in which a data scientist registers an artifact under his own license.
In the illustrated procedure, a data scientist decides to register an artifact on the marketplace. He selects the appropriate "Register Artifact" function in the UI of the marketplace, depending on the type of artifact he wants to register, e.g., data or model, and so initiates the process illustrated in Fig. 2. He then needs to enter the description of the artifact, which will usually include what the artifact is about and how to use it. Afterward, he needs to select an appropriate license for the artifact to account for Artifact-License compatibility. If he decides on a standard license, such as a CC license for data or one of the GNU licenses for a model, he can get support from the marketplace in deciding which license might be appropriate for his artifact. Following the example of how CC licenses are selected on the page of Creative Commons (creativecommons.org), the marketplace will offer support by asking questions such as which features of the license are important to the data scientist. If he, however, decides to use his own license, as the scenario in Fig. 2 supposes, he can upload it and will then need to select licenses that are compatible with it. This step is intended to support the license compatibility discussed in the previous section. He needs to enter this step explicitly, which is why we did not model it as a decision point; he can, of course, skip it by selecting an appropriate "skip" option. The license selection procedure can be iterated several times if the data scientist would like to license the artifact under several licenses at the same time, so that better artifact compatibility is achieved for those who will later work with the artifact. After all information about the artifact is entered, the user can finish the process, for example by hitting an appropriate "Register artifact" button in the UI.
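The registration flow above can be summarized as a short sketch of pseudo-API calls. All function and field names (e.g. `suggest_standard_license`) are hypothetical assumptions introduced for illustration, not the actual marketplace interface.

```python
# Sketch of the Fig. 2 registration scenario. A custom ("own") license may
# optionally carry a list of compatible licenses; the list may be skipped,
# mirroring the explicit, skippable step described in the text.

def register_artifact(catalog, artifact_type, description,
                      own_license=None, compatible_licenses=()):
    """Register an artifact in the (hypothetical) marketplace catalog."""
    record = {"type": artifact_type, "description": description}
    if own_license is not None:
        record["license"] = own_license
        # Optional step: declare which other licenses are compatible.
        record["compatible_licenses"] = list(compatible_licenses)
    else:
        # Marketplace-assisted choice of a standard license (CC, GNU, ...).
        record["license"] = suggest_standard_license(artifact_type)
    catalog.append(record)
    return record

def suggest_standard_license(artifact_type):
    """Placeholder for the question-driven license support described above."""
    return "CC-BY-4.0" if artifact_type == "data" else "GPL-3.0"
```

Registering the same artifact under several licenses would correspond to iterating the license step and appending to the record's license information.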

AI Marketplace System Architecture
Figure 3 illustrates the system architecture that will assure the different types of compatibility described in Section 4.1 and will allow for scenarios like the one discussed in Section 4.2. To understand the need for the different systems and components and their interoperability in the system architecture of the marketplace, we first look at the context of a pipeline in Table 1. A pipeline of tools and artifacts is meant to fulfill an AI challenge and will have a predefined and editable set of users, licenses, artifacts, tools, and resources.
These different context elements need to be managed. This is done through the Challenge, User Account, License, Artifact, Tool, and Resource Management Systems.
They will be interconnected according to the illustrated schema (see Fig. 3) and will have the possibility to pull and push information from and to the associated systems. They can also be seen as services that exchange information whenever needed to realize use cases and to ensure the compatibility and interoperability between the different artifacts and tools of a pipeline.

Table 1. The context elements of a pipeline

Context Element     Description
The AI challenge    The AI challenge that the pipeline is meant for
Set of users        The users allowed to access the pipeline
Set of licenses     The licenses for the artifacts and tools
Set of artifacts    The artifacts in the pipeline
Set of tools        The tools in the pipeline
Set of resources    The physical resources allocated to the pipeline

The pipeline itself is meant to run on an external system. The system will be installed either locally, for example on hardware provided by a data scientist, or in a cloud environment that will allow for inter-organizational collaboration in the development of a pipeline. The AI Pipeline System will manage different pipelines through the Pipeline Management and Orchestration Component. This component has its counterpart at the AI Marketplace, the Pipeline Management System. Through the interplay between these components, different scenarios are possible, e.g., defining a pipeline already in the marketplace and so ensuring interoperability between the tools even before downloading or accessing them. Another scenario could be the enforcement of licenses. The Pipeline Management and Orchestration Component will connect to the Pipeline Management System at regular intervals and check for license compatibility. If a license has expired, the corresponding tool will not be started, and the user will get an appropriate notification to renew the license.
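The license enforcement scenario sketched above amounts to a periodic check of each tool's license expiry before start-up. The following is a minimal sketch under assumptions: the dictionary fields and tool names are illustrative, not part of the actual Pipeline Management and Orchestration Component.

```python
# Sketch of the periodic license enforcement check: tools whose license has
# expired are withheld from starting and flagged for a renewal notification.
# Field names ("name", "license_expires") are illustrative assumptions.

from datetime import date

def startable_tools(tools, today):
    """Split the pipeline's tools into those that may start and those whose
    license has expired; the latter should trigger a renewal notification
    to the user instead of a start."""
    ok, expired = [], []
    for tool in tools:
        if tool["license_expires"] >= today:
            ok.append(tool["name"])
        else:
            expired.append(tool["name"])
    return ok, expired
```

The Pipeline Management and Orchestration Component would run such a check at each of its regular synchronization intervals with the marketplace-side Pipeline Management System.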
The data scientist will also be able to allocate system resources for the pipeline through the Pipeline Resource Management Component or the Resource Management System. We intend to offer data scientists the possibility to allocate cloud infrastructure resources attached to the Marketplace directly in it, for example through a specially designed UI component that connects to the Resource Management System, which is, in turn, connected to a third-party infrastructure provider. In such a scenario, data scientists do not need to organize cloud infrastructure services by themselves.
The interactions between the AI Pipeline System and the AI Marketplace are made possible and secured through the interplay between the Authentication System and the Authentication Component on the AI Marketplace and the AI Pipeline System respectively. This is a small part of the security concept for the marketplace, which is currently being discussed and developed.
While the Pipeline Management and Orchestration Component and the Pipeline Resource Management Component manage entire pipelines and their resources, their counterparts at the single-pipeline level, the Tool and Artifact Management and Orchestration Component and the Tool and Artifact Resource Management Component, manage tools and artifacts and their resource allocations.
The Tool and Artifact Resource Management Component can thereby only allocate resources to tools that were allocated to the pipeline. The tools contain software components which, depending on the artifact type that they create and process, can differ significantly. These can be systems to access big data, systems to compile models, or any other software tools that can be useful in an AI pipeline. The Connection Components are part of the tool interoperability and artifact compatibility concept.

Discussion
The overall aim is to develop a marketplace system that will enhance the collaboration between different organizations regarding the efficiency of developing a pipeline that addresses a specific AI challenge. These organizations, and the data scientists in particular, should be encouraged to start sharing their achievements in a proprietary or non-proprietary way. The sharing and acquiring of tools and artifacts should be enabled in an easy-to-use and understandable way, however without compromising on security and copyright limitations. Having such a largely heterogeneous field of technologies as in the ML domain and, at the same time, the variety of privacy and security requirements bound to the AI artifacts makes the task extremely challenging.
In countless project meetings within the Bonseyes consortium, in smaller and bigger circles, as well as through our findings in the literature [16,17], we figured out that there are two major challenges that we need to address if we strive to develop a system that is perceived as useful. On the one hand, tools and artifacts need to be technically compatible, and on the other, trust between the collaborators has to be established. Licensing contracts are meant to protect intellectual property rights and deal with issues arising in conflict situations [18, p. 101-112]. Therefore, we started by focusing on licensing issues. In the first focus group workshop, we realized that we had to broaden our view and address more of the needs of industry. We, however, found ourselves in the difficult situation of being confronted with many interrelated compatibility issues. We developed the AI artifact compatibility model to distinguish between the different problem areas that we are going to encounter in the process of the AI marketplace development.
Having the model made it much easier to develop usage scenarios for the marketplace that address and offer solutions to the encountered compatibility problems. An example is the scenario of a data scientist selecting licenses compatible with his proprietary model, thereby supporting the license compatibility of artifacts in a pipeline.
Having the model and the scenarios, in turn, made it easier to develop a system architecture that supports the development of a marketplace which aims to ease the burdens of data scientists who collaboratively develop artifacts in a pipeline.
With our concept of the marketplace and the exchange of artifacts and tools, we aim to enable cross-domain collaboration. E.g., a specific annotation tool from one domain can be used in another, or a particular dataset can be utilized in two different ways. Data scientists, and data science applications in general, can thus profit from the AI Marketplace.
However, we also realize that there are potential limitations of the concept. For example, certain expertise is needed to work with specific artifacts and tools. Or a particular environment must be installed to interact automatically with the marketplace, e.g., the Docker Engine to be able to pull and push dockerized tools. Beyond the need to install such an environment, these limitations were not discussed in the workshops or the interviews. It will, however, be important to elaborate and validate these concerns in the future.

Related Work
Software ecosystems are composed of different software components that can be supplied by different software vendors and carry different licenses [19]. Scacchi and Alspaugh [19] work on how software niches, which they define as networks of software producers, integrators, and consumers of specific components, are better defined by software component licenses and the architectural composition of the system. They state that if licenses change, customers might switch to a more desirable product in the ecosystem. Our concept of the marketplace can also be seen as a software ecosystem in which data scientists are enabled to pick the best tool or artifact for their pipelines. An earlier version of Scacchi and Alspaugh's work can be found in [20].
Van Angeren et al. [21] published results about the strategies that software vendors use to select suppliers in software ecosystems and the factors influencing these decisions. The research shows that many software vendors would appreciate minimal dependence. This is also something from which the AI community might benefit when utilizing the AI marketplace.
Research that combines the topics of AI, ecosystems, and IP is scarce, and we only came across the work of Keisner et al. [22]. They focus on robotics ecosystems and the IP bound to these. It is not a technical paper proposing a new ecosystem. They analyze, among other robotics- and IP-related topics, the role of patents for IP in robotics, mention some open source ecosystem platforms, and raise the question of who owns the IP that robots create. These topics will also be important for the AI marketplace in the near future and will need further investigation. We can also analyze the existing platforms in the field of robotics.
Despite the scarcity of combined research, there is specialized literature for the particular compatibility issues identified by the AI artifact compatibility model. From it we can derive ideas on how to solve specific compatibility issues between artifacts.
An account of how vital licensing is and which primary licensing models exist can be found in [9]. These issues arise because application developers often try to find suitable algorithms for a task on the Internet, and there are no indicators that AI developers would behave differently. AI developers combine different services and software packages. The combination of these elements can increase the risk of license incompatibility [11] already at this experimental phase. If these algorithms reach the production stages of a project, this might cause license incompatibilities with the intended license for the final solution.
Even if developers are careful, licenses can change over time and cause incompatibility issues. Di Penta et al. [12] collect five famous cases in the field of free and open source software (FOSS) related to license incompatibility after license evolution. They also analyze some widely used FOSS systems, find that a large proportion of source code file changes is due to changes in the licensing statements, and develop a method to track the evolution of licenses in files.
At the time of writing, the Open Source Initiative alone lists 83 approved licenses: https://opensource.org/. The plethora of legal issues related to conflicting licenses in open source and free software is discussed, for example, by Nimmer in [13].
Thompson and Jena in [23] provide an overview of the different digital services needed for an electronic license. In [24], Raekow et al. provide a license management architecture for distributed environments. Cacciari et al. provide in [25] SLA-based licensing services for the cloud. A formal way of expressing licensing clauses is presented by Gangadharan and D'Andrea in [26].
We also looked at literature on software interoperability, from which we could derive ideas for how AI tool interoperability and artifact compatibility should work. Component-based software development is discussed, for example, by Medvidovic et al. in [27]. Chapman et al. provide a service management framework for on-demand cloud provisioning [28].

Summary and Conclusions
We developed and described an AI compatibility concept consisting of three parts: an AI artifact compatibility model, scenarios for the usage of the AI marketplace that comply with and implement the principles of the model, and a system architecture that derives its structure from the compatibility model and enables the scenarios. We matured the concept through focus group workshops and interviews. The overall feedback on the concept was positive, and we were encouraged to continue our research. Some of the discussions, especially the interviews, went deep into the concepts, and many questions for further investigation were raised. For example, concerning license compatibility: given the large variety of artifacts, which standard licenses are suitable for which artifacts? Which rights of the license owner should be enforced, and in what way? Or, concerning tool interoperability: how exactly should the interfaces between the tools supporting the exchange of artifacts, or the components of the pipeline environment in the AI Pipeline System, be specified so that they can exchange data in a standard manner? What will be the limitations of these approaches? Establishing trust in the marketplace by addressing privacy and security concerns, raising the attention of the machine learning community, and attracting SMEs by offering attractive business models are additional fields where further research is required.
Data, if available as open data, can be licensed under one of the Creative Commons (CC) licenses. However, Creative Commons does not recommend the use of CC licenses for software, since they lack specific terms for the distribution of source code [10]. This means that models, especially those with open source code, should not be licensed under a CC license. The licensing of an artifact depends on what exactly it consists of. If, for example, a dataset mixes a data scientist's proprietary data with open data under the CC BY-SA 4.0 license, the data scientist cannot distribute it under his own license: the ShareAlike term obligates him to use the same license for the entire dataset. He either needs to use the CC license or remove the part of the data licensed under that particular CC license in order to apply his own license to his proprietary data.
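The ShareAlike rule described above can be sketched as a simple check over the parts of a combined dataset. The license tags and part names below are examples only; this is an illustration of the propagation rule, not legal advice.

```python
# Illustrative sketch of the ShareAlike propagation rule: if any part
# of a dataset carries a ShareAlike license, the combined dataset can
# only be distributed under that same license.

from dataclasses import dataclass

SHAREALIKE = {"CC-BY-SA-4.0"}  # example set of ShareAlike license tags

@dataclass
class DataPart:
    name: str
    license: str

def allowed_distribution_license(parts: list[DataPart], desired: str) -> bool:
    """True if the combined dataset may be distributed under `desired`."""
    for part in parts:
        if part.license in SHAREALIKE and part.license != desired:
            return False  # ShareAlike forces the whole set under that license
    return True

parts = [
    DataPart("proprietary-images", "Proprietary"),
    DataPart("open-images", "CC-BY-SA-4.0"),
]
print(allowed_distribution_license(parts, "Proprietary"))   # False
print(allowed_distribution_license(parts, "CC-BY-SA-4.0"))  # True
```

Removing the CC BY-SA part from `parts` would again permit distribution under the data scientist's own license, mirroring the two options discussed in the text.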

Fig. 2. Flowchart diagram for the registration of an artifact on the AI marketplace

Fig. 3. System architecture for the elements needed to realize the compatibility concept between the AI marketplace and a single pipeline of tools and artifacts