Delivering Behaviour Change Interventions: Development of a Mode of Delivery Ontology

Background: Investigating and improving the effects of behaviour change interventions requires detailed and consistent specification of all aspects of interventions. An important feature of interventions is the way in which they are delivered, i.e. their mode of delivery. This paper describes an ontology for specifying the mode of delivery of interventions, which forms part of the Behaviour Change Intervention Ontology, currently being developed in the Wellcome Trust-funded Human Behaviour-Change Project.
Methods: The Mode of Delivery Ontology was developed in an iterative process of annotating behaviour change intervention evaluation reports and consulting with expert stakeholders. It consisted of seven steps: 1) annotation of 110 intervention reports to develop a preliminary classification of modes of delivery; 2) open review from international experts (n=25); 3) a second round of annotation of 55 reports to test inter-rater reliability and identify limitations; 4) a second round of expert review feedback (n=16); 5) a final round of testing of the refined ontology by two annotators familiar and two annotators unfamiliar with the ontology; 6) specification of ontological relationships between entities; and 7) transformation into a machine-readable format using the Web Ontology Language (OWL) and publication online.
Results: The resulting ontology is a four-level hierarchical structure comprising 65 unique modes of delivery, organised under 15 upper-level classes: Informational, Environmental change, Somatic, Somatic alteration, Individual-based/Pair-based/Group-based, Uni-directional/Interactional, Synchronous/Asynchronous, Push/Pull, Gamification and Arts feature. The only relationship between entities is is_a. Inter-rater reliability of the Mode of Delivery Ontology for annotating intervention evaluation reports was α = 0.80 (very good) for annotators familiar with the ontology and α = 0.58 (acceptable) for those unfamiliar with it.
Conclusion: The ontology can be used for both annotating and writing behaviour change intervention evaluation reports in a consistent and coherent manner, thereby improving evidence comparison, synthesis, replication, and implementation of effective interventions.


Introduction
Patterns of human behaviour contribute significantly to the global disease burden, as well as to a wide range of environmental and social problems (e.g. Gakidou …).
Being able to specify intervention characteristics in a way that facilitates replication and evidence synthesis is an important step in building evidence efficiently and cumulatively. This requires conceptual frameworks that organise knowledge using clear, coherent, and shared terminology (Michie et al., 2017). Such frameworks promote communication and collaboration across disciplines and research groups, and can help to advance knowledge generation to inform intervention development, implementation, evaluation, and reporting (Craig et al., 2008; Hoffmann et al., 2014; Moher et al., 2001). Conceptual frameworks can also enhance researchers' ability to examine associations between specific intervention components and outcomes (Sheeran et al., 2017). This allows a more thorough understanding of interventions and of how they bring about their effects, which in turn can inform the development of more effective interventions.

Ontologies
BCTTv1 is an example of a taxonomy, a knowledge representation structure in which a controlled vocabulary of agreed-upon terms is arranged hierarchically. An ontology is a more expressive structure for organising knowledge (see glossary of italicised terms, Table 1). This collaboration between behavioural scientists, computer scientists and systems architects is building a database and platform for researchers, practitioners and policy-makers to address variants of the 'big question' of behaviour change: "What works, compared with what, how well, with what exposure, with what behaviours (for how long), for whom, in what settings and why?" Answering this involves extending previous work to classify all entities of behaviour change interventions and the relationships between them, i.e. a Behaviour Change Intervention Ontology (BCIO), specified by a controlled vocabulary that, at the upper level of the BCIO (Michie et al., 2020b), contains 42 entities. The Behaviour change intervention delivery entity of the ontology (i.e. the means by which BCI content is provided) comprises (a) BCI Source (i.e. a role played by a person, population or organisation that provides a behaviour change intervention), (b) BCI Schedule of delivery (an attribute of a behaviour change intervention that involves its temporal organisation), (c) BCI Style of delivery (an attribute of a BCI delivery that encompasses the characteristics of how a behaviour change intervention is communicated), and (d) BCI Mode of delivery (an attribute of a BCI delivery that is the physical or informational medium through which a behaviour change intervention is provided).

Amendments from Version 1
This version of the manuscript includes the changes made in response to the two reviewers. It provides more description of the peer-review process and of how the Mode of Delivery Ontology can be used for different purposes. Two minor corrections were made to the definitions of two entities in the Ontology: video game mode of delivery and somatic alteration mode of delivery.
Any further responses from the reviewers can be found at the end of the article.

Annotation guidance manual
Written guidance on how to identify and tag pieces of text from intervention evaluation reports with specific codes relating to entities in the ontology.

Basic Formal Ontology (BFO)
An upper-level ontology consisting of continuants and occurrents, developed to support integration, especially of data obtained through scientific research.

Entity
Anything that exists, whether a continuant or an occurrent, as defined in the Basic Formal Ontology.

EPPI-Reviewer
A web-based software program for managing and analysing data in all types of systematic review (meta-analysis, framework synthesis, thematic synthesis, etc.). It manages references, stores PDF files and facilitates qualitative and quantitative analyses such as meta-analysis and thematic synthesis. It also has a facility for annotating published papers.

Ontology
A standardised framework providing a set of terms that can be used for the consistent annotation (or "tagging") of data and information across disciplinary and research community boundaries.

Parent class
A class within an ontology that is hierarchically related to one or more child (subsumed) classes such that all members of the child class are also members of the parent class and all properties of the parent class are also properties of the child class.

Reconciliation
The process of discussing differences between the annotations of two paired annotators on the same papers. Differences are discussed before a final reconciled version of coding for each paper is produced.

Aim
The aim of the MoD Ontology is to provide a clear, usable and reliable classification system for specifying the MoDs of behaviour change interventions, including single BCTs. An ontology with clearly and unambiguously defined terms enables precise reporting, which in turn promotes evidence synthesis, replication and analyses of associations between MoDs, other intervention characteristics and intervention outcomes.

Methods
The ontology was developed in seven iterative steps (detailed below), involving reviewing existing classification systems, annotation of behaviour change intervention reports (including testing of inter-rater reliability) and feedback from international expert stakeholders (outlined in Table 2).
Step 1: Development of the preliminary ontology and piloting
Initial descriptions of MoD entities were extracted from 20 published behaviour change intervention evaluation reports, randomly selected using a random number generator from a larger database of reports annotated for behaviour change techniques and mechanisms of action (Michie et al., 2018) and covering a range of health behaviours. Next, two researchers independently piloted the preliminary MoD ontology on a further set of intervention reports, drawn from the same database using the same selection method. The research team developed guidance on how to annotate papers for MoD, giving clear instructions on how to code each entity, including definitions and examples for each. Reports were annotated in batches of 10 until inter-rater reliability reached a satisfactory and stable level. Inter-rater reliability, i.e. the extent to which researchers captured the same information from a report, was measured in two ways. The first was the percentage agreement on instances where both researchers had annotated an MoD. The second was the proportion of times the annotators agreed on a code when both had captured the same information from a report; this was calculated at every level of the hierarchy using Cohen's Kappa (Cohen, 1960) in Microsoft Excel 365. Kappa values >.61 were deemed 'substantial' and values >.81 'strong' (Landis & Koch, 1977). The preliminary ontology was revised and updated iteratively throughout the annotation process. Where the two annotators' codes differed, the discrepancies were discussed, and the ontology was amended if both annotators judged that the change would improve clarity. Where they still disagreed, a senior member of the research team was consulted.
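The two reliability measures described above can be illustrated with a short Python sketch. It is not the authors' Excel workbook: it assumes each annotator's output has been reduced to a set of identifiers for the text segments they coded as an MoD, and it reads 'percentage agreement' as the share of segments coded by either annotator that both coded (an assumption). Cohen's kappa is then computed on the codes of the segments both annotators captured. The function names, segment IDs and codes are hypothetical.

```python
from collections import Counter


def identification_agreement(segments_a, segments_b):
    """Share of segments coded as an MoD by either annotator that both annotators coded.

    The intersection-over-union reading of 'percentage agreement' is an assumption.
    """
    union = set(segments_a) | set(segments_b)
    if not union:
        return 1.0  # neither annotator coded anything, so there is nothing to disagree on
    return len(set(segments_a) & set(segments_b)) / len(union)


def cohens_kappa(codes_a, codes_b):
    """Cohen's (1960) kappa for two annotators' nominal codes on the same segments."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("expected two non-empty code lists of equal length")
    n = len(codes_a)
    # Observed agreement: proportion of segments given the same code by both annotators.
    p_observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement from each annotator's own marginal code frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    p_expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if p_expected == 1.0:
        # Both annotators used one identical code throughout: kappa is undefined,
        # and returning 1.0 is simply the convention chosen for this sketch.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Hypothetical segment IDs and MoD codes, for illustration only.
    print(identification_agreement({"s1", "s2", "s3", "s4"}, {"s2", "s3", "s4", "s5"}))  # 0.6

    a_codes = ["face to face", "printed material", "printed material", "call", "face to face"]
    b_codes = ["face to face", "printed material", "call", "call", "face to face"]
    print(round(cohens_kappa(a_codes, b_codes), 3))  # 0.706
```

Separating the two measures mirrors the two questions in the text: whether annotators noticed the same MoD information at all, and, given that they did, whether they chose the same code.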
Step 2: Stakeholder review (Round 1)
Nine international behavioural scientists with experience in behaviour change interventions across a range of behavioural domains were invited to provide feedback on the structure, content and terminology of the preliminary MoD Ontology. Following small adjustments based on this feedback, the MoD Ontology was published online, and a wider international research community was invited through mailing lists to submit feedback using an open Qualtrics form presenting the preliminary MoD structure and the entity labels and definitions (see https://osf.io/eyn3b/ (West et al., 2020)). Twenty-five behavioural scientists responded, indicating whether 1) any entities were missing, 2) the structure was coherent, 3) changes were needed to the terminology of the labels and definitions, and 4) they had additional suggestions for improvement.
Step 3: Inter-rater reliability testing (Round 2)
The revised version was used to annotate MoD entities in a set of 55 published reports, randomly selected using a random number generator from the database mentioned in Step 1 (Michie et al., 2018). These papers covered the behavioural domains of physical activity, diet and smoking. Two researchers annotated the reports independently, in batches of five papers. After every batch, annotations were compared and discrepancies discussed. Inter-rater reliability was calculated using the same procedure as in Step 1. Where there were discrepancies, consensus was reached through discussion.
Step 4: Stakeholder review (Round 2)
Experts who provided feedback in Step 2 were invited to submit feedback on the revised ontology. They were sent an email asking them to review the structure, labels and definitions of each entity, to indicate whether the structure was coherent and whether anything was missing, and to suggest improved terminology. During this step, an ontology expert (JH) was consulted regarding the structure and definitions.
Step 5: Inter-rater reliability testing (Round 3)
There was a reconciliation process after the first batch of 10 reports, followed by any necessary amendments to the annotation manual. These amendments mainly involved adding examples (e.g. illustrating when to code, or not to code, certain pieces of information as MoD).
To examine the usability of the MoD Ontology for researchers and intervention developers with no prior knowledge of it, we conducted a final round of inter-rater reliability assessment, asking two researchers unfamiliar with the ontology and without specific expertise in modes of delivery to annotate a random sample of randomised controlled trials from a database of papers annotated for BCTs, with no restrictions on the outcome behaviour. Inter-rater reliability was assessed using Krippendorff's Alpha (Hayes & Krippendorff, 2007), calculated in Python 3.6 (code available on GitHub (Finnerty & Moore, 2020)).
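The authors' own Python code is on GitHub (Finnerty & Moore, 2020); the sketch below is not that code but an independent, minimal implementation of Krippendorff's alpha for nominal codes, using the coincidence-matrix formulation described by Hayes and Krippendorff (2007). It assumes each unit (for example, an annotated text segment) is given as the list of codes the annotators assigned to it, with missing codes simply omitted; units coded by fewer than two annotators carry no pairable information and are skipped. The function name and example codes are hypothetical.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    units: iterable of lists, each holding the codes assigned to one unit
    (one entry per annotator who coded it; missing codes are omitted).
    """
    coincidences = Counter()  # o_ck: weighted count of ordered code pairs (c, k)
    for values in units:
        m = len(values)
        if m < 2:
            continue  # a unit coded by a single annotator cannot be paired
        for c, k in permutations(values, 2):
            coincidences[(c, k)] += 1 / (m - 1)

    totals = Counter()  # n_c: marginal total for each code
    for (c, _k), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())
    if n < 2:
        raise ValueError("need at least two pairable codes")

    disagreement = sum(w for (c, k), w in coincidences.items() if c != k)
    # Sum of n_c * n_k over all c != k, i.e. n^2 minus the sum of squared marginals.
    expected_pairs = n * n - sum(t * t for t in totals.values())
    if expected_pairs == 0:
        return float("nan")  # only one code was ever used: alpha is undefined
    # alpha = 1 - D_o / D_e, with D_o = disagreement / n and D_e = expected_pairs / (n * (n - 1))
    return 1 - (n - 1) * disagreement / expected_pairs


if __name__ == "__main__":
    # Hypothetical units: two annotators, with one unit coded by only one of them.
    units = [["push", "push"], ["pull", "pull"], ["push", "pull"], ["pull", "pull"], ["push"]]
    print(round(krippendorff_alpha_nominal(units), 3))  # 0.533
```

Unlike Cohen's kappa, this formulation pools both annotators' codes when estimating chance agreement and tolerates missing codes, which is one reason alpha suits annotation data where coders do not always code the same units.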
Step 6: Specifying relationships within the MoD Ontology
The research team specified relationships between ontology entities to formally capture the types of knowledge present in the ontology. The relationships were specified following best practices from Basic Formal Ontology (BFO) described in Arp et al. (2015) and the Relation Ontology (Smith et al., 2005). Relationships can be generic and shared across multiple ontologies (e.g. the "is a" relationship between classes, where one class is a subclass of another, or the "part of" relationship, which captures the relationship between wholes and their parts), or they can be domain-specific, introduced when needed to formally capture relationships unique to a given domain.
Step 7: Making the MoD Ontology machine-readable and available online
The MoD Ontology was initially developed as a

Results
Step 1: Development of the preliminary ontology and piloting
The data extracted from the behaviour change intervention reports led to the identification of 160 unique entities, represented in a four-level hierarchical structure, as well as two 'cross-cutting' entities (a description of the preliminary version is available as Extended data at https://osf.io/gu5ke/ (West et al., 2020)). One hundred reports were annotated, with adjustments made to the ontology as a result of the first 70; the ontology was stable for the final 30 reports. Average agreement between annotators for each batch of 10 reports ranged from 72% to 95%. Inter-rater reliability was calculated separately for each level of the hierarchy and was considered 'good' for all levels (% agreement 86.6 to 97.8; Kappa 0.68 to 0.97). Reliability was also calculated for each of the cross-cutting entities (Kappa = .55 and .75). Further details on the inter-rater reliability and changes made to the MoD Ontology in this step can be found as Extended data at: https://osf.io/r3wn2/ (West et al., 2020).
Step 2: Stakeholder review (Round 1)
Feedback on the MoD ontology through the open review feedback form was received from 25 experts, of whom 18 were from universities, 5 from commercial sector organisations, 1 from a public sector organisation and 1 from the third sector. Twelve experts were from the United Kingdom, 2 from the United States of America, 3 from Ireland, 1 from Canada, 1 from the Netherlands and 1 from New Zealand; we have no information about the country of the remaining 5 experts. These data were collated, synthesised and discussed within the research team. This led to further amendments to the structure, content and terminology (full details of the feedback and the corresponding changes made to the MoD Ontology are available as Extended data at https://osf.io/95n3a/ (West et al., 2020)).
Step 3: Inter-rater reliability testing (Round 2)
For the 55 papers annotated in this round, agreement on whether a particular entity was considered an MoD was 61%, and agreement on the specific MoD code assigned was 87.9% (Kappa = .857) (inter-rater reliability results are available as Extended data at https://osf.io/sw2jv/ (West et al., 2020)).
Step 4: Stakeholder review (Round 2)
Feedback was received from 16 of the 25 experts invited. Based on this feedback, the following changes were made: (1) the entities "other" and "unclear" were removed, as all entities represented in an ontology need to be fully specified; and (2) greater clarity was provided on how the cross-cutting entities relate to the other upper-level classes (see https://osf.io/3zhbc/ (West et al., 2020) for more details).
For the revised version, definitions were developed using pre-specified guidance, following the standard format "A is a B that C" (or "A is a B that involves or relates to C in some way"), where A is the class being defined, B is its parent class and C describes a set of properties of A that distinguish it from other members of B (Michie et al., 2019b).
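Because every definition is meant to follow the "A is a B that C" pattern, part of this guidance can be checked automatically. The Python sketch below is a hypothetical helper, not part of the published MoD tooling. It assumes entities are held in a dictionary mapping each label to its parent label and definition, checks that a definition opens with its parent's label followed by "that" or "in which", and groups labels that share an identical definition, the kind of copy-paste error the reviewers picked up in two definitions. The example rows are copied from Table 3 as reproduced in this text.

```python
import re
from collections import defaultdict


def follows_genus_differentia(parent_label, definition):
    """True if the definition opens with its parent label followed by 'that' or 'in which'."""
    pattern = rf"{re.escape(parent_label)} (that|in which) \S"
    return re.match(pattern, definition, flags=re.IGNORECASE) is not None


def find_duplicate_definitions(entities):
    """Group labels whose definitions are identical (ignoring case and surrounding spaces)."""
    by_definition = defaultdict(list)
    for label, (_parent, definition) in entities.items():
        by_definition[definition.strip().lower()].append(label)
    return sorted(sorted(labels) for labels in by_definition.values() if len(labels) > 1)


# Rows as they appear in Table 3 of this text: label -> (parent label, definition).
ENTITIES = {
    "Letter mode of delivery": (
        "Printed material mode of delivery",
        "Printed material mode of delivery that involves a letter or postcard that can be "
        "sent through the post or handed directly to the recipient.",
    ),
    "Push mode of delivery": (
        "Mode of delivery",
        "Mode of delivery that is not dependent on actions on the part of the intervention recipient.",
    ),
    "Electronic billboard mode of delivery": (
        "Electronic mode of delivery",
        "Electronic mode of delivery that involves presentation of information by an "
        "electronic screen positioned in a public location.",
    ),
    "Wearable electronic device mode of delivery": (
        "Electronic mode of delivery",
        "Electronic mode of delivery that involves presentation of information by an "
        "electronic screen positioned in a public location.",
    ),
}

if __name__ == "__main__":
    for label, (parent, definition) in ENTITIES.items():
        print(label, follows_genus_differentia(parent, definition))
    print(find_duplicate_definitions(ENTITIES))
```

Run on these rows, every definition passes the format check, and the duplicate check groups the electronic billboard and wearable electronic device entries, whose definitions are word-for-word identical as reproduced here.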
Step 5: Inter-rater reliability testing (Round 3)
For the annotations conducted by researchers familiar with the MoD ontology, very good agreement (α = 0.80) was achieved after annotating 50 reports (25 smoking and 25 physical activity). For the annotations conducted by researchers unfamiliar with the ontology, acceptable agreement (α = 0.58) was achieved after annotating 96 papers targeting various behaviours (26 physical activity; 22 diet; 13 alcohol; 11 treatment adherence; nine sexual behaviours; seven multiple health behaviours; two each for prescription, smoking and screening; and one each for organ donation and oral health) (Hayes & Krippendorff, 2007) (inter-rater reliability results are available as Extended data at https://osf.io/efp4x/ (West et al., 2020)).
Step 6: Specifying relationships within the MoD Ontology
Currently, the only relationship used in the ontology represents its hierarchical structure, i.e. "subclass of" (is_a) relationships (e.g. face to face MoD "is_a" human interactional MoD). Formal representations of knowledge using explicit logical relationships allow computational tools to perform additional checks and inferences, improving the consistency of reporting for complex interventions.
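As a small illustration of the inferences that is_a links allow, the Python sketch below derives every superclass of a class from its direct parent links (is_a is transitive) and refuses cyclic hierarchies. The parent links are read off the Table 3 definitions (e.g. "Printed material mode of delivery that involves a letter..."); the link from informational mode of delivery to a root class "mode of delivery" is assumed from its upper-level status, single inheritance is assumed for simplicity, and the function names are hypothetical.

```python
# Direct is_a (subclass of) links, child -> parent, read off the definitions in Table 3.
# Single inheritance and the root "mode of delivery" are assumptions of this sketch.
PARENT_OF = {
    "letter mode of delivery": "printed material mode of delivery",
    "printed material mode of delivery": "informational mode of delivery",
    "television mode of delivery": "electronic mode of delivery",
    "electronic mode of delivery": "informational mode of delivery",
    "informational mode of delivery": "mode of delivery",
}


def ancestors(cls, parent_of):
    """Every superclass of cls implied by the transitivity of is_a, nearest first."""
    chain = []
    seen = {cls}
    current = parent_of.get(cls)
    while current is not None:
        if current in seen:
            raise ValueError(f"is_a cycle detected at {current!r}")
        seen.add(current)
        chain.append(current)
        current = parent_of.get(current)
    return chain


def is_a(cls, candidate, parent_of):
    """True if cls equals candidate or is (directly or indirectly) a subclass of it."""
    return cls == candidate or candidate in ancestors(cls, parent_of)


if __name__ == "__main__":
    print(ancestors("letter mode of delivery", PARENT_OF))
    print(is_a("television mode of delivery", "informational mode of delivery", PARENT_OF))  # True
    print(is_a("letter mode of delivery", "electronic mode of delivery", PARENT_OF))  # False
```

In practice an OWL reasoner performs this kind of inference over the published file; the sketch only shows why a single, explicitly stated relationship is already enough to answer questions such as "is every letter-based intervention an informational one?".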
Step 7: Making the MoD Ontology machine-readable and available online
A downloadable version of the final MoD Ontology can be found on GitHub (Finnerty & Moore, 2020). The hierarchical structure, labels, uniform resource identifiers (URIs) and definitions for all entities are described in Table 3. The ontology is accompanied by an annotation manual that provides guidance on how to annotate reports of behaviour change interventions for these entities (available as Extended data at https://osf.io/4j2xh/ (West et al., 2020)).
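The published OWL file on GitHub (Finnerty & Moore, 2020) is the authoritative machine-readable version. Purely to illustrate what "machine-readable" means here, the hypothetical Python helper below renders one Table 3 row as an OWL class in Turtle syntax, with its label, its is_a parent (rdfs:subClassOf) and its definition attached through the OBO "definition" annotation property (IAO_0000115). The base IRI is a placeholder, not the BCIO namespace, and the output has not been checked against the released file.

```python
# Placeholder namespace for illustration only; the released BCIO file defines the real IRIs.
BASE_IRI = "http://example.org/bcio/"

PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix obo: <http://purl.obolibrary.org/obo/> .\n"
)


def iri(entity_id):
    """Turn an ID such as 'BCIO:011011' into a full IRI under the placeholder namespace."""
    return f"<{BASE_IRI}{entity_id.replace(':', '_')}>"


def literal(text):
    """Quote a string as an English Turtle literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@en'


def class_to_turtle(entity_id, label, definition, parent_id=None):
    """Render one ontology class as Turtle: type, optional is_a parent, label and definition."""
    lines = [f"{iri(entity_id)} a owl:Class ;"]
    if parent_id is not None:
        lines.append(f"    rdfs:subClassOf {iri(parent_id)} ;")
    lines.append(f"    rdfs:label {literal(label)} ;")
    # IAO_0000115 is the OBO annotation property conventionally used for textual definitions.
    lines.append(f"    obo:IAO_0000115 {literal(definition)} .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(PREFIXES)
    print(class_to_turtle(
        "BCIO:011011",
        "Television mode of delivery",
        "Electronic mode of delivery that involves presentation of information "
        "that is broadcast and displayed by television.",
        parent_id="BCIO:011010",
    ))
```

Keeping the identifier, label, parent and definition as separate statements is what lets other tools look up an entity by its ID and follow its is_a links without parsing the prose definition.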

Discussion
Given the lack of classification systems providing comprehensive coverage of how behaviour change interventions and techniques are delivered, we developed the first ontology of modes of delivery (MoD). This ontology consists of 65 entities organised under 15 upper-level classes. Inter-rater reliability was found to be 0. Further, by linking with other HBCP ontologies characterising behaviour change interventions, it will be possible to go a step further and identify which MoD(s) are more appropriate for different behaviours, populations and contexts, whether they need to be tailored, and their potential for reach and engagement.

Strengths and limitations
These ontologies provide a framework for applying machine learning and reasoning algorithms to synthesise and interpret evidence, as well as to predict outcomes. This allows real-time, up-to-date evidence to be interrogated by users such as policy-makers, planners and intervention designers to answer variants of the "big question": "What works, compared with what, how well, with what exposure, with what behaviours (for how long), for whom, in what settings and why?", across a wide range of contexts. This body of work has the potential for far-reaching use by, and implications for, policy-makers, practitioners and researchers, for example by informing evidence-based guidelines and identifying knowledge gaps.
Further, assigning an ID to each entity in the ontology provides a machine-readable identifier for integration into future systems and also allows interoperability with existing ontologies.
Several limitations should be noted about the development process and the resulting MoD Ontology. Given the rapid growth in new technologies and the fast pace of behavioural science research, the MoD Ontology will need updating and refining as existing methods develop and new methods emerge. However, this is common to all ontologies and is indeed considered 'best practice' in ontology development (Arp et al., 2015).
Informational mode of delivery that involves use of printed material.
Can include paper, acetate, text, diagrams and photographic images.
Letter mode of delivery BCIO:011006 Printed material mode of delivery that involves a letter or postcard that can be sent through the post or handed directly to the recipient.
Public notice mode of delivery BCIO:011007 Printed material mode of delivery that involves display of a poster, sign or notice in a public location.
Printed publication mode of delivery BCIO:011008 Printed material mode of delivery that involves use of a printed publication.
Labelling mode of delivery BCIO:011009 Printed material mode of delivery that involves information printed on a product or its packaging, or a label attached to, or included with, a product or its packaging, and aims to convey information about that product.
Electronic mode of delivery BCIO:011010 Informational mode of delivery that involves electronic technology in the presentation of information to an intervention recipient.
Television mode of delivery BCIO:011011 Electronic mode of delivery that involves presentation of information that is broadcast and displayed by television.
Includes internet and satellite television.
Mobile digital device mode of delivery BCIO:011012 Electronic mode of delivery that involves presentation of information by a handheld mobile digital device that can store, retrieve and process data.
Computer mode of delivery BCIO:011013 Electronic mode of delivery that involves presentation of information by a desktop or laptop computer.
Electronic billboard mode of delivery BCIO:011014 Electronic mode of delivery that involves presentation of information by an electronic screen positioned in a public location.
Wearable electronic device mode of delivery BCIO:011015 Electronic mode of delivery that involves presentation of information by an electronic screen positioned in a public location.
Includes a watch, clip-on device, spectacles, in-ear device, vibrating device.
Electronic environmental object mode of delivery BCIO:011016 Electronic mode of delivery that involves an electronic device positioned in the environment of the intervention recipient that can gather information and respond to commands.
Includes robots, and 'internet of things'.
3-D projection mode of delivery BCIO:011017 Electronic mode of delivery that involves presentation of a 3-D image.
Includes hologram but does not include virtual reality headsets.
Virtual reality mode of delivery BCIO:011018 Electronic mode of delivery that involves use of virtual reality through a virtual reality headset and optionally body movement sensors.
Playable electronic storage mode of delivery BCIO:011019 Electronic mode of delivery that involves presentation of information stored on an object that is inserted into a playing device.
Radio broadcast mode of delivery BCIO:011020 Electronic mode of delivery that involves presentation of audio information that is broadcast and received by a radio receiver.
Call mode of delivery BCIO:011021 Electronic mode of delivery that involves a communication process in which a signal is sent by a caller to a recipient to alert them of the communication intent, giving the recipient the opportunity to engage with the communication.
Includes automated calls and audio messaging.
Mode of delivery that involves devices or substances that alter bodily processes or structure.
Ingestion mode of delivery BCIO:011035 Somatic mode of delivery that involves ingestion of a chemical into the body.
Mode of delivery that involves modifying the structure of the body of the recipient of the intervention.
Includes surgery.
Individual-based mode of delivery* BCIO:011055 Mode of delivery that involves one recipient in the location where the intervention is delivered.
Pair-based mode of delivery* BCIO:011056 Mode of delivery that involves two recipients in the location where the intervention is delivered who have an interpersonal relationship.
Group-based mode of delivery* BCIO:011057 Mode of delivery that involves three or more people in the location where the intervention is delivered.
Uni-directional mode of delivery** BCIO:011058 Mode of delivery in which the only causal influence is from the intervention source to the recipient.
Interactional mode of delivery** BCIO:011059 Mode of delivery in which there is causal influence from the intervention source to the recipient and from the recipient to the source.
Synchronous mode of delivery*** BCIO:011060 Mode of delivery that involves delivery and receipt of the intervention or its components occurring at the same time or very close in time.
Asynchronous mode of delivery*** BCIO:011061 Mode of delivery that involves receipt of the intervention or its components taking place a significant period of time after delivery.
Push mode of delivery**** BCIO:011062 Mode of delivery that is not dependent on actions on the part of the intervention recipient.
Pull mode of delivery**** BCIO:011063 Mode of delivery that requires some action on the part of the recipient.
Gamification mode of delivery BCIO:011064 Mode of delivery that involves application of typical elements of game playing to other areas of activity, typically as an online marketing technique to encourage engagement with a product or service.
Includes point scoring, competition with others, and rules of play.
Arts feature mode of delivery BCIO:011065 Mode of delivery that involves application of creativity on the part of the intervention recipient.
Includes art therapy, music therapy, dance and acting.
Note. Entity IDs correspond to the Behaviour Change Intervention Ontology (BCIO). *Only one of individual-based, group-based or pair-based mode of delivery will apply; **only one of uni-directional or interactional mode of delivery will apply; ***only one of synchronous or asynchronous mode of delivery will apply; ****only one of push or pull mode of delivery will apply.
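The constraints in the note above can be checked automatically when annotations are stored. A minimal Python sketch, assuming the MoD annotations for a single intervention (or component) are held as a set of lowercased entity labels; the function name is hypothetical, and the constraint is read as "at most one per starred set", since a report may not state, for example, whether delivery was synchronous or asynchronous.

```python
# Mutually exclusive sets of labels, taken from the note to Table 3.
EXCLUSIVE_GROUPS = [
    {"individual-based mode of delivery", "pair-based mode of delivery", "group-based mode of delivery"},
    {"uni-directional mode of delivery", "interactional mode of delivery"},
    {"synchronous mode of delivery", "asynchronous mode of delivery"},
    {"push mode of delivery", "pull mode of delivery"},
]


def exclusivity_violations(annotated, groups=EXCLUSIVE_GROUPS):
    """Return, for each exclusive group, the annotated labels when more than one was coded."""
    violations = []
    for group in groups:
        chosen = sorted(group & set(annotated))
        if len(chosen) > 1:
            violations.append(chosen)
    return violations


if __name__ == "__main__":
    ok = {"letter mode of delivery", "individual-based mode of delivery", "push mode of delivery"}
    bad = {"synchronous mode of delivery", "asynchronous mode of delivery", "pull mode of delivery"}
    print(exclusivity_violations(ok))  # []
    print(exclusivity_violations(bad))  # [['asynchronous mode of delivery', 'synchronous mode of delivery']]
```

A check like this could run during reconciliation, flagging annotations that break the note's rules before the two annotators compare codes.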
Secondly, the intervention reports included in the annotation process came from two larger projects, the Theory and Techniques Project (Michie et al., 2018) and the Human Behaviour-Change Project (Michie et al., 2017). The intervention reports annotated during ontology development mainly addressed two health-related behaviours, smoking cessation and physical activity; it is therefore possible that other literature, within and outside the health domain, describes modes of delivery not captured in our set of papers or by our group of experts. However, external inter-rater reliability was tested across diverse behaviours and found to be acceptable. Future application of the ontologies to a wider collection of non-health-related behaviours and contexts is likely to extend and improve the ontology. The inter-rater reliability of the annotations conducted by coders unfamiliar with the ontology was lower than that found for other ontologies of the BCIO, such as the Intervention Setting Ontology (Norris et al., 2020), a result that may reflect the greater complexity of this ontology. Nonetheless, the coding guidelines were refined throughout the process, and the level of reliability increased considerably between the first and second sets of 50 papers. We recommend that anyone interested in using the MoD ontology first familiarise themselves with the MoD entities (labels, definitions and examples) and their relationships, read the coding manual, and conduct some trial annotations and an assessment of reliability.

Conclusions
The MoD Ontology provides a foundation on which future research can build, and its development is intended to be an ongoing and collaborative process. By providing greater clarity about how an intervention and its components are delivered, the ontology can help researchers add to knowledge about how MoDs influence intervention effectiveness, both directly and in interaction with other intervention-related entities. This will inform the selection of appropriate MoDs for interventions.

Ethics
Ethical approval was granted by University College London's ethics committee (CEHP/2016/555). Participant consent was obtained on the first page of the online Qualtrics survey.

Data availability
Underlying data
The BCIO is available from: https://github.com/HumanBehaviour-ChangeProject/ontologies.
The authors describe their approach and results in building an ontology of mode of delivery of interventions in this paper. The paper is well-written, clearly structured and methodologically sound. I have listed a few suggestions and minor comments below:
We appreciate the reviewer's positive feedback. We have addressed all comments.
Thank you for this suggestion. We have added information related to this to the paragraph presented in the previous comment.
In the method part it was sometimes difficult to see the link between the text and Table 2. The text, for example, mentions 20 pilot reports in step 1, and then another 'set of interventions'. Table 2 then shows 120 BCI reports were extracted. Why 120? How did the authors decide this was an appropriate number? Same for step 3 (55 reports).
Thank you for noticing this. In step 1 there was an initial extraction of 20 reports for the first skeleton of the ontology, and then 100 more papers were annotated to improve the coverage and specificity of the ontology and to test its reliability. We have amended the table as follows: "Data extraction from 120 BCI reports: 20 reports for initial draft + 100 for improvements and inter-rater reliability calculations". The number of papers was not pre-defined; the coders kept annotating until an adequate Kappa was reached. The same was true for the number of papers in step 3.
Could more information be provided on the database? Are these reports that maybe already follow a certain protocol of annotation, could this create some bias?
Thank you for pointing this out. The 55 reports came from a collection of articles assembled for a previous project in our research group (Michie et al., 2018). These are articles in which authors described links between behaviour change techniques and intervention mechanisms of action. Mode of delivery might be described in more detail in these papers, where other aspects of interventions are also specified in detail. This greater level of nuance is likely to make it harder to create ontology categories that fit, and so to make achieving good inter-rater reliability more difficult.

The authors mention 'mailing lists' as a way to recruit the experts. Could they provide more information on the mailing lists, or characteristics of experts?
We thank the reviewer for this important point. Invitations to potential participants were sent out via third-party mailing lists (e.g. conference mailing lists). We have some data on the characteristics of the experts who participated, such as the type of organisations reviewers were from and their countries. We also have a list of the specific institutions they were from. We have added this information to the results section, step 2 as follows: "Feedback on the MoD ontology through the open review feedback form was received by 25 experts, of which 18 were from universities, 5 were from commercial sector organisations, 1 from public sector organisations and 1 from third sector. Twelve experts were from the United Kingdom, 2 from the United States of America, 3 from Ireland, 1 from Canada, 1 from the Netherlands, 1 from New Zealand, and we have no information about the country for the remaining 5 experts."
Could the authors elaborate more on the potential reasons for discrepancies in interrater reliability 'whether a particular entity was considered an MoD was 61%; and agreement on the specific MoD code assigned was 87.9%' in round 2?
The first element corresponds to recognizing that part of the text contains a description of a mode of delivery. One reason for this lower agreement may be that many papers describe mode of delivery poorly, and the coding manual states that MoD should be coded only when it is clearly stated in the paper (similarly to BCTTv1). When both coders identified a segment of the text as stating a MoD, there was higher agreement about which specific MoD was stated, which demonstrates the utility of the MoD ontology in distinguishing between different MoDs and defining them clearly.
Step 5: could it be that the lower agreement between raters was related not to their being less familiar with the ontology, but to the wider variety of target behaviors in this selection of reports? Taxonomies are also mostly applied to diet, physical activity and addictive behaviours; could it be that the ontology does not fit as well with screening, infectious diseases, etc.?
This is an interesting point. The MoD ontology was designed to be applicable across behaviours, and MoD reporting, or the lack of it, seems to be consistent across behaviours. We hope that future research using this ontology will provide the necessary data to explore this issue further, i.e. whether lower agreement is related to familiarity with the ontology and/or to its stability across behaviours.
Table 3: the definition of video game delivery seems to be copy-pasted from the level above?
Thank you for noticing this. We have now changed the definition to "Electronic mode of delivery that involves the intervention recipient playing a computer game."
Table 3: Somatic alteration mode of delivery, also a typo (copy-pasted from above)?
Again, thank you for spotting this typo. We have changed it to "Mode of delivery that involves modifying the structure of the body of the recipient of the intervention".
addition to the literature. Overall, the paper is very well written and the studies are sound. My only issue is the extent to which the introduction includes references to other taxonomies/ontologies beyond the three that it does mention, and therefore how the paper is situated in the literature in both the introduction and discussion sections.
Abstract
I didn't understand the sentence "Relationships between entities consist of is_a." Should the conclusion in the abstract recommend that people familiarise themselves with the ontology to ensure that it is used reliably, given that the reliability was only 0.58 when they were unfamiliar with it?
Introduction
You introduce three classification systems but then move straight into the BCTTv1. It is not clear why you focus on that one, and so this paragraph seems to come from nowhere. Could you make the reason you are moving from the three systems to the BCTTv1 more obvious? Also, you start a new paragraph after introducing the three systems, but that is a very short paragraph, so I would suggest merging the two into one paragraph. I also expected the introduction to make more reference to previous taxonomies and to problematise these to establish why this ontology was so important. You don't, for example, mention the EPOC taxonomy, and I was not sure why.

Methods and results
Step 1. This step specifies health behaviours. Previously, you have not specified that this relates to health behaviours specifically; in fact, you introduce this as including environmental and social problems, and some of the earlier work is related to health worker behaviours. It would be good to have some clarity about whether this is all human behaviour (which I think it is), to what extent the methods relied on interventions related to health behaviours, and whether this is a limitation of the methods. I know you do state this as a limitation, but it would be good to see this up front in the methods, with a rationale for why the study was conducted in this way.
Step 2. Can you report the response rate (either in the methods or results) and where the raters were from? I'm particularly interested in whether all were from a particular part of the world and what institutions were included. Much of the work rests on these individuals being experts, so I think it would be appropriate to include some further information in the text that summarises their credentials and any potential biases they might introduce into the initial ontology.

Discussion
As per the introduction, it would be useful to see how this ontology fits with previous attempts at classifying modes of delivery. If there are none (if the EPOC taxonomy is not an example of this), then it would be good to state that as part of the reason for developing this anew.

Is the study design appropriate and is the work technically sound? Yes
Are sufficient details of methods and analysis provided to allow replication by others?
should first familiarise themselves with it. We have added a sentence to the discussion section of the manuscript as follows: "It is our recommendation that anyone interested in using the MoD ontology should first familiarise themselves with the MoD entities (labels, definitions and examples) and their relationships, read the coding manual, and conduct some trial annotation and assessment of reliability."

Introduction
You introduce three classification systems but then move straight into the BCTTv1. It is not clear why you focus on that one, and so this paragraph seems to come from nowhere. Could you make the reason you are moving from the three systems to the BCTTv1 more obvious? Also, you start a new paragraph after introducing the three systems, but that is a very short paragraph, so I would suggest merging the two into one paragraph. I also expected the introduction to make more reference to previous taxonomies and to problematise these to establish why this ontology was so important. You don't, for example, mention the EPOC taxonomy, and I was not sure why.
Thank you for your comment. We have revised this section to present the BCTTv1 as an example of a taxonomy focusing on the content of interventions. In addition, we added information about the EPOC taxonomy in the "Delivery of Behaviour Change Interventions" section.

Methods and results
Step 1. This step specifies health behaviours. Previously, you have not specified that this relates to health behaviours specifically; in fact, you introduce this as including environmental and social problems, and some of the earlier work is related to health worker behaviours. It would be good to have some clarity about whether this is all human behaviour (which I think it is), to what extent the methods relied on interventions related to health behaviours, and whether this is a limitation of the methods. I know you do state this as a limitation, but it would be good to see this up front in the methods, with a rationale for why the study was conducted in this way.
Thank you for pointing this out. This is indeed intended as an ontology of modes of delivery for all domains of behaviour change interventions. The limitations section of the discussion addresses the limitations of having annotated mainly health-related behaviour papers within the ontology development stages, and we have now made this point clearer, as follows: "Secondly, the intervention reports included in the annotation process were from two larger projects, the Theory and Techniques Project (Michie et al., 2018) and the Human Behaviour-Change Project (Michie et al., 2017). The intervention reports annotated within the ontology development mainly addressed two health-related behaviours, smoking cessation and physical activity; there is always the possibility that other literature within and outside the health domain may indicate modes of delivery not captured in our set of papers or by our group of experts. However, external inter-rater reliability was tested across diverse behaviours and found to be acceptable. Future applications of the ontology to a wider collection of non-health-related behaviours and contexts are likely to extend and improve the ontology."
Step 2. Can you report the response rate (either in the methods or results) and where the raters were from? I'm particularly interested in whether all were from a particular part of the world and what institutions were included. Much of the work rests on these individuals being experts, so I think it would be appropriate to include some further information in the text that summarises their credentials and any potential biases they might introduce into the initial ontology.
We thank the reviewer for this important point. We have data on the type of organisations the reviewers were from (18 were from universities, 5 from commercial sector organisations, 1 from a public sector organisation and 1 from the third sector) and on their countries (12 experts were from the United Kingdom, 2 from the United States of America, 3 from Ireland, 1 from Canada, 1 from the Netherlands and 1 from New Zealand; for 5 of them we have no information about the country). We also have a list of the specific institutions they were from. We do not have response rate data, as the invitations to participate were sent out via third-party mailing lists (e.g. conference mailing lists), and so we do not know how many people were subscribed to each list. We have added the following information to the results, step 2: "Feedback on the MoD ontology through the open review feedback form was received from 25 experts, of whom 18 were from universities, 5 from commercial sector organisations, 1 from a public sector organisation and 1 from the third sector. Twelve experts were from the United Kingdom, 2 from the United States of America, 3 from Ireland, 1 from Canada, 1 from the Netherlands and 1 from New Zealand; we have no information about the country for the remaining 5 experts."

Discussion
As per the introduction, it would be useful to see how this ontology fits with previous attempts at classifying modes of delivery. If there are none (if the EPOC taxonomy is not an example of this), then it would be good to state that as part of the reason for developing this.