Framing superbugs: testing whether advocacy frames change attitudes, intention or behaviour using an online randomised control experiment [version 1; peer review: awaiting peer review]

Background: Antimicrobial resistance (AMR) presents a significant threat to global health, requiring multifaceted action by individuals and policymakers. Advocates must persuade others to act. Making communication about AMR more effective could plausibly increase support for action. The Wellcome Trust-funded ‘Reframing Resistance’ project used communications research to develop framing recommendations for the language practitioners use to describe AMR. The aim of this study was to explore how this language influenced attitudes and behaviours towards AMR.
Methods: This study was a randomised trial to evaluate the effects of different styles of AMR framing language upon attitudinal and behavioural measures. Participants (n=1,934) were recruited in October 2019 using an online tool called “Prolific” and randomly assigned to review one of five variations of AMR narratives: four experimental frames which incorporated different combinations of language recommended by the framing guidelines, or a control frame without these features, taken from UN AMR communications. Participants were then asked a series of attitudinal and behavioural questions in relation to the AMR narrative they reviewed. Attitudes were measured using five-point Likert-type scales and behaviours were measured using binary variables. Descriptive analysis was used to explore respondents’ characteristics and multivariable logistic regression models were used to establish independent associations between AMR frames and respondents’ attitudes and behaviours.
Results: Participants who reviewed narratives that followed framing language guidelines were more likely to donate money or sign a petition, and rated narratives as more usable and important than participants who reviewed the control framing.
Conclusions: While larger trials with more diverse participants are needed to confirm generalisability, these results suggest that applying [...]
Wellcome Open Research 2021, 6:131. Last updated: 27 MAY 2021


Introduction
Antimicrobial resistance (AMR) occurs when the microorganisms that cause infections and illnesses evolve and become resistant to the antimicrobial drugs designed to kill them. Key drivers of AMR are a vast increase in the selective pressure upon microbial infections caused by the rapidly growing use of antibiotics in healthcare and agriculture, and market incentive issues which mean that no new classes of antibiotics have been developed in thirty years (World Health Organisation, 2019). The O'Neill review, led by a former Chief Economist of Goldman Sachs, estimated that AMR causes around 700,000 deaths per annum and that, if no action is taken, it could cause the deaths of 10 million people per annum by 2050 (O'Neill, 2016). A range of policy actions have been proposed to mitigate this problem and, whilst there has been some new investment and action since the publication of the O'Neill review, the actions undertaken to date are widely viewed to be insufficient (Clift, 2019).
Framing and communicating the risks that are associated with AMR are well-established in a range of policy and academic fields. Although there have been national level AMR awareness campaigns, the issue remains relatively poorly known (Huttner et al., 2019). Encouragingly, this may mean that framing has more potential to be efficacious with respect to AMR than for more contentious issues.
However, there is little published evidence about the most appropriate means for presenting and communicating AMR. In this context, the Wellcome Trust commissioned communications research in 2018 on the optimal 'framing' of AMR. The research project was conducted by a commercial market research company in seven countries and utilised a methodology that included quantitative online message testing with over 12,000 people and qualitative research using 18 focus groups. The recommendations from this research were not validated using real-world data. This paper therefore provides evidence on the impact of different framing narratives for AMR. We tested the hypothesis that descriptions of AMR developed using communications research or theoretical underpinnings would improve attitudinal responses, namely the perceived usability of the frame and the perceived importance of AMR.
The research question for this study was: do frames developed using de novo communications research or the behavioural evidence base change attitudinal and behavioural response, or increase understanding?

Study design and settings
Five concise AMR treatment frames were created: structured narratives that aim to communicate AMR risks to promote certain reader attitudinal shifts and/or actions. All were between 175 and 250 words and are reproduced in full in Figure 1 and Figure 2. The Control, labelled United Nations (UN), was adapted from the AMR section of the website of the UN body, the Food and Agriculture Organisation. It was selected as a real-world example of current communication practice (Box 1).

Control. United Nations Frame (UN)
"Antimicrobial resistance (AMR) is a major global threat of increasing concern to human and animal health. It also has implications for both food safety and food security and the economic wellbeing of millions of farming households.
AMR refers to when microorganisms - bacteria, fungi, viruses, and parasites - evolve resistance to antimicrobial substances, like antibiotics. This can occur naturally through adaption to the environment, the pace of AMR's spread is now on the uptick due to inappropriate and excessive use of antimicrobials. As a result of AMR, medicines that were once effective treatments for disease become less so - or even useless, leading to a reduced ability to successfully treat infections, increased mortality; more severe or prolonged illnesses; production losses in agriculture; and reduced livelihoods and food security.
The health consequences and economic costs of AMR are respectively estimated at 10 million human fatalities a year and a 2 to 3.5 percent decrease in global Gross Domestic Product (GDP), amounting to US$ 100 trillion by 2050. However, the full impact remains hard to estimate."
Treatment 1 (RRes, Box 2) was taken directly from the 'Reframing Resistance' report (Wellcome, 2019, p. 30), which recommends the use of this narrative.
Treatment 3 (UN_ME) adds messenger and episodic factors to the Control UN frame. As per Treatment 2, the messenger has the formal authority of a fictional senior Doctor from the World Health Organisation. The episodic focus was adapted from a case study on the 'Resistance Fighters' campaign website (https://antimicrobialresistancefighters.org/) and was selected as a caesarean section is a procedure with high recognition levels.

Treatment 4. UN+Gain (UN_G).
"Antimicrobial resistance (AMR) is a major global threat of increasing concern to human and animal health. It also has implications for both food safety and food security and the economic wellbeing of millions of farming households. [...]"
Treatment 1 utilises a loss frame in relation to AMR. Following the adaptation of Prospect Theory to health information, a gain frame may be more appropriate for a prevention issue such as AMR; this is tested in this treatment.
The messenger and episodic treatments. As credible messengers and an episodic focus are well-evidenced strength factors (Busby et al., 2018), two frames were developed to test their impact in this context. Neither the Control (UN) frame nor the Treatment 1 (RRes) frame uses a specific messenger or episodic focus, so it was straightforward to create new treatment frames incorporating these factors: Treatment 2 (RRes_ME) and Treatment 3 (UN_ME), reproduced in Box 2. The messenger has the formal authority of a fictional senior Doctor from the World Health Organisation. The episodic focus was adapted from a case study about a woman who had suffered protracted complications due to AMR following a caesarean section, taken from the 'Resistance Fighters' campaign website (https://antimicrobialresistancefighters.org/).
By adding two strength factors we do create a risk of conflating the impact of both. However, following precedent from the Behavioural Insight Team (Bholat et al., 2018) it seemed plausible that a greater effect size would be created by the additive combination of factors. Given limited budgets and a focus on practitioner applicability, studying the combination was judged to be of greater research relevance than testing solely one of the two factors.
The gain frame treatment. Given the strong theoretical basis and wide, if not definitive, evidence base for the impact of gain and loss frames (Rothman & Salovey, 1997), we decided to test this in the context of AMR. Following the adaptation of Prospect Theory to advocacy communication in Spence & Pidgeon (2010) above, a gain frame may be more appropriate for a prevention issue such as AMR. The control (UN) version uses a loss frame (this is assumed to be an intuitive expert decision). The fourth treatment, 'UN_G', adds a gain frame, which is predicted by Rothman & Salovey (1997) to increase behavioural responses.
As can be seen in Box 2, the 'Reframing Resistance' commercial research utilised a hybrid gain-loss frame, so adapting Treatment 1 (RRes) would have been less straightforward.
A further test could have been to create two new treatment versions of RRes, one purely loss and one purely gain. However, moving from a hybrid frame to a loss or gain frame has less precedent in the literature, so it is harder to be confident that there could be a significant effect. Within the limited number of frames this study could test, those likely to increase response were prioritised, as this would be of more practical use to AMR advocates.

Variables of interest
Framing studies use problematically inconsistent categorisations of persuasiveness, frequently conflating attitudes and intentions with behaviour (Gallagher & Updegraff, 2012). This study assesses both attitudinal and behavioural measures to gain a more nuanced understanding of frame persuasiveness. The primary dependent variable was behavioural response. Given the constraints of an online study, respondents were asked to take two behavioural actions. The first asked if they would sign a petition for or against greater government investment in AMR and the second whether they would donate a percentage of their fee for completing the survey to an AMR charity. This was recoded into a combined dummy variable where 0 = no behavioural action and 1 = one or more behavioural actions.
In addition, this study assesses three attitudinal response variables: Usability (HCMean), Importance (SUMean) and Understanding. Usability was assessed by asking two questions about the perceived clarity and helpfulness of the frame, for example, "How clear/helpful, or otherwise, do you find this as an explanation of what is happening with AMR? Please rate on a scale from 1 to 5, where 1 is extremely unclear/unhelpful, and 5 is extremely clear/helpful." Importance was assessed by asking about how urgent and significant respondents felt AMR to be, for example, "Based on this statement, how urgent a priority do you now believe AMR is? Select scale where 1 is not at all urgent and 5 is highly urgent". These were combined into two composite variables, each the mean of a respondent's two responses. A 1-5 Likert scale was chosen as the literature suggests that it is less confusing, increasing both response rate and response quality whilst reducing respondents' "frustration level" (Babakus & Mangold, 1992; Buttle, 1996).
One question was asked to gauge if the frames influenced respondents' understanding of AMR. The qualitative stage of the Reframing Resistance (Wellcome, 2019) project had identified 'widespread' misattribution of AMR, with some respondents believing that humans, not infections, became resistant to medicines. Respondents were asked to choose one of three options, for example, "Drug Resistant Infections occur when your body becomes resistant to medicines, meaning that those medicines no longer work as well on you", two of which were incorrect. This was recoded into a dummy variable where 0 = incorrect and 1 = correct. Independent variables of gender, age, education level, media consumption and country of residence were also recorded. To test for variance in different English-speaking cultural contexts the survey was run in the US and the UK.
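The recoding described above can be sketched in a few lines of Python. This is only an illustration and not the authors' Stata code: the function name, argument names and example answers are hypothetical, and it assumes that signing the petition counts as a behavioural action regardless of the position taken.

```python
from statistics import mean


def code_response(clarity, helpfulness, urgency, significance,
                  signed_petition, donated,
                  understanding_choice, correct_choice):
    """Recode one respondent's raw answers into the four dependent variables.

    All argument names are hypothetical; the survey's own variable names
    are not given in the text.
    """
    return {
        # Composite attitudinal scores: mean of two 1-5 Likert items each.
        "usability": mean([clarity, helpfulness]),
        "importance": mean([urgency, significance]),
        # Dummy: 1 = correct option chosen, 0 = incorrect.
        "understanding": int(understanding_choice == correct_choice),
        # Dummy: 1 = at least one behavioural action, 0 = none.
        "behaviour": int(signed_petition or donated),
    }


# Made-up respondent, for illustration only.
example = code_response(5, 4, 4, 4,
                        signed_petition=False, donated=True,
                        understanding_choice="b", correct_choice="b")
print(example)
```

Keeping Usability and Importance as means preserves their 1-5 range, which is why they are later treated as continuous, while Understanding and Behaviour are 0/1 variables.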

Sample size and participant recruitment
Given that such a large number of psychological studies are significantly underpowered (Maxwell et al., 2015), calculating sample size a priori is crucial. Based on comparable studies (Opiyo et al., 2013; Rosenbaum et al., 2010; Santesso et al., 2015) that indicate small to medium effect sizes, we estimated a mean difference of 10% between answers within the treatment and control groups. At an alpha of 0.05 and power of 0.9, with an estimated standard deviation of 0.3 in both samples, Stata calculates a sample size of 190 per group. Allowing for attrition/ineligibility at 5%, we aimed to recruit 200 people to each arm of the trial, so 1,000 per country.
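The figure of 190 per group can be reproduced with the standard normal-approximation formula for comparing two means. The sketch below assumes that the "10%" mean difference corresponds to a difference of 0.1 on the same scale as the standard deviation of 0.3, and that the test is two-sided; the exact Stata command used is not stated, so this is an illustration of the arithmetic rather than the authors' procedure.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.9):
    """Per-group n for a two-sided two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / delta)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


n = n_per_group(delta=0.1, sd=0.3)   # assumed inputs, see lead-in
target = ceil(n * 1.05)              # inflate by 5% for attrition/ineligibility
print(n, target)                     # 190 200
```

With five arms per country, 200 per arm gives the 1,000 respondents per country quoted above.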
The survey was developed in the Qualtrics online survey platform and carried out using the Prolific online respondent platform. UK and US based respondents participated in a between-subjects survey design on the 23rd October 2019 and were paid approximately US$1 for completion of the survey, which took an average of 6 minutes to complete.
Participants were randomly assigned to one of five conditions using the Qualtrics randomisation algorithm. The survey asked subjects to provide key demographic information, read one of five framed statements about AMR, and answer seven follow up questions about Usability, Importance, Understanding and Behaviour. The sample was split evenly between the UK and the US to enable analysis by country. To reflect different media consumption patterns in the US and the UK, two versions of the survey were created, each containing different lists of popular publications based on country of residence. The UK version of the survey is provided as extended data (Metcalfe et al., 2021b).
The survey excluded those who failed an attention test, reported residence in the wrong country, or never consumed any of the listed media channels. The rationale for the latter exclusion was twofold. First, it mirrored the selection criteria that Wellcome had used for the quantitative online message testing in 'Reframing Resistance'. Second, the theory of change underpinning the work is media advocacy: the primary mechanism for distribution of framed messages would be through media channels, so focusing on users of those channels makes the findings more pertinent.

Ethical approval, consent and data collection
All participants provided full and informed consent at the start of the survey, at the time the study was carried out. Each participant was assigned a unique numeric identifier so no personal information could be accessed by researchers. Prior to recruitment, the London School of Economics (LSE) ethics checklist was completed. No significant issues arose and no further approval was required. The completed checklist is provided as extended data (Metcalfe et al., 2021b).

Statistical methods
Data were analysed in Stata 15; the full Stata code used is included as extended data (Metcalfe et al., 2021b).
Descriptive statistics were collected and ranked by mean response. The differences between the control and four treatments were analysed via the creation of aggregated categories for the four dependent variables. Where dependent variables were assessed via a continuous variable, the significance of differences between responses to the treatment frames was assessed using the Wilcoxon rank-sum method, because the data were skewed and a nonparametric test was required. The remaining (dichotomous) dependent variables were assessed via a two-sample test of proportions. The alpha level used for the tests was 0.05 (Hair et al., 2006). Linear and logistic regression analysis, analysis of variance and post-hoc pairwise comparisons were also conducted to further interrogate results.
Critical values were adjusted to control the family-wise error rate. Because several dependent variables were tested at the same time, the Holm-Bonferroni (H-B) adjustment was used; this method performs better than the Bonferroni correction procedure in terms of maintaining statistical power (Holm, 1979). This approach first ranks the p-values from smallest to largest. For the smallest p-value, the alpha value of 0.05 is divided by the number of dependent variables, $m$. For example, the revised H-B significance value for the first-ranked p-value is $\alpha_{(1)} = 0.05 / m$, which gives $0.05 / 4 = 0.0125$ for the four dependent variables. The second smallest p-value then has the alpha value divided by the number of tests minus 1, $\alpha_{(2)} = 0.05 / (m - 1)$. This process continued one-by-one with the progressively greater p-values until statistical significance was no longer met.
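The step-down procedure described above can be sketched as follows. This is a minimal illustration and not the authors' Stata code; the labels and p-values in the example are made up and are not results from this study.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm's step-down procedure.

    p_values: dict mapping a label to its unadjusted p-value.
    Returns a dict mapping each label to True (significant) or False.
    """
    m = len(p_values)
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    decisions = {}
    still_rejecting = True
    for rank, (label, p) in enumerate(ranked):
        # Thresholds are alpha/m, alpha/(m-1), ..., alpha/1.
        threshold = alpha / (m - rank)
        # Once one p-value fails, all larger p-values fail too.
        still_rejecting = still_rejecting and p <= threshold
        decisions[label] = still_rejecting
    return decisions


# Hypothetical p-values for four dependent variables (illustration only).
example = {"dv_a": 0.001, "dv_b": 0.010, "dv_c": 0.030, "dv_d": 0.400}
print(holm_bonferroni(example))
```

In this made-up example the first two p-values pass their thresholds (0.0125 and about 0.0167), the third fails against 0.025, and testing stops there, so the fourth is not significant either.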

Results
Of 2,101 respondents, 1,934 participants met all eligibility criteria and were included in the analysis (Figure 1) (Metcalfe et al., 2021a). Dummy variables were created for gender, media consumption, country of residence and employment status. Sample characteristics are summarised below (see the linked Underlying Data for full details):
- 58% of the sample were female
- 33% consumed media from the provided UK or US lists every day
- 58% had a degree or postgraduate qualification
- 79% were working
Collectively, the subjects:
- Assessed the five frames as usable, rating clarity at a mean of 4.41 (SD 0.75) and helpfulness at a mean of 4.31 (SD 0.77)
- Felt AMR was both a significant threat (mean 4.40, SD 0.67) and an urgent concern (mean attitude rating of 4.33, SD 0.72)
- Understood the nature of the problem (91% gave a correct answer)
- Mostly took action: almost two thirds (65%) undertook at least one of the prompted behaviours.
A preliminary step was to check if the frames had influenced the dependent variables in the expected direction. The results showed that, in the control group, the dependent variables Usability and Importance had means of 4.23 and 4.36 on a 1-5 scale. The dichotomous Understanding and Behavioural dependent variables had means of 0.91 and 0.59, respectively. The Understanding mean showed no clear difference between control and treatment frames. Among participants reading the treatment frames, Usability, Importance and Behaviour means were all higher than the control means. A Spearman's correlation between the dependent variables found a moderate correlation of 0.361 between Usability and Importance, and weak correlations between Behaviour and Usability (0.194), and Behaviour and Importance (0.261), indicating they were linked but distinctive variables (Akoglu, 2018).
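For readers unfamiliar with Spearman's correlation, it is a Pearson correlation computed on ranks, with tied values sharing the mean of their ranks. The sketch below is a plain-Python illustration; the scores in it are made up and are not study data.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        tied_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = tied_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical composite scores for six respondents (illustration only).
usability = [4.5, 5.0, 3.5, 4.0, 5.0, 2.5]
importance = [4.0, 4.5, 4.0, 3.5, 5.0, 3.0]
rho = spearman(usability, importance)
print(round(rho, 3))
```

A rank-based measure suits these variables because Likert-derived scores are ordinal and, as reported in the next section, not normally distributed.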

Treatment 1 vs control
Initial analysis investigated the relative performance of the Control frame (UN) and Treatment Frame 1 (RRes), representing 775 observations (Table 1). In terms of the continuous dependent variables, a skewness/kurtosis test for both Usability and Importance finds that the three p-values are all less than 0.001, so we can conclude that the distribution of each variable is statistically different from a normal distribution. Both exhibited a negative skew (Usability -1.26, Importance -0.92) and were strongly leptokurtic, with kurtosis well above the value of 3 expected for a normal distribution (Usability 4.99, Importance 4.44). Because the distributions are not normal and the variables are continuous, a Mann-Whitney test was used. For the dichotomous variables Understanding and Behaviour, a two-sample test of proportions was used.
The Usability and Behaviour variables show a clear difference in mean between Control (UN) and Treatment 1 (RRes), with significant z scores and p values, even when adjusted with the Holm-Bonferroni correction.
The influence of the frame upon respondent behaviour was assessed using the dependent variable Behaviour, where 0 = no action taken and 1 = at least one action taken. A two-sample test of proportions shows a significant z score of -2.94 and a p-value of 0.003, so we can reject the null hypothesis of no difference between groups for behavioural response.
For the continuous dependent variable Usability, the non-adjusted p value is <0.001, indicating significance. A simple analysis of variance (ANOVA) finds an R squared of 0.02, so the proportion of variance explained by the treatment dummy variable is very small. Nonetheless, we can reject the null hypothesis for this attitudinal response variable. The variable Importance also increases, but not to an extent where we can reject the null hypothesis.
The survey also contained one question to gauge Understanding. The mean for control (UN) was 0.91 and for the treatment (RRes) was 0.89. There is no significant difference, as reflected in a p-value of 0.34, and we cannot reject the null hypothesis that there is no difference in this attitudinal response variable between groups.
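As an illustration of how a two-sample test of proportions turns group counts into a z score and p-value, the sketch below uses the pooled standard error under the null hypothesis. The counts in the example are hypothetical, not the study's data; the final line only checks that the reported z score of -2.94 corresponds to a two-sided p-value of about 0.003.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(successes_a, n_a, successes_b, n_b):
    """Two-sided z test for a difference between two proportions.

    Uses the pooled proportion to estimate the standard error under H0.
    Returns (z, p_value).
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 50 of 100 acted in group A, 70 of 100 in group B.
z, p = two_proportion_ztest(50, 100, 70, 100)
print(round(z, 2), round(p, 4))

# Consistency check on the z score reported in the text.
p_reported = 2 * (1 - NormalDist().cdf(2.94))
print(round(p_reported, 3))  # 0.003
```

A negative z here simply means the first group (entered as A) had the lower proportion taking action.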
The 'Reframing Resistance' frame created a statistically significant increase in perceived Usability and Behaviour. There was no significant difference in Importance or Understanding between the control (UN) and treatment (RRes) frames.

Impact of gain frame: UN_G vs. control (UN)
Following the approach used for Treatment 1, the distribution is not normal for dependent variables Usability and Importance, so a Mann-Whitney test was used, and a two sample test of proportions was used for the variables Behaviour and Understanding. The results are described in Table 2.
Contrary to hypothesis, the addition of a Gain frame to control did not create a statistically significant increase in the dependent variable of Behaviour, or the secondary dependent variables of Importance, Usability and Understanding. Overall, we cannot reject the null hypothesis.

Impact of the messenger-episodic frames
Two treatment frames (UN_ME and RRes_ME) incorporating messenger and episodic variants were tested, one adapting the Control (UN) frame, and the second adapting the (RRes) frame. Augmenting the UN frame with messenger and episodic factors makes a statistically significant difference to behavioural measures versus the UN control. Secondary dependent variables of Usability and Importance also increased significantly. As in all treatments, Understanding does not alter significantly versus control. The RRes_ME results see significant increases in Behaviour and Usability, but not Importance or Understanding. The results are described in Table 3. Table 4 summarises the effects observed across all 4 experimental frames versus control.

Regression of dependent variables: Usability and Importance
As the dependent variables Usability and Importance are continuous, it is possible to run linear regressions to model the relationships between variables. The model output for Importance is described in Figure 2.
F-statistics of 5.19 (Usability) and 13.00 (Importance), both with corresponding p-values of <0.001, indicate the models fit better than a model with no predictors. In terms of Usability, the independent variables age and media consumption are positively associated, showing high t and low p-values. This suggests that older respondents and those who read from the target media list daily were more likely to find the frames helpful and clear. Level of education, country of residence, gender and employment status are not significantly associated. For the dependent variable Importance, age and media were again significantly associated. In addition, gender exhibited a t statistic of 3.29 and a p value of 0.001, indicating that women were significantly more likely to think of AMR as a significant and urgent problem. Standardised regression coefficients suggest that the strongest predictor variable for Usability is media, where a one standard deviation increase leads to a 0.08 standard deviation increase in Usability. For Importance the strongest predictor variable is age, where a one standard deviation increase would lead to a predicted 0.12 standard deviation increase in Importance. However, adjusted R² values of 0.012 (Usability) and 0.03 (Importance) indicate that only around 1 and 3 percent of variability is explained by the factors included in this model.

Additional analysis of dependent variable: Behaviour
As the main dependent variable of interest, Behaviour, is dichotomous, a logistic regression is more appropriate than a typical linear regression model. The results are described in Figure 3. This model confirms high z scores and low (non-adjusted) p values for treatments 1 (RRes) and 3 (UN_ME), and nonsignificant effects for treatments 2 (RRes_ME) and 4 (UN_G). The likelihood ratio chi-square of 30.64 with a p-value of <0.001 indicates the model as a whole fits significantly better than an empty model (i.e. a model with no predictors).
In terms of independent variables, country and age are significantly associated with the dependent variable, showing high z scores and low p-values. In the case of age, the z score is negative. This suggests UK respondents and younger respondents were more likely to take action than US and older participants. The odds ratios show that the odds of taking at least one behavioural action are increased by a factor of 1.30 if you are from the UK. However, a pseudo R² of 0.012 indicates that this model is around 1 percent better than a null model.
The difference in behavioural response between the US and the UK warranted further investigation. Behavioural response was based on two questions: whether respondents were willing to sign a petition and/or make a small donation. A two-sample test of proportions found no significant difference between countries in terms of respondent likelihood to sign a petition (p = 0.269). The same test of likelihood to make a donation found UK respondents were significantly more likely to take action (p < 0.001). The overall behavioural difference appears to have been driven by different levels of willingness to make a donation.
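Because odds ratios are easy to misread as ratios of probabilities, the short sketch below shows what an odds ratio of 1.30 implies for a given baseline. The baseline probability of 0.60 is purely hypothetical and is not an estimate from this study; only the odds ratio of 1.30 comes from the text.

```python
def apply_odds_ratio(baseline_probability, odds_ratio):
    """Return the probability implied by multiplying the baseline odds."""
    baseline_odds = baseline_probability / (1 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1 + new_odds)


# Hypothetical baseline: 60% of one group take at least one action.
p_new = apply_odds_ratio(0.60, 1.30)
print(round(p_new, 3))  # 0.661
```

Under this assumed baseline, odds of 1.5 become 1.95, so the probability rises from 0.60 to roughly 0.66: a six percentage point change rather than a 30% increase in probability.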
To create a pairwise analysis of all conditions for the dependent variable Behaviour the mean square error was calculated via an analysis of variance, and then a pairwise comparison was run. To reduce the probability of a Type 1 error a Tukey honestly significant difference post hoc test was used. The outputs are described in Figure 4.
This more conservative analysis also finds that condition 3 (UN_ME) and condition 1 (RRes) were significantly different from the control (UN) in terms of behavioural response. Treatments 2 (RRes_ME) and 4 (UN_G) do not demonstrate a significant difference from control. Comparisons between treatments one to four do not show significant differences, even though some t scores are relatively high.

Discussion
The study is the first randomised experiment of advocacy communication frames on AMR. The results suggest that the way in which this important issue is communicated can change the way in which the issue is both perceived and acted upon. Despite the significant commentary on AMR in the academic and policy literature, there has been surprisingly little systematic research about optimal ways to advocate for change, or optimal processes by which to derive strong frames. The findings are compatible with the existing framing literature in the finding that strong frames make a small but significant difference to attitudinal measures. The study also indicates that applying findings from the behavioural science evidence base and the communications research process can develop strong frames. The results of the study have implications for AMR advocates and indicate that a more ambitious research programme in this area would be beneficial.
This section considers the implications and potential drivers of the findings from the tests of the research hypotheses, and options for future research.

Treatment 1
The Reframing Resistance frame (RRes) had a significant impact versus control on behaviour. In terms of underlying mechanisms, processing fluency may have played a part. The UN control contains more complex language than the RRes treatment and more formal language such as 'food security' and 'Gross Domestic Product'. Post hoc testing of the UN and RRes frames found Flesch reading ease scores of 21.8 and 41.8, respectively. This places the UN frame score in the most difficult band of comprehension, and the RRes frame in the second most difficult. It is possible that the RRes treatment's primary advantage is in using more straightforward language. According to Oppenheimer (2006), "Simpler writing is easier to process, and studies have demonstrated that processing fluency is associated with a variety of positive dimensions." (p.140). If the consultative aspects of the market research process lead to frames that are easier to read, this could drive positive behavioural outcomes. This raises the testable possibility that even more impactful frames could be created when processing fluency is an explicit consideration in frame development.
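For reference, the Flesch reading ease score is a simple function of average sentence length and average syllables per word. The sketch below takes the word, sentence and syllable counts as inputs, since the tool used to score the frames is not stated and syllable counting varies between tools; the counts in the example are made up.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch reading ease from raw counts (higher = easier to read)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def difficulty_band(score):
    """Map a score to the two hardest conventional bands, or 'easier'."""
    if score < 30:
        return "very difficult"
    if score < 50:
        return "difficult"
    return "easier"


# Hypothetical passage: 100 words, 5 sentences, 150 syllables.
print(round(flesch_reading_ease(100, 5, 150), 3))   # 59.635

# Bands for the two scores reported in the text.
print(difficulty_band(21.8), "/", difficulty_band(41.8))
```

Both longer sentences and longer words lower the score, which fits the observation that the UN frame's formal vocabulary places it in the hardest band.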

Treatments 2-4
We predicted that building frames based on transferable findings from behavioural science (such as a gain frame) would increase behavioural response versus the UN control frame. The results here were mixed: this approach is clearly not a panacea, but has potential nonetheless.
The addition of a gain frame (UN_G) was not more effective than a loss frame (UN Control) across most measures. Given the extensive literature concerning gain and loss frames, this is a disappointing finding that emphasises the importance of testing applications of theory in context rather than assuming transferability. One explanation may be that AMR is a bridge too far from the theoretical underpinnings of prospect theory, and the findings do not apply to more abstract issues.
Intriguingly, the gain frame did drive a robust difference on one attitudinal dependent variable, Usability, but not on the other, Importance. Why did respondents regard a gain frame to be more helpful and clearer than a loss frame? One difference is that Usability questions are focused less on the issue of AMR and more towards the words themselves. Perhaps gain frames are more likeable than loss frames, even if they are not necessarily more motivating.
The addition of the messenger and episodic devices, identified as significant factors in a meta-analytic review focusing on behavioural impact (Gallagher & Updegraff, 2012), did drive a significant increase in response for the UN_ME frame on the dependent variables versus control. However, the RRes_ME frame behavioural results were not statistically significant versus control. Results for RRes_ME were overall very similar to the RRes results. Given that the _ME additions to the control and RRes frames were identical, this difference is intriguing. It is possible that the RRes frame represents the upper bounds of the potential difference a frame can make, and significant improvements upon it are difficult. Alternatively, there is some evidence that a credible messenger is used as a heuristic to 'offload' the cognitive effort required to interpret information (Martin & Marks, 2019). Given that we know the UN frame is more difficult to process, perhaps a credible source operated in a comparable way, whereas the more straightforward language of the RRes frame reduced the need for 'offloading'. Promisingly, this latter possibility could be tested experimentally utilising psycho-physiological measures (such as measuring microsaccades or pupil dilation) to assess cognitive load during response to frames (Clark et al., 2018).

Understanding the Understanding variable
This dependent variable clearly did not discriminate between frames, and highlights a potentially problematic outcome of the market research process. Qualitative research led to a belief that understanding was a widespread problem, which is referenced extensively in the 'Reframing Resistance' report (Wellcome, 2019). Given that, on average, only 10% of respondents gave an incorrect answer, it is possible that the questions we used to test this understanding were not specific or challenging enough, or that the proximity of the question to a statement about AMR simply meant the correct answer was more salient. Alternatively, it is possible that public understanding of AMR is higher than the market research had indicated. This is potentially a very useful finding for advocates, indicating that there may be less of a need to educate public audiences about the nature of AMR than the conclusions of the O'Neill (2016) review suggest. It would be straightforward to test understanding levels more extensively, for example by building on the methodology of Bholat et al. (2018) in their study of Central Bank communication.

The influence of independent variables
The difference in overall behavioural response between US and UK respondents was interesting. Given that there was no significant difference between the countries in willingness to sign a petition (or, indeed, in the other dependent variables), the higher UK willingness to donate drove the overall finding. There may be different cultural norms around donating to charity which drove this difference. The fact that older respondents were more likely to think the issue of AMR was Important, but younger respondents were more likely to take a behavioural action, is similarly counterintuitive, potentially reflecting different behavioural motivation or different levels of comfort with online donation. Both findings highlight the challenges in comparing the results of using a donation question to gauge behavioural response.
These identifiable differences in the study, driven by independent variables, highlight the importance of developing a more nuanced understanding of the impact of frames and other nudges upon segments of target populations. Costa & Kahn (2013) show, for example, that Republicans in the US do not respond to social norm messaging about electricity consumption in the same way as Democrats. Here, the limited demographic and supplementary information of this study hinders detailed analysis. It is possible that factors such as value orientations, personality characteristics and recent experience of being prescribed or taking antibiotics make a difference to response. Certainly, the low R² scores for all the dependent variable regression models indicate that a significant amount of variability is not accounted for by the independent variables captured in this study.
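The point about low R² values can be illustrated with a minimal sketch. It assumes McFadden's pseudo-R² as the fit measure (the text does not state which R² statistic was reported) and uses hypothetical counts that are not the study's data. The function names `log_likelihood` and `mcfadden_r2` are illustrative only. For a model whose only predictor is the frame a respondent saw, the fitted probability for each frame equals that frame's observed proportion, so no model-fitting library is needed.

```python
import math

def log_likelihood(successes, failures, p):
    """Bernoulli log-likelihood of a group sharing one success probability p (0 < p < 1)."""
    return successes * math.log(p) + failures * math.log(1 - p)

def mcfadden_r2(groups):
    """McFadden pseudo-R^2 for a frame-only model.

    groups: list of (successes, failures) tuples, one per frame.
    Fitted model: one probability per frame (what a logistic regression
    with frame dummies alone would estimate).
    Null model: a single pooled probability for all respondents.
    """
    total_s = sum(s for s, _ in groups)
    total_f = sum(f for _, f in groups)
    p_null = total_s / (total_s + total_f)
    ll_null = log_likelihood(total_s, total_f, p_null)
    ll_model = sum(log_likelihood(s, f, s / (s + f)) for s, f in groups)
    return 1 - ll_model / ll_null

# HYPOTHETICAL counts (acted, did not act); NOT taken from the study.
control = (120, 280)    # 30% took the behavioural action
treatment = (160, 240)  # 40% took the behavioural action
r2 = mcfadden_r2([control, treatment])
print(f"McFadden pseudo-R^2: {r2:.4f}")
```

Under these assumed counts, a 10-percentage-point difference between frames gives a pseudo-R² below 0.02. A frame can therefore shift the average response noticeably while leaving most individual-level variability unexplained, which is consistent with the suggestion that unmeasured factors such as values or prior antibiotic use matter.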

Limitations
The study approach described above has several limitations. Not least, it has the inherent artificiality of an online experiment completed by a survey panel. Paid online survey respondents are an imperfect representation of target audiences, and whilst the selection criteria strove to maximise the relevance of the respondents it is by no means a perfect sample. Larger, more representative samples, additional countries and mixed methods including field experimentation would increase the robustness of the findings.

Accuracy of behavioural proxies
Whilst the measures of a signed petition and percentage donation were a sincere attempt to move beyond the more usual attitudinal and stated preference questions that dominate the literature and to better understand behavioural impact, it is by no means certain they meaningfully correspond with real-world behaviour. Future research would ideally be conducted in more authentic settings. For example, if different treatments were placed as 'advertorials' in online newspapers with an option to sign up to a real petition, would a framing effect also be apparent? Moreover, this research design provides no understanding of longitudinal impact, such as how quickly framing effects dissipate, or what level of information 'dosage' or frame repetition is optimal.

Underlying mechanisms and treatment comparability
Ideally, this study would have also explored underlying mechanisms behind frame strength. As Aarøe (2011) points out, "an impressive body of research has investigated framing effects, we still only have very limited knowledge of the factors that shape frame strength" (p. 207). The transferability of findings is enhanced by a greater understanding of psychological mechanisms, and further research around framing AMR would also ideally explore considerations such as the role of self-efficacy or processing fluency in creating frame strength.
It could be argued that the UN version is at an inherent disadvantage compared to the new treatments, which contain new forms of information such as additional narrative. The UN version may also have been designed for use by a variety of audiences (the public, industry and policymakers) rather than being optimised for any particular audience or function. However, the control is a real-world example that was authored with at least implicit assumptions about utility. Therefore, following the logic of Thaler & Sunstein (2003), it would be perverse to argue that explicitly considering choice architecture invalidates any improvement in the treatment conditions.

Generalisability beyond study cohort
Participants were recruited to broadly reflect the audience that the original Reframing Resistance study envisaged as likely recipients of messaging for media advocacy. This, together with the biases resulting from the use of an online platform to perform the testing, means that it cannot be assumed that the findings would automatically generalise to other populations. The theories that inform the work would suggest broad applicability, especially in Western settings, but the extent to which the same principles of framing can be applied across languages and cultures remains to be seen.

Future studies
This study raises a number of further potential research questions. For AMR advocates, the real-world relevance of the findings needs to be tested. Given that some AMR frames had an effect in a relatively large online panel survey, can this finding be replicated in a real-life setting? Kahan (2013) argues eloquently for the need for climate communications to be evidence based 'all the way down' and for communicators and scholars to work more effectively together. We believe this argument holds with respect to the lower-profile issue of AMR. Whilst the work on using social norms to reduce unneeded frontline prescribing in the UK (Hallsworth et al., 2016) is useful, there is a need for more, and more ambitious, work considering AMR. In particular, increasing our understanding of how to optimise the strength of effect and persistence of framing, alongside the influence of cultural heterogeneity, values, experience and other predictors, would enable progressively more targeted and effective communication.
A range of broader questions for anyone releasing communications are also raised. Should organisations and agencies evolve their processes? Are there ways to optimise and predict the impacts of different frames more quickly? The potential for better empirical results is exciting. For scholars, potential theoretical questions include building our understanding of the underpinning mechanisms behind strong and weak frames, exploring the constraints of prospect theory and utilising the opportunities that new technology presents to test at speed and at scale. And, given the well-documented issue of positive result publication bias (Olson et al., 2002), caution and further study of the replicability of findings are advisable.
On a cautionary note, the absolute differences in the means of response between the treatments and control across Usability, Importance and Behaviour were all relatively small. For example, the absolute difference in behavioural response between control and the strongest treatment (UN_ME) was only 19%. This change is welcome, and could plausibly make a difference to advocacy outcomes, but it is not in and of itself sufficient. Addressing systemic issues like AMR requires a multifaceted response, and we hope this study plays a small part in contributing to that.
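Because the same gap between two response rates can be reported either as an absolute or as a relative difference, a short sketch of the arithmetic may help readers interpret figures of this kind. The response rates below are hypothetical, chosen only so that the absolute gap is 19 percentage points. They are not the study's control or treatment rates, and `compare_rates` is an illustrative name.

```python
def compare_rates(control_rate, treatment_rate):
    """Return (absolute difference in percentage points, relative difference in %)."""
    absolute_pp = (treatment_rate - control_rate) * 100
    relative_pct = (treatment_rate - control_rate) / control_rate * 100
    return absolute_pp, relative_pct

# HYPOTHETICAL rates for illustration only; NOT the study's observed values.
control_rate = 0.40
treatment_rate = 0.59

abs_pp, rel_pct = compare_rates(control_rate, treatment_rate)
print(f"Absolute difference: {abs_pp:.1f} percentage points")
print(f"Relative difference: {rel_pct:.1f}% of the control rate")
```

With these assumed rates the relative difference is considerably larger than the absolute one, which is why it matters that the figure quoted above is stated as an absolute difference.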

Conclusion
AMR is a significant global health challenge for policymakers. There is a need for approaches that effectively, and cost-effectively, support advocates. This experiment shows that strong frames can make a small but significant difference to changing people's attitudes about AMR and, potentially, their behaviour. Whilst extensively used in political communication and health promotion, perhaps the power of framing could be applied more systematically outside of these fields.
The finding that the Reframing Resistance project created a frame that changed behaviour as well as attitudes is noteworthy. A very different, less labour-intensive process of developing frames by transferring learnings from the evidence base also created one comparably strong frame. This indicates that an alternative to common practice may be of use, opening up a new option for practitioners seeking to create strong frames. The fact that two other frames developed in this way were not significant improvements highlights that evidence is only sometimes transferable and reinforces the value of systematic, randomised testing in framing research. As Sally Davies, the UK Special Envoy on AMR, says in the preface of the Reframing Resistance report: "By basing our communications in evidence - as we would be required to do for any other intervention we develop - we can better unlock the huge potential we have as advocates to more effectively galvanise support for the antimicrobial resistance cause and stimulate action." (Wellcome, 2019, p. 4) There is a clear opportunity to expand research in this area, for example by better understanding the mechanisms underpinning framing effects in this context, exploring the replicability of these findings in different countries, assessing longitudinal impact or assessing the impact of different AMR frames upon real-world behaviour.