Integrative vectors for regulated expression of SARS-CoV-2 proteins implicated in RNA metabolism

Infection with SARS-CoV-2 is expected to result in substantial reorganization of host cell RNA metabolism. We identified 14 proteins that were predicted to interact with host RNAs or RNA binding proteins, based on published data for SARS-CoV and SARS-CoV-2. Here, we describe a series of affinity-tagged and codon-optimized expression constructs for each of these 14 proteins. Each viral gene was separately tagged at the N-terminus with Flag-His 8, the C-terminus with His 8-Flag, or left untagged. The resulting constructs were stably integrated into the HEK293 Flp-In T-REx genome. Each viral gene was expressed under the control of an inducible Tet-On promoter, allowing expression levels to be tuned to match physiological conditions during infection. Expression time courses were successfully generated for most of the fusion proteins and quantified by western blot. A few fusion proteins were poorly expressed, whereas others, including Nsp1, Nsp12, and N protein, were toxic unless care was taken to minimize background expression. All plasmids can be obtained from Addgene and cell lines are available. We anticipate that availability of these resources will facilitate a more detailed understanding of coronavirus molecular biology.


Introduction
SARS-CoV-2 is a large positive-sense, single-stranded RNA virus that encodes four structural proteins, several accessory proteins, and sixteen nonstructural proteins (nsp1-16) (Figure 1). The latter are mainly engaged in enzymatic activities important for translation and replication of the RNA genome. In addition, nonstructural proteins also target host RNA metabolism in order to manipulate cellular gene expression or facilitate immune evasion (Gordon et al., 2020;Thoms et al., 2020;Yuen et al., 2020). Understanding these pathways in greater detail will be an important step in the development of antiviral therapies.
Our lab has previously developed techniques to map the RNA-bound proteome, including crosslinking and analysis of cDNA (CRAC), to identify binding sites for proteins on RNA, and total RNA-associated proteome purification (TRAPP) to identify and quantify the RNA-bound proteome (Granneman et al., 2009;Shchepachev et al., 2019). Among other things, TRAPP has given insights into the mechanism of stress-induced translation shutdown in yeast (Bresson et al., 2020b). In the context of SARS-CoV-2, CRAC is expected to identify host RNAs that are targeted by viral factors, while TRAPP will reveal how host cell RNA-protein interactions are globally remodeled in response to specific viral proteins.
To facilitate application of these techniques to SARS-CoV-2, we generated a series of synthetic, codon-optimized constructs for 14 different viral proteins that are expected to interact with RNA or RNA binding proteins. To remove the need for error-prone PCR steps, we devised a cloning scheme in which a single synthetic construct could be used to generate untagged, N-, or C-terminally tagged versions of the protein. Using these vectors, we generated and tested a series of human cell lines with the viral open reading frames (ORFs) stably integrated, under the control of an inducible Tet-On promoter. We expect that broad availability of this collection will enable a deeper understanding of the coronavirus life cycle and its impact on host RNA biology.

Results and discussion
Design and cloning of viral expression constructs We selected 14 proteins for initial analysis, based on putative roles in viral or host RNA metabolism ( Figure 1 and Table 1). Two of the selected proteins, Nsp7 and Nsp8, reportedly form a stable heterodimer in vivo (Gao et al., 2020;Hillen et al., 2020;te Velthuis et al., 2012), so we also designed constructs in which Nsp7 and Nsp8 were expressed as a fusion protein, connected by a short, unstructured linker.
For cloning, we selected pcDNA5-FRT/TO (Thermo Fisher) as the backbone vector ( Figure 2A). This vector can be used for transient transfection or flippase (Flp) recombinase-mediated integration into the genome of cells with a pre-inserted Flp Recombination Target (FRT). It also carries a hygromycin resistance gene to allow selection for integration ( Figure 2). As host cells, we used HEK293 Flp-In T-REx cells (293FiTR), but other cell lines that carry an FRT site could also be used.
The expression levels of viral proteins vary substantially during infection, so we used constructs that, in addition to stably integrating, were expressed under the control of a tetracycline-regulated human cytomegalovirus (CMV)⁄TetO2 promoter, which is induced by the addition of doxycycline to the medium (Figure 2). Viral protein expression can then be titrated by varying either doxycycline concentration or induction time.
Using pcDNA5-FRT/TO as a starting point, we generated two additional parental vectors with pre-inserted tandem-affinity purification tags, either an N-terminal, FH-tag (FLAG-Ala 4 -His 8 ) or C-terminal HF-tag (His 8 -Ala 4 -FLAG) ( Figure 2A). We have recently shown that these tags work well for tandem affinity purification, including in the denaturing conditions used for CRAC (Bresson et al., 2020b).
The insert sequences were designed such that a single synthetic construct could be used to generate an untagged, or N-or C-terminal fusion protein ( Figure 2B). BamHI and EcoRV restriction sites were placed on either side of the open reading frame, together with an AvrII site overlapping the stop codon (important for C-terminal cloning, as discussed below).
Generating the untagged and N-terminal tagged constructs was straightforward. The synthetic constructs were cut with BamHI and EcoRV and ligated into pre-cut vector, either pcDNA5-FRT/TO (generating an untagged construct), or pcDNA5-FRT/TO-N-Flag-His8 (generating an N-terminally tagged construct) ( Figure 2C). The resulting plasmids were verified by colony PCR and Sanger sequencing.
Generating the C-terminal tagged construct required additional steps to remove the in-frame stop codon upstream of the EcoRV restriction site. Cleavage of the AvrII restriction site, which overlaps the stop codon in the synthetic constructs ( Figure 2B), left a 5′ overhang containing the stop codon. Subsequently, the overhang (and thus the stop codon) was removed by treatment with Mung Bean Nuclease, an ssDNA endonuclease ( Figure 2C). The resulting fragment possessed a blunt 3′ end, compatible with the EcoRV restriction site in the target vector. Subsequently, the 5′ end of the insert was prepared by digestion with BamHI, and the resulting fragment was ligated into pcDNA5-FRT/TO-C-His8-Flag pre-cut with BamHI and EcoRV.
In total, we generated 54 viral protein expression vectors, and three additional GFP expression vectors as controls. These constructs, together with the parental tagging vectors, are listed in Table 2.

Expression of fusion proteins
Each construct was introduced into Flp-In T-REx cells (Thermo Fisher) by transfection, followed by hygromycin treatment for 10-16 days to select for chromosomal integration ( Figure 2E). In the initial experiments, all constructs except the Nsp1 series, and the GC3 (high-expression) optimized versions of untagged N and FH-N protein yielded stable hygromycin-resistant cells (Table 2, column 4). Resistant clones were obtained for FH-N by co-transfecting this construct with a plasmid encoding a tetracycline repressor protein (pcDNA6/TR), to suppress possible high levels of FH-N expression at initial transfection. To allow for C-terminal cloning, an AvrII site was inserted such that it overlapped the stop codon (see text for details). In order to accommodate the AvrII site, an alanine residue was added to the end of each expression construct. The viral open reading frames (ORFs) were codon-optimized for moderate or high expression, and lacked BamHI, AvrII, and EcoRV sites.
(C) For untagged and N-terminal tagging, inserts were digested with BamHI and EcoRV and ligated directly into plasmid precut with the same enzymes. For C-terminal cloning, the inserts were first digested with AvrII, blunted with Mung Bean Nuclease, and then cut with BamHI. The resulting fragment was ligated into plasmid cut with BamHI and EcoRV. (D) Untagged, N-, and C-terminally tagged expression constructs. (E) Strategy for generating stable cell lines (see text for details). For expression analyses, doxycycline was added to the cell culture medium to a final concentration of 1 μg ml -1 , and timepoints were collected for analysis by western blot. A prominent cross-reacting band visible in all samples was used as a loading control for most analyses. The exceptions were Nsp15 and Nsp12, for which anti-M2-Flag and GAPDH were used, as they have migration positions close to the cross-reacting band. Representative data are shown in Figure 3B-C; all other western scans are shown in Figure 4. Quantitation is shown in Figure 5, and associated raw data are provided as Underlying data. Clear protein expression was observed for 29 of the 34 tagged constructs (Table 2, column 5). In general, C-terminal tagged proteins were more highly expressed than their N-terminally tagged counterparts.
As described above, we were initially unable to generate stable cell lines for any of the Nsp1 constructs. To confirm that Nsp1 could be expressed in cells, we transiently transfected each construct into 293FiTR cells, and confirmed protein expression by western blot. Only the C-terminally tagged Nsp1 showed robust expression ( Figure 6).
To allow absolute quantification of SARS-CoV-2 protein expression, we used the N-terminally tagged N protein (integrated from plasmid 22; moderate expression) as a reference standard. Cells containing integrated FH-N were treated with doxycycline for 0, 6, and 18 hours. Protein was extracted, separated by SDS-PAGE and analyzed using mass spectrometry with label-free quantification. To compare proteins, their abundance was expressed as a percentage of the total proteome. This value was calculated using the relative, intensity-based absolute quantification (riBAQ) score for each protein, which represents the iBAQ score for a given protein divided by the iBAQ scores for all proteins. After induction for six hours, N protein comprised 0.2% of the cellular proteome ( Figure 3A).     Because all of the tagged proteins possessed an identical FLAG epitope, aliquots of this 6h N protein sample could then be used as a reference standard on all subsequent western blots (e.g. lane 1 in Figure 3B), to allow similar abundance estimates for the other viral proteins (Figure 4; see Underlying data). This approach allows viral protein levels to be carefully titrated, so that protein induction approximately matches physiological conditions. This is important because different viral proteins show extreme differences in expression during infection. The nucleocapsid (N) protein can represent 2% of total protein, whereas the non-structural proteins may be 2 to 3 orders of magnitude less abundant (Finkel et al., 2020;Grenga et al., 2020). Moreover, all proteins will vary in their abundance throughout the course of infection.

Conclusions
There has been a large amount of research activity surrounding COVID-19 but, understandably, this has predominately focused on epidemiology and the development of anti-viral strategies. The fundamental biology of SARS-CoV-2 infection has been relatively less addressed, and we aim to help fill this gap. To facilitate this process, we report the construction of 57 expression vectors and cell lines. The plasmids can be obtained from Addgene and cell lines are available. We expect that these collections will be a valuable resource for future research into the mechanisms by which coronavirus exploits the genetic machinery of its host to facilitate its own replication.

Construction of expression vectors
All viral genes were cloned into each of three parental vectors: pcDNA5-FRT-TO (to generate an untagged version of the protein), pcDNA5-FRT-TO-N-Flag-His8 (N-terminal tag), and pcDNA5-FRT-TO-C-His8-Flag (C-terminal tag). The N-terminal tag consisted of a single Flag motif, a four-alanine spacer, eight consecutive histidine residues, and a short unstructured linker (DYKDDDDKAAAAHHHHHHHHGSG). The C-terminal tag was essentially the same but in reverse (SGGHHHHHHHHAAA ADYKDDDDK). Oligos used and sequences of fusion constructs are provided as Underlying data.

Construction of pcDNA5-FRT-TO-N-Flag-His8.
To generate the Flag-His8 tag, we first designed partially complementary DNA oligonucleotides (oSB707 and oSB708) containing the Flag-His sequence. These oligos contained 20 nucleotides of complementarity at their 3′ ends, and an additional 35-36 nucleotides at their 5′ ends. The oligos were annealed in a reaction consisting of 12.5 μM forward oligo and 12.5 μM reverse oligo in a 40 μL reaction volume. The hybridization reaction was initially incubated at 95°C for 6 min, and gradually decreased to 25°C at the rate of 1.33°C/min. Subsequently, the annealed oligos were incubated in the same buffer supplemented with 250 μM dNTPs and 5U Klenow exo-(NEB) in a 50μL reaction at 37°C to fill in the single-stranded regions. After 1 hour, the insert sequence was purified on a silica column (

Construction of pcDNA5-FRT-TO-C-His8-Flag.
The pcDNA5-FRT-TO-C-His8-Flag sequence was generated as described above for the N-terminal tagging vector, with the following changes: 1) the oligos used for hybridization were oSB709 and oSB710, and 2) the insert and pcDNA5-FRT-TO were digested with 40U of EcoRV-HF (NEB, Cat No. R3195) and 40U of XhoI (NEB, Cat No. R0146).

Cloning of untagged and N-terminally tagged viral genes.
pcDNA5-FRT-TO and pcDNA-FRT-TO-N-Flag-His8 were each digested in a 50 μL reaction consisting of 2 μg DNA, 1X Cutsmart buffer, 20U of BamHI-HF, and 20U of EcoRV-HF at 37°C for two hours. In parallel, 2 μg of plasmid (Kan R ) containing the desired viral gene was digested under identical conditions. All three reactions were purified using a PCR cleanup kit (QIAquick PCR Purification Kit, Qiagen, Cat No. 28106) Subsequently, the acceptor plasmids were phosphatase treated as described above and again purified using a PCR cleanup kit.
Digested vector and insert were ligated together in a reaction consisting of 40 ng vector, 120 ng insert, 50 mM Tris-HCl 7.5, 10 mM MgCl 2 , 1 mM ATP, 10 mM DTT, and 400 U of T4 DNA ligase (NEB) in a 10 reaction volume.

Cloning of C-terminally tagged viral genes.
To prepare the backbone, 2 μg of pcDNA5-FRT-TO-C-His8-Flag was digested with BamHI-HF and EcoRV-HF, as above, followed by DNA purification. To prepare viral gene inserts, 1 μg of pUC containing the relevant gene was initially digested with AvrII (NEB, Cat No. R0174S) in the same reaction conditions followed by purification. The 5' overhang, encoding the stop codon, was then removed by digestion with Mung Bean Nuclease (NEB, Cat No. M0250): 1 U of Mung Bean Nuclease was added to 1 μg of digested vector in 30 μl of 1x Cutsmart buffer, and incubated for 30 minutes at 30 °C. After purification, the final insert was produced by digestion with BamHI-HF. The insert was then ligated into the backbone after purification, following procedure described above with a 4:1 molar ratio of insert to backbone.
Cloning of eGFP controls. The eGFP insert was amplified using pEGFP-N2 (Clontech) as a template and the DNA oligonucleotides oAH211 and oAH212 (untagged and N-terminal cloning) or oAH211 and oAH213 (C-terminal cloning). Subsequently, the PCR-generated inserts were cloned into pcDNA5-FRT-TO, pcDNA-FRT-TO-N-Flag-His8, and pcDNA5-FRT-TO-C-His8-Flag as described above for the viral constructs.
The medium was replaced approximately five hours later to remove the transfection reagents. The next day, the cells were split to a 10 cm plate, and after an additional 24 hours, hygromycin B (150 μg/mL) and blasticidin S (15 μg/mL) were added to the medium. Stable integrants were selected over the course of 10-16 days, with medium replacement at regular intervals. Thereafter, stable cell lines were maintained in hygromycin B and blasticidin S.
For the FH-N constructs that initially yielded no colonies, transfection was repeated with the addition of pcDNA6/TR (Invitrogen, Cat No. V102520), encoding the tetracycline repressor protein: total transfected DNA was kept at 1 μg, with pcDNA5/FRT-TO, pcDNA6/TR and pOG44 used at a ratio of 1:4.5:4.5.

Induction tests
For induction tests, 1-2×10 5 cells were plated into six wells each of a 24-well plate. The following day, the cells were induced with 1 μg/mL of doxycycline, and harvested at varying timepoints by resuspending the cells in 50-100 μL of 1X Passive Lysis Buffer (Promega, Cat No. E1941). Inductions were staggered so that all timepoints could be harvested simultaneously. For the transient transfection experiments, cells were induced two hours post-transfection with 1 μg/mL of doxycycline. As with the stable cell lines, inductions were staggered and all timepoints were harvested at once.
For proteins larger than 20 kDa, cell lysates were resolved on 4-12% Bis-Tris gels, run at 180V for 1h in 1X MOPS (ThermoFisher,Cat No. NP0001). For smaller proteins, lysates were run for 25 min in MES buffer (Thermo Fisher,Cat No. NP0002). Proteins were wet transferred onto PVDF membrane (Millipore, Cat No. IPFL00010), blocked with 5% skimmed milk and probed successively with anti-Flag (Agilent, 200474-21; rat monoclonal antibody) and anti-Rat (Licor, 926-68076; goat polyclonal antibody). This anti-Flag antibody generated a prominent cross-reacting band at around 50 kDa which was used for loading normalization. For proteins running at the same size of the cross-reacting band (Nsp15 and Nsp12), an anti-M2-FLAG antibody (Sigma-Aldrich, F1804) was used in combination with anti-GAPDH (BioRad, VPA00187) as the loading control. For these, anti-Mouse (Licor, 926-68070) and anti-Rabbit (Licor, 926-32211) were then used. Protein bands were visualized and quantified by scanning in a Licor Odyssey CLX. Normalization was performed using the appropriate loading control and the N protein expression standard which was included on each gel.
Mass spectrometry HEK293 cells with stably integrated N-terminal tagged N protein (from plasmid 22) were induced with 1 μg/ml doxycyline for 0, 6, and 18 hours. Cells were harvested in lysis buffer containing 0.1% Rapigest (Waters, Cat No. 186001861) and sonicated (10 cycles of 30 seconds on, 30 seconds off in Bioruptor Pico, Diagenode). Approximately 30 μg of protein was resolved on a 4-20% Miniprotean TGX gel (Bio-Red, Cat No. 4561093), run in Tris-Glycine running buffer at 100 V. Subsequently, the gel was rinsed with water, stained for 1 hour with Imperial Protein Stain (Thermo Scientific, Cat No. 10006123), rinsed several times, and destained in water for three hours. Each lane was cut into four fractions and processed using in-gel digestion and the STAGE tip method, as previously described (Bresson et al., 2020b;Rappsilber et al., 2007).  (Bresson et al., 2020a).

Availability of materials
This project contains the following underlying data: - Table 3.xlsx (Raw data for Figures 3C and 5) - Table 4.xlsx (Mass-spectrometry data for N expression) - Table 5.xlsx (Oligonucleotides used in this work) - © 2020 Petit C. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Chad Petit
Department of Biochemistry and Molecular Genetics, University of Alabama at Birmingham, Birmingham, AL, USA The manuscript is well written with the development of the methodology clearly stated and presented to allow maximum impact for the scientific community. The etiological agent of the COVID-19 pandemic is the Severe Acute Respiratory Syndrome (SARS) coronavirus (CoV) 2. Unfortunately, the specific pathways and host-pathogen interactions that facilitate the viral lifecycle for this novel virus are largely not understood. The authors of this manuscript have selected 14 SARS-CoV-2 proteins that are expected to interact with RNA or RNA binding proteins for construction of reagents that can be used for analyzing their role in the coronavirus lifecycle. Briefly, numerous cell lines were established that express various forms (e.g. amino tagged, carboxyl tagged, etc) of each of the viral proteins. Furthermore, the expression of the viral proteins can be tuned to their relative expression levels during viral replication using different concentrations of doxycycline or incubation time.
The rationale for the development of these reagents is sound in that they could be used for future research to understand how these viral proteins affect the host cell. The description of the synthesis of the reagents is clear and provided at a level of details that would allow the replication of the method by other labs. The conclusions about the method and their relevance for future deployment in other research programs are more than adequately supported by the data presented in the manuscript.

Is the description of the method technically sound? Yes
Are sufficient details provided to allow replication of the method development and its use by others?