Non-native fold of the putative VPS39 zinc finger domain [version 1; peer review: 2 approved]

Background: The multi-subunit homotypic fusion and vacuole protein sorting (HOPS) membrane-tethering complex is involved in regulating the fusion of late endosomes and autophagosomes with lysosomes in eukaryotes. The C-terminal regions of several HOPS components have been shown to be required for correct complex assembly, including the C-terminal really interesting new gene (RING) zinc finger domains of HOPS components VPS18 and VPS41. We sought to structurally characterise the putative C-terminal zinc finger domain of VPS39, which we hypothesised may be important for binding of VPS39 to cellular partners or to other HOPS components. Methods: We recombinantly expressed, purified and solved the crystal structure of the proposed zinc-binding region of VPS39. Results: In the structure, this region forms an anti-parallel β-hairpin that is incorporated into a homotetrameric eight-stranded β-barrel. However, the fold is stabilised by coordination of zinc ions by residues from the purification tag and an intramolecular disulphide bond between two predicted zinc ligands. Conclusions: We solved the structure of the VPS39 C-terminal domain adopting a non-native fold. Our work highlights the risk of non-native folds when purifying small zinc-containing domains with hexahistidine tags. However, the non-native structure we observe may have implications for rational protein design.


Introduction
Eukaryotic cells use an interconnected system of membrane-bound compartments to partition intracellular space, allowing a multitude of biological reactions to proceed simultaneously in distinct chemical environments. The primary carriers of macromolecules between these compartments are vesicles, which bud from donor membranes in a cargo-dependent manner before fusing with an acceptor membrane at the destination compartment. Membrane fusion in the endomembrane system is critically dependent on SNARE (soluble N-ethylmaleimide sensitive factor attachment protein receptor) proteins, the co-folding of which on opposing membranes provides the energy for membrane bilayer mixing and thus vesicle fusion 1 . SNARE activity is tightly regulated by both Sec1/Munc18 family proteins, which bind directly to SNAREs, and by multi-protein 'tethering' complexes that bring vesicles into close apposition to allow the physical contact of SNARE proteins on opposing membranes 2 . The conserved multi-subunit tethering complexes CORVET (class C core vacuole/endosome tethering) and HOPS (homotypic fusion and vacuole protein sorting) combine both of these activities by incorporating the Sec1/Munc18 family protein VPS33A 3-5 . CORVET mediates homotypic fusion of early endosomes 6 , while HOPS mediates heterotypic fusion of late endosomes with lysosomes 4,5 and autophagosomes with lysosomes 7-9 .
The human CORVET and HOPS complexes share four conserved core subunits (VPS11, VPS16, VPS18, VPS33A), known collectively as the class C core 3,10 . Two additional, unique subunits direct each complex to its respective membrane target: VPS8 and TRAP1 direct CORVET to Rab5-positive membranes 6,11 , while VPS41 and VPS39 direct HOPS to Rab7-positive membranes 12,13 . Previous studies using truncation mapping have highlighted the importance of the C-terminal regions of HOPS components in assembly of the HOPS complex 3,14-16 . Recruitment of VPS41 to the class C core is facilitated by the C-terminal RING (really interesting new gene) domains of VPS18 and VPS41, which interact directly 15 . RING domains are a type of zinc finger in which eight conserved zinc-coordinating residues (six or seven cysteine residues and one or two histidine residues) bind two zinc ions [17][18][19] . RING domains may be involved in protein-protein, protein-lipid or protein-nucleic acid interactions, and have a wide variety of cellular functions [17][18][19] .
The C terminus of VPS39 contains a putative zinc finger domain 15 (Figure 1A), the closest homologue of which is the zinc finger domain of Saccharomyces cerevisiae protein Pcf11 ( Figure 1B) 20 . This putative VPS39 zinc finger domain is much shorter than those of VPS18 and VPS41, and is predicted to bind only one zinc ion via four ligands 15 . Given that VPS41 is recruited to the class C core by an interaction between two zinc finger domains 14 , and that the C-terminal region of VPS39 is required for its interaction with VPS11 16 , we hypothesised that the putative VPS39 C-terminal zinc finger domain may be required for its incorporation into the HOPS complex or for binding other cellular partners.
There is currently no high-resolution structural information available for any region of human VPS39, or for its yeast homologue Vps39 (also known as Vam6). An atomic-resolution structure of the putative VPS39 zinc finger domain could further our understanding of HOPS complex assembly and function. We solved the structure of crystals formed by the VPS39 zinc finger domain to 2.9 Å resolution, but observed that the protein had adopted a non-native fold mediated by interactions between zinc ions and the purification tag.

Protein expression and purification
Residues 840-875 of human VPS39 isoform 2 (UniProt ID Q96JC1-2), corresponding to the putative C-terminal zinc finger domain, were cloned into pOPTH (derived from pOPT 21 ) with an N-terminal MetHis6 purification tag and expressed in Escherichia coli strain BL21(DE3) pLysS. Bacteria were cultured in 2×TY medium. Recombinant protein expression was induced by addition of 0.4 mM isopropyl β-D-1-thiogalactopyranoside and allowed to proceed overnight at 22°C. Cultures were harvested by centrifugation at 5000×g for 15 min and cell pellets were stored at -80°C.
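The construct's expected mass (5.1 kDa, quoted in the Results) can be checked from its sequence. Sum the average residue masses and add one water for the free termini. The sketch below is illustrative only. The function name `average_mass_da` is hypothetical, and the residue masses are standard average values. Because the VPS39 sequence is not reproduced here, the example uses only an assumed MetHis6 tag sequence (`MHHHHHH`).

```python
# Average residue masses in daltons (amino acid minus one water).
AVERAGE_RESIDUE_MASS_DA = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_DA = 18.01528


def average_mass_da(sequence: str) -> float:
    """Average mass of a linear polypeptide: residue masses plus one water.

    Ignores disulphide bonds (each removes about 2.016 Da) and bound metal ions.
    """
    seq = sequence.upper()
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS_DA)
    if unknown:
        raise ValueError(f"Unrecognised residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS_DA[aa] for aa in seq) + WATER_DA


tag = "MHHHHHH"  # assumed MetHis6 tag sequence (hypothetical)
print(f"Tag contributes about {average_mass_da(tag) / 1000:.2f} kDa")
```

Adding the assumed tag mass to that of residues 840-875 gives the mass expected for the band on SDS-PAGE.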
X-ray crystallography
VPS39(840-875) was crystallised in sitting drops by mixing 200 nL of 19.4 mg/mL protein in SEC buffer with 200 nL of reservoir solution (100 mM HEPES pH 7.5, 200 mM ammonium acetate, 45% (v/v) 2-methyl-2,4-pentanediol (MPD)) and equilibrating against 80 μL of reservoir at 20°C for 30 months. The VPS39 crystal was cryo-cooled by plunging into liquid nitrogen without added cryoprotectant, as the high concentration of MPD in the reservoir solution was expected to provide sufficient cryoprotection. Diffraction data were recorded at 100 K on a Pilatus 6M-F detector (Dectris) at Diamond Light Source beamline I04. Data were collected in three sweeps, as shown in Table 1.
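As a worked illustration of the drop set-up above, mixing equal volumes of protein and reservoir halves the starting concentration of each component. Vapour diffusion against the 80 μL reservoir then changes these values over time. The sketch below assumes ideal, additive volumes. It does not track SEC buffer components such as DTT, which are diluted in the same way. The function name is hypothetical.

```python
def initial_drop_composition(
    protein_mg_per_ml: float,
    protein_nl: float,
    reservoir_nl: float,
    reservoir: dict[str, float],
) -> dict[str, float]:
    """Concentrations immediately after mixing, assuming volumes add ideally."""
    total_nl = protein_nl + reservoir_nl
    # Protein is diluted by the reservoir volume added to the drop.
    composition = {"protein (mg/mL)": protein_mg_per_ml * protein_nl / total_nl}
    # Each reservoir component is diluted by the protein volume.
    for name, concentration in reservoir.items():
        composition[name] = concentration * reservoir_nl / total_nl
    return composition


drop = initial_drop_composition(
    protein_mg_per_ml=19.4,
    protein_nl=200.0,
    reservoir_nl=200.0,
    reservoir={"HEPES (mM)": 100.0, "ammonium acetate (mM)": 200.0, "MPD (% v/v)": 45.0},
)
for name, value in drop.items():
    print(f"{name}: {value:g}")
```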
Images were processed using DIALS version 1.14.13 23 followed by the CCP4 suite version 7.0.078 24 programs POINTLESS version 1.11.21 25 and AIMLESS version 0.7.4 26 , as implemented in the xia2 version 0.5.902 data processing pipeline 27 . Data collection statistics are shown in Table 2. Two-wavelength anomalous dispersion (MAD) phasing was performed using the CRANK2 version 2.0.229 automated experimental phasing pipeline 28 from CCP4 suite version 7.1.001 24 . Substructure determination was performed with SHELXD version 2019/1 29 and density modification with Parrot version 0.8 30 . Iterative model building and refinement were performed with Buccaneer version 1.1 31,32 and Refmac5 version 5.8.0258 33 . Cycles of iterative manual building with COOT version 0.8.9 34 and TLS plus positional refinement using Refmac5 version 5.8.0258 33 with local non-crystallographic symmetry (NCS) restraints were initially performed using the high-energy remote wavelength dataset (Table 2). Building was assisted by real-time, molecular dynamics-based model building and map fitting with ISOLDE version 1.0b3 35 . To mitigate the radiation damage evident in the structure, later stages of refinement were performed using the first 300 frames of the second peak wavelength dataset (Peak 2; Table 1). These data were processed using xia2 as above, with the same set of reflections kept 'free' for cross-validation 36 . Final cycles of refinement were performed using autoBUSTER version 2.10.3 37 with local NCS restraints and bond length/angle restraints for zinc ligands to ensure chemically plausible zinc coordination 38 . The quality of the model was monitored throughout refinement using MolProbity version 4.5.1 39 and the validation tools in COOT version 0.8.9 34 . Refinement statistics are shown in Table 2. Molecular images were produced in PyMOL 2.4.0a0 Open-Source 40 and figures were composed in Inkscape version 1.0 41 .
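For readers less familiar with MAD data collection, the two wavelengths are chosen relative to the zinc K absorption edge (approximately 9.659 keV). The peak is typically set near the maximum of f'' identified from an X-ray fluorescence scan, and the remote is set at higher energy, away from the edge. The sketch below converts between photon energy and wavelength using λ(Å) = hc/E, with hc ≈ 12.398 keV·Å. The +0.3 keV remote offset is a hypothetical illustration, not the value used in this study; the actual wavelengths are listed in Table 1.

```python
# h*c expressed in keV * angstrom (CODATA value, rounded).
HC_KEV_ANGSTROM = 12.39841984

# Approximate zinc K absorption edge energy in keV.
ZN_K_EDGE_KEV = 9.6586


def energy_to_wavelength(energy_kev: float) -> float:
    """Photon wavelength in angstroms for a given energy in keV."""
    return HC_KEV_ANGSTROM / energy_kev


def wavelength_to_energy(wavelength_angstrom: float) -> float:
    """Photon energy in keV for a given wavelength in angstroms."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


# Hypothetical remote offset, for illustration only.
remote_kev = ZN_K_EDGE_KEV + 0.3

for label, energy in [("Zn K edge", ZN_K_EDGE_KEV), ("remote (+0.3 keV)", remote_kev)]:
    print(f"{label}: {energy:.4f} keV -> {energy_to_wavelength(energy):.4f} A")
```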
VPS39 C-terminal domain residues predicted to bind zinc were identified via generation of a homology model using I-TASSER version 5.1 42 with the structure of S. cerevisiae Pcf11 (PDB ID: 2NAX) 20 as the template.

Results
The C-terminal region of human VPS39 contains a putative zinc finger domain (residues 840-875, Figure 1A) with four predicted zinc-binding residues (Cys841, Cys844, His863, Cys866). These residues are predicted to coordinate a single zinc ion based on homology to the zinc finger domain of S. cerevisiae protein Pcf11 ( Figure 1B). The coordinates for this theoretical model are available (see Underlying data) 43 .
The VPS39 C-terminal domain was expressed with an N-terminal His6 tag in E. coli and purified using nickel affinity capture followed by SEC. The protein eluted from SEC as a single, symmetrical peak near the end of the elution profile (Figure 1C), consistent with expectations for a small folded protein domain. Analysis of the eluted fractions by SDS-PAGE showed a single predominant band that migrated as expected for the VPS39 zinc finger domain (5.1 kDa; Figure 1C). A much less intense band at higher apparent molecular mass was presumed to be a small amount of SDS-resistant VPS39 dimer. The protein was concentrated and sparse matrix crystallisation screening was performed, but no crystals were obtained in the following six weeks. Approximately 30 months later, the crystallisation trays were re-inspected prior to disposal and a single crystal was observed (Figure 1D). This crystal was harvested and diffraction data were recorded at two wavelengths (Table 2), allowing the structure of the VPS39 zinc finger domain to be solved using the anomalous dispersion signal from the incorporated zinc ions. The model was initially refined against the high-energy remote data, but later stages of refinement proved challenging because map features were indistinct and loop density was poor. We were concerned that intense X-ray exposure during data collection at the peak wavelength, where the zinc ions would have a large X-ray absorption cross-section 44 , may have caused radiation damage. The final stages of refinement were thus performed using data recorded in the first 300 frames of the second sweep at the peak wavelength (Table 1 and Table 2). This subset represented the best compromise between total X-ray exposure (and hence damage) and data redundancy/resolution. The structure was refined to 2.90 Å resolution with residuals R = 0.238 and Rfree = 0.269 and good stereochemistry, with an overall MolProbity score 39 of 2.05 (Table 2).
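The residuals quoted above follow the standard crystallographic definition R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|). Rfree is computed in the same way, but only over reflections excluded from refinement. This is why the same free set was retained when refinement switched to the Peak 2 data. The sketch below shows how the two values are computed. It assumes Fcalc has already been scaled to Fobs, and its amplitudes are invented toy values, not data from this study.

```python
from typing import Sequence


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|); Fcalc assumed already scaled."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need equal-length, non-empty amplitude lists")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def r_work_and_free(f_obs, f_calc, is_free):
    """Split reflections by their free-set flag and compute R for each subset."""
    work_obs = [fo for fo, free in zip(f_obs, is_free) if not free]
    work_calc = [fc for fc, free in zip(f_calc, is_free) if not free]
    free_obs = [fo for fo, free in zip(f_obs, is_free) if free]
    free_calc = [fc for fc, free in zip(f_calc, is_free) if free]
    return r_factor(work_obs, work_calc), r_factor(free_obs, free_calc)


# Toy amplitudes (invented for illustration); the last reflection is "free".
f_obs = [100.0, 80.0, 60.0, 40.0]
f_calc = [90.0, 88.0, 57.0, 44.0]
is_free = [False, False, False, True]
r_work, r_free = r_work_and_free(f_obs, f_calc, is_free)
print(f"R = {r_work:.4f}, Rfree = {r_free:.4f}")
```

Because the free reflections never inform the model, a gap between R and Rfree indicates how far refinement may be overfitting the working set.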
The structure is available under PDB ID: 6ZE9; raw diffraction images, crystallographic datasets and X-ray fluorescence scans are available (see Underlying data) 45 .
The asymmetric unit contains three copies of the VPS39 C-terminal domain: two full-length copies (residues 840-875; purple and teal in Figure 2A) and a third copy spanning residues 840-869 (blue in Figure 2A). The remaining C-terminal residues of the third copy are absent from the electron density and presumably disordered. Each copy of the VPS39 C-terminal domain forms an antiparallel β-hairpin, with residues 849-860 forming a loop linking the two β-strands (Figure 2A). Strikingly, the VPS39 C-terminal domains are all organised around crystallographic symmetry axes such that they form eight-stranded β-barrels (Figure 2B). Two distinct homotetramers are formed. In the first, two NCS-related chains interact with two additional chains that are related by crystallographic two-fold rotational symmetry (Figure 2C). In the second, a single VPS39 C-terminal domain interacts with three additional chains that are related by two orthogonal two-fold crystallographic symmetry axes (Figure 2D).
The asymmetric unit contains three zinc ions. This is consistent with the prediction, based on homology to Pcf11 (Figure 1B), that each of the three VPS39 copies binds a single zinc ion via four ligands. All zinc ions have tetrahedral geometry. However, only one of the predicted zinc ligands (Cys844) is involved in zinc ion coordination (Figure 2E). Of the remaining predicted zinc ligands, Cys841 and Cys866 have been oxidised to form an intramolecular disulphide bond in each VPS39 molecule (Figure 2F), and the final predicted ligand (His863) is not in close proximity to the zinc ions. Instead, the remaining zinc ligands are provided by two histidine side chains from the MetHis6 purification tag (His-3 and His-1) and either the terminal carboxylate group of the polypeptide chain (Thr875) or a water molecule (Figure 2G). Two of the ligands for each zinc ion derive from the affinity purification tag. In addition, the fold of the VPS39 C-terminal domain that we observe differs substantially from that of the closest sequence homologue (compare Figure 1B and Figure 2A). We therefore conclude that the observed fold is non-native.
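One way to quantify the tetrahedral geometry described above is to compare the six ligand-Zn-ligand angles with the ideal tetrahedral angle, arccos(-1/3) ≈ 109.47°. Here the ligands would be, for example, the Cys844 sulphur, the tag histidine nitrogens and a carboxylate or water oxygen. The sketch below uses hypothetical, idealised coordinates rather than atoms parsed from the deposited model (PDB ID: 6ZE9). Because zinc-ligand restraints were applied during refinement, agreement with ideal geometry is expected partly by construction.

```python
import math
from itertools import combinations

Vec3 = tuple[float, float, float]

IDEAL_TETRAHEDRAL_DEG = math.degrees(math.acos(-1.0 / 3.0))  # about 109.47


def bond_angle_deg(center: Vec3, a: Vec3, b: Vec3) -> float:
    """Angle a-center-b in degrees."""
    u = [a[i] - center[i] for i in range(3)]
    v = [b[i] - center[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    # Clamp to guard against rounding just outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))


def tetrahedral_rmsd_deg(zinc: Vec3, ligands: list[Vec3]) -> float:
    """RMS deviation of the six ligand-Zn-ligand angles from the ideal angle."""
    if len(ligands) != 4:
        raise ValueError("expected exactly four ligand atoms")
    angles = [bond_angle_deg(zinc, a, b) for a, b in combinations(ligands, 2)]
    return math.sqrt(sum((x - IDEAL_TETRAHEDRAL_DEG) ** 2 for x in angles) / len(angles))


# Hypothetical, idealised coordinates (angstroms): four ligand atoms at
# alternate cube corners around a zinc ion at the origin, each 2.3 A away.
scale = 2.3 / math.sqrt(3.0)
zinc = (0.0, 0.0, 0.0)
ligands = [(scale * x, scale * y, scale * z)
           for x, y, z in [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]]
print(f"RMS deviation from {IDEAL_TETRAHEDRAL_DEG:.2f} deg: "
      f"{tetrahedral_rmsd_deg(zinc, ligands):.3f} deg")
```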

Discussion
We present the crystal structure of the human VPS39 zinc finger domain in a non-native fold. In the structure, three copies of the VPS39 C-terminal domain in the asymmetric unit ( Figure 2A) combine with symmetry-related chains to form two similar, homotetrameric, eight-stranded β-barrels ( Figure 2C, D).
In each copy of VPS39, two of the residues predicted to bind zinc ions (Cys841 and Cys866; Figure 2E) instead form an intramolecular disulphide bond (Figure 2F). The remaining zinc ligands are provided by side chains from the N-terminal His6 purification tag and either the carboxylate group of the polypeptide chain or a water molecule (Figure 2G).
Structural characterisation of VPS39 was undertaken to complement a yeast two-hybrid screen of HOPS component zinc finger domains, including the putative VPS39 zinc finger domain, with the aim of identifying cellular binding proteins 15 . However, as pull-down experiments failed to validate any of the potential interactions that were tested, structural characterisation of the VPS39 C-terminal domain was not actively pursued. After 30 months, as the crystallisation trials were being discarded, a single VPS39 C-terminal domain crystal was identified and used for successful structure determination. It seems very likely that the non-native fold that we observed arose from re-folding of the purified VPS39 C-terminal domain during the extended crystallisation experiment. The elution of freshly purified VPS39 C-terminal domain from SEC ( Figure 1C) was consistent with this small protein being monomeric, whereas the β-barrels of VPS39 in the crystal structure would be likely to elute much earlier, although we concede that formation of a β-barrel fold from the outset remains possible.
Refolding of the VPS39 C-terminal domain to form the observed β-barrels is likely to have been promoted via the concerted actions of zinc binding by the purification tag, disulphide bond formation and formation of β-sheets with unsatisfied backbone hydrogen bonds. The histidine side chains from the MetHis6 purification tag could have competed with Cys841 and Cys866 for coordination of the zinc ions, thereby liberating the side chains of these two cysteine residues. While the VPS39 C-terminal domain was purified under reducing conditions (the SEC buffer being supplemented with 1 mM DTT), it is likely that the contents of the crystallisation drops became oxidised during their extended incubation. The liberated cysteine side chains may thus have formed the observed intramolecular disulphide bond, prohibiting them from competing with the MetHis6 tag side chains for re-binding to the zinc ion. Either or both molecular rearrangements could have promoted re-folding of the protein backbone to adopt the extended β-hairpin fold observed in this structure. The refolded VPS39 β-sheets would have unsatisfied backbone hydrogen bonds, which could have promoted similar refolding of additional VPS39 molecules (akin to nucleation of amyloid fibrils). Such stimulated refolding could promote further exchange of zinc ligands and disulphide bond formation, acting as a ratchet to increase the pool of refolded VPS39 for crystallisation. The covalent interaction between β-barrels, mediated by the carboxy terminus of the polypeptide binding to the zinc ions, would have promoted stability of the crystal once nucleated.
While the structure presented here does not provide biological insight into the organisation or function of the putative VPS39 C-terminal zinc finger domain, there are still useful lessons to be learned. Firstly, nickel-affinity chromatography should be used with caution when purifying zinc-binding proteins. Zinc and nickel have similar coordination chemistry, so histidine-rich tags designed to bind nickel can also bind zinc and compete with native zinc ligands for zinc ions. If this purification strategy is used, constructs should include a protease cleavage site so that the purification tag can be removed before downstream applications, particularly those involving long incubations such as crystallisation. We have previously reported structures where purification tag residues give rise to folding artefacts 46 and where metal ions help mediate non-natural 'swapped' β-strand topologies of crystallised molecules 47 . While His6 tags are generally benign for crystallisation and may indeed be beneficial in some cases 48 , caution should be exercised when using them to purify small zinc-containing domains.
The non-native β-barrel fold of the VPS39 C-terminal domain we observe here highlights the power of metal ion coordination to strongly promote the stable (re)folding of proteins 49 , especially given the simple sequence requirements for efficient zinc binding (cysteine and histidine side chains or carboxylate groups). As a result, it is not uncommon for such features to arise spontaneously 50,51 , as has been previously noted in studies on directed protein evolution. Small zinc finger domains are often highly thermostable and tolerant to sequence changes outside of the zinc ligands 52 , which has led to their use as scaffolds for modular protein design 53-55 . Novel, non-native, metal ion-coordinating folds such as the VPS39 fold reported in this work are potentially less likely to interact with off-target cellular components when used as biologics 56 . The non-native fold of the VPS39 C-terminal domain presented here therefore expands the number of protein scaffolds available for rational therapeutic design.
This project contains atomic coordinates for the theoretical model of the VPS39 zinc finger domain shown in Figure 1B.
This article reports a non-native X-ray crystallography structure of the small C-terminal zinc finger domain of VPS39, which is hypothesized to bind HOPS tethering complex subunits or other cellular partners. The observed structure adopts an antiparallel β-hairpin structure that in turn forms eight-stranded β-barrels in the crystal. The authors anticipated four residues (Cys841, Cys844, His863, Cys866) would coordinate the zinc ion, but instead the zinc ion is coordinated by noncanonical residues, including two histidines from the His6 affinity tag. The structure is further stabilized by an intramolecular disulfide bond formed between Cys841 and Cys866.
The authors think refolding of the VPS39 domain happened during crystallization; they suggest Cys residues oxidized over an extended 30-month incubation period, which prevented them from competing with the MetHis6 tag to bind the zinc ion. The authors propose His6 affinity tags should be used with caution in zinc-binding proteins, and suggest that non-native folds may be promising scaffolds in therapeutic protein design. This study is well-documented and well-presented. We suggest clarification regarding a few minor points in the final version.

Figure 1
1. We suggest the authors include a sequence alignment between Pcf11 and VPS39 zinc finger domains with key Zn-binding residues marked, since the authors use Pcf11 as an expected model for VPS39. This would help the reader follow their logic with a clear visual representation of Cys/His residues predicted to bind zinc.
2. In Figure 1C, the higher bands were presumed to be an SDS-resistant VPS39 dimer. It's possible both native and non-native folds already existed at that point. Were all fractions used for crystallization trials, or did the authors use only fractions containing the single band?
3. Which column was used in Figure 1C? Are standards available to support VPS39 molecular mass?
Author response (point 1): We thank the reviewer for this helpful suggestion. We have included the relevant alignment as Figure 1B.
Author response (point 2): We have marked the fractions that were pooled, concentrated and used for crystallisation trials on the inset of Figure 1E. We agree that we can't exclude the possibility that the non-native fold was present in the initial purified sample, and that the higher molecular weight band corresponded to an aberrantly folded protein. We have expanded the final sentence of the second paragraph of the discussion to explicitly mention this possibility.
Author response (point 3): We confirm that the chromatogram in Figure 1E is from a preparative Superdex 75 16/600 size-exclusion column. We did not calibrate this column using molecular mass standards when performing the purification. We note that the VPS39 C-terminal domain peak eluted between 94 and 102 mL while the buffer components eluted at approximately 110 mL (small peak evident in Figure 1E), consistent with a small folded domain. However, as stated in the discussion, we can't discount the possibility that at least some of the protein formed higher-order oligomers when purified.

Methods
The authors mentioned snap-freezing purified VPS39 for storage, but did not specify whether fresh or frozen protein was used in crystallization trials. Could the freeze-thaw cycle affect the protein fold? The authors might comment on whether fresh or frozen protein was used for crystallization set up.
Author response: The protein used for crystallisation was freshly purified: following SEC purification the sample was stored overnight at 4°C, and the protein was concentrated and used for crystallisation the following day without being snap-frozen. We apologise for this ambiguity. We have updated the penultimate sentence of the second paragraph of the methods section to explicitly state how the sample used for crystallisation was handled, as follows: "After storage overnight at 4°C, purified VPS39 was concentrated using 3 kDa nominal molecular weight cut-off centrifugal concentrators (Millipore) and subjected to crystallisation trials as described below."

If applicable, is the statistical analysis and its interpretation appropriate? Not applicable
Are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions drawn adequately supported by the results? Partly
… month) crystallization experiment and the presence of a Met-6xHis affinity tag led to refolding of the protein into a non-native structure in the crystallization set-up. They propose that the accidentally created beta-barrel might be a useful template for future protein engineering applications. The results of the study are well documented and the conclusions are clearly presented. I suggest including one additional aspect in the manuscript, which is to discuss if the recombinant protein might not have been properly folded from the beginning. The following points should be considered:
SDS-PAGE analysis of SEC fractions showed the presence of an SDS-resistant species of Vps39 CTD during purification. This might arise from the beta-structures observed in the crystal already being present at this point.
Author response: We agree; we cannot unambiguously assert that all the protein adopted the native conformation at the time of initial purification. We have updated the final sentence of the second paragraph of the discussion to more clearly state this point: "The elution of freshly purified VPS39 C-terminal domain from SEC (Figure 1C) was consistent with this small protein being monomeric, whereas the tetrameric β-barrels of VPS39 in the crystal structure would be likely to elute much earlier, although we concede that formation of a β-barrel fold from the outset remains possible and that the higher molecular mass band observed in SDS-PAGE may represent SDS-resistant β-barrels or other aberrantly folded forms of the VPS39 C-terminal domain."
The used construct contains only 35 residues, which is extremely small. Although it contains all predicted zinc-coordinating sites, it might not comprise the full, stable domain.
Author response: We agree that this predicted domain is small, but its small size is not without precedent.
While we can't unambiguously assert that our choice of domain boundary was correct given the non-native fold we observed, we believe the size of our construct is consistent with expectations for an isolated zinc-binding domain.
A structure-based sequence alignment of the predicted structure of Vps39 CTD (Fig 1B), the observed structure and the structure of Pcf11 zinc finger would be interesting in this context.
Author response: We thank the reviewer for suggesting that we include a sequence alignment of the Pcf11 and VPS39 zinc-binding domains, which we have included as Figure 1B. We have also included a figure of the Pcf11 zinc finger domain, highlighting the region that is not conserved between Pcf11 and VPS39 (Figure 1C). Given the divergence in the predicted versus observed folds of the VPS39 C-terminal domain, we fear that a structure-based sequence alignment would be difficult and potentially uninformative. We have thus not included a second sequence for the VPS39 C-terminal domain, based on the non-native β-barrel fold, in this alignment, but note that a comparison of zinc ligands is presented in Figure 2E.

Minor point: the spelling of TRIS should be corrected
We thank the reviewer for pointing out the correct IUPAC abbreviation for TRIS. We have changed it throughout the manuscript.