The University of Alabama at Birmingham has requested that the Research Collaboratory for Structural Bioinformatics Protein Data Bank remove certain protein structure files deposited by a former UAB employee. UAB also has identified nine publications related to the same protein structures that should be retracted from various scientific journals, and is making those journals aware of this matter.
Allegations of data fabrication and/or falsification were made concerning certain protein structures published by the former UAB employee. In accordance with UAB’s scientific integrity policy, and that of the Office of Research Integrity of the U.S. Department of Health & Human Services, UAB empanelled a committee of experts with no conflicting interests to investigate these allegations. After a thorough examination of the available data, which included a re-analysis of each structure alleged to have been fabricated, the committee found a preponderance of evidence that structures 1BEF, 1CMW, 1DF9/2QID, 1G40, 1G44, 1L6L, 2OU1, 1RID, 1Y8E, 2A01, and 2HR0 were more likely than not falsified and/or fabricated and recommended that they be removed from the public record. The former employee was H.M. Krishna Murthy, who was found by the Investigation Committee to be solely responsible for the fraudulent data.
“Scientific misconduct is absolutely unacceptable,” said UAB Scientific Integrity Officer Richard B. Marchase, Ph.D., vice president for Research and Economic Development. “It was important that the files be removed from the database and the articles be retracted to ensure that future research in the areas of macromolecular structure analysis and the function of proteins could continue uncompromised by faulty data.”
Structure Summaries (Protein Data Bank Codes)
1BEF is most likely a unique structure that is globally superimposed on structure 1JXP. 1BEF and 1JXP have the same general structure as well as the same crystallographic orientation and translation relative to the origin. However, the crystal forms of 1BEF and 1JXP are distinctly different, with unrelated space groups and unit cells: 1BEF crystallized in P21 (a = 48.4 Å, b = 62.4 Å, c = 39.6 Å, β = 96.7°) while 1JXP crystallized in P6322 (a = b = 96.96 Å, c = 167.1 Å). No other example of such superposition can be found in the PDB. Furthermore, 1BEF appears to be a physically improbable structure, with 1) statistically anomalous geometry, 2) unrealistic electron density and thermal factors, 3) anomalous and unreasonable packing of the central core, and 4) an unacceptable level of inter-atomic clashing. No raw crystallographic data, data reduction output, or any other experimental records that would support the correctness of the structure of 1BEF, or demonstrate that it was an experimentally determined structure, were available for examination.
The B factor distribution, sigma values, geometry and crystal packing for 1CMW are essentially normal. However, the exact numerical relationship of the B factors to those of the 1TAQ structure, which was used as a starting model, shows that B factors were not refined as described in the publication. Specifically, the B factors are identical to an accuracy of 0.01 by an exact numerical difference of 16.00. This could have occurred only if the 1TAQ B factors were copied into 1CMW after subtracting 16.00 and left without refinement. The Fourier maps computed with the structure factors reveal a striking absence of densities corresponding to water molecules, in spite of almost perfect agreement of the density with the submitted coordinates. Taken together, these abnormalities strongly suggest that 1CMW, in large part, corresponds directly to the 1TAQ starting model and the structure factors may have been calculated directly from this model. Therefore, 1CMW does not correctly represent original data. No raw crystallographic data, data reduction output, or any other experimental records that would support the correctness of the structure of 1CMW, or demonstrate that it was an experimentally determined structure, were available for examination.
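The constant-offset relationship described for 1CMW and 1TAQ can be illustrated with a short check. This is only a sketch: the function name constant_offset and the B-factor values are hypothetical and are not taken from either PDB entry, and it is not the committee's actual procedure.

```python
def constant_offset(b_model, b_reference, tol=0.01):
    """Return the shared offset (reference minus model) if every pair of
    B factors differs by the same amount within `tol`; otherwise None."""
    if not b_model or len(b_model) != len(b_reference):
        return None
    diffs = [ref - mod for mod, ref in zip(b_model, b_reference)]
    first = diffs[0]
    if all(abs(d - first) <= tol for d in diffs):
        return round(sum(diffs) / len(diffs), 2)
    return None


# Hypothetical B factors (in Å^2) for matching atoms, for illustration only.
reference = [32.50, 41.27, 28.00, 55.13]
copied = [b - 16.00 for b in reference]   # copied with a fixed shift, never refined
refined = [17.02, 23.90, 13.44, 36.71]    # independently refined values drift apart

print(constant_offset(copied, reference))   # a single shared offset is found
print(constant_offset(refined, reference))  # no shared offset
```

Genuine refinement adjusts each atom's B factor individually, so a single offset shared by every atom to two decimal places would not be expected.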
1DF9 and 2QID
Per Dr. Murthy, 1DF9 was replaced with a corrected file, 2QID, because of bad contacts that had been identified by Dr. Piet Gros et al. The xyz coordinates and thermal factors of the proteins and inhibitor molecules of 1DF9 and 2QID are exactly the same. The R-factors (0.199), free R-factors (0.243), deviations of bonds from ideality (0.018 Å) and number of reflections (41212) are exactly the same for 1DF9 and 2QID. However, 1DF9 contains 331 water molecules while 2QID contains 176. The xyz coordinates and thermal factors of the common water molecules are identical. 1DF9 has 188 bad water-protein contacts (< 2.0 Å) and 19 extremely bad water-protein contacts (< 1.0 Å). 2QID has no bad water-protein contacts (< 2.0 Å). It appears that 2QID was produced by the removal of 155 obviously incorrect water molecules from 1DF9. The differences in the water molecules of 1DF9 compared to 2QID have no apparent experimental explanation. Furthermore, it is not possible that two different models, which differ by over 100 water molecules, show exactly the same fit of model to data, as indicated by the R-factors and free R-factors of 1DF9 and 2QID. No raw crystallographic data, data reduction output, or any other experimental records that would support the correctness of the structures of 1DF9/2QID, or demonstrate that these were experimentally determined structures, were available for examination.
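The contact counts and the water difference quoted for 1DF9 and 2QID rest on simple distance thresholds and subtraction. The sketch below shows the idea. The coordinates are hypothetical, the helper count_bad_water_contacts is an illustrative name, and only the totals 331 and 176 and the 2.0 Å and 1.0 Å cutoffs come from the text.

```python
import math


def count_bad_water_contacts(waters, protein_atoms, cutoff=2.0):
    """Count waters lying closer than `cutoff` (in Å) to any protein atom."""
    return sum(
        1
        for water in waters
        if any(math.dist(water, atom) < cutoff for atom in protein_atoms)
    )


# Hypothetical coordinates in Å, for illustration only.
protein_atoms = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
waters = [
    (0.5, 0.5, 0.0),     # about 0.7 Å from the first atom
    (1.5, 0.0, 0.0),     # 1.5 Å from the first atom
    (2.0, 3.0, 0.0),     # more than 3 Å from both atoms
    (10.0, 10.0, 10.0),  # far from everything
]

bad = count_bad_water_contacts(waters, protein_atoms)                   # < 2.0 Å
very_bad = count_bad_water_contacts(waters, protein_atoms, cutoff=1.0)  # < 1.0 Å

# Water totals quoted in the text for 1DF9 and 2QID.
waters_removed = 331 - 176
print(bad, very_bad, waters_removed)
```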
1G40 was originally deposited in October 2000 with space group symmetry P212121. The reported unit cell had the following parameters: a = 65.3 Å, b = 115.4 Å and c = 121.9 Å, which were the values reported in the publication. This corresponds to a theoretical value of 45,170 possible reflections for this unit cell at a resolution of 2.2 Å, and it agrees with the reported number of reflections in the data set (39,322 reflections; 87% complete). In February 2007, the unit cell was inexplicably changed to: a = 65.3 Å, b = 104.40 Å, c = 141.90 Å. This implies that the diffraction pattern, even if the symmetry were the same, would be very different. Furthermore, the number of reflections in the data set would have to increase. It is simply not believable that such a discrepancy would be the consequence of an honest mistake, i.e., typographical errors on the PDB submission, as claimed by the PI. Also, the distribution of B-factor values bears no relationship to solvent accessibility or crystal contacts, and 1G40 does not contain any water molecules in spite of the good data resolution limit (2.2 Å). Further examination of the structure reveals absurd crystal packing in one particular area (residues Leu36, Pro37, Gly38 and Tyr39), which accounts for 16 of the 19 bad contacts in the structure. This region cannot possibly be correct. It is noteworthy that the protein conformation in this region of the structure is the same in two other structures published by Murthy et al. (1Y8E and 1RID). The 1Y8E and 1RID structures were refined at 2.2 and 2.1 Å, respectively. However, due to different crystallographic symmetry, these regions of the molecule have no bad contacts in 1Y8E or 1RID. The B-factors for Pro37 and Tyr39 are very close to the average overall B-factor for the structure. It is extremely unlikely that a model with such incorrect packing could be refined to an R-factor = 19.8% and Rfree = 23.4% at 2.2 Å resolution.
No raw crystallographic data, data reduction output, or any other experimental records that would support the correctness of the structure of 1G40, or demonstrate that this was an experimentally determined structure, were available for examination.
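The reflection count quoted for 1G40 can be approximated by counting reciprocal-lattice points inside the resolution sphere and dividing by the symmetry multiplicity. The sketch below assumes an orthorhombic cell (volume a·b·c) and a multiplicity of 8 for the mmm Laue class of P212121. It is a standard estimate, not necessarily the committee's own calculation, and the function name is hypothetical.

```python
import math


def estimated_unique_reflections(a, b, c, d_min, multiplicity=8):
    """Estimate unique reflections to resolution d_min (Å) for an
    orthorhombic cell: lattice points in a sphere of radius 1/d_min,
    divided by the Laue-group multiplicity (assumed 8 for mmm)."""
    volume = a * b * c                               # Å^3, orthorhombic cell
    points_in_sphere = (4.0 / 3.0) * math.pi * volume / d_min**3
    return points_in_sphere / multiplicity


original = estimated_unique_reflections(65.3, 115.4, 121.9, 2.2)
revised = estimated_unique_reflections(65.3, 104.40, 141.90, 2.2)
completeness = 39322 / 45170   # reported reflections / theoretical value in the text

print(round(original), round(revised), round(100 * completeness))
```

Under these assumptions the original cell gives a figure very close to the 45,170 stated in the text, and the revised cell, having a larger volume, gives a higher count, which is why the number of reflections would have to increase.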
1G44 has a distribution of B-factor values that bears no relationship to solvent accessibility or crystal contacts. Also, this structure has very low R-factors in spite of unrealistic intermolecular and intramolecular contacts and crystal packing; for example, there are 36 chemically impossible close contacts. No raw crystallographic data, data reduction output, or any other experimental records that would support the correctness of the structure of 1G44, or demonstrate that this was an experimentally determined structure, were available for examination.
1L6L contains 2036 residues, 1011 waters and 64 BOG molecules. This is not the structure reported in Table 3, which supposedly contained 2366 residues, 1522 waters, and 67 BOG molecules. One can explain the discrepancy of waters if one assumes that the “last water,” which is number 1522, was carelessly reported rather than the total number of waters in the PDB file. However, the remaining discrepancies cannot be reconciled. Overall the 1L6L entry contains 21,138 atoms, compared to 21,619 reported in the paper. Furthermore, the lattice packing of 1L6L has considerable gaps that are also hard to reconcile with the 2.3 Å resolution diffraction limit for this structure, and the 1L6L crystals exhibit a solvent content of 78% (Vm = 5.6). According to the Matthews Probability Calculator Server, the probability that this arrangement exists and diffracts to 2.3 Å resolution is 0.28% (see Kantardjieff and Rupp, Protein Science 2003 12:1865-1871). Finally, even if one assumes that 2OU1 and 1L6L represent extremely poor refinements, one cannot reconcile the fact that the PDB entries do not match the publication record. No raw crystallographic data, data reduction output, or any other experimental records that would support the correctness of the structure of 1L6L, or demonstrate that this was an experimentally determined structure, were available for examination.
2OU1 is missing 269 atoms that should be in the file based on Table 3 in the published paper. In addition, 2OU1 has 863 residues and 558 waters, not 869 residues and 761 water molecules as reported in Table 3. Thus, 2OU1 does not directly correspond to the coordinates reported in the Biochemistry paper. In addition, the packing of 2OU1 is highly asymmetric, which is not reflected in the B-factors of the 12 different chains. In addition, the structure registers a 0% MolProbity clash score. These odd features of 2OU1 are not consistent with reported R/Rfree values of 18.7/21.9 at 2 Å resolution with data in the highest shell that exhibit an average I/σI of 10.4. No raw crystallographic data, data reduction output, or any other experimental records that would support the correctness of the structure of 2OU1, or demonstrate that this was an experimentally determined structure, were available for examination.
1RID exhibits poor geometry, improbable B factors, large solvent gaps in the crystal lattice, and an extremely high solvent content (~75%), features which are not consistent with a 2.1 Å structure. 1RID exhibits extremely poor stereochemistry, but excellent refinement statistics. Analysis of 1RID’s updated coordinates using the MolProbity website places the overall quality of the structure in the 0th percentile. This is because the r.m.s. deviations between ideal (Engh-Huber from CNS) and observed parameters are extremely poor and not consistent with the values reported in the published paper. Electron density, calculated using the deposited structure factors, is in excellent agreement with implausible or impossible structural features. Also, 1RID exhibits poor to impossible crystal packing and the lattice shows planes of molecules with no reasonable crystal packing interactions in the “a” direction of the 1RID lattice. For example, one of the closest contacts is A175 Glu to A54 Thr, but these are 4.6 Å apart. Furthermore, analysis of the structure factors reveals unreasonable sigma values, unreasonable inclusion of all low resolution terms out to 77 Å resolution, and an unknown origin of different Rfree flags in original and updated files. The revised data submitted for 1RID are different and no longer contain the low resolution terms. However, neither the original nor revised data are the unmodified structure factors for 1RID as expected from the publication. No raw crystallographic data, data reduction output, or any other experimental records that would support the correctness of the structure of 1RID, or demonstrate that this was an experimentally determined structure, were available for examination.
1Y8E exhibits a number of unlikely or improbable features, including poor geometry, improbable B factors, large solvent gaps in the crystal lattice (at least 8 Å), and an extremely high solvent content (~77%; Vm 5.07). These features are not consistent with a 2.2 Å structure. To justify the structure of 1Y8E, PDB entries 1OCY and 1H6W were cited as proving that crystals with lattice gaps can diffract to high resolution. However, these examples differ from 1Y8E in several ways. 1OCY and 1H6W are viral fiber proteins with large disordered segments, which are part of one protein. In the case of 1Y8E, it is hypothesized that disordered suramin molecules (molecular weight (MW) ~1500) “connect” the lattice. Since there is nothing to hold these molecules in place, such as in a covalent peptide chain, this explanation cannot be accepted. In addition, solvent content and Vm values for 1OCY, 1H6W, and 1EZX, which all contain disordered segments, were compared to 1Y8E using Bernhard Rupp’s Matthews probability server. This analysis revealed solvent content, Vm values, and probabilities (P) for 1OCY (64%, Vm = 3.44, P = 0.38), 1H6W (57%, Vm = 2.87, P = 0.98), 1EZX (50%, Vm = 2.5, P = 1.0), and 1Y8E (76%, Vm = 5.07, P = 0.008, assuming MW = 30,000). This one parameter calls into question the validity of 1Y8E. In addition, electron density, calculated using the deposited structure factors, is in excellent agreement with implausible or impossible structural features. Additional analysis of the structure factors reveals unreasonable sigma values, unreasonable inclusion of all low resolution terms out to 150 Å resolution, and an unknown origin of different Rfree flags in original and updated files. The revised data submitted for 1Y8E are different and no longer contain the extreme low resolution terms. However, the conclusion remains that neither the original nor revised data are the unmodified structure factors for 1Y8E as expected from the publication.
No raw crystallographic data, data reduction output, or any other experimental records that would support the correctness of the structure of 1Y8E, or demonstrate that this was an experimentally determined structure, were available for examination.
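The solvent-content percentages quoted alongside the Vm values for 1Y8E and the comparison entries follow from the usual Matthews approximation, solvent fraction ≈ 1 − 1.23/Vm. The sketch below applies that approximation to the Vm values in the text. It does not reproduce the probability estimates, which come from the Matthews probability server, and the function name is hypothetical.

```python
def solvent_fraction(vm):
    """Approximate solvent fraction from the Matthews coefficient Vm
    (Å^3 per dalton), using the standard estimate 1 - 1.23 / Vm."""
    return 1.0 - 1.23 / vm


# Vm values as quoted in the text for each PDB entry.
vm_values = {"1OCY": 3.44, "1H6W": 2.87, "1EZX": 2.5, "1Y8E": 5.07}

for entry, vm in vm_values.items():
    print(f"{entry}: Vm = {vm}, solvent ~ {100 * solvent_fraction(vm):.0f}%")
```

The approximation agrees with the quoted solvent contents to within about one percentage point, and it shows how far the 1Y8E value lies from the three comparison entries.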
2A01 exhibits a number of abnormalities that suggest the structure was not generated from actual diffraction data. These abnormalities include: almost perfect correspondence of the electron density to physically impossible features, viz. close contacts and poor geometry; abnormal B factor distribution that does not vary along the chain to reflect solvent exposure and packing, even in regions that are extremely exposed to the solvent; crystal packing that is characterized by very few intermolecular contacts and very high solvent content that is inconsistent with the resolution and B factor distribution of the published data; anomalies in the structure factor file, in particular the unreasonable sigma values that are indicative of data that have been computationally generated or manipulated. No raw crystallographic data, data reduction output, or any other experimental records that would support the correctness of the structure of 2A01, or demonstrate that this was an experimentally determined structure, were available for examination by the committee.
The coordinates for 2HR0 do not form a connected network of molecules in the crystal lattice. The diffraction data do not show the features that should arise from the presence of bulk solvent, whereas the molecular arrangement indicates that large regions are not occupied by protein molecules. The values for ksol and Bsol bulk solvent parameters in 2HR0 are far outside the normally accepted ranges for these parameters. It is also noteworthy that nowhere in the Methods section of his Nature paper is there any mention of non-standard bulk solvent corrections to the Fobs values.
The B-factors of the model do not vary significantly throughout the molecule, even though long segments of the chain are almost completely exposed to solvent (Janssen et al., Nature 448:E1-E2, 2007; note Figure 2 of their communication). The Rfree and R distributions are exceptionally low at low resolution, and the difference between Rfree and R is unusually small for a structure refined at 2.3 Å resolution with an amplitude-based target function (Janssen et al., Nature, 9 Aug 2007; note Figure 1b of their communication). Dr. Murthy provided two responses to this allegation: (1) the Rfree and R distribution would be expected if X-ray terms in a restrained refinement were weighted more heavily than usual, and (2) overweighting the X-ray terms would reduce the R-value at the cost of some geometric distortion. Overweighting the X-ray terms can indeed reduce the R-value at the cost of geometric distortion. However, the reported errors in the bond lengths, bond angles and torsion angles all suggest that the geometry was sufficiently restrained during refinement. Furthermore, many of the unrealistic contacts in this structure are far worse than simple geometric distortion.
There are 30 chemically impossible close contacts shorter than 2.2 Å. Despite the large number of physically impossible clashes, the deposited structure factors show remarkably good correspondence in these regions. Inspection of both the σA-weighted 2Fo-Fc and the Fo-Fc electron density maps revealed very well-defined electron densities in every region of bad contacts, with no negative peaks present in the Fo-Fc difference electron density map and with B-factors no higher than elsewhere in the structure. This strongly suggests that the deposited structure factors have been calculated from the structure and do not reflect experimental data. Finally, the range of values for σF is orders of magnitude too large, larger even than the range of structure factor amplitudes. Regarding this point, experimental (“real”) σF values are derived from estimates of measurement uncertainties. For this reason, their values are limited and their range is a small fraction of the range of Fo. However, the range of Fo for 2HR0 is 0 < Fo < 14,215, while the range of σ is 0 < σ < 9948. This range for σ is completely unrealistic. No raw crystallographic data, data reduction output, or any other experimental records that would support the correctness of the structure of 2HR0, or demonstrate that this was an experimentally determined structure, were available for examination.