Where does the interaction data available for search and download on this web portal come from?
The interaction data comes from two sources. The majority comes from a series of systematic screens of ever-larger ORF collections performed in the Vidal lab at Dana-Farber Cancer Institute/Harvard Medical School. These screens used a systematic binary interaction mapping pipeline: primary screens with a high-throughput yeast two-hybrid (Y2H) assay, followed by validation of the dataset with two or more orthogonal assays.
The remainder of the data comes from a curated set of interactions reported in the literature by small-scale studies. The publicly available interaction data were filtered to identify high-quality binary interactions as described in Rolland et al., Cell 2014 (http://www.cell.com/abstract/S0092-8674(14)01422-6).
Does the interaction data originate from experiments and/or predictions?
All of the systematic data come from our screening pipeline, and every interaction in it is supported by at least one piece of experimental evidence.
The literature-curated dataset has been filtered to identify the set of high-quality binary interactions, and each interaction is required to have at least two pieces of experimental evidence in the original publications.
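The evidence requirement described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each literature record is a tuple of two protein names, a publication identifier and a detection method, and that one "piece of evidence" is a distinct publication/method combination. This is not necessarily how Rolland et al. 2014 defined evidence, the function names are hypothetical, and the sketch does not cover the separate step of deciding whether an interaction is binary.

```python
from collections import defaultdict

# Hypothetical evidence record: (protein_a, protein_b, publication_id, detection_method).
EvidenceRecord = tuple[str, str, str, str]


def count_evidence(records: list[EvidenceRecord]) -> dict[tuple[str, str], int]:
    """Count distinct (publication, method) pieces of evidence per unordered pair."""
    evidence: dict[tuple[str, str], set[tuple[str, str]]] = defaultdict(set)
    for a, b, publication, method in records:
        pair = (a, b) if a <= b else (b, a)  # A-B and B-A are the same interaction
        evidence[pair].add((publication, method))  # a repeated record counts only once
    return {pair: len(items) for pair, items in evidence.items()}


def filter_by_evidence(records: list[EvidenceRecord], min_evidence: int = 2) -> list[tuple[str, str]]:
    """Keep pairs supported by at least `min_evidence` distinct pieces of evidence."""
    counts = count_evidence(records)
    return sorted(pair for pair, n in counts.items() if n >= min_evidence)


# Illustrative records with placeholder names, not real curated data.
example_records = [
    ("GENE_A", "GENE_B", "PUB_1", "two-hybrid"),
    ("GENE_B", "GENE_A", "PUB_2", "co-immunoprecipitation"),
    ("GENE_A", "GENE_C", "PUB_1", "two-hybrid"),
    ("GENE_A", "GENE_C", "PUB_1", "two-hybrid"),  # duplicate entry, not new evidence
]
print(filter_by_evidence(example_records))  # [('GENE_A', 'GENE_B')]
```

Here GENE_A/GENE_C is dropped because its two records are the same publication and method, so they count as a single piece of evidence.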
I have my query genes in a different identifier format (neither gene symbols nor UniProt IDs). How can I still use them as queries on this web portal?
Currently, our portal can only be searched using gene symbols or UniProt identifiers, but we are working on supporting a much wider range of gene identifiers. In the meantime, you can convert your list of query genes into gene symbols or UniProt IDs using these websites (http://www.uniprot.org/uploadlists/, https://david.ncifcrf.gov/conversion.jsp).
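If you download a conversion table from one of these services, a short script can apply it to your query list and flag identifiers that did not map. This sketch assumes a tab-separated file with a header row and two columns (source ID, then gene symbol or UniProt ID); check the actual layout of the file the service gives you. The functions load_mapping and convert_queries are hypothetical helpers, not part of this portal or of either conversion service.

```python
import csv
import io
from collections import defaultdict


def load_mapping(handle) -> dict[str, set[str]]:
    """Read a tab-separated mapping with an (assumed) header row: source ID, target ID."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the assumed header row
    mapping: dict[str, set[str]] = defaultdict(set)
    for row in reader:
        if len(row) >= 2 and row[0].strip() and row[1].strip():
            # One source ID may map to several targets, so collect them in a set.
            mapping[row[0].strip()].add(row[1].strip())
    return dict(mapping)


def convert_queries(queries: list[str], mapping: dict[str, set[str]]) -> tuple[list[str], list[str]]:
    """Return (converted IDs, queries that had no mapping)."""
    converted: list[str] = []
    unmapped: list[str] = []
    for query in queries:
        targets = mapping.get(query)
        if targets:
            converted.extend(sorted(targets))
        else:
            unmapped.append(query)
    return converted, unmapped


# Illustrative mapping from NCBI Gene IDs to gene symbols.
example_tsv = "From\tTo\n7157\tTP53\n672\tBRCA1\n"
mapping = load_mapping(io.StringIO(example_tsv))
converted, unmapped = convert_queries(["7157", "672", "UNKNOWN_ID"], mapping)
print(converted)  # ['TP53', 'BRCA1']
print(unmapped)   # ['UNKNOWN_ID']
```

For a real downloaded file, open it with `open(path, newline="")` and pass the file handle to load_mapping; the converted symbols can then be pasted into the search box, and the unmapped list tells you which queries need attention.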
Why does my search not return any PPIs?
So far, we have screened ORFs corresponding to over 17,000 human genes using our binary interaction mapping pipeline (a full list of the genes we have screened is available here: ensembl_gene_ids_screened_at_CCSB.tsv). However, we may not yet have screened your gene of interest because we do not currently have an ORF clone available for it.
Another possibility is that we have screened an ORF of your gene of interest but it did not yield any PPIs. While our binary interaction mapping pipeline is designed to be systematic and unbiased, some proteins may prove refractory to the assays used. For example, (i) proteins that are secreted or require significant post-translational modification may not form stable interactions under our assay conditions, (ii) some human proteins may be unstable or fold incorrectly when expressed in yeast, or (iii) some proteins may interact only as part of large complexes and not as binary pairs. Furthermore, we currently screen a single ORF clone for each gene in our search space. In some cases, this clone may represent a minor alternative isoform that does not form stable protein-protein interactions.
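To check a batch of genes against the screened list, a sketch like the following could be used. It assumes that ensembl_gene_ids_screened_at_CCSB.tsv has one Ensembl gene ID per row in its first column, possibly preceded by a header row; version suffixes such as ".16" are stripped before comparison. The helper names are hypothetical, and the file contents in the example are illustrative only, not an excerpt of the real list.

```python
import csv
import io


def load_screened_ids(handle) -> set[str]:
    """Collect Ensembl gene IDs from the first column of a tab-separated file.

    Rows whose first field does not start with 'ENSG' (e.g. a header) are skipped.
    """
    screened: set[str] = set()
    for row in csv.reader(handle, delimiter="\t"):
        if row and row[0].startswith("ENSG"):
            screened.add(row[0].split(".")[0])  # drop any version suffix
    return screened


def partition_queries(queries: list[str], screened: set[str]) -> tuple[list[str], list[str]]:
    """Split query Ensembl gene IDs into (screened, not screened)."""
    hits: list[str] = []
    misses: list[str] = []
    for gene_id in queries:
        target = hits if gene_id.split(".")[0] in screened else misses
        target.append(gene_id)
    return hits, misses


# With the real file: open("ensembl_gene_ids_screened_at_CCSB.tsv", newline="")
example_file = "ensembl_gene_id\nENSG00000141510\nENSG00000012048\n"
screened = load_screened_ids(io.StringIO(example_file))
hits, misses = partition_queries(["ENSG00000141510.16", "ENSG00000139618"], screened)
print(hits)    # ['ENSG00000141510.16']
print(misses)  # ['ENSG00000139618']
```

Being on the screened list does not guarantee that PPIs were found, for the reasons given in the following paragraph.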
What would be a good confidence score cutoff to filter the interactions?
The confidence score is intended to rank the human binary protein-protein interactions (PPIs) identified in systematic screens at CCSB by their biophysical quality, rather than to serve as an absolute cutoff for filtering interactions. Because the score captures only a small variance in biophysical quality within the dataset, it should not be used to discard PPIs on quality grounds; instead, use it to prioritize a list of PPIs of interest for experimental follow-up.
Why is there not a confidence score for every PPI?
The confidence score of a pair is calculated from several features of how the interaction was detected during screening. These data are available only for pairs detected in the most recent screens (since 2014), so we can calculate confidence scores only for those pairs.
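Combining the two answers above, a list of PPIs can be ordered for follow-up without discarding anything, including pairs that have no confidence score. This minimal sketch assumes that higher scores indicate higher biophysical quality and that a missing score is represented as None; the PPI record type and the prioritize function are hypothetical, and the protein names and scores in the example are made up.

```python
from typing import NamedTuple, Optional


class PPI(NamedTuple):
    protein_a: str
    protein_b: str
    score: Optional[float]  # None when no confidence score is available


def prioritize(ppis: list[PPI]) -> list[PPI]:
    """Order PPIs for follow-up without discarding any.

    Scored pairs come first, highest score first (assumes higher means better);
    pairs without a score follow in their original order.
    """
    scored = sorted((p for p in ppis if p.score is not None), key=lambda p: p.score, reverse=True)
    unscored = [p for p in ppis if p.score is None]
    return scored + unscored


example_ppis = [
    PPI("PROT_A", "PROT_B", 0.42),
    PPI("PROT_A", "PROT_C", None),
    PPI("PROT_B", "PROT_D", 0.91),
    PPI("PROT_C", "PROT_D", 0.10),
]
for ppi in prioritize(example_ppis):
    print(ppi.protein_a, ppi.protein_b, ppi.score)
```

Placing unscored pairs last is only one design choice: a missing score reflects when or how a pair was detected rather than its quality, so keeping unscored pairs as a separate list to review alongside the ranked one may be just as reasonable.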
In which format do I need to save the search results for upload into Cytoscape?
To upload the search results into Cytoscape, export them as a .csv file.
Can I use your unpublished interaction data in my publication?
Yes, there are no restrictions on using small numbers of our unpublished interactions in your publications. We have a narrow 12-month moratorium on the publication of global analyses of the full unpublished dataset. For more details, please see the Guidelines on use of preliminary data (Download Page).
How should I cite the web portal?
A manuscript describing the web portal is in preparation; please check back for updates. To cite the interaction data, please see below.
How should I cite the published interaction data?
If you use published interaction data, please cite the relevant publication (Documentation).
How should I cite the unpublished interaction data?
Users are expected to acknowledge the following in all oral or written presentations, disclosures, or publications of the analyses:
The Center for Cancer Systems Biology (CCSB) at the Dana-Farber Cancer Institute
The funding organization(s) that supported the work:
(1) The National Human Genome Research Institute (NHGRI) of NIH
(2) The Ellison Foundation, Boston, MA
(3) The Dana-Farber Cancer Institute Strategic Initiative
How can I get information on the clone used to identify an interaction returned from my search?
The clones used in our screens come from the ORFeome clone collection assembled at CCSB (http://horfdb.dfci.harvard.edu) and from the ORFeome Collaboration (http://www.orfeomecollaboration.org). Details on the cloning strategy, the source material, and the nucleotide sequences of the clones are provided at our ORFeome web portal (http://horfdb.dfci.harvard.edu). Our next update will provide more detailed experimental information on every interaction displayed on the results page, including the ORFs, fusion constructs, orientations, and Y2H assay versions used to detect each PPI.
Isn't yeast two-hybrid data full of false positives?
No! As with any other experimental approach, the quality of the data depends on careful experimental design and rigorous attention to detail in carrying out the experiments. We have been developing our binary interaction mapping pipeline for over 15 years and have established numerous quality control measures. All of our primary yeast two-hybrid datasets are validated by testing a subset of interactions in at least two orthogonal assays, to ensure that the quality of the dataset is equal to or greater than that of a representative sample of interactions selected from the literature (Rual et al. 2005 Nature (http://www.nature.com/nature/journal/v437/n7062/abs/nature04209); Rolland et al. 2014 Cell (http://www.cell.com/abstract/S0092-8674(14)01422-6)).