Motivation: Both the lack or limitation of experimental data of transcription factor binding sites (TFBS) in plants and the independent evolutions of plant TFs make computational approaches for identifying plant TFBSs lagging behind the relevant human researches. Observing that TFs are highly conserved among plant species, here we first employ the deep convolutional neural network (DeepCNN) to build 265 Arabidopsis TFBS prediction models based on available DAP-seq (DNA affinity purification sequencing) datasets, and then transfer them into homologous TFs in other plants.
Results: DeepCNN not only achieves greater successes on Arabidopsis TFBS predictions when compared with gkm-SVM and MEME, but also has learned its known motif for most Arabidopsis TFs as well as cooperative TF motifs with PPI (protein-protein-interaction) evidences as its biological interpretability. Under the idea of transfer learning, trans-species prediction performances on ten TFs of other three plants of Oryza sativa, Zea mays and Glycine max demonstrate the feasibility of current strategy.
Availability and implementation: The trained 265 Arabidopsis TFBS prediction models were packaged in a Docker image named TSPTFBS, which is freely available on DockerHub at //hub.docker.com/r/vanadiummm/tsptfbs. Source code and documentation are available on GitHub at: //github.com/liulifenyf/TSPTFBS.