Charles Du

Professor, Biology

PhD, Texas A&M University


My research interest is bioinformatics, an interdisciplinary area that merges biotechnology and information technology with the goal of revealing new insights and principles in biology. My current NSF-funded grant, entitled “Facile Production and Efficient Indexing of Transposon-tagged Lines Using Next-generation Sequencing Technology for Maize,” uses next-generation sequencing technology to analyze the maize genome. I have been carrying out bioinformatics analysis for sequence assembly, annotation, and mapping of millions of short reads to the maize reference genome. Another project of mine is the Genomics Education Partnership (GEP). The Partnership is funded by the Howard Hughes Medical Institute and, together with fifty other institutions across the country, aims to provide research opportunities for undergraduate students. My research results have been published in prestigious scientific journals such as Science, Proceedings of the National Academy of Sciences of the United States of America, Genome Research, BMC Biology, BMC Evolutionary Biology, BMC Genomics, and Molecular Plant-Microbe Interactions.


Bioinformatics, Evolutionary Genomics

Office Hours


11:00 am - 12:00 pm
11:00 am - 12:00 pm


Research Projects

A sequence-indexed Reverse Genetics Resource for Maize: a Set of Lines with Single Ds-GFP Insertions Spread Throughout the Genome

This NSF-funded grant provides invaluable resources for researchers to fully exploit the organization of the maize genome. A mutant line in which a single gene has been disrupted by a green fluorescent protein-labeled Ds transposon (Dsg) gives biologists a powerful tool for understanding the action of that gene. To this end, my lab has carried out bioinformatics analysis for sequence assembly, annotation, and mapping of millions of short reads to the maize reference genome. More specifically, we developed a software package, InsertionMapper, to extract Dsg transposon junctions from terabytes of next-generation sequencing data and map them to the maize genome. We also created a web-searchable database hosted at Montclair State University (link below).
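
The idea of extracting a transposon junction and mapping it can be illustrated with a toy example. The sketch below is not InsertionMapper's algorithm: it assumes exact string matching on a single strand, and the function names, the transposon end sequence, and the reference sequence are all made up for illustration. A real pipeline would use a read aligner and handle sequencing errors and both strands.

```python
def extract_junctions(reads, transposon_end, min_flank=10):
    """Return the genomic flank from each read that spans a junction.

    A read is treated as spanning a junction if it contains the
    (hypothetical) transposon end sequence followed by at least
    `min_flank` bases of flanking sequence.
    """
    flanks = []
    for read in reads:
        idx = read.find(transposon_end)
        if idx == -1:
            continue  # read does not contain the transposon end
        flank = read[idx + len(transposon_end):]
        if len(flank) >= min_flank:
            flanks.append(flank)
    return flanks


def map_flank(flank, reference):
    """Exact-match lookup; returns the 0-based position, or -1 if absent."""
    return reference.find(flank)


# Toy data: every sequence here is invented for the example.
reference = "ACGTACGTTTGACCGGATTACAGGCT"
transposon_end = "TTTCATCCCTA"
reads = [
    "GG" + transposon_end + "GACCGGATTACA",  # spans a junction
    "ACGTACGTTTGACC",                        # plain genomic read
]

for flank in extract_junctions(reads, transposon_end):
    print(flank, map_flank(flank, reference))
```

Each reported position marks where the flanking sequence begins in the reference, which in this simplified picture is the insertion site.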

The computational identification and characterization of Helitrons

My 2014 PNAS publication entitled “HelitronScanner uncovers a large overlooked cache of Helitron transposons in many plant genomes” is a seminal paper that will lead to further work by many researchers. Helitron transposons have drawn much attention since their recent discovery because of their active role in reshaping genomes by capturing gene fragments and shuffling exons during transposition. Unlike other DNA transposons, Helitrons do not end in inverted repeats or create target site duplications, so they are challenging to identify. We developed a generalized tool, HelitronScanner, which uncovered a large number of new Helitrons in plant and animal genomes. HelitronScanner provides a valuable resource to the community for large-scale, automated Helitron identification, thus setting the stage for unprecedented comparative and evolutionary studies of this unusual transposon family. These findings are especially important for studying the mechanism of Helitron transposition, which is still an unsolved puzzle. This line of research will continue to be fruitful for years to come and will continue to uncover the many roles that transposable elements have played in shaping extant plant genomes.

Integrated Regulatory Networks

Our new focus is the study of integrated regulatory networks in plants, based on multidimensional motifs and gene expression profiles. The complex relationship between phenotype and genotype is the most important and challenging area in biology. Inferring the regulatory networks that govern the flow of genetic information and final protein production is the key to understanding the underlying genetic mechanisms. We have applied multidimensional motif discovery algorithms to complex biological processes and integrated multiple data sources into a uniform model. More specifically, our method integrates information theory and probability-based algorithms, takes gene and gene product expression profiles under different conditions, and calculates their inter-relationships, expressed as mutual information content with prediction confidence levels. Our newly devised pipeline for regulatory network inference is a valuable tool for mining large volumes of plant genomics data. We analyzed genome-wide spatiotemporal transcriptome RNA-Seq data of B73 maize seed development using this pipeline. The results showed that 91 transcription factors and 1,167 genes are present exclusively in seed development, out of a total of 26,105 investigated genes. This work provides an in-depth dynamic view of the complex regulatory network in maize kernel development. We are going to present our preliminary results at the 57th Annual Maize Genetics Conference, to be held in Chicago in March 2015.
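
As a rough illustration of how mutual information can relate two expression profiles, the sketch below discretizes each profile into bins and computes the mutual information in bits. It is a minimal example under stated assumptions, not our pipeline: the equal-width binning, the function names, and the sample values are all hypothetical, and the confidence-level estimation mentioned above is left out.

```python
import math
from collections import Counter


def discretize(profile, bins=3):
    """Map expression values to integer levels using equal-width bins."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        return [0] * len(profile)  # constant profile: a single level
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in profile]


def mutual_information(x, y):
    """Mutual information (in bits) between two discrete sequences."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


# Hypothetical expression values across four conditions.
tf_profile = [1.0, 2.0, 8.0, 9.0]
target_profile = [0.5, 1.0, 7.5, 9.5]

score = mutual_information(
    discretize(tf_profile, bins=2), discretize(target_profile, bins=2)
)
print(f"mutual information: {score:.3f} bits")
```

A higher score means that knowing one gene's expression level tells more about the other's. Scores computed for every gene pair could then be thresholded to propose candidate network edges.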