The most popular varieties of tea, including black tea, green tea, oolong tea, white tea, and chai, all come from the leaves of the evergreen shrub
Understanding how the tea tree genetically differs from its close relatives may help tea growers figure out what makes
Previous studies have suggested that tea owes much of its flavor to a group of antioxidants called flavonoids, molecules that are thought to help plants survive in their environments. One, a bitter-tasting flavonoid called catechin, is particularly associated with tea flavor. Levels of catechin and other flavonoids vary among
Researchers from Kunming Institute of Botany gathering tea tree leaves. Credit: Chao Shi
Caffeine and flavonoids such as catechins are not proteins (and therefore not encoded in the genome directly), but genetically encoded proteins in the tea leaves manufacture them. All
Gao and his colleagues estimate that more than half of the base pairs (67%) in the tea tree genome are part of retrotransposon sequences, or "jumping genes," which have copied and pasted themselves into different spots in the genome numerous times. This large number of retrotransposons dramatically expanded the size of the tea tree genome and may also have produced many duplicates of certain genes, including disease-resistance genes. The researchers think that these "expanded" gene families must have helped tea
However, these duplicated genes and the large number of repeat sequences also turned assembling a tea tree genome into an uphill battle. "Our lab has successfully sequenced and assembled more than twenty plant genomes," says Gao. "But this genome, the tea tree genome, was tough."
Li-zhi Gao collecting tea tree leaves to be sequenced. Credit: Yong-sheng Yi
For one thing, the tea tree genome turned out to be much larger than initially expected. At 3.02 billion base pairs in length, it is more than four times the size of the coffee plant genome and much larger than the genomes of most sequenced plant species. Further complicating the picture is the fact that many of its genes are duplicates or near-duplicates. Whole genomes are too long to sequence in one piece, so instead, scientists must copy thousands upon thousands of genome fragments, sequence them, and identify overlapping sequences that appear in multiple fragments. Those overlap sites become signposts for lining up the fragments in the correct order. However, when the genome itself contains sequences that are repeated hundreds or thousands of times, those overlaps disappear into the crowd of repeats; it's like assembling a million-piece puzzle where all the middle pieces look almost exactly alike.
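To make the overlap idea concrete, here is a toy sketch in Python. It is only an illustration, not the assembly software the team used: the function names, the two tiny made-up sequences, and the three-letter minimum overlap are all assumptions chosen for the example. It checks, for each fragment, how many other fragments could plausibly come next. With no repeats, every fragment has at most one possible neighbor; once a repeated stretch appears, some fragments have several.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    Returns 0 if the best match is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def successors(reads: list[str], min_len: int = 3) -> dict[str, list[str]]:
    """Map each fragment to the other fragments that could follow it."""
    return {
        r: [s for s in reads if s != r and overlap(r, s, min_len) > 0]
        for r in reads
    }


def ambiguous(reads: list[str], min_len: int = 3) -> list[str]:
    """Fragments with more than one possible next fragment."""
    return [r for r, nxt in successors(reads, min_len).items() if len(nxt) > 1]


# Hypothetical fragments of a repeat-free toy sequence, ATGCGTACCTGA.
unique_reads = ["ATGCGT", "CGTACC", "ACCTGA"]

# Hypothetical fragments of ATGGGTTTCAGGGTTTAC, where GGGTTT occurs twice.
repeat_reads = ["ATGGGTTT", "GGGTTTCA", "GGGTTTAC", "CAGGGTTT"]

print(ambiguous(unique_reads))   # [] : every fragment has one clear neighbor
print(ambiguous(repeat_reads))   # ['ATGGGTTT', 'CAGGGTTT'] : two candidates each
```

In the second case, both fragments that end in the repeat overlap equally well with both fragments that begin with it, so the overlaps alone cannot say which order is correct. A genome in which roughly two-thirds of the sequence is repetitive multiplies that ambiguity enormously.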
All told, even with modern sequencing technology, assembling the genome took the team more than five years.
And still, there is more work to do, both in terms of double-checking the genome draft and in terms of sequencing different tea tree varieties from around the world. "Together with the construction of genetic maps and new sequencing technologies, we are working on an updated tea tree
Germplasm Bank of Wild Species
Kunming Institute of Botany, Chinese Academy of Sciences
Prof. GAO Lizhi