Fig. 1
From: Precise engineering of gene expression by editing plasticity

Project overview. A Model architecture of Basenji2. The architecture of Basenji2 contains M convolution blocks followed by N dilated residual blocks. B Sequence-to-expression deep learning model with a long input. For each gene, the genomic sequence from 100 kb upstream to 20 kb downstream of its TSS was extracted. The Basenji2-long model contains seven convolution blocks (M = 7) followed by eleven dilated residual blocks (N = 11). C Model-based identification of CREs. For each genomic sequence, a deep interpretability method estimates a contribution score for each base, yielding a contribution score vector of the same length as the input. A peak-calling algorithm is then used to identify candidate CREs from this vector. D Validation of candidate CREs. UMI-STARR-seq is used to measure the activities of model-identified candidate CREs. E Sequence-to-expression deep learning model with a short input. For each gene, the proximal regulatory sequences were used as the input, including the promoter, 5’UTR, 3’UTR, and terminator sequences. The Basenji2-3K model contains seven convolution blocks (M = 7) followed by four dilated residual blocks (N = 4). F Theoretical guidance for gene editing. Editing plasticity estimates the expression changes caused by simulated deletions. Evolvability space estimates the expression changes caused by simulated single-nucleotide mutations and displays three distinct patterns. Both reflect the gene editing potential. G AI-guided precise editing scheme. Using editing plasticity and evolvability space, the AI designs CRISPR-Cas9 editing schemes for genes with editing potential, with the aim of precise regulation of expression and crop genetic improvement.
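The simulated edits described in panel F can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `window` and `step` parameters, and the toy predictor (GC fraction) are all hypothetical stand-ins for a trained sequence-to-expression model, and a real fixed-length model would also need the shortened deletion mutants padded back to the input length.

```python
from typing import Callable, Dict, List, Tuple

BASES = "ACGT"


def deletion_effects(
    seq: str, predict: Callable[[str], float], window: int, step: int = 1
) -> List[Tuple[int, float]]:
    """Predicted expression change for each simulated deletion of `window` bases."""
    ref = predict(seq)
    effects = []
    for start in range(0, len(seq) - window + 1, step):
        # Remove seq[start:start + window]; no padding is applied in this sketch.
        mutant = seq[:start] + seq[start + window:]
        effects.append((start, predict(mutant) - ref))
    return effects


def snv_effects(
    seq: str, predict: Callable[[str], float]
) -> Dict[Tuple[int, str], float]:
    """Predicted expression change for every possible single-nucleotide substitution."""
    ref = predict(seq)
    effects = {}
    for pos, base in enumerate(seq):
        for alt in BASES:
            if alt != base:
                mutant = seq[:pos] + alt + seq[pos + 1:]
                effects[(pos, alt)] = predict(mutant) - ref
    return effects


def toy_predict(seq: str) -> float:
    """Hypothetical placeholder for a trained model: returns the GC fraction."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


if __name__ == "__main__":
    print(deletion_effects("AAGG", toy_predict, window=2))
    print(len(snv_effects("ACGT", toy_predict)))
```

In practice `predict` would wrap the trained model, and the resulting effect sizes would be summarized per gene to judge how much its expression can be shifted by editing.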