Word Vectors for DNA

DNA2Vec results

Word Vectors

With the advent of natural language processing (NLP) techniques powered by deep learning, more detailed relationships between words have been uncovered. Word2vec is quite robust at discovering contextual and semantic relationships. The genome, being a long text, lends itself to similar studies that may reveal yet-undiscovered relationships between DNA k-mers. Dna2vec applies the word2vec approach to the whole genome so that DNA k-mers are represented as vectors. Using this approach, we aim to predict mutation susceptibility based solely on DNA sequence, in order to understand the underlying mechanisms or dynamics of mutations in genomes. While the focus is on sequence-based prediction, the regions where mutations occur are also taken into account so that predispositions can be elucidated accurately.
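To make the "genome as text" idea concrete, here is a minimal sketch of how a DNA sequence might be turned into k-mer "words" and (target, context) pairs of the kind a word2vec-style skip-gram model is trained on. The function names, the fixed k, the one-base stride, and the window size are assumptions chosen for illustration; they are not the settings used in this work, and the embedding training step itself is omitted.

```python
from typing import List, Tuple


def kmer_tokens(sequence: str, k: int) -> List[str]:
    """Slide a window of width k over the sequence, one base at a time.

    Hypothetical helper: overlapping fixed-length k-mers are an
    illustrative choice, not necessarily the tokenization used here.
    """
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def skipgram_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Pair each k-mer with its neighbours within `window` positions."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a token is not its own context
                pairs.append((target, tokens[j]))
    return pairs


# Toy sequence, purely illustrative.
tokens = kmer_tokens("ACGTAC", k=3)
pairs = skipgram_pairs(tokens, window=1)
print(tokens)
print(pairs)
```

Pairs like these are what a skip-gram model consumes: k-mers that tend to appear in similar sequence contexts end up with nearby vectors, which is the property a sequence-based predictor could then build on.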

Assist. Prof. Dr. Alper YILMAZ

My research interests include genome grammar and NGS analysis.
