Posts

Showing posts from June, 2022

open horizons "Levenshtein Distance" Pattern Matching for DNA Sequencing Data

Pattern Matching for DNA Sequencing Data Using Spring Batch and Levenshtein Distance

Author: Wadï Mami
E-mail: wmami@steg.com.tn / didipostman77@gmail.com
Date: 24/06/2022

DNA is a sequence of letters drawn from A, C, G and T. Searching for a specific sequence is often difficult because of measurement errors, mutations or evolutionary alterations, so measuring the similarity of two sequences with Levenshtein distance is more useful than looking for exact matches. Instead of Karp Rabin, we will therefore use Levenshtein distance or Jaro-Winkler similarity, both available in the package org.apache.commons.text.similarity.

So Spring Batch + Levenshtein distance or Jaro-Winkler similarity = how CRISPR-Cas9 works, according to https://www.tudelft.nl/en/2018/tu-delft/mathematics-explains-why-crispr-cas9-sometimes-cuts-the-wrong-dna

Here is the initial project: https://didipostmanprojects.blogspot.com/2022/06/spring-...
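The idea of tolerating a few errors rather than demanding an exact match can be sketched in plain Java. This is only an illustration under stated assumptions, not the project's code: the class name ApproxDnaMatcher, the method names, the fixed-size sliding window and the maxDistance threshold are all hypothetical. The hand-written levenshtein method stands in for LevenshteinDistance.getDefaultInstance().apply(a, b) from org.apache.commons.text.similarity so that the example runs without extra dependencies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not taken from the original project.
class ApproxDnaMatcher {

    // Two-row dynamic-programming Levenshtein distance
    // (insertions, deletions and substitutions all cost 1).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumption: compare the pattern against every window of the same
    // length and keep the start offsets within maxDistance edits.
    static List<Integer> findApproximate(String dna, String pattern, int maxDistance) {
        List<Integer> hits = new ArrayList<>();
        int m = pattern.length();
        for (int i = 0; i + m <= dna.length(); i++) {
            if (levenshtein(dna.substring(i, i + m), pattern) <= maxDistance) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // The window at offset 4 is "ACGA", one substitution away from "ACGT".
        System.out.println(findApproximate("ACGTACGAACGT", "ACGT", 1)); // prints [0, 4, 8]
    }
}
```

With maxDistance set to 0 the search degenerates to exact matching; raising it admits windows that differ by that many edits, which is the behaviour the post wants for noisy or mutated sequences.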

Spring Batch Karp Rabin

Pattern Matching for DNA Sequencing Data Using Spring Batch and Karp Rabin

Author: Wadï Mami
E-mail: wmami@steg.com.tn / didipostman77@gmail.com
Date: 17/06/2012

Abstract: Processing large volumes of data has always been a major problem because the volume of data keeps growing. Batch processing applies to many use cases, so why not pattern matching for DNA sequencing data? In this article, I demonstrate batch processing with one of the Spring projects, Spring Batch, which provides functions for processing large volumes of data in batch jobs. In our case, that means reading a DNA file or database table and finding all the locations of a specified pattern inside a DNA sequence.

Spring Batch to process huge data: Spring Batch is a lightweight, comprehensive batch framework designed to enable the development of robust batch applications vital for the daily operations of...
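To make "all the locations of the specified pattern" concrete, here is a minimal sketch of a Karp Rabin search in plain Java. It is an assumption about how such a search could look, not the code of the project: the class name KarpRabinSearch, the method findAll, and the BASE and MOD constants are hypothetical choices, and the Spring Batch reader and writer around it are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not taken from the original project.
class KarpRabinSearch {

    private static final long BASE = 256;            // assumed radix for the hash
    private static final long MOD = 1_000_000_007L;  // assumed prime modulus

    // Returns every start offset at which pattern occurs in text.
    static List<Integer> findAll(String text, String pattern) {
        List<Integer> hits = new ArrayList<>();
        int n = text.length();
        int m = pattern.length();
        if (m == 0 || m > n) {
            return hits;
        }

        // BASE^(m-1) mod MOD, used to remove the leading character.
        long high = 1;
        for (int i = 1; i < m; i++) {
            high = (high * BASE) % MOD;
        }

        // Hash of the pattern and of the first window of the text.
        long patternHash = 0;
        long windowHash = 0;
        for (int i = 0; i < m; i++) {
            patternHash = (patternHash * BASE + pattern.charAt(i)) % MOD;
            windowHash = (windowHash * BASE + text.charAt(i)) % MOD;
        }

        for (int i = 0; ; i++) {
            // Equal hashes can be a collision, so confirm character by character.
            if (patternHash == windowHash && text.regionMatches(i, pattern, 0, m)) {
                hits.add(i);
            }
            if (i + m >= n) {
                break;
            }
            // Roll the window: drop text[i], append text[i + m].
            windowHash = (windowHash - text.charAt(i) * high % MOD + MOD) % MOD;
            windowHash = (windowHash * BASE + text.charAt(i + m)) % MOD;
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("ACGTACGTAC", "ACG")); // prints [0, 4]
    }
}
```

The rolling hash lets each window be checked in constant time on average, which is why the approach suits long sequences read in chunks by a batch job; overlapping occurrences are reported as well.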