lab 6 problem 1: dna. dna given a string with length n, determine the number of occurrences of some...
Lab 6
Problem 1: DNA
Given a string of length N, determine the number of occurrences of some given substrings (each of length K) in that string.
For instance:
String: ACGTAC (N = 6)
Substring: AC (K = 2)
Answer: AC occurs 2 times in ACGTAC.
A substring is a consecutive part of a string.
Note that AG is not a substring of ACGTAC, because A and G are never adjacent in it.
Brute-force Algorithm
For each query, iterate through the entire string.
For each position in the string, check the substring starting there, and increment the count on a match.
DNA (70%)

    // Only start positions 0 .. N - K leave room for a full pattern;
    // looping up to i < N would read past the end of text.
    for (int i = 0; i + K <= N; i++) {
        boolean found = true;
        for (int j = 0; j < K; j++) {
            if (text[i + j] != pattern[j]) { // character mismatch
                found = false;
                break;
            }
        }
        if (found) counter++;
    }

We can answer one query in O(N·K): for every start index i, we compare the substring of length K starting at i with the query.
Hence, with Q queries, the time complexity is O(Q·N·K).
DNA (100%)
Java hash table
Key: substring
Value: number of occurrences of the substring
Iterate through the string once to populate the hash table: O(N·K)
Constant time for each query
For ACGTAC with K = 2, a window of length 2 slides over the string: AC, CG, GT, TA, AC.
Store the substrings as keys: AC, CG, GT, TA.
We will have:
occur[AC] = 2
occur[CG] = 1
occur[GT] = 1
occur[TA] = 1

    for (int i = 0; i < N - K + 1; i++) {
        occur[hash(i, K)]++; // count the substring of length K starting at index i
    }

After we have built the table, we can answer a query in O(1) by searching the hash table with the query as the key (treating K as a small constant, since hashing the key reads its K characters).
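The table-building loop and the lookup can be sketched in Java as follows. This is a minimal sketch, not the lab's reference solution: it assumes `java.util.HashMap` as the hash table, and the class and method names (`DnaHashTable`, `buildTable`, `query`) are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class DnaHashTable {
    // Maps every length-k substring of text to its number of occurrences.
    static Map<String, Integer> buildTable(String text, int k) {
        Map<String, Integer> occur = new HashMap<>();
        for (int i = 0; i + k <= text.length(); i++) {
            // substring(i, i + k) copies k characters, so building costs O(N*K) overall.
            occur.merge(text.substring(i, i + k), 1, Integer::sum);
        }
        return occur;
    }

    // A key that is absent from the table never occurs in the text.
    static int query(Map<String, Integer> occur, String pattern) {
        return occur.getOrDefault(pattern, 0);
    }

    public static void main(String[] args) {
        Map<String, Integer> occur = buildTable("ACGTAC", 2);
        System.out.println(query(occur, "AC")); // prints 2
        System.out.println(query(occur, "AG")); // prints 0
    }
}
```

Using `getOrDefault` avoids a null check for substrings that were never inserted, such as AG.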
Alternative
What if we do not have the Java hash table API?
DNA – V2
Implement our own hash table!
Since K is very small, we can use a simple hash function and an array as the table.
Hash function?
First, we map A to 1, C to 2, G to 3, and T to 4 (a DNA sequence contains only A, C, G, and T).
We only need to store the number related to each substring. Each character becomes one decimal digit, so AC = 12, CG = 23, GT = 34, TA = 41.
We will have:
occur[12] = 2
occur[23] = 1
occur[34] = 1
occur[41] = 1

    for (int i = 0; i < N - K + 1; i++) {
        occur[hash(i, K)]++; // count the substring of length K starting at index i
    }

After we have built the table, we can answer a query in O(K) by calculating the hash value X of the substring in that query.
Output the value in occur[X].
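One way to realise this array-based table is sketched below. It follows the digit mapping from the slides (A=1, C=2, G=3, T=4, read as a decimal number), but the helper names `digit`, `hash`, `buildTable`, and `query` are illustrative, and `hash` here takes the string explicitly rather than only an index. The sketch assumes K is small enough that an array of 10^K counters fits in memory.

```java
final class DnaArrayTable {
    // Digit mapping from the slides: A=1, C=2, G=3, T=4.
    static int digit(char c) {
        switch (c) {
            case 'A': return 1;
            case 'C': return 2;
            case 'G': return 3;
            case 'T': return 4;
            default: throw new IllegalArgumentException("not a DNA character: " + c);
        }
    }

    // Hash of the length-k substring of s starting at start, read as a decimal number (AC -> 12).
    static int hash(String s, int start, int k) {
        int value = 0;
        for (int j = 0; j < k; j++) {
            value = value * 10 + digit(s.charAt(start + j));
        }
        return value;
    }

    // Builds the counter array; assumes k is small, since the array has 10^k entries.
    static int[] buildTable(String text, int k) {
        int size = 1;
        for (int j = 0; j < k; j++) {
            size *= 10;
        }
        int[] occur = new int[size];
        for (int i = 0; i + k <= text.length(); i++) {
            occur[hash(text, i, k)]++;
        }
        return occur;
    }

    // Answers a query in O(K); assumes pattern has exactly the length k used to build the table.
    static int query(int[] occur, String pattern) {
        return occur[hash(pattern, 0, pattern.length())];
    }

    public static void main(String[] args) {
        int[] occur = buildTable("ACGTAC", 2);
        System.out.println(query(occur, "AC")); // prints 2
        System.out.println(query(occur, "TA")); // prints 1
    }
}
```

Because every substring of a fixed length maps to a distinct number, no collision handling is needed. The price is that many array slots (for example any index containing the digit 0) stay unused.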
Problem 2: Find Substring
Given 2 strings, output for a substring:
0: if the substring is in neither string 1 nor string 2
1: if the substring is only in string 1
2: if the substring is only in string 2
3: if the substring is in both string 1 and string 2
Find Substring (70%)
Check the existence of the substring in both strings to determine the answer.
This problem is very similar to the DNA problem: a substring is in a string if its number of occurrences is greater than 0.
It can be solved using the same technique as DNA (70%).
Find Substring (100%)
It is possible to reuse the solution for DNA: if the number of occurrences of a substring in a given string is greater than 0, the substring can be found in that string.
You need 2 tables, one for the first string and another for the second string.
For example, suppose we have 2 strings, ACGTAC and ACTGCA.
Use the same technique as in DNA to build a table for each of them.
After we have built the tables, we can answer a query in O(1), e.g. check occurOne.get("AC") and occurTwo.get("AC").
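The two-table lookup can be sketched as below. This is an illustrative sketch under two assumptions: all queries have the same length K (as in the DNA problem), and `java.util.HashMap` serves as the table. The names `FindSubstring`, `buildTable`, and `classify` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class FindSubstring {
    // Same table as in the DNA problem: length-k substring -> number of occurrences.
    static Map<String, Integer> buildTable(String text, int k) {
        Map<String, Integer> occur = new HashMap<>();
        for (int i = 0; i + k <= text.length(); i++) {
            occur.merge(text.substring(i, i + k), 1, Integer::sum);
        }
        return occur;
    }

    // Returns 0 (in neither), 1 (only string 1), 2 (only string 2), or 3 (both).
    static int classify(Map<String, Integer> occurOne, Map<String, Integer> occurTwo, String query) {
        boolean inOne = occurOne.getOrDefault(query, 0) > 0;
        boolean inTwo = occurTwo.getOrDefault(query, 0) > 0;
        return (inOne ? 1 : 0) + (inTwo ? 2 : 0);
    }

    public static void main(String[] args) {
        Map<String, Integer> occurOne = buildTable("ACGTAC", 2);
        Map<String, Integer> occurTwo = buildTable("ACTGCA", 2);
        System.out.println(classify(occurOne, occurTwo, "AC")); // prints 3
        System.out.println(classify(occurOne, occurTwo, "CG")); // prints 1
        System.out.println(classify(occurOne, occurTwo, "TG")); // prints 2
        System.out.println(classify(occurOne, occurTwo, "AA")); // prints 0
    }
}
```

The four output codes behave like two bit flags: 1 for "found in string 1" and 2 for "found in string 2", so adding them yields the required answer directly.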
Incantation-E
Task
Find an interval (continuous section) that:
◦ Contains all incantations
◦ Has minimal total length
{acer, wei, wei, acer, acer, jing, acer, wei}
Idea
{acer, wei, wei, acer, acer, jing, acer, wei}
Maintain the interval using a queue.
◦ Step 1: Initially the queue is empty: {[] acer, wei, wei, acer, acer, jing, acer, wei}
◦ Step 2: While the queue does not contain all words, add words at the back of the queue: {[acer, wei, wei, acer, acer, jing], acer, wei}
◦ Step 3: While the front of the queue is redundant, pop it out, and update the minimum total length: {acer, wei, [wei, acer, acer, jing], acer, wei}, min = 15
◦ Step 4: If the end of the list has not been reached, add the next word at the back of the queue, and go to Step 3.
◦ Final answer: {acer, wei, wei, acer, acer, [jing, acer, wei]}, min = 11
Time complexity: O(N).
How to check whether the first word in the queue is redundant?
◦ Use hashing to store each word's number of occurrences in the queue; the front word is redundant if its count is greater than 1.
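The queue-and-hashing idea can be sketched in Java as follows. This is a sketch under stated assumptions, not the lab's reference solution: the total length of an interval is taken to be the sum of its word lengths (which matches min = 15 and min = 11 in the example), redundant front words are trimmed eagerly after every insertion rather than only once the queue first contains all words, and the names `Incantation` and `minTotalLength` are made up.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

final class Incantation {
    // Returns the minimal total word length of an interval containing every distinct word.
    static int minTotalLength(String[] words) {
        if (words.length == 0) {
            return 0;
        }
        int distinct = new HashSet<>(Arrays.asList(words)).size();
        Deque<String> queue = new ArrayDeque<>();
        Map<String, Integer> inQueue = new HashMap<>(); // word -> occurrences inside the queue
        int total = 0;                                  // sum of word lengths inside the queue
        int best = Integer.MAX_VALUE;
        for (String word : words) {
            // Add the next word at the back of the queue.
            queue.addLast(word);
            inQueue.merge(word, 1, Integer::sum);
            total += word.length();
            // The front word is redundant while it occurs again later in the queue.
            while (inQueue.get(queue.peekFirst()) > 1) {
                String front = queue.pollFirst();
                inQueue.merge(front, -1, Integer::sum);
                total -= front.length();
            }
            // Counts never drop to 0, so the map size is the number of distinct words queued.
            if (inQueue.size() == distinct) {
                best = Math.min(best, total);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] words = {"acer", "wei", "wei", "acer", "acer", "jing", "acer", "wei"};
        System.out.println(minTotalLength(words)); // prints 11
    }
}
```

Each word enters and leaves the queue at most once, so the loop performs O(N) hash-table operations in total.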