Write Python code to identify matches and gaps (non-matches) between two aligned protein sequences. The output should report the positions where gaps are found and the number of matches and gaps.
Solution
Here is a simple Python function that identifies matches and gaps between two aligned protein sequences:
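def compare_sequences(seq1, seq2):
    # Aligned sequences must have the same length
    if len(seq1) != len(seq2):
        raise ValueError('Aligned sequences must have the same length')
    matches = 0
    gaps = []
    for i in range(len(seq1)):
        # Count a match only for identical residues, not for two gap characters
        if seq1[i] == seq2[i] and seq1[i] != '-':
            matches += 1
        # Record the 1-indexed position when either sequence has a gap
        elif seq1[i] == '-' or seq2[i] == '-':
            gaps.append(i+1)
    return matches, gaps

# Test the function
seq1 = 'ACGT-ACGT'
seq2 = 'ACGTACGT-'
matches, gaps = compare_sequences(seq1, seq2)
print('Number of matches:', matches)
print('Number of gaps:', len(gaps))
print('Positions of gaps:', gaps)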
In this code, the function compare_sequences takes two aligned sequences of equal length as input. It initializes a counter for matches and a list for gap positions, then iterates over the positions. If the characters at a position are the same in both sequences, it increments the match counter. Otherwise, if either sequence has a gap ('-') at that position, it adds the position (1-indexed) to the gap list. Positions where two different residues are aligned (mismatches) are counted as neither. Finally, it returns the number of matches and the list of gap positions; the number of gaps is the length of that list.
The test case uses two sequences with gaps at different positions: position 5 in the first sequence and position 9 in the second. The function call and print statements report 4 matches and gap positions [5, 9].
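The question treats gaps loosely as "non-matches", so it may also help to report plain mismatches and to say which sequence carries each gap. The following is a minimal sketch of that idea, not part of the original answer; the function name summarize_alignment, the gap_char parameter, and the dictionary keys are all hypothetical choices made for illustration.

```python
def summarize_alignment(seq1, seq2, gap_char="-"):
    """Classify every column of a pairwise alignment (hypothetical helper)."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")

    summary = {"matches": 0, "mismatches": [], "gaps_seq1": [], "gaps_seq2": []}
    # enumerate(..., start=1) yields 1-indexed alignment positions
    for pos, (a, b) in enumerate(zip(seq1, seq2), start=1):
        if a == gap_char:
            summary["gaps_seq1"].append(pos)
        if b == gap_char:
            summary["gaps_seq2"].append(pos)
        if a == gap_char or b == gap_char:
            continue  # a gapped column is neither a match nor a mismatch
        if a == b:
            summary["matches"] += 1
        else:
            summary["mismatches"].append(pos)
    return summary


result = summarize_alignment("ACGT-ACGT", "ACGTACGT-")
print(result)
```

For the test sequences used above, this reports 4 matches, mismatches at positions 6, 7 and 8, a gap in the first sequence at position 5, and a gap in the second at position 9. A column where both sequences have a gap is recorded in both gap lists.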