Find unique sub-strings and preserve sequence

I'm trying to write a method that takes a string, for example a DNA string, and outputs each run of repeated characters as a count followed by the character, preserving the order in which the runs appear.

For example:

>>> dna = 'AABBBGGGKKDDDD'
>>> substring(dna)  # some method
'2A3B3G2K4D'

My guess is that I can start with an empty list and loop over every letter: while the letter stays the same, I increment a count, and when it changes, I append the count and the letter to the result. I'm just not sure how to write that syntactically. Any help would be appreciated :)

I would recommend that you give that idea a shot and see where it leads :)

1 Answer

Here is a quick example.

dna = 'AABBBGGGKKDDDD'


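def get_sequence(dna):
    sequence = ''
    previous_c = ''  # character of the run currently being counted
    count = 0        # length of the current run
    for c in dna:
        if c == previous_c:
            count += 1
        else:
            # A new run starts: flush the finished run (if any) first.
            if len(previous_c) > 0:
                sequence += '{}{}'.format(count, previous_c)
            count = 1
            previous_c = c
    # Flush the last run; count stays 0 only for an empty input.
    if count > 0:
        sequence += '{}{}'.format(count, previous_c)
    return sequence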


print(get_sequence('A'))
print(get_sequence(''))
print(get_sequence(dna))

Output:

1A

2A3B3G2K4D
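
As an alternative to the hand-written loop, the same count-then-character encoding can probably be sketched with itertools.groupby from the standard library, which groups consecutive equal items. The function name run_length_encode is hypothetical and not part of the original answer; treat this as a minimal sketch rather than the recommended solution.

```python
from itertools import groupby


def run_length_encode(text):
    # Hypothetical helper: groupby yields (character, run) pairs, where
    # run is an iterator over one stretch of consecutive equal characters.
    parts = []
    for char, run in groupby(text):
        run_length = sum(1 for _ in run)  # count the items in this run
        parts.append('{}{}'.format(run_length, char))
    return ''.join(parts)


print(run_length_encode('AABBBGGGKKDDDD'))  # 2A3B3G2K4D
```

Because groupby only merges adjacent equal characters, the order of the runs is preserved, and an empty input yields an empty string.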
