# Find unique sub-strings and preserve sequence

I'm trying to write a method that takes a string, for example a DNA string, and outputs each run of repeated letters as a count followed by the letter, preserving the order of the runs.

For example:

``````
>>dna = AABBBGGGKKDDDD
>>substring(dna) #some method
>>2A3B3G2K4D
``````

I'm guessing I can start with an empty array and write a for loop that iterates over every letter: while the letter stays the same it keeps counting, and when the letter changes it appends the count followed by the letter. I'm just not sure how to write it out syntactically. Any help would be appreciated :)

I would recommend that you give that idea a shot and see where it leads :)

Here is a quick example.

``````
dna = 'AABBBGGGKKDDDD'

def get_sequence(dna):
    sequence = ''
    previous_c = ''
    count = 0
    for c in dna:
        if c == previous_c:
            # Still inside the same run of letters.
            count += 1
        else:
            # A new run starts: write out the finished run, if there is one.
            if len(previous_c) > 0:
                sequence += '{}{}'.format(count, previous_c)
            count = 1
            previous_c = c
    # Write out the last run (skipped when the input is empty).
    if count > 0:
        sequence += '{}{}'.format(count, previous_c)
    return sequence

print(get_sequence('A'))
print(get_sequence(''))
print(get_sequence(dna))
``````

Output:

``````
1A

2A3B3G2K4D
``````
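If you would rather not track the previous letter and the count by hand, the standard library's `itertools.groupby` groups consecutive equal items for you. Here is a minimal sketch of the same idea; the name `run_length_encode` is just something I made up for illustration, and it is not part of the answer above.

```python
from itertools import groupby

def run_length_encode(text):
    """Return each run of identical characters as '<count><char>', in order."""
    # groupby yields (char, iterator over that run) for each consecutive run.
    return ''.join(
        '{}{}'.format(sum(1 for _ in group), char)
        for char, group in groupby(text)
    )

print(run_length_encode('A'))
print(run_length_encode(''))
print(run_length_encode('AABBBGGGKKDDDD'))
```

This should print the same three lines as the loop version. Note that `groupby` only merges adjacent equal letters, so `'AABA'` gives `2A1B1A` rather than `3A1B`, which is what you want when the sequence has to be preserved.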
