Project Euler #300

Project Euler #300

In a very simplified form, we can consider proteins as strings consisting of hydrophobic (H) and polar (P) elements, e.g. HHPPHHHPHHPH. For this problem, the orientation of a protein is important; e.g. HPP is considered distinct from PPH. Thus, there are 2ⁿ distinct proteins consisting of n elements.

When one encounters these strings in nature, they are always folded in such a way that the number of H-H contact points is as large as possible, since this is energetically advantageous. As a result, the H-elements tend to accumulate in the inner part, with the P-elements on the outside. Natural proteins are folded in three dimensions of course, but we will only consider protein folding in two dimensions.

Assuming that H and P elements are equally likely to occur in any position along the string, the average number of H-H contact points in an optimal folding of a random protein string of length 8 turns out to be 850 / 2⁸ = 3.3203125.

What is the average number of H-H contact points in an optimal folding of a random protein string of length 15? Give your answer using as many decimal places as necessary for an exact result.

This was a rather cool one. Unfortunately it required a brute force method. Nevertheless without some optimizations (or – in my case – lazy dumb ass parallelization) it was impossible to compute without waiting long hours. Running time on a single CPU – 10 hours. On a 96-core grid – 6.25 minutes 😉 Oh, and I used my mpi::map() helper 😀

8 Comments

dgg32 says:

September 23, 2010 at 3:04 pm

Hi, Stanisław. brute force how? For this problem I don’t even have an idea to write a brute force program. Thanks for any hint.

Reply
admin says:

September 23, 2010 at 7:46 pm

You can imagine every protein chain in this problem as a free (i.e. rotation/translation doesn’t make it a different one) polyomino. If you find a way to generate all free polyominos of length 15 you’re halfway there. Then you can make a list of contact points for every generated polyomino and brute force your way through every possible HP combination of length 15. This way you evaluate 2^15 (32768) HP combinations against contact point lists of ~12500 polyominos (if you do it the proper way, I couldn’t generate less than ~100000 polyominos). For every HP combination you take the polyomino with contact list that gave the best score and the average of those best scores is what you’re supposed to give as an answer. There is also a subtle optimization possible during the contact point lists evaluation step but it’s not crucial so I leave it for you to figure out 😉

Reply
dgg32 says:

September 23, 2010 at 11:04 pm

Hi thanks for the hints. About the 2^15, I think we can half them due to the fact that the best configuration for HPPPHP is also the best for its reverse counterpart PHPPPH.

Just a small question for my full understanding, what do you mean by free polyaminos? Do you mean all the possible 2D configuration of a 15-letters string? If so, I think I can try dynamic programming to pick up those clump configs and drop all those straight lines.

I have read a paper that uses SOMs to run 12000 trials on a 20 aa string. Only 4 of them yielded the optimal answer: 4.

Thank you very much.

Reply
admin says:

September 24, 2010 at 10:19 am

Now that I think of it – ideally I mean all the possible self-avoiding walks of length 15 that produce different sets of contact points. No matter how you generate them, the most important thing it to limit their number, because then you will be testing them against 2^15/2 (obviously you’re right about HPHP having the same best folding as PHPH, etc.) proteins.

Reply
dgg32 says:

September 24, 2010 at 11:06 am

man, just want to say thank you again

Reply

Several days later, I finally have solved it in python. Too many hard code. Reading the threads in the forum, I could only understand their ideas, not their codes.

The only thing that I can’t live with is, I have to first build up a coordinate system and then the contact list. Did you have used a coordinate system? If not, could you tell me how to avoid it? Thanks.

So here is the coordinate part:

import copy
import math
#min:max
contact = []

n = 15

x_action = [1,-1,0,0]
y_action = [0,0,1,-1]

#set({point:(x,y)})
proteins = []
proteins.append({0:(0,0),1:(1,0),2:(2,0),3:(3,0)})
c = []
contact.append(c)
proteins.append({0:(0,0),1:(1,0),2:(2,0),3:(2,-1)})
c = []
contact.append(c)
proteins.append({0:(0,0),1:(1,0),2:(1,-1),3:(0,-1)})
c = [(0,3)]
contact.append(c)
proteins.append({0:(0,0),1:(1,0),2:(1,-1),3:(1,-2)})
c = []
contact.append(c)
proteins.append({0:(0,0),1:(1,0),2:(1,-1),3:(2,-1)})
c = []
contact.append(c)
#print proteins
for step in range(4,n):
    print step, len(proteins)
    lastAA = step - 1
    temp = []
    temp_c = []
    for i in range(len(proteins)):
        lastAA_pos = proteins[i][lastAA]


        for j in range(len(x_action)):
            tentative_pos_x = lastAA_pos[0] + x_action[j]
            tentative_pos_y = lastAA_pos[1] + y_action[j]
            #print (tentative_pos_x, tentative_pos_y)
            found = False
            temp_contact = copy.deepcopy(contact[i])
            for AA in proteins[i].keys():
                if proteins[i][AA] == (tentative_pos_x, tentative_pos_y):
                    found = True
                if math.sqrt((proteins[i][AA][0] - tentative_pos_x)**2 + (proteins[i][AA][1] - tentative_pos_y)**2) == 1 and AA != lastAA:
                    temp_contact.append((AA,step))
                    #print "found"
            if found == False and abs(tentative_pos_x) < 6 and abs(tentative_pos_y) < 5:
                #print (tentative_pos_x, tentative_pos_y)

                temp_protein = copy.deepcopy(proteins[i])
                temp_protein[step] = (tentative_pos_x, tentative_pos_y)
                #print temp_protein,proteins[i]
                temp.append(temp_protein)
                #print temp

                temp_c.append(temp_contact)




    proteins = temp
    contact = temp_c
print len(contact)

admin says:

October 13, 2010 at 4:27 pm

Glad you’ve made it. I’ve also used coordinates so can’t help you there. How much time does it take to compute the result using your code?

Reply
- Dr. Winderl says:
  
  April 27, 2022 at 7:39 am
  
  Genial, sehr genial.
  
  Reply

8 Comments

Leave a Reply Cancel reply