Tracy Whelen

Posted by: Ms. Martin, 20 August 2010

Tracy Whelen (Garfield ’10) spent the summer after her senior year as an intern for the Institute for Systems Biology.  While there, she expanded on the programming skills she had gained in Creative Computing as well as on her biology knowledge.  Check out the chromosomal coverage tool she worked on!

Over the summer, Tracy learned about working with a large existing code base, producing graphics and working with several new languages and environments including Perl, SQL, C++, Unix and Vim.  She also worked with biology-specific programming libraries including BioPerl and an Ensembl API.

If you are interested in biology, you might be interested in following in Tracy’s footsteps.  Tracy said she’s more than happy to discuss her experience in greater depth, so please let Ms. Martin know if you’d like to get in touch with her!  In the meantime, let’s take a closer look at what Tracy accomplished…

Week 1

Learn Perl, learn Perl, learn Perl. By the end of Friday it felt like anything more I tried to stuff into my brain was going to bounce off it like a ping pong ball, never mind going in one ear and out the other. However, in four days I got through the six chapters of basic Perl, and I started working through the BioPerl (a special module) code for the graphics.

Week 2

Went from barely understanding the code to modifying a skeleton program so that it drew a graphic that charts observed proteins (downloaded data from PeptideAtlas) based on chromosomal coordinates. In other words, draw a scale the length of the chromosome by base pair length, and then match the proteins along that scale by their length in base pairs.
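The idea of placing features along a chromosome-length scale can be sketched in a few lines. This is a Python illustration only, not the BioPerl code used in the internship; the function names and the image width are hypothetical.

```python
def bp_to_pixel(position_bp, chromosome_length_bp, image_width_px):
    """Map a base-pair coordinate onto a horizontal pixel scale.

    Position 0 lands on pixel 0 and the last base pair lands on the
    right-most pixel (image_width_px - 1).
    """
    return round(position_bp / chromosome_length_bp * (image_width_px - 1))


def feature_span(start_bp, end_bp, chromosome_length_bp, image_width_px):
    """Return the (left, right) pixel positions for one feature."""
    left = bp_to_pixel(start_bp, chromosome_length_bp, image_width_px)
    right = bp_to_pixel(end_bp, chromosome_length_bp, image_width_px)
    return left, right
```

A protein spanning base pairs 250 to 500 on a 1,000 bp scale drawn 101 pixels wide would be placed with `feature_span(250, 500, 1000, 101)`.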

Week 3

Added tracks to show genes and karyotypic banding. The gene track lets you compare and see whether there are genes for which we have no observed proteins. The karyotypic banding shows how base pair positions relate to genetic loci. From here, hopefully I can read the genetic locus for proteins that don’t have chromosomal coordinates listed, and draw them by their genetic locus/band instead. It’s interesting learning a second programming language because of the comparisons I keep making between Python and Perl. I prefer not having to initialize variables or declare them local vs. global, which Python lets you skip, but I like Perl’s regular expression abilities with all their weird characters.  Regular expressions in Perl give almost every piece of punctuation its own special meaning.

Week 4

Added a track to show which genes are seen in PeptideAtlas (i.e., there is a PeptideAtlas protein on that gene). This should cover all the proteins, but it doesn’t: when I compare the coordinates of each, they don’t quite line up. For some reason about half of the chromosomes don’t work (they spit out an error instead of a picture). There’s no clear pattern to this, so I’m investigating. Also made a real histogram of how many genes are present per 1 million base pairs. Apparently gene density has a connection with genes that express proteins. However, this graph basically looks the same as what I already have, due to BioPerl’s staggering feature for overlapping track bits. Now I’m working on zooming in on a section of the chromosome.
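The gene-density histogram mentioned above amounts to counting genes in fixed-width bins. A minimal Python sketch follows, assuming each gene is assigned to a bin by its start coordinate; the post does not say how genes spanning a bin boundary were handled, and `genes_per_bin` is a hypothetical name.

```python
from collections import Counter

BIN_SIZE = 1_000_000  # one bin per million base pairs


def genes_per_bin(gene_starts, bin_size=BIN_SIZE):
    """Count genes per bin, keyed by bin index (0 = first million bp).

    Assumption: a gene belongs to the bin containing its start coordinate.
    """
    counts = Counter(start // bin_size for start in gene_starts)
    return dict(sorted(counts.items()))
```

Bins with no genes are simply absent from the result, so a drawing routine would need to treat a missing key as zero.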

Week 5

Got the zooming feature working. The user already has the option of specifying a region to look at (e.g. start = >30,000,000). I used this to change what portion of the chromosome is shown in the picture.  That meant I had to get the program to read from those boxes and then, depending on what was (or wasn’t) in them, modify the starting/ending locations. Had to use regular expressions to parse >30000000 (or some such) into two pieces: modifier and digits.
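Splitting an entry like >30000000 into a modifier and digits is a natural job for a regular expression with two capture groups. The sketch below is in Python rather than Perl, and the set of accepted modifiers (`<`, `>`, `<=`, `>=`, `=`) and the handling of commas are assumptions, since the post does not list what the input boxes allowed.

```python
import re

# Optional comparison modifier, then digits (commas allowed as separators).
CONSTRAINT = re.compile(r"^\s*([<>]=?|=)?\s*(\d[\d,]*)\s*$")


def parse_constraint(text):
    """Split a box entry into (modifier, number), or None if it is malformed."""
    match = CONSTRAINT.match(text)
    if match is None:
        return None
    modifier = match.group(1) or ""  # empty string when no modifier was typed
    number = int(match.group(2).replace(",", ""))
    return modifier, number
```

Returning `None` for malformed input lets the caller fall back to the full chromosome instead of failing.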

I solved why half the chromosomes wouldn’t draw. There was a karyotype stain type that I hadn’t noticed before, so the program was doing “if a…, if b…, if c… wait, where’s if x…?” Once I noticed this it was easy just to put in “if x…” I’m basically done with this program (all but the last little problems that slowly get discovered), so I’m starting to learn C++ in order to do some tidying work on another program. This next task will make it so that errors actually print useful information, such as “the file name is not in the right format” or some such.
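The missing stain type described above is a common failure mode of a chain of "if" tests with no fallback. One way to guard against it, sketched here in Python, is a lookup table with a default. The stain names follow commonly used cytoband labels and the colors are made up for illustration; the post does not say which stain was missing or how the real program drew each one.

```python
# Hypothetical stain-to-color table; the names and colors are assumptions.
STAIN_COLORS = {
    "gneg": "white",
    "gpos25": "lightgrey",
    "gpos50": "grey",
    "gpos75": "darkgrey",
    "gpos100": "black",
    "acen": "red",
}
DEFAULT_COLOR = "blue"  # used for any stain type not listed above


def color_for_stain(stain):
    """Return a drawing color, falling back instead of failing on unknowns."""
    return STAIN_COLORS.get(stain, DEFAULT_COLOR)
```

With a fallback, an unexpected stain shows up as an oddly colored band rather than an error for the whole chromosome.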

Week 6

C++ is slowly starting to make sense, and learning a third language is easier than the second language, which was easier than the first. It’s pretty different from Python and Perl in its syntax, and because it’s a compiled language. At least I don’t have to also relearn all the basics like what are strings and ints. Pointers are another new concept, combined with dealing with where in memory something is being stored or how much memory it uses. I guess the references in Perl are a little like pointers though. I’ve just done little test programs in C++ so far, although this afternoon things finally started to come together, and I’ve almost got a little program that checks the existence and permission to write for a file. Now I just have to figure out how to check writeability without opening and closing the file, and how to work this all into the big program. Note – by the end of the week I figured out how to check writeability without opening and closing the file.
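The post does not say how the writeability check was finally done in C++. On Unix, one option that avoids opening the file is the POSIX `access()` call, and Python exposes the same check as `os.access`. The sketch below is in Python, with a hypothetical function name and made-up messages, and shows the kind of descriptive result the error-reporting task was after.

```python
import os


def check_output_path(path):
    """Describe whether a file exists and is writable, without opening it."""
    if not os.path.exists(path):
        return "does not exist"
    if os.access(path, os.W_OK):
        return "ok"
    return "exists but is not writable"
```

A check like this is only advisory: permissions can change between the check and the actual write, so the write itself still needs error handling.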

As for the final details in my Perl program, I’ve just realized that intense zooming is problematic because it insists on drawing to the end of a karyotype band for the scale, and that the graphic makes it barf if there are no proteins in the specified range. However this morning I did fix it so that it doesn’t spit errors at you every time you try a non-human build.  I also wrote up a wiki page that explains how to access/make the protein graph, and then how to interpret it.

Week 7

Final week of my internship, and today is Friday, the final day. I’ve spent a lot of this week continuing to debug my protein program (Perl program). You never realize how many bugs can be lurking until you have to go hunt them all down. I think I got the last one though, and we should be able to try to roll out again today and hopefully not find any new problems. (Third time’s the charm?) A lot of the issues I’ve had to deal with have been what happens when people use more of the user input options than I had originally been dealing with. I’ve found that I can ignore most of them as long as they input the chromosome number. I’ve also managed to fix last week’s bugs. The no-proteins-in-the-range problem was connected to one that Terry (my boss, and the writer of much of the other code in this program) had to solve, because it had to do with how the data table was drawn from the SQL database. I ended up being able to take out my SQL query after that because it was redundant. Basically, Terry applied the SQL query I was using to the entire program. This past week or two I’ve also been cleaning up my code: getting the indents tidy, removing stray test print statements, etc.

As for C++, I got my test program to work, and now I’ve put that code into the real program. I need a refresher though on how to test this code.

Conclusion

I’ve learned a lot, and had an interesting summer. This was a really great chance to learn in a self-taught format: I had to take the initiative to ask questions and use the web and people I know to find answers to my questions. The more you have to teach yourself, the better you get at it. Also, the more programming languages you know, the easier new ones are to learn. However, sitting at a desk staring at a computer for 8 hours a day isn’t something my eyes or back appreciated, so remember to get up and go walk around a little bit every hour or two! If you have any questions, feel free to contact me; I would be happy to chat with you. Ms. Martin can give you my email address.
