Monday, August 17, 2009

Goals Met

Got the doctests all prettified.  They are located in biolib/src/mappings/swig/python/test, and called test_bpp_*.py, where the * is utils, numcalc, seq, and popgen.  I have notes for every class, and also for functions whose use isn't very clear.

Just checked over Hilmar's e-mail with the goals he set for me.  I was slightly mistaken--he wanted 60% of the classes wrapped, with 80% of those tested--not 80% with 60% tested as I thought.  I'll quote him directly:

1. Map for Python with SWIG 60% of classes and templates of all Bio++
modules

2. 80% of all methods in those classes and templates have complete doctests
with a clear description of what the method is supposed to do.

Now, I rechecked my list of objects.  It turns out that there are 416 classes, not 432.  That's because 16 of those are nested classes that SWIG can't handle anyway.  The number I have wrapped is 414--I could never get numcalc/LUDecomposition to work, and utils/BppString was also ill-behaved (and totally useless for this project, anyway).  That's 99.5%. The number I've tested thoroughly kept creeping up and down slightly as I edited my doctests, but it settled out at 208.  That's exactly half of the total that I have completely unit-tested.  So that achieves the 48% (80% × 60%), with a little to spare.
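For anyone who wants to check my math, here is a minimal sketch of the arithmetic. The variable names are mine; the counts are the ones quoted above.

```python
# Counts taken from the paragraph above; variable names are illustrative.
total_classes = 416   # 432 listed, minus 16 nested classes SWIG can't handle
wrapped = 414         # everything except LUDecomposition and BppString
tested = 208          # classes with complete doctests

wrapped_fraction = wrapped / total_classes
tested_fraction = tested / total_classes
required_fraction = 0.60 * 0.80   # 60% wrapped, 80% of those tested

print(f"wrapped: {wrapped_fraction:.1%}")
print(f"tested:  {tested_fraction:.1%}")
print(f"needed:  {required_fraction:.0%}")
```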

Am I happy with this?  Yes and no.  I think it's obvious that I managed to put out a lot over the last couple of weeks, and I'm satisfied with my progress.  But it's still short of the goals I set out to do.  Yes, I think those goals were ambitious.  When I set them up, I hadn't looked at the Bio++ code as closely as I should have.  I should have known that doing each library, wrapping and testing, in a week was too much.  But I said I'd do it, and I still feel responsible for it.  So there we go...

I'm also more in touch with Julien and the Bio++ coders, and I think I'd like to switch gears a little now, and maybe work with them to make the code a little more SWIG-friendly.  We'll see how that goes.  I also need to work with Pjotr to get everything incorporated into the whole biolib framework.  So there's still stuff to do.  This autumn I have a teaching gig at a local college, but that's not going to take all of my time.  So I'm going to keep working on bits and pieces of this.  Despite the last month of high stress, I still feel that this project is a good one, with a lot of potential to help scientists do what they need to do, with a minimum of pain.

In the meantime, it's past 3:30 AM, and I'm probably blathering.  So I'll sign off.

שלום (Shalom)

Sunday, August 16, 2009

Not Much to Report

Went through the utils doctest today, adding comments where I thought aspects were unclear.  I'm pretty happy with that document.  Just need to do the other three now...

Also found a place in the doctest where I forgot to demarcate a class.  Thus I have 209 objects, not 208. Woo hoo.  Not that it's a big object--it's just an exception.

Saturday, August 15, 2009

Hilmar's Goal Met...

...at least technically. I'll get to that in a second. Of the 432 objects in Bio++, I have the vast majority of them wrapped (I think all but 3), and 208 of them doc-tested. Hilmar asked me to have 80% of them wrapped, and 60% of those documented.

Managed to uncover some of the formats by doing some fairly involved web hunts. Did Clustal and DCSE. Both of those inherit from a couple other objects, so I could do those too. Also did one of the static "Tools" objects. So yes, I met the quota without "cheating" by unit-testing übersmall Exceptions. (Found out that the Clustal format that Bio++ does is out-of-date, but these can change so fast that that's not totally unexpected. More serious is the off-by-one error that causes it to chop off the first amino acid in a protein.)

Finally figured out why I hadn't been able to post to the various Bio++ message boards, even though I've been getting all their messages. I subscribed with my gmail address, when I was trying to send with my cs.wisc.edu address. It's sorted out now. Actually, I was thinking of doing some coding for them this fall. I'll have some free time, and I think I'll be able to do more good for this project there, working on Bio++ code to make it more palatable for SWIG. At the very least, I'm hoping that I can get all the compilable code into the .cpp files, rather than the .h files. That would make a lot of things easier for SWIG.

So...I think now I've technically met the goals that Hilmar set for me to get a passing grade. However, the doctests are ugly. Basically, they show a ton of examples using the code, with barely any explanation. But I need to make sure that these documents are useful. So...this weekend I'm going to work on beautifying them, so that they might help someone actually use this tool.
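To give an idea of the difference between a bare example and an explained one, here is a minimal sketch. The function reverse_complement is a hypothetical pure-Python stand-in (the real doctests exercise the wrapped Bio++ classes, which aren't importable here); only the doctest layout is the point.

```python
import doctest


def reverse_complement(seq):
    """Return the reverse complement of a DNA string.

    (Hypothetical stand-in for illustration; not a Bio++ function.)

    A bare example shows the call but not the intent:

    >>> reverse_complement("ATGC")
    'GCAT'

    A short note tells the reader what to take away. Here: case is
    preserved, so lowercase input gives lowercase output.

    >>> reverse_complement("atgc")
    'gcat'
    """
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G",
             "a": "t", "t": "a", "g": "c", "c": "g"}
    return "".join(pairs[base] for base in reversed(seq))


if __name__ == "__main__":
    # Runs every example in this module's docstrings; silent on success.
    doctest.testmod()
```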

Friday, August 14, 2009

203 (So Close...)

Wanted to finish today, but didn't quite make it.  Did 15 objects today. Many of those were quite large, in fact (hence not quite making it).

I also managed to address the const vector problem. I managed to get constructors that returned null objects, but otherwise didn't crash. (At least, until you tried to use the null object--then it's segfault city.) It turns out that when you use %extend to create a new constructor inside of SWIG, it's not really a constructor. It's more of a pseudo-constructor, that looks like a constructor from inside of your scripting language. Thus, you need to explicitly return something--hence my null value. Once I added that in, it was golden.

So what objects am I going to unit test for my last 5? Ideally, I'd like to do more file-formatting stuff, which Bio++ uses to read and write files of specific types. Unfortunately, I haven't been able to find example files for most of those, and the descriptions I've found have been lacking. I asked Julien and company if maybe they had some example files--after all, they had to debug them using something. But, I haven't heard anything back.

Option #2 would be to just knock off some Exception objects inside the phyl library. This seems like cheating, somehow. I mean, Exceptions are quick and easy, and I could have my 208 objects within 20 minutes. But having those unit tested doesn't buy us that much.

Probably, I'm going to slog through the few functions remaining in seq, which involves a large number of static utility functions that do random things. Those will be slow to do, but it seems more responsible. After all, if I didn't have this deadline looming over me, that's what I'd do next. Having those tested is really worth something.

Thursday, August 13, 2009

188 Objects (of Beer on the Wall)

Did 28 today. Part of that is because I started working with popgen. Part is because I had a breakthrough with seq, that let me do several more objects there. The seq library is almost all tested, in fact.

Spent a couple of hours hacking around with the const/vector problem. Got a suggestion that I should try to create an extension to the VectorSiteContainer object, that would take vectors that aren't const. I tried several things...none of them worked. Not sure why--looking at the code, it seems that we don't need to store the original address of the vector anywhere. And yet, the vector is passed by reference. I really don't know why they decided to do that.

Tried to post to the Bio++ boards again, but I don't think it got through.

So...I'm at about 43.5% unit tested.  Hilmar's goal is for 48% to be done. I have 20 to go. I'm going to try to finish that up tomorrow, and then spend the weekend going over my doctests and expanding them, so that they're human-readable documents. Fact is, Bio++ needs much better documentation. And given that I've now deduced about half the methods (often the hard way), I might be the best person for the job.

Wednesday, August 12, 2009

Only 160

Six objects today. Not a good day.

Spent a couple of hours hacking around with VectorTools. This ended up not being a good idea. Well...yeah, I got a lot of it tested. And it is an important class. But it turns out that sections of the thing give SWIG indigestion when it tries to compile the wrapper files. This seems to be a really solid reason to put your code in the .cpp files, not the .h files. If these bits of code were already compiled, tucked away in their object files, I wouldn't be having these problems. Instead, it's in the header, where it gets entangled with SWIG, and SWIG starts having problems with certain templates.

And then I spent a fair amount of time today working with VectorSiteContainer. The main constructor to that requires a vector as one of its arguments. I tried for a very long time to get SWIG to wrap such an object, with no luck. The closest I could get was a vector, which isn't good enough. Every time I tried to add that "const" in there, the SWIG-generated wrapper file wouldn't compile. Finally, very late at night, I threw a request for help to the swig mailing list (and also CC'ed to the phyloinformatics people), but I don't expect a response soon.

Update--message came in, as I was writing that last paragraph. It turns out that this is a known bug in SWIG. Which means that it's not going to be fixed this week, which means that there are 27 classes in seq that I won't be able to test.  So...I'm going to have to move on to phyl and popgen.

One other piece of info, though. According to Bio++ docs, there are 432 objects. Multiplying this by 0.48 yields a little less than 208, so that's my goal. That means that I now have 48 to go.

Tuesday, August 11, 2009

154!

Did 5 yesterday (Sunday), and 37 today. So today was a good day. However, I fully expect things to slow down soon. I'm dealing with some issues that are segfaulting on me, and I don't know why.

Also finally managed to find a solution to the input stream problem, thanks to folks on the SWIG mailing list. This involves writing an interface for std::ifstream, whose constructor only needs the name of the file. Suddenly you've got an input stream, in a form that inherits from istream. It's really a very elegant solution.

If I'm lucky, I'll have the minimum number of objects tested by the end of Wednesday. Thursday or Friday is probably more realistic, though.

Sunday, August 9, 2009

Up to 112...

Another late night, and I'm a little more than half-done. Found a couple more bugs, of course.

Finally exchanged some e-mail with Julien, the principal coder on the Bio++ project. My e-mails to the biopp dev list were getting lost in cyberspace somewhere. Not sure what to do about that. Hopefully though, Julien can answer my questions (such as where I should submit bug reports).

Saturday, August 8, 2009

92 Objects Unit-Tested

I've backtracked a bit on my counting method. Before, if a new object to be tested seemed to be abstract, I just called it abstract and counted it toward my total. This made me a little nervous though, and I realized that it also meant duplicated testing. That is, if Object1 and Object2 both inherit from abstract ObjectA, I would be testing ObjectA's functions twice. So I've redone things, so that I use say Object1 to test ObjectA's functions, under ObjectA. So there are a few abstract objects that have been removed from my total--but I think the count is more honest this way.
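Roughly, the new counting scheme looks like the sketch below. The class names mirror the Object1/Object2/ObjectA example above and are placeholders, not Bio++ classes; the methods size and describe are likewise invented for illustration.

```python
import doctest
from abc import ABC, abstractmethod


class ObjectA(ABC):
    """Abstract parent. It cannot be instantiated, so its concrete
    method is exercised once, through Object1, and counted here.

    >>> Object1().describe()
    'Object1 with 1 part(s)'
    """

    @abstractmethod
    def size(self):
        ...

    def describe(self):
        return f"{type(self).__name__} with {self.size()} part(s)"


class Object1(ObjectA):
    """Only what Object1 adds is tested under Object1.

    >>> Object1().size()
    1
    """

    def size(self):
        return 1


class Object2(ObjectA):
    """describe() is not re-tested; ObjectA's docstring covers it.

    >>> Object2().size()
    2
    """

    def size(self):
        return 2


if __name__ == "__main__":
    doctest.testmod()
```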

92 are done, after 5 days. 30 of those are from utils, done in the first 3 days. 62 are from numcalc. At this rate, I'll have ~250 objects after my 14 days have passed.

Tomorrow, I think I'm going to stop working with numcalc, since most of the things in there don't get seen by end-users, anyway. I'm going to start with seq, where some of the real meat is.

Friday, August 7, 2009

Guess What 4*4 Is?

Recounted the number of classes I've done, and then I did a fair number today.  Current count: 84 (though some of those are abstract, and don't need direct wrapping). I don't know if things will speed up or slow down after this. NumCalc is yielding some curious little features that are proving interesting to wrap.

I've found at least 5 bugs in Bio++. Most recently, it tried to tell me that 4^2 = 15. I checked this out by coding in C++, without SWIG. Yup--not a SWIG bug. NumTools.pow(4,2) really does come back as 15, not 16. (I believe the actual calculation is done with doubles, and that it's a rounding error.) I don't know where to report this. I've written the Bio++ dev mailing list, and gotten back zilch in response. I don't know if they're ignoring me, or if my messages just haven't gotten through.
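If my guess about doubles is right, the failure would look something like the sketch below. This is not the Bio++ code. It only shows how a floating-point result a hair under 16 turns into 15 when it is truncated to an integer, and why rounding avoids that. The value almost_sixteen is constructed by hand for illustration.

```python
# A double just below 16, standing in for an inexact pow() result.
# (Constructed by hand; not taken from Bio++.)
almost_sixteen = 16.0 - 2.0 ** -40

truncated = int(almost_sixteen)   # conversion to int drops the fraction
rounded = round(almost_sixteen)   # rounds to the nearest integer

print(truncated, rounded)         # 15 16
```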

In the meantime, these bugs are slowing me down. Every time I find one, I need to check that it's not my own idiocy with SWIG that's causing problems. That's aggravating. I have a lot of money on the line, and I'm being slowed down by careless bugs. (Of course, I can't be too critical. Bugs are part of the Game, and I have to recognize that I'm under a lot of stress right now.)

Thursday, August 6, 2009

Utils Done, Working on NumCalc

Stuck on the Matrix object right now, and it's not pretty. There are strange bugs that I can't pinpoint, such as that right now double-matrices can compile, but int-matrices can't. It's the same code--just templated. Grrr.

Current score: ~40 objects thoroughly unit tested, over the last 3 days. Extrapolating, that means that after 14 days I'll have 187 done. There are 446 total objects, and 48% of that is 214. I'm going to be working solid during this time.
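The extrapolation is just a straight line through the first three days. A quick sketch, with variable names of my own choosing and the numbers from the paragraph above:

```python
# Numbers from the post; the 40 is approximate.
tested_so_far = 40
days_elapsed = 3
days_available = 14
total_objects = 446

rate = tested_so_far / days_elapsed           # objects per day
projected = round(rate * days_available)      # linear extrapolation
target = round(total_objects * 0.48)          # 48% of all objects
shortfall = target - projected

print(projected, target, shortfall)           # 187 214 27
```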

Wednesday, August 5, 2009

Utils Almost Tested

Well, today I managed to do about 15 objects within utils (much more than that if you count every kind of exception to be a different object). So by simple geometric progression, that means that tomorrow I'll do around 225, on Thursday I'll do 3375, and so on...

Seriously, this is getting faster, and I've found some important bugs and usage issues. (For example, I now have Python IDing many more templated objects, and making meaningful interfaces into them.)  However, there's still a ton of work to be done. I'm honestly not sure if I'm going to make the 60% of 80% goal that Hilmar set for me. (That is, 80% of objects implemented, and 60% of those properly tested. Thus, 48% of the total number of objects should be implemented and tested.) These coming two weeks are going to be rough.

Tomorrow I finish utils, and start with numcalc. If I can possibly get through seq by the end of the weekend, I'll be in good shape.

Tuesday, August 4, 2009

Python Doctesting

The good news--testing hasn't revealed any bugs in the underlying code. SWIG seems to be doing its job admirably.

The bad news--I managed to thoroughly test exactly one object today. Granted, it's the biggest object within utils: TextTools. And it was my first. However, given that it took all day--this really is going to be a race to the finish.

Monday, August 3, 2009

Today Was A Good Day

I had forgotten how fun it is to code when everything works right out of the box. Today was such a day. It took me about an hour or so to get utils working in Perl.  numcalc was a little more difficult, but it followed just fine. So then I tried R, knowing it to be temperamental. SWIG documentation for R is pretty spotty, since I don't think anyone uses it except for academics. But I got everything compiled before too long. The difficult bit was figuring out how to actually call my functions, but I figured that out by reading raw machine-generated R code. Next was Ruby, and that just sailed along smoothly.

So, I now have utils working in 4 languages:  Python, Perl, Ruby, and R. Java doesn't work, and I still have my doubts about it because it doesn't wrap as neatly (because Java's not a scripting language). numcalc works in Perl, and all 5 libraries work in Python. Libraries that haven't been wrapped yet in those languages should be very easy to do.

So then of course Pjotr had to write to ruin my fun. (I exaggerate...I'd already come to the same conclusion he did.) What needs the bulk of my attention right now is the unit testing and documentation. So that's what I'm going to start with in the morning. I read through a lot of Xin's work this evening, and we'll see how much of that I can incorporate.

Sunday, August 2, 2009

Git & Java

Spent some amount of time this evening working with Pjotr, trying to get my cmake files up to standards. A lot of this involved git--git still frustrates me, but I do learn.

Also worked with wrapping files in Java. I'm unsure if Java will really work. As near as I can tell, SWIG's behavior with Java is quite different from the way it handles scripting languages. And one of the things that's causing a lot of problems is multiple inheritance. SWIG starts screaming bloody murder whenever I try to wrap some object that inherits from multiple other objects. It copes by inheriting from only one parent object--the other parents are just ignored. Multiple inheritance is used a lot in Bio++. Hence, the Java environment will be lacking several methods, and also not able to perform some basic assignment operations. Is it even worth having it available in Java, if the library will be severely crippled?
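To make the cost concrete, here is a small Python sketch of what dropping a parent does. The class names are invented for illustration and have nothing to do with the actual Bio++ hierarchy; the "truncated" class simply imitates a wrapper that kept only the first parent.

```python
class Readable:
    def read(self):
        return "read"


class Writable:
    def write(self):
        return "write"


class FullContainer(Readable, Writable):
    """Both parents kept, as in the scripting-language wrappers."""


class TruncatedContainer(Readable):
    """Only the first parent kept; Writable is silently dropped."""


full = FullContainer()
cut = TruncatedContainer()

print(hasattr(full, "write"))       # True
print(hasattr(cut, "write"))        # False: the method is simply gone
print(isinstance(cut, Writable))    # False: can't be used where a Writable is expected
```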

I think tomorrow I'm going to try Perl. Hopefully, that'll be highly similar to the Python I've already done.