Monday, August 17, 2009

Goals Met

Got the doctests all prettified.  They are located in biolib/src/mappings/swig/python/test, and called test_bpp_*.py, where the * is utils, numcalc, seq, and popgen.  I have notes for every class, and also for functions whose use isn't very clear.

Just checked over Hilmar's e-mail with the goals he set for me.  I was slightly mistaken--he wanted 60% of the classes wrapped, with 80% of those tested--not 80% with 60% tested as I thought.  I'll quote him directly:

1. Map for Python with SWIG 60% of classes and templates of all Bio++

2. 80% of all methods in those classes and templates have complete doctests
with a clear description of what the method is supposed to do.

Now, I rechecked my list of objects.  It turns out that there are 416 classes, not 432.  That's because 16 of those are nested classes that SWIG can't handle anyway.  The number I have wrapped is 414--I could never get numcalc/LUDecomposition to work, and utils/BppString was also ill-behaved (and totally useless for this project, anyway).  That's 99.5%. The number I've tested thoroughly kept creeping up and down slightly as I edited my doctests, but it settled out at 208.  That's exactly half of the total that I have completely unit-tested.  So that achieves the 48% (80% × 60%), with a little to spare.
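
For anyone who wants to check that arithmetic, here is a quick sketch in Python. The three counts are the ones quoted above; the variable names are only illustrative and don't come from the project.

```python
# Counts quoted in the post; the names are illustrative only.
total_classes = 416   # classes SWIG can actually see (nested ones excluded)
wrapped = 414         # classes with a working wrapper
tested = 208          # classes with complete doctests

wrapped_share = wrapped / total_classes   # fraction of all classes wrapped
tested_share = tested / total_classes     # fraction of all classes fully tested
required_share = 0.60 * 0.80              # 80% tested out of 60% wrapped

print(f"wrapped:  {wrapped_share:.1%}")
print(f"tested:   {tested_share:.1%}")
print(f"required: {required_share:.1%}")
print("goal met:", tested_share >= required_share)
```

The tested share comes out at exactly one half, just above the 48% that the two goals multiply out to.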

Am I happy with this?  Yes and no.  I think it's obvious that I managed to put out a lot over the last couple of weeks, and I'm satisfied with my progress.  But it's still short of the goals I set out to do.  Yes, I think those goals were ambitious.  When I set them up, I hadn't looked at the Bio++ code as closely as I should have.  I should have known that doing each library, wrapping and testing, in a week was too much.  But I said I'd do it, and I still feel responsible for it.  So there we go...

I'm also more in touch with Julien and the Bio++ coders, and I think I'd like to switch gears a little now, and maybe work with them to make the code a little more SWIG-friendly.  We'll see how that goes.  I also need to work with Pjotr to get everything incorporated into the whole biolib framework.  So there's still stuff to do.  This autumn I have a teaching gig at a local college, but that's not going to take all of my time.  So I'm going to keep working on bits and pieces of this.  Despite the last month of high stress, I still feel that this project is a good one, with a lot of potential to help scientists do what they need to do, with a minimum of pain.

In the meantime, it's past 3:30 AM, and I'm probably blathering.  So I'll sign off.


Sunday, August 16, 2009

Not Much to Report

Went through the utils doctest today, adding comments where I thought aspects were unclear.  I'm pretty happy with that document.  Just need to do the other three now...

Also found a place in the doctest where I forgot to demarcate a class.  Thus I have 209 objects, not 208. Woo hoo.  Not that it's a big object--it's just an exception.

Saturday, August 15, 2009

Hilmar's Goal Met...

...at least technically. I'll get to that in a second. Of the 432 objects in Bio++, I have the vast majority of them wrapped (I think all but 3), and 208 of them doc-tested. Hilmar asked me to have 80% of them wrapped, and 60% of those documented.

Managed to uncover some of the formats by doing some fairly involved web hunts. Did Clustal and DCSE. Both of those inherit from a couple other objects, so I could do those too. Also did one of the static "Tools" objects. So yes, I met the quota without "cheating" by unit-testing übersmall Exceptions. (Found out that the Clustal format that Bio++ does is out-of-date, but these can change so fast that that's not totally unexpected. More serious is the off-by-one error that causes it to chop off the first amino acid in a protein.)

Finally figured out why I hadn't been able to post to the various Bio++ message boards, even though I've been getting all their messages. I subscribed with my gmail address, when I was trying to send with my address. It's sorted out now. Actually, I was thinking of doing some coding for them this fall. I'll have some free time, and I think I'll be able to do more good for this project there, working on Bio++ code to make it more palatable for SWIG. At the very least, I'm hoping that I can get all the compilable code into the .cpp files, rather than the .h files. That would make a lot of things easier for SWIG.

So...I think now I've technically met the goals that Hilmar set for me to get a passing grade. However, the doctests are ugly. Basically, they show a ton of examples using the code, with barely any explanation. But I need to make sure that these documents are useful. So...this weekend I'm going to work on beautifying them, so that they might help someone actually use this tool.
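
To give a rough idea of what I mean by "beautified", here is a small sketch of a doctest that mixes explanation with examples. The `complement` function is a made-up stand-in written in plain Python, not a Bio++ or biolib call; the point is only the layout, where each example is preceded by a sentence saying what it demonstrates.

```python
import doctest


def complement(seq):
    """Return the complement of a DNA sequence (hypothetical stand-in).

    Each base is swapped for its pairing partner; the order is unchanged.

    >>> complement("ATGC")
    'TACG'

    Characters outside A, C, G, T raise a KeyError, so bad input fails
    loudly instead of being passed through.

    >>> complement("ATXG")
    Traceback (most recent call last):
        ...
    KeyError: 'X'
    """
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in seq)


# Collect and run the examples in the docstring above.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(complement, "complement", globs={"complement": complement}):
    runner.run(test)
results = runner.summarize(verbose=False)
print(results)
```

The same docstring with the prose stripped out would still pass, which is exactly the problem: passing tests say nothing about whether a reader can follow them.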

Friday, August 14, 2009

203 (So Close...)

Wanted to finish today, but didn't quite make it.  Did 15 objects today. Many of those were quite large, in fact (hence not quite making it).

I also managed to address the const vector problem. I managed to get constructors that returned null objects, but otherwise didn't crash. (At least, until you tried to use the null object--then it's segfault city.) It turns out that when you use %extend to create a new constructor inside of SWIG, it's not really a constructor. It's more of a pseudo-constructor, that looks like a constructor from inside of your scripting language. Thus, you need to explicitly return something--hence my null value. Once I added that in, it was golden.

So what objects am I going to unit test for my last 5? Ideally, I'd like to do more file-formatting stuff, which Bio++ uses to read and write files of specific types. Unfortunately, I haven't been able to find example files for most of those, and the descriptions I've found have been lacking. I asked Julien and company if maybe they had some example files--after all, they had to debug them using something. But, I haven't heard anything back.

Option #2 would be to just knock off some Exception objects inside the phyl library. This seems like cheating, somehow. I mean, Exceptions are quick and easy, and I could have my 208 objects within 20 minutes. But having those unit tested doesn't buy us that much.

Probably, I'm going to slog through the few functions remaining in seq, which involves a large number of static utility functions that do random things. Those will be slow to do, but it seems more responsible. After all, if I didn't have this deadline looming over me, that's what I'd do next. Having those tested is really worth something.

Thursday, August 13, 2009

188 Objects (of Beer on the Wall)

Did 28 today. Part of that is because I started working with popgen. Part is because I had a breakthrough with seq, that let me do several more objects there. The seq library is almost all tested, in fact.

Spent a couple of hours hacking around with the const/vector problem. Got a suggestion that I should try to create an extension to the VectorSiteContainer object, that would take vectors that aren't const. I tried several things...none of them worked. Not sure why--looking at the code, it seems that we don't need to store the original address of the vector anywhere. And yet, the vector is passed by reference. I really don't know why they decided to do that.

Tried to post to the Bio++ boards again, but I don't think it got through.

So...I'm at about 43.5% unit tested.  Hilmar's goal is for 48% to be done. I have 20 to go. I'm going to try to finish that up tomorrow, and then spend the weekend going over my doctests and expanding them, so that they're human-readable documents. Fact is, Bio++ needs much better documentation. And given that I've now deduced about half the methods (often the hard way), I might be the best person for the job.

Wednesday, August 12, 2009

Only 160

Six objects today. Not a good day.

Spent a couple of hours hacking around with VectorTools. This ended up not being a good idea. Well...yeah, I got a lot of it tested. And it is an important class. But it turns out that sections of the thing give SWIG indigestion when it tries to compile the wrapper files. This seems to be a really solid reason to put your code in the .cpp files, not the .h files. If these bits of code were already compiled, tucked away in their object files, I wouldn't be having these problems. Instead, it's in the header, where it gets entangled with SWIG, and SWIG starts having problems with certain templates.

And then I spent a fair amount of time today working with VectorSiteContainer. The main constructor to that requires a const vector as one of its arguments. I tried for a very long time to get SWIG to wrap such an object, with no luck. The closest I could get was a non-const vector, which isn't good enough. Every time I tried to add that "const" in there, the SWIG-generated wrapper file wouldn't compile. Finally, very late at night, I sent a request for help to the SWIG mailing list (and also CC'ed the phyloinformatics people), but I don't expect a response soon.

Update--message came in, as I was writing that last paragraph. It turns out that this is a known bug in SWIG. Which means that it's not going to be fixed this week, which means that there are 27 classes in seq that I won't be able to test.  So...I'm going to have to move on to phyl and popgen.

One other piece of info, though. According to Bio++ docs, there are 432 objects. Multiplying this by 0.48 yields a little less than 208, so that's my goal. That means that I now have 48 to go.

Tuesday, August 11, 2009


Did 5 yesterday (Sunday), and 37 today. So today was a good day. However, I fully expect things to slow down soon. I'm dealing with some issues that are segfaulting on me, and I don't know why.

Also finally managed to find a solution to the input stream problem, thanks to folks on the SWIG mailing list. This involves writing an interface for std::ifstream, whose constructor only needs the name of the file. Suddenly you've got an input stream, in a form that inherits from istream. It's really a very elegant solution.

If I'm lucky, I'll have the minimum number of objects tested by the end of Wednesday. Thursday or Friday is probably more realistic, though.