Michelle’s Carpet Bag

We all have one. Reach inside, pull something out. Sometimes it’s useful, sometimes… not so much.

Wordle March 13, 2009

Filed under: Uncategorized — texasmichelle @ 2:21 am

I just found a neat little tool that’s not really useful in any way, just a bit of fun. I came across it while browsing through SoulPancake, Rainn Wilson’s new site. It lets you easily create an interesting word collage. Here’s what I put together in all of 2 minutes on wordle.net:

Wordle: About

 

Word Counts in LaTeX March 2, 2009

Filed under: Uncategorized — texasmichelle @ 7:44 pm

This is a function I need so often, I went in search of a quick and easy way to do it. With only two commands, you can display an accurate word count of your document, creating a pdf along the way & killing two birds with one stone.

The friendly commands are pdflatex and ps2ascii.

Here’s an example of the script I created:

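  #!/bin/csh

  set source_file = ./document_name

  # Compile file & create pdf
  pdflatex $source_file.tex > /dev/null
  # csh keeps the exit code of the last command in $status
  if ( $status != 0 ) then
    echo "Failed in pdflatex"
    exit 1
  endif

  # Convert pdf to ascii. ps2ascii runs on its own so that its exit code
  # is the one checked (after a pipe, $status would report on wc instead)
  ps2ascii $source_file.pdf > $source_file.txt
  if ( $status != 0 ) then
    echo "Failed in ps2ascii"
    exit 1
  endif

  # Count number of words
  echo "Word count in $source_file.pdf:"
  wc -w < $source_file.txt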
 

Sentence Alignment in Machine Translation February 26, 2009

Filed under: Uncategorized — texasmichelle @ 9:15 pm

Today was a big day, the culmination of months of reading, researching, and rehearsing. All for an hour’s presentation. Thankfully, it’s over with – I’ve never been a fan of public speaking. Here are a few highlights:

Aligned data is important for machine translation. It’s used to train the models that decide how to convert one language into another. The larger the volume of accurate alignments we can feed into our machine translation systems, the better the parameter estimations and the more accurate the results. Sentence alignment is that all-important first step in machine translation. Before words & phrases can be aligned, bilingual texts need to be broken down into bite-sized chunks. 

Fortunately, there’s a very high correlation between the lengths of sentences in different languages. For example, there’s a .991 correlation of lengths between English & German. This allows us to use Bayes’ theorem to estimate the most likely alignment points. A high level of accuracy (~96%) can be achieved by looking solely at character lengths and ignoring the actual words themselves.
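Here’s a minimal sketch of the length-based idea in Python. It is not the code from my presentation, just an illustration under some loud assumptions: the function names, the prior probabilities in PRIORS, and the parameters c (expected target characters per source character) and s2 (variance per character) are all illustrative values I picked for the example. Each candidate pairing is scored by how surprising its length difference is under a normal distribution, and dynamic programming picks the cheapest path through both texts.

```python
import math

# Assumed, illustrative priors for each alignment type:
# (number of source sentences, number of target sentences) -> probability.
PRIORS = {(1, 1): 0.89, (1, 0): 0.01, (0, 1): 0.01, (2, 1): 0.045, (1, 2): 0.045}


def length_cost(len_src, len_tgt, c=1.0, s2=6.8):
    """Cost (negative log probability) of pairing two character lengths."""
    if len_src == 0 and len_tgt == 0:
        return 0.0
    mean = (len_src + len_tgt / c) / 2.0
    # Standardised length difference; close to 0 for a plausible pairing.
    delta = (len_tgt - len_src * c) / math.sqrt(mean * s2)
    # Two-sided tail probability of a standard normal variable.
    p = math.erfc(abs(delta) / math.sqrt(2.0))
    return -math.log(max(p, 1e-300))


def align(src_lens, tgt_lens):
    """Return the cheapest alignment as ((src_start, src_end), (tgt_start, tgt_end)) spans."""
    n, m = len(src_lens), len(tgt_lens)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), prior in PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                step = length_cost(sum(src_lens[i:ni]), sum(tgt_lens[j:nj]))
                total = cost[i][j] + step - math.log(prior)
                if total < cost[ni][nj]:
                    cost[ni][nj] = total
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from the end of both texts to the start.
    spans = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        spans.append(((pi, i), (pj, j)))
        i, j = pi, pj
    spans.reverse()
    return spans
```

For example, align([40, 100, 30], [42, 55, 48, 31]) pairs the long middle source sentence with the two target sentences whose combined length matches it, without ever looking at a single word.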

An even higher level of accuracy is achievable if we incorporate lexical information into the mix. By modeling our data with a Poisson distribution instead of a Gaussian and implementing various methods of search pruning, we can narrow down the search space so that we’re only spending computing time on alignments that are likely to be correct. By combining a modified sentence-length-based model with a modified version of IBM’s Model-1, error rates of less than 1% can be achieved. Some results are even better than hand-aligned data. Not bad!


 

Publishing to Multiple Sources February 22, 2009

Filed under: Uncategorized — texasmichelle @ 9:01 pm

I spent an hour or two this past week setting up my environment for publishing my electricity data. Once I was publishing to one source, I set to work publishing to Pachube. Surprisingly, this was as easy as unzipping the beta app for the CC128 direct from Pachube and typing in my API code and feed link. It was pain-free and worked like a charm. Until, that is, I tried to restart my original feed. Looks like the drivers that turn my USB port into a virtual serial port don’t magically allow multiple connections.

However, Dale Lane was kind enough to pass on some code that subscribes to my MQTT feed and sends the relevant chunks to Pachube, piece by piece. This is such a simple and logical idea, I’m ashamed I didn’t think of it sooner. It’s great that I can now publish to two sources, but what’s even better is that it means the possibilities are limitless. I could have any number of scripts running, subscribing to my feed and posting the data to different sources. I’m getting giddy at the thought….so many ideas!!!
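To show why this pattern scales, here’s a small Python sketch of publish/subscribe fan-out. It is not Dale’s code and it does not talk to a real MQTT broker: TinyBroker is a hypothetical in-memory stand-in, and the topic name "home/power" and the two subscriber callbacks are made up for illustration. The point is only that the publisher sends each reading once and never needs to know how many subscribers are listening.

```python
from collections import defaultdict


class TinyBroker:
    """Hypothetical in-memory stand-in for a message broker (not MQTT itself)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        # Register a callback to run for every message on this topic.
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        # Deliver the payload to each subscriber in registration order.
        for callback in self._subscribers[topic]:
            callback(payload)


pachube_log = []
twitter_log = []

broker = TinyBroker()
# Each "script" subscribes independently; adding a third needs no change elsewhere.
broker.subscribe("home/power", lambda watts: pachube_log.append(("pachube", watts)))
broker.subscribe("home/power", lambda watts: twitter_log.append(f"Using {watts} W"))

# The monitor side publishes once.
broker.publish("home/power", 345)
```

With a real broker the subscribers would be separate processes, but the shape is the same: one publisher holding the serial port, many independent consumers.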

 

CC128 Setup Details February 20, 2009

Filed under: Energy Monitoring — texasmichelle @ 7:03 pm

In my excitement yesterday, I forgot to post the steps involved in getting my monitoring set up. It was mainly a matter of installing different pieces of the framework.

The first snag I ran into was connecting the CC128 to my PC. I stupidly assumed I’d have a cable to fit lying around, but this one’s a bit tricky. It’s RJ-45 on one end and USB or serial on the other – not exactly available at your local PC World, so I ordered it separately directly from CurrentCost. Likewise, the drivers were pretty difficult to find. None of the links on the CurrentCost site resulted in a successful Vista installation, so I searched elsewhere. I finally found a good install package and could continue.

I then downloaded the Perl code linked on the Homecamp wiki. Once I had unpacked it, I updated the win_currentcost.bat file to reflect the new baud rate of 57600 and my local COM port. Andy has included a very detailed readme file that explains what’s necessary before running the script.

First off, I downloaded ActivePerl and installed it. Then, I got a hold of the Really Small Message Broker and installed and ran that. I also needed the Microsoft Visual C++ 2008 Redistributable package and the corresponding SP1 for that to work.

At that point, I was ready to test communications from my virtual serial port. For some reason, the serial port support on Windows doesn’t always work, so Andy recommended firing up HyperTerminal first to kickstart communications. To my surprise, I couldn’t find it anywhere on my PC. Apparently, it no longer comes with Windows. I obviously haven’t missed it until now. I found a good download and gave it a try. Lo and behold, XML came spitting out of my monitor – great! This turned out to be the magic ingredient, and Andy’s script worked beautifully afterwards. He then set up a page for my feed, which takes the information passed via the message broker and plots it into a plethora of colorful graphs.

For now, Andy’s code is only parsing real-time data. The schema has changed for history data, so the code needs a bit of tweaking before it can pass the right information to the dashboard. Once that’s set up, I’ll post the link to the consolidated page.
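To give a feel for what parsing the real-time data involves, here’s a rough Python sketch. It is not Andy’s Perl code. The SAMPLE message is my assumption of what a CC128 real-time line looks like (the tag names src, time, tmpr, ch1, watts and hist are assumed, so check them against what your own monitor prints), and parse_realtime is a hypothetical helper name. It skips anything that looks like a history message, since that schema is different.

```python
import xml.etree.ElementTree as ET

# Assumed example of one real-time line from the monitor; tag names are
# an assumption and may differ on your device or firmware.
SAMPLE = (
    "<msg><src>CC128-v0.11</src><dsb>00089</dsb><time>13:02:39</time>"
    "<tmpr>18.7</tmpr><sensor>1</sensor><id>01234</id><type>1</type>"
    "<ch1><watts>00345</watts></ch1></msg>"
)


def parse_realtime(line):
    """Return a dict for a real-time message, or None for anything else."""
    try:
        msg = ET.fromstring(line)
    except ET.ParseError:
        return None  # e.g. a partial line read mid-message from the serial port
    if msg.find("hist") is not None:
        return None  # history data uses a different schema; skip it here
    watts = {}
    for child in msg:
        # Channel elements are assumed to be named ch1, ch2, ch3.
        if child.tag.startswith("ch"):
            value = child.findtext("watts")
            if value is not None:
                watts[child.tag] = int(value)
    if not watts:
        return None
    tmpr = msg.findtext("tmpr")
    return {
        "time": msg.findtext("time"),
        "temperature": float(tmpr) if tmpr is not None else None,
        "watts": watts,
    }
```

Calling parse_realtime(SAMPLE) gives a small dictionary with the time, temperature and per-channel watts, which is the kind of payload that could then be handed to the message broker.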

All in all, it wasn’t nearly as complicated as I expected. There’s enough documentation that I was able to do most of the configuration myself. When I got stuck, one of the many early-adopters had an answer for me. The Home Camp team has done most of the work already, making it relatively easy for people like me to get started quickly.

 

First publish from CC128 February 20, 2009

Filed under: Energy Monitoring — texasmichelle @ 12:18 am

Ah, the sweet taste of success. It’s that high you get when your setup finally works. The feeling of triumph when you’ve accomplished something challenging, small or large. In my case, it’s the pretty red graph that showed up, confirming I was, in fact, publishing my energy use online. A big thank you goes out to Andy Stanford-Clark for his technical expertise.

However, like any addiction, it succeeded in making me want more. Next up: publishing to Pachube and Twitter. Hooray!

 

The Genius that is The Joel Test February 18, 2009

Filed under: Development — texasmichelle @ 1:16 pm

One thing I love about my brother, Bryce, is that he encourages my geek side. If it weren’t for him, I might not have found out about Joel Spolsky. His site, Joel on Software, is a work of pure genius. The first article I read, The Joel Test, rings true in a hundred different ways. He puts into words so many aspects of software development that I find it hard to put my finger on. He so eloquently condenses the most important things about the everyday business of software development into a 12-point checklist, which I’m finding very useful in my current search for employment after graduation.

My favorite section of his article relates to quiet working conditions for knowledge workers. He tells a story of Jeff & Mutt, two programmers who potentially sit next to each other. The simple math involved perfectly describes why this is such a bad idea. Having spent nearly two years in a room with 400 other people in the heart of Canary Wharf, I can sympathize that peace & quiet is an absolute necessity and not an area to skimp on. It ends up costing so much more than the real estate expense.

Another part I had to laugh at is when he talks about the constant pestering from sysadmins about disk space usage. His whopping 220 MB usage in no way justified the weekly reminders. He makes the valid point that, given the cost of drive space these days, the space he was using cost less than the toilet paper he uses, so spending even 10 minutes cleaning up his disk space would be a waste of productivity.

But it’s not just The Joel Test that is such genius. The rest of the site is chock full of helpful resources for better codes of practice, including, but not limited to, job listings with a yes/no answer to each of his 12 requirements.

 

FOWA Tour ‘09 in Cambridge February 17, 2009

Filed under: Conferences — texasmichelle @ 1:39 pm

Yet another reason I love twitter: the Future of Web Apps Tour ‘09. Despite having attended FOWA London the last two years and staying fairly up-to-date with what’s going on in the webosphere, it somehow slipped under my radar that a mini-FOWA would be taking place right here in Cambridge.

The randomness of twitter never fails to intrigue me. Several times a day I check my feed and it’s a rare occasion when I don’t find an interesting link, news story, or conversation. This time, it was a reply by Keir Whitaker informing another user that there were still early bird tickets left in Cambridge. I can’t even remember why I started following him, but knowing that Keir works at Carsonified, the very talented conference organizers (among other things), my hopes started rising. I did a quick search on FOWA Cambridge and booked a ticket as fast as I could type in my credit card details.

I’m sure I would have eventually found out about the conference without twitter. And the fact that Wil Harris from Channelflip & Richard Moross from Moo are speaking would have caused me to book on the spot. But I would have paid full price!

 

Playing it Forward with Akoha February 15, 2009

Filed under: Fun — texasmichelle @ 9:33 pm

A recent episode of net@night revealed a new service called Akoha. The interview with Austin Hill described the idea for the real-world game: members play decks of cards, each involving small acts of kindness. When a card is completed, it is passed on to the recipient of the act of kindness, who then plays it on someone else. You can either buy a deck of cards yourself, or use the card number that someone has given you to sign up on the site. The more cards you play, the further you advance in the game. You can track the movement of cards you’ve passed on and interact with other users via the Akoha community.

Perhaps the most intuitive thing about this service is the blending of real and virtual life in an uncomplicated way. Sometimes doing something nice can seem ridiculous, but having a physical card to pass on gives you a good excuse. Anyone who receives a card can join the online community and play along. I’ve ordered a starter pack ($8.75 for shipment to the UK, $5 to the US) and am interested to see how people react. I’m not sure if it will catch on here, but I’m encouraged by the prospect of making the world a better place, one card at a time.

 

Home Energy Monitoring with CurrentCost & MQTT February 13, 2009

Filed under: Energy Monitoring — texasmichelle @ 2:21 pm

Just before Christmas, a busload of us from the Lab descended upon IBM Hursley, courtesy of women@cl. The long drive meant a whirlwind tour once we arrived, including demonstrations in the RFID & retail innovation labs. There were a number of presentations, one of which particularly piqued my interest. Andy Stanford-Clark, an IBM Master Inventor, enthusiastically described his home setup: sensors in his attic attached to mouse traps for monitoring disposal & cheese replacement needs, a virtual replica in Second Life that corresponds to actions in real life (turning on lights, speakers, etc), even a twittering house.

He roughly described how the process works. He has a CurrentCost monitor for keeping track of electricity usage. This is connected to a machine running Linux, which parses the XML from the monitor and publishes the results to the web in various formats. He uses the MQTT messaging system to publish web graphs for historical and real-time analysis, power his Second Life house, and, of course, send data through the Twitter API for a convenient way of keeping track of how much electricity is being used. Alerts are tweeted as and when he decides; for example, when electricity usage has been unusually high for a certain length of time.

Needless to say, I was inspired by his creativity and exploratory approach, so I started planning my own home system. I purchased a pre-sale version of the new monitor a few days ago, the CC128, which looks to have some pretty robust features. While I’m waiting for it to arrive, I’ll begin installing the necessary framework. There is code publicly available for use with CurrentCost monitors, which includes detailed instructions for setup and use. Andy himself created the Perl version, which I plan on using since it includes the MQTT functionality. Once I get his code up and running, I plan on extending the existing Python code to incorporate similar functionality. I’ve been wanting to try out Python for a while now, and this is a good opportunity.

The code in the repository has been created by users like you and me, not CurrentCost. There is a group called Home Camp that meets regularly to discuss setup, new releases, and related topics for greener living via technology. I plan on attending the next meeting in April to get a good overview and iron out any issues I may have during setup. This seems to be an active, helpful community that I’m looking forward to getting involved in.