
I recently had to work with data visualization in JavaScript. The obvious choice was d3 for the visualization, but I needed something for the data manipulation. Preferably something declarative, and having substantial experience with Linq, using something similar came to mind. This post is meant as a getting-started guide for query-js, the npm module that resulted from my efforts. query-js provides a series of methods that let you perform sequence operations on arrays and sequences.

Instead of giving a theoretical description of the module, I’m going to work through a few simple examples of how to use some of the more important methods.

There’s a lot of public data available through REST APIs, and often the data is in an easy-to-use JSON format. One of these sources is EU statistics on Gross District Product. GDP is the district version of the Gross National Product, or in other words an indication of the prosperity of a district.

To get the data we need to make an HTTP request to the endpoint of the service, and we are also going to use query-js (surprise!), so start by installing both request and query-js and then requiring both:

npm install request
npm install query-js

var request = require("request"),
 Query = require("query-js"), //the installed npm module
 url = "https://inforegio.azure-westeurope-prod.socrata.com/resource/j8wb-jxec.json?$offset=0";

The URL that I sneaked in is the URL for the service endpoint. It returns an array of objects, each in the following format:

{
  "ipps28_2011" : "221.7",
  "nuts_id" : "BE10",
  "nuts_name" : "Région de Bruxelles-Capitale / Brussels Hoofdstedelijk Gewest"
}

The first property (ipps28_2011) is the actual GDP figure. The second one (nuts_id) identifies the district, where the first two letters identify the country. With that information, let’s see what it would take to get all the districts and find all countries that have at least one poor region.

//get the data
request.get(url,function(error, response, body){
 var data = JSON.parse(body),
     query = new Query(data),
     //group by the first two letters in the district code aka the country
     byCountry = query.groupBy(function(d){ return d.nuts_id.substr(0,2); });
});

In the code we first request the data and parse it, and then we can start on the querying part. The first query we perform groups by country (or actually by the first two letters of the district identification). The result of that is an object that can either be treated as a regular object or as a sequence of key-value pairs. In other words, all the sequence operations of query-js are available. So we could find the values for the Nordic countries like this:


    var nordics = byCountry.where(function(country){
        var countryId = country.key;
        return countryId === "SE" || 
               countryId === "FI" ||
               countryId === "DK";
    });

That keeps only the values for Sweden (SE), Finland (FI) and Denmark (DK). Norway is part of the Nordics but is not part of the EU, so there’s no data for it.
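Since the grouping can be used both ways, here’s a minimal sketch of the two views (the “DK” lookup assumes each group is exposed as a property named by its key, as described above):

    //treated as a regular object: look a group up directly by its key
    var denmark = byCountry["DK"]; //the sequence of Danish districts

    //treated as a sequence of key-value pairs: any sequence operation applies
    var countryIds = byCountry.select(function(pair){ return pair.key; });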

We could also look for all regions in the lowest category (ipps28_2011 < 50):


    var lowIncomeDistricts = query.where(function(gdp){ return gdp.ipps28_2011 < 50; }); //'<' coerces the string value to a number

Or what if we wanted to get all countries with at least one poor region?


     var countryWithLowIncomeDistricts = byCountry.where(function(country){
          return country.any(function(gdp){ return gdp.ipps28_2011 < 50; });
     });

That uses another of the sequence operations that query-js provides, the any(predicate) method. It returns true if at least one element in the sequence satisfies the predicate, so in this case it returns true if at least one district in a given country has an ipps of less than 50.
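To see any in isolation, here’s a minimal sketch (assuming, as in the setup code above, that Query can wrap a plain array):

    var numbers = new Query([1, 2, 3]);
    numbers.any(function(n){ return n > 2; });  //true: 3 satisfies the predicate
    numbers.any(function(n){ return n > 10; }); //false: no element does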

Now, with a bit of query-js dirt under our fingernails, let’s take on a slightly more complex task. How about finding the average ipps28 for all Nordic countries in the EU? We’ve already seen how to get the values for all Nordic countries, so that should be easy. However, this time we want all entries in one collection instead of grouped by country. Then we’d want to extract the ipps28 from each item, and lastly we’d want to compute the average of them all. In list form:

  1. Filter on country
  2. Extract ipps28
  3. Compute the average

We are going to use where for the filtering. Extracting data from an object is a projection, so for step two we use select, and lastly there’s a method for computing the average:


   var avg = query.where(function(gdp){
      var countryId = gdp.nuts_id.substring(0,2);
      return countryId === "SE" || 
             countryId === "FI" ||
             countryId === "DK";
   }).select(function(gdp){
      return parseFloat(gdp.ipps28_2011);
   }).average();

You should be able to see the three steps from above well represented in the code. However, we can shorten it a bit if we’d like. The execution will actually be exactly the same in both scenarios (i.e. the performance will be the same):


   var avg = query.where(function(gdp){
      var countryId = gdp.nuts_id.substring(0,2);
      return countryId === "SE" || 
             countryId === "FI" ||
             countryId === "DK";
   }).average(function(gdp){
      return parseFloat(gdp.ipps28_2011);
   });

As you can see, the only difference is that the projection is now passed to the average method instead of having a specific projection step.

There are many ways to skin a cat. We’ve looked at two different approaches for finding the average GDP of the Nordic countries in the EU. There’s a third one that lets me introduce another important method, namely concat:


    nordics.select(function(country){ return country.value; })
           .concat()
           .select(function(gdp){return gdp.ipps28_2011;})
           .average();

concat comes in three flavours, of which we will look at two. One takes no arguments and concatenates the elements of the sequence (the elements themselves have to be sequences); the other, used below, is shorthand for first projecting and then concatenating. The performance of the two is the same, since the second version is implemented as a select followed by a concatenation:


    nordics.concat(function(country){ return country.value; })
           .select(function(gdp){return gdp.ipps28_2011;})
           .average();

There are still more ways to skin this cat. Often you would want to project the elements of a sequence of sequences and concatenate the result. The method selectMany does just that: instead of iterating over the elements of the sequence, it iterates over the elements of the elements of the sequence and produces a new sequence. So the above could also be written as:


    nordics.select(function(country){ return country.value; })
           .selectMany(function(gdp){return gdp.ipps28_2011;})
           .average();

First we have to project the sequence we already have, because nordics holds a sequence of key-value pairs where the value is another sequence. By selecting the value of each pair we end up with a sequence of sequences, on which we then call selectMany, and we then get the average of the projected values.

We can actually shorten this slightly. selectMany can accept two projections, the first for the elements of the outer sequence and the second for the elements of the inner sequences. That is, the above could also be written as follows:


    nordics.selectMany(function(country){ return country.value; }, function(gdp){return gdp.ipps28_2011;})
           .average();

And since select has a shorter form, we can rewrite this slightly. If the argument provided to select is not a function but a string, select treats it as a simple projection returning the value of the property with the name given by the string. That is,

sequence.select(function(e){ return e.value;});

is semantically equivalent to

sequence.select("value");

and since selectMany internally uses select, we can write our example as follows:


    var averageForDistrictsInTheNordics = nordics.selectMany("value", "ipps28_2011")
                                                 .average();


Thou shalt not lie

Posted: March 5, 2013 in Uncategorized

Let it be said that I’m not pro-piracy, and I’m very much against skewing facts to suit your own cause.

The title is the ninth commandment, and though not a strong believer myself, I do believe that we need certain rules for society to function. The ability to detect when people are cheating is by nature viewed as extremely important; important enough that our logic is genetically shaped for it.

Ever since the dawn of Napster we’ve had stories about music and movies and how piracy has impacted the earnings of the music and movie production industries. How markets are affected or not affected by external changes, in this case the emergence of new technology, is pretty hard to prove and often just speculation. However, once in a while the empirical evidence points strongly in one direction, but when that direction is counter-intuitive it becomes easier to disregard it.

The evidence supporting that musicians did not suffer due to piracy is abundant. Take the research article “The Swedish Music Industry in Graphs, Economic Development Report 2000 – 2008”, which concludes that in that period the earnings of musicians grew 35%. As a whole the industry’s earnings stayed at the same level in that period, with the record companies as the only losers. The copyright organisations sent more money to the rights owners, not less. But the record companies suffered a massive setback, from 1700 million SEK to 800 million SEK, and just who has waged war against file sharing? Well, some musicians, but mainly the production companies. The study covers only the Swedish market, not the global one, but when talking about piracy that is a notable market, since the best-known torrent tracker, The Pirate Bay, originates from Sweden.

What happened in those 8 years? IFPI has an idea of what may have changed the market (from the 2004 report):

“2003 was the year we proved that consumers would pay for digital music – it is absolutely clear there is a market. This has injected a huge confidence-booster to labels, to investors and everyone who is looking at it as a business to get into.”

Not only is it interesting that IFPI acknowledges that there’s huge market potential, it’s also interesting that the year is 2003. The first major online digital music store opened April 28, 2003. Not only does IFPI acknowledge the potential, the success of the iTunes Music Store (later renamed the iTunes Store) is obvious.

[Graph: iTunes quarterly sales report]

Within the time frame of the report the iTunes Store had sold more than 8000 songs. According to IFPI’s own statistics, at Q4 2003 only 275,000 tracks were available (an increase from 210,000 the quarter before that). It is noted in the same report that

“An IFPI survey conducted in Australia, Germany, UK, USA and Canada showed that total music consumption (legitimate physical formats plus all digital downloads) actually increased by 30% between 1997 and 2002. Total online consumption of music – the vast bulk of it unlicensed – amounted to approximately eight billion tracks”, or the same number of tracks sold by iTunes alone from 2007-2009. Finally, the report also notes that the number of simultaneous files on the FastTrack system dropped 33% from April 2003 to January 2004, and postulates that this was due to the RIAA targeting the users of this network.

Interestingly enough, April 2003 was when the iTunes store was launched.

Skipping a few years ahead to 2008, when the first beta of Spotify was released, it was soon realized that it was a serious competitor to pirated music, and a study by the Swedish music industry supported this: in 2009 it was found that the number of people who pirate music had dropped by 25 percent, and that the sharp decrease coincided with the interest in the music streaming service Spotify.

Could it be that the recording industry’s slow adaptation to change is the real culprit? At least it wouldn’t be the first time. The same industry proclaimed that radio would kill music, yet three years ago it tried to make FM receivers in cell phones mandatory. Radio didn’t kill music; on the contrary, radio has been one of the best places to advertise music. Record labels routinely supply, and will even pay to have, music played at the right time. So, contrary to what the industry proclaimed, radio revitalized the recording industry once the industry adapted to the change. The above-cited study, which found that music pirates buy tenfold as much as their non-pirate counterparts, also found that two of the main reasons sales have gone down are that with digital downloads you no longer buy bulks of music (aka albums) but cherry-pick the numbers you want, and simple apathy. Spotify, Netflix and the like have shown how the industry can adapt to the change, and the latest numbers show that people follow. Just as the availability of digital downloads (iTunes and others) had an impact on piracy, so did the increased availability that Spotify brought to the market in 2008. In general this supports that the key factor is not the price. It’s not that it’s free that’s the reason why so many have turned to piracy, but the availability of the desired products (e.g. the hit single vs. the album with the hit single).

Do I think piracy might have played a part? It’s intuitive that it has, and it sounds reasonable, yes. But on the other hand it’s also counterintuitive that you can make money by giving away your products, yet the history of open source software proves otherwise. And it’s just as intuitive that no one would accept buying 14 liters of yoghurt to get the one flavor they really liked and 13 they kind of liked but never really got around to consuming. Anyone in that situation would go to where they could just buy the 1 liter they desired. Would that be visible in the yearly revenue of the dairy industry? Of course it would!

As I started out by saying: do I support piracy? No, I don’t; it’s illegal. Do I to some extent think the law that governs copying of information is flawed? Yes, I do. Do I think piracy is the main reason for the problems the recording industry has had? Not really; I think it’s the availability (or lack thereof) that’s the real reason. A statement that’s supported by Mike Lang, CEO at Miramax: “The music business is suffering because it allowed too few players to flog content, presumably fearful of their content being stolen. Encouraging Apple’s iTunes at the expense of others is effectively strangling the industry”. And with Spotify and others it would seem that the recording industry itself is starting to pay attention to that.

No one knows what the revenue statistics would have been if the recording industry, instead of spending billions on fighting the tide of change, had spent that money on adapting to change and listening to the demands of its customers. But with the latest numbers showing an industry in progress, largely due to services such as Spotify and Netflix, history repeats itself. Radio didn’t kill music and neither will the internet. Who knows, perhaps sometime in the future music might die, and history tells us the most likely cause will be the industry’s lack of will to adapt to change.

 

Things that make me go hmm

Posted: May 30, 2009 in Uncategorized

One of my favorite bloggers, Eric Lippert, recently wrote a post he called “Things that makes you go hmmm”. I’ve written a few hmms in other posts, but I just stumbled across another one today. In C# it’s perfectly legal to declare a constructor protected in a sealed class! Kind of strange. How would you instantiate that class (without using reflection)?

During my development of ProS I had to dig into Reflection.Emit again, and one thing that keeps puzzling me is the way the classes in Reflection.Emit are designed.

Take for example TypeBuilder, the cornerstone of creating a new class using Reflection.Emit. It inherits from Type; however most, if not all, methods it inherits from Type are overridden with a version that throws an exception explaining that that particular method is not available on TypeBuilder. The same goes for MethodBuilder, PropertyBuilder and all the other builders that inherit from the corresponding *Info classes.

This design decision is a hefty violation of the Liskov Substitution Principle (LSP), and for just that reason I’d say it needs a really good argument in its favor. It basically results in you having to build a framework for reflecting on your newly created types and members to be able to create a compiler. It’s a trivial but time-consuming process to make that reflection possible, and since it’s trivial I cannot find that really good argument for violating LSP.

Needless to say, I’m a huge fan of designing by contract; that’s basically my driving motivation for creating ProS, and in developing ProS I hope I’m not violating LSP. If I am, that is going to give me even more grey hair when I get to testing the pluggability, since the supporting structure of the pluggability is LSP.

I’m working on a language at the moment, which I’ll blog more about later. During the development I came to a point where I needed to actually create assembly files to test the result of the compiler. I’ve written a few compilers using Reflection.Emit before, and I’ve always come to a point where everything seems to be working fine: using reflection I can see all the methods and all the classes, but when I write the assembly to file there’s absolutely nothing but the manifest.

I seem to forget the trick every time: if the module name is different from the file name, it just won’t work. I actually have no clue why that is, but I had the problem today, and with one single change, namely making the module name the same as the file name, I solved the problem. If you don’t need to save your assemblies to disk, you’re basically free to choose the module name.

One of my colleagues had an interesting error report from our test suite today:

Error: “Expected 3 but was 3”

At first glance it might seem that everything is as expected!! It actually isn’t, but finding where the error is might be a little tricky; there’s not a lot of information to start with.

The problem is a somewhat shaky implementation of the unit test framework we’re using. The line of code that reports the error looks like:
ASSERT_EQ(++errorCount, errorHandler.Errors.Count);

I personally don’t like the ++errorCount argument but would expect it to be fine, except ASSERT_EQ is a macro: the argument is expanded once for the comparison and once for the output, and is hence incremented twice.

Moral: if you’re not 100% sure what a macro does and you need it to work on a value, pass a value. Only pass expressions when the macro needs an expression and not a value.

Changing habits

Posted: February 13, 2009 in Uncategorized

I’ve been reading a few posts on http://www.hanselman.com/blog/ lately, under the category “I suck”. A typical Scott way of putting things, and very educational.

It’s so easy to just say everyone else is stupid, and that if they would just listen it would be so much easier. And I do it a LOT.

Some of that is because my job requires me to analyse everyone else’s work and find where they can improve. It’s a lot easier to improve what people are doing badly than to improve what they are already doing well, so I focus a lot of my work on the former. But that’s just part of the reason: it’s a lot less costly to find someone else’s faults than your own, or at least so it seems.

I used to work in a company that made different measuring devices, and our tag line was “If you can’t measure it, you can’t improve it”. I used that a lot a few years ago when I was involved in pro sport, but after retiring from that I’ve kind of forgotten it again. Reading Scott’s posts reminded me of the value of measuring yourself once in a while.

Part of my work is changing people’s habits, and one of the things I need to improve is the way I communicate what they should do differently. If you take a look at some of my posts you’ll see that I’m quite capable of telling people what not to do! That’s seldom a productive way to change habits.

Try saying the following to your kid one day: “Don’t put your toast in the VCR” (only works if they actually know what a VCR is and you actually have one 🙂 ). The NLP community would call that an embedded command. The visual cortex is not capable of visualizing negations, so all the kid will visualize is “put your toast in the VCR”. The chances of that visualization helping you reach your goal of no food in the VCR would improve if you told your kid to eat the toast, or asked “how does your toast taste?”. After all, they can’t answer that if the toast is already in the VCR.

The point is that I’m sure I would have more success in changing people’s habits by telling them what to do instead of telling them what not to do.

I personally react badly to people telling me what not to do. If my girlfriend tells me I spend too much time at my computer, I’ll probably end up getting mad if she says it too many times. Not because I wouldn’t gladly prioritize her needs; I just hate guessing what they are. If instead I got questions like “Wanna go to the movies with me?” or “Wanna sit on the couch and chat?”, or whatever she would want to do but can’t because I’m working or playing poker at my computer, my reaction would probably be a smile and a “sure honey”.

I might be more extreme than most in my reactions to being told what not to do. It easily pisses me off, but I’m sure that I’m basically no different from others.

If you want people to do things differently, tell them what you want them to do, not what you want them not to do. So instead of telling myself not to tell people what they shouldn’t do, I think it’s time I told myself that I should tell people what they need to do.

And as a final note, never tell people they should do something you want them to do, and never say they shouldn’t do something if you really don’t want them to. Say you would like them to do it. Why? Imagine yourself standing in front of an open fridge, looking at your favorite chocolate cake and a broccoli. Looking at the broccoli you’d probably be thinking “I should eat more broccoli, that would be so good for me”, while looking at the cake you’d think “I really shouldn’t have more cake”. I’m betting you rarely end up on the couch enjoying a big chunk of broccoli. Usually our “shouldn’t”s are rewarded and our “should”s are followed by punishment.

Don’t get people to associate what you want them to do with punishment by saying they “should”; instead use words such as “need”, “wish”, “like to” and similar.

Being part of an embedded project team, I was originally looking forward to a lot of complex technical challenges, but as time passed I realized, as is often the case, that the success of a project is a lot more about people, communication and motivation than about hardcore tech skills and plain competencies.

To set the scene:
The project team I’m part of consists of 18 people, of whom only a few had done agile development before they were introduced to it halfway through this project.

The main feedback the development group is getting, and has historically been getting, is from the system test group.

That feedback could in the past be boiled down to “We’ve discovered X new faults and the following Y faults block our progress”, repeated a few times a week whenever the situation changed.

Over, say, 14 time boxes where the predominant feedback has been “What you’ve done is not good enough for us to continue with our system work”, even the most hardcore developers I know would have suffered a blow or two to their motivation. After all, we all want to hear “Good job” once in a while.

The above description is a bit exaggerated, but looking at how people behave and talk, I’d say it’s reasonably close to the essence of how people feel.

So what I originally thought would be a technically challenging architect task has turned into a motivational task more than one of creating UML diagrams and code structures. And when it comes to being an architect, the way every new idea is presented needs to leave a feeling of appreciation and motivation in the involved developers.

The first of the two main focuses we’ve looked at so far is communication within the project. The system group now emphasizes how many observations (a positive word for bugs) the developers have resolved and the testers have verified, and not so much how many unresolved bugs the system has.

The first day they did that, even the head of the department was pleased with the “progress”. I’ve put progress in quotes since there wasn’t any progress from Friday to Monday, but the nonexistent progress was valued because it was perceived in a different manner. Knowing that you’ve solved 250 bugs and there are only 40 known issues just sounds a lot better than “There’s still 40 new/unresolved observations”.

The second one is changing how people think about testing and finding bugs. At present people fear bugs: they are considered feedback on poor performance, and hence people test to prove that the code works (which is a mathematical impossibility within finite time).

This time box we’re going to have a war game of testing. We’ve got 3 development teams, and they are all delivering code 4 days before the end of the iteration. After that deadline, every group is permitted to write unit tests and unit integration tests against the code of the other groups. The group that ends up finding the most bugs wins the grand prize.
The war game has a few objectives seen from a project perspective.
1. We want to change how people write tests. They should be written to find bugs, not to “prove” that the code works. By actively rewarding the act of finding bugs, we’ll change the focus from “proving that it works” to “finding what doesn’t”.
2. We want the coverage up, and since there are two ways of winning (either write a lot of tests finding “all” the bugs yourself, or write a lot of unit tests finding a lot of bugs in everyone else’s code) we’re sure to get higher coverage.

As a secondary objective, it’s worth mentioning that focusing intensely on writing unit tests, while at the same time being educated in writing them, will hopefully make the developers realize which code structures are more robust than others and discover some of the qualities of testable code.

For me it’s back to holding my thumbs, crossing my limbs and hoping it all works. Luckily I’ve played these games before, so I know they do, and I guess all there’s left to say is: “Let the games begin”.

Single Responsibility

Posted: September 17, 2008 in Uncategorized

A few days ago I was debating some architectural changes with one of the developers on the team I’m currently working in. I had to make it apparent to him what value it would give our project if we adhered to the Single Responsibility Principle.

After giving the theory behind the principle a go, we hadn’t really progressed. However, he gave me an idea that made it possible to convey the importance of the principle.

Imagine we have a class that holds the outside temperature, and that same class can communicate via radio with the nearest weather station.

This class is used in an observer pattern, so somewhere in our application other parts will be updated (i.e. the screen reflects the latest changes in temperature).

Our class has three states:

valueAccepted
communicating
idle

valueAccepted is set whenever we get a new temperature reading; the state reverts to idle and the temperature reading is invalidated by the first read thereafter.

communicating is the state when we are communicating with the weather station.
After testing the first pilot of the code, it’s realized that everything works 100% as expected and everyone is happy.

So far so good, but the class has at least two reasons to change: the communication protocol with the weather station changes, or the temperature functionality changes.

To see why this might be a problem, let’s change the communication protocol slightly. The only change is that every time we send a command we need to accept a value. This causes one of the developers on our team to send the object into the valueAccepted state.

He thereafter tests that the radio communication works again, and since the only change had to do with the radio functionality, no one thinks of testing the temperature reading before the new version is released.

Shortly after it’s realized that the temperature readings are invalid during radio communication.

Had the Single Responsibility Principle not been violated, the radio communication would never have been able to alter the temperature flow.
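To make that concrete, here’s a minimal sketch in JavaScript (illustrative names, not actual project code) of how the two responsibilities could be separated so that a protocol change physically cannot touch the temperature state:

    //Responsibility 1: holding the current temperature reading.
    function TemperatureReading(){
        var state = "idle", value = null;
        this.accept = function(newValue){
            state = "valueAccepted";
            value = newValue;
        };
        this.read = function(){
            var result = value;
            state = "idle"; //the first read reverts the state to idle
            value = null;   //and invalidates the reading
            return result;
        };
    }

    //Responsibility 2: communicating with the weather station.
    function StationRadio(){
        var state = "idle";
        this.sendCommand = function(command){
            state = "communicating";
            //...protocol work goes here. If the protocol changes (e.g.
            //every command must now be acknowledged with a value), the
            //acknowledgement is handled in the radio's own state; it can
            //no longer flip the temperature object into valueAccepted.
            state = "idle";
        };
    }

Each class now has exactly one reason to change, so the protocol tweak from the example stays inside StationRadio, and the temperature readings stay valid during communication.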

I’ve been thinking about blogging for some time now, and today turned out to be the day I started doing it.

I was reading this post by Scott Hanselman, which is what gave me the idea for the name of my blog. Since this is going to be the first post on my behalf, I thought: why not explain why I named it “Failing fast…”? After all, most people try not to fail at all.

A few years back I was inline skating a lot, trying to make it to the national team, only to end up seeing people I had raced (and for a big part of them, beaten) the entire season make it instead. I got angry, real angry, but I also decided that the year after, I had to make it a lot harder not to bring me along. So I hired a coach to help me with the mental aspects. That turned out to change how I look at tasks in general, not just in sports.

On his business card it said “The will to succeed has to be greater than the fear of failing”. Easier said than done, but just thinking about that did help, and somewhere down the road I figured out that most of my successes were built on a heap of failures. Or more exactly, they were built on what I learned from failing.

Back in the world of skating, I had realized that I was holding myself back. I was afraid of what would happen if I bet everything on one chance to win: I might spend all my energy and wind up last, or I might actually win. By not trying, however, I was ensuring I wouldn’t win. Winning in sport involves gambling; you cannot win at the national level without gambling, no one is that good.

However, the more experience you get in gambling, the better you get at it. You will eventually learn when to start your sprint for the finish line, when to try to break away, and when to just sit back and let someone else do the work.

Sometimes what seems like the thing not to do might be just what you should be doing. One of my favorite interview replies came from Jakob Piil after winning a stage in the Tour de France. A journalist asked him why he made his breakaway with 6 km (~4 miles) to go in a strong headwind (that’s usually a very bad idea). The reply was simple: “My legs were hurting badly, so I thought the others were hurting as well, so they would probably not follow”.

I’m sure that somewhere in the back of his mind a little voice was telling him it was suicide, but having tried a lot and failed a lot, he knew his own body well enough to know when to change the rules of engagement to succeed.

My point is that there’s much more information in failing than in succeeding, and trying to avoid failing is about as easy and as prosperous as trying not to eat: it will hurt, and at some point it will be the end of you.

In the end we all want to succeed, but it’s inevitable that we will make errors on the way to success. The faster we fail, the sooner we can correct the errors. That realization made me change the way I conduct my work, to make errors stand out as fast as possible instead of trying to hide them, and as of today that realization has become the basis of the name of my blog.