So now that I’m working on Bothan for the remainder of my fellowship, I am puzzling through what it is that distinguishes Bothan, and I figured it was worth doing so out loud / in the open. Let’s start with how it is defined within the Roadmap:

Bothan is a simple metrics storage and visualisation tool. Amongst other things, it powers the ODI dashboards. It provides a REST API for storing and retrieving time-series data, as well as human-readable views which can be customised and embedded in other sites, allowing it to be used for building dashboards. It is designed for open publication of metrics data and includes licensing metadata for simple generation of an Open Data Certificate.

The USP of Bothan is all there in the first line – a storage AND visualisation tool. There are plenty of OSS dashboards out there, but not many of them also provide a way to store (and thus publish) data in addition to visualising it. So let’s dig into this dual aspect of Bothan a little more.

storing and retrieving time-series data

Labs created the bothan-deploy repository to make the process of publishing data on the Heroku platform as painless as possible. The one-button deploy sets up a Bothan instance on your Heroku account (you must sign up for Heroku first, obviously) and avails of the mLab sandbox database plugin, which supports up to 500,000 metrics – that is, 500,000 individual data points. That’s how and where the data resides.

Bothan creates a RESTful API for whatever data you’re storing – which means that retrieving data from Bothan conforms to a recognised convention for structuring and accessing data on the web. Bothan’s well-crafted RESTful architecture also means it interoperates with Zapier with ease.
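To make that dual role concrete, here’s a minimal sketch of storing and then retrieving a metric over HTTP. Treat the endpoint paths, payload shape and instance URL as assumptions drawn from my reading of Bothan rather than canonical documentation – the project README is the authority here.

```ruby
require 'net/http'
require 'uri'
require 'json'
require 'time'

# Hypothetical deployed instance – substitute your own Heroku app URL
BASE = URI('https://my-bothan-instance.herokuapp.com')

# Store a single data point for a metric called 'github-stars'.
# Endpoint shape assumed: POST /metrics/:name with a JSON time/value body.
# NB: a real deployment will likely require API credentials, omitted here.
post = Net::HTTP::Post.new('/metrics/github-stars',
                           'Content-Type' => 'application/json')
post.body = { time: Time.now.utc.iso8601, value: 42 }.to_json

Net::HTTP.start(BASE.host, BASE.port, use_ssl: true) do |http|
  http.request(post)

  # Retrieve the whole time series back as JSON
  response = http.request(Net::HTTP::Get.new('/metrics/github-stars.json'))
  puts JSON.parse(response.body)
end
```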

human-readable views

That phrase might seem a little off to the non-developer, but it makes sense when you consider that everything written so far underscores Bothan’s machine-readable credentials. Bothan’s menu of human-readable views seems to have emerged from the dashboards that Labs created for internal use at the ODI, with many of the same ‘views’ repurposed within Bothan. To me this indicates an emphasis that factors into how Bothan co-exists alongside, or competes with, other dashboards. The menu of visualisations is limited – and there is perhaps a benefit to that in terms of the virtues of constrained interfaces (this seems to be a theme in Labs – Comma Chameleon has been analogised as Markdown for CSV, implicitly signalling the utility of a constrained interface). It could also be a problem in terms of retaining users – if the visualisation they want is not on the menu, maybe they’ll move on to another dashboard service. I see two rebuttals to that. First, by being a storage and visualisation platform, Bothan has uses beyond simply visualising the information. Secondly, a limited menu of visualisations strikes me as an interesting way to kick-start user contributions to the issue backlog: suggest a visualisation based on the data you want to store.

designed for open publication of metrics data

This is definitely the most interesting proposition of Bothan: the delineation of metrics data as a category of open data unto itself. Technically speaking there’s a distinction between time-series data and tabular data, exemplified by Bothan being a separate product from Octopub despite the two being kindred in spirit: both serve as ways for someone to take their first steps in publishing open data. But I’ve also got the sense that metrics data may be a more palatable material to publish than other sorts of data. The best angle I have on that is that metrics can represent an abstraction of an underlying dataset, so the metrics could be published as open data while preserving the underlying dataset’s status (or preferred status) as closed data. Honestly though, this is the aspect of Bothan that remains the most underspecified in my estimation.

That’s why I’ve written this post – to see if anyone else has ideas on how this functionality of Bothan can be put into dialogue with wider open data dynamics. I’m hoping to delineate the ambition for Bothan in order to help guide my work on it.

Triage is one expedient way to increase the healthiness of a given repo, so that’s the first task I have set myself for Bothan. This has involved creating a new subset of labels within the Toolbox and the Bothan repository: Priority, Severity and Estimate. The first two indicate the urgency with which an issue needs to be resolved. The last – Estimate – gauges the level of effort or difficulty a task might entail.

Assigning these labels to the Bothan repository is definitely a trial-run exercise. At present Bothan is still in the alpha stage, meaning that most of the issues filed against the tool come from developers who have worked on the code. This gives me a good sandbox in which to trial a label triage workflow. What it does not give me is any experience of using labels to manage a community-driven, multi-author codebase.

The severity label is intended to be used exclusively with bugs. So that’s how I began my sweep of the repository: filtering through to the bugs and allocating a critical, serious or moderate label. The objective is to get a sense of which bugs need to be tackled first – it’s a prioritisation exercise. After sweeping through the bug tickets in this fashion I was left with four to look into.

However, working on Bothan is not limited to the main repository – the two client libraries developed by ODI Labs also warrant attention. So I’ve exported the label scheme in use within Bothan to both of those repositories.

After expanding the labels and doing an initial triage sweep, I decided to be more strategic in my use of issue assignment within my GitHub account. In doing so I discovered that I was just as guilty as the other developers on the codebase of leaving issues open that I had effectively finished with. Moreover, I had assigned myself to tickets that I’d never made any progress on – including some open since 2015 (ouch!). I figure it best to lead by example, so I’ve reduced the number of tickets I’m assigned to. Henceforth I will just tackle issues on Bothan and any issues that apply at the Toolbox level.

When I reviewed the outstanding issues assigned to me I made use of the priority labels – these are a more generic way of signalling urgency and apply to all tickets that are not overtly bug-related. Administrative labels such as ‘specification’ and ‘requires-replication’ have also proved helpful ways of ordering the work to be done.

The objective for the final third of my placement is to work on a single tool from the Toolbox. In doing so I’ll improve the healthiness of the tool in question and – if possible – nurture the beginnings of a contributor community. I’ve been encouraged to consider the latter an added bonus rather than an objective I should be disappointed about not achieving, given the time available to me.

‘Healthiness’ is a proxy term for what I have to date termed ‘other-developer-ready’. Overall I understand it as making a given repo more hospitable to newcomers.

Last week Olivier T provided me with some excellent mentoring and frameworks for deciding which tool to focus on for the remainder of my placement. Heading into the meeting the choice was between CSVlint and Bothan. During the meeting Olivier threw in a curveball: he would also let me work with another organisation with a more established framework for doing OSS development, briefly bumping the number of options in front of me to three.

Using a ‘criteria’ matrix, and reflecting on the act of using that matrix to make a decision, I whittled the options down. That left Bothan and a possible secondment to another OSS team. Earlier in the day I’d heard OKFN praised for their ability to foster an OSS community, so they served as the placeholder for the secondment option.

Having slept on it, the pick is Bothan. My decision has less to do with feeling weird about an OKFN secondment and more to do with wanting to work on Bothan. It offers the mix of coding work I need in terms of skills. It would be great for the CV in terms of demonstrable familiarity with relevant technologies. Over the last week I’ve done some reading on dashboards in preparation for an internal client meeting and realised how much I’ve enjoyed immersing myself in the theory around them. On a doctoral level, working on Bothan gives me praxis-based familiarity with data-viz that can inform my thesis without generating material for it (a stipulation of my CHASE funding) by sitting perpendicular to my research area.

My choice of Bothan is significant in another respect – a secondment to OKFN’s OSS community would hypothetically have been the ideal venue for addressing the questions that drove my placement. In declining that option I’ve realised I need to understand what I’ve done to date with regard to those questions. I initially pitched my placement as ‘the ODI is the best place to learn about these things’, and I feel comfortable contextualising my experience to date as a rebuttal to that initial assumption. I’m essentially bounding what I could possibly learn about OSS development by the affordances and context of the ODI at the time I was here. This is what I’ve got thus far.

On the Toolbox, what I’ve done to date is scaffolding. This was essential maintenance that all the Toolbox repos were in need of, and it was the most that could be done in the face of limited resources and a dormant community. Along the way I’ve had intimate access to the decisions taken around the undesirable circumstance of one of those OSS tools being forked from the Toolbox. There’s been a lot to learn from all that. I’ve spent two thirds of this placement assisting and observing, and I feel comfortable with the decision to inject a different tempo into the final third of my time here. Moreover, the feedback I’ve received from Comma Chameleon’s OSS contributor @chris48s when I’ve submitted changes has been so helpful, and I’d really like to get to the point where I can be the person offering that kind of feedback to newcomers. I firmly believe that spending a concerted stretch of time with one tool gives me the best opportunity to get there.

Finally there’s the matter of what I can achieve with Bothan. As I see it, the work for the next two months will let me learn how to use the product roadmap (link) as a means of soliciting feedback on Bothan’s direction. It also lets me trial ways of capturing user feedback that can be extended to other tools in the Toolbox. I hope to learn the difference between indicators of community engagement and indicators of product adoption. But beyond the obvious outcomes of working on Bothan there are other engaging questions. How does Bothan compete with the many dashboard services that are proliferating? Bothan is a little like Comma Chameleon in being an odd one out: it’s concerned with time-series data where all the other Toolbox applications address tabular data. While I can see how Bothan is analogous to Octopub as an easy ‘first steps’ experience of publishing open data, its relationship to the aspects of the ODI’s mission below is less straightforward:

  • Getting Data To People Who Need It
  • Meeting Data User Needs
  • Contribute to a strong, fair and sustainable Data Economy

I’m looking forward to determining what Bothan can do for those mission objectives. I also think Bothan has potential – it was the tool most staff were curious to learn more about after I presented my Toolbox responsibilities at an internal meeting. There is a curiosity to get hands-on with data dashboards among people of different proficiencies, and this is something I hope to tap into by improving the documentation around what Bothan can do. Equally, what Bothan can’t do will be instructive about the user needs of dashboards in general.

So what’s up next? I will direct you to watch this space (and this space) for Bothan developments.

James Smith has just authored a post entitled the Rise and Fall of ODI Labs. It’s a first-hand account of an important backstory that lies behind my placement, and if you’re interested in the dynamics affecting open source software I’d really recommend you go read it before digesting any more words here, as I won’t be summarising it: James’s post fills in the lived reality of those dynamics and you can feel them best by reading it from the source.

I initially applied to re-join the Labs team I’d worked with in 2015. Late last year James broke the bad news to me, but explained that if I still wanted to work with the ODI there was now an even more pressing imperative for a Code Fellow. The nature of the role would change, as Labs no longer possessed the capacity to take on new work.

As my placement has progressed I’ve been piecing together the period between my two stints here. Best I can tell, Labs’ disbandment came at a particularly inopportune time: 2016 had seen the team build up a head of steam, and it was in this period that Octopub and Bothan were pushed to the state they are currently in.

I see the story that James outlines as essential context to the questions I’m aiming to explore through this placement – what I sensed of that backdrop was a motivating factor in taking the placement: I could see a clear narrative arc. There was an opening, a necessity perhaps, for the ODI to employ OSS contributor models to support the tools they’d developed because there was no other funding to sustain the development team that had authored them.

What I’ve gleaned from ‘anecdata’ is that what happened to Labs reflects an all-too-common dynamic of open source software development: a steady stream of funding is hard to come by. That’s why I hope to dig into the Fall of Labs for lessons to be learned before my placement finishes.

I mentioned in some previous posts how ODI Queensland have assisted with defining labels for the repositories. We collaborated on an initial document setting out which labels to use, and Stephen Gates alerted me to several helpful resources in that regard. This week they had further feedback on the initial draft I’d made of the labels for the ODI Toolbox.

In this post I’d like to reflect on the act, and the utility, of labelling. This is because I am due to make a new addition to the labels in use, based on the plan for my final two months. I’ll be focusing on one of two tools, and one of the first steps is issue triage. This is a task assisted by the labelling work done to date, but one that necessitates some new labelling categories – some form of prioritisation label system. I can see two ways into that: severity of issue and difficulty of issue.

I discern that the major benefit of triaging these issues is to reduce the graveyard-like appearance that typifies several of our long-running tools. Long-unresolved issues and lingering open PRs strike me as two environmental cues that might make a repository seem less hospitable to newcomers.

[Image: example of inconsistent prefixing]

Stephen Gates has provided me with feedback on the current state of the labels and it affirms what I’ve suspected while proceeding on the label taxonomy solo: labelling, tagging, or any act of categorisation struggles to fully escape subjective preferences. Stephen picked up on inconsistencies in the formatting I’d applied across the labels, whereas to my eyes that inconsistency had some internally consistent logic. I hope that documenting the decisions mitigates this somewhat. Another recurring problem I find with labels is their tendency to multiply and thus exceed their filtering utility. This has proven to be the case with the functionality category I’ve created for the Toolbox:

[Image: the functionality category]

I think I’m finding it hard to decouple from how I relate to tagging in Evernote. I use tags in Evernote only nominally to organise things into collections – my principal reason for using them is that Evernote’s search functionality is dreadful and I need some other anchor to navigate through the seven years of accumulated notes therein. The result is that my Evernote tags have long since surpassed the rationalisation threshold that Stephen notes. If you’ve got any thoughts or pointers on how to use GitHub labels, I’d love to hear them!

For now the living documentation of the labels continues to be updated as needed. I’m also working out some scripting (leveraging the GitHub Label Manager package) to ensure that the Toolbox stays the most current resource for labels across all six of the applications and three of the ODI’s ten libraries.
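As a rough sketch of the kind of scripting I have in mind – using the octokit.rb gem here purely for illustration, rather than the Label Manager package itself – syncing a canonical label set across repositories might look something like this. The label names, colours and repository list are placeholders, not the actual Toolbox taxonomy.

```ruby
require 'octokit'

client = Octokit::Client.new(access_token: ENV['GITHUB_TOKEN'])

# Placeholder canonical label set: name => colour (hex, no leading '#')
LABELS = {
  'priority-high'     => 'b60205',
  'severity-critical' => 'd93f0b',
  'estimate-small'    => 'c2e0c6'
}.freeze

# Placeholder list of repositories to keep in sync
REPOS = %w[theodi/bothan theodi/comma-chameleon].freeze

REPOS.each do |repo|
  existing = client.labels(repo).map(&:name)
  LABELS.each do |name, colour|
    if existing.include?(name)
      client.update_label(repo, name, color: colour) # keep colours consistent
    else
      client.add_label(repo, name, colour)           # create missing labels
    end
  end
end
```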

As part of the final few issues I’m clearing on Comma Chameleon before turning my attention to other repositories, I’m implementing @chris48s’ suggestions for a JavaScript code style guide. Up until now I’d been able to point all the Toolbox READMEs towards the GitHub-recommended Ruby style guide, but that didn’t work for the JavaScript black sheep of Labs’ output.

Thankfully there was plenty of discussion to guide my next steps, which I’m grateful for because I sensed that JavaScript has a lot more competing candidates for what amounts to a code style guide. I’m also pretty interested to work on implementing a style guide across a project I knew to be rather ad hoc in its development. But I’m also personally interested in the larger value of code style guides, summarised in this blog post as …

Most developers have experienced looking at a very old piece of code that they wrote and not having any idea why they wrote it. It’s not that your memory is bad, it’s just that you make so many of these little decisions while writing code that it’s impossible to keep track of them all.
Writing code against a style guide outsources that information into the code itself.

That first paragraph nails so much of my coding experience prior to learning the ins and outs of rigorous version control, as acquired during my initial internship with the ODI. Having been introduced to programming through the infamously flexible Perl, I am intrigued that disciplining the way coders write their code functions analogously to how VCS functions. Both seem to be ways of dealing with the at-scale, multiple-author nature of software development.

Conceptually, of course, this is interesting. Let’s see how stimulated my brain is after dealing with the graft of disciplining the Chameleon against the style guide.

Almost immediately I could see how style guides belong in the same ambit as refactors and VCS. Pointing my IDE of choice towards StandardJS unveiled numerous unused variables. As I progressed through the Comma Chameleon codebase I sensed that this was no trivial task – large amounts of the code were being flagged as problematic by Standard. Like, LOTS of code.

[Screenshots: Standard flagging errors across the Comma Chameleon codebase]

Thankfully the standard output of standard can be piped through grep to catch commonly occurring errors – for example, `standard | grep 'is not defined'` lists every undefined-variable complaint in one go. In this way I was able to work through errors that the WebStorm IDE didn’t offer an automatic fix for.

If there’s one area where Standard seems a little suspect – or perhaps shallow is a better term – it’s the “Expected error to be handled” warning. Every function that triggered it could be resolved by simply adding `if (err) throw err`, even in cases where the library in question (for instance, request) handles errors differently. I later learned of the snazzy package, which presents standard’s output in far prettier fashion.

I decided to treat this ticket as an exercise in discrete commits. The biggest problem flagged by incorporating StandardJS is its attitude to globals. Rather than attempt a refactor of the existing Comma Chameleon, I availed of the fields that standard reads from the project’s package.json. While this is essentially kicking the can down the road, I also think it’s the correct thing to do – the changes that would result from bringing the codebase into line with this aspect of StandardJS might be larger than they initially appear.
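For reference, standard reads its configuration from a `standard` key in package.json, and a `globals` array there whitelists identifiers that would otherwise be flagged as undefined. A stripped-down, illustrative sketch – the global names below are placeholders, not Comma Chameleon’s actual ones:

```json
{
  "name": "comma-chameleon",
  "standard": {
    "globals": ["$", "ipcRenderer"],
    "ignore": ["dist/**"]
  }
}
```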

In doing this I also realised that this modularisation process is an interesting responsibility for the style guide to assume – it outsources the information, arguably the decisions, about modularisation into the style guide. The style guide becomes opinionated about how to manage the app’s namespace. On reflection this is no bad thing, as whenever I return to a Node project after some time this is invariably the aspect of the language I have to spend the most time getting my head around again (but that might just be a quirk of my grasp of programming languages – modularisation only clicked with me after a lengthy immersion in pidgin programming in Perl and then proper object-oriented encapsulation via Java).

All that said, it still isn’t a style guide to adhere to slavishly: some things it flags as erroneous are actually valid – proceed with a discerning mind.

This was definitely a worthwhile exercise though. It’s unveiled a lot of gaps in Comma Chameleon’s test coverage which need to be addressed. Moreover, its opinions on style have definitely pointed to some code smells. I am in two minds as to whether this style guide should be built into the test command, but I’m including it for now as standard seems to be employed in a similar way to test suites. I didn’t complete every step of the ticket, in part because there are some things I didn’t know how to solve. I’ve bumped those queries to the relevant boards, but if you can help please provide me some pointers here.

The ever-excellent Stephen Gates has alerted me to a feature of GitHub which I can’t wait to incorporate into the Toolbox: GitHub Code Owners.

This new feature automatically requests reviews from the code owners when a pull request changes any owned files.

There have been a few occasions in the last four months where I’ve felt overwhelmed trying to keep on top of six different repositories. I’ve increasingly felt that, in the long run (i.e. after I am no longer full-time at the ODI again), a sane way of keeping atop the tools will be assigning one ‘point’ member per repository. Ideally this person would be really familiar with that one codebase, possessing an intimate knowledge of how the app works and what its design principles are. Such knowledge would equip that person to do code reviews in the most expedient manner possible, and ensuring regular code reviews seems like a prerequisite for building an OSS community.

The new GitHub feature will let me codify said owners within the same folder that hosts the other collaboration infrastructure. I particularly dig the ability to specify code reviewers on a per-language basis.
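For illustration, a CODEOWNERS file sits alongside the other collaboration files (in the repository root, docs/ or .github/ folder) and maps file patterns to reviewers, with later patterns taking precedence over earlier ones. The usernames below are placeholders rather than actual assignments.

```
# Fallback owner for anything not matched by a later rule
*       @default-maintainer

# Language-specific 'point' people (placeholder handles)
*.rb    @ruby-point-person
*.js    @js-point-person
```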

So far on this placement I’ve talked about the Toolbox a lot. One of the outcomes of the Comma Chameleon redux process has been the opportunity to pay greater attention to all the libraries that ODI Labs produced which undergird the Toolbox. This was something of an epiphany when I spoke to James about it last week, and I’ve wanted to communicate it somehow ever since. There are two examples in the Toolbox that make it clear.

But first off, let’s shift the nomenclature, as indicated by the clumsily redundant phrasing in the preceding sentence. I think it’s justified to consider the Toolbox as comprising the application layer of ODI Labs’ output. Beneath that there is (what I’m calling) a library layer – either self-contained libraries that support the functionality higher up the chain, or client libraries that extend the utility of the applications.

The best example is CSVlint.rb, the Ruby gem that powers the CSVlint.io web service and additionally provides functionality to Comma Chameleon and Octopub. This gem has spawned related libraries, such as csvlint.sh (which packages up the Ruby gem for use with Comma Chameleon), and integrates with a suite of “CSV-to-*” libraries, e.g. csv2json, csv2rdf and csv2schema. The first two are part of CSVlint’s implementation of the CSV on the Web standard, whereas the last provides support for the json-table-schema standard. That latter standard is currently subject to a big push from Open Knowledge Foundation and is now simply referred to as TableSchema.
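As a quick illustration of what the gem offers on its own, here’s a minimal validation sketch based on my reading of the csvlint README – the CSV URL is a placeholder, and the exact interface may vary between gem versions.

```ruby
require 'csvlint'

# Validate a CSV from a URL (placeholder address)
validator = Csvlint::Validator.new('http://example.com/some-data.csv')

puts validator.valid?                # true when no structural errors are found

validator.errors.each do |error|     # hard failures, e.g. ragged rows
  puts "#{error.type} at row #{error.row}"
end

validator.warnings.each do |warning| # softer issues, e.g. inconsistent values
  puts warning.type
end
```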

A more recent example is the pair of developer libraries (JS and Ruby) produced to complement Bothan. These libraries are in the same vein as the Zapier integration the team built: they provide other ways of interacting with the Bothan platform.

As a first step towards making this crucial stratum more visible, I’ve added the CSVlint gem to the roadmap. Bringing this element of ODI Labs’ work to greater visibility strikes me as really relevant to one of my placement objectives, namely:

how open source software development responds to the fast-changing dynamics of the contemporary software ecosystem

Each of these libraries is a component of the overall open source software ecosystem, maybe even of a nascent open source open data software ecosystem. They demonstrate the ‘build one thing well that can itself be built upon’ design principle that guided ODI Labs under James Smith’s stewardship. Understanding how these libraries interoperate with other software, and eventually live or die, is a relevant contribution to appreciating the overall ecosystem.