
TOS Holds a Sprint to Develop FOSS Courses

29 June 2018

A team of faculty involved with TeachingOpenSource (TOS) are working on four Free and Open Source Software (FOSS) courses that will be available under a Creative Commons license.

Instructors always struggle to find enough time to develop and revise teaching materials, and often cannot teach new topics unless well-developed materials are already available. The TOS project (http://teachingopensource.org) and the associated sites such as http://foss2serve.org have created over 100 learning activities that can be used by faculty interested in teaching open source.  Even so, teaching resources have not been organized into well-packaged modules or full courses, and there are still some open source topics for which there are no teaching resources.

A group of faculty have been working to make four FOSS courses available as complete packages.  For each course, an instructor has committed to completing the following tasks:

  1. Develop full course materials – These materials will include: the course syllabus, presentation slides, assignments and in-class learning activities, tests or quizzes, etc.
  2. Teach the course – These are full-term courses offered to undergraduate students.
  3. Conduct evaluation of learning – This may include a pre-course and post-course survey, and a summary of direct learning, e.g., as shown on exams or tests.
  4. Report on results – The instructor will provide a basic description of the results of the work, including demographic information about the course section, instructor analysis of the course results, and suggested revisions.
  5. Make course materials available – The syllabus and course materials will be linkable from teachingopensource.org or foss2serve.org so that the materials can be downloaded.  All materials will be licensed under a Creative Commons license.

The faculty team recently gathered at Nassau Community College in New York for a course materials sprint.  The instructors for each course presented an overview of their course, and discussed challenges and approaches that had been particularly successful.  The time together provided an excellent opportunity to exchange ideas and provide helpful suggestions for each of the courses.

Two of the courses have already been offered to students, a third is being offered over this summer and the final course will be offered in the fall semester.  We expect to have materials from the first courses available by the end of summer, with the others to follow after they have been offered and packaged.

Participating institutions were: Nassau Community College, New York University, Rensselaer Polytechnic Institute, Dickinson College, Towson University, Western New England University, and Drexel University.

TOS is a member project of the Software Freedom Conservancy.

We want to thank Google for the funding for this course sprint provided by the Google Open Source Programs Office.

Some Tech Firm Perspectives on Hiring

29 December 2017

I recently attended a meeting with local tech companies to discuss the talent pipeline in this region, how they were approaching hiring, and what they saw as needs and opportunities.  There were about 30 companies and a handful of recruiting agencies represented at the meeting.  Several points of the discussion jumped out at me:

Describing the tech landscape – One of the technical recruiters, who worked for a large technical placement and services firm, offered his version of tech hiring, essentially lumping all the tech jobs into three categories:

  • Mobile/Web – and this was mostly about mobile
  • Analytics – which included big data and data science topics
  • Operations/production/support/infrastructure

I was interested in the absence of any mention of application systems and, in particular, back end processing, in this view of talent needs.  Similarly, one of the large consulting services firms was present, and in discussing what they wanted from recent graduates, two items clearly dominated: Agile methods, and Web and mobile development.

Overall, I was pleased at how these perspectives line up well with our current degree programs and intended directions at Drexel, but I was surprised at how strong these trends showed in hiring.

Valuing open source software knowledge – I mentioned my interest in teaching students about open source and having them participate in HFOSS communities.  I was pleased that this was well-received by those present and people from a number of organizations made a point of discussing this further with me during the day.  They clearly saw open source knowledge as valuable both for product familiarity and for the process and tools knowledge that comes from FLOSS communities.

Hiring entry-level employees – Many of the companies are only hiring experienced people.  While I can understand that impulse, and know that it’s difficult to do anything else for small companies, I was surprised at some of the larger companies still in that mode.  For example, one company that has about 650 employees admitted that they still never hire recent graduates.  That seemed very short sighted both in terms of supporting local talent development, and in keeping new ideas and perspectives flowing into the company.  Unsurprisingly, this same company also mentioned concern that the average age of their IT staff was higher than they wanted.

Opening city offices to appeal to younger workers – Several companies mentioned using offices in the city as part of a strategy to attract younger workers.  One company had specifically opened a city office for this purpose.  This seems like a smart move, especially for companies with most of their operations in the outer suburbs.  It also provides a clear demonstration of the significant difficulty companies are having hiring enough tech talent.

Open Source and Plagiarism: What do we Teach Students?

26 December 2017

Some members of our HFOSS team recently submitted a proposal for a discussion session at a computer science education conference. The proposal was to bring together instructors involved with teaching Humanitarian Free and Open Source Software (HFOSS) and other faculty who might be interested in introducing open source to their own students.  The proposal was rejected, which was disappointing but not a big problem since rejections are a normal part of life in academic conference participation.

But I was quite surprised by the comments of a reviewer who had concerns about the proposal.  The concern centered around open source and potential plagiarism by students.  Essentially, the reviewer was concerned that if we teach students about open source, we would be enabling them to cheat by copying freely available source code from FOSS projects.  The reviewer was hesitant about the very notion of teaching open source, and felt that a session on open source could not be held without discussion of the student cheating it might encourage.

Now, like any instructor, I understand concerns about plagiarism.  Cheating in many forms is far too common in higher education, and all instructors are constantly looking for ways to detect and prevent cheating.  But from my perspective, there is no easy jump from concerns about plagiarism to concerns about teaching students to participate in open source.  In considering this connection, it seems that there are several relevant points:

First, there’s not a shred of hope for a strategy that suggests we can keep students from plagiarizing code from the Web by not teaching about open source.  It’s true that most students don’t really know much about how open source actually works.  But they all know that there is lots of source code on the Web – and that it’s as close as their preferred search engine.  Teaching open source isn’t necessary for them to plagiarize, if that’s what they choose.

Second, open source provides an excellent opportunity to discuss intellectual property, attribution, and correct and incorrect use of all that code available online.  Academic integrity has clear parallels to professional attitudes we want our students to develop.  Wouldn’t it be better to make that parallel clear to students as part of helping them develop as professionals?

Third, the reviewer’s comments carry an implication that student work should be individual and done in isolation.  Students learning to program do need to spend time writing small programs from scratch, but that needs to be just part of the picture.  Students also need to develop understanding and skills reflecting the reality that most software products are team efforts and most software work involves working with large, existing code bases.  Unless we prepare our students for that reality, we will turn them loose at graduation with major deficits in their professional preparation.

Moreover, this perspective on professional work is best introduced early in the curriculum.  The computing profession is struggling to attract under-represented groups, including women.  Research shows that the social aspects of team-based computing work are attractive to women students.  Introducing students to open source communities early in their education shows signs of appealing to women students.  The humanitarian aspect of HFOSS also seems to have appeal to women and under-represented groups.

HFOSS participation offers excellent educational opportunities for our students.  For faculty like this reviewer who hesitate because of cheating concerns, we need to be very clear that teaching open source will not raise the risk of cheating and provides a great opportunity for learning, including the opportunity to address academic and professional integrity.

Celebration of Women Faculty in Computing

25 July 2016

The Grace Hopper Celebration of Women in Computing feels a bit like a mirage for any computing faculty member who has been involved in trying to increase the number of women majoring in computing. Our efforts tend to produce very modest results, so going to Hopper feels somewhat disorienting. There really are large numbers of undergraduate women in computing! Of course the concentration at Hopper is an artificial effect, but very fun nonetheless!

The recent ACM election however created a result that isn’t a mirage. At the organizational level, the newly elected officers are almost all women. They are:

President: Vicki Hanson
Vice President: Cherri Pancake
Secretary/Treasurer: Elizabeth Churchill
Members-at-Large: Gabriele Anderst-Kotsis, Susan Dumais, Elizabeth Mynatt, Pam Samuelson, Eugene H. Spafford.

And for those of us with special interest in computing education, this election cycle is a double-header of good news. The newly elected SIGCSE officers are also almost all women, and, like the national officers, the whole group is highly accomplished. They are:

Chair: Amber Settle
Vice-Chair: Judy Sheard
Secretary: Sue Fitzgerald
Treasurer: Adrienne Decker
At Large members: Michelle Craig, Briana Morrison, Mark Allen Weiss
Immediate Past Chair: Susan Rodger

This would be a great year to introduce women majoring in computing to ACM, including pointing out who is running the organization.

Microsoft… and Hard… and a little bit Open

23 October 2015

Skimming Microsoft’s most recent annual report provides some interesting insights. Among them:

While many people are inclined to view Microsoft as a mature (and perhaps not that relevant?) company, the financial results continue to be quite impressive. In rounded billions, Microsoft has grown from total revenue of $70 billion to $94 billion over the last 5 years. The most recent year-over-year growth is 8% – not a rocket ship, especially for a tech company. But at this scale, that equates to a revenue increase of over $6.7 billion – an amount that dwarfs the total revenue of all but a small number of tech companies.

A second point is that the “soft” part of the name continues to be joined by “hard” as in hardware. Of the current $94 billion, $17.7 billion is in the “Devices and Consumer Hardware” category. Not a dominant segment on Microsoft’s scale, but viewed by itself, $17.7 billion would make a pretty respectable hardware company!  And it’s a long way from the days when the occasional bit of hardware like the Microsoft mouse was all that came from Microsoft’s corner of Redmond.

On a different dimension, browsing the annual report doesn’t uncover any declaration that MS is going to become an open source company and abandon the proprietary model.  No surprise there! But there is some recognition of the undeniable role of open source in today’s software ecosystem. The Shareholder’s Letter from CEO Satya Nadella includes this comment:

“We made tremendous progress in our developer engagement this past year by delivering new innovative services to developers across all device platforms and also opening our technology to create more participation from open source communities.”

And also this one:

“I am excited about the growth opportunities ahead. We’ve broadened our addressable market by building and acquiring technologies in key growth categories like security, data and analytics, while also delivering greater integration and support for Linux and open source technologies such as Docker containers and Hadoop.”

Other parts of the report correctly name open source products as competition for a range of MS offerings, so of course we’re looking at a mix of cooperating and competing.  But that’s a step in the right direction.

IT Student Participation in Humanitarian FOSS: Product vs. Services

7 November 2012

I recently co-organized a one day workshop to bring together faculty who teach in Information Technology (IT) degree programs.  The workshop was a pre-conference event for the annual ACM SIGITE conference held in Calgary.  The workshop was a small gathering made possible by some remaining funds from the NSF-supported HumIT project.  We had a great group of faculty, shown in the photo below, who brought their enthusiasm and expertise to create a day of interesting discussion and exploration of possibilities for student participation in humanitarian FOSS projects.


HumIT 2012 Workshop – L-R Front: Heidi Ellis, Sandra Gorka, Nannette Napier; L-R Back: Sam Chung, Evelyn Brannock, Greg Hislop, Dale Rowe, Jacob Miller

The attendees were all experienced IT faculty and they had the usual dual reaction to the idea of students participating in humanitarian FOSS projects.  On the one hand, they were intrigued by the excellent educational opportunity and potential motivational boost for students.  On the other hand, they were cautious about the learning curve, complications, and overhead for instructors who try to integrate student humanitarian FOSS participation into their classes.

One particularly interesting part of the discussion was whether student participation was easier or more difficult for IT students compared to students in other computing majors.  One of the workshop participants, Jake Miller of Pennsylvania College of Technology, suggested that it might be more difficult for IT students to participate because they would be providing services for the FOSS project rather than making contributions to the product.  Heidi Ellis, co-investigator for the HumIT project, and I have raised similar conjectures about IT student participation.  But what is really the issue?

The fundamental issue here seems to be whether it’s easier for a new FOSS participant to contribute to the FOSS product or to services related to the product.  As an instructor trying to plan student participation, there’s great appeal to finding possible contributions to the FOSS project that are manageable within the constraints of a course.  This seems to translate into factors including the following:

  • Size – Something big enough to be interesting, but not so big that students can’t grasp the task and complete it within the time frame of a course
  • Dependence – Something that is clearly part of the project, but not strongly interdependent with other tasks or parts of the system
  • Schedule – Something that is needed as part of the project’s forward progress, but not on the critical path of project activities

It’s easy to think of tasks related to the product that have the desirable characteristics for each of these factors.  For example, fixing non-critical bugs or developing a plug-in or generally free-standing module.  On the services side, the concern seems to be that more of the IT services tasks have less desirable characteristics for student work.  Size seems relatively manageable.  The concerns seem to center around the other two factors above.  Dependence may be more of an issue because many services tasks seem to have a greater need for knowledge of context.  It also seems that more of the examples people think of for IT are items that have time pressure associated with them (e.g., providing support services to users).

The bottom line is that there definitely seems to be an additional layer of hesitation among IT instructors about whether student participation in FOSS is manageable, and I think there is merit to that opinion.  But I also think that there is still plenty of room for IT students to engage in humanitarian FOSS work.  We’ve had some initial success in this area, and I think broader opportunities exist.  But clearly we need to provide additional demonstration of success to help IT instructors understand the opportunities.

How Big is “Big Data”?

6 October 2012

“Big Data” is getting a lot of attention lately as a key computing area for the coming years.  Even the White House has gotten involved with this year’s announcement of a Federal Big Data initiative.  But exactly how big is “big data”?  It’s a moving target of course, shifting with our growing ability to generate, store, and process ever larger volumes of data.


IBM 2314 Disk Drives

The IBM 2314 disks, introduced between 1965 and 1970, were a technical wonder in their day.  But it took a whole row of large, appliance-sized units to crack 200 MB, and the “big data” of that day was mostly stored on tapes and accessible only via slow sequential processing of carts full of tape reels.  Megabytes clearly qualified as big data.

Today I can beat that string of 2314 disks by an order of magnitude with a USB stick for under $20.  Clearly the economics are radically different.  But where does that leave the qualifying level for “big data”?
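The back-of-the-envelope arithmetic is easy to check.  A minimal sketch, assuming a conservative 2 GB stick (an easy sub-$20 capacity in 2012 – larger sticks only widen the gap):

```python
import math

# Compare a full string of IBM 2314 drives (~200 MB total) with a
# cheap modern USB stick. The 2 GB figure is an assumed, deliberately
# low-end 2012 price point; disk makers use decimal (power-of-10) units.
MB = 10**6
GB = 10**9

ibm_2314_string = 200 * MB   # a whole row of appliance-sized cabinets
usb_stick = 2 * GB           # pocket-sized, under $20

ratio = usb_stick / ibm_2314_string
orders = math.log10(ratio)

print(f"{ratio:.0f}x the capacity ({orders:.0f} order(s) of magnitude)")
# → 10x the capacity (1 order(s) of magnitude)
```

With a then-common 16 or 32 GB stick, the same arithmetic pushes the gap toward two orders of magnitude.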

Wikipedia, that font of modern knowledge, provides an interesting perspective.  A quick browse of the entries for gigabyte, terabyte, petabyte, and exabyte provides all the scale we need without even worrying about a yottabyte.  The system and storage examples in the Wikipedia entries are informative:

  • Megabytes clearly don’t make a blip on the Big Data horizon.  The Big Data of yesteryear is a routine unit for the size of individual files today.
  • Gigabytes can be covered with examples of modest amounts of image, audio, or video data that most computer users deal with routinely.  A few music CD’s or the video on a DVD breaks into gigabyte territory.  There’s not much here that will impress as Big Data.
  • Terabytes are just one step up the scale, but things start to get much more interesting.  The examples deal with data capacities and system sizes from the last 10 to 15 years.  They include the first one terabyte disk drive in 2007, about six terabytes of data for a complete dump of Wikipedia in 2010, and 45 terabytes to store 20 years of observations from the Hubble telescope.  Clearly, at this point we are entering “big data” territory.
  • Petabytes start to move beyond the range of single systems.  Netflix stores one petabyte of video to stream.  World of Warcraft has 1.3 petabytes of game storage.  Hadron Collider experiments are producing 15 petabytes of data per year.  IBM and Cray are pushing the boundary of storage arrays with systems in the 100 – 500 petabyte range.
  • Exabyte examples start to leave systems behind and mostly describe global scale capacities.  Global Internet traffic was 21 exabytes per month in 2010.  Worldwide information storage capacity was estimated at 295 exabytes of compressed information in 2007.  On the other hand, astronomers are expecting 10 exabytes of data per hour from the Square Kilometre Array (SKA) telescope, although full operation is not projected until 2024.
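The unit ladder and the examples above can be lined up in a short sketch.  Each rung is a factor of 1,000 (decimal prefixes); the sizes are the rounded figures quoted above:

```python
# Each step up the ladder is a factor of 1000 (decimal prefixes).
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]

def human(n_bytes: float) -> str:
    """Render a byte count using the largest sensible decimal prefix."""
    i = 0
    while n_bytes >= 1000 and i < len(UNITS) - 1:
        n_bytes /= 1000
        i += 1
    return f"{n_bytes:g} {UNITS[i]}"

# Rounded sizes as quoted in the discussion above.
examples = {
    "IBM 2314 string (c. 1970)":      200e6,
    "Wikipedia dump (2010)":            6e12,
    "Hubble archive (20 years)":       45e12,
    "Netflix video library":            1e15,
    "Hadron Collider data per year":   15e15,
    "Internet traffic/month (2010)":   21e18,
}

for name, size in examples.items():
    print(f"{name:32s} {human(size)}")
```

Running this prints the ladder from 200 MB up through 21 EB – five jumps of a thousandfold each from the 2314 era to global Internet traffic.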

750 Gigabyte 3.5” disk

So this scale would seem to clearly put gigabytes in the yawn category and probably below the threshold of Big Data.  Terabytes clearly qualify, and probably account for much of the Big Data effort at the moment.  Petabytes cover the really impressive data collections of today and seem to mark the upper boundary of what even the most ambitious Big Data projects can handle.  Exabytes are rushing at us from the future, but mostly beyond what anyone will be able to address in the next few years.

So the bottom line: Big Data today has moved beyond gigabytes.  It is squarely in terabytes and edging up into petabytes.  Exabytes and beyond are in the future.  And we still don’t need to try to comprehend what a yottabyte is.

Image Credits

2314 disk – Scott Gerstenberger.  Wikimedia Commons.

750 Gigabyte disk – Christian Jansky.  Wikimedia Commons.

Engineering: Pipes, Wires, … and Software

18 June 2012

Software engineering will always be an uncomfortable fit with the traditional engineering disciplines. One of the key issues is the fact that all the other engineering disciplines create physical artifacts but software engineering does not. This difference means that the basis in physics and chemistry shared by all the other engineering disciplines is simply not relevant to software engineering.

This week I had a graphic reminder of this gap when attending the annual conference for the American Society for Engineering Education, where I presented several papers related to software engineering education. The exhibit hall was filled with vendors selling engineering education products, many of which involved equipment or scale models of large artifacts like bridges. Reflective of the relatively minor presence of software engineering at the conference, there were no vendors in the large exhibit hall who were positioned to support software engineering education.

This minor representation for software engineering reflects a national problem. Federal projections indicate that we should be graduating about five to seven times the number of computing majors that we are now graduating. Software engineering majors should be a key part of that group. “Software engineer” has even topped the list repeatedly in recent years as the best career opportunity available. And yet the number of undergraduate programs in software engineering nationally is in the low 30s, and most of those programs have small numbers of students.

Engineering and Software

The lack of software engineering majors is a looming national economic problem. It’s a problem for the other engineering disciplines too. While browsing the exhibit hall at ASEE, I couldn’t help but note the extensive integration of software with all those displays of engineering equipment. With almost every exhibit, like the one shown to the right, there was a laptop or tablet that was used to provide controls or models or processing. In a profession where concern for attributes like reliability and performance are typical, the software engineer in me was inclined to guess that all that software was likely to be a weak link in many of these products. Until we start to take the challenges of software engineering more seriously, software will remain a weak link in engineering artifacts and beyond.

Data Science: What Data?

15 April 2012

Data science, data analytics, and big data are all topics that have a rising buzz in the last few years. As with many “new” tech topics, much of what these terms encompass is not new at all. There clearly are ties to existing activity in areas like data mining, decision support systems, business intelligence, visualization, etc. So what’s new and why the new terms and growing buzz?

One key to the shift in discussion clearly is the data itself.  There are several categories of data that are simply exploding in size and importance.  In trying to get your head around data science, it seems useful to categorize the types of data involved.  My current mental model is that there are three broad categories of data that seem relevant to the discussions of data science.  They are:

Human Generated Data

The volume of data published on the Web by individuals is truly one of the amazing features of our time.  And the publication rate and variety of this data continues to accelerate.  For anyone interested in what people are doing and thinking, this is a total game changer.  Some examples of data in this category are:

  • Clickstreams and navigation histories of Web activity
  • Tweets – person to person message interactions
  • Facebook and LinkedIn – semi-public records of people’s lives and interactions
  • Citizen science – data gathering in support of science by interested non-scientists

Device Generated Data

There have been devices that generate massive amounts of data for decades, with areas like medicine, lab science, and aerospace providing ready examples.  But the number and type of devices that create large data streams accessible via the Web is rising sharply.  Projecting forward to fully instrumented intelligent infrastructure implies that the history of device generated data is barely a trickle compared to the future.  Some examples of data in this category are:

  • Scientific devices – e.g., medical and molecular imaging
  • Sensors – intelligent infrastructure
  • Video and audio capture – traffic cams; security cams

Newly Accessible Data

As more and more of the world’s data shifts online, there are legacy data sources that take on new meaning.  Much of this is data that was previously paper or computerized but off the Net.  It includes data that may have been previously available, but that was prohibitively expensive and time consuming to access and aggregate.  Examples of data in this category are:

  • Real estate transactions
  • Legal filings
  • Price data

——–

The iSchool at Drexel has active research efforts that address a variety of topics related to data science.  Our degree programs increasingly address these topics too.  And clearly the development of education for data science has just begun.

Evolution of Open Source and Commercial Providers for Learning Management Systems

10 April 2012

The recent announcement by Blackboard (Bb) that it was acquiring two Moodle service providers was quite interesting to anyone who follows open source in higher education.  Over the years, Blackboard has emerged as a market leader in the Learning Management System (LMS) arena, through both product development and acquisition.  At the same time, Blackboard has attracted considerable heat and a large dose of scorn for a patent the company filed and tried to enforce.  That patent was viewed by many to be an attempt to corner the LMS market and to claim invention of many LMS features in use well before Blackboard’s supposed date of invention.  Coverage of the long story and eventual Blackboard loss in the courts can be found here.  Particularly for fans of open source, this sort of behavior does not make Blackboard an admired company, and acquisitions in the Moodle niche are much more likely to raise eyebrows than cheers.

It’s interesting however to see how Blackboard explains this latest move.  It’s also important to note that Blackboard recently returned to being a private company after trading publicly for some years.  That switch may have provided increased flexibility in strategy formation.

Blackboard’s strategy already includes multiple learning platforms due to acquisitions.  The company has also broadened its scope beyond the LMS niche to address a range of educational institution application needs, including a push into areas like student services.  Finally, Blackboard also grows by providing services, not just software.  Taken together, this means that accommodating the open source world makes sense for Blackboard in two ways:

  • Enterprise sales – In the push to cover the education enterprise, Bb will sometimes be sole provider for an institution across the whole Bb product line.  But much more often, like any enterprise vendor, Bb will sell some applications and need to co-exist with products from other vendors in other applications.  Open source is just another flavor with which to co-exist.
  • Services – To the extent that Bb is a service provider, large open source projects like Moodle and Sakai create a business opportunity.  Blackboard clearly is moving to be a service player for both of these open source communities.

So, in spite of the history that seems to make Blackboard an unlikely candidate for good citizenship in open source communities, it’s not hard to see a business case for moving in that direction.  And this step in the evolution of Blackboard makes an interesting case study for the continuing evolution of open source as a significant, not to be ignored, part of the software industry.  Of course, the case study is still being written.  And open source advocates who have followed Blackboard over the years will be excused if they want to wait to see how this plays out!