A Programming Primer for Counting and Other Unconventional Tasks

Editor's note: This book is in its early, early alpha stage. It began as a list of projects to demonstrate the variety of uses for programming. I then started including tutorials on basic concepts and have kept wavering on how to make everything stick together. It's now a quagmire in that I spend more time exploring tangents and ideas than actually finishing the writing. So this first release is just meant as a milestone to keep on track. Everything in this book is subject to revision, correction, and complete overhauling.

Feel free to send feedback via dan@danwin.com, @dancow or @bastardsbook, or the blog.

Programming is for Anyone


Two phrases make the point of this book:

"If you count something you find interesting, you will learn something interesting."

1 + 1 = 2
- The minimum level of math skill needed for an interesting program

Counting, according to Dr. Gawande, is one way a doctor can be more than "just a white-coated cog in a machine." While he was a medical resident, Gawande didn't have a research grant or laboratory but he decided to count the times that surgical instruments were forgotten inside of patients. These mistakes, which could cause serious damage, happened about once for every 15,000 operations.

But when Gawande refined his counting to track the circumstances of the accidents, he found a deeper, more meaningful story: most of these mistakes occurred when an operation took a drastic turn, like when an appendectomy reveals a cancerous tumor:

The numbers began to make sense. If nurses have to track fifty sponges and a couple of hundred instruments during an operation – already a tricky thing to do – it is understandably much harder under urgent circumstances or when unexpected changes require bringing in lots more equipment.

Our usual approach of punishing people for failures wasn’t going to eliminate the problem, I realized. Only a technological solution would—and I soon found myself working with some colleagues to come up with a device that could automate the tracking of sponges and instruments.

The Bastards Book is not about Ruby – Ruby just happens to be one of several popular and accessible programming languages to choose from. But this book is also not just about programming. Programming just happens to be one of the best ways to do the kind of counting – and the exploration, analysis, and critical thinking associated with it – so that you can be, as Gawande puts it, "a scientist in this world."

The math behind counting is simple. But the process is hard and tedious and so we often settle and accept generalized numbers, reports and press releases because we just don't have the time or ability to look any deeper. So programming becomes an increasingly necessary skill as the world becomes more digitized and its information more accessible.

It's hard to think of a purpose more broad than counting. Or simply, information. This book aims to demonstrate the diverse and practical uses of programming and how if you have any kind of interest in anything, you'll find a use for programming. And the math involved won't have to be more complicated than 1 + 1.
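To make the "1 + 1" claim concrete, here is a minimal Ruby sketch of counting. The incident log is invented purely for illustration (it is not Gawande's data), and `count_by` is a hypothetical helper name. The only arithmetic in it is adding one.

```ruby
# Hypothetical log: one label per incident. The values are made up.
INCIDENT_LOG = [
  "routine", "emergency", "unexpected change",
  "emergency", "unexpected change", "unexpected change"
]

# Tally how often each distinct item appears.
def count_by(items)
  counts = Hash.new(0)                     # unseen items start at zero
  items.each { |item| counts[item] += 1 }  # the only math: add one
  counts
end

count_by(INCIDENT_LOG).each do |label, count|
  puts "#{label}: #{count}"
end
```

Refining the count, as in tracking the circumstances of each incident, mostly means choosing a more specific label to count by.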

The state of self-taught programming

If you're not a programmer now, you might have dabbled in it or even studied it in college before deciding it was too esoteric and abstract to be of use for whatever you do in your life today. I felt the exact same way and only learned programming because it was part of my degree's curriculum. And at the time, I would've never recommended it to anyone who didn't want a career in programming.

I was wrong then, but I would be especially wrong now. In even just the last few years, the amount of resources and tools available to learning programmers – and more importantly, the number of things to do with programming – have increased by a staggering degree.

  • Programming languages are more human-friendly.
    Early computer languages were optimized for early computers. With today's processors, languages can have far more built-in features that drastically reduce the physical tedium and memorized minutiae needed to write powerful programs. Because programming languages don't need to be as efficient for machines to process, they've become much more efficient for humans to work with.
  • The amount of data and possible applications is much more diverse and plentiful.
    Nearly every piece of vital information is digitized – and even when it isn't, there's an app for that. Anyone who deals with any kind of information today will find a reason to program.
  • The tools are still free. And they keep getting better.

    No other field exists like programming, in how a determined beginner can immediately do the things that master programmers worked for years to hammer out – because those programmers then gave out their work for others to reuse. The code I've written for this book uses the most basic parts of the language, yet I'm able to demonstrate tasks as complex as web-scraping, facial-recognition, and data-visualization, thanks to the hard work of other programmers.

    In the music industry, you have to get your own instrument and you can be sued for reusing even just two seconds of another song. In programming, the best tools are free (and are constantly improved upon). And using thousands of lines verbatim from someone else's code is not just legal, but a best practice.

  • The community keeps growing.

    Just as they give away their code, programmers are generous in helping and teaching others. Entire books are posted online for free. In forums like StackOverflow, expert users practically compete with each other to help out even the most novice programmer. There is almost no question so obscure or basic that its answer can't be found through Google.

But despite the wealth of incentives and resources, I won't bullshit you: programming will still be the most involved and demanding thing you've ever learned to do on a computer. But, even if the path might be long and difficult, it is neither lonely nor narrow.

Ideals and pragmatism

Five years ago as a frustrated programmer, I would have never encouraged anyone to go into programming, let alone try to teach it. So it's amazing to me that 15 years ago, Steve Jobs – who was not really a programmer – said this about computer science:

In my perspective ... science and computer science is a liberal art, it's something everyone should know how to use, at least, and harness in their life. It's not something that should be relegated to 5 percent of the population over in the corner. It's something that everybody should be exposed to and everyone should have mastery of to some extent, and that's how we viewed computation and these computation devices.

Jobs advocated computer science well before the Internet became ubiquitous and unavoidable. And so his belief that programming was important and enlightening for everyone is far more real today. But as much as I'd like to also tout the high-minded virtues of programming, it wouldn't be very honest of me.

Because I haven't stuck with programming out of abstract ideals of understanding the world around me. I just think it's depressing and dull to have to use computers without programming. And I'm someone who quit computer engineering because I wanted to get away from computers.

It's assumed that programmers must love computers. But consider how an average person uses a computer for work: mindlessly following a chain of tedious tasks learned at a training session years ago that involve navigating a seemingly random sequence of pop-up menus and buttons from mouse-click to mouse-click, interrupted only by inexplicable error messages and frequent calls to the help desk.

A routine chore might include clicking through a 30-page online database and copying parts of it into a spreadsheet. By hand, this repetitive task might take an hour. As a mediocre programmer, you might spend 59 minutes – not all of them at your keyboard – to design and program a web-scraper that does that same task today and for every other day.

But even if you never look at that program again, here's the key advantage: For 57 of those minutes (the other two minutes being used to actually type the code), you were actively thinking, researching, and learning how to solve a problem. You may never reuse that exact code. But you've become a better programmer and thinker. The next program might take just 5 minutes to think about and write.

Whereas if you had spent that hour just copying-and-pasting, dragging, clicking, redoing the times that you didn't properly drag-and-click, you've only gotten better at...just copying-and-pasting. Until you get carpal tunnel syndrome.
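For a rough idea of what the scraper for that 30-page chore might look like, here is a minimal sketch. Everything specific in it is an assumption: `fetch_page` is a hypothetical stand-in that fabricates a tiny HTML table instead of downloading anything, and the table layout is invented. Pulling cells out with regular expressions is also a shortcut that only suits very regular markup; a real HTML parsing library would be sturdier.

```ruby
# Hypothetical stand-in for downloading page `n` of the database.
# A real script would fetch the page over HTTP instead.
def fetch_page(n)
  "<table><tr><td>Row #{n}a</td><td>#{n * 10}</td></tr>" \
  "<tr><td>Row #{n}b</td><td>#{n * 10 + 1}</td></tr></table>"
end

# Turn one page of HTML into an array of rows, each an array of cell text.
def extract_rows(html)
  html.scan(/<tr>(.*?)<\/tr>/m).flatten.map do |row_html|
    row_html.scan(/<td>(.*?)<\/td>/m).flatten
  end
end

# Visit every page in turn and collect all the rows into one list.
def scrape(page_count)
  (1..page_count).flat_map { |n| extract_rows(fetch_page(n)) }
end

scrape(30).each { |cells| puts cells.join(",") }
```

The loop is the whole point: once one page can be read, thirty pages (or thirty thousand) cost the same amount of typing.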

I don't know if programmers necessarily love computers more than the average user. We might just have fewer reasons to hate them.

The importance of being interested

Most of my professional career has been in non-technical jobs but I still found ways to practice coding.

As a newspaper reporter, I occasionally filled in on the night cops desk. This consisted mostly of chatting with the dispatchers and taking the radio scanner to dinner so that if something big happened, you could scramble and get the story before the paper went to press.

Many nights were pretty slow though. With the court system closed for the day and people going about their evening activities, there wasn't much to report on if those evening activities didn't include a major crime. However, the county jail's website did provide an up-to-the-minute log of everyone booked in the last 24 hours. In theory, it could be another source for late-night news.

But in practice, the site's interface made it nearly unusable for that purpose. It presented the user with a simple list of inmates' names, the time they checked in, and a link to more details of the arrest. My enthusiasm quickly waned after clicking through to the tenth booking record in a row that involved only an outstanding warrant or DUI.

So during a couple quiet shifts, I wrote a simple script to crawl the jail website. After hitting "Run" and going to get a coffee, I'd have a spreadsheet of every new visitor to the county jail, listed in a way most useful to me: each inmate side-by-side with his or her alleged deeds so that I could skim for "assault", "murder", "aggravated," and so forth.

And since it was a spreadsheet, I could easily sort the entire inmate population by age, bail amount, and even weight. Not being a full-time cops reporter, I didn't have a good grasp of the bail schedule. But now I could easily make comparisons and see how an alleged armed robber might have to put up $20,000 while someone else might have her bail set at $1,000,000 for a non-violent offense. Since bail takes into account current charges, past criminal history, and flight risk, it's possible there were arrestees with interesting circumstances who didn't strike an information officer as being newsy enough to announce.
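The skimming and sorting described above can be sketched in a few lines of Ruby. This is not the original script: the records, field names, and helper names (`flagged`, `by_bail_desc`) are all hypothetical, and only the three keywords come from the story.

```ruby
# Hypothetical booking records; every name and number here is made up.
SAMPLE_BOOKINGS = [
  { :name => "Inmate A", :age => 34, :bail => 20_000,    :charge => "Armed robbery" },
  { :name => "Inmate B", :age => 52, :bail => 1_000_000, :charge => "Fraud" },
  { :name => "Inmate C", :age => 27, :bail => 5_000,     :charge => "Aggravated assault" },
  { :name => "Inmate D", :age => 41, :bail => 500,       :charge => "Outstanding warrant" }
]

KEYWORDS = ["assault", "murder", "aggravated"]

# Keep only the bookings whose charge mentions one of the keywords.
def flagged(bookings, keywords)
  bookings.select do |booking|
    charge = booking[:charge].downcase
    keywords.any? { |word| charge.include?(word) }
  end
end

# Order bookings from the highest bail amount to the lowest.
def by_bail_desc(bookings)
  bookings.sort_by { |booking| -booking[:bail] }
end

flagged(SAMPLE_BOOKINGS, KEYWORDS).each { |b| puts "#{b[:name]}: #{b[:charge]}" }
by_bail_desc(SAMPLE_BOOKINGS).each { |b| puts "#{b[:name]}: $#{b[:bail]}" }
```

Sorting by age or weight would be the same one-liner with a different field.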

Was creating this spreadsheet impossible without programming? No. But I didn't have a group of interns to order around. And if I had outsourced the work, I would have missed an even bigger story.

One of the less-heralded effects of programming is that even its tedious steps are valuable. They force you to slow down and notice the details that are easily overlooked because they seem irrelevant. When all I knew was mouse-clicking to view the jail records and manually copy them, I only looked at names and crimes because that was what I thought was interesting. When I took the time to write a web-scraper, I noticed what could be interesting.

In particular, the jail records included a reference number for each inmate. That number was a unique identifier that was also used by the county's court system to track not just when an inmate would have her day in court, but all of her past court appearances and resolutions, too.

I had ignored the reference number because it was extra work to write down. But it was trivial to collect with an automated web-scraper – along with every other data field. And now I instantly had a key into an entirely new source of data. With a little more programming, my scraper could automatically visit the court system's website and collect past criminal histories.
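Using a shared identifier as a key between two data sources might look like the sketch below. The records, field names, and the `attach_histories` helper are hypothetical; the idea from the story is only that both sources carry the same reference number.

```ruby
# Hypothetical rows from the two sources; all values are made up.
JAIL_ROWS = [
  { :ref => "1001", :name => "Inmate A" },
  { :ref => "1002", :name => "Inmate B" }
]

COURT_ROWS = [
  { :ref => "1001", :outcome => "2009: probation" },
  { :ref => "1001", :outcome => "2010: dismissed" },
  { :ref => "9999", :outcome => "2008: fined" }
]

# Add a :history list to each booking by matching on the reference number.
def attach_histories(bookings, court_records)
  by_ref = court_records.group_by { |record| record[:ref] }
  bookings.map do |booking|
    past = by_ref.fetch(booking[:ref], [])  # no court rows means empty history
    booking.merge(:history => past.map { |record| record[:outcome] })
  end
end

attach_histories(JAIL_ROWS, COURT_ROWS).each do |row|
  puts "#{row[:name]}: #{row[:history].join('; ')}"
end
```

Grouping the court rows once and then looking each booking up by key avoids rescanning the whole court list for every inmate.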

What started out as a tool for trivial breaking news was now something that explored a much bigger picture of the justice system, potentially shedding light on issues of recidivism and other outcomes of sentencing practices.

When non-programmers hear of such seemingly expansive analysis, they think that the programmer must be a genius. But it's not genius as much as a natural revelation that comes from being used to seeing the logical big picture, never mind the details and minutiae. Programming is an art that sparks curiosity in its practitioner and, as a (huge) bonus, also provides an efficient way to investigate and satisfy that curiosity.

A random sampling of the mugshots collected from the Putnam County Sheriff's jail history, the subject of a later chapter

So what did I do with this mine of criminal justice data? Virtually nothing. I checked in occasionally to see if anyone had any giant bail amounts or simply to find out who was the heaviest or oldest person in jail. Crime wasn't my main beat and so this was just a programming exercise for me. In a chapter of this book, I demonstrate how to scrape another jail system. But I draw only superficial conclusions about the data, because I know even less about that jurisdiction. The collage of inmate facial expressions above, which I stitched together with a Ruby script, is the peak of my creativity.

When it comes to making meaningful interpretations, pure programming skill can't make up for lack of institutional knowledge or passion about a subject. A programmer may have the tools and data easily in hand, but that doesn't mean he or she will know what to look for. And non-programmers can have the insight and instinct for investigation. But the inability to even conceive of the tools leaves them just as blind.

The problem with being just a non-programmer with an interest – or a programmer without an interest – is that it's not enough to know what you don't know, hoping that you can just hire someone to fill the gap. Because you still won't even know what you don't know, and more often than not, neither will your partner.

As programming is increasingly more relevant to our everyday lives, I think everyone should learn it to some degree, though I can't say that everyone should study it in-depth, because it's still an intellectually demanding pursuit that requires patience and devotion. But I also think there are many out there who have so far avoided programming for the wrong reasons, thinking that it was too narrow and difficult for their purposes. If they could only see how it might relate to their passion, and how it would vastly improve their work and their ability to make an impact, they would gladly make the leap to learn programming.

What this book covers

It's hard for non-programmers to think how they would ever use programming because to them, programming is just writing instructions for computers. And programmers have a hard time telling non-programmers why they should program for that exact same reason: knowing how to write "computer instructions" basically means knowing how to affect and improve every possible way that you have ever used computers.

My high school newspaper teacher always said, "Show, don't tell." So this book was originally going to be just a collection of practical examples and projects to sell non-programmers on the uses of programming and how to get into it quickly.

But I've found that novice and non-programmers sometimes don't get much from seeing a wall of code. Or worse, they blindly copy it, not thinking about how such code runs or why it was written the way it was. It may work for them the first time. But when they try to do harder and more creative tasks – which is the point of learning to program – everything falls apart and they quit coding because it's all just a jumble of electronic voodoo.

So the first part of this book contains a bare-bones run-through of what I consider to be the most important programming fundamentals. I skip important topics so that there are fewer concepts to juggle at first. And I believe that once you get to the point that you can make code useful, you'll naturally go back and learn those important fundamentals on your own.

I've included a collection of some of the best online and free resources to supplement what this book lacks. If the Bastards Book inspires you enough that you decide to ditch it for a more thorough and intellectual treatment of programming, the sooner the better, in my opinion.

Data programming

So out of all the wildly imaginative ways that programming can be used, what does this book focus on? For lack of a better term, I will refer to it as data programming, which involves the gathering, organizing, and analyzing of data in all its forms.

It sounds like a decidedly unsexy topic but that's kind of the point. If you're learning from scratch how to program, there are concepts common to every kind of programming you'll ever do. So I don't think it's worth being preoccupied yet with making pretty graphics for a game or picking fonts and writing text for a web app that no one will use yet.

In data programming, the criteria for success are pretty clear-cut: you either got the data or you didn't. And if you have it, you can just use it for whatever your non-programming job is. And that will only encourage you to learn even more programming.

Data is simply content. And whether you move on to learning how to program a social networking site, an iPhone game, or a workflow-management system for your company newsletter, you will always need to be a good "data programmer."

F.A.Q.

Do I have to have any programming experience to understand this book?

No. This book assumes you have little to no programming experience.

What is in version 0.1 of this book?

This is a rough draft that includes the general scope of the programming concepts I hope to cover. But in the process of writing it, I frequently changed my opinions about its style and organization, so expect some inconsistency. Future iterations of this book will include many more examples and projects.

Possible incompatibilities

I wrote this book using Ruby 1.8.7. This was mostly because if you're using Mac OS X version 10.6 or higher, your system comes installed with 1.8.7. Assuming you don't have Ruby on your system yet, when you get to the installation chapter, I recommend just installing Ruby 1.9.x (whatever the latest version is). The main difference is that Ruby 1.9 has better performance and features. But you may encounter errors from the rare incompatibilities between 1.8 and 1.9 until I do a re-check of the code.
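As one illustration of the kind of incompatibility meant here (not necessarily one that appears in this book's code): in Ruby 1.8, indexing a string with a single integer, as in `"Ruby"[0]`, returns a character code, while in 1.9 it returns a one-character string. The sketch below prints the running version and uses a hypothetical helper, `first_char`, written in a form that behaves the same on both.

```ruby
# Return the first character of a string as a string.
# The two-argument form str[start, length] gives a substring
# on both Ruby 1.8 and Ruby 1.9.
def first_char(str)
  str[0, 1]
end

puts "Running Ruby #{RUBY_VERSION}"  # RUBY_VERSION is a built-in constant
puts first_char("Ruby")
```

If example code in the book misbehaves, checking `RUBY_VERSION` is a reasonable first step before assuming the code itself is wrong.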

Regarding Windows systems: I tested the book's example code on a cheap Windows laptop. However, I did not have time to test this website in Internet Explorer.

I highly recommend that you download Firefox; not because it's necessarily a better browser, but because it has tools and plugins that are extremely helpful in many of the projects in this book.

What kind of computer and software will I need?

This book assumes most readers will be either on a Windows PC or Mac. Linux users will have different installation procedures but should otherwise be able to follow along just fine.

The Firefox browser has plugins that are used in the later chapters of this book.

That said, I wrote this book on a Mac and have barely used a PC in the last two years. There may be incompatibilities that I haven't anticipated. However, the popularity of Ruby has sparked a huge community of support and technical solutions for installing and using it.

Why Ruby?

There's no best programming language. Ruby happens to be the language I work with the most. That said, Ruby is great as a starter language. The characteristics that make it popular among expert programmers are especially beneficial to beginners:

  1. Its syntax is very readable, even for non-programmers.
  2. It is currently very popular, which means there are a lot of other Ruby programmers who are developing Ruby code libraries and are willing to help others learn Ruby.
  3. It is comparatively easy to install and start programming with. For C/C++ programmers (C++ being the language I learned in school) interested in trying out Ruby, the Ruby official website says this: "...your head will spin at how rapidly you can get a Ruby program up and running, as well as at how few lines of code it will take to write it. Ruby is much much simpler than C++ – it will spoil you rotten."

Steve Yegge, a notable writer and programmer currently at Google, wrote up a nice comparison guide of programming languages. He had this to say about Ruby:

I learned Ruby faster than any other language, out of maybe 30 or 40 total; it took me about 3 days before I was more comfortable using Ruby than I was in Perl, after eight years of Perl hacking.

Ruby has been my easiest language to learn, too. But it's also the most recent language I've learned and I've learned far fewer languages than Yegge has.

What do I miss out on by learning Ruby instead of [some other language]?

Ruby is particularly popular because of Ruby on Rails, one of the most popular frameworks currently behind dynamic websites. However, as a beginner, you won't be able to build a database-backed website right away.

Among professional programmers, Ruby is not an ideal choice for tasks in which processor-performance is the primary concern. Because Ruby is designed for writing code quickly and easily, the code itself does not execute as fast as a lower-level language. But if you're not a programmer yet, this is as trivial a concern as a horse-and-buggy operator worrying that taking drivers ed in a Camry is going to limit his/her Formula 1 career.

Python is the language that gets compared to Ruby the most. It is also an excellent, readable language. The main differences between them arise from the libraries they have for higher-end tasks. Python is the older language and thus has useful and mature libraries – such as SciPy and NLTK – that I don't see a good alternative for in Ruby. But Ruby has Ruby on Rails, which has been the hot web framework.

How is programming different from being a power user in [insert software package]?

"Power user" is defined as someone who has used a program and read its manuals long enough to know its ins-and-outs, including all the keyboard shortcuts, undocumented features, and eccentric behaviors.

Becoming a programmer is certainly harder and more frustrating than being a power user. The latter only requires years of experience in the software to eventually memorize its rhythms and quirks. But you can program for a decade and be constantly confused if you've failed to make the effort to understand the logic and concepts.

The difference is that as a programmer, you at least have the option of learning the concepts. Whereas someone who uses Excel for a decade, as I have at this point, will still not have the slightest clue as to how something as simple as entering text into a cell is done on a logical level.

Most of the time, you just don't need to know things at this level. This streamlining of the nitty-gritty is what makes Excel popular and extremely useful for commonplace tasks. I use it several times in this book to display data.

But when you need to do something different than what Excel's creators have anticipated, then some of its features can be harmful. For example, I'm sure it's useful to Excel's core audience to auto-convert "Jan 0" to "Jan-00". Unfortunately, when integrity of data is absolutely essential, it can be frustrating having no insight at all into the logic that triggers this conversion.

Below is a chart demonstrating the unpredictable behavior of Excel's auto-conversion feature:

  • Column A: The original values as pasted into a new Excel spreadsheet
  • Column B: What Excel converts the original values to, automatically upon import.
  • Column C: The result of forcing Excel to treat Column B as raw text with the Format Cells... command.
  • Column D: For fun, this is what happens when you reconvert the result of Column C back into Excel's Date format. If the original values in Column A actually were dates, better hope they were referring to the year 2011!
Excel's curious conversions of date values

An Excel power user will know that before doing an import, you can preset the fields in a new spreadsheet to treat everything as text. Or, at the very least, entering each value with an equals sign and wrapped in quote marks (i.e. ="Jan 0") will force Excel to leave the values alone.

The programmer will attempt to hack together a workflow that avoids using Excel. If the project requires the final product to be in Excel, the programmer's workflow can at least move the data to Excel at the last step, after all the vital data-work has finished.
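As a small sketch of that last step, the power user's ="Jan 0" trick could be applied by a script just before the data heads to Excel. The helper names here are hypothetical, the sample values are made up, and the tab-separated output is only one assumed way of handing the values over. Inside an Excel formula string, a literal quote mark is written as two quote marks, which is what the `gsub` handles.

```ruby
# Wrap a value as an Excel text formula, e.g. Jan 0 becomes ="Jan 0".
# Any quote marks inside the value are doubled, as Excel formulas expect.
def excel_text_formula(value)
  '="' + value.to_s.gsub('"', '""') + '"'
end

# Build one tab-separated line in which every cell is protected this way.
def excel_ready_row(values)
  values.map { |value| excel_text_formula(value) }.join("\t")
end

puts excel_ready_row(["Jan 0", "3-4", 'say "hi"'])
```

Because the wrapping happens in code, every collaborator's data gets the same treatment without anyone having to remember the trick.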

In the short run, when limited to this one specific task, the power user will win. It's just quicker to use a workaround within Excel's environment, which is flexible enough for such hacks if you have enough experience with the software. But when the task involves more steps, such as sharing the spreadsheet among collaborative users who are entering their share of the data at different times, can you be sure that they all performed the same hack?

The programmer has the flexibility to hack the workflow to the project's specific needs. And if the project requires other kinds of tasks beyond data-entry, the programmer has the skills to develop hacks for those too. The Excel power user, however, will find that the ="your value" trick doesn't work in many non-spreadsheet situations.

To paraphrase the Law of the Hammer: when all you know is Excel, everything looks like a spreadsheet. So the benefits from learning how to program are directly proportional to the depth and diversity of the data universe you aspire to inhabit.

What blogging/content-management software does this book use?

The actual content and HTML was almost entirely manually typed, which is why some parts will look different than others. I use Ruby on Rails (which is overkill for this kind of thing) to build out the templates. A custom script converts the RoR site to static HTML which I upload to an Amazon S3 bucket for fast loading.

This site started from the HTML5 Boilerplate and 1140-grid templates but I have since badly mangled them.

The blog for the book is just a WordPress installation.

Why "Bastards" in the title?

Mostly because it alliterates with "book." But I like that it emphasizes how I don't think of this as a legitimate programming book. It moves quickly past the theory in favor of getting to the "fun stuff," but only to encourage the reader to invest time in going deeper.

So don't be fooled into thinking that reading it is enough to make you a respectable programmer. My intent is to show you how amazingly useful programming can be – even at a bastardized level – so that you can be confident that it's worth the effort to learn at a serious level.

About the author

My name is Dan Nguyen and I'm a journalist in New York. You can contact me at dan@danwin.com or @dancow. The footer of this site contains links to my personal blog, photostream, and other socialite things.

I wrote and self-edited this book. But this book's content owes more to the countless developers who donate their time and effort to creating powerful software and keeping it free. Without their selfless contributions, this book could never attempt to make programming approachable for aspiring programmers. Likewise, thank you to the many Wikipedia volunteers who have built a vast datasource for programmers and non-programmers to explore.

Also, thank you to the journalists at Investigative Reporters and Editors and the National Institute for Computer-Assisted Reporting, as well as my colleagues at ProPublica, for continuing to produce and inspire important journalism.
