Working with Other People’s HTML – a Reflection

Get ready for another stream of consciousness.

In a recent work project, I had the task of creating two Chrome extensions. Each one would start at a specific origin page, scrape some data from that page, collect form input from the user, and then open a pre-populated Jira ticket template in a new tab, following Atlassian’s documentation.

That’s where the similarities ended. I won’t bore you with why I needed two separate extensions in the first place when they fundamentally do the same thing. Let’s just wave our collective hands at it and say “permissions, etc.”

And now, on to the reflection. The first extension I built relied on a pervasive internal tool that was created about a decade ago. Although this tool is integral to the daily work of many of my company’s client-facing staff (among other departments), its functionality and layout were built more like Rome than like Rome’s children. That is to say – the core hasn’t changed since the tool was created, and new features, UI changes, and other tweaks were piled right on top of the tangled ball of code yarn that is the tool’s internal workings.

As a result of the tangled alleys and ad-libbed features of this tool, data scraping felt like the most MacGyver’d, jerry-rigged process. None of the data I worked with had any labels aside from the text that happened to precede it. And the oddest challenge turned out to be white space – did you know that node.nextSibling returns text nodes too, including ones that contain nothing but white space? I discovered this while pulling my hair out wondering why the contents of node.nextSibling came back as either white space or undefined in my JavaScript! Not even Stack Overflow uncovered this little nugget for me – I reached all the way back to a thread on Ars Technica’s forum to get to the bottom of it. (Shout out to user Liam’s post from 2001 that recommended creating my own “getNextValidSibling” function!)
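For the curious, here is a minimal TypeScript sketch of what a getNextValidSibling helper might look like. It’s my own reconstruction, not Liam’s code and not the code from the extension. It assumes data can sit in bare text nodes as well as in elements (since nothing here had labels), so it only skips comment nodes and whitespace-only text nodes. If your data always lives inside elements, the built-in Element.nextElementSibling already skips text and comment nodes for you. The nodeType values 3 and 8 are the DOM’s TEXT_NODE and COMMENT_NODE, redeclared locally so the snippet also runs outside a browser.

```typescript
// DOM nodeType constants (Node.TEXT_NODE and Node.COMMENT_NODE), redeclared
// locally so this sketch does not depend on a browser global.
const TEXT_NODE = 3;
const COMMENT_NODE = 8;

// Minimal structural type; a real DOM Node satisfies it.
interface SiblingLike {
  nodeType: number;
  textContent: string | null;
  nextSibling: SiblingLike | null;
}

// Hypothetical helper, named after the forum suggestion. Walks forward from
// `node` and returns the first sibling that is neither a comment nor a
// whitespace-only text node. Text nodes with real content ARE returned,
// because unlabeled data may live in bare text. Returns null if none is left.
function getNextValidSibling(node: SiblingLike): SiblingLike | null {
  let current = node.nextSibling;
  while (current !== null) {
    const isBlankText =
      current.nodeType === TEXT_NODE && (current.textContent ?? "").trim() === "";
    if (current.nodeType !== COMMENT_NODE && !isBlankText) {
      return current;
    }
    current = current.nextSibling;
  }
  return null;
}
```

In a content script you might call it as `getNextValidSibling(labelCell)?.textContent?.trim()`, where `labelCell` is a hypothetical node holding the text that precedes the value you want.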

I slogged my way through creating a separate function to scrape each datum I needed, as they all worked just a bit differently. Some were hidden in anchor tags, some were inexplicably surrounded by parentheses or unexpected white space, and some just hid among preceding text with annoyingly similar names. No IDs in sight. Walking out of that coding adventure, I felt like I had just wandered out of the woods, having followed no path. It was almost unexpected when I realized “hey, that’s the last function, and I’m pretty sure it’s doing what I want!” Phew, what a journey.
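The stray parentheses and white space, at least, lend themselves to a shared cleanup step. Below is a hypothetical helper, `cleanScrapedText`, sketched for illustration rather than taken from the extension. It collapses runs of white space into single spaces and strips one pair of wrapping parentheses, but only when there are no other parentheses inside, so a value like “(a) and (b)” is left alone.

```typescript
// Hypothetical cleanup helper for scraped text (name and behavior are
// illustrative assumptions, not the extension's actual code).
function cleanScrapedText(raw: string | null | undefined): string {
  if (raw == null) return "";
  // Collapse newlines, tabs, and repeated spaces, then trim the ends.
  let text = raw.replace(/\s+/g, " ").trim();
  // Strip a single wrapping pair of parentheses, e.g. "(ACME-123)" -> "ACME-123",
  // but only when no other parentheses appear inside.
  const match = /^\(([^()]*)\)$/.exec(text);
  if (match) text = (match[1] ?? "").trim();
  return text;
}
```

Passing it `node.textContent` directly works because it accepts `null`, which is what `textContent` can return.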

And now, on to the next extension. It relies on scraping data from a much more recently created tool, one with robust documentation, QA, and systemic/structural changes where needed. You might already know where I’m going with this. The new tool, while displaying similar information to the previous one, had its HTML laid out like a manicured garden. Not an attribute-less <td> tag in sight! Each item I needed to scrape had a neatly titled “data-qa” attribute, so I was able to handle all of my scraping with just one function: it takes a data-qa value and returns the text of the matching element. It was like Marie Kondo had been through there – nothing unnecessary and everything in its place.
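A single function like that might look roughly like this TypeScript sketch. It assumes a content script where `document` is available; the function name `getByDataQa` and the `"client-name"` value below are hypothetical. The root parameter is typed structurally so the snippet also type-checks and runs outside a browser, and the value is escaped so a stray quote can’t break the attribute selector.

```typescript
// Minimal structural type for anything with querySelector (document, an element, ...).
interface QueryRoot {
  querySelector(selectors: string): { textContent: string | null } | null;
}

// Hypothetical helper: finds the first element whose data-qa attribute equals
// `qaValue` and returns its trimmed text, or null if no such element exists.
function getByDataQa(root: QueryRoot, qaValue: string): string | null {
  // Escape backslashes and double quotes so the value is a valid CSS string.
  const escaped = qaValue.replace(/["\\]/g, "\\$&");
  const el = root.querySelector(`[data-qa="${escaped}"]`);
  return el === null ? null : (el.textContent ?? "").trim();
}
```

In the extension’s content script, a call would read `getByDataQa(document, "client-name")`. One function, one selector pattern, no per-field detective work.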

What I’m getting at here is pretty intuitive. (In Russia, we might say “Поймет не только взрослый, Но даже карапуз.” – it’s understandable not only by an adult, but even by a toddler.) Working with neat and tidy code is a pleasure! Pass it on to the next reviewer if and when you’re able, to save them from wandering in the forest. That, or document your API, whichever you like best.

Disclaimer: My code is still hideously un-navigable, and I promise I am working on it. At least there are a few comments in there, right? …Right?

Footnote for the initiated: enjoy this blast from the past. Don’t stand, don’t jump, don’t sing, don’t dance, in construction zones or where there’s a hanging load.

Published by perlbeforeswine

When I'm not in the office, I can be found hiking in the nearest temperate rainforest or hiding indoors from the heat with my dog, Oliver.
