Isoxya plugin: Link Checker 1.0 release

2019-07-19 · computing

This post was originally published on the website of Pavouk OÜ (Estonia). On 2020-06-12, I announced that Pavouk OÜ was closing. The posts I wrote have been moved here.

We’re pleased to announce the open-source Isoxya plugin: Link Checker 1.0, one of many possible plugins for the flexible Isoxya internet data processor and web crawler. The plugin helps with link checking, which is useful in SEO for validating large lists of URLs. This is the very first release, and it marks a new phase in the history of Isoxya, as we’ve decided to release the plugin as open source (BSD-3 licence).

Open-source plugins, proprietary engine

We’re aware that claiming to do something truly different in SEO is one thing, but it has to be backed up by actions. Whilst the Isoxya engine itself is closed-source and proprietary, the true power of the pluggable design lies in interfacing with all manner of other programs, industries, and use-cases. We’ve written almost all of Isoxya, including this plugin, in Haskell, a compiled, statically-typed programming language which makes it possible to eradicate entire classes of bugs confidently, whilst giving awesome performance. But since the Isoxya data-processor and data-streamer plugins use simple JSON interfaces, it’s possible to use other languages such as Python, Ruby, Clojure, Java, or something else entirely.

All just to read an HTTP Status Code

It’s easier to build something great if you have some guidance getting started. So, whilst what this plugin does is painfully trivial (reading the HTTP Status Code parsed and sent by the Isoxya engine, and returning it as the processed data to stream over an Isoxya Pipeline), it provides a solid, runnable implementation to reference whilst building bigger, better things. We’re very much aware that this plugin could’ve been accomplished with a few lines in a single file, but we’ve chosen to split it out into separate modules using Snap, a web framework and server, with JSON processing handled by Aeson, the fast Haskell JSON library. To this end, we’ve also open-sourced a module providing the Isoxya Pickax interface, which hopefully makes it much clearer how the power of Isoxya can be exploited to create more complex plugins. We’ve also packaged everything using Docker, which we use extensively at Pavouk, integrated with our continuous integration servers.
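
To give a flavour of how small the core logic is, here is a minimal sketch in Python, one of the other languages a plugin could be written in. It is not the real plugin, which is written in Haskell, and it does not reproduce the actual Isoxya JSON interface. The function name `process` and the field names `url`, `response`, `status`, `data`, and `status_code` are assumptions made purely for illustration. The sketch only shows the general shape: take a JSON document describing a crawled page, read the status code from it, and return that as the processed data.

```python
import json


def process(request_body: str) -> str:
    """Return the status code found in a JSON request as processed JSON data.

    NOTE: the request and response shapes here are hypothetical; they are
    NOT the actual Isoxya plugin interface.
    """
    request = json.loads(request_body)

    # Hypothetical layout: the engine has already fetched the page and
    # parsed the HTTP status code, so the plugin only needs to read it.
    status_code = request["response"]["status"]

    # Hypothetical layout: echo the URL back alongside the processed data.
    result = {
        "url": request["url"],
        "data": {"status_code": status_code},
    }
    return json.dumps(result)


if __name__ == "__main__":
    example = json.dumps(
        {"url": "http://example.com/missing", "response": {"status": 404}}
    )
    print(process(example))
```

In a real plugin, a function like this would sit behind a small web server endpoint that accepts the JSON request and returns the JSON response, which is the role Snap and Aeson play in the Haskell implementation.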

Something wicked this way comes

What’s next? Well, this is just the beginning (okay, far from a beginning, but a new phase, at least). We hope to build on these tools to create more complex plugins, over time providing a plethora of composable options for crawling and extracting SEO insights and other types of data. We also have plans to create a new Isoxya Pipeline for Elasticsearch, which will stream data directly and place the full power of industry-standard data-analysis solutions such as Kibana at your disposal. Stay tuned for more about this.