Isoxya plugin: Spellchecker 1.0 release

2019-08-10 · computing

This post was originally published on the website of Pavouk OÜ (Estonia). On 2020-06-12, I announced that Pavouk OÜ was closing. The posts I wrote have been moved here.


We’re pleased to announce the first release of the open-source Isoxya plugin: Spellchecker 1.0, which brings spellchecking to SEO and other internet-related data-processing activities. Used in combination with the proprietary Isoxya engine, it can spellcheck entire websites, even those with millions of pages. Docker images are available and, as with Isoxya plugin: Link Checker, released a few weeks ago, we’ve decided to release the plugin as open source (BSD-3 licence).

Increase the quality of your pages

By developing this plugin, we’re uniting two strong interests of ours: large-scale internet data-processing, and human languages. We’ve lost track of how many times we’ve noticed spelling errors or basic typos in publications online, sometimes on the homepages of major newspapers! Whilst these high-traffic pages typically get corrected swiftly, such basic errors present a less-than-perfect view of your brand, and can distract from your core messages. But correct spelling online isn’t just for pedants; high-quality written communication can affect ease of indexing by search engines, and potentially also quality scores. We can’t prove this last point directly, since most such algorithms are trade secrets, but it seems a reasonable inference.

High-quality, open-source spellchecking

The spellchecker backend is Hunspell, the same spellchecker used in LibreOffice, Mozilla Firefox, Mozilla Thunderbird, and Google Chrome, as well as in various proprietary programs. That means we’re in good company, and can offer results consistent with the spellchecking in those programs. Suggestions come from the Hunspell and MySpell dictionaries, enabling us to support a large number of languages easily, and paving the way for future expansion.

Languages

English (BrE, AmE)

[
  {
    "paragraph": "GloBal heating is increesing droughts, soil erosion and wildfires while diminishing crop yields in the tropics and thawing permafrost near the Poles, says the report by the Intergovernmental Panel on Climate Change.",
    "results": [
      {
        "correct": false,
        "offset": 1,
        "status": "miss",
        "suggestions": [
          "Global",
          "Glob al",
          "Glob-al"
        ],
        "word": "GloBal"
      },
      {
        "correct": false,
        "offset": 19,
        "status": "miss",
        "suggestions": [
          "increasing",
          "screening",
          "resining",
          "cresting",
          "resisting"
        ],
        "word": "increesing"
      }
    ]
  }
]
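
The output above is plain JSON, so it can be post-processed with standard tools. What follows is a minimal sketch in Python of how it might be consumed; it is not part of the plugin. The `misspellings` helper and the abridged paragraph are illustrative only. Reading `offset` as a 1-based character position is an inference from the examples in this post, not from a specification, so the sketch checks that assumption before relying on it.

```python
import json

# Plugin output for one paragraph, abridged from the English example above.
RAW = """
[
  {
    "paragraph": "GloBal heating is increesing droughts.",
    "results": [
      {"correct": false, "offset": 1, "status": "miss",
       "suggestions": ["Global", "Glob al", "Glob-al"], "word": "GloBal"},
      {"correct": false, "offset": 19, "status": "miss",
       "suggestions": ["increasing", "screening"], "word": "increesing"}
    ]
  }
]
"""


def misspellings(raw):
    """Return (word, offset, top suggestion) for every result not marked correct."""
    found = []
    for entry in json.loads(raw):
        text = entry["paragraph"]
        for result in entry["results"]:
            if result["correct"]:
                continue
            word, offset = result["word"], result["offset"]
            # Assumption: offset is a 1-based character position in the paragraph.
            start = offset - 1
            if text[start:start + len(word)] != word:
                raise ValueError(f"offset {offset} does not point at {word!r}")
            suggestions = result["suggestions"]
            found.append((word, offset, suggestions[0] if suggestions else None))
    return found


if __name__ == "__main__":
    for word, offset, best in misspellings(RAW):
        print(f"{offset:>4}  {word} -> {best}")
```

Taking only the first suggestion is a simplification: the examples suggest that the most plausible correction tends to come first, but that is not guaranteed (see the Estonian sample, where the correct form appears further down the list).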

Czech

[
  {
    "paragraph": "Česka republika si v rámci Evropské uni udržuje velmi dobrou kondici trhu práce.",
    "results": [
      {
        "correct": false,
        "offset": 37,
        "status": "miss",
        "suggestions": [
          "inu",
          "ni",
          "unie",
          "unii",
          "unií",
          "unci",
          "učni",
          "unik",
          "upni",
          "usni",
          "unit",
          "utni",
          "uhni",
          "ani",
          "oni"
        ],
        "word": "uni"
      }
    ]
  }
]

German

[
  {
    "paragraph": "Die 140 millionen Euro teure Expedition Richtung Norden beginnt dann am 20. September. Sie fuhrt das Team von Norwegen entlang der sibirischen Küste Richtung Pol. Vor Ort werden Messungen im Meerwaßer, im Eis und in der Atmosphäre vorgenommen.",
    "results": [
      {
        "correct": false,
        "offset": 9,
        "status": "miss",
        "suggestions": [
          "Millionen",
          "millionen-",
          "-millionen",
          "Billionen"
        ],
        "word": "millionen"
      },
      {
        "correct": false,
        "offset": 192,
        "status": "miss",
        "suggestions": [
          "Meerwasser",
          "Meterware"
        ],
        "word": "Meerwaßer"
      }
    ]
  }
]

Spanish (European)

[
  {
    "paragraph": "El planeta necesita un cambio del modelo alementario para combatir la crisis climatica",
    "results": [
      {
        "correct": false,
        "offset": 42,
        "status": "miss",
        "suggestions": [
          "alimentario",
          "suplementario",
          "complementario",
          "parlamentario",
          "argumentario"
        ],
        "word": "alementario"
      },
      {
        "correct": false,
        "offset": 78,
        "status": "miss",
        "suggestions": [
          "climática",
          "climatice",
          "climatiza",
          "climaticé"
        ],
        "word": "climatica"
      }
    ]
  }
]

Estonian

[
  {
    "paragraph": "\"Vedelkütusega raketimootori katsetuse ajal toiimus plahvatus ja seade vottis tuld,\" ütles ministeerium.",
    "results": [
      {
        "correct": false,
        "offset": 45,
        "status": "miss",
        "suggestions": [
          "toimus",
          "toismui",
          "toiaimus",
          "toieimus",
          "toisimus",
          "toiilmus",
          "toiuimus",
          "toioimus",
          "toimimus",
          "toiihmus",
          "toihimus",
          "toiäimus",
          "toiimbus",
          "toimunus",
          "toitumus"
        ],
        "word": "toiimus"
      },
      {
        "correct": false,
        "offset": 72,
        "status": "miss",
        "suggestions": [
          "vettis",
          "voltis",
          "nottis",
          "kottis",
          "võttis",
          "vtotis",
          "votits",
          "vttois",
          "tavotis"
        ],
        "word": "vottis"
      }
    ]
  }
]

French

[
  {
    "paragraph": "Journees du chat, des toiletes ou de la Résistance : comment et par qui sont elles décrétées ?",
    "results": [
      {
        "correct": false,
        "offset": 1,
        "status": "miss",
        "suggestions": [
          "Journées",
          "Ajournes",
          "Séjournes",
          "Journades"
        ],
        "word": "Journees"
      },
      {
        "correct": false,
        "offset": 23,
        "status": "miss",
        "suggestions": [
          "toilettes",
          "toilâtes",
          "toile tes",
          "toile-tes"
        ],
        "word": "toiletes"
      }
    ]
  }
]

Dutch

[
  {
    "paragraph": "Ongeveer 680 ilegale inwoners van de Amerikaanse staat Mississippi zijn woensdagavond na politie-invallen gearresteert.",
    "results": [
      {
        "correct": false,
        "offset": 14,
        "status": "miss",
        "suggestions": [
          "illegale",
          "legale"
        ],
        "word": "ilegale"
      },
      {
        "correct": false,
        "offset": 107,
        "status": "miss",
        "suggestions": [
          "gearresteerd"
        ],
        "word": "gearresteert."
      }
    ]
  }
]