Building an OCR Service With TesseractJS in AWS Lambda
Tuesday November 21 2017

For the past few days I have been trying to make TesseractJS work in AWS Lambda so that I could run OCR (Optical Character Recognition) on some images stored in an S3 bucket. However, I am fairly new to NodeJS and ran into some difficulties getting it to work in the Lambda environment. In this post I am going to go through some of these issues and how I solved them.

TesseractJS is an OCR library written in pure JavaScript. It can recognize the text in images, as well as provide information about the location of the paragraphs, lines, and words in the document.

We will be using the NodeJS 6.10 runtime in AWS Lambda, and I will be deploying the service with ClaudiaJS.

Downloading the TesseractJS Files

When running TesseractJS to recognize an image, TesseractJS will automatically begin downloading some files, which include tesseract language files, a core library file, and a worker file. These are all files that TesseractJS requires in order to correctly run.

The problem occurs when TesseractJS tries to download these files inside AWS Lambda. Since Lambda only allows writing to the /tmp/ directory, you will get an error like this in your logs:

Error: EROFS: read-only file system, open 'eng.traineddata'
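
The error comes from opening a bare file name, which Node resolves against the current working directory, and in Lambda that directory is read-only. One possible way around it is to make sure every file that gets written ends up under /tmp. The sketch below only illustrates that idea: `writablePath` is a hypothetical helper, not part of TesseractJS, and how the library is pointed at this directory depends on the options of the version in use.

```javascript
const os = require('os');
const path = require('path');
const fs = require('fs');

// Hypothetical helper: map a file name such as 'eng.traineddata'
// to a location under a writable directory (os.tmpdir() by default,
// which is /tmp in Lambda unless TMPDIR says otherwise).
function writablePath(fileName, baseDir) {
  const dir = baseDir || os.tmpdir();
  // basename drops any directory components, so the result stays inside dir
  return path.join(dir, path.basename(fileName));
}

// Write and read back a small file to confirm the location is writable
const target = writablePath('eng.traineddata.check');
fs.writeFileSync(target, 'ok');
console.log(fs.readFileSync(target, 'utf8'));
fs.unlinkSync(target);
```

Passing the base directory as a parameter keeps the helper easy to try locally, where the temp directory may not be /tmp.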


Creating an Upload Progress Dialog in Java
Thursday November 16 2017

Many Java file upload tutorials teach you how to show the current progress of a file upload using a JProgressBar in the user interface, usually placed in the main JFrame.

However, I was trying to make this much more visually appealing, like the way WinSCP does it:

[Image: SCP File Upload]

This will require the use of another Swing class: JDialog. Basically the flow of the program will be like this:

  1. Main class with main method calls SwingWorker class.
  2. SwingWorker is in charge of doing the actual upload. It will also create an instance of a JDialog to show the current upload progress.
  3. JDialog class will contain the progress bar and other useful upload information.


Querying Nested Documents in ElasticSearch
Monday October 23 2017

The other day I was using ElasticSearch to build an index that would contain book documents. Each book document would contain many pages, and each page would record its page number in the book and its content as text, extracted from its own PDF file.

I wanted to query a specific book so that ElasticSearch would return all the pages in the book that contained a certain word in their text field.

Nested Documents

ElasticSearch 5 supports nested documents. These are internally separate documents that will belong to a certain parent, in this case, a book. Before creating any books, we must first define a nested object mapping for the index:

PUT /my_index
{
  "mappings": {
    "book": {
      "properties": {
        "pages": {
          "type": "nested",
          "properties": {
            "id":   { "type": "integer" },
            "text": { "type": "text" }
          }
        }
      }
    }
  }
}

Notice how the mapping is created using a PUT request at the index level.
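
With a nested mapping like the one above, finding the pages that contain a word can be expressed with a nested query on the pages path. The following is a minimal sketch of how such a request body might be built, not the query from the full post: `buildPageQuery` is a hypothetical helper, and the use of inner_hits to get back the matching pages rather than the whole book is an assumption about what is wanted.

```javascript
// Hypothetical helper: build a search body that matches books having
// at least one nested page whose text contains the given word.
function buildPageQuery(word) {
  return {
    query: {
      nested: {
        path: 'pages',                    // the nested field from the mapping
        query: {
          match: { 'pages.text': word }   // full-text match on the page content
        },
        inner_hits: {}                    // also return the individual matching pages
      }
    }
  };
}

// Print the JSON body that would be sent to the index's _search endpoint
console.log(JSON.stringify(buildPageQuery('whale'), null, 2));
```

Restricting the search to one specific book would need an additional clause on a book-level field, which is left out here because the mapping shown does not define one.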


Umask Permissions in a Puma Production Environment
Wednesday October 18 2017

Recently I was having an issue with a Sinatra application deployed on a staging server. The application was deployed with Puma and Nginx in the following location:


This web service would then try to access some files in another directory on the server, mounted as an SFTP directory. The Sinatra app would open these files and generate some new files from them, depending on the HTTP request received.

The problem was that the operation would fail due to a permissions issue. I was baffled, since I had set read and write permissions on the directory and the files in it.

Umask: The Problem and Solution

It took around two days to find the culprit: umask. In Linux, umask is a mask that belongs to a process and removes permission bits from every file or directory that process creates. It is a property of the process, not something that can be set on a directory.
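
To make the effect of umask concrete, here is a small sketch of the arithmetic involved: the mode a process requests for a new file has the umask bits cleared from it. The function names are only for illustration, and the umask values are common examples rather than the ones from this server.

```javascript
// Permissions a new file actually gets: the requested mode
// with every bit that is set in the umask cleared.
function effectiveMode(requestedMode, umask) {
  return requestedMode & ~umask & 0o777;
}

// Format a mode as a three-digit octal string, e.g. 420 -> '644'
function toOctal(mode) {
  return ('000' + mode.toString(8)).slice(-3);
}

console.log(toOctal(effectiveMode(0o666, 0o022))); // 644: group and others lose write
console.log(toOctal(effectiveMode(0o666, 0o077))); // 600: only the owner keeps access
console.log(toOctal(effectiveMode(0o777, 0o002))); // 775: only others lose write
```

With a umask of 077, files created by the server process are readable and writable by its own user only, whatever permissions the target directory has.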

I realized that this probably meant that the process running the Puma application server had a umask configuration that was not allowing the generation of new files.

I decided to test whether this was the case. In the Puma documentation, I found an option to change the permissions of the UNIX socket using umask:


Easy Notification System in Rails Part 3
Tuesday September 12 2017

Read part 1 and part 2 of this series

In this post, we will be sending automatic e-mails every time notifications are created.

Creating the Mailer

We will work with one mailer that will send e-mails for every notification that is created. We can generate our mailer with this command:

rails g mailer NotificationsMailer

Our mailer will contain an action for each notifiable type that works with notifications in our application. In this series, we've been using comments and posts as examples.