Since 1909, scientists of the National Institute of Meteorology of Brazil (INMET) have collected more than 3 million pages of weather records and information about climate change across the country. Their impressive record includes climatological data from sultry Rio de Janeiro, the roaring Iguazu Falls, the gloomy forests of the Amazon and misty Sao Paulo. However, these scientists must solve a common problem to save the collected information from rapid obsolescence: the digitization and quality control of weather observations collected over a 100-year period.
Brazil, the largest country in South America, sits within the tropics. Its territory boasts three types of climate: equatorial, subtropical and tropical. The development of almost all branches of the Brazilian economy, and especially of agriculture, depends on a variety of weather conditions. It is therefore important for specialists to analyze and accurately predict possible changes in weather conditions. Forecasts are also needed to ensure the safety of aircraft, pilots and passengers, to protect ships and sailors, and to support the proper organization of fisheries and the development of tourism.
Deploying technology to rescue the past
Thankfully, new capture technology like ABBYY FlexiCapture Engine can be deployed to restore an archive of weather observations that might otherwise be lost forever. Such weather history can help anticipate possible climate changes and inform decisions on adjusting the country’s agricultural and industrial policies. INMET has been painstakingly collecting data on precipitation, wind, relative humidity and pressure since 1909. Day after day, its specialists registered this information and recorded it in their diaries of observations.
Until recently, these documents were kept in paper form. Archives with records were scattered across various cities in Brazil, making it impossible for scientists to analyze or work with the documents. In addition, books and notebooks were kept in warehouses with no suitable conditions for the storage of historical documents. Some of these records were damaged by damp air or by insect pests, putting the institute at greater risk of losing some of its valuable records.
In 2010, INMET decided to digitize its entire archive of weather observations (3 million pages, or 4 billion characters), including notebooks, books and microfilms. The first step was collecting all records stored in different cities across Brazil. This was achieved in 2011 with help from the Institute’s employees, who transported the documents to Brasilia and housed them in a 1500-square-meter storage area in the INMET building. During the next stage, the institute’s specialists began to process and restore the records, some of which were badly damaged. The final stage before digitization involved cataloging all records so that any diary of weather observations could easily be found in the repository.
Forward, to digitization
INMET kicked off its digitization efforts in 2012 by bringing in Flexdoc, a Brazilian company that develops software for processing and storing documents. At first, however, rather than using modern OCR technology, Flexdoc relied on manual data entry to convert weather observations into electronic form. The company developed templates indicating which data from scanned documents should be entered into the system. Flexdoc then sent scans to a group of operators in India, who manually keyed the relevant data from the images into the templates. The work covered over 20 types of brochures with information about the weather, each in at least 6 different page formats and with more than 150 fields.
To streamline and simplify this process, Flexdoc turned to ABBYY FlexiCapture Engine in 2014. The first advantage was that the FlexiCapture Engine enabled workers to use multiple scanners and feed the results into one solution. Flexdoc employees scanned pages from the weather observation diaries using 12 ATIZ BookDrive PR and Plustek OpticPro A360 scanners. They then teamed up with INMET specialists to check the quality of the scanned images, after which the results were imported into the system based on ABBYY FlexiCapture Engine. Flexdoc employees provided the templates developed with ABBYY FlexiCapture for processing the documents, while ABBYY’s OCR technologies helped define and overlay templates for documents, find the necessary fields in them and extract data. In decrepit documents or handwritten records, OCR could not always recognize a field; in those cases, Flexdoc employees digitized it manually.
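The split between automatic extraction and manual entry described above can be pictured with a minimal Python sketch. It is an illustration only, not the ABBYY FlexiCapture Engine API or Flexdoc's actual implementation: the FieldResult type, the route_fields function, the field names and the 0.85 threshold are all hypothetical, and the sketch assumes the OCR step reports a confidence score for each template field.

```python
from dataclasses import dataclass

# Hypothetical threshold; the article does not state one.
CONFIDENCE_THRESHOLD = 0.85


@dataclass
class FieldResult:
    name: str          # template field name, e.g. "pressure"
    text: str          # text recognized by OCR (empty if nothing was found)
    confidence: float  # 0.0 to 1.0, assumed to be reported by the OCR step


def route_fields(fields, threshold=CONFIDENCE_THRESHOLD):
    """Split recognized fields into accepted values and a manual-entry queue."""
    accepted = {}
    manual_queue = []
    for field in fields:
        if field.text and field.confidence >= threshold:
            accepted[field.name] = field.text
        else:
            # Unreadable or missing fields go to a human operator.
            manual_queue.append(field.name)
    return accepted, manual_queue


# Made-up sample values for one diary page.
page = [
    FieldResult("precipitation", "12.4", 0.97),
    FieldResult("relative_humidity", "81", 0.91),
    FieldResult("pressure", "1O13", 0.42),  # smudged handwriting
    FieldResult("wind", "", 0.0),           # field not found on the page
]
accepted, manual_queue = route_fields(page)
print(accepted)
print(manual_queue)
```

In this sketch, clearly recognized fields are stored directly, while anything unreadable or missing is queued for an operator, which mirrors the manual fallback used for damaged and handwritten pages.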
Making everyone happy
The full digitization of the archive of weather observations collected over 100 years took just three years to complete. In addition to the repository in paper format, all digitized information is securely kept on a high-performance, fault-tolerant server in Brazil.
Today, the digital versions are publicly available on the INMET website, where they can be accessed after registration. The weather information is used by INMET scientists, students and companies that need to analyze the climate conditions in different regions of Brazil. It also serves as the basis for creating analytical models of climate evolution and weather prediction, which is the key responsibility of the scientists at the Meteorological Institute.
By Elizaveta Titarenko