DATA CONVERSION SOLUTIONS

  • October 11, 2022
  • Data

Do you need a sophisticated data conversion solution
that employs the best of technology and human insight?

For over 20 years, MGPS has performed harvesting, source content format and structure conversion, locale transcoding and recoding, and entity extraction and mining of monolingual and multilingual web and hard-copy content as part of our standard services for complex localization programs.

Unparalleled Resources

Supporting over 200 languages and dialects, our team is uniquely positioned not only to maintain top-level expertise in the relevant technologies, but also to design and apply ideal solutions for processing specific languages and locales.

Technological Expertise

Many of the government and commercial programs we support include both human and fully automated harvesting of globally hosted web content, including text, graphics, and multimedia.

We work closely with our clients to define the requirements for data functionality, data structure/schema, target locales, and data ingestion and processing environments. A clear definition of the conversion requirements and relevant technologies is key to the successful generation and delivery of content in the target format.

Comprehensive Services

Current projects involve data mining, entity extraction, and complex conversions into various XML-based formats, including XLIFF, TMX, TBX, DOCX, XLSX, and RESX, with custom schemas for both monolingual and bilingual content.
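As a rough illustration of what a conversion into one of these formats can look like, the sketch below writes bilingual text pairs out as a minimal TMX 1.4 document using only the Python standard library. It is not MGPS's tooling: the function name `build_tmx`, the tool name in the header, and the sample language codes are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Qualified name of the xml:lang attribute (the "xml" prefix is predefined).
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang="en-US", tgt_lang="fr-FR"):
    """Return a minimal TMX 1.4 document (as a string) for (source, target) pairs.

    Hypothetical helper for illustration only.
    """
    tmx = ET.Element("tmx", version="1.4")
    # The header carries the attributes TMX 1.4 requires; values are sample data.
    ET.SubElement(tmx, "header", {
        "creationtool": "example-converter",  # assumed tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plain-text",
        "adminlang": "en-US",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        # One translation unit per pair, with one variant per language.
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


if __name__ == "__main__":
    print(build_tmx([("Hello", "Bonjour"), ("Thank you", "Merci")]))
```

A production conversion would also need to handle inline markup, segmentation rules, and schema validation, which this sketch leaves out.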

Quality Assurance

When we employ fully automated processes, our quality assurance team always verifies the quality of the output and coordinates closely with the technology team to optimize technical solutions such as DTDs, filters, tools, and procedures.

Data Enrichment

Data enrichment is a key phase in our production workflows. During this phase, our content experts populate the final products with key metadata using the data structure defined in the earlier stages of a project. Our team employs cutting-edge technologies and procedures that enrich the data included not only in the deliverables, but also in the recyclable content databases.

Project Summaries

MGPS converted a voluminous, hard-to-find Asian-language dictionary from hard copy to Multi-Dictionary Formatter (MDF) and enriched XML so the output could be shared by government agencies. The 3,000-page dictionary contained more than 150,000 lexemes, or 4 million words in total.

MGPS cleaned up dictionary content scraped from the web and applied MDF linguistic tagging to create Unicode-compliant files. We exported the merged content (60,000 lexemes and their definitions) to create an easy-to-use and aesthetically pleasing dictionary.

MGPS processed foreign-language directories containing hundreds of pages and a wide variety of metadata types by scanning and OCRing or converting source files into a database-ready format for ingestion and processing in a Translation Management System. Our unique workflow maintained the directory hierarchy in a bilingual, database-friendly format, and identified and externalized repeated content to maximize future leveraging. This approach allowed us to optimize efficiency and ROI, resulting in hourly productivity gains of 300% over time.
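The idea of externalizing repeated content can be sketched in a few lines of Python. This is only one possible reading of the approach described above, not the actual MGPS workflow: the function name `externalize_repeats`, the `S1`-style keys, and the sample directory entries are all assumptions. Segments that occur more than once are moved into a shared lookup table, so each would need to be translated only once.

```python
from collections import Counter


def externalize_repeats(segments, min_count=2):
    """Split segments into per-record entries and a shared lookup table.

    Hypothetical helper: any segment seen at least `min_count` times is
    replaced by a reference key into the lookup table.
    """
    counts = Counter(segments)
    key_for_text = {}
    records = []
    for seg in segments:
        if counts[seg] >= min_count:
            # Reuse the key if this text was already externalized.
            key = key_for_text.setdefault(seg, f"S{len(key_for_text) + 1}")
            records.append({"ref": key})
        else:
            records.append({"text": seg})
    lookup = {key: text for text, key in key_for_text.items()}
    return records, lookup


if __name__ == "__main__":
    entries = ["Ministry of Health", "Tel:", "Ministry of Finance", "Tel:"]
    records, lookup = externalize_repeats(entries)
    print(records)
    print(lookup)  # {'S1': 'Tel:'}
```

Exact string matching is the simplest possible criterion; a real pipeline would likely normalize whitespace and punctuation before comparing segments.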

For more information, please contact: sales@mpgpublicsector.com