
Computer TM

 

Professional translation covers a huge volume of material, yet the range of that material is relatively narrow: work concentrates on one or a few fields, such as politics, economics, the military, aerospace, computing, or communications, and each field has its own specialized translation companies or departments. This inevitably leads to some degree of repetition in the material translated. According to statistics, the repetition rate ranges from 20% to 70%, depending on the industry and sector. In other words, at least 20% of a translator’s work is wasted on duplicated effort. Translation Memory (TM) technology starts from this observation: it first eliminates the translator’s duplicated effort, thereby improving efficiency.

 

The technical principle of Translation Memory is that users take existing source texts and their translations and build one or more translation memories from them. During translation, the system automatically searches the translation memory for identical or similar translation resources (such as sentences or paragraphs) and offers reference translations, so that users avoid wasteful duplication of effort and can focus on translating new content. The translation memory keeps learning, automatically storing each new translation in the background, and so becomes increasingly “smart” and efficient. Almost every Translation Memory vendor tells users: with a translation memory, you never have to translate the same sentence twice!

 

A well-known expert in the machine translation (MT) industry once said that MT does the work people are willing to do, but does not do it well, while TM does the work people are not willing to do! Indeed, who wants to spend time on repetitive work, and who would be happy to let an immature technology take over their job? TM technology actually assists translation; this is “Computer Aided Translation”, or CAT for short. Compared with MT, the biggest advantages of TM technology are the high-quality example sentences stored in its database and its improved control of the translation process. It is fair to say that TM is the only translation technology that has truly reached large-scale, industrial application.

 

TM products provide roughly the following functions:

 

1. Translation Process. Translation memory products automatically “memorize” every sentence the user translates. When a new sentence is to be translated, the system searches the translation memory, compares the sentence against the translation units in the memory database, picks the translation unit closest to the original text, and offers it as a reference translation. The user can accept the translation or modify it; the modified translation is automatically saved to the memory database for later use. Because the vocabulary and sentence patterns of a professional field are relatively fixed, once the user has accumulated a few memory databases of a certain size, repeated sentences turn up more and more often and translation becomes easier.

 

Translation Memory products also support sharing a memory database over a network. That is, when several people translate at the same time, they can share one translation memory over a LAN, and each online translator can use the work of the others in real time.

 

2. Automatic Database Creation. Users who accumulated a large amount of translated material before they started using a TM product can use the automatic database creation tool that TM products provide. The tool automatically analyzes and matches the original text and the translation, aligning the two sentence by sentence. After the user has made some adjustments and proofread the result, the tool automatically generates a standard translation memory file. In this way all of the user’s existing material can be recycled to build translation memories quickly and efficiently. These databases are then further supplemented and improved through constant use.
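The sentence-by-sentence alignment described in item 2 can be illustrated with a minimal sketch. It assumes the simplest possible case, in which the source and the translation contain the same number of sentences in the same order; real alignment tools use far more robust methods. The function names `split_sentences` and `align` are hypothetical and do not come from any particular TM product.

```python
import re


def split_sentences(text):
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def align(source_text, target_text):
    """Pair source and target sentences by position (1:1 alignment only)."""
    source = split_sentences(source_text)
    target = split_sentences(target_text)
    if len(source) != len(target):
        # Unequal counts need the manual adjustment step described above.
        raise ValueError("sentence counts differ; manual alignment needed")
    return list(zip(source, target))


pairs = align(
    "Open the file. Save your changes.",
    "Ouvrez le fichier. Enregistrez vos modifications.",
)
for source, target in pairs:
    print(source, "=>", target)
```

Each resulting pair corresponds to one translation unit that could be written to the memory database after proofreading.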

 

1. Translation memory (TM) Overview

 

In software and website localization, the data files to be translated contain a great deal of duplicated content. This is because the content is updated frequently, and each update builds on the previous version, adding only a little new content or making small amendments to the original. The translated content of the previous version should therefore be reused in full rather than translated again.

 

How can this translated content be reused effectively? TM technology is a practical means: it uses segmentation and a translation memory to improve translation efficiency. The translation database takes the translation unit as its unit of data and links each sentence in the source language to the corresponding sentence in the target language. When translators work with a TM-based translation tool, the tool stores the latest translations in the translation memory. For content to be translated (such as words, phrases, sentences, or paragraphs), it first searches the translation memory for matching content and automatically offers the closest translation.

 

Specifically, when the content to be translated is a 100% match, the corresponding translation in the translation memory is inserted directly into the text being translated. When the match rate is below 100% but above a set threshold (a fuzzy match), the translation memory tool offers suitable translations for reference; the translator chooses the closest one and completes the translation with some simple editing. When the match rate of a sentence is below the set threshold, the sentence is treated as new content and no suggestions are offered; the translator translates it manually, and the new translation is automatically stored in the translation memory for future search and reuse.
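The three cases above (exact match, fuzzy match, new content) can be sketched as follows. This is only an illustration: the similarity measure is the character-based ratio of Python's standard `difflib.SequenceMatcher`, the threshold of 0.75 is an arbitrary assumption, and the function name `lookup` is hypothetical. Commercial TM tools use their own scoring methods.

```python
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 0.75  # assumed value; real tools let the user configure this


def lookup(segment, memory, threshold=FUZZY_THRESHOLD):
    """Return (kind, suggestion), where kind is 'exact', 'fuzzy' or 'new'."""
    if segment in memory:
        return "exact", memory[segment]
    best_score, best_target = 0.0, None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best_score, best_target = score, target
    if best_score >= threshold:
        return "fuzzy", best_target
    return "new", None


memory = {"Click the Save button.": "Cliquez sur le bouton Enregistrer."}

for segment in (
    "Click the Save button.",
    "Click the Open button.",
    "Quarterly revenue rose.",
):
    print(segment, "->", lookup(segment, memory))

# A segment classified as new is translated manually and then stored for reuse.
memory["Quarterly revenue rose."] = "Le chiffre d'affaires trimestriel a augmenté."
```

After the last line, the same sentence would come back as an exact match the next time it appears.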

 

As more content is translated, the translation memory grows. During translation, the translator and the computer interact effectively: translators no longer have to translate the same content twice and can focus on new content. The accuracy of the translation memory and the consistent translation of identical content can thereby be ensured. Making the computer “smart” and “liberating” the translator are the goals of TM technology.

 

2. Translation Memory Exchange (TMX)

 

Translation Memory (TM) is one of the technologies widely used in the computer-aided translation (CAT) field; with TM technology, translation efficiency can be improved significantly and consistency of content can be ensured. Because a wide variety of CAT software has been developed with TM technology, the storage formats for translation memory content vary widely. To make it easier to exchange translation memory data between translation agencies and between CAT tools, an open standard called TMX has been successfully adopted in the localization and translation industry.

 

3. History of TMX

 

With economic globalization, the software and website localization and globalization industry has developed rapidly. More and more localization tools and TM tools are being developed with TM technology. But these tools come from different manufacturers, and each has its own data storage format. In addition, a localization service provider often serves different projects for different clients, or for the same client. Because different clients and projects require different localization tools, and the data files of each tool often lack a standard exchange format, it is difficult to reuse previously accumulated translation memory resources.

 

Clearly, the format of translation memories urgently needed to be unified, so setting a standard for translation data exchange became the top priority of the localization/globalization industry. Such a standard lets service providers, customers, and tool developers within the industry process information in a unified way, to the benefit of all. Driven by growing market demand and by translation memory technology, the TMX standard came into being.

 

The initial discussion of the TMX standard dates back to June 1997, at a conference of the Localization Industry Standards Association (LISA) attended by localization customers, tool providers, and localization service providers. The participants discussed the incompatibility of translation memory data between localization tools. After the conference these members formed a special body called OSCAR (Open Standards for Container/Content Allowing Re-use), and the TMX specification is one of OSCAR’s most important results.

 

4. Features Summary of TMX

 

TMX was developed by OSCAR, a body belonging to LISA, and is independent of all manufacturers. It is an open XML standard for storing and exchanging the translation memory (TM) data created by computer-aided translation (CAT) and localization tools. The goal of TMX is to make it easier to exchange translation memory data between different tools and/or translation agencies, and to reduce or avoid the loss of important data in the exchange process.

 

While preserving the translated content, TMX aims to set a neutral data exchange standard for different localization and translation tools, and more and more localization and translation tools on the market now support the TMX standard.

 

According to an industry survey by OSCAR, translation memory resources have become an increasingly strategic asset for localization/globalization services; in some cases their value runs into millions of U.S. dollars, and they play an important role in international business. The TMX standard helps companies preserve these assets, so that they are not lost as the market and technology change and are not tied to a specific computer-aided translation tool.

 

5. Structural Interpretation of TMX

 

(tmx) is the root element of a TMX document; it contains two elements, (header) and (body).

 

(header) contains the document’s metadata. In addition to the attributes of (header) itself, document-level information can be stored in (note) and (prop) elements, and the (ude) element lists any user-defined characters.

 

(body) is the collection of translation units ((tu) elements); the order of the units within the collection carries no meaning. Text fragments are held in the translation units: each (tu) element contains one or more translation unit variants ((tuv) elements), which are the versions of the same translation unit in different languages. Each (tuv) element contains a fragment together with information specific to that language. The actual text is stored in the (seg) element, all formatting information inherited from the source document is stored in inline elements, and (note) and (prop) elements store information related to each specific (tuv).
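The structure described above can be sketched with Python's standard `xml.etree.ElementTree` module. The elements written here as (tmx), (header), (body), (tu), (tuv) and (seg) are XML elements, so in a file they appear in angle brackets. The header attribute values below (tool name, version, and so on) are placeholders chosen for illustration, and `build_tmx` is a hypothetical helper, not part of any TMX toolkit.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang="en", tgt_lang="fr"):
    """Build a small TMX tree from (source, target) sentence pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-tool",   # placeholder value
        "creationtoolversion": "0.1",     # placeholder value
        "segtype": "sentence",
        "o-tmf": "example",               # placeholder value
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")
        # One (tuv) per language, each holding its text in a (seg).
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx


root = build_tmx([("Open the file.", "Ouvrez le fichier.")])
print(ET.tostring(root, encoding="unicode"))
```

Reading a TMX file works the same way in reverse: iterate over the (tu) elements and collect the (seg) text of each (tuv) by language.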

 

The size of a fragment is not limited; it is usually a phrase, a sentence, or a paragraph. Most tools that use the TMX standard segment by sentence. Each TMX fragment can contain many optional elements that store formatting information such as font changes and hyperlinks. TMX also provides for footnotes, index entries, and the like.

 

A fragment can include several content markup elements: (bpt), (ept), (it), and (ph). These elements encapsulate the embedded code of the original document. Additional markup that is not related to embedded code goes in the (hi) element. The (sub) element marks off text that is embedded inside such code.
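As an illustration of these inline elements, the sketch below parses a hand-written fragment in which HTML tags from a hypothetical source document are wrapped in (bpt), (ept) and (ph). The sample segment is invented for this example; the parsing uses only Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Invented sample: bold markup and a line break taken from an HTML source.
SEG = (
    '<seg>Click <bpt i="1" type="bold">&lt;b&gt;</bpt>Save'
    '<ept i="1">&lt;/b&gt;</ept> to continue.<ph x="1">&lt;br/&gt;</ph></seg>'
)

seg = ET.fromstring(SEG)

# Collect the embedded codes together with the element that wraps each one.
codes = [(el.tag, el.text) for el in seg if el.tag in {"bpt", "ept", "it", "ph"}]
print(codes)

# Joining all text nodes in document order restores the original formatted text.
original = "".join(seg.itertext())
print(original)
```

Here (bpt) and (ept) mark the beginning and end of a paired code and share the same `i` value, while (ph) holds a standalone code.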

 

6. Implementation Level of TMX

 

Depending on the requirements of an implementation, TMX defines two implementation levels, Level 1 and Level 2, which support plain text only and content markup respectively.

 

At Level 1, which supports plain text only, the data in each fragment element ((seg)) is plain text without any content markup. Typically, if the data to be processed contains no embedded code, implementing Level 1 is enough. Where formatting is present, however, the formatting and other information in the text fragments is lost, so only fuzzy matches can be obtained. For documents with rich formatting, this level of processing is far from enough.

 

At Level 2, which supports content markup, a localization tool allows text fragments to contain embedded code. At this level TMX usually retains the following information: that the text fragment contains embedded code; the location of the embedded code within the fragment; and, in well-designed tools, the type of the embedded code, such as bold or a link. Because this is what makes an exact match possible, most localization tools support Level 2. With TMX content markup, the translated version of the original document can be regenerated using the TMX document alone.
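The difference between the two levels can be made concrete with a small sketch that reduces a Level 2 fragment to the plain text a Level 1 tool would store. The helper name `plain_text` and the sample segment are assumptions made for illustration; the code relies only on Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET


def plain_text(element):
    """Return the text of a (seg) element with the embedded codes removed."""
    parts = [element.text or ""]
    for child in element:
        if child.tag == "hi":
            # (hi) wraps translatable text, so its content is kept.
            parts.append(plain_text(child))
        # Codes inside (bpt), (ept), (it) and (ph) are dropped;
        # the text that follows each element is kept.
        parts.append(child.tail or "")
    return "".join(parts)


level2 = ET.fromstring(
    '<seg>Click <bpt i="1">&lt;b&gt;</bpt>Save<ept i="1">&lt;/b&gt;</ept>'
    ' to <hi type="term">continue</hi>.</seg>'
)
level1 = plain_text(level2)
print(level1)
```

The conversion goes one way only: once the codes have been dropped, the position and type of the original formatting cannot be recovered from the Level 1 text.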

 

7. Development and Certification of TMX

 

TMX is developed and maintained by OSCAR, a LISA Special Interest Group. The group’s main responsibilities are to keep improving the content of the standard, to organize TMX certification, to license the TMX logo, and to promote the use of TMX in the localization and globalization industry.

 

TMX is a continuously updated standard; the latest version released by OSCAR is 1.4b, released in April 2005. Compared with earlier versions, it updates the TMX data format and adds some new features.

 

While developing the TMX standard, OSCAR introduced a certification mechanism to ensure that tool developers’ products conform to the TMX specification. Localization tools from the various localization and translation tool developers must pass a TMX conformance audit performed by a third-party laboratory designated by LISA before they may carry the TMX logo.

 

Passing TMX certification has become the mark of a technologically leading product, and a prerequisite for winning a larger market and more users. Many localization and computer-aided translation tools on the market have passed TMX certification. To be able to reuse and exchange translation memory data, choose TMX-certified localization and translation tools.

 

COPYRIGHT 2010 AITRANS, ALL RIGHTS RESERVED. 京ICP备9035536号
