Reports of my death are greatly exaggerated – Translation Memory

  • 05 March 2011
  • by Peter Reynolds

Translation Memory (TM) has taken to wearing a Mark Twain white suit to honour the number of experts who predict it is already dead.

The latest comes from the LISA standards summit at the beginning of March, where Jaap van der Meer of TAUS gave what seems to have been a very well received keynote. I was not at the conference, and the information I have is second-hand, through Twitter. What has got me writing this is the claim that within five years translation memory will no longer exist. Several gurus have also retweeted the claim that TM will not exist in five years because it will all be corpus based.

One of the problems with not having been at the conference is that it is difficult to understand what exactly was being said. Are they talking about segment-based translation memory being replaced by corpus-based translation memory? At present there are advantages to both, which is why memoQ allows users to use both. It may be the case that segment-based TM becomes less used in five years, but that does not mean the death of TM.
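To make the segment-based side of that distinction concrete, here is a minimal, hypothetical sketch of the kind of fuzzy-match lookup a segment-based TM performs. The example data, the similarity measure, and the 0.75 threshold are all invented for illustration; real tools such as memoQ use far more sophisticated matching and scoring.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory translation memory: source segment -> target segment.
tm = {
    "Press the power button.": "Drücken Sie den Netzschalter.",
    "Save your changes before closing.": "Speichern Sie Ihre Änderungen vor dem Schließen.",
}

def best_match(source, memory, threshold=0.75):
    """Return (stored_source, target, score) for the closest stored segment,
    or None if no segment is similar enough -- a simplified fuzzy match."""
    best = None
    best_score = threshold
    for stored_source, target in memory.items():
        score = SequenceMatcher(None, source, stored_source).ratio()
        if score >= best_score:
            best, best_score = (stored_source, target, score), score
    return best

# A near-identical new segment is matched; an unrelated one is not.
hit = best_match("Press the power button", tm)
miss = best_match("Completely unrelated sentence.", tm)
```

A corpus-based approach, by contrast, would search aligned bilingual text for sub-sentence matches rather than looking up whole stored segments like this.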

They could of course mean a machine translation corpus, and be suggesting that statistical machine translation is going to replace translation memory in five years. I do not think so. I remember that at the Idiom WorldSummit in Dublin in December 2006, Jaap was asked by Reinhard Schäler about the claim that MT would be commercially useful on a widespread basis within the next five years. Reinhard, despite his youthful looks, has been around our industry for more than five years, and he said he had heard this claim made before and it was always five years away. Jaap told him he thought it was now three years. At the ATA conference in Denver last year, Jaap was asked how long it would be before he would allow a surgeon to operate on him using equipment whose medical instructions had been translated using MT. Jaap is a wise man and would prefer a human translator in those circumstances, but you will be happy to know that Jaap predicted that widespread commercial use of MT is going to happen soon. Whether it is three years or five years, it is always sometime in the future.

While there is greater commercial use of MT, it is not yet widespread. Far greater savings are being made through the use of TM technology than through MT. At present the only way for MT to replace TM is to dramatically reduce the accepted quality of commercial translation.

The problem I have with statistical MT is what many translation industry gurus see as the Holy Grail: the very large corpus. We are in a situation where companies like Google have huge corpora. At the standards summit Jaap also said that the Internet was becoming a huge TM in its own right. Many in our industry believe that the bigger this corpus the better, and that at some stage within the next three years the corpus will get so big that we will have good-quality statistical MT.

I am not so sure, and I believe there is an optimum level with statistical MT where you get the best results. When you go beyond this level, the quality of translation is reduced. There are more possibilities for errors in both the corpus and how systems manage it, and even for creative sabotage. Google gives translators and everyone else the opportunity to provide a better translation. While this will usually improve the corpus, it is also an opportunity to sabotage it, or at least make it funnier. Use Google Translate to translate ‘Quid Pro Quo’ from Latin to English for an example of this.

I think Jaap and others are right to suggest that translation technology, including MT, will develop faster than it has done. However, I think they are ignoring the biggest driver: business. The recession has made the translation industry more focussed on the bottom line. Technology needs to be able to help translate more words for less, but at the same quality. The technology which is at present doing most here is TM. It is my opinion that there is a translation industry groupthink which believes that TM is as far advanced as it can be. I do not believe this: memoQ's use of both segment-based and corpus-based TM is an example of an important innovation, as is the collaboration functionality which many TM tools have added. It is my opinion that we can still get more out of TM technology, and that it is not yet a stale or dead technology.

Clearly there is more happening with MT, and there could be very interesting developments over the next five years. However, the groupthink of 'statistical MT good, rules-based not so good' is dangerous. The rules-based and hybrid approaches are very important, and I hope that there is a lot of development in these areas.

An area often forgotten is terminology management. The use of smart technology to manage terminology is, in my opinion, the area where the greatest business improvements can be made. There is an initial investment with terminology management, but there are real gains to be made by preventing errors in a translated text. It is my opinion that this is where the most efficiency could be achieved.

The next five years will be interesting, but I would advise anyone looking at improving the bottom line by introducing translation technology to make sure they do not ignore TM, and also to look carefully at terminology management.