Ola Bini is a Swedish developer working for ThoughtWorks. His daily job includes working on JRuby, starting up a Swedish ThoughtWorks office and mucking around with Java and Ruby. In his spare time he spends most of his time on his language Ioke, working on one of several other open source projects, or reading science fiction. Ola has presented at numerous conferences, such as JavaOne, Javapolis, JAOO, RailsConf, TheServerSide Java Symposium and more. He is the author of the APress book Practical JRuby on Rails. Ola is a DZone MVB and is not an employee of DZone.

Should Languages be Multi-Lingual?

10.21.2009

I’m currently sitting in the Beijing ThoughtWorks office, and for some reason language is on my mind… =) One of the discussions related to DDD that have turned up several times in the last few months at conferences is how you handle ubiquitous language when your domain is not in English. Since most programming languages are based on English, you end up mixing English and Swedish, for example, if you are working with a Swedish domain. Of course, the benefits of working with these concepts in Swedish are very hard to argue against. But the dichotomy between the programming language and the domain language is definitely something that hurts my eyes, so I’m generally not very fond of that approach.

In fact, I haven’t heard anyone come up with a good solution to this problem, and this post is not really a solution either.

One of the things I’ve proposed to make this situation better is to create an external DSL that is fully in the domain language. The implementation of that DSL can then be written in English. The main benefit is that there is a clear separation between the domain language and the programming language. On the other hand, the overhead of creating the DSL, and the complexities involved in translating the domain concepts into programming language concepts, can become problematic too.
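To make that separation concrete, here is a minimal sketch in Python of what such an external DSL could look like for a Swedish banking domain. Everything in it is an assumption made up for illustration: the statement syntax (`överför <belopp> från <konto> till <konto>`), the `Account` class and the `run_script` function. The script is pure Swedish, while the implementation behind it stays in English.

```python
import re


class Account:
    """Implementation object, named in English."""

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance

    def transfer(self, amount, to):
        self.balance -= amount
        to.balance += amount


# The only statement of this hypothetical DSL:
#   överför 100 från Anna till Bertil
TRANSFER = re.compile(r"^överför (\d+) från (\w+) till (\w+)$")


def run_script(script, accounts):
    """Interpret each non-empty line of the Swedish script against the accounts."""
    for line in script.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        match = TRANSFER.match(line)
        if match is None:
            # Error messages are part of the domain-facing surface, so Swedish too.
            raise ValueError(f"okänt kommando: {line}")
        amount, source, target = match.groups()
        accounts[source].transfer(int(amount), to=accounts[target])


accounts = {"Anna": Account("Anna", 500), "Bertil": Account("Bertil", 100)}
run_script(
    """
    överför 100 från Anna till Bertil
    överför 50 från Bertil till Anna
    """,
    accounts,
)
```

Even in a toy like this the overhead is visible: every domain concept needs its own grammar rule and its own mapping onto the English implementation.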

One interesting idea in Cucumber is the idea that you can easily add new natural languages to write the features in. When it comes to user stories at the level of testing that Cucumber provides, it’s really important to use the right language. So it got me thinking, could you use the same kind of approach in a general programming language too?

As an experiment I took a small example program for Ioke, and translated it into Mandarin, with simplified Chinese characters. Of course I used Google Translate for this, so the translation is probably not very good, but the end result is still interesting. I’m not going to try to get this into my blog, so take a look at the file on GitHub instead: http://github.com/olabini/ioke/blob/master/examples/chinese/account.ik. As you can see there is nothing in there that even reeks of English. If you don’t understand Chinese characters it is probably hard to see what’s happening here. Basically an Account object is created, with a “transfer” method and a “print” method. Further down, two instances of this Account object are created, some transfers are made, and then the objects are printed. But provided my translation is not too crappy, this code should make sense to someone reading Chinese.

Now, this is actually extremely simple to implement in Ioke, since it relies on several of the features Ioke handles very easily. That everything is a message really helps, and having everything be first class means I can alias methods and things like that without any worry. Obviously your language also needs to handle non-ASCII identifiers correctly, but that should be standard in this day and age.
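The aliasing idea can be sketched outside Ioke as well. The following is a rough Python analogy, not the Ioke code from the repository: it assumes a Swedish domain instead of Chinese, and the names `Konto`, `överför`, `beskriv` and `saldo` are my own illustrative translations. It works because classes and functions are first-class values in Python and because Python 3 accepts non-ASCII identifiers.

```python
class Account:
    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance

    def transfer(self, amount, to):
        self.balance -= amount
        to.balance += amount

    def describe(self):
        return f"{self.owner}: {self.balance}"


# Give the class and its members a second, Swedish name.
# Nothing is copied: both names refer to the same objects.
Konto = Account
Konto.överför = Account.transfer
Konto.beskriv = Account.describe
Konto.saldo = property(lambda konto: konto.balance)

# Client code can now be written without any English identifiers.
anna = Konto("Anna", 500)
bertil = Konto("Bertil", 100)
anna.överför(100, bertil)
```

After the transfer, `anna.saldo` is 400 and `bertil.beskriv()` returns `"Bertil: 200"`. The English names keep working too, which is both the strength and the weakness of plain aliasing.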

When thinking about it, something similar can be created in languages like Lisp, Smalltalk, Factor, Io and Haskell - but most other languages would struggle. If you have keywords in your language, it’s really a killer - you would need to branch your parser to make it happen.
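The keyword problem is easy to demonstrate in a language that has them. This small Python sketch (the `compiles` helper and the Swedish names `skriv_ut` and `om` are hypothetical, chosen only for illustration) shows that a built-in function can be given a translated name, while a keyword such as `if` is fixed in the grammar and cannot.

```python
import keyword

# A built-in function is an ordinary value, so it can be given a second name.
skriv_ut = print

# A keyword is part of the grammar, not a value bound to a name.
print_is_keyword = keyword.iskeyword("print")  # False in Python 3
if_is_keyword = keyword.iskeyword("if")        # True


def compiles(source):
    """Return True if Python accepts the source text, False on a syntax error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


alias_function_ok = compiles("skriv_ut = print")  # accepted
alias_keyword_ok = compiles("om = if")            # rejected by the parser
```

To let someone write `om` instead of `if` here, you would have to change the parser itself, which is exactly the branching described above.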

Of course, this approach only works when you can simply translate from one word to another. If the writing system is right to left, or top to bottom, it’s much more tricky to create a good translation.

I’m also not sure if this is actually a really good idea or not. It might be. The other thing I’ve been thinking about is how to handle multilingual editing. What if you want to be able to switch back and forth between languages? How can you handle identifiers with more than one name? Would you want to?

Lots of unanswered questions here. But it’s still fun to think about. Communication is the main goal, as usual.

From http://olabini.com/blog

Published at DZone with permission of Ola Bini, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Arek Stryjski replied on Wed, 2009/10/21 - 4:26am

I'm not a native English speaker myself. In the past I worked in multinational teams (in one case with 5 nationalities and not a single native English speaker), and the fact that the code was in Java, not German-Java, Norwegian-Java, Russian-Java etc., was very helpful. Even if it was difficult to communicate in plain English, we could always read each other's code.
I don't think translating code has any advantages.

There is also the question of which language we should learn programming languages in.
Maybe to some people's surprise, I'm not able to read Polish (my mother tongue) books about Java. When I started, only English books were available, and when thinking and speaking about programming I just use English words, even when talking in Polish.
Most programmers have adopted this way of communicating. However, some academics insist on not using this jargon, but pure Polish. In my opinion this makes their books less understandable for the average developer.

To me the only open point in this discussion is the language of specification and documentation.
I was once on a project in Poland, with a 100% Polish team and a Polish customer, and we used English for everything. There were some advantages for our employer in this case, but I also believe there were many disadvantages for the team.

In the end I think English is the "lingua franca" of IT and it should stay this way.
To be a medical doctor you need to learn Latin; to be a programmer you need to learn English. I don't have a problem with it.

Fab Mars replied on Wed, 2009/10/21 - 6:05am

I like this topic very much because it comes up in any project with a non-Anglo-Saxon customer. And my experience is that, most of the time, money matters drive you straight to English.

For the fun of it, I saw tricky situations with French a few times in past projects: imagine nice things like method names with accents (éèêàùô). Imagine English code with frequent French variable names and comments in Arabic above. Sometimes a mix of all of this. Needless to say, this happens in projects where the coding rules are not well clarified and where there's no serious code review. I saw that kind of thing until 2005, and then less and less.

Indeed, with today's globalized development, the odds are that the people who develop use a different language (and even alphabet) than the people who specify, which is different again from the language (and alphabet) of the customer. You know as well as I do that IT companies will commit to doing development in cheaper countries in order to get a lower bid price... or increase their margin. In that case it's very hard to opt for anything other than 100% English usage, possibly with a glossary matching English business terms to the native language's ones. We did that; it works, but you have to review the code or good habits will quickly be lost :)

Needless to say, if a customer wants code and/or comments written in French, it's OK with me - the customer is always right - but he must be ready to have all his development take place in France, Belgium, Luxembourg, Québec or North Africa, and pay more than if the dev team was in India.

In any case, clear coding conventions must be written and enforced at all times.

I wouldn't advise a DSL in a specific language because of the overhead, and because I can bet my ass that, one day or another, a foreigner will have to have a look at it or support it.

The other way around, how about having language syntax using your own language? We've seen the VisualBasic example, thank you.

The flexibility offered by Ioke, Lisp, etc., is really interesting but, in my area at least, customers want "a full blown enterprise language well established on the market, with no risk of having to pay for experts later like it's the case with Cobol now"... money concerns, again. So 99% of mainstream project development will be in Java, .NET, C(++), PHP or JavaScript, hence no chance of taking advantage of other languages' possibilities.

Overall, IMO, in the enterprise area, money concerns drive development to 100% English and there's little you can do to curb it (provided you really want to). If you're doing a study project or a homemade development, or if your company insources everything with its own in-house team, you're welcome to do what you want :)

Mike P(Okidoky) replied on Wed, 2009/10/21 - 11:25am

Myself, I'm from a Dutch background, and learned programming when the Dutch weren't big yet on translating everything to Dutch. Also, TV was (is) generally subtitled and, unlike, say, the Germans, I got used to English sounds that way. I've lived in Canada for quite a while, and I have had a particularly easy time working in English.

But what about having to work with people who are not comfortable with English, like many Chinese?

How about using a small translation tool that is able to pick up translation definitions from the source files?
Like so:
public class Tree extends JTree /* Tree = nl:Boom ge:Baum */
{
  private int name; /* name = nl:naam ge:nahm (?) */
}

etc. A tool could find these translation definitions and change the source code so the other guy can read it. It could be an Eclipse/Netbeans plugin that could do this on the fly. When checking the code back in (CVS, Git, whatever), it could automatically translate back to English.
This way, each person can comfortably work in their own language. New names that are added by people who don't know how to translate them could be added as:
private int #$@*(%$; /* #$@*(%$ = ? */
Which allows the translation tool to list words that aren't translated yet, which one of the team members that knows multiple languages can then help translate.
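As a rough illustration of the idea, here is a minimal Python sketch of such a tool. It assumes the annotation format from the example above (`/* Name = nl:... ge:... */`); the function names `read_translations` and `translate` are hypothetical, and the renaming is a plain text substitution that knows nothing about scoping, string literals or imports.

```python
import re

# Matches annotation comments such as: /* Tree = nl:Boom ge:Baum */
ANNOTATION = re.compile(r"/\*\s*(\w+)\s*=\s*(.*?)\s*\*/")

# Matches either a whole block comment (left untouched) or a single word.
TOKEN = re.compile(r"/\*.*?\*/|\w+", re.DOTALL)


def read_translations(source, language):
    """Collect {english_name: translated_name} for one language code."""
    table = {}
    for name, entries in ANNOTATION.findall(source):
        for entry in entries.split():
            code, _, word = entry.partition(":")
            if code == language and word:
                table[name] = word
    return table


def translate(source, language):
    """Rename annotated identifiers in the code, keeping the comments as they are."""
    table = read_translations(source, language)

    def rename(match):
        text = match.group(0)
        # Comments and unknown words are not in the table and pass through.
        return table.get(text, text)

    return TOKEN.sub(rename, source)


SAMPLE = """public class Tree extends JTree /* Tree = nl:Boom ge:Baum */
{
  private int name; /* name = nl:naam */
}
"""

dutch = translate(SAMPLE, "nl")
```

In `dutch`, the class is declared as `Boom` and the field as `naam`, while `JTree` and the annotation comments are unchanged. Translating back at check-in would need the inverted table, and names with no entry for the requested language (such as `name` for `ge` in this sample) simply stay in English.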

Danny Lee replied on Thu, 2009/10/22 - 8:34am

It's OK to produce multilingual DSLs for customisers and other non developers.

But a multilingual programming language is total overkill. If you want to learn a programming language, you should be able to remember a couple of dozen keywords in English.
