New OpenAI o1 model

For German readers, the FAZ has an interesting article about the new OpenAI o1 LLM. It is especially interesting because o1 is not simply a "newer, more expensive, better" model (emphases are mine):

o1 is in many ways a significant break with the LLM trend described above. Inference takes longer with o1, so the model feels slower. Much slower. OpenAI humanises the longer computing time with "thinking". But why is o1 a break? Firstly, the model is not optimised for regular, run-of-the-mill requests such as "rephrase this email in a more professional tone". The longer and more expensive "thinking time" compared to other models gives o1 new capabilities. It is better at logical tasks, such as maths or programming, than any other model. At the same time, it is no better, and often worse, at text formulation than classic LLMs such as Claude or GPT-4o.

For the first time, o1 is an LLM that can perform complex tasks better than simple tasks, even if the user accidentally treats both kinds of task the same way. If you give o1 a simple task, OpenAI warns, the model may "think" too much about the solution and over-complicate the result. The LLM landscape as a whole is not intuitive, and with o1 this is exacerbated.

For us as Trados Studio users, this might mean we can be quite relaxed about not having an integration with this new model right away. OpenAI might introduce a more language-oriented variety of this LLM at a later point, just as GPT-3 and GPT-4 grew into families of LLMs with different abilities (and costs).

My personal opinion is that currently the real productivity boosts for translation processes come less from more powerful models than from the comprehensive application of existing models (not even the most advanced ones) to reduce the "donkey-work" and highlight the areas where human translation and editing skills are most needed.


    My personal opinion is that currently the real productivity boosts for translation processes come less from more powerful models than from the comprehensive application of existing models (not even the most advanced ones) to reduce the "donkey-work" and highlight the areas where human translation and editing skills are most needed.

    For now I tend to agree with you here. I played a bit with the o1 model on coding tasks that have been difficult to solve with 4o, and I confess I don't see a real improvement in the actual result. For sure it takes longer... a lot longer... and it is very verbose in the way it goes about attempting to solve the tasks, but despite recognising the problem quite clearly it still outputs very similar incorrect responses. I think this is most likely because being able to rationalise a problem is not the same as being able to solve it when the knowledge of the rules and syntax is missing in the first place. So for the things I have been testing I don't see the full value of this model yet... I am sure this is a limitation of my own tests/needs.

    A good example of this is AutoHotkey V2. Most data available on the internet for training... documentation, sample code, forum discussions, GitHub sites, etc.... is based on a decade of V1 work. So OpenAI has a tendency to fall back to V1. You can create a custom GPT in ChatGPT Plus with the V2 documentation, and it is somewhat better... at least in terms of attempting to always deliver V2 code... but it almost always gets syntax wrong, often gets conventions wrong, and often gets into a loop where you cannot get it to deliver the correct solution at all and have to rely on your own troubleshooting to find the right solution.

    Compare that with SQL scripts, or even PowerShell, and it really excels, rarely making mistakes as long as you frame the questions well.

    So for me, the speed of delivery with 4o is certainly preferable to working with o1 for now. My current needs are mostly satisfied with what we have. In fact even local models through Hugging Face do a pretty good job there too... the obvious benefit being I don't need to be online at all!

    Short to medium term I think we will see more applications of AI in our processes, leaving the human in the loop to deal with tasks that will benefit from human expertise. In the long term... feel free to complete this sentence!

     

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or are still not clear?
    Tell us what you need in our Community Solutions Hub
