News

Test OpenAI’s o1 Models for Advanced Reasoning Tasks in STEM Fields, Now Available via API

Published

6 days ago

Test OpenAI’s o1 Models for Advanced Reasoning Tasks in STEM Fields, Now Available via API

OpenAI recently announced the release of a new family of large language models (LLMs) called “o1,” aimed at tasks related to science, technology, engineering, and math (STEM). This announcement surprised many, as there were expectations for a model named “Strawberry” or even GPT-5.

The o1 family introduces two models: the o1-preview and the less advanced o1-mini. These models are currently available to ChatGPT Plus users and developers through OpenAI’s paid API, enabling them to test the models in various applications, especially those requiring deep reasoning.

OpenAI describes the o1 models as having advanced reasoning capabilities, with the ability to “try different strategies, recognize mistakes, and engage in a full thinking process,” as explained by Michelle Pokrass, OpenAI’s API Tech Lead.

These models reportedly perform similarly to PhD students on challenging benchmarks, excelling in reasoning-related tasks, particularly when compared to the GPT series, according to Nikunj Handa, a Product Lead at OpenAI.

The o1 models are currently limited to text inputs and outputs, lacking multimodal capabilities like those found in GPT-4o, which can process image and file inputs. Additionally, the o1 models cannot connect to web browsing, meaning they rely on knowledge up to their training cutoff in October 2023.

Test OpenAI’s o1 Models for Advanced Reasoning Tasks in STEM Fields, Now Available via API

While slower to respond, often taking over a minute for outputs, developers with early access have reported significant improvements in coding tasks and drafting complex documents, suggesting these models could be valuable for specific applications despite their limitations.

OpenAI recommends that developers interested in reasoning tasks experiment with the o1 models, especially for complex problems that can tolerate longer response times. However, they caution that for tasks requiring faster responses or multimodal inputs, GPT-4o remains a better choice. Developers are encouraged to test o1-preview and o1-mini on tasks like coding challenges and provide feedback to OpenAI to improve the models.

Pricing for the o1 models is notably higher than other OpenAI models. The main o1-preview model is the most expensive, costing $15 per 1 million input tokens and $60 per 1 million output tokens, compared to GPT-4o’s $5 and $15 respectively. On the other hand, the o1-mini model is more affordable, priced at $3 per 1 million input tokens and $12 per 1 million output tokens. OpenAI plans to adjust pricing over time based on feedback and usage patterns.

The o1 models have a context limit of 128,000 tokens, comparable to GPT-4o, and can produce a maximum of 32,768 tokens in a single output, with o1-mini able to handle double that amount. Developers have already begun exploring various use cases, including generating white papers, optimizing staff schedules, and designing infrastructure, showcasing the models’ potential for complex, reasoning-intensive tasks.

In less than 24 hours since their release, developers have tested the o1 models for a variety of applications. These include generating detailed plans and white papers with citations, optimizing organizational workflows, creating apps and games quickly, and even completing request-for-proposal (RFP) documents autonomously. While still in its early stages, the o1 family has already proven its ability to tackle sophisticated reasoning tasks with high accuracy.

Developers can access the new o1 models through OpenAI’s public API, Microsoft Azure OpenAI Service, Azure AI Studio, and GitHub Models. While not suitable for all use cases, the o1 models offer exciting opportunities for developers working on complex, reasoning-driven applications. OpenAI plans to continue enhancing both the o1 family and the GPT series, giving developers ample tools to build new and innovative solutions.

In this article:

Click to comment

Tech

Threads Tests 24-Hour Timer for Ephemeral Posts, Enhancing Content Flexibility

Threads is experimenting with a new feature that allows users to set a 24-hour timer on their posts. After this period, the post and...

DrishtyAugust 26, 2024

Live2Diff - AI Transforms Live Video into Real-Time Stylized Content

Tech

Live2Diff – AI Transforms Live Video into Real-Time Stylized Content

A team of international researchers has developed Live2Diff, an AI system that transforms live video streams into stylized content in near real-time. Named for...

Mason HaleJuly 17, 2024

Innovations in AWS Enhance Generative AI for Enterprise Applications and Content Accuracy

Tech

Innovations in AWS Enhance Generative AI for Enterprise Applications and Content Accuracy

Amazon Web Services (AWS) recently unveiled several innovations aimed at enhancing the development and deployment of generative AI applications, addressing concerns around accuracy and...

Richie Dela CruzJuly 11, 2024

AU10TIX Exposes Admin Credentials, Potentially Compromising Client Data for Over a Year

News

AU10TIX Exposes Admin Credentials, Potentially Compromising Client Data for Over a Year

AU10TIX, an Israeli company that verifies IDs for clients like TikTok, X, and Uber, accidentally left important admin credentials exposed for over a year....

Richie Dela CruzJune 27, 2024

Gizmo Writeups

News

Test OpenAI’s o1 Models for Advanced Reasoning Tasks in STEM Fields, Now Available via API

Leave a Reply
Cancel reply

Leave a Reply

You May Also Like

Tech

Threads Tests 24-Hour Timer for Ephemeral Posts, Enhancing Content Flexibility

Tech

Live2Diff – AI Transforms Live Video into Real-Time Stylized Content

Tech

Innovations in AWS Enhance Generative AI for Enterprise Applications and Content Accuracy

News

AU10TIX Exposes Admin Credentials, Potentially Compromising Client Data for Over a Year

Leave a Reply Cancel reply

Leave a Reply

You May Also Like

Tech

Threads Tests 24-Hour Timer for Ephemeral Posts, Enhancing Content Flexibility

Tech

Live2Diff – AI Transforms Live Video into Real-Time Stylized Content

Tech

Innovations in AWS Enhance Generative AI for Enterprise Applications and Content Accuracy

News

AU10TIX Exposes Admin Credentials, Potentially Compromising Client Data for Over a Year

Leave a Reply
Cancel reply