next BIG future

Weekly Big Future in AI, Oct 2, 2024. OpenAI

By Brian Wang

Oct 03, 2024

Big news this week in AI, AI startups, AI research and AI funding.

  1. AI Data center plans and nuclear power.

  2. OpenAI DevDay.

  3. AI Research

  4. BlackRock and Microsoft have a new $30 billion AI infrastructure fund

  5. Aptos partners with Ignition AI Accelerator (backed by NVIDIA and others)

  6. Major recent AI funding announcements including OpenAI

  1. AI Data center plans and nuclear power.

    OpenAI's Infrastructure Needs: OpenAI has discussed with the US administration its need for massive AI data centers. The plans describe data centers each with a capacity of up to 5 GW (gigawatts), several times the power demand of a large city. OpenAI wants to build 5 to 7 of these super-large AI data centers.

    Oracle's Zettascale Plans: At Oracle CloudWorld (#OCW24), Oracle announced the first zettascale OCI Supercluster, powered by NVIDIA's Blackwell platform. The supercluster is intended to let customers train and deploy next-generation AI models at unprecedented scale, though it could be years away.

    AI and Nuclear Power: There is a trend toward pairing AI data centers with nuclear power. Meta's AI chief mentioned building data centers next to nuclear power plants, suggesting a move to secure stable, high-capacity energy sources for AI operations. This comes alongside Microsoft's power purchase agreement for nuclear energy, signaling a broader industry move toward nuclear power for sustainability and capacity reasons.

  2. OpenAI DevDay.

    Realtime API: This new feature allows for nearly real-time, speech-to-speech interactions, providing developers with the tools to create seamless voice-based applications. This API supports low-latency multimodal experiences, making voice interactions more natural and accessible.
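
    Below is a minimal sketch of talking to the Realtime API over a raw WebSocket. The endpoint, model name, and event names follow the DevDay announcement (gpt-4o-realtime-preview with a beta header), but treat them as assumptions and check the current documentation before relying on them.

```python
# Minimal Realtime API sketch: send a text prompt, stream back the response.
# Endpoint, headers, and event names follow the DevDay-era beta docs and may
# have changed; this is an illustration, not a reference client.
import asyncio
import json
import os

import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def main():
    # Note: older websockets releases call this keyword extra_headers.
    async with websockets.connect(URL, additional_headers=HEADERS) as ws:
        # Add a user message to the conversation, then ask for a response.
        await ws.send(json.dumps({
            "type": "conversation.item.create",
            "item": {
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": "Say hello in one sentence."}],
            },
        }))
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {"modalities": ["text"]},  # text only, to keep the sketch simple
        }))

        # Server events stream back; print text deltas until the response finishes.
        async for raw in ws:
            event = json.loads(raw)
            if event["type"] == "response.text.delta":
                print(event["delta"], end="", flush=True)
            elif event["type"] == "response.done":
                break

asyncio.run(main())
```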

    Vision Fine-Tuning: OpenAI announced vision fine-tuning, which lets developers include images in their fine-tuning data to improve a model's results on visual tasks.
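
    A vision fine-tuning training example looks like the chat-format JSONL already used for text fine-tuning, with an image_url content part added. The field names below mirror the announcement and are best treated as assumptions; the image URL and labels are made up for illustration.

```python
# Sketch of a single vision fine-tuning example written to a JSONL training
# file. The messages/content structure mirrors the chat fine-tuning format;
# the URL and label are placeholders.
import json

example = {
    "messages": [
        {"role": "system", "content": "Classify the traffic sign in the image."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What sign is this?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/sign_042.jpg"}},
            ],
        },
        {"role": "assistant", "content": "Stop sign"},
    ]
}

# One JSON object per line in the training file.
with open("vision_train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")
```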

    Prompt Caching: OpenAI announced prompt caching, which makes building apps noticeably cheaper by automatically applying a 50% discount to input tokens the model has recently processed.
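
    The practical implication is prompt structure: keep the long, static part of the prompt (system instructions, examples) identical at the front of every request and put the variable part last, so the shared prefix can be served from cache. A minimal sketch, assuming the standard openai Python client; the caching itself is automatic once the shared prefix is long enough.

```python
# Sketch: put the long, reused prompt text first and the per-request text
# last, which is what lets the automatic prompt cache discount repeat calls.
from openai import OpenAI

client = OpenAI()

# Long, static instructions reused verbatim on every call (caching applies
# once the shared prefix is long enough, per the announcement).
STATIC_SYSTEM_PROMPT = "You are a support agent for ExampleCo. ..."

def answer(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": STATIC_SYSTEM_PROMPT},  # identical prefix each call
            {"role": "user", "content": question},                # only this part varies
        ],
    )
    # On calls after the first, part of the prefix may be billed at the cached rate;
    # the usage object reports how many input tokens were involved.
    print(resp.usage)
    return resp.choices[0].message.content

print(answer("How do I reset my password?"))
print(answer("What is your refund policy?"))
```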

    Model Distillation: This process allows developers to create smaller, more cost-effective models by distilling the knowledge from larger, more complex models like GPT-4o into smaller versions like GPT-4o mini. This makes deploying AI in resource-constrained environments more feasible without significantly compromising on capabilities.
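
    A sketch of that distillation workflow: capture outputs from the larger "teacher" model, then fine-tune the smaller "student" model on them. The store flag and fine-tuning calls mirror the DevDay description, but parameter and model names should be checked against the current API reference; the questions and file name here are placeholders.

```python
# Sketch of the distillation workflow: store teacher completions, then use
# them as training data for a smaller student model.
from openai import OpenAI

client = OpenAI()

# 1) Generate and store "teacher" completions from the larger model.
for question in ["Summarize the causes of WWI.", "Explain beta decay simply."]:
    client.chat.completions.create(
        model="gpt-4o",
        store=True,                              # keep the completion for later distillation
        metadata={"use_case": "distill-demo"},   # tag so it is easy to filter later
        messages=[{"role": "user", "content": question}],
    )

# 2) After exporting the stored completions to a JSONL training file
#    (e.g. via the dashboard), fine-tune the smaller "student" model on it.
training_file = client.files.create(
    file=open("distill_demo.jsonl", "rb"), purpose="fine-tune"
)
client.fine_tuning.jobs.create(
    model="gpt-4o-mini",            # a dated snapshot name may be required in practice
    training_file=training_file.id,
)
```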

    Cost Reductions: There's been a significant focus on making AI development more affordable. For instance, OpenAI announced a dramatic reduction in costs for using their APIs, which includes free training tokens for a limited time to encourage developers to experiment with the new features.

    Audio Input/Output in Chat Completions API: For developers not needing the real-time benefits of the Realtime API, OpenAI also enhanced the Chat Completions API with audio capabilities, allowing for voice interactions at a potentially lower cost.
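
    A sketch of audio output through the Chat Completions API rather than the Realtime API. The audio-capable model name and the modalities/audio parameters follow the DevDay description and are assumptions to verify against the current docs.

```python
# Sketch: request a spoken reply from the Chat Completions API and save it
# to a WAV file. Model name and audio parameters are taken from the DevDay
# announcement and may differ in the current API.
import base64

from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[{"role": "user", "content": "Give me a one-sentence weather joke."}],
)

# The spoken reply comes back base64-encoded alongside the text.
with open("reply.wav", "wb") as f:
    f.write(base64.b64decode(resp.choices[0].message.audio.data))
```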

  3. AI Research

    Agent Workflow Memory

    Despite the potential of language model-based agents to solve real-world tasks such as web navigation, current methods still struggle with long-horizon tasks with complex action trajectories. In contrast, humans can flexibly solve complex tasks by learning reusable task workflows from past experiences and using them to guide future actions. To build agents that can similarly benefit from this process, we introduce Agent Workflow Memory (AWM), a method for inducing commonly reused routines, i.e., workflows, and selectively providing workflows to the agent to guide subsequent generations. AWM flexibly applies to both offline and online scenarios, where agents induce workflows from training examples beforehand or from test queries on the fly. We experiment on two major web navigation benchmarks -- Mind2Web and WebArena -- that collectively cover 1000+ tasks from 200+ domains across travel, shopping, and social media, among others. AWM substantially improves the baseline results by 24.6% and 51.1% relative success rate on Mind2Web and WebArena while reducing the number of steps taken to solve WebArena tasks successfully. Furthermore, online AWM robustly generalizes in cross-task, website, and domain evaluations, surpassing baselines from 8.9 to 14.0 absolute points as train-test task distribution gaps widen.
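
    A conceptual sketch of the AWM loop described in the abstract: record successful trajectories, abstract them into reusable workflows, and inject those workflows into the agent's context on later tasks. This is an illustration of the idea, not the authors' implementation; in the paper the abstraction step is itself performed by a language model.

```python
# Conceptual sketch of Agent Workflow Memory: successful trajectories are
# turned into reusable workflows that get prepended to the agent's prompt.
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    task: str
    actions: list[str]
    success: bool

@dataclass
class WorkflowMemory:
    workflows: list[str] = field(default_factory=list)

    def induce(self, trajectory: Trajectory) -> None:
        """Turn a successful trajectory into a named, reusable routine."""
        if not trajectory.success:
            return
        # In the paper this abstraction is done by an LM; here we simply
        # record the action sequence as a workflow string.
        steps = " -> ".join(trajectory.actions)
        self.workflows.append(f"{trajectory.task}: {steps}")

    def as_prompt(self) -> str:
        """Workflows are injected into the agent's context to guide new tasks."""
        return "Known workflows:\n" + "\n".join(self.workflows)

memory = WorkflowMemory()
memory.induce(Trajectory(
    task="book a flight",
    actions=["open airline site", "enter origin/destination", "pick dates", "checkout"],
    success=True,
))
print(memory.as_prompt())  # prepended to the agent prompt for the next flight-booking task
```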

    Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

    When writing and talking, people sometimes pause to think. Although reasoning-focused works have often framed reasoning as a method of answering questions or completing agentic tasks, reasoning is implicit in almost all written text. For example, this applies to the steps not stated between the lines of a proof or to the theory of mind underlying a conversation. In the Self-Taught Reasoner (STaR, Zelikman et al. 2022), useful thinking is learned by inferring rationales from few-shot examples in question-answering and learning from those that lead to a correct answer. This is a highly constrained setting -- ideally, a language model could instead learn to infer unstated rationales in arbitrary text. We present Quiet-STaR, a generalization of STaR in which LMs learn to generate rationales at each token to explain future text, improving their predictions. We address key challenges, including 1) the computational cost of generating continuations, 2) the fact that the LM does not initially know how to generate or use internal thoughts, and 3) the need to predict beyond individual next tokens. To resolve these, we propose a tokenwise parallel sampling algorithm, using learnable tokens indicating a thought's start and end, and an extended teacher-forcing technique. Encouragingly, generated rationales disproportionately help model difficult-to-predict tokens and improve the LM's ability to directly answer difficult questions. In particular, after continued pretraining of an LM on a corpus of internet text with Quiet-STaR, we find zero-shot improvements on GSM8K (5.9%→10.9%) and CommonsenseQA (36.3%→47.2%) and observe a perplexity improvement of difficult tokens in natural text. Crucially, these improvements require no fine-tuning on these tasks. Quiet-STaR marks a step towards LMs that can learn to reason in a more general and scalable way.
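
    A toy sketch of the prediction-mixing piece of Quiet-STaR: the model's next-token prediction without a rationale is blended with its prediction after a sampled rationale, using a learned mixing weight. Everything below (dimensions, layers, random hidden states) is a placeholder to show the mechanism, not the paper's implementation.

```python
# Toy illustration of a learned mixing head that blends the base next-token
# prediction with the prediction made after a generated "thought".
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, d = 100, 32
lm_head = nn.Linear(d, vocab)   # shared output head (stand-in for the LM's head)
mixing_head = nn.Linear(d, 1)   # predicts how much to trust the post-thought prediction

def mixed_logits(h_base, h_after_thought):
    logits_base = lm_head(h_base)                     # prediction without the rationale
    logits_thought = lm_head(h_after_thought)         # prediction after the rationale
    w = torch.sigmoid(mixing_head(h_after_thought))   # per-position weight in (0, 1)
    return w * logits_thought + (1 - w) * logits_base

# Toy batch: two positions with random hidden states and random target tokens.
h_base = torch.randn(2, d)
h_thought = torch.randn(2, d)
targets = torch.randint(0, vocab, (2,))

loss = F.cross_entropy(mixed_logits(h_base, h_thought), targets)
loss.backward()  # in Quiet-STaR this gradient (plus a REINFORCE-style term on the
                 # sampled rationale tokens) teaches the model when thinking helps
print(loss.item())
```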
