Future versions of Tesla FSD will likely merge with XAI Grok. While Elon Musk has expressed concerns about building too much of his new AI ideas into Tesla unless he has 25% voting control, the need to make a competitive AI will force Tesla FSD to work with XAI Grok. This can be structured in a way where XAI retains more of the higher AI functions.
Large multimodal models and virtual agents are the future of large language models.
FSD v13 will be an LMM (Large Multimodal Model) or have one integrated deeply into it. John Gibb and Jim Fan describe how Grok and FSD will merge.
Grok 1.5+ has clearly been trained on FSD data. This won't be a bolt-on situation where an LLM simply talks to the passenger. Instead, the model will be deeply integrated into the processing pipeline itself. There will be chain-of-thought reasoning, and long context windows will let LMMs retain conversations.
John thinks Grok will be able to form memories of a sort about local driving conditions and driver profiles, so the car can adapt to suit every driver. FSD should personalize for each locality and each driver through a straightforward inference-time conversation.
FSD v13 will likely have an LMM (Large Multimodal Model) integrated deeply into it--and yes, almost definitely this will be Grok 1.5+ as it is clearly being trained on FSD data. This won't be a bolt-on…
Dr Know it All- John Gibb
Jim Fan said the following:
Tesla FSD v13 will likely be grokking language tokens. What excites me the most about Grok-1.5V is the potential to solve edge cases in self-driving. Using language for "chain of thought" will help the car break down a complex scenario, reason with rules and counterfactuals, and explain its decisions. What Grok-1.5V can help is to lift pixel->action mapping to pixel->language->action instead.
With @Tesla_AI's highly mature data pipeline, it is not hard to label tons of edge cases with high-quality human explanation traces, and finetune Grok to be far better than GPT-4V and Gemini for multimodal FSD reasoning.
There were previous efforts on similar ideas, such as LINGO-1 (Wayve). But Tesla is spinning an unparalleled data flywheel that could scale far beyond.
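To make the pixel->language->action idea concrete, here is a toy sketch in Python. It is purely illustrative and not how Tesla or xAI implement anything: the `Scene` class, the function names, and the hand-written rules are all hypothetical stand-ins for what would really be a vision model and a fine-tuned multimodal model. The point is only the shape of the pipeline, where a text description and a reasoning trace sit between perception and the driving command.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Scene:
    """Hypothetical stand-in for what a vision model extracts from camera frames."""
    objects: list[str]
    speed_limit_mph: int


def describe(scene: Scene) -> str:
    """pixel -> language: turn the perceived scene into a text description."""
    seen = ", ".join(scene.objects)
    return f"I see {seen}; the limit is {scene.speed_limit_mph} mph."


def reason(description: str) -> list[str]:
    """language -> language: hand-written rules standing in for chain of thought."""
    steps = []
    if "school bus with stop sign out" in description:
        steps.append("A stopped school bus with its sign out means children may cross.")
        steps.append("Traffic rules require stopping until the sign retracts.")
    if "ball rolling into road" in description:
        steps.append("A ball in the road is often followed by a child.")
    if not steps:
        steps.append("Nothing unusual; continue at the limit.")
    return steps


def act(steps: list[str]) -> str:
    """language -> action: map the reasoning trace to a driving command."""
    trace = " ".join(steps)
    if "require stopping" in trace:
        return "STOP"
    if "followed by a child" in trace:
        return "SLOW"
    return "CRUISE"


def drive(scene: Scene) -> tuple[str, list[str]]:
    """Run the full pipeline and return the action with its explanation."""
    steps = reason(describe(scene))
    return act(steps), steps


if __name__ == "__main__":
    action, why = drive(Scene(["school bus with stop sign out"], 25))
    print(action)
    for step in why:
        print(" -", step)
```

Because `drive` returns the reasoning steps along with the action, the car's decision comes with an explanation attached, which is the benefit Jim Fan highlights. A direct pixel->action mapping would return only the command.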
Nextbigfuture has a video describing the capabilities of XAI Grok and what it means for Tesla FSD.
Large Multimodal Models
There is a clear trend toward multimodality in large language models. They increasingly have to handle images and video.
Tesla FSD and Grok 1.5V both focus on video and image processing: understanding the video input, reasoning about it and deriving actions from it.
Here is a research paper surveying all of the large multimodal models (LMM).