Last night Tesla delivered FSD Beta 9.2 to the small group (around 2,000) of users in the Early Access Program. While it was a day late against the 2-week timetable set by Musk, it does reinforce the expectation that we will see point releases each fortnight.
As FSD Beta users began posting about version 2021.12.25.15, it became clear the release notes presented in the car were essentially a repeat of the previous build. What the public release notes don't cover is an extensive list of what changed behind the scenes: improvements to FSD's decision making and vehicle control.
Thankfully, today Musk provided the internal release notes for FSD Beta 9.2, which revealed some extensive changes, while also suggesting additional improvements are coming in future releases.
What’s most surprising here is just how extensive the architectural changes are. After months of delay between v8 and v9, I think most FSD owners had hoped the rewrites, like removing radar and moving to vision-only, were behind us and we’d simply start spinning the data engine to resolve edge cases.
Here's the tweet from Elon detailing the changes, and while I'd love to see this level of detail in the public release notes, I can appreciate that most of the general public would simply be confused by it. Perhaps an advanced option in the release notes section could show the technical information on what's changed and help beta testers understand how their cars have changed. As we head towards a public beta, this may become more important.
While there are just 7 bullet points here, there is an awful lot to unpack in this tweet, so let’s break it down.
1. Clear-to-go boost through turns on minor-to-major roads (plan to expand to all roads in V9.3).
This is the first time we’ve heard Tesla use the ‘Clear-to-go boost’ phrase. This refers to the situations where the car makes a turn, using computer vision to identify there is a safe gap in traffic. The car proceeds along the planned path on to a section of road with a higher speed limit.
As you leave the minor road and enter the major road, we as humans would normally accelerate (or boost) to meet the speed limit or the flow of traffic. This boost also has advantages, like not impeding the approaching cars behind you after completing the turn.
Musk says the next release v9.3 of FSD Beta, which is due in 2 weeks, will apply this to all roads.
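As a rough illustration of what a clear-to-go boost policy might look like, here's a minimal sketch. All function names, thresholds and values are made up for this example, not taken from Tesla's code:

```python
# Hypothetical sketch of a "clear-to-go boost": once the turn is complete and
# vision confirms a safe gap, accelerate more assertively to reach the new
# road's speed limit. All names and thresholds here are illustrative.

def target_acceleration(current_speed_kmh: float,
                        new_limit_kmh: float,
                        gap_is_clear: bool,
                        turn_complete: bool) -> float:
    """Return a target acceleration in m/s^2 (illustrative values)."""
    NORMAL_ACCEL = 1.5   # gentle default acceleration
    BOOST_ACCEL = 3.0    # assertive "boost" after a clear turn
    if current_speed_kmh >= new_limit_kmh:
        return 0.0       # already at the limit, hold speed
    if gap_is_clear and turn_complete:
        return BOOST_ACCEL
    return NORMAL_ACCEL

print(target_acceleration(30, 80, gap_is_clear=True, turn_complete=True))   # 3.0
print(target_acceleration(30, 80, gap_is_clear=False, turn_complete=True))  # 1.5
```

The point of the boost case is the difference between the two calls: same speeds, but a confirmed clear gap justifies closing the speed gap faster.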
2. Improved peek behavior where we are smarter about when to go around the lead vehicle by reasoning about the causes for lead vehicle being slow.
Something unique about the FSD Beta compared to the public build of FSD is that the car's path planning uses driveable space, with a priority on lane-keeping and road laws. This means there are scenarios where the car will cross the lane lines to overtake vehicles stopped in the lane, where it is safe to do so.
After watching a number of the FSD Beta 9.1 videos on YouTube, it was clear there were times the car attempted to do this when it should have paused and waited instead.
This improvement to ‘peek behaviour’ suggests the car will now be smarter regarding the decision to go around vehicles ahead. Often we use the cues from the vehicle ahead (or lead vehicle) to guide our own driving.
If a car stops in front of us, it could be due to a number of factors. If a car is double-parked with its hazard lights on, it's clear that car is unlikely to move anytime soon and we would drive around it. If the car ahead crosses the lane line to avoid a car turning to park, chances are we should do the same. The same is true if the car ahead is avoiding an accident or road works. There's a lot we can learn from the behaviour of the car ahead, and it's reasonable that most of the time we should follow their actions.
The goal here is to accurately assess the environment ahead and predict the right course of action to take. Stopping and waiting behind a car that isn't going to move is clearly wrong and would leave you frustrated as seconds turn into minutes and drivers behind you go around you, so this is a fine balance to strike.
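To make the reasoning above concrete, here is a toy decision sketch. The cause categories and rules are my own assumptions for illustration, not Tesla's actual logic:

```python
# Illustrative "peek" decision: whether to go around a stopped lead vehicle
# depends on the inferred *cause* of it being stopped. Categories and rules
# below are assumptions for this sketch, not Tesla's implementation.

def should_go_around(lead_cause: str, oncoming_lane_clear: bool) -> bool:
    # Causes where the lead vehicle is unlikely to move soon
    static_causes = {"double_parked", "hazard_lights", "broken_down", "road_works"}
    # Causes where waiting is the correct behaviour
    transient_causes = {"traffic_queue", "red_light", "pedestrian_crossing"}
    if lead_cause in transient_causes:
        return False
    if lead_cause in static_causes:
        return oncoming_lane_clear  # only cross the line when it's safe
    return False  # unknown cause: be conservative and wait

print(should_go_around("double_parked", oncoming_lane_clear=True))  # True
print(should_go_around("red_light", oncoming_lane_clear=True))      # False
```

The hard part, of course, is not this lookup but correctly classifying *why* the lead vehicle is stopped from camera input alone.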
3. v1 of the Multi-modal prediction for where other vehicles expected to drive. This is only partially consumed for now.
Multi-modal trajectory prediction is an understanding of the state of surrounding objects and possible states in the future. This means calculating potential trajectories for other vehicles, pedestrians, cyclists, animals and more, which serve as inputs for the car to make decisions about its path through the scenario and if the brakes need to be applied.
The second sentence, about it only being partially consumed for now, suggests that Tesla will consume more of these potential scenarios in the future. If the car is able to accurately predict when a pedestrian is going to step out in front of you, or when the driver of a recently stopped car is about to open their door into our path, then the Tesla can safely maneuver around it, keeping everyone safe.
The challenge in complex environments is how to be efficient with the compute necessary for these predictions. This is much more intensive than simply knowing where you are now and where everything else is around you; it requires the model to predict what will happen in the future and, importantly, to learn when the prediction is wrong.
Take the example of approaching an intersection with oncoming traffic, where you wish to turn across that traffic. One scenario is that the oncoming car continues straight, another is that it also turns, and a third is that it stops and does nothing. If we run these three scenarios to find the most probable outcome, based on millions of other scenarios just like it, we can make a great guess at the rate of deceleration we need to apply so the braking feels natural and we complete the turn without encountering the other car. It's complex for computers, but for us, after years of driving, it becomes automatic.
4. New Lanes network with 50k more clips (almost double) from the new auto-labeling pipeline
Auto-labeling is a huge step forward from the human-labelled images that Tesla has used in the past to train their Neural Nets. Of course, Tesla doesn’t just have one Neural Net, they have many, a growing list, each focused on a different part of the autonomous driving challenge.
The concept of auto-labeling is something Tesla's head of AI, Andrej Karpathy, has spoken about before, and it is designed to dramatically speed up the process of training.
You can think of this like a question and an answer. If we find a piece of footage from a drive where an incident happens, we can go a few seconds past that point and find the answer. We then take the preceding few seconds of footage (the question) and keep playing the model different question-and-answer combinations around similar scenarios. Instead of humans detailing the answer (i.e. where the corner is), the later frames in the video answer the question of 'what should I do here?'.
While in some instances drivers may do the wrong thing, the vast majority will take the correct action in any given scenario, so by feeding the NN lots of examples of the same event, it is possible to figure out, in an automated way, the correct action that should be taken.
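The question-and-answer split described above can be sketched as a simple function. The structure is my assumption of the general idea, not Tesla's pipeline:

```python
# Minimal sketch of the auto-labeling idea: for each clip, the frames *before*
# an event form the question, and what actually happened a few seconds *later*
# becomes the label - no human annotation required. Structure is illustrative.

def auto_label(clip: list, event_index: int, horizon: int = 30):
    """Split a clip into (question, answer) around an event.

    clip: a list of frames (any per-frame representation)
    event_index: the frame where the interesting moment occurs
    horizon: how many frames of past/future to use on each side
    """
    question = clip[max(0, event_index - horizon):event_index]
    answer = clip[event_index:event_index + horizon]
    return question, answer

clip = list(range(100))              # stand-in for 100 video frames
q, a = auto_label(clip, event_index=50, horizon=10)
print(len(q), len(a))                # 10 10
```

Run over millions of fleet clips, pairs like these become training examples where the "label" was produced by the drive itself.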
In theory, the more data, the better the resulting model (informed by the training) that gets pushed to the FSD Beta release. As Tesla moves towards processing video rather than individual frames, each situation gains more context, as does surround video (using all 8 cameras, rather than just 1).
The reference to the 'New Lanes' network using almost twice as many clips, some 50,000 more, suggests that Tesla is now able to scale out the Neural Net(s) in their data engine, running more pipelines simultaneously.
This near-doubling of the clip count signals a step-change in compute capability, potentially a result of their Dojo supercomputer coming online, the detail of which should arrive at AI Day, scheduled for August 19th (US time).
5. New VRU velocity model with 12% improvement to velocity and better VRU clear-to-go performance. This is the first model trained with “Quantization-Aware-Training”, an improved technique to mitigate int8 quantization.
Vulnerable road users (VRUs) are road users not in a car, bus or truck, generally considered to include pedestrians, bike riders and motorbike riders. When Tesla talks about a 12% improvement to velocity, this likely refers to how accurately the car estimates the speed and direction of these objects over time. As we discussed earlier, to understand the objects in the scene, particularly ones like VRUs whose trajectories may intersect with ours, tracking them more accurately is always better.
When it comes to 'clear-to-go', the suggestion is that if a pedestrian is crossing in front of our car, the path is not clear and we need to brake. Once they have crossed, it's clear to accelerate again, but exactly when is the important part.
If FSD can more accurately predict the trajectory of the person (or people) ahead, then it could better determine not only when it's clear to go (i.e. they are on the sidewalk), but also use our distance from the crossing to apply a smoother input as we resume our path through it. In short, 'clear-to-go' should leave enough room to safely allow the VRU to pass, but not overdo it; again, there's a fine balance here.
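A toy version of that clear-to-go check might look like the following. All parameter names, distances and the 2-second lookahead are assumptions for this sketch:

```python
# Hypothetical VRU "clear-to-go" check: resume only when the pedestrian is
# predicted to be clear of our lane, with a margin, within a short lookahead.
# All names, thresholds and the simple linear prediction are illustrative.

def clear_to_go(ped_offset_m: float, ped_speed_ms: float,
                lookahead_s: float = 2.0, lane_half_width_m: float = 2.0,
                margin_m: float = 1.0) -> bool:
    """True if the pedestrian is predicted to be clear of our lane.

    ped_offset_m: lateral distance from our lane centre
    ped_speed_ms: lateral speed (positive = moving away from our path)
    """
    predicted = ped_offset_m + ped_speed_ms * lookahead_s
    return predicted >= lane_half_width_m + margin_m

print(clear_to_go(ped_offset_m=2.5, ped_speed_ms=0.5))  # True  (past us, moving away)
print(clear_to_go(ped_offset_m=0.0, ped_speed_ms=1.0))  # False (still in our path)
```

The margin term is the "enough room but not overdoing it" trade-off from the paragraph above: too small feels aggressive, too large feels hesitant.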
Many state-of-the-art deep-learning models are too large and too slow to run on embedded hardware, and Tesla's is a great example of that challenge. As powerful as their HW3 platform is, it still has constraints, including power, storage, memory, and processor speed.
Quantization reduces model size by storing model parameters and performing computations with 8-bit integers instead of 32-bit floating-point numbers (hence the int8 reference). This improves performance, but has the downside of introducing errors in computation that reduce model accuracy, and these errors accumulate with every operation needed to calculate the final answer. Quantization-Aware Training simulates the effects of int8 quantization during training, so the model learns to compensate and the loss of accuracy is mitigated.
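A minimal illustration of the int8 idea (independent of Tesla's stack): map float weights to 8-bit integers with a scale factor, then convert back and observe the rounding error that QAT is designed to compensate for.

```python
# Toy symmetric int8 quantization: floats -> 8-bit ints via a scale factor,
# then back again. The residual error per weight is at most scale/2, but it
# compounds across the many operations in a deep network.

def quantize_int8(values, scale):
    return [max(-128, min(127, round(v / scale))) for v in values]

def dequantize(qvalues, scale):
    return [q * scale for q in qvalues]

weights = [0.121, -0.057, 0.204, -0.388]       # example float32 weights
scale = max(abs(w) for w in weights) / 127     # symmetric per-tensor scale

q = quantize_int8(weights, scale)
restored = dequantize(q, scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]
print(q)            # each weight now fits in a single byte
print(max(errors))  # small, but non-zero - and it accumulates layer by layer
```

QAT's trick is to apply this round-trip during training, so gradient descent learns weights that still work well after being snapped to the int8 grid.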
If you're interested in further info, there's a great video from TensorFlow Software Engineer Pulkit Bhuwalka on quantization-aware training.
6. Enabled Inter-SoC synchronous compute scheduling between vision and vector space processes. Planner in the loop is happening in v10.
Tesla's Full Self-Driving computer, also known as HW3, features dual System-on-a-Chip (SoC) processors for redundancy. If one fails, the other has enough smarts to operate the vehicle, but while both are working fine, it makes sense that Tesla would leverage the performance, or 'synchronous compute', of both chips.
In this point, Tesla explains that in FSD Beta 9.2 they have enabled workloads to be computed across both SoCs. The best way to describe this is the process by which all the objects around the car in the vision stack (read: camera inputs) are converted into vector space. While we see the pretty version of that represented in the graphics on the screen, the car sees 3D objects with bounding boxes.
The reference to the Planner being in the loop at v10 likely relates to generating path plans, rather than planning routes. Once the vector space is established and trajectories are understood, planning your path through that environment is the next logical step. Distributing these potential path plans across the SoCs is likely to achieve some performance benefits.
7. Shadow mode for new crossing/merging targets network which will help improve VRU control
Finally, we come to Shadow Mode. This concept was first shown off during Autonomy Day, where Tesla explained they run simulations on your vehicle to imagine the 'what if' scenario: what would the next (unreleased) code branch do in the same situation as the production build? Is it a better or worse decision? If better, the fleet reports that the change is positive and it makes it into the next build; if not, more training is required.
This shadow mode was first demonstrated in relation to a potential lane-change, but it could really be applied to many of the decision-making processes that form part of an autonomous driving solution.
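Conceptually, shadow mode is simple to sketch: run the candidate network alongside the production one without giving it control, and log only the disagreements. The structure below is my assumption of the general pattern:

```python
# Conceptual shadow-mode sketch: the candidate policy runs alongside the
# production policy without controlling the car; only disagreements are
# logged for later review and training. Structure is illustrative.

def run_shadow_mode(scenarios, production_policy, shadow_policy):
    disagreements = []
    for scenario in scenarios:
        prod = production_policy(scenario)      # this decision drives the car
        shadow = shadow_policy(scenario)        # this one is only recorded
        if prod != shadow:
            disagreements.append((scenario, prod, shadow))
    return disagreements

# Toy policies: production brakes for any object; the candidate only brakes
# for close ones.
production = lambda s: "brake" if s["object_ahead"] else "proceed"
shadow = lambda s: "brake" if s["object_ahead"] and s["distance_m"] < 30 else "proceed"

scenarios = [
    {"object_ahead": True,  "distance_m": 10},
    {"object_ahead": True,  "distance_m": 80},
    {"object_ahead": False, "distance_m": 0},
]
print(len(run_shadow_mode(scenarios, production, shadow)))  # 1
```

At fleet scale, those logged disagreements are exactly the interesting clips: either the new network found a better decision, or it exposed a case needing more training.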
The last point suggests Tesla is now using Shadow Mode for a new network that deals with crossing/merging targets, aiming to improve VRU control. For the crossing or merging part of this, imagine an incredibly busy intersection, maybe even a 4-way crossing where pedestrians can cross diagonally, as they can in NSW.
Once the light changes, the car needs to determine if the path is clear, but if pedestrians cross behind each other or behind objects, they are obscured. That creates a problem for tracking their path, and therefore for predicting where they'll emerge. With 9.2 it seems Tesla is using shadow mode to test whether they get a better result for tracking Vulnerable Road Users.
OK, so with all that detail on this release, it again raises the question of a broader public beta release and when that could occur. Musk was asked what happens next with the build numbers, as he had previously suggested a wide release could be on the cards for V10, but definitely by V11.
Musk says that 9.3 will be the next FSD Beta build (and, as we know, that should be 2 weeks away). This would be followed by 9.4 another 2 weeks later, or around a month from now, placing us at mid-September.
After that, Musk suggests Version 10 is a maybe, but prefaces that with the statement 'Significant architecture changes in 10'. That last line may put a damper on expectations (previously set by Musk) that we'd see a wider release with V10, as significant changes would certainly need to be tested by the early release group first.
This places FSD Beta v10 on a timeline for delivery around the end of September. As for what this big architectural rewrite could be, Musk has suggested that Navigate on Autopilot and Smart Summon also need to move over to the new vision-only approach used by the City Streets driving of FSD Beta.
I think we learnt a lot today about the efforts being made towards autonomous Teslas, but it really shows that we're likely still some time away from FSD being feature-complete, as predicted, by the end of 2021.
We are now seeing FSD Beta 9.2 videos and the build genuinely looks more confident in its ability. While each video shows different locations, the testers now appear to experience routes they would take during daily drives with a greater than zero chance of a zero-intervention drive.