When Elon Musk unveiled the Tesla Bot, he said a humanoid robot will be useful if it can navigate through the world without being explicitly trained, without explicit line-by-line instructions. You should be able to talk to it with phrases like ‘please pick up that bolt and attach it to the car with that wrench’ and have it do exactly that.
He went on to say that it should be able to understand ‘please go to the store and get me the following groceries’, adding, ‘I think we can do that’.
This challenge is immense, and I think it’s fun to break down exactly what’s required to enable something like that.
First, there’s voice recognition. The Bot needs a microphone, and if you want it to be useful, a far-field microphone array similar to what’s found in Amazon Alexa or Google Assistant devices, which lets you yell across the room, even with the TV on, and still be understood with a high level of accuracy.
Once the Bot receives the command, it’ll hopefully process it locally; if not, it will need a WiFi or cellular connection (probably 5G) to pull additional information from the internet.
If we take the request in chunks, we first need to deal with the ‘go to the store’ portion. OK, which store? To answer this, you either need your store preference stored in your Tesla account or, better yet, the Bot would know where it is geographically (not hard; the car already reports its current location to the mobile app). That location comes from GPS, so there’s another chip not mentioned in the slides, but absolutely necessary.
Once the Bot knows its current address, the voice-to-text parser can chunk out ‘store’ and combine it with the subsequent product requests to understand we’re talking about products from a grocery store. ‘Grocery store’ could then be queried against a Map API to return the list of stores within, say, a 5 km radius. If there’s no personal-preference data point, the closest store would be chosen.
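To make that concrete, here’s a rough sketch of the preference-then-proximity logic. Everything here is an assumption: the function names, the data shape, and the idea that the Map API hands back store coordinates.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    r = 6371.0  # mean Earth radius (km)
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def choose_store(bot_lat, bot_lon, stores, preferred=None, radius_km=5.0):
    """Pick the preferred store if one is on file, otherwise the closest
    store within radius_km; None if nothing is in range."""
    if preferred is not None:
        for store in stores:
            if store["name"] == preferred:
                return store
    distances = [
        (haversine_km(bot_lat, bot_lon, s["lat"], s["lon"]), s)
        for s in stores
    ]
    in_range = [(d, s) for d, s in distances if d <= radius_km]
    return min(in_range, key=lambda t: t[0], default=(None, None))[1]
```

The real system would also need opening hours, stock data and walking (not driving) distance, but the shape of the fallback would look much the same.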
With the current and destination locations known, a route can be planned between the two and the Bot can begin its journey. The routes Tesla uses in its cars are naturally for cars to drive along, so the Bot would need a very different version: one that still leverages distances and turns, but follows sidewalks. Those sidewalks could be uneven, uphill or downhill, wet and slippery, or full of obstacles to avoid (people, prams, bikes, scooters and the like), so this challenge alone is not an easy one.
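Under the hood, pedestrian routing is typically a shortest-path search over a graph of sidewalk waypoints, with edge costs inflated for crossings, slopes or stairs. A minimal sketch using Dijkstra’s algorithm; the waypoint names and costs below are entirely made up.

```python
import heapq

def shortest_walk(graph, start, goal):
    """Dijkstra over a sidewalk graph. Nodes are waypoints; each edge weight
    is a walking cost (distance, optionally inflated for crossings or slopes).
    Returns the cheapest waypoint path, or None if the goal is unreachable."""
    dist = {start: 0.0}
    prev = {}
    pq = [(0.0, start)]
    while pq:
        d, node = heapq.heappop(pq)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nbr, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(pq, (nd, nbr))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return list(reversed(path))
```

The hard part, of course, isn’t the search; it’s building and maintaining that sidewalk graph for every neighbourhood the Bot might walk through.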
When it comes to crossing roads, things get tricky. The Bot would need to understand traffic lights, signs and crosswalks, and observe and adapt to any approaching vehicles. This opens up a wild number of ways for things to go wrong, and again raises the uncomfortable issue of regional differences and how long the Bot will take to make its way outside the US, given FSD Beta still hasn’t.
Imagine Tesla solves all those challenges and the Bot makes its way to the supermarket. Entering the store isn’t exactly easy. The store has opening hours for a start, so presumably the Bot checked those online before leaving home. Many stores have automatic doors that open as you approach, stay open for a short period, then close again. Plenty of others require you to open the door manually, using a variety of handles, so that’ll be an interesting challenge and an important core competency. Training on multiple handle types should be fairly easy with simulation.
Imagine the Bot makes its way into the store, looking for a list of specific products. Some stores publish an aisle reference for where each product is located. The best chance of success is at a store that does this: the Bot can look up the item online, get the aisle number, then recognise that aisle number in-store using computer vision.
From here the challenge gets really difficult. At our supermarkets, there are often signs hanging from the ends of the aisles to indicate broad product categories. If the Bot knew a product’s category, say that honey lives in the Spreads section, it could walk to that sign and start scanning the shelves top to bottom for products matching the label design of the product online.
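Tesla hasn’t said how this matching would actually work, and a real system would compare label imagery, not text. As a toy stand-in, though, if the vision stack OCR’d the shelf labels, matching them against the shopping list could look like this (the function name and approach are my own):

```python
import difflib

def match_shelf_labels(shopping_list, shelf_labels, cutoff=0.6):
    """Fuzzy-match OCR'd shelf-label text against the shopping list,
    tolerating the odd OCR misread. Returns {wanted item: matched label}."""
    found = {}
    lowered = [label.lower() for label in shelf_labels]
    for item in shopping_list:
        hits = difflib.get_close_matches(item.lower(), lowered, n=1, cutoff=cutoff)
        if hits:
            found[item] = hits[0]
    return found
```

Swap the string similarity for a vision model scoring label artwork and the shape stays the same: take the closest match above a confidence threshold, otherwise keep scanning.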
You can see how difficult this challenge is already, but then comes the hardest part of all. The Tesla Bot needs to position its feet correctly, adjust its weight distribution appropriately, then either squat to reach products on the lower shelves or stretch upwards and extend its arms to reach products on the higher ones. It then needs to rotate its wrist, open its fingers, and grasp the object with enough force to hold it without squashing and damaging it. At no point can the product slip and fall to the ground. The product then needs to be placed in a basket or trolley, which the Bot should have grabbed on the way in. Seriously, the realities of executing on this are mental.
Now, unless you’re shopping at an Amazon Go (Tesla Bots likely banned by Bezos), you’ll need to pay for these goods. The fastest way for a human with a handful of items is the express lane: scan each item, then pay. The last time I looked, the Bot didn’t have pockets, so carrying your card won’t work; maybe tap-and-pay built into its hand? The other option is a human-staffed checkout, and, well, that’d be like serving the person that’s about to take your job.
Despite the complexity involved, if you break the problem down into its core components, it could be possible with a humanoid robot. The question is: how long will it take?
A Tesla Bot Skill Marketplace?
A Bot will be at its dumbest on the day you buy it. Over time, it will need to learn and become smarter and more capable, but how that new capability is offered is really interesting.
There are a number of approaches to having the fleet of Tesla Bots learn from each other.
Option 1 – Tesla walled garden of Optimus
One option Tesla could take is a Tesla-controlled environment. This would be centrally managed where each week or each month, Tesla engineers train the Bot on one or more new skills. These new skills are then released to the fleet via an over-the-air update and customers can optionally enable the skill and a schedule if they wish.
If the Tesla Bot’s ability to learn is predicated on advanced, time-intensive processes like motion-capture suits, then every new skill requires an expert to develop it.
This functionality may not be free: each skillset could come at a cost, and you’d enable it if it represents value to you.
This benefits Tesla by building out its services revenue while helping lower the up-front cost of the Bot, making it appeal to more customers. The economics of this business model are great, as they often are in software: the R&D to build a skill occurs once, it’s tested, then released to thousands of paying customers.
Given we’ve seen Tesla introduce subscription pricing for their FSD software in the US, potentially we could see per-skill subscriptions, helping again to lower up-front cost, while providing flexibility to the customer, particularly for skills used one month out of the year.
Option 2 – Customers do the training, sell on Skill Marketplace
If I buy a Tesla Bot and can teach it a new skill by saying ‘watch me’ and the Bot simply observes and is able to then repeat the activity, that’s transformative. This would give rise to a rapid escalation in skill development for the bot.
If customers can train the bot to perform new skills, these could then be sold on a Skill Marketplace. This marketplace could be similar to the Apple AppStore model, where the developer takes 70% and the platform (in this case Tesla) takes 30%.
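Assuming that 70/30 split, the payout arithmetic is simple (the function and the rounding to cents are mine, not anything Tesla has announced):

```python
def marketplace_payout(price, sales, platform_cut=0.30):
    """Split gross skill revenue between the skill's trainer and the
    platform, rounding to cents. Defaults to an AppStore-style 30% cut."""
    gross = price * sales
    platform = round(gross * platform_cut, 2)
    developer = round(gross - platform, 2)
    return developer, platform
```

A skill priced at $10 that sells a thousand copies would net the trainer $7,000 and Tesla $3,000, with zero marginal cost of distribution.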
The value and price of each skill would then likely be determined by the free market: those with plenty of time and ability could train the Bot to do a range of useful tasks that are subsequently available for purchase by the worldwide fleet of Bots.
These tasks could relate to sorting items by colour, or constructing Ikea furniture, or even simple dance moves for some entertainment.
I’m looking forward to learning more about the training interface with the Bot, which will ultimately determine how useful it is and how much Tesla can charge for the product.
With option 2, there would definitely need to be a review process to ensure a skill does what it says before it’s published and made available. The last thing you’d want is for someone to upload a skill called ‘Paper, Rock, Scissors’ that actually makes your robot do something completely different.
I think it’s safe to assume there’ll always be a stop command, either verbal or through the app, and that Tesla would ship base-level instructions to prevent the Bot from injuring itself.