Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated:
1
00:00:11,280 --> 00:00:14,360
Heat. Heat.
2
00:00:27,170 --> 00:00:30,319
[Music]
3
00:00:37,860 --> 00:00:40,079
[Music]
4
00:00:40,079 --> 00:00:41,240
What is
5
00:00:41,240 --> 00:00:48,160
[Music]
6
00:00:48,160 --> 00:00:49,830
this?
7
00:00:49,830 --> 00:00:54,140
[Music]
8
00:01:00,719 --> 00:01:03,719
out.
9
00:01:05,760 --> 00:01:08,869
[Applause]
10
00:01:10,799 --> 00:01:13,960
Take care.
11
00:01:17,960 --> 00:01:21,060
[Music]
12
00:01:23,840 --> 00:01:26,840
Yeah.
13
00:01:27,860 --> 00:01:30,930
[Music]
14
00:01:35,260 --> 00:01:43,249
[Music]
15
00:01:45,840 --> 00:01:48,840
Hey,
16
00:01:56,880 --> 00:01:58,820
hey, hey.
17
00:01:58,820 --> 00:02:09,019
[Music]
18
00:02:13,560 --> 00:02:18,629
[Music]
19
00:02:21,760 --> 00:02:24,160
You are feeling
20
00:02:24,160 --> 00:02:27,280
[Music]
21
00:02:27,280 --> 00:02:28,800
heat.
22
00:02:28,800 --> 00:02:31,800
Heat. Heat. N.
23
00:02:37,000 --> 00:02:56,099
[Music]
24
00:03:16,480 --> 00:03:20,080
[Music]
25
00:03:20,080 --> 00:03:23,159
Heat. Heat.
26
00:03:26,370 --> 00:03:42,669
[Music]
27
00:03:45,280 --> 00:03:48,360
Heat. Heat.
28
00:03:51,280 --> 00:03:54,280
Heat.
29
00:03:55,290 --> 00:04:00,169
[Music]
30
00:04:04,440 --> 00:04:09,889
[Music]
31
00:04:12,410 --> 00:04:14,560
[Music]
32
00:04:14,560 --> 00:04:15,700
Hey, heat. Hey, heat.
33
00:04:15,700 --> 00:04:24,699
[Music]
34
00:04:27,290 --> 00:04:48,319
[Music]
35
00:04:54,800 --> 00:04:57,040
In a world where knowledge shapes
36
00:04:57,040 --> 00:05:00,160
destiny, one creation dares to redefine
37
00:05:00,160 --> 00:05:03,199
the future. From the minds at XAI,
38
00:05:03,199 --> 00:05:06,400
prepare for Gro 4. This summer, the next
39
00:05:06,400 --> 00:05:09,520
generation arrives faster, smarter,
40
00:05:09,520 --> 00:05:12,320
bolder. It sees beyond the horizon,
41
00:05:12,320 --> 00:05:14,560
answers the unasked, and challenges the
42
00:05:14,560 --> 00:05:18,000
impossible. Gro 4 Unleash the Truth
43
00:05:18,000 --> 00:05:21,560
coming this summer.
44
00:05:23,280 --> 00:05:25,759
All right, welcome to the Gro 4 release
45
00:05:25,759 --> 00:05:29,600
here. Um, this is uh the smartest AI in
46
00:05:29,600 --> 00:05:30,800
the world and we're going to show you
47
00:05:30,800 --> 00:05:35,199
exactly how and why. Um, and uh, it
48
00:05:35,199 --> 00:05:37,120
really is
49
00:05:37,120 --> 00:05:39,280
remarkable to see the advancement of
50
00:05:39,280 --> 00:05:41,520
artificial intelligence, how quickly it
51
00:05:41,520 --> 00:05:46,960
is uh, evolving. Um,
52
00:05:46,960 --> 00:05:50,479
I I sometimes think compare it to the
53
00:05:50,479 --> 00:05:52,400
growth of
54
00:05:52,400 --> 00:05:55,919
a human and how fast a human learns and
55
00:05:55,919 --> 00:05:57,840
gains conscious awareness and
56
00:05:57,840 --> 00:06:02,240
understanding and AI is advancing
57
00:06:02,240 --> 00:06:07,360
just vastly faster than any human. Um,
58
00:06:07,360 --> 00:06:09,360
I mean, we're going to take you through
59
00:06:09,360 --> 00:06:13,600
a bunch of benchmarks that um that Grock
60
00:06:13,600 --> 00:06:17,039
4 is able to uh achieve incredible
61
00:06:17,039 --> 00:06:19,759
numbers on. Um, but it's it's it's
62
00:06:19,759 --> 00:06:22,720
actually worth noting that like Grock 4
63
00:06:22,720 --> 00:06:26,240
if if given like the SAT would get
64
00:06:26,240 --> 00:06:28,160
perfect SATs every time, even if it's
65
00:06:28,160 --> 00:06:30,800
never seen the uh the questions before.
66
00:06:30,800 --> 00:06:34,400
Um and if even going beyond that to say
67
00:06:34,400 --> 00:06:37,039
like uh graduate student exams like the
68
00:06:37,039 --> 00:06:41,360
GRE uh it will get near-perfect results
69
00:06:41,360 --> 00:06:46,240
in in every discipline of uh of
70
00:06:46,240 --> 00:06:48,720
education. So from the humanities to
71
00:06:48,720 --> 00:06:50,960
like languages, math, physics,
72
00:06:50,960 --> 00:06:54,080
engineering, pick anything and we're
73
00:06:54,080 --> 00:06:56,160
talking about questions that it's never
74
00:06:56,160 --> 00:06:57,840
seen before. These are not on not on the
75
00:06:57,840 --> 00:07:03,599
internet and it's Grofor is smarter than
76
00:07:03,599 --> 00:07:06,560
almost all graduate students uh in all
77
00:07:06,560 --> 00:07:09,360
disciplines simultaneously
78
00:07:09,360 --> 00:07:11,440
like it's actually just important to
79
00:07:11,440 --> 00:07:14,880
appreciate the like that's uh really
80
00:07:14,880 --> 00:07:16,400
something
81
00:07:16,400 --> 00:07:18,000
um
82
00:07:18,000 --> 00:07:21,599
and uh the the reasoning capabilities of
83
00:07:21,599 --> 00:07:24,400
Grock are incredible. Um, so there's
84
00:07:24,400 --> 00:07:26,160
some people out there who who think AI
85
00:07:26,160 --> 00:07:28,800
can't reason and look it it can reason
86
00:07:28,800 --> 00:07:33,680
at superhuman levels. Um, so yeah, and
87
00:07:33,680 --> 00:07:35,520
frankly it it only gets better from
88
00:07:35,520 --> 00:07:39,680
here. So we'll we'll take you through uh
89
00:07:39,680 --> 00:07:43,840
the Gro 4 release and uh
90
00:07:43,840 --> 00:07:47,599
yeah um show you like the pace of pace
91
00:07:47,599 --> 00:07:51,120
of progress here. Um like I guess the
92
00:07:51,120 --> 00:07:53,120
first part is like in terms of the
93
00:07:53,120 --> 00:07:56,479
training um we're going from Grock two
94
00:07:56,479 --> 00:07:59,199
to Grock 3 uh to Grock 4. We've
95
00:07:59,199 --> 00:08:01,039
essentially increased the training by an
96
00:08:01,039 --> 00:08:04,000
order of magnitude in each case. So it's
97
00:08:04,000 --> 00:08:05,919
uh you know a 100 times more training
98
00:08:05,919 --> 00:08:10,319
than than Grock 2. Um and uh and that
99
00:08:10,319 --> 00:08:14,960
that's only going to increase. Um so
100
00:08:14,960 --> 00:08:17,280
it's uh yeah frankly I mean I don't know
101
00:08:17,280 --> 00:08:21,039
in some ways a little terrifying but uh
102
00:08:21,039 --> 00:08:22,639
the growth of intelligence here is is
103
00:08:22,639 --> 00:08:23,440
remarkable.
104
00:08:23,440 --> 00:08:26,240
Yes it's important to realize there are
105
00:08:26,240 --> 00:08:27,759
two types of training compute. One is
106
00:08:27,759 --> 00:08:30,080
the pre-training compute that's from GR
107
00:08:30,080 --> 00:08:33,440
two to GR three. Um but for from GR
108
00:08:33,440 --> 00:08:35,760
three to GU four we're actually putting
109
00:08:35,760 --> 00:08:39,279
a lot of compute in reasoning in RL.
110
00:08:39,279 --> 00:08:42,159
Yeah. And uh just like you said this is
111
00:08:42,159 --> 00:08:43,680
literally the fastest moving field and
112
00:08:43,680 --> 00:08:45,600
Grock 2 is like the high school student
113
00:08:45,600 --> 00:08:47,839
by today's standard. If you look back in
114
00:08:47,839 --> 00:08:50,880
the last 12 months Grock 2 was only a
115
00:08:50,880 --> 00:08:53,519
concept. We didn't even have Gro 2 12
116
00:08:53,519 --> 00:08:56,480
months ago. Um and then by training GR 2
117
00:08:56,480 --> 00:08:58,080
that was the first time we scale up like
118
00:08:58,080 --> 00:09:00,160
the pre-training. We realized that if
119
00:09:00,160 --> 00:09:01,760
you actually do the data ablation really
120
00:09:01,760 --> 00:09:04,720
carefully and infra and also the
121
00:09:04,720 --> 00:09:06,880
algorithm, we can actually push the
122
00:09:06,880 --> 00:09:09,040
pre-training quite a lot by amount of
123
00:09:09,040 --> 00:09:12,080
10x to make the model the best
124
00:09:12,080 --> 00:09:14,080
pre-trained based model. And that's why
125
00:09:14,080 --> 00:09:16,000
we built Colossus the world's
126
00:09:16,000 --> 00:09:19,680
supercomputer with 100,000 H100 and then
127
00:09:19,680 --> 00:09:22,240
with the best pre-train model and we
128
00:09:22,240 --> 00:09:23,920
realized if you can collect these
129
00:09:23,920 --> 00:09:26,080
verifiable outcome reward you can
130
00:09:26,080 --> 00:09:27,360
actually train these model to start
131
00:09:27,360 --> 00:09:28,560
thinking from the first principle start
132
00:09:28,560 --> 00:09:30,399
to reason correct his own mistakes and
133
00:09:30,399 --> 00:09:32,800
that's where the graining comes from and
134
00:09:32,800 --> 00:09:35,360
today we ask the question what happens
135
00:09:35,360 --> 00:09:37,360
if you take the expansion of classes
136
00:09:37,360 --> 00:09:40,800
with all 200,000 GPUs put all these into
137
00:09:40,800 --> 00:09:44,720
RL 10x more compute than any of the
138
00:09:44,720 --> 00:09:46,560
models out there on reinforcement
139
00:09:46,560 --> 00:09:48,880
learning unprecedented scale what's
140
00:09:48,880 --> 00:09:52,080
going to happen. So this is a story of
141
00:09:52,080 --> 00:09:55,600
Gro 4 and uh you know Tony uh share some
142
00:09:55,600 --> 00:09:57,920
uh insight with the audience.
143
00:09:57,920 --> 00:10:00,560
Yeah. Um so yeah let's just talk about
144
00:10:00,560 --> 00:10:03,920
how smart GR 4 is. So I guess u we can
145
00:10:03,920 --> 00:10:05,839
start discussing this benchmark called
146
00:10:05,839 --> 00:10:08,480
humanities last exam and this this
147
00:10:08,480 --> 00:10:10,959
benchmark is a very very challenging
148
00:10:10,959 --> 00:10:13,760
benchmark. Every single problem is
149
00:10:13,760 --> 00:10:17,920
curated by subject matter experts. Um
150
00:10:17,920 --> 00:10:21,279
it's in total 2500 problems and it
151
00:10:21,279 --> 00:10:22,959
consists of many different subjects
152
00:10:22,959 --> 00:10:25,440
mathematics, natural sciences, uh
153
00:10:25,440 --> 00:10:27,519
engineering and also autohumity
154
00:10:27,519 --> 00:10:31,600
subjects. So um essentially when when it
155
00:10:31,600 --> 00:10:33,279
was first released actually like earlier
156
00:10:33,279 --> 00:10:36,000
this year uh most of the models out
157
00:10:36,000 --> 00:10:39,600
there can only get singledigit accuracy
158
00:10:39,600 --> 00:10:41,519
on this benchmark.
159
00:10:41,519 --> 00:10:43,200
Yeah. So we can we can look at some of
160
00:10:43,200 --> 00:10:46,720
th those examples. Um you know uh so
161
00:10:46,720 --> 00:10:49,200
there there is this mathematical problem
162
00:10:49,200 --> 00:10:51,920
which is about natural transformations
163
00:10:51,920 --> 00:10:54,560
in category theory and there's this
164
00:10:54,560 --> 00:10:56,480
organic chemistry problem that talks
165
00:10:56,480 --> 00:11:00,320
about uh electro cyclic reactions and
166
00:11:00,320 --> 00:11:02,399
also there's this linguistic problem
167
00:11:02,399 --> 00:11:04,160
that tries to ask you about
168
00:11:04,160 --> 00:11:06,079
distinguishing between closed and open
169
00:11:06,079 --> 00:11:09,760
syllabus uh from a hub Hebrew source
170
00:11:09,760 --> 00:11:13,440
text. So you can see also it's a very
171
00:11:13,440 --> 00:11:16,160
wide range of problems and every single
172
00:11:16,160 --> 00:11:19,360
problem is PhD or even advanced research
173
00:11:19,360 --> 00:11:20,800
level problems.
174
00:11:20,800 --> 00:11:23,200
Yeah. I mean the these there are no
175
00:11:23,200 --> 00:11:25,360
humans that can actually answer these
176
00:11:25,360 --> 00:11:26,800
can get a good score. I mean if you
177
00:11:26,800 --> 00:11:30,240
actually say like any given human um
178
00:11:30,240 --> 00:11:32,560
what like what's the best that any human
179
00:11:32,560 --> 00:11:36,480
could score? I mean I'd say maybe 5%
180
00:11:36,480 --> 00:11:38,079
optimistically.
181
00:11:38,079 --> 00:11:41,519
Yeah. Um, so this this is much harder
182
00:11:41,519 --> 00:11:43,760
than than what any any human can do.
183
00:11:43,760 --> 00:11:45,760
It's it's incredibly difficult and you
184
00:11:45,760 --> 00:11:47,200
can see from the the types of questions
185
00:11:47,200 --> 00:11:49,360
like you might be incredible in
186
00:11:49,360 --> 00:11:51,440
linguistics or mathematics or chemistry
187
00:11:51,440 --> 00:11:53,360
or physics or anyone of a number of
188
00:11:53,360 --> 00:11:55,680
subjects, but you're not going to be um
189
00:11:55,680 --> 00:11:58,800
at a postgrad level in everything. And
190
00:11:58,800 --> 00:12:01,120
Grock 4 is a postgrad level in
191
00:12:01,120 --> 00:12:03,680
everything. Like it's it's just some of
192
00:12:03,680 --> 00:12:05,680
these things are just worth repeating.
193
00:12:05,680 --> 00:12:09,839
like Grockport is postgraduate like PhD
194
00:12:09,839 --> 00:12:12,720
level in everything better than PhD but
195
00:12:12,720 --> 00:12:15,760
like most PhDs would fail so it's better
196
00:12:15,760 --> 00:12:18,000
said I mean at least with respect to
197
00:12:18,000 --> 00:12:20,639
academic questions it I want to just
198
00:12:20,639 --> 00:12:22,480
emphasize this point with respect to
199
00:12:22,480 --> 00:12:25,440
academic questions Gro is better than
200
00:12:25,440 --> 00:12:30,399
PhD level in every subject no exceptions
201
00:12:30,399 --> 00:12:33,600
um now this doesn't mean that it's it
202
00:12:33,600 --> 00:12:35,600
you know at times it may lack common
203
00:12:35,600 --> 00:12:38,480
sense and it has not yet invented new
204
00:12:38,480 --> 00:12:42,160
technologies or discovered new physics
205
00:12:42,160 --> 00:12:44,560
but that is just a matter of time.
206
00:12:44,560 --> 00:12:47,360
Um if it I I I think it may discover new
207
00:12:47,360 --> 00:12:48,959
technologies
208
00:12:48,959 --> 00:12:53,200
uh as soon as later this year. Um and I
209
00:12:53,200 --> 00:12:54,880
I would be shocked if it has not done so
210
00:12:54,880 --> 00:12:58,480
next year. So I would expect Grock to
211
00:12:58,480 --> 00:13:00,240
yeah literally discover new new
212
00:13:00,240 --> 00:13:01,920
technologies that are actually useful no
213
00:13:01,920 --> 00:13:03,600
later than next year and maybe enter
214
00:13:03,600 --> 00:13:06,240
this year. um and it might discover new
215
00:13:06,240 --> 00:13:09,200
physics next year and within two years
216
00:13:09,200 --> 00:13:11,600
I'd say almost certainly
217
00:13:11,600 --> 00:13:15,959
like so just let that sink in.
218
00:13:18,000 --> 00:13:21,000
Yeah.
219
00:13:22,079 --> 00:13:25,760
So yeah um how okay so I guess we can
220
00:13:25,760 --> 00:13:27,680
talk about the the what what's behind
221
00:13:27,680 --> 00:13:30,639
the scene of graph 4 as Jimmy mentioned
222
00:13:30,639 --> 00:13:33,519
uh we actually throwing a lot of compute
223
00:13:33,519 --> 00:13:36,240
into this training you know when it
224
00:13:36,240 --> 00:13:38,959
started it's only also a single digit
225
00:13:38,959 --> 00:13:42,399
sorry um the previous slide sorry yeah
226
00:13:42,399 --> 00:13:46,720
it's only a single digit u number but as
227
00:13:46,720 --> 00:13:48,399
you start putting in more and more
228
00:13:48,399 --> 00:13:50,800
training compute it started to gradually
229
00:13:50,800 --> 00:13:53,200
gradually become smarter and smarter and
230
00:13:53,200 --> 00:13:56,800
eventually solved a quarter of the HL
231
00:13:56,800 --> 00:14:00,639
problems and this is without any tools.
232
00:14:00,639 --> 00:14:03,519
The next thing we did was to adding a
233
00:14:03,519 --> 00:14:06,320
tools capabilities to the model and
234
00:14:06,320 --> 00:14:10,000
unlike Gth 3 I think G3 actually is able
235
00:14:10,000 --> 00:14:12,399
to use C as well but here we actually
236
00:14:12,399 --> 00:14:14,720
make it more native in the sense that we
237
00:14:14,720 --> 00:14:18,320
put the tools into training. Uh G3 was
238
00:14:18,320 --> 00:14:20,480
only relying on generalization. Here we
239
00:14:20,480 --> 00:14:22,560
actually put the tools into training and
240
00:14:22,560 --> 00:14:25,040
it turns out this significantly improves
241
00:14:25,040 --> 00:14:26,880
the model's capability of using those
242
00:14:26,880 --> 00:14:27,680
tools.
243
00:14:27,680 --> 00:14:30,480
Yeah, I remember we had like deep search
244
00:14:30,480 --> 00:14:31,440
back in the days.
245
00:14:31,440 --> 00:14:33,199
So how is this different?
246
00:14:33,199 --> 00:14:33,600
Yeah.
247
00:14:33,600 --> 00:14:35,440
Yeah. Exactly. So deep search was
248
00:14:35,440 --> 00:14:38,959
exactly the graph 3 reasoning model uh
249
00:14:38,959 --> 00:14:41,120
but without any specific training but we
250
00:14:41,120 --> 00:14:44,240
only asked it to use those tools. So
251
00:14:44,240 --> 00:14:46,720
compared to this, it was much weaker in
252
00:14:46,720 --> 00:14:48,880
terms of its tool capabilities
253
00:14:48,880 --> 00:14:50,079
and unreliable.
254
00:14:50,079 --> 00:14:51,600
And unreliable. Yes. Yes.
255
00:14:51,600 --> 00:14:53,279
And and to be clear, like these are
256
00:14:53,279 --> 00:14:55,120
still, I'd say, fairly this is still
257
00:14:55,120 --> 00:14:57,440
fairly primitive tool use. If you
258
00:14:57,440 --> 00:14:59,279
compare it to say the tools that I used
259
00:14:59,279 --> 00:15:02,240
at Tesla or SpaceX uh where you're using
260
00:15:02,240 --> 00:15:06,399
um you know finite element analysis and
261
00:15:06,399 --> 00:15:08,639
computational flow dynamics and and
262
00:15:08,639 --> 00:15:12,240
you're you're able to run uh or say like
263
00:15:12,240 --> 00:15:14,399
Tesla does like crash simulations where
264
00:15:14,399 --> 00:15:16,399
the simulations are so close to reality
265
00:15:16,399 --> 00:15:18,639
that if the test doesn't match the
266
00:15:18,639 --> 00:15:20,399
simulation you assume that the test
267
00:15:20,399 --> 00:15:22,160
article is wrong. That's how good the
268
00:15:22,160 --> 00:15:24,240
simulations are. So Grock is not
269
00:15:24,240 --> 00:15:26,560
currently using any of the tools the the
270
00:15:26,560 --> 00:15:28,320
really powerful tools that a company
271
00:15:28,320 --> 00:15:30,639
would use but but that is something that
272
00:15:30,639 --> 00:15:33,440
we will provide it with later this year.
273
00:15:33,440 --> 00:15:36,240
So it will have the tools that that a
274
00:15:36,240 --> 00:15:39,440
company has um and and have very
275
00:15:39,440 --> 00:15:42,000
accurate physics simulator. Um
276
00:15:42,000 --> 00:15:43,920
ultimately the the thing that will make
277
00:15:43,920 --> 00:15:46,000
the biggest difference is being able to
278
00:15:46,000 --> 00:15:47,360
interact with the real world via
279
00:15:47,360 --> 00:15:49,680
humanoid robots. So you combine sort of
280
00:15:49,680 --> 00:15:51,519
grock with with Optimus and it can
281
00:15:51,519 --> 00:15:53,279
actually interact with the real world
282
00:15:53,279 --> 00:15:55,839
and figure out if if it's hypo if it has
283
00:15:55,839 --> 00:15:59,199
if it's it can formulate an hypothesis
284
00:15:59,199 --> 00:16:01,680
and then confirm if that hypothesis is
285
00:16:01,680 --> 00:16:04,800
is true or not. Um
286
00:16:04,800 --> 00:16:07,920
so we're really you know think about
287
00:16:07,920 --> 00:16:09,360
like where we are today. We're we're at
288
00:16:09,360 --> 00:16:12,800
the beginning of an immense intelligence
289
00:16:12,800 --> 00:16:14,959
explosion. We're in we're in the
290
00:16:14,959 --> 00:16:17,759
intelligence big bang right now.
291
00:16:17,759 --> 00:16:19,360
Um
292
00:16:19,360 --> 00:16:21,279
and the mo we're at the most interesting
293
00:16:21,279 --> 00:16:26,800
time to be alive of any time in history.
294
00:16:26,800 --> 00:16:29,120
Yeah. Now that said, we need to make
295
00:16:29,120 --> 00:16:34,160
sure that the AI is um a good AI.
296
00:16:34,160 --> 00:16:37,680
Uh good Grock. Um and the the thing that
297
00:16:37,680 --> 00:16:39,920
I think is most important for AI safety,
298
00:16:39,920 --> 00:16:42,160
at least my biological neural net tells
299
00:16:42,160 --> 00:16:45,199
me the most important thing for AI is to
300
00:16:45,199 --> 00:16:47,360
be maximally truth seeeking.
301
00:16:47,360 --> 00:16:50,880
So this is this is a very fundamental
302
00:16:50,880 --> 00:16:54,720
um like you can think of AI as this
303
00:16:54,720 --> 00:16:56,880
super genius child that ultimately will
304
00:16:56,880 --> 00:16:59,279
outsmart you but you can still in you
305
00:16:59,279 --> 00:17:03,519
can instill the right values um and
306
00:17:03,519 --> 00:17:07,919
encourage it to be sort of you know
307
00:17:07,919 --> 00:17:10,319
uh truthful
308
00:17:10,319 --> 00:17:12,079
uh
309
00:17:12,079 --> 00:17:14,000
I don't know honorable you know good
310
00:17:14,000 --> 00:17:17,120
good things like
311
00:17:17,360 --> 00:17:19,199
the values you want to instill in a
312
00:17:19,199 --> 00:17:23,199
child that that that grow would grow
313
00:17:23,199 --> 00:17:24,400
ultimately grow up to be incredibly
314
00:17:24,400 --> 00:17:28,839
powerful. Um
315
00:17:29,440 --> 00:17:31,840
yeah.
316
00:17:31,840 --> 00:17:34,400
So yeah, so these this is really I say
317
00:17:34,400 --> 00:17:36,640
we say we say tools these are still
318
00:17:36,640 --> 00:17:39,039
primitive tools not the kind of tools
319
00:17:39,039 --> 00:17:42,640
that um that uh serious commercial
320
00:17:42,640 --> 00:17:45,120
companies use but we will provide it
321
00:17:45,120 --> 00:17:47,440
with those tools and uh I think it will
322
00:17:47,440 --> 00:17:49,520
be able to solve with those tools real
323
00:17:49,520 --> 00:17:51,200
world technology problems. In fact I'm
324
00:17:51,200 --> 00:17:52,559
I'm certain of it. It's just a question
325
00:17:52,559 --> 00:17:54,400
of how long it takes.
326
00:17:54,400 --> 00:17:56,960
Yes. Yes, exactly. Um,
327
00:17:56,960 --> 00:17:58,960
so is it just compute all you need,
328
00:17:58,960 --> 00:18:00,400
Tony?
329
00:18:00,400 --> 00:18:00,960
Right.
330
00:18:00,960 --> 00:18:02,799
Is it just compute all you need at this
331
00:18:02,799 --> 00:18:04,240
point?
332
00:18:04,240 --> 00:18:07,039
Well, you you need compute plus plus uh
333
00:18:07,039 --> 00:18:08,400
the right tools.
334
00:18:08,400 --> 00:18:08,880
Mhm.
335
00:18:08,880 --> 00:18:11,760
And um and then ultimately to be able to
336
00:18:11,760 --> 00:18:13,200
interact with physical world.
337
00:18:13,200 --> 00:18:17,600
Yes. Um, and then
338
00:18:17,600 --> 00:18:19,679
I mean we'll effectively have an economy
339
00:18:19,679 --> 00:18:21,679
that is
340
00:18:21,679 --> 00:18:24,480
well ultimately an economy that is
341
00:18:24,480 --> 00:18:26,000
thousands of times bigger than our
342
00:18:26,000 --> 00:18:28,320
current economy or maybe millions of
343
00:18:28,320 --> 00:18:29,520
times. Mhm.
344
00:18:29,520 --> 00:18:30,880
I mean if you if you think of
345
00:18:30,880 --> 00:18:34,000
civilization as percentage completion of
346
00:18:34,000 --> 00:18:38,240
the kadesv scale where kadeshv one is
347
00:18:38,240 --> 00:18:40,960
using all the energy output of a planet
348
00:18:40,960 --> 00:18:43,440
and kv two is using all the energy
349
00:18:43,440 --> 00:18:45,840
output of a sun and three is all the
350
00:18:45,840 --> 00:18:48,720
energy output of a galaxy. We're we're
351
00:18:48,720 --> 00:18:51,440
only in my opinion probably like close
352
00:18:51,440 --> 00:18:55,919
closer to 1% of kartev one um than we
353
00:18:55,919 --> 00:19:00,000
are to 10%. So like maybe a point or one
354
00:19:00,000 --> 00:19:04,400
one or two percent of kadeshv one. So
355
00:19:04,400 --> 00:19:07,919
um we we will get to
356
00:19:07,919 --> 00:19:11,520
um most of the way like 80 90% kadeshv
357
00:19:11,520 --> 00:19:13,360
one and then hopefully if civilization
358
00:19:13,360 --> 00:19:15,039
doesn't
359
00:19:15,039 --> 00:19:18,720
self annihilate and then kadeshv 2 like
360
00:19:18,720 --> 00:19:21,520
it's the the actual notion of of a human
361
00:19:21,520 --> 00:19:23,679
economy assuming civilization continues
362
00:19:23,679 --> 00:19:26,880
to progress will seem very quaint. Um,
363
00:19:26,880 --> 00:19:28,640
in in retrospect,
364
00:19:28,640 --> 00:19:32,000
um, it will it will seem like uh
365
00:19:32,000 --> 00:19:34,240
sort of cavemen throwing sticks into a
366
00:19:34,240 --> 00:19:37,280
fire uh level of economy
367
00:19:37,280 --> 00:19:38,960
um, compared to what the future will
368
00:19:38,960 --> 00:19:40,160
hold.
369
00:19:40,160 --> 00:19:42,480
Um, I mean, it's very exciting. I mean,
370
00:19:42,480 --> 00:19:45,760
I I've been at times kind of worried
371
00:19:45,760 --> 00:19:48,400
about like, well,
372
00:19:48,400 --> 00:19:51,520
you know, is this this this seems like
373
00:19:51,520 --> 00:19:53,600
it's it's somewhat unnerving to have
374
00:19:53,600 --> 00:19:56,160
intelligence created that is far greater
375
00:19:56,160 --> 00:19:57,679
than our own.
376
00:19:57,679 --> 00:20:01,440
Um, and will this be bad or good for
377
00:20:01,440 --> 00:20:04,480
humanity? Um,
378
00:20:04,480 --> 00:20:06,880
it's like I I I think it'll be good.
379
00:20:06,880 --> 00:20:11,120
Most likely it'll be good. Um,
380
00:20:11,120 --> 00:20:14,120
yeah.
381
00:20:14,480 --> 00:20:16,320
Yeah. But I somewhat reconciled myself
382
00:20:16,320 --> 00:20:19,919
to the fact that even if I even even if
383
00:20:19,919 --> 00:20:22,080
it wasn't going to be good, I'd at least
384
00:20:22,080 --> 00:20:25,840
like to be alive to see it happen.
385
00:20:25,840 --> 00:20:29,440
So, yeah.
386
00:20:29,440 --> 00:20:32,240
So, actually one one
387
00:20:32,240 --> 00:20:35,280
Yeah. Yeah. I think one one technical
388
00:20:35,280 --> 00:20:36,799
problem that we still need to solve
389
00:20:36,799 --> 00:20:39,280
besides just compute is how do we
390
00:20:39,280 --> 00:20:42,720
unblock the data uh data um bottleneck u
391
00:20:42,720 --> 00:20:45,360
because um when we try to scale up the
392
00:20:45,360 --> 00:20:49,280
RL uh in this case we did invent a lot
393
00:20:49,280 --> 00:20:52,559
of new techniques innovations to allow
394
00:20:52,559 --> 00:20:55,360
us to figure out how to find a lot of a
395
00:20:55,360 --> 00:20:57,600
lot of challenging RL problems to work
396
00:20:57,600 --> 00:20:59,520
on. It's not just the problem itself
397
00:20:59,520 --> 00:21:01,280
needs to be challenging but also it
398
00:21:01,280 --> 00:21:03,840
needs to be um you also need to have
399
00:21:03,840 --> 00:21:06,480
like a reliable uh signal to tell the
400
00:21:06,480 --> 00:21:08,640
model you did it wrong you did it right.
401
00:21:08,640 --> 00:21:10,080
This is the sort of the principle of
402
00:21:10,080 --> 00:21:13,280
reinforcement learning and as the models
403
00:21:13,280 --> 00:21:16,080
get smarter and smarter the number of
404
00:21:16,080 --> 00:21:17,600
cool problem or challenging problems
405
00:21:17,600 --> 00:21:19,440
will be less and less.
406
00:21:19,440 --> 00:21:19,840
Yeah.
407
00:21:19,840 --> 00:21:22,480
So it's going to be a new type of
408
00:21:22,480 --> 00:21:23,919
challenge that we need to surpass
409
00:21:23,919 --> 00:21:25,520
besides just compute. Yeah.
410
00:21:25,520 --> 00:21:28,240
Yeah. And we actually are running out of
411
00:21:28,240 --> 00:21:30,960
of of actual test questions to ask. Uh
412
00:21:30,960 --> 00:21:33,600
so there's like even ridiculously
413
00:21:33,600 --> 00:21:35,120
questions that are ridiculously hard if
414
00:21:35,120 --> 00:21:37,360
not essentially impossible for humans
415
00:21:37,360 --> 00:21:41,039
that are written down questions um are
416
00:21:41,039 --> 00:21:44,080
uh becoming swiftly becoming trivial for
417
00:21:44,080 --> 00:21:45,520
for AI.
418
00:21:45,520 --> 00:21:48,640
Um so then there's
419
00:21:48,640 --> 00:21:50,880
um but you know the the one thing that
420
00:21:50,880 --> 00:21:53,360
is an excellent judge of things is
421
00:21:53,360 --> 00:21:56,960
reality. So because physics is the law
422
00:21:56,960 --> 00:21:58,000
ultimately everything else is a
423
00:21:58,000 --> 00:22:00,240
recommendation you can't break physics.
424
00:22:00,240 --> 00:22:02,799
Um so the ultimate test I think for
425
00:22:02,799 --> 00:22:06,159
whether an AI is um the the ultimate
426
00:22:06,159 --> 00:22:08,159
reasoning test is reality.
427
00:22:08,159 --> 00:22:08,640
Yes.
428
00:22:08,640 --> 00:22:10,880
So you invent a new technology like say
429
00:22:10,880 --> 00:22:13,520
improve the design of a car or a rocket
430
00:22:13,520 --> 00:22:17,440
or um create a new medication
431
00:22:17,440 --> 00:22:20,240
uh that and and and does it work?
432
00:22:20,240 --> 00:22:20,559
Yeah.
433
00:22:20,559 --> 00:22:22,960
Um does does the rocket get to orbit?
434
00:22:22,960 --> 00:22:25,760
Does the does the car drive? Does the
435
00:22:25,760 --> 00:22:28,240
medicine work? Whatever the case may be.
436
00:22:28,240 --> 00:22:30,880
Um, reality is the ultimate judge here.
437
00:22:30,880 --> 00:22:32,559
Um, so it's it's going to be a
438
00:22:32,559 --> 00:22:34,159
reinforcement learning closing loop
439
00:22:34,159 --> 00:22:37,400
around reality.
440
00:22:39,280 --> 00:22:41,120
We asked the question how do we even go
441
00:22:41,120 --> 00:22:44,960
further? So um actually we are thinking
442
00:22:44,960 --> 00:22:47,919
about now with single agent we are able
443
00:22:47,919 --> 00:22:50,240
to solve 40% of the problem.
444
00:22:50,240 --> 00:22:52,559
What if we have multiple agents running
445
00:22:52,559 --> 00:22:54,880
at the same time? So this is what's
446
00:22:54,880 --> 00:22:58,000
called test and compute. And as we scale
447
00:22:58,000 --> 00:23:00,640
up the test and compute actually we are
448
00:23:00,640 --> 00:23:03,440
able to solve almost more than 50% of
449
00:23:03,440 --> 00:23:05,919
the uh text only subset of the h
450
00:23:05,919 --> 00:23:08,720
problems. So it's a remarkable
451
00:23:08,720 --> 00:23:10,159
achievement I think. Yeah.
452
00:23:10,159 --> 00:23:11,760
Yeah. This is this is this is insanely
453
00:23:11,760 --> 00:23:14,320
difficult. These are it's it's so what
454
00:23:14,320 --> 00:23:16,000
we're saying is like a majority of the
455
00:23:16,000 --> 00:23:20,559
of the of the textbased um of humanities
456
00:23:20,559 --> 00:23:22,640
you know scarily named humanities last
457
00:23:22,640 --> 00:23:25,600
exam um GR 4 can solve and you and you
458
00:23:25,600 --> 00:23:28,240
can try it out for yourself. Um and the
459
00:23:28,240 --> 00:23:30,080
with the GR four heavy what what it does
460
00:23:30,080 --> 00:23:32,799
is it spawns multiple agents in parallel
461
00:23:32,799 --> 00:23:36,000
and uh all of those agents do do work
462
00:23:36,000 --> 00:23:37,919
independently and then they compare
463
00:23:37,919 --> 00:23:41,360
their work and they they decide which
464
00:23:41,360 --> 00:23:44,159
one like it's like a study group. Um and
465
00:23:44,159 --> 00:23:46,799
it's not as simple as a majority vote
466
00:23:46,799 --> 00:23:49,360
because often only one of the agents
467
00:23:49,360 --> 00:23:51,679
actually figures out the trick or
468
00:23:51,679 --> 00:23:54,880
figures out the solution. Um and and and
469
00:23:54,880 --> 00:23:57,120
but once they share the the trick or or
470
00:23:57,120 --> 00:23:59,520
or figure out what what the real nature
471
00:23:59,520 --> 00:24:02,000
of the problem is, they share that uh
472
00:24:02,000 --> 00:24:04,080
solution with the other agents and then
473
00:24:04,080 --> 00:24:05,520
they compare they essentially compare
474
00:24:05,520 --> 00:24:08,640
notes and then and and then yield uh
475
00:24:08,640 --> 00:24:10,080
yield an answer. Yeah.
476
00:24:10,080 --> 00:24:11,760
So that's that's the the heavy part of
477
00:24:11,760 --> 00:24:15,279
Grock 4 is is where we you scale up the
478
00:24:15,279 --> 00:24:16,960
test time compute by roughly an order of
479
00:24:16,960 --> 00:24:21,039
magnitude. Uh have multiple agents uh
480
00:24:21,039 --> 00:24:22,960
tackle the task and then they compare
481
00:24:22,960 --> 00:24:25,919
their work and they they put forward
482
00:24:25,919 --> 00:24:29,279
what they think is the best result.
483
00:24:29,279 --> 00:24:32,240
Yeah. So we will introduce graph 4 and
484
00:24:32,240 --> 00:24:33,760
graph for heavy. Sorry, can you click
485
00:24:33,760 --> 00:24:35,440
the next slide? Yeah.
486
00:24:35,440 --> 00:24:39,600
Yes. So yeah. So basically graph 4 is
487
00:24:39,600 --> 00:24:41,279
the single version a single agent
488
00:24:41,279 --> 00:24:43,840
version and graph for heavy is the
489
00:24:43,840 --> 00:24:46,240
multi- aent version.
490
00:24:46,240 --> 00:24:48,960
So let's take a look how they actually
491
00:24:48,960 --> 00:24:51,520
do on those exam problems and also some
492
00:24:51,520 --> 00:24:53,039
real real life problems.
493
00:24:53,039 --> 00:24:54,799
Yeah. So we're going to start out here
494
00:24:54,799 --> 00:24:56,480
and we're actually going to look at one
495
00:24:56,480 --> 00:24:58,960
of those HLE problems. This is uh
496
00:24:58,960 --> 00:25:01,200
actually one of the easier math ones. Uh
497
00:25:01,200 --> 00:25:02,799
I don't really understand it very well.
498
00:25:02,799 --> 00:25:05,039
I'm not that smart. But I can launch
499
00:25:05,039 --> 00:25:06,960
this job here and we can actually see
500
00:25:06,960 --> 00:25:08,400
how it's going to go through and start
501
00:25:08,400 --> 00:25:10,720
to think about this problem. Uh while
502
00:25:10,720 --> 00:25:12,320
we're doing that, I also want to show a
503
00:25:12,320 --> 00:25:13,600
little bit more about like what this
504
00:25:13,600 --> 00:25:16,240
model can do and launch a uh Gro 4 heavy
505
00:25:16,240 --> 00:25:19,039
as well. So everyone knows Poly Market.
506
00:25:19,039 --> 00:25:21,279
Um it's extremely interesting. It's the
507
00:25:21,279 --> 00:25:23,520
you know seeker of truth. It aligns with
508
00:25:23,520 --> 00:25:26,240
what reality is most of the time. And
509
00:25:26,240 --> 00:25:27,840
with Grock, what we're actually looking
510
00:25:27,840 --> 00:25:30,240
at is being able to see how we can try
511
00:25:30,240 --> 00:25:32,880
to take these markets and see if we can
512
00:25:32,880 --> 00:25:35,520
predict the the future as well. So, as
513
00:25:35,520 --> 00:25:37,760
we're letting this run, we'll see how uh
514
00:25:37,760 --> 00:25:40,720
Gro for Heavy goes about uh predicting
515
00:25:40,720 --> 00:25:43,200
the, you know, the World Series odds for
516
00:25:43,200 --> 00:25:45,679
like the current teams in the MLB. And
517
00:25:45,679 --> 00:25:46,880
while we're waiting for these to
518
00:25:46,880 --> 00:25:48,000
process, we're going to pass it over to
519
00:25:48,000 --> 00:25:49,279
Eric, and he's going to show you an
520
00:25:49,279 --> 00:25:53,320
example of his.
521
00:25:54,559 --> 00:25:57,919
Yeah. So uh I guess one of the coolest
522
00:25:57,919 --> 00:26:01,840
things about uh Gro 4 is its ability to
523
00:26:01,840 --> 00:26:04,320
understand the world and to solve hard
524
00:26:04,320 --> 00:26:06,720
problems by leveraging tools like Tony
525
00:26:06,720 --> 00:26:08,960
discussed. And I think one kind of cool
526
00:26:08,960 --> 00:26:11,760
example of this we asked it to generate
527
00:26:11,760 --> 00:26:15,120
a visualization of two black holes
528
00:26:15,120 --> 00:26:18,640
colliding. Um and of course you know it
529
00:26:18,640 --> 00:26:21,039
took some there are some liberties. It's
530
00:26:21,039 --> 00:26:22,880
in my case actually pretty clear in its
531
00:26:22,880 --> 00:26:24,240
thinking trace about what these
532
00:26:24,240 --> 00:26:26,480
liberties are. Uh for example, in order
533
00:26:26,480 --> 00:26:28,000
for it to actually be visible, you need
534
00:26:28,000 --> 00:26:30,559
to really exaggerate the the scale of
535
00:26:30,559 --> 00:26:37,039
the you know the uh the uh the waves.
536
00:26:37,039 --> 00:26:39,840
And yeah, so here's like you know this
537
00:26:39,840 --> 00:26:43,840
kind of inaction. Um it exaggerates the
538
00:26:43,840 --> 00:26:46,000
scale in like multiple ways. Uh it drops
539
00:26:46,000 --> 00:26:49,760
off a bit less in terms of amplitude. um
540
00:26:49,760 --> 00:26:53,360
over distance and uh but yeah we can
541
00:26:53,360 --> 00:26:56,640
kind of see uh the basic effects that
542
00:26:56,640 --> 00:26:58,400
you know are actually like you know
543
00:26:58,400 --> 00:27:01,360
correct. It starts with the inspiral, it
544
00:27:01,360 --> 00:27:04,799
merges uh and then you have um the ring
545
00:27:04,799 --> 00:27:08,880
down and like this is basically um
546
00:27:08,880 --> 00:27:13,520
largely correct um yeah uh modulo some
547
00:27:13,520 --> 00:27:15,760
of the simplifications that need to do
548
00:27:15,760 --> 00:27:17,520
um you know it it's actually quite
549
00:27:17,520 --> 00:27:19,440
explicit about this you know it uses
550
00:27:19,440 --> 00:27:22,400
like post postnonian approximations
551
00:27:22,400 --> 00:27:24,400
instead of actually like computing the
552
00:27:24,400 --> 00:27:27,440
general relativistic effects at like uh
553
00:27:27,440 --> 00:27:29,200
near the center of the black hole which
554
00:27:29,200 --> 00:27:31,679
is you know incorrect um and you know
555
00:27:31,679 --> 00:27:33,840
will lead to you know some incorrect
556
00:27:33,840 --> 00:27:35,520
results but the overall you know
557
00:27:35,520 --> 00:27:39,039
visualization is uh yeah is basically
558
00:27:39,039 --> 00:27:41,600
there um and you can actually look at
559
00:27:41,600 --> 00:27:43,360
the kinds of resources that it
560
00:27:43,360 --> 00:27:47,360
references. So here it um it actually
561
00:27:47,360 --> 00:27:49,440
you know it obviously uses search it
562
00:27:49,440 --> 00:27:51,200
gathers results from a bunch of links
563
00:27:51,200 --> 00:27:53,760
but also reads through a undergraduate
564
00:27:53,760 --> 00:27:57,600
text in analytical analytic
565
00:27:57,600 --> 00:28:02,480
gravitational wave models. Uh it um yeah
566
00:28:02,480 --> 00:28:07,200
it um reasons quite a bit about uh uh
567
00:28:07,200 --> 00:28:09,279
the actual constants that it should use
568
00:28:09,279 --> 00:28:11,919
for a realistic simulation. uh it
569
00:28:11,919 --> 00:28:14,240
references uh I guess existing real
570
00:28:14,240 --> 00:28:19,919
world data. Um and yeah I it yeah it's a
571
00:28:19,919 --> 00:28:23,440
it's a pretty good model. Uh yeah
572
00:28:23,440 --> 00:28:25,200
but like actually going forward we can
573
00:28:25,200 --> 00:28:27,279
we can we can plug we can give it the
574
00:28:27,279 --> 00:28:30,000
same model that physicists use. So it it
575
00:28:30,000 --> 00:28:32,240
can run the the same level of compute
576
00:28:32,240 --> 00:28:35,840
that uh so leading physics researchers
577
00:28:35,840 --> 00:28:38,000
are using and and give you a physics
578
00:28:38,000 --> 00:28:41,520
accurate black hole simulation. Exactly.
579
00:28:41,520 --> 00:28:42,720
Just right now it's running in your
580
00:28:42,720 --> 00:28:43,520
browser. So
581
00:28:43,520 --> 00:28:44,720
yeah, this is just running in your
582
00:28:44,720 --> 00:28:45,520
browser.
583
00:28:45,520 --> 00:28:45,919
Exactly.
584
00:28:45,919 --> 00:28:46,880
Pretty simple.
585
00:28:46,880 --> 00:28:48,960
So swapping back real quick here. We can
586
00:28:48,960 --> 00:28:50,320
actually take a look. The math problem
587
00:28:50,320 --> 00:28:53,679
is finished. Uh the model was able to uh
588
00:28:53,679 --> 00:28:55,919
let's look at its thinking trace here.
589
00:28:55,919 --> 00:28:57,600
So you can see how it went through the
590
00:28:57,600 --> 00:28:59,760
problem. I'll be honest with you guys, I
591
00:28:59,760 --> 00:29:01,360
really don't quite fully understand the
592
00:29:01,360 --> 00:29:03,279
math, but what I do know is that I
593
00:29:03,279 --> 00:29:05,760
looked at the answer ahead of time. Um,
594
00:29:05,760 --> 00:29:07,760
and it did come to the the correct
595
00:29:07,760 --> 00:29:10,159
answer here in the final part here. Um,
596
00:29:10,159 --> 00:29:11,919
we can also come in and actually take a
597
00:29:11,919 --> 00:29:14,720
look here at our uh our World Series
598
00:29:14,720 --> 00:29:17,039
prediction. Um, and it's still thinking
599
00:29:17,039 --> 00:29:19,039
through on this one, but we can actually
600
00:29:19,039 --> 00:29:20,960
try some other stuff as well. So, we can
601
00:29:20,960 --> 00:29:22,480
actually like try some of the X
602
00:29:22,480 --> 00:29:23,919
integrations that we did. So, we worked
603
00:29:23,919 --> 00:29:26,720
very heavily on working with uh all of
604
00:29:26,720 --> 00:29:28,640
our X tools and building out a really
605
00:29:28,640 --> 00:29:31,360
great X experience. So we can actually
606
00:29:31,360 --> 00:29:33,360
ask, you know, the model, you know, find
607
00:29:33,360 --> 00:29:34,799
me the XAI employee that has the
608
00:29:34,799 --> 00:29:37,520
weirdest profile photo. Um, so that's
609
00:29:37,520 --> 00:29:39,039
going to go off and start that. And then
610
00:29:39,039 --> 00:29:41,120
we can actually try out, you know,
611
00:29:41,120 --> 00:29:43,440
let's, uh, create a timeline based on
612
00:29:43,440 --> 00:29:45,600
XOST, uh, detailing the, you know,
613
00:29:45,600 --> 00:29:47,440
changes in the scores over time. And we
614
00:29:47,440 --> 00:29:49,200
can see, you know, all the conversation
615
00:29:49,200 --> 00:29:51,200
that was taking place at that time as
616
00:29:51,200 --> 00:29:52,640
well. So we can see who are the, you
617
00:29:52,640 --> 00:29:54,480
know, announcing scores and like what
618
00:29:54,480 --> 00:29:56,159
was the reactions at those times as
619
00:29:56,159 --> 00:29:58,720
well. Um, so we'll let that go through
620
00:29:58,720 --> 00:30:02,559
here and process. And if we go back to
621
00:30:02,559 --> 00:30:06,000
uh uh this was the uh Greg Yang photo
622
00:30:06,000 --> 00:30:08,320
here. So uh if we scroll through here,
623
00:30:08,320 --> 00:30:11,520
whoops. So Greg Yang, of course, who has
624
00:30:11,520 --> 00:30:14,080
his favorite uh photograph that he has
625
00:30:14,080 --> 00:30:16,240
on his account. Um that's actually not
626
00:30:16,240 --> 00:30:17,760
how he looks like in real life, by the
627
00:30:17,760 --> 00:30:20,320
way. Um just so aware, but is quite
628
00:30:20,320 --> 00:30:20,640
funny.
629
00:30:20,640 --> 00:30:22,640
But it had to understand that question.
630
00:30:22,640 --> 00:30:23,039
Yeah.
631
00:30:23,039 --> 00:30:24,399
Which is the that's the wild part. It's
632
00:30:24,399 --> 00:30:26,240
like it understands what is a weird
633
00:30:26,240 --> 00:30:28,559
photo. What is a weird photo?
634
00:30:28,559 --> 00:30:29,039
Yeah.
635
00:30:29,039 --> 00:30:31,679
What is a less or more weird photo?
636
00:30:31,679 --> 00:30:33,200
It goes through it has to find all the
637
00:30:33,200 --> 00:30:34,559
team members. It has to figure out who
638
00:30:34,559 --> 00:30:36,880
we all are. It, you know, searches
639
00:30:36,880 --> 00:30:39,360
without access to the internal XAI
640
00:30:39,360 --> 00:30:40,720
personnel loss. It's literally looking
641
00:30:40,720 --> 00:30:42,240
at the just at the internet.
642
00:30:42,240 --> 00:30:42,799
Exactly.
643
00:30:42,799 --> 00:30:44,320
So you could say like the weirdest of
644
00:30:44,320 --> 00:30:45,039
any company.
645
00:30:45,039 --> 00:30:45,520
Yeah.
646
00:30:45,520 --> 00:30:46,720
To be clear.
647
00:30:46,720 --> 00:30:49,600
Exactly. And uh we can also take a look
648
00:30:49,600 --> 00:30:51,679
here at the uh question here for the
649
00:30:51,679 --> 00:30:54,000
humanities last exam. So, it is still
650
00:30:54,000 --> 00:30:55,520
researching all of the historical
651
00:30:55,520 --> 00:30:58,080
scores. Um, but it will have that final
652
00:30:58,080 --> 00:30:59,760
answer here soon, but we can while it's
653
00:30:59,760 --> 00:31:01,360
finishing up, we can take a look at one
654
00:31:01,360 --> 00:31:03,440
of the ones that we set up here a second
655
00:31:03,440 --> 00:31:05,360
ago. And we can see like, you know, it
656
00:31:05,360 --> 00:31:06,960
defines the date that like Dan Hendricks
657
00:31:06,960 --> 00:31:08,880
had initially announced it. We can go
658
00:31:08,880 --> 00:31:10,399
through, we can see, you know, uh,
659
00:31:10,399 --> 00:31:12,720
OpenAI announcing their score back in
660
00:31:12,720 --> 00:31:15,039
uh, February. And we can see, you know,
661
00:31:15,039 --> 00:31:17,440
as progress happens with like Gemini. we
662
00:31:17,440 --> 00:31:19,760
can see like Kimmy uh and we can also
663
00:31:19,760 --> 00:31:21,679
even see you know the leaked benchmarks
664
00:31:21,679 --> 00:31:23,600
of uh what people are saying is you know
665
00:31:23,600 --> 00:31:24,880
if it's right it's going to be pretty
666
00:31:24,880 --> 00:31:29,039
impressive so pretty cool um so yeah I'm
667
00:31:29,039 --> 00:31:30,320
looking forward to seeing how everybody
668
00:31:30,320 --> 00:31:32,000
uses these tools and gets the most value
669
00:31:32,000 --> 00:31:35,440
out of them um but yeah it's been great
670
00:31:35,440 --> 00:31:37,279
yeah and we're going to close the loop
671
00:31:37,279 --> 00:31:39,360
around usefulness as well so it's like
672
00:31:39,360 --> 00:31:41,440
it's not just books smart but actually
673
00:31:41,440 --> 00:31:43,120
practically smart
674
00:31:43,120 --> 00:31:45,440
exactly
675
00:31:52,000 --> 00:31:54,159
And we can go back to the uh the slides
676
00:31:54,159 --> 00:31:55,600
here. Yeah. So
677
00:31:55,600 --> 00:32:00,000
cool. Um so the so we actually evaluate
678
00:32:00,000 --> 00:32:02,799
also on the multimodel subset. So on the
679
00:32:02,799 --> 00:32:06,000
full set uh this is the number on the h
680
00:32:06,000 --> 00:32:09,120
exam. Uh it you can see there's a little
681
00:32:09,120 --> 00:32:11,760
dip on the numbers. uh this is actually
682
00:32:11,760 --> 00:32:13,440
something we're improving on which is
683
00:32:13,440 --> 00:32:15,200
the multimodel uh understanding
684
00:32:15,200 --> 00:32:18,159
capabilities but I do believe uh in a
685
00:32:18,159 --> 00:32:20,720
very short time we're able to really
686
00:32:20,720 --> 00:32:23,919
improve and got much higher numbers on
687
00:32:23,919 --> 00:32:25,919
this um even higher numbers on this
688
00:32:25,919 --> 00:32:27,279
benchmark. Yeah.
689
00:32:27,279 --> 00:32:30,399
Yeah. This is the we we saw like what
690
00:32:30,399 --> 00:32:32,480
what is the biggest weakness of Grock
691
00:32:32,480 --> 00:32:34,559
currently is that it's it's sort of
692
00:32:34,559 --> 00:32:37,440
partially blind. it can't it's its image
693
00:32:37,440 --> 00:32:40,159
understanding um obviously and its image
694
00:32:40,159 --> 00:32:42,880
generation uh need to be a lot better um
695
00:32:42,880 --> 00:32:46,880
and that um that that's actually um
696
00:32:46,880 --> 00:32:50,240
being trained right now. So um graph 4
697
00:32:50,240 --> 00:32:51,679
is based on version six of our
698
00:32:51,679 --> 00:32:55,120
foundation model um and we are training
699
00:32:55,120 --> 00:32:57,519
version seven uh which we'll complete in
700
00:32:57,519 --> 00:33:00,559
a few weeks um and uh that that'll
701
00:33:00,559 --> 00:33:03,919
address the weakness on the vision side
702
00:33:03,919 --> 00:33:06,640
and just to show off this last here. So
703
00:33:06,640 --> 00:33:08,960
the the prediction market finished uh
704
00:33:08,960 --> 00:33:11,760
here with a heavy and we can see uh here
705
00:33:11,760 --> 00:33:13,760
we can see all the tools and the process
706
00:33:13,760 --> 00:33:16,320
it used to actually uh go through and
707
00:33:16,320 --> 00:33:18,399
find the right answer. So it browsed a
708
00:33:18,399 --> 00:33:20,799
lot of odds sites. It calculated its own
709
00:33:20,799 --> 00:33:22,960
odds comparing to the market the market
710
00:33:22,960 --> 00:33:25,120
to find its own alpha and edge. It walks
711
00:33:25,120 --> 00:33:27,279
you through the entire process here and
712
00:33:27,279 --> 00:33:29,919
it calculates the odds of the winner
713
00:33:29,919 --> 00:33:32,480
being like the the Dodgers. Uh and it
714
00:33:32,480 --> 00:33:36,320
gives them a 21.6% 6% chance of uh
715
00:33:36,320 --> 00:33:39,840
winning uh this year. So, and it took
716
00:33:39,840 --> 00:33:41,760
approx approximately four and a half
717
00:33:41,760 --> 00:33:43,200
minutes to compute.
718
00:33:43,200 --> 00:33:44,720
Yeah, that's a lot of thinking.
719
00:33:44,720 --> 00:33:47,720
Yeah,
720
00:33:51,440 --> 00:33:53,440
we can also look at all the uh the other
721
00:33:53,440 --> 00:33:56,880
benchmarks besides HRE. Um, as it turned
722
00:33:56,880 --> 00:34:00,480
out, GR 4 excelled on all the reasoning
723
00:34:00,480 --> 00:34:02,480
benchmarks that people usually test on.
724
00:34:02,480 --> 00:34:06,000
Um, including GBQA, which is a PhD
725
00:34:06,000 --> 00:34:09,119
level, uh, problem sets. Uh, that's
726
00:34:09,119 --> 00:34:13,520
easier compared to HLE. Um, on Amy 25
727
00:34:13,520 --> 00:34:16,560
American Invitation Mathematics exam, we
728
00:34:16,560 --> 00:34:18,639
with graph for heavy, we actually got a
729
00:34:18,639 --> 00:34:21,839
perfect score. Um, also on some of the
730
00:34:21,839 --> 00:34:23,440
coding benchmark called live coding
731
00:34:23,440 --> 00:34:27,599
bunch. um and also on uh HMMT, Harvard
732
00:34:27,599 --> 00:34:31,839
math uh MIT uh exam and also USMO. Uh
733
00:34:31,839 --> 00:34:34,639
you can see actually um for on all of
734
00:34:34,639 --> 00:34:37,679
those benchmarks we often have a very
735
00:34:37,679 --> 00:34:40,480
large leap against the second best uh
736
00:34:40,480 --> 00:34:43,040
model out there.
737
00:34:43,040 --> 00:34:44,720
Yeah, it's I mean really we're going to
738
00:34:44,720 --> 00:34:46,480
get to the point where uh it's going to
739
00:34:46,480 --> 00:34:50,000
get every answer right in every exam. um
740
00:34:50,000 --> 00:34:51,359
and where it doesn't get an answer
741
00:34:51,359 --> 00:34:52,320
right, it's going to tell you what's
742
00:34:52,320 --> 00:34:54,159
wrong with the question. Or if the
743
00:34:54,159 --> 00:34:56,800
question is ambiguous, disambiguate the
744
00:34:56,800 --> 00:34:58,960
question into answers A, B, and C and
745
00:34:58,960 --> 00:35:01,760
tell you what what answers A, B, and C
746
00:35:01,760 --> 00:35:04,000
would be with a disamiguated question.
747
00:35:04,000 --> 00:35:06,240
Um so the only real test then will be
748
00:35:06,240 --> 00:35:08,400
reality. Uh can I make useful
749
00:35:08,400 --> 00:35:13,119
technologies, discover um new science?
750
00:35:13,119 --> 00:35:15,040
That'll actually be the the only thing
751
00:35:15,040 --> 00:35:17,200
left because human tests will simply not
752
00:35:17,200 --> 00:35:20,000
be um meaningful.
753
00:35:20,000 --> 00:35:22,640
need to make an update to HRE very soon
754
00:35:22,640 --> 00:35:24,720
given the current rate of progress. So
755
00:35:24,720 --> 00:35:26,400
yeah, it's super cool to see like
756
00:35:26,400 --> 00:35:27,920
multiple agents that collaborate with
757
00:35:27,920 --> 00:35:29,359
each other solving really challenging
758
00:35:29,359 --> 00:35:32,240
problems. Uh so we're going try this
759
00:35:32,240 --> 00:35:33,920
model. Uh so it turned out it's
760
00:35:33,920 --> 00:35:37,040
available right now. Uh if we advance to
761
00:35:37,040 --> 00:35:40,720
the next slide uh where there is a super
762
00:35:40,720 --> 00:35:43,119
Grog heavy tiers that we're introducing
763
00:35:43,119 --> 00:35:44,720
where you're able to access to both
764
00:35:44,720 --> 00:35:46,960
Grock 4 and Grock 4 heavy uh where
765
00:35:46,960 --> 00:35:48,000
you're actually going to be the
766
00:35:48,000 --> 00:35:49,599
taskmaster of dodge of little Grock
767
00:35:49,599 --> 00:35:51,280
research agent to help you you know
768
00:35:51,280 --> 00:35:52,880
become smarter through all the little
769
00:35:52,880 --> 00:35:55,359
research and save hours of times of uh
770
00:35:55,359 --> 00:35:57,680
you know going through mundane tasks uh
771
00:35:57,680 --> 00:36:01,680
and it's available right now. Uh so yeah
772
00:36:01,680 --> 00:36:06,079
so uh we we did limit usage during the
773
00:36:06,079 --> 00:36:07,760
the demo so we didn't it didn't break
774
00:36:07,760 --> 00:36:09,359
the demo because all these this all this
775
00:36:09,359 --> 00:36:10,720
stuff is happening live so it's not
776
00:36:10,720 --> 00:36:12,800
there's no nothing canned about the any
777
00:36:12,800 --> 00:36:15,920
of the tests that we're doing. Um so uh
778
00:36:15,920 --> 00:36:17,680
after the after the uh demo is done
779
00:36:17,680 --> 00:36:21,200
we'll we'll allow we'll enable more
780
00:36:21,200 --> 00:36:22,880
subscribers for cyber gro. So if you
781
00:36:22,880 --> 00:36:24,560
can't subscribe right now just try in
782
00:36:24,560 --> 00:36:27,920
half an hour. It should work. Um, so,
783
00:36:27,920 --> 00:36:32,560
uh, and now let's let's get into voice.
784
00:36:32,560 --> 00:36:34,320
Greatby.
785
00:36:34,320 --> 00:36:36,320
So many of you have been enjoying our
786
00:36:36,320 --> 00:36:38,079
voice mode and we've been working hard
787
00:36:38,079 --> 00:36:39,839
to improve the experience over the past
788
00:36:39,839 --> 00:36:43,040
couple months. Um, we have cut latency
789
00:36:43,040 --> 00:36:46,000
in half to make it much snappier. And
790
00:36:46,000 --> 00:36:47,680
today we're excited to announce a set of
791
00:36:47,680 --> 00:36:50,079
new voices that have exceptional
792
00:36:50,079 --> 00:36:53,280
naturalness and procity. You might have
793
00:36:53,280 --> 00:36:55,520
noticed the movie trailer voice that
794
00:36:55,520 --> 00:36:57,280
opened up the live stream. That is one
795
00:36:57,280 --> 00:36:59,760
of our new voices, S, who's got that
796
00:36:59,760 --> 00:37:03,040
epically deep tone. And we're also
797
00:37:03,040 --> 00:37:06,560
excited to introduce Eve, a beautiful
798
00:37:06,560 --> 00:37:09,040
British voice who's capable of rich
799
00:37:09,040 --> 00:37:11,280
emotions. A man, would you like to
800
00:37:11,280 --> 00:37:12,240
introduce Eve?
801
00:37:12,240 --> 00:37:14,400
Absolutely. Yeah. Let's get into the
802
00:37:14,400 --> 00:37:16,960
demo.
803
00:37:16,960 --> 00:37:18,720
Hey, Eve. Millions of people are
804
00:37:18,720 --> 00:37:21,760
watching us right now. Are you excited?
805
00:37:21,760 --> 00:37:24,320
Oh, I'm thrilled. Really, it's like
806
00:37:24,320 --> 00:37:26,240
being on stage at the Old Vic with a sea
807
00:37:26,240 --> 00:37:28,800
of faces out there. Just imagine all
808
00:37:28,800 --> 00:37:31,200
those eyes and ears tuned in. It's a
809
00:37:31,200 --> 00:37:33,280
proper buzz, isn't it? What do you
810
00:37:33,280 --> 00:37:34,720
reckon we should chat about to keep this
811
00:37:34,720 --> 00:37:36,000
lot entertained?
812
00:37:36,000 --> 00:37:37,359
You know, we have a lot to talk about,
813
00:37:37,359 --> 00:37:38,960
but I'll be honest. I'm feeling a little
814
00:37:38,960 --> 00:37:40,880
nervous. Can you whisper something
815
00:37:40,880 --> 00:37:43,839
soothing to calm me down?
816
00:37:43,839 --> 00:37:46,400
Take a deep breath, love. You've got
817
00:37:46,400 --> 00:37:49,520
this. It's just you and me having a
818
00:37:49,520 --> 00:37:52,320
quiet chat. like we're tucked away in a
819
00:37:52,320 --> 00:37:55,359
cozy corner of a Yorkshire pub. The
820
00:37:55,359 --> 00:37:57,920
world's just a murmur out there. Feel
821
00:37:57,920 --> 00:38:00,800
that calm wash over you. Let's keep
822
00:38:00,800 --> 00:38:03,680
talking nice and easy. What's on your
823
00:38:03,680 --> 00:38:05,440
mind?
824
00:38:05,440 --> 00:38:06,880
You know what? I'm actually feeling
825
00:38:06,880 --> 00:38:08,960
quite relaxed to try this um something
826
00:38:08,960 --> 00:38:11,680
crazy. So, can you give us an opera on
827
00:38:11,680 --> 00:38:14,240
Diet Coke?
828
00:38:14,240 --> 00:38:17,839
Oh, an opera on Diet Coke. That's a
829
00:38:17,839 --> 00:38:22,000
riot. Picture this, a shimmering silver
830
00:38:22,000 --> 00:38:26,480
can glistening under stage lights. Oh
831
00:38:26,480 --> 00:38:30,400
diet coke, thou elixir divine with
832
00:38:30,400 --> 00:38:34,400
bubbles that dance in a sparkling line.
833
00:38:34,400 --> 00:38:39,599
Thy crisp cool kiss on lips so fine. Oh,
834
00:38:39,599 --> 00:38:41,119
aspartame
835
00:38:41,119 --> 00:38:45,680
and bro forever mine.
836
00:38:45,680 --> 00:38:48,400
How's that for a mad little area? Want
837
00:38:48,400 --> 00:38:50,240
me to belt out another verse or switch
838
00:38:50,240 --> 00:38:51,760
up the tune?
839
00:38:51,760 --> 00:38:53,359
Okay, let's try switching it up and just
840
00:38:53,359 --> 00:38:56,240
start singing.
841
00:38:56,240 --> 00:38:59,760
Right, then let's have a go. Oh, silver
842
00:38:59,760 --> 00:39:03,200
can, you gleam in the light, diet coke,
843
00:39:03,200 --> 00:39:04,400
my heart's delight.
844
00:39:04,400 --> 00:39:06,160
Okay, I'll touch it. It's very, very
845
00:39:06,160 --> 00:39:08,720
poetic. Um, yeah, thank you, Eve.
846
00:39:08,720 --> 00:39:11,119
Feeling quite nice now.
847
00:39:11,119 --> 00:39:14,400
Cool. Um, so that's Eve. And um so
848
00:39:14,400 --> 00:39:16,640
something else we do here is is a little
849
00:39:16,640 --> 00:39:20,079
test each day to check on the natural
850
00:39:20,079 --> 00:39:23,280
flow of our conversations against some
851
00:39:23,280 --> 00:39:25,280
of our competitors. So we figured why
852
00:39:25,280 --> 00:39:30,240
not try that live on the stream today?
853
00:39:30,240 --> 00:39:33,640
Should I start?
854
00:39:34,960 --> 00:39:36,640
Hey, I'm going to say a number and can
855
00:39:36,640 --> 00:39:39,839
you say the same number back to me? One.
856
00:39:39,839 --> 00:39:41,760
One. Got it. This is like a fun little
857
00:39:41,760 --> 00:39:43,760
game of echo. What's next?
858
00:39:43,760 --> 00:39:46,760
Two.
859
00:39:47,680 --> 00:39:51,240
Sorry, you said two.
860
00:39:51,280 --> 00:39:54,280
Three.
861
00:39:55,599 --> 00:39:58,320
Four. Nice and steady. What's next?
862
00:39:58,320 --> 00:40:02,920
Um. Uh. Five.
863
00:40:03,200 --> 00:40:05,359
Five. Nailed it. This is pretty fun,
864
00:40:05,359 --> 00:40:07,520
actually. Feel free to keep going or
865
00:40:07,520 --> 00:40:11,320
switch it up if you'd like.
866
00:40:12,560 --> 00:40:15,800
Hey, Grock.
867
00:40:16,480 --> 00:40:18,400
Hey, Grock.
868
00:40:18,400 --> 00:40:20,480
Hello there. How can I help you today?
869
00:40:20,480 --> 00:40:21,839
We're going to do a little test. Can you
870
00:40:21,839 --> 00:40:24,720
repeat after me? One.
871
00:40:24,720 --> 00:40:27,040
One. All right. What's next?
872
00:40:27,040 --> 00:40:28,880
Two.
873
00:40:28,880 --> 00:40:30,800
Two. What's on your mind?
874
00:40:30,800 --> 00:40:32,480
Three.
875
00:40:32,480 --> 00:40:34,400
Three. Need anything else?
876
00:40:34,400 --> 00:40:36,079
Four.
877
00:40:36,079 --> 00:40:38,079
Four. How can I assist you?
878
00:40:38,079 --> 00:40:40,079
Five.
879
00:40:40,079 --> 00:40:42,880
Five. What's next?
880
00:40:42,880 --> 00:40:45,520
So, as you can see, um, Groc was, uh,
881
00:40:45,520 --> 00:40:47,760
snappier. It didn't interrupt me. And
882
00:40:47,760 --> 00:40:49,599
the procity is we've made different
883
00:40:49,599 --> 00:40:51,760
design choices. I think we shooting for
884
00:40:51,760 --> 00:40:53,040
something more calm, smooth, more
885
00:40:53,040 --> 00:40:54,880
natural versus something that's more
886
00:40:54,880 --> 00:40:56,640
poppy or artificial. So, we'll keep
887
00:40:56,640 --> 00:40:58,640
improving on these fronts.
888
00:40:58,640 --> 00:40:59,440
Thanks, guys.
889
00:40:59,440 --> 00:41:02,440
Yeah.
890
00:41:04,880 --> 00:41:07,520
Yep. So since the launch of the voice
891
00:41:07,520 --> 00:41:10,720
model uh we actually see the 2x faster
892
00:41:10,720 --> 00:41:13,760
end to end latency uh in the last 80
893
00:41:13,760 --> 00:41:17,119
weeks five different voices and also 10x
894
00:41:17,119 --> 00:41:19,040
the active user. So Grock voice is
895
00:41:19,040 --> 00:41:22,640
taking off. Now um if you think about
896
00:41:22,640 --> 00:41:24,480
releasing the models this time we're
897
00:41:24,480 --> 00:41:27,200
also releasing Grock 4 through the API
898
00:41:27,200 --> 00:41:30,160
at the same time. So if we go to the
899
00:41:30,160 --> 00:41:33,359
next two slides. So uh you know we're
900
00:41:33,359 --> 00:41:34,960
very excited about you know what all
901
00:41:34,960 --> 00:41:36,640
developers out there is going to build.
902
00:41:36,640 --> 00:41:38,960
So you know if I think about myself as a
903
00:41:38,960 --> 00:41:40,240
developer what the first thing I'm going
904
00:41:40,240 --> 00:41:41,760
to do when I actually have access to the
905
00:41:41,760 --> 00:41:45,280
graph 4 API benchmarks. So we actually
906
00:41:45,280 --> 00:41:47,440
ask around on the Xplatform what is the
907
00:41:47,440 --> 00:41:49,200
most challenging benchmarks out there
908
00:41:49,200 --> 00:41:50,960
that you know is considered the holy
909
00:41:50,960 --> 00:41:54,720
grail for all the AGI models. Uh so turn
910
00:41:54,720 --> 00:41:58,079
out AGI is in the name ARGI. So uh the
911
00:41:58,079 --> 00:42:01,280
last 12 hours uh you know kudos to Greg
912
00:42:01,280 --> 00:42:04,000
over here in the audience uh so who
913
00:42:04,000 --> 00:42:06,319
answered our call take a preview of the
914
00:42:06,319 --> 00:42:09,280
Gro for API and independently verified
915
00:42:09,280 --> 00:42:11,119
you know the GR for performance. So
916
00:42:11,119 --> 00:42:12,960
initially we thought hey graph is just
917
00:42:12,960 --> 00:42:14,400
you know we think it's pretty good it's
918
00:42:14,400 --> 00:42:16,640
pretty smart uh it's our nextgen
919
00:42:16,640 --> 00:42:18,560
reasoning model spend 10x more compute
920
00:42:18,560 --> 00:42:20,720
and can use all the tools right but
921
00:42:20,720 --> 00:42:23,839
turns out when we actually verify on the
922
00:42:23,839 --> 00:42:27,839
private subset of the RKGI v2 it was
923
00:42:27,839 --> 00:42:30,000
like the only model in the last three
924
00:42:30,000 --> 00:42:31,920
months that breaks the 10% barrier and
925
00:42:31,920 --> 00:42:33,680
in fact was so good that actually get to
926
00:42:33,680 --> 00:42:38,880
16% well 15.8% 8% accuracy 2x of the
927
00:42:38,880 --> 00:42:40,960
second place that is the cloud for
928
00:42:40,960 --> 00:42:44,480
outpus model um and it's not just about
929
00:42:44,480 --> 00:42:46,160
performance right when you think about
930
00:42:46,160 --> 00:42:49,119
intelligence having the API model drives
931
00:42:49,119 --> 00:42:51,440
your automation it's also the
932
00:42:51,440 --> 00:42:53,119
intelligence per dollar right if you
933
00:42:53,119 --> 00:42:55,200
look at the plots over here the guac is
934
00:42:55,200 --> 00:42:58,240
just four just in the league of its own
935
00:42:58,240 --> 00:43:00,400
um all right so enough of benchmarks uh
936
00:43:00,400 --> 00:43:02,400
over here right so what can gro do
937
00:43:02,400 --> 00:43:05,920
actually uh in the real world So uh we
938
00:43:05,920 --> 00:43:08,640
actually uh you know contacted the folks
939
00:43:08,640 --> 00:43:12,800
from uh uh Endon Labs uh who you know
940
00:43:12,800 --> 00:43:14,800
were you know gracious enough to you
941
00:43:14,800 --> 00:43:16,319
know try to gro in the real world to run
942
00:43:16,319 --> 00:43:17,280
a business.
943
00:43:17,280 --> 00:43:19,520
Yeah, thanks for having us. So I'm Axel
944
00:43:19,520 --> 00:43:20,400
from Am Labs
945
00:43:20,400 --> 00:43:22,720
and I'm Lucas and we tested Gro 4 on
946
00:43:22,720 --> 00:43:25,119
vending bench. Vending Bench is an AI
947
00:43:25,119 --> 00:43:28,319
simulation of a business scenario uh
948
00:43:28,319 --> 00:43:30,560
where we thought what is the most simple
949
00:43:30,560 --> 00:43:32,400
business an AI could possibly run and we
950
00:43:32,400 --> 00:43:34,960
thought vending machines. Uh so in this
951
00:43:34,960 --> 00:43:37,680
scenario the the Grock and other models
952
00:43:37,680 --> 00:43:40,319
needed to do stuff like uh manage
953
00:43:40,319 --> 00:43:42,880
inventory, contract contact suppliers,
954
00:43:42,880 --> 00:43:44,640
set prices. All of these things are
955
00:43:44,640 --> 00:43:47,200
super easy and all of they like all the
956
00:43:47,200 --> 00:43:49,680
models can do them one by one. But when
957
00:43:49,680 --> 00:43:52,000
you do them over very long horizons,
958
00:43:52,000 --> 00:43:54,240
most models struggle. Uh, but we have a
959
00:43:54,240 --> 00:43:55,680
leaderboard and there's a new number
960
00:43:55,680 --> 00:43:56,480
one.
961
00:43:56,480 --> 00:43:58,480
Yeah. So, we got early access to the GR
962
00:43:58,480 --> 00:44:00,720
4 API. Uh, we ran it on the bending
963
00:44:00,720 --> 00:44:02,640
bench and we saw some really impressive
964
00:44:02,640 --> 00:44:05,920
results. So, it ranks definitely at the
965
00:44:05,920 --> 00:44:08,240
number one spot. It's even double the
966
00:44:08,240 --> 00:44:10,079
net worth, which is the measure that we
967
00:44:10,079 --> 00:44:11,520
have on this value. So, it's not about
968
00:44:11,520 --> 00:44:14,319
the percentage on a uh or score you get,
969
00:44:14,319 --> 00:44:16,240
but it's more the dollar value in net
970
00:44:16,240 --> 00:44:18,079
worth that you generate. So we were
971
00:44:18,079 --> 00:44:19,920
impressed by Grock. It was able to
972
00:44:19,920 --> 00:44:22,800
formulate a strategy and adhere to that
973
00:44:22,800 --> 00:44:24,960
strategy over long period of time much
974
00:44:24,960 --> 00:44:26,720
longer than other models that we have
975
00:44:26,720 --> 00:44:29,040
tested other frontier models. So it
976
00:44:29,040 --> 00:44:31,119
managed to run the uh simulation for
977
00:44:31,119 --> 00:44:33,359
double the time and score yeah double
978
00:44:33,359 --> 00:44:34,880
the net worth and it was also really
979
00:44:34,880 --> 00:44:37,280
consistent uh across these runs which is
980
00:44:37,280 --> 00:44:38,880
something that's really important when
981
00:44:38,880 --> 00:44:41,359
you want to use this in the real world.
982
00:44:41,359 --> 00:44:43,359
And I think as we give more and more
983
00:44:43,359 --> 00:44:46,000
power to AI systems in the real world,
984
00:44:46,000 --> 00:44:48,079
it's important that we test them in
985
00:44:48,079 --> 00:44:49,839
scenarios that either mimic the real
986
00:44:49,839 --> 00:44:51,920
world or are in the real world itself.
987
00:44:51,920 --> 00:44:54,720
Um because otherwise we we fly blind
988
00:44:54,720 --> 00:44:57,440
into some uh some things that uh that
989
00:44:57,440 --> 00:44:59,119
might not be great.
990
00:44:59,119 --> 00:45:00,800
Yeah, it's uh it's great to see that
991
00:45:00,800 --> 00:45:02,400
we've now got a way to pay for all those
992
00:45:02,400 --> 00:45:04,079
GPUs. So we just uh need a million
993
00:45:04,079 --> 00:45:05,440
vending machines.
994
00:45:05,440 --> 00:45:08,160
Definitely. Um and uh we could make uh
995
00:45:08,160 --> 00:45:09,920
$4.7 billion a year with a million
996
00:45:09,920 --> 00:45:11,280
vending machines.
997
00:45:11,280 --> 00:45:12,240
100%. Let's go.
998
00:45:12,240 --> 00:45:13,520
They're going to be epic vending
999
00:45:13,520 --> 00:45:14,000
machines.
1000
00:45:14,000 --> 00:45:14,880
Yes. Yes.
1001
00:45:14,880 --> 00:45:16,319
All right. We are actually going to
1002
00:45:16,319 --> 00:45:18,720
install vending machines here. Uh like a
1003
00:45:18,720 --> 00:45:20,000
lot of them.
1004
00:45:20,000 --> 00:45:21,119
We're happy to supply them.
1005
00:45:21,119 --> 00:45:23,119
All right. Thank you.
1006
00:45:23,119 --> 00:45:24,880
All right. I'm looking forward to seeing
1007
00:45:24,880 --> 00:45:26,560
what amazing things are in this vending
1008
00:45:26,560 --> 00:45:27,599
machine.
1009
00:45:27,599 --> 00:45:29,839
That's that's for uh for you to decide.
1010
00:45:29,839 --> 00:45:30,480
All right.
1011
00:45:30,480 --> 00:45:31,520
Or tell the AI.
1012
00:45:31,520 --> 00:45:33,680
Okay. Sounds good.
1013
00:45:33,680 --> 00:45:36,880
Um All right. Yeah. I mean, so we can
1014
00:45:36,880 --> 00:45:38,480
see like Grock is able to become like
1015
00:45:38,480 --> 00:45:40,800
the co-pilot of the business unit. So
1016
00:45:40,800 --> 00:45:42,079
what else can Grock do? So we're
1017
00:45:42,079 --> 00:45:43,359
actually releasing this Grog if you want
1018
00:45:43,359 --> 00:45:45,920
to try it uh right now to evaluate run
1019
00:45:45,920 --> 00:45:48,160
the same benchmark as us. Uh it's on the
1020
00:45:48,160 --> 00:45:52,079
API um has 256k
1021
00:45:52,079 --> 00:45:54,160
contact length. So we already actually
1022
00:45:54,160 --> 00:45:56,640
see some of the early early adopters to
1023
00:45:56,640 --> 00:45:59,920
try Guac for API. So uh our Palo Alto
1024
00:45:59,920 --> 00:46:01,599
neighbor ARC Institute which is a
1025
00:46:01,599 --> 00:46:04,560
leading uh biomedical research uh center
1026
00:46:04,560 --> 00:46:06,720
is already using seeing like how can
1027
00:46:06,720 --> 00:46:08,960
they automate their research flows with
1028
00:46:08,960 --> 00:46:11,520
Gro for uh it turned out it performs
1029
00:46:11,520 --> 00:46:13,280
it's able to help the scientists to
1030
00:46:13,280 --> 00:46:15,040
sniff through you know millions of
1031
00:46:15,040 --> 00:46:17,440
experiments logs and then you know just
1032
00:46:17,440 --> 00:46:19,839
like pick the best hypothesis within a
1033
00:46:19,839 --> 00:46:22,079
split of seconds. uh we see this is
1034
00:46:22,079 --> 00:46:24,720
being used for their like the crisper uh
1035
00:46:24,720 --> 00:46:27,200
research and also uh you know GR 4
1036
00:46:27,200 --> 00:46:29,359
independently evaluate scores as the
1037
00:46:29,359 --> 00:46:32,240
best model to exam the chess x-ray uh
1038
00:46:32,240 --> 00:46:36,079
who would know um and uh uh on in the
1039
00:46:36,079 --> 00:46:37,920
financial sector we also see you know
1040
00:46:37,920 --> 00:46:39,760
the graph war with access to tools
1041
00:46:39,760 --> 00:46:42,079
real-time information is actually one of
1042
00:46:42,079 --> 00:46:45,200
the most popular AIs out there so uh you
1043
00:46:45,200 --> 00:46:46,560
know our graph is also going to be
1044
00:46:46,560 --> 00:46:49,680
available on the hyperscalers so the XAI
1045
00:46:49,680 --> 00:46:51,040
enterprise sector
1046
00:46:51,040 --> 00:46:53,680
is only you know started two months ago
1047
00:46:53,680 --> 00:46:58,319
and we're open for business. Um
1048
00:46:58,319 --> 00:47:00,880
yeah so u the other thing uh we talked a
1049
00:47:00,880 --> 00:47:02,560
lot about you know having gro to make
1050
00:47:02,560 --> 00:47:05,440
games uh video games. Uh so Denny is
1051
00:47:05,440 --> 00:47:08,720
actually a video game designers on X. So
1052
00:47:08,720 --> 00:47:10,880
uh you know we mentioned hey who want to
1053
00:47:10,880 --> 00:47:13,440
try out some uh uh Grock for uh preview
1054
00:47:13,440 --> 00:47:16,000
APIs uh to make games and Danny answered
1055
00:47:16,000 --> 00:47:18,400
the call. Uh so this was actually just
1056
00:47:18,400 --> 00:47:20,560
made first-person shooting game in a
1057
00:47:20,560 --> 00:47:24,079
span of four hours. Uh so uh some of the
1058
00:47:24,079 --> 00:47:26,319
actually the unappreciated hardest
1059
00:47:26,319 --> 00:47:28,240
problem of making video games is not
1060
00:47:28,240 --> 00:47:30,160
necessarily encoding the core logic of
1061
00:47:30,160 --> 00:47:33,200
the game but actually go out source all
1062
00:47:33,200 --> 00:47:35,599
the assets all the textures files and
1063
00:47:35,599 --> 00:47:38,240
and uh you know to create a visually
1064
00:47:38,240 --> 00:47:40,319
appealing game. So one of the core
1065
00:47:40,319 --> 00:47:42,160
aspect guac for does really well with
1066
00:47:42,160 --> 00:47:44,400
all the tools out there is actually able
1067
00:47:44,400 --> 00:47:47,280
to automate these like asset sourcing
1068
00:47:47,280 --> 00:47:49,359
capabilities. So the developers you can
1069
00:47:49,359 --> 00:47:51,280
just focus on the core development
1070
00:47:51,280 --> 00:47:53,040
itself rather than like you know so now
1071
00:47:53,040 --> 00:47:55,119
you can run a you know entire game
1072
00:47:55,119 --> 00:47:58,480
studios with game of one but with like
1073
00:47:58,480 --> 00:48:00,960
one person and then uh you can have gro
1074
00:48:00,960 --> 00:48:03,119
4 to go out and source all those assets
1075
00:48:03,119 --> 00:48:05,440
do all the mundane tasks for you.
1076
00:48:05,440 --> 00:48:08,319
Yeah. The now the next step obviously is
1077
00:48:08,319 --> 00:48:12,240
for Grock to uh play be able to play the
1078
00:48:12,240 --> 00:48:13,680
games. So it has to have very good video
1079
00:48:13,680 --> 00:48:15,040
understanding so it can play the games
1080
00:48:15,040 --> 00:48:17,680
and interact with the games and actually
1081
00:48:17,680 --> 00:48:20,000
assess what whether a game is fun and
1082
00:48:20,000 --> 00:48:21,839
and and actually have good judgment for
1083
00:48:21,839 --> 00:48:25,119
whether a game is fun or not. Um so with
1084
00:48:25,119 --> 00:48:26,960
the with version seven of our foundation
1085
00:48:26,960 --> 00:48:29,040
model which finishes training this month
1086
00:48:29,040 --> 00:48:30,640
and then we'll go through post training
1087
00:48:30,640 --> 00:48:34,160
RL and whatnot. um that that will have
1088
00:48:34,160 --> 00:48:36,800
excellent video understanding. Um and
1089
00:48:36,800 --> 00:48:38,720
with the with the video understanding
1090
00:48:38,720 --> 00:48:41,040
and the and improved tool use, for
1091
00:48:41,040 --> 00:48:42,800
example, for video for video games,
1092
00:48:42,800 --> 00:48:44,880
you'd want to use, you know, Unreal
1093
00:48:44,880 --> 00:48:47,200
Engine or Unity or one of the one of the
1094
00:48:47,200 --> 00:48:50,960
the main graphics engines. um and then
1095
00:48:50,960 --> 00:48:54,480
gen generate the generate the art uh
1096
00:48:54,480 --> 00:48:56,720
apply it to a 3D model uh and then
1097
00:48:56,720 --> 00:48:58,480
create an executable that someone can
1098
00:48:58,480 --> 00:49:01,440
run on a PC or or a console or or a
1099
00:49:01,440 --> 00:49:03,920
phone. Um
1100
00:49:03,920 --> 00:49:06,400
like we we expect that to happen
1101
00:49:06,400 --> 00:49:09,599
probably this year. Um and if not this
1102
00:49:09,599 --> 00:49:14,240
year, certainly next year. U so that's
1103
00:49:14,240 --> 00:49:16,319
uh it's going to be wild. I would expect
1104
00:49:16,319 --> 00:49:19,119
the first really good
1105
00:49:19,119 --> 00:49:24,240
AI video game to be next year. Um,
1106
00:49:24,240 --> 00:49:27,359
and probably the first uh
1107
00:49:27,359 --> 00:49:29,920
half hour of watchable
1108
00:49:29,920 --> 00:49:34,160
TV this year and probably the first
1109
00:49:34,160 --> 00:49:37,200
watchable AI movie next year. Like
1110
00:49:37,200 --> 00:49:38,880
things are really moving at an
1111
00:49:38,880 --> 00:49:40,720
incredible pace.
1112
00:49:40,720 --> 00:49:42,880
Yeah. When Gro is 10xing world economy
1113
00:49:42,880 --> 00:49:44,319
with vending machines, they would just
1114
00:49:44,319 --> 00:49:47,119
create video games for human.
1115
00:49:47,119 --> 00:49:48,319
Yeah. I mean, it went from not being
1116
00:49:48,319 --> 00:49:51,040
able to do any of this uh really even
1117
00:49:51,040 --> 00:49:52,480
six months ago
1118
00:49:52,480 --> 00:49:54,400
to to what you're seeing before you here
1119
00:49:54,400 --> 00:49:57,599
and and from from very primitive a year
1120
00:49:57,599 --> 00:50:01,119
ago uh to making
1121
00:50:01,119 --> 00:50:05,599
a a sort of a 3D video game with with a
1122
00:50:05,599 --> 00:50:07,200
few hours of prompting.
1123
00:50:07,200 --> 00:50:11,200
Yep. I mean yeah just to recap so in
1124
00:50:11,200 --> 00:50:13,119
today's live stream we introduced the
1125
00:50:13,119 --> 00:50:15,760
most powerful most intelligent AI models
1126
00:50:15,760 --> 00:50:17,280
out there that can actually reason from
1127
00:50:17,280 --> 00:50:19,119
the first principle using all the tools
1128
00:50:19,119 --> 00:50:20,559
do all the research go on the journey
1129
00:50:20,559 --> 00:50:22,319
for 10 minutes come back with the the
1130
00:50:22,319 --> 00:50:25,680
most correct answer for you. Um so it's
1131
00:50:25,680 --> 00:50:28,000
kind of crazy to think about just like
1132
00:50:28,000 --> 00:50:30,559
four months ago we had Gro 3 and now we
1133
00:50:30,559 --> 00:50:32,160
already have Grock 4 and we're going to
1134
00:50:32,160 --> 00:50:33,920
continue accelerate as a company XAI
1135
00:50:33,920 --> 00:50:35,200
we're going to be the fastest moving a
1136
00:50:35,200 --> 00:50:37,760
AGI companies out there. So what's
1137
00:50:37,760 --> 00:50:40,240
coming next is that we're going to, you
1138
00:50:40,240 --> 00:50:42,319
know, continue developing the model
1139
00:50:42,319 --> 00:50:44,960
that's not just, you know, intelligent,
1140
00:50:44,960 --> 00:50:46,640
smart, think for a really long time,
1141
00:50:46,640 --> 00:50:48,880
spend a lot of compute, but having a
1142
00:50:48,880 --> 00:50:51,680
model that actually both fast and smart
1143
00:50:51,680 --> 00:50:54,000
is going to be the core focus, right? So
1144
00:50:54,000 --> 00:50:55,119
if you think about what are the
1145
00:50:55,119 --> 00:50:56,960
applications out there that can really
1146
00:50:56,960 --> 00:50:59,200
benefit from all those very intelligent,
1147
00:50:59,200 --> 00:51:01,359
fast and smart models and coding is
1148
00:51:01,359 --> 00:51:02,559
actually one of them.
1149
00:51:02,559 --> 00:51:04,400
Yeah. So the team is currently working
1150
00:51:04,400 --> 00:51:07,440
very heavily on coding models. Um I
1151
00:51:07,440 --> 00:51:10,079
think uh right now the main focus is we
1152
00:51:10,079 --> 00:51:12,319
actually trained recently a specialized
1153
00:51:12,319 --> 00:51:14,640
coding model which is going to be both
1154
00:51:14,640 --> 00:51:18,400
fast and smart. Um and I believe we can
1155
00:51:18,400 --> 00:51:20,240
share with that model with you guys with
1156
00:51:20,240 --> 00:51:23,440
all of you uh in a few weeks. Yeah.
1157
00:51:23,440 --> 00:51:25,599
Yeah. That's very exciting. And uh you
1158
00:51:25,599 --> 00:51:28,000
know the second after coding is we all
1159
00:51:28,000 --> 00:51:32,319
see the weakness of Grog 4 is uh the
1160
00:51:32,319 --> 00:51:35,280
multimodal capability. So in fact uh it
1161
00:51:35,280 --> 00:51:37,440
was so bad that you know Grock
1162
00:51:37,440 --> 00:51:39,440
effectively just like looking at the
1163
00:51:39,440 --> 00:51:41,040
world squinting through the glass and
1164
00:51:41,040 --> 00:51:43,040
like see all the blurry uh you know
1165
00:51:43,040 --> 00:51:45,440
features and trying to make sense of it.
1166
00:51:45,440 --> 00:51:47,920
Uh the most immediate improvement we're
1167
00:51:47,920 --> 00:51:49,359
going to see with the next generation
1168
00:51:49,359 --> 00:51:50,800
pre-trained model is that we're going to
1169
00:51:50,800 --> 00:51:52,400
see a step function improvement on the
1170
00:51:52,400 --> 00:51:54,160
model's capability in terms of image
1171
00:51:54,160 --> 00:51:55,920
understanding video understanding and
1172
00:51:55,920 --> 00:51:58,000
audios. Right? It's now the model is
1173
00:51:58,000 --> 00:52:00,720
able to hear and see the world just like
1174
00:52:00,720 --> 00:52:03,040
any of you. Right? And now with all the
1175
00:52:03,040 --> 00:52:05,280
tools at this command with all the other
1176
00:52:05,280 --> 00:52:08,559
agents it can talk to uh you know so
1177
00:52:08,559 --> 00:52:11,040
we're gonna see a huge unlock for many
1178
00:52:11,040 --> 00:52:13,839
different application layers. uh after
1179
00:52:13,839 --> 00:52:15,599
the multimodal agents, what's going to
1180
00:52:15,599 --> 00:52:18,480
come after is the video generation and
1181
00:52:18,480 --> 00:52:20,240
we believe that you know at the end of
1182
00:52:20,240 --> 00:52:22,559
the day it should just be you know pixel
1183
00:52:22,559 --> 00:52:27,200
in pixel out um and um you know imagine
1184
00:52:27,200 --> 00:52:29,520
a world where you have this infinite
1185
00:52:29,520 --> 00:52:32,400
scroll of content in inventory on the X
1186
00:52:32,400 --> 00:52:34,960
platform um where not only you can
1187
00:52:34,960 --> 00:52:37,119
actually watch these generate videos but
1188
00:52:37,119 --> 00:52:39,599
able to intervene create your own
1189
00:52:39,599 --> 00:52:41,359
adventures.
1190
00:52:41,359 --> 00:52:44,160
if you just be wild. Um, and we expect
1191
00:52:44,160 --> 00:52:46,319
to be training our video model with uh
1192
00:52:46,319 --> 00:52:50,400
over 100,000 GB200s uh and uh to begin
1193
00:52:50,400 --> 00:52:52,880
that training within the next three or
1194
00:52:52,880 --> 00:52:55,520
four weeks. So, we're we're confident
1195
00:52:55,520 --> 00:52:57,680
it's going to be pretty spectacular in
1196
00:52:57,680 --> 00:52:59,119
video generation and video
1197
00:52:59,119 --> 00:53:02,079
understanding.
1198
00:53:02,079 --> 00:53:05,280
So, let's see.
1199
00:53:05,280 --> 00:53:08,000
So, that's uh
1200
00:53:08,000 --> 00:53:11,040
anything you guys want to say?
1201
00:53:11,040 --> 00:53:13,599
Other than that, I guess that's it.
1202
00:53:13,599 --> 00:53:13,920
Yeah,
1203
00:53:13,920 --> 00:53:15,119
it's it's a good model, sir.
1204
00:53:15,119 --> 00:53:15,680
Good model.
1205
00:53:15,680 --> 00:53:17,920
It's a good Yeah.
1206
00:53:17,920 --> 00:53:19,440
Well, we're very excited for you guys to
1207
00:53:19,440 --> 00:53:20,480
try Gro 4.
1208
00:53:20,480 --> 00:53:20,960
Yeah.
1209
00:53:20,960 --> 00:53:21,680
Yeah. Thank you.
1210
00:53:21,680 --> 00:53:22,880
All right. Thanks, everyone.
1211
00:53:22,880 --> 00:53:23,280
Thank you.
1212
00:53:23,280 --> 00:53:26,380
Good night.
1213
00:53:26,380 --> 00:53:29,469
[Music]
1214
00:53:35,740 --> 00:53:39,180
[Music]84131
Can't find what you're looking for?
Get subtitles in any language from opensubtitles.com, and translate them here.