Ugh, the video on the bottom is the more interesting one. Seems like the video AI has the same problems as the picture AI: it still can’t correctly model the world.
The guy in brown disappears for some reason. TBF, that can happen irl too. Theoretically.
GPT-4 needing several jumps to be even close to rat intelligence is a wild take. I just can’t see how someone can reach that conclusion after spending any significant amount of time using GPT-4.
ChatGPT is clearly much smarter and much dumber than a rat at the same time.
But then. So is a chess program.
I’m not really sure how to translate AI advances into high-jumping.
FWIW, the mind blowing thing about the Sora examples is the object stability and the shadows and reflections, which strongly suggest the model is building some kind of world model rather than just locally extrapolating. I don’t know where that puts Sora’s intelligence in comparison to different invertebrates (maybe smarter than parasitic flatworms but below the annelids?), but it seems like a big deal to me, especially because humans learn their physical world models visually, not through text.
This kind of discussion is what’s interesting:
https://x.com/raphaelmilliere/status/1758685128002601293?s=61&t=CwVKdl7e5GoYqphDmQHrPg
Humans learn about the physical world by interacting with it visually, but also in other ways. Because the AIs learn from video, any “world simulations” they’re building are video-world simulations, not physical-world simulations. It’s not hard to find threads skeptical of OpenAI’s claims.
https://x.com/ChombaBupe/status/1758554389038395610?s=20
I recalled something about Einstein’s philosophy of science that I wanted to reference, but when I looked, I was told no, it’s more complicated than that. I was staring into a very deep rabbit hole, so I backed away. I don’t think the people who are serious about this stuff can afford to be so lazy.
I really don’t know what to tell you man. If you’re interested in AI, you should check out Sora, because it’s a huge advance over the prior state-of-the-art, and it’s way more capable than most people expected to see in video generation for quite some time. If you don’t find AI research worthwhile, probably find something else to do.
Personally, I think AI research is cool and find Sora pretty interesting!
That’s fine. I agree it’s cool. I have found it useful, at times. I may even explore some simple code at some point. I’m just wary of the hype and don’t think there’s good reason to believe it’s modeling the real world in anything like the way humans do. I’m open to changing my mind.
Can someone tell me the best AI to sign up for if I want to make pictures by telling it what to draw? How about one to generate photos of people that look like real photos taken with a camera?
@Lawnmower_Man , this sounds like it might be in your wheelhouse?
I think it’s a more complex matter.
So we clearly do create models of the world, but I also think a huge amount of what we do is similar to neural-net brute force.
I.e., when I’m catching a ball, it’s more some complex iteration of “if I twitch this muscle in this circumstance I’ll catch it” than F = ma, gravity, wind resistance, etc.
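The contrast can be sketched in code (a toy illustration; the class and function names are mine, not from any real system): an explicit physics model computes a projectile’s landing point from an equation, while a “brute force” learner just memorizes examples and answers new queries by lookup, with no equations involved.

```python
import math

# Explicit physics model: landing distance of a projectile launched
# at speed v (m/s) and angle theta (radians), ignoring air resistance.
def landing_distance_physics(v, theta, g=9.81):
    return v * v * math.sin(2 * theta) / g

# "Brute force" alternative: memorize (input, output) examples and
# answer new queries by nearest-neighbor lookup -- no physics inside.
class NearestNeighborCatcher:
    def __init__(self):
        self.examples = []  # list of ((v, theta), distance)

    def observe(self, v, theta, distance):
        self.examples.append(((v, theta), distance))

    def predict(self, v, theta):
        # Return the outcome of the most similar remembered throw.
        return min(self.examples,
                   key=lambda ex: (ex[0][0] - v) ** 2 + (ex[0][1] - theta) ** 2)[1]

# Let the lookup model "experience" a grid of throws...
catcher = NearestNeighborCatcher()
for v in range(5, 31):            # launch speeds 5..30 m/s
    for deg in range(10, 81, 5):  # launch angles 10..80 degrees
        theta = math.radians(deg)
        catcher.observe(float(v), theta, landing_distance_physics(v, theta))

# ...and both approaches give similar answers, by very different means.
v, theta = 17.0, math.radians(42)
print(landing_distance_physics(v, theta))  # exact, from the equation
print(catcher.predict(v, theta))           # interpolated from experience
```

The point of the sketch: the nearest-neighbor version gets close to the right answer without ever representing F = ma, which is roughly the sense in which a learned mapping can behave competently without containing anything we’d recognize as a physical model.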
Midjourney
EDIT: Dall-E 3 is included with GPT-4, which you should probably have a subscription to anyway, but I think it’s deliberately gimped at photorealism. If you have a good GPU, you could also just download Stable Diffusion and run it locally, but that’s substantially more work.
EDIT2: I think Dall-E is available free as Copilot Image Creator, so maybe try that first and see if it suits your requirements.
So, being unqualified to address it, I’d rather not. I’d immediately get bogged down in words. Computer scientists and physicists don’t always bother defining terms and on the other hand, philosophers are endlessly debating definitions.
Like, what is a “model”? I might say something like, “It’s a simplified but useful representation of the world.” But what is the world? What kinds of things are there in the world and what interactions occur? What concepts do we need to describe what happens in the world? Are these unique? Is what we call the world even real? And what do I mean by representation? How are such representations created? Where in my brain or a computer do they reside? How do they get in there? What are the operational details? And what does it mean to be useful? Trying to answer these questions just brings up more and more questions.
The CS people are making assumptions about all these things without necessarily being explicit about what those assumptions are. Their job is to produce AI that does worthwhile things, and there’s a pot of money waiting for those who do, so why bother thinking carefully about this other stuff? At some point, you have to cut off debate to actually do things. But of course, the philosophers are going to keep being a pain in the ass. It’s an old conflict. I’m not taking a side, just expressing frustration.
Midjourney, if you’re willing to buy a month of it at $30. I’m assuming that’s the pricing; I haven’t looked at it in a while. The ones charging per prompt were useless to me b/c I was firing off hundreds of them per day.
What computer people are saying is more well-defined than you’re giving them credit for. In the context of a model generating video, the question is: when the model is trying to predict what the video looks like, does it first predict what things will be in the image and what properties they have, then use that intermediate prediction to generate the video, or does it only consider lower-level features of the image itself (e.g., by starting with some memorized images that correspond to words in the prompt, then smoothing everything out)?
I’m deliberately trying not to get that far into the details. I just don’t know enough about the subject. I’m sure the CS guys know the stuff they’re experts at but suspect they are out of their depth on some of the broader issues.
If you limit the question as you do, you believe the first possibility is the case, and that means the AI has constructed a model of the world, is that right? My guess is the second possibility is true. But even if you’re right, I think this amounts to casting shadows on the cave wall and is far from an understanding of what is going on outside the cave.
I have a friend who is a philosophy professor whose entire career is about the nature of models. She’s mainly focused on meteorology, where she has an advanced degree.
I don’t think the CS people know anything about thought or intelligence. They are just trying to fine tune input-output relationships to the best of their ability. It makes for a great dog and pony show, however.
It’s like behaviorism in psychology: don’t think about the cognitive; objective behavior is the only thing that matters. Yeah, that research program produced about a million college degrees and a handful of useful results in understanding the world.
Re: behaviorism: The tic-tac-toe playing chicken in Chinatown was trained by students of B. F. Skinner. So there’s that.
Thanks for the recs. I’m assuming that Dall-E and Midjourney can do both of the things that I mentioned.
Here’s another question: Can these AI programs basically photoshop something by me telling them what I want? For example, something like these:
-take this photo and make it daytime instead of night
-take the guy on the left and put him on the right
-take the face of the middle person and turn it into a headshot for a website
Also, are there any other good free options besides the Copilot Image Creator version of Dall-E? If Midjourney is the best, I might pony up for that and just pay for it on and off as needed.
No, you’ll be sorely disappointed, unless there was some radical advancement in the technology recently that I’m not aware of. Not sure exactly what your objective is, but this site can generate AI headshots.
https://thispersondoesnotexist.com/