2024-02-19

OpenAI has introduced Sora

A new AI model from the research lab OpenAI can generate disturbingly convincing synthetic videos from simple text descriptions. Dubbed Sora, the technology hints at a future where producing photorealistic media becomes trivial, for better or worse.

Bringing Text Prompts to Life

Feed Sora a written scene like “a woman walks down a rainy Tokyo street wearing a red dress and leather jacket, illuminated by neon signs” and it produces a full video lasting up to a minute. The results showcase fluid motion, environmental detail, skin imperfections, even reflections of light sources. Many viewers would likely never guess the footage was artificially created.

Sora represents a massive leap over earlier text-to-video models, which produced only crude, jerky clips a few seconds long at most. It builds on OpenAI’s DALL-E system, known for generating striking images from text prompts. Engineers adapted DALL-E’s underlying “diffusion” technology, which gradually transforms random noise into finished pictures, to handle video frames instead.

Additional neural network components help Sora understand relationships between objects and characters to realistically depict their interactions. This understanding of basic physics and causality remains a key challenge for AI.

Guarding Against Misuse

Despite its impressive capabilities, OpenAI says Sora still struggles with cause and effect, such as showing a bite mark on a cookie after a character takes a bite. Spatial awareness also proves tricky. The team plans further development before allowing public access.

However, risks of misuse already loom large. OpenAI researcher Tim Brooks acknowledged synthetic videos could enable new forms of misinformation like fabricated war footage or speeches by public figures. We’ve already witnessed the havoc basic “deepfakes” can wreak.

Hoping to mitigate these hazards, OpenAI is implementing filters that block requests involving violence, hate speech, celebrity likenesses, or other imagery that violates its policies. Additional protocols will trace video origins and audit for security flaws, applying lessons from DALL-E’s release. Nonetheless, the arms race against media manipulation continues to accelerate.

Promise and Peril Collide

At the frontier, promise and peril often collide. Like other generative AI categories, text-to-video synthesis holds enormous positive potential too. Creative industries and consumers alike stand to benefit. But unchecked proliferation without safeguards would spell societal disaster.

For now, OpenAI treads cautiously given the power in its hands. Striking the right balance between enabling beneficial uses and blocking harms remains a work in progress across much of AI. How the story unfolds as these exponential technologies continue advancing is anyone’s guess. But our collective choices today will surely reshape tomorrow.
