From toy to tool: DALL-E 3 is a wake-up call for visual artists—and the rest of us

TM101

All Star
art isn't a necessity & your attachment to it is just a learned habit

you should not send yourself into an emotional crisis over entertainment

whether sourced from natural intelligence or artificial intelligence, enjoy it the same

*

Paintings used to be the highest form of art. Like, those big ass murals on chapel ceilings. That was the IMAX of the time.

Photography reallllllly fukced up the painting game. Painters could still do fancy things, but when it comes to reproducing something you can see with your own eyes and want to capture, such as a portrait, what a far-off city looks like, or an undiscovered spider you happen across, photography kills the need to try to draw something you wanna "remember". So about 90% of the reason to sketch something out as a record is gone. Especially with the cell phone cam.

As we develop new tech, computer animation replaces hand-drawn frames.

I CANNOT understand why paintings are still being made. And I was an art minor in college. I've sold artwork.

An art gallery or museum is the most boring place on Earth to me. Even if the artist can draw something that looks like a photo, I don't care. I've seen 38289 artists that can do that.

Of course movies are our thing now. Art has reached its highest form to date in cinema. There's visual design. Choreography. Architecture. Fashion. Music. Sound design. Acting. Basically every form of art goes into a movie.

The AI thing is replacing "graphic design" in a way. If you need a quick image to communicate to others, you can throw it in the AI and make... like a flyer for the company Christmas party. Or a greeting card. Or something kinda useless that would have taken a human a couple hours to create. The AI does it in moments.

I believe the human touch will still be valuable. Cause one thing the AI cannot do is CREATE. AI literally gathers all images of "pickles" and "plates" on the whole web (or whatever it draws from), then gives you a VERY good amalgamation... average... composite... of the existing information.

The human has to be the force that walked around taking pictures of pickles and plates and uploading them to the Internet in the first place.

The AI will never generate the source material. It can only do a highly advanced form of photocopy.
 

bnew

Veteran


(four image attachments)
 

Amestafuu (Emeritus)

Veteran
Supporter
They are killing art and everything that sparks the brain, shyt is sad
Who's "they"? The inventors of this tech did not have that in mind; that is just a consequence.

Fruity Loops and a laptop did that to music studios a while ago. It's an inevitable part of technology evolving in our time. We make use of it while crying about what it does to us.

If we as consumers don't want to be discerning in what we watch, buy, and consume, then nobody will stop the inevitable. That's the reality.
 

TL15

Veteran
I honestly hope in our lifetime that AI gets so good that you can just type in a reference and they’ll create a whole show/movie/video game off that :wow:
 

bnew

Veteran
I honestly hope in our lifetime that AI gets so good that you can just type in a reference and they’ll create a whole show/movie/video game off that :wow:

1-3 years. All the pieces are there already but haven't been made into a singular solution. AI agents will likely use one or more LLMs to do so: give them each a role like writer A, writer B, sound/music engineer or producer, etc., and add an evaluator for dialogue and a scene evaluator. It's gonna take a lot of compute as the technology stands now, but if the compute can be reduced it's likely very possible.

edit:
once they can create images consistently based on one or more characters, comics might get a resurgence of new audiences.
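
Something like that role-plus-evaluator setup could be sketched in a few lines of Python. The call_llm stub below is purely hypothetical (no real model or API is being called); it just shows the shape of the pipeline being described:

def call_llm(role, prompt):
    # Hypothetical stub: a real system would route this to one or more LLMs.
    return f"[{role} draft for: {prompt}]"

def generate_episode(premise):
    # Give each agent a role, then run separate evaluators over the drafts.
    roles = ["writer A", "writer B", "sound/music producer"]
    drafts = {role: call_llm(role, f"Draft a scene for: {premise}") for role in roles}
    combined = "\n".join(drafts.values())
    return {
        "drafts": drafts,
        "dialogue_review": call_llm("dialogue evaluator", combined),
        "scene_review": call_llm("scene evaluator", combined),
    }

print(generate_episode("a heist set in a floating city"))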
 

TL15

Veteran
1-3 years. All the pieces are there already but haven't been made into a singular solution. AI agents will likely use one or more LLMs to do so: give them each a role like writer A, writer B, sound/music engineer or producer, etc., and add an evaluator for dialogue and a scene evaluator. It's gonna take a lot of compute as the technology stands now, but if the compute can be reduced it's likely very possible.

edit:
once they can create images consistently based on one or more characters, comics might get a resurgence of new audiences.

Imagine in 20 years having a gaming service where you can input a scenario and it can randomly generate levels to play

Or

Saying that you want to watch a randomly generated episode of XYZ TV show and it can generate it for you.

The potential entertainment capabilities are endless :whew:
 

Buddy

Keep my name out of it
I can't tell if this is supposed to be an actual article or if the dude just wanted to show off all his AI art :what:
 

RageKage

All Star
They aren't. The people that will collect AI are broke people. AI art is a shytty investment cause it's not "real"; it's a step below buying prints of an artwork. And prints are made because 9 times out of 10 you can't afford the original work.

I agree, but the business of art takes all forms, as you know, and the majority of it is not in collectible work but in commercial art, which is what AI is perfectly suited to fill (IMO).

This is just one of many areas where automation and AI are competing with us analog humans, and we are not going to win that fight.

The optimist hopes this may lead to a Star Trek future



:blessed:

The pessimist believes it looks more like this

(Terminator image)



:lupe:
 

bnew

Veteran

How much detail is too much? Midjourney v6 attempts to find out​

As Midjourney rolls out new features, it continues to make some artists furious.​

BENJ EDWARDS - 1/5/2024, 1:41 PM

An AI-generated image of a "Beautiful queen of the universe looking at the camera in sci-fi armor, snow and particles flowing, fire in the background" created using alpha Midjourney v6.

Midjourney


In December, just before Christmas, Midjourney launched an alpha version of its latest image synthesis model, Midjourney v6. Over winter break, Midjourney fans put the new AI model through its paces, with the results shared on social media. So far, fans have noted much more detail than v5.2 (the current default) and a different approach to prompting. Version 6 can also handle generating text in a rudimentary way, but it's far from perfect.


"It's definitely a crazy update, both in good and less good ways," artist Julie Wieland, who frequently shares her Midjourney creations online, told Ars. "The details and scenery are INSANE, the downside (for now) are that the generations are very high contrast and overly saturated (imo). Plus you need to kind of re-adapt and rethink your prompts, working with new structures and now less is kind of more in terms of prompting."

At the same time, critics of the service still bristle about Midjourney training its models using human-made artwork scraped from the web and obtained without permission—a controversial practice common among AI model trainers we have covered in detail in the past. We've also covered the challenges artists might face in the future from these technologies elsewhere.

Too much detail?​

With AI-generated detail ramping up dramatically between major Midjourney versions, one could wonder whether there is ever such a thing as "too much detail" in an AI-generated image. Midjourney v6 seems to be testing that very question, creating many images that sometimes seem more detailed than reality in an unrealistic way, although that can be modified with careful prompting.


In our testing of version 6 (which can currently be invoked with the "--v 6.0" argument at the end of a prompt), we noticed times when the new model appeared to produce worse results than v5.2, but Midjourney veterans like Wieland tell Ars that those differences are largely due to the different way that v6.0 interprets prompts. That is something Midjourney is continuously updating over time. "Old prompts sometimes work a bit better than the day they released it," Wieland told us.
 

bnew

Veteran
A comparison between output from Midjourney versions (from left to right: v3, v4, v5, v5.2, v6) with the prompt "a muscular barbarian with weapons beside a CRT television set, cinematic, 8K, studio lighting."

Midjourney

We submitted Version 6 to our usual battery of image synthesis tests: barbarians with CRTs, cats holding cans of beer, plates of pickles, and Abraham Lincoln. Results felt a lot like Midjourney 5.2 but with more intricate detail. Compared to other AI image synthesis models available, Midjourney still seems to be the photorealism champion, although DALL-E 3 and fine-tuned versions of Stable Diffusion XL aren't far behind.

Compared with DALL-E 3, Midjourney v6 arguably bests its photorealism but falls behind in the prompt fidelity category. And yet v6 is notably more capable than v5.2 at handling descriptive prompts. "Version 6 is a bit more 'natural language,' less keywords and the usual prompt mechanics," says Wieland.

An AI-generated comparison of Abraham Lincoln using a computer at his desk, using DALL-E 3 (left) and Midjourney v6 (right).

OpenAI, Midjourney

In an announcement on the Midjourney Discord, Midjourney creator David Holz described changes to v6:

• Much more accurate prompt following, as well as longer prompts
• Improved coherence and model knowledge
• Improved image prompting and remix
• Minor text drawing ability (you must write your text in "quotations"; --style raw or lower --stylize values may help), e.g.:
  /imagine a photo of the text "Hello World!" written with a marker on a sticky note --ar 16:9 --v 6
• Improved upscalers, with both 'subtle' and 'creative' modes (increases resolution by 2x); you'll see buttons for these under your images after clicking U1/U2/U3/U4

Style and prompting for V6

• Prompting with V6 is significantly different than V5. You will need to 'relearn' how to prompt.
• V6 is MUCH more sensitive to your prompt. Avoid 'junk' like "award winning, photorealistic, 4k, 8k".
• Be explicit about what you want. It may be less vibey, but if you are explicit it's now MUCH better at understanding you.
• If you want something more photographic / less opinionated / more literal, you should probably default to using --style raw.
• Lower values of --stylize (default 100) may have better prompt understanding, while higher values (up to 1000) may have better aesthetics.
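
Putting those flags together, an illustrative prompt (my own example, not one from the announcement) might look like:

/imagine a product photo of a ceramic mug on a wooden desk, soft window light --style raw --stylize 50 --ar 3:2 --v 6.0

Here --style raw aims for a more literal, photographic result, and the low --stylize value trades some aesthetic polish for closer prompt adherence, per the guidance above.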

Midjourney v6 is still a work in progress, with Holz announcing that things will change rapidly over the coming months. "DO NOT rely on this exact model being available in the future," he wrote. "It will significantly change as we take V6 to full release." As far as the current limitations go, Wieland says, "I try to keep in mind that this is just v6 alpha and they will do updates without announcements and it kind of feels, like they already did a few updates."

Midjourney is also working on a web interface that will be an alternative to (and potentially a replacement of) the current Discord-only interface. The new interface is expected to widen Midjourney's audience by making it more accessible.

An unresolved controversy​


Despite these technical advancements, Midjourney remains highly polarizing and controversial for some people. At the turn of this new year, viral threads emerged on social media from frequent AI art foes, criticizing the service anew. The posts shared screenshots of early conversations among Midjourney developers discussing how the technology could simulate many existing artists' styles. They included lists of artists and styles in the Midjourney training dataset that were revealed in November during discovery in a copyright lawsuit against Midjourney.

Some companies producing AI synthesis models, such as Adobe, seek to avoid these issues by training their models only on licensed images. But Midjourney's strength arguably comes from its ability to play fast and loose with intellectual property. It's undeniably cheaper to grab training data for free online than to license hundreds of millions of images. Until the legality of that kind of scraping is resolved in the US—or Midjourney adopts a different training approach—no matter how detailed or capable Midjourney gets, its ethics will continue to be debated.
 

bnew

Veteran

Nightshade, the free tool that ‘poisons’ AI models, is now available for artists to use​

Carl Franzen @carlfranzen

January 19, 2024 3:03 PM

A hand pours a bottle of glowing purple liquid onto the keyboard of a vintage desktop PC displaying a pixelated purple skull-and-crossbones logo amid flickering lines of static.

Credit: VentureBeat made with OpenAI DALL-E 3 via ChatGPT Plus

It’s here: months after it was first announced, Nightshade, a new, free software tool allowing artists to “poison” AI models seeking to train on their works, is now available for artists to download and use on any artworks they see fit.

Developed by computer scientists on the Glaze Project at the University of Chicago under Professor Ben Zhao, the tool essentially works by turning AI against AI. It makes use of the popular open-source machine learning framework PyTorch to identify what’s in a given image, then applies a tag that subtly alters the image at the pixel level so other AI programs see something totally different than what’s actually there.

It’s the second such tool from the team: nearly one year ago, it unveiled Glaze, a separate program designed to alter digital artwork at a user’s behest to confuse AI training algorithms into thinking the image has a different style than what is actually present (such as different colors and brush strokes than are really there).

But whereas the Chicago team designed Glaze to be a defensive tool — and still recommends artists use it in addition to Nightshade to prevent an artist’s style from being imitated by AI models — Nightshade is designed to be “an offensive tool.”

An AI model that ended up training on many images altered or “shaded” with Nightshade would likely erroneously categorize objects going forward for all users of that model, even in images that had not been shaded with Nightshade.

“For example, human eyes might see a shaded image of a cow in a green field largely unchanged, but an AI model might see a large leather purse lying in the grass,” the team further explains.

Therefore, an AI model trained on images of a cow shaded to look like a purse would start to generate purses instead of cows, even when the user asked for the model to make a picture of a cow.
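
The article doesn't disclose Nightshade's actual algorithm, but the idea it describes (nudging pixels within a small budget so a model perceives a different concept, e.g. a purse instead of a cow) is closely related to targeted adversarial perturbations. Below is a minimal conceptual sketch in PyTorch; the ResNet classifier stand-in, the shade() helper, the step sizes, and the class index are my illustrative assumptions, not the Glaze team's code, and real poisoning attacks on generative models are considerably more involved:

import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

# Stand-in "perception" model for illustration; Nightshade targets the training
# pipelines of generative models, not this classifier.
classifier = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
for p in classifier.parameters():
    p.requires_grad_(False)
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

def shade(image, target_class, steps=100, eps=8 / 255, step_size=1 / 255):
    # Nudge `image` ([1, 3, H, W] in [0, 1]) within an eps budget so the
    # classifier leans toward `target_class` while the change stays subtle.
    delta = torch.zeros_like(image, requires_grad=True)
    target = torch.tensor([target_class])
    for _ in range(steps):
        logits = classifier(normalize((image + delta).clamp(0, 1)))
        loss = F.cross_entropy(logits, target)  # low loss = "looks like" the target
        loss.backward()
        with torch.no_grad():
            delta -= step_size * delta.grad.sign()  # step toward the target concept
            delta.clamp_(-eps, eps)                 # keep the perturbation small
            delta.grad.zero_()
    return (image + delta).clamp(0, 1).detach()

img = transforms.functional.to_tensor(
    Image.open("cow.jpg").convert("RGB").resize((224, 224))).unsqueeze(0)
poisoned = shade(img, target_class=748)  # hypothetical "purse-like" ImageNet index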



Requirements and how Nightshade works​

Artists seeking to use Nightshade must have a Mac with Apple chips inside (M1, M2 or M3) or a PC running Windows 10 or 11. The tool can be downloaded for both OSes here. The Windows file also is capable of running on a PC’s GPU, provided it is one from Nvidia on this list of supported hardware.

Some users have also reported long download times due to the overwhelming demand for the tool — as long as eight hours in some cases (the two versions are 255MB and 2.6GB in size for Mac and PC, respectively).


Screenshot of comment on Glaze/Nightshade Project Instagram account. Credit: VentureBeat

Users must also agree to the Glaze/Nightshade team’s end-user license agreement (EULA), which stipulates they use the tool on machines under their control and don’t modify the underlying source code, nor “Reproduce, copy, distribute, resell or otherwise use the Software for any commercial purpose.”

Nightshade v1.0 “transforms images into ‘poison’ samples, so that [AI] models training on them without consent will see their models learn unpredictable behaviors that deviate from expected norms, e.g. a prompt that asks for an image of a cow flying in space might instead get an image of a handbag floating in space,” states a blog post from the development team on its website.

That is, by using Nightshade v1.0 to “shade” an image, the image will be transformed into a new version thanks to open-source AI libraries — ideally subtly enough so that it doesn’t look much different to the human eye, but so that it appears to contain totally different subjects to any AI models training on it.

In addition, the tool is resilient to most of the typical transformations and alterations a user or viewer might make to an image. As the team explains:

“You can crop it, resample it, compress it, smooth out pixels, or add noise, and the effects of the poison will remain. You can take screenshots, or even photos of an image displayed on a monitor, and the shade effects remain. Again, this is because it is not a watermark or hidden message (steganography), and it is not brittle.”



Applause and condemnation​

While some artists have rushed to download Nightshade v1.0 and are already making use of it — among them Kelly McKernan, one of the former lead artist plaintiffs in the ongoing class-action copyright infringement lawsuit against AI art and video generator companies Midjourney, DeviantArt, Runway, and Stability AI — some web users have complained about it, suggesting it is tantamount to a cyberattack on AI models and companies. (VentureBeat uses Midjourney and other AI image generators to create article header artwork.)

The Glaze/Nightshade team, for its part, denies it is seeking destructive ends, writing: “Nightshade’s goal is not to break models, but to increase the cost of training on unlicensed data, such that licensing images from their creators becomes a viable alternative.”

In other words, the creators are seeking to make it so that AI model developers must pay artists to train on data from them that is uncorrupted.



The latest front in the fast-moving fight over data scraping​

How did we get here? It all comes down to how AI image generators have been trained: by scraping data from across the web, including original artworks posted by artists who had no prior express knowledge of or decision-making power over the practice, and who say the resulting AI models trained on their works threaten their livelihoods by competing with them.

As VentureBeat has reported, data scraping involves letting simple programs called “bots” scour the internet and copy and transform data from public facing websites into other formats that are helpful to the person or entity doing the scraping.

It’s been a common practice on the internet and used frequently prior to the advent of generative AI, and is roughly the same technique used by Google and Bing to crawl and index websites in search results.

But it has come under new scrutiny from artists, authors, and creatives who object to their work being used without their express permission to train commercial AI models that may compete with or replace their work product.

AI model makers defend the practice as not only necessary to train their creations, but as lawful under “fair use,” the legal doctrine in the U.S. that states prior work may be used in new work if it is transformed and used for a new purpose.

Though AI companies such as OpenAI have introduced “opt-out” code that objectors can add to their websites to avoid being scraped for AI training, the Glaze/Nightshade team notes that “Opt-out lists have been disregarded by model trainers in the past, and can be easily ignored with zero consequences. They are unverifiable and unenforceable, and those who violate opt-out lists and do-not-scrape directives can not be identified with high confidence.”

Nightshade, then, was conceived and designed as a tool to “address this power asymmetry.”

The team further explains their end goal:

“Used responsibly, Nightshade can help deter model trainers who disregard copyrights, opt-out lists, and do-not-scrape/robots.txt directives. It does not rely on the kindness of model trainers, but instead associates a small incremental price on each piece of data scraped and trained without authorization.”

Basically: make widespread data scraping more costly for AI model makers so they think twice about doing it, and thereby push them toward licensing agreements with human artists as a more viable alternative.

Of course, Nightshade is not able to reverse the flow of time: any artworks scraped prior to being shaded by the tool were still used to train AI models, and shading them now may impact the model’s efficacy going forward, but only if those images are re-scraped and used again to train an updated version of an AI image generator model.

There is also nothing on a technical level stopping someone from using Nightshade to shade AI-generated artwork or artwork they did not create, opening the door to potential abuses.

 

bnew

Veteran

Stable Diffusion 3: Research Paper​

5 Mar

Key Takeaways:​

  • Today, we’re publishing our research paper that dives into the underlying technology powering Stable Diffusion 3.
  • Stable Diffusion 3 outperforms state-of-the-art text-to-image generation systems such as DALL·E 3, Midjourney v6, and Ideogram v1 in typography and prompt adherence, based on human preference evaluations.
  • Our new Multimodal Diffusion Transformer (MMDiT) architecture uses separate sets of weights for image and language representations, which improves text understanding and spelling capabilities compared to previous versions of Stable Diffusion.

Following our announcement of the early preview of Stable Diffusion 3, today we are publishing the research paper which outlines the technical details of our upcoming model release. The paper will be accessible on arXiv soon, and we invite you to sign up for the waitlist to participate in the early preview.


Performance​



With SD3 as a baseline, this chart outlines the areas it wins against competing models based on human evaluations of Visual Aesthetics, Prompt Following, and Typography.

We have compared output images from Stable Diffusion 3 with various other open models including SDXL, SDXL Turbo, Stable Cascade, Playground v2.5 and Pixart-α, as well as closed-source systems such as DALL·E 3, Midjourney v6 and Ideogram v1, to evaluate performance based on human feedback. During these tests, human evaluators were provided with example outputs from each model and asked to select the best results based on how closely the model outputs follow the context of the prompt it was given (“prompt following”), how well text was rendered based on the prompt (“typography”), and which image is of higher aesthetic quality (“visual aesthetics”).

From the results of our testing, we have found that Stable Diffusion 3 is equal to or outperforms current state-of-the-art text-to-image generation systems in all of the above areas.

In early, unoptimized inference tests on consumer hardware, our largest SD3 model, with 8B parameters, fits into the 24GB of VRAM of an RTX 4090 and takes 34 seconds to generate a 1024x1024 image using 50 sampling steps. Additionally, there will be multiple variations of Stable Diffusion 3 during the initial release, ranging from 800M to 8B parameters, to further eliminate hardware barriers.



Architecture Details​


For text-to-image generation, our model has to take both modalities, text and images, into account. This is why we call this new architecture MMDiT, a reference to its ability to process multiple modalities. As in previous versions of Stable Diffusion, we use pretrained models to derive suitable text and image representations. Specifically, we use three different text embedders - two CLIP models and T5 - to encode text representations, and an improved autoencoding model to encode image tokens.


Conceptual visualization of a block of our modified multimodal diffusion transformer: MMDiT.

The SD3 architecture builds upon the Diffusion Transformer (“DiT”, Peebles & Xie, 2023). Since text and image embeddings are conceptually quite different, we use two separate sets of weights for the two modalities. As shown in the above figure, this is equivalent to having two independent transformers for each modality, but joining the sequences of the two modalities for the attention operation, such that both representations can work in their own space yet take the other one into account.


Our novel MMDiT architecture outperforms established text-to-image backbones such as UViT (Hoogeboom et al, 2023) and DiT (Peebles & Xie, 2023), when measuring visual fidelity and text alignment over the course of training.

By using this approach, information is allowed to flow between image and text tokens to improve overall comprehension and typography within the outputs generated. This architecture is also easily extendable to multiple modalities such as video, as we discuss in our paper.
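
As a rough illustration of the "two sets of weights, one joint attention" idea, here is a stripped-down block in PyTorch. It omits the timestep modulation, per-modality MLPs, and normalization of the real MMDiT, and the names and dimensions are mine, not the paper's:

import torch
import torch.nn as nn

class JointAttentionBlock(nn.Module):
    # Separate projections per modality, one attention over the joined sequence.
    def __init__(self, dim, heads=8):
        super().__init__()
        self.heads = heads
        self.img_qkv = nn.Linear(dim, dim * 3)  # image-token weights
        self.txt_qkv = nn.Linear(dim, dim * 3)  # text-token weights
        self.img_out = nn.Linear(dim, dim)
        self.txt_out = nn.Linear(dim, dim)

    def forward(self, img_tokens, txt_tokens):
        B, Ni, D = img_tokens.shape
        qkv = torch.cat([self.img_qkv(img_tokens), self.txt_qkv(txt_tokens)], dim=1)
        q, k, v = qkv.chunk(3, dim=-1)
        split = lambda x: x.view(B, -1, self.heads, D // self.heads).transpose(1, 2)
        out = nn.functional.scaled_dot_product_attention(split(q), split(k), split(v))
        out = out.transpose(1, 2).reshape(B, -1, D)
        # Route the joint attention output back through modality-specific weights.
        return (img_tokens + self.img_out(out[:, :Ni]),
                txt_tokens + self.txt_out(out[:, Ni:]))

block = JointAttentionBlock(dim=512)
img, txt = block(torch.randn(2, 1024, 512), torch.randn(2, 77, 512))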


Thanks to Stable Diffusion 3’s improved prompt following, our model has the ability to create images that focus on various different subjects and qualities while also remaining highly flexible with the style of the image itself.



Improving Rectified Flows by Reweighting​


Stable Diffusion 3 employs a Rectified Flow (RF) formulation (Liu et al., 2022; Albergo & Vanden-Eijnden, 2022; Lipman et al., 2023), where data and noise are connected on a linear trajectory during training. This results in straighter inference paths, which then allow sampling with fewer steps. Furthermore, we introduce a novel trajectory sampling schedule into the training process. This schedule gives more weight to the middle parts of the trajectory, as we hypothesize that these parts result in more challenging prediction tasks. We test our approach against 60 other diffusion trajectories such as LDM, EDM and ADM, using multiple datasets, metrics, and sampler settings for comparison. The results indicate that while previous RF formulations show improved performance in few step sampling regimes, their relative performance declines with more steps. In contrast, our re-weighted RF variant consistently improves performance.
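
In code, the training objective described above reduces to something like the following sketch; the model signature and the logit-normal emphasis on mid-trajectory timesteps are my assumptions for illustration, not the paper's exact implementation:

import torch

def rectified_flow_loss(model, x0):
    # Sample timesteps with extra mass around t = 0.5 (logit-normal), in the
    # spirit of weighting the harder, middle part of the trajectory more heavily.
    B = x0.shape[0]
    t = torch.sigmoid(torch.randn(B))
    t_ = t.view(B, *([1] * (x0.dim() - 1)))
    noise = torch.randn_like(x0)
    x_t = (1.0 - t_) * x0 + t_ * noise   # straight line between data and noise
    v_target = noise - x0                # constant velocity along that line
    v_pred = model(x_t, t)               # assumed signature: model(noisy input, t)
    return torch.mean((v_pred - v_target) ** 2)

The straight, constant-velocity paths are what later allow sampling with fewer integration steps, as described above.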


Scaling Rectified Flow Transformer Models​



We conduct a scaling study for text-to-image synthesis with our reweighted Rectified Flow formulation and MMDiT backbone. We train models ranging from 15 blocks with 450M parameters to 38 blocks with 8B parameters and observe a smooth decrease in the validation loss as a function of both model size and training steps (top row). To test whether this translates into meaningful improvements of the model outputs, we also evaluate automatic image-alignment metrics (GenEval) as well as human preference scores (ELO) (bottom row). Our results demonstrate a strong correlation between these metrics and the validation loss, indicating that the latter is a strong predictor of overall model performance. Furthermore, the scaling trend shows no signs of saturation, which makes us optimistic that we can continue to improve the performance of our models in the future.


Flexible Text Encoders​


By removing the memory-intensive 4.7B parameter T5 text encoder for inference, SD3’s memory requirements can be significantly decreased with only small performance loss. Removing this text encoder does not affect visual aesthetics (win rate w/o T5: 50%) and results only in slightly reduced text adherence (win rate 46%) as seen in the above image under the “Performance” section. However, we recommend including T5 for using SD3’s full power in generating written text, since we observe larger performance drops in typography generation without it (win rate 38%) as seen in the examples below:


Removing T5 for inference only results in significant performance drops when rendering very complex prompts involving many details or large amounts of written text. The above figure shows three random samples per example.

To learn more about MMDiT, Rectified Flows, and the research behind Stable Diffusion 3, read our full research paper here.

To stay updated on our progress follow us on Twitter, Instagram, LinkedIn, and join our Discord Community.

1/5
Our research paper for Stable Diffusion 3 has landed.

Our Multimodal Diffusion Transformer architecture scales great & the current unfinished base model beats other open & closed models.

We believe MM-DiT will be a standard & look forward to releasing this & other modalities

2/5
No we do make the best edge language models though

3/5
Our StableLM 2 report out, model beats Google Gemma 2 & others on a third less parameters.

Best multilingual, quantizes great too.

Runs on CPUs fine.

We introduce a multi-stage infinite LR scheduler that will help optimise runs & more, great work by team x.com/rikelhood/stat…

4/5
Upload to Claude 3 and ask it questions hehe

5/5
Ideogram is a bit better on aesthetics.

After this is optimised I don't think we will need many more image models tbh.
 