bnew

{post is incomplete, go to the site for more info/examples}

Stability AI releases DeepFloyd IF, a powerful text-to-image model that can smartly integrate text into images​

28 Apr
Today Stability AI, together with its multimodal AI research lab DeepFloyd, announced the research release of DeepFloyd IF, a powerful text-to-image cascaded pixel diffusion model.
DeepFloyd IF is a state-of-the-art text-to-image model released on a non-commercial, research-permissible license that provides an opportunity for research labs to examine and experiment with advanced text-to-image generation approaches. In line with other Stability AI models, Stability AI intends to release a DeepFloyd IF model fully open source at a future date.

Description and Features
  • Deep text prompt understanding:
    The generation pipeline utilizes the large language model T5-XXL-1.1 as a text encoder. A large number of text-image cross-attention layers also provides better alignment between the prompt and the generated image.
  • Application of text description into images:
    Incorporating the intelligence of the T5 model, DeepFloyd IF generates coherent and clear text alongside objects of different properties appearing in various spatial relations. Until now, these use cases have been challenging for most text-to-image models.
  • A high degree of photorealism:
    This property is reflected by the impressive zero-shot FID score of 6.66 on the COCO dataset (FID is a main metric used to evaluate the performance of text-to-image models; the lower the score, the better).
  • Aspect ratio shift:
    The ability to generate images with a non-standard aspect ratio, vertical or horizontal, as well as the standard square aspect.
  • Zero-shot image-to-image translations:
    Image modification is conducted by (1) resizing the original image to 64 pixels, (2) adding noise through forward diffusion, and (3) using backward diffusion with a new prompt to denoise the image (in inpainting mode, the process happens in the local zone of the image). The style can be changed further through super-resolution modules via a prompt text description. This approach gives the opportunity to modify style, patterns and details in output while maintaining the basic form of the source image – all without the need for fine-tuning.
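For readers who want to try this workflow, here is a minimal sketch of zero-shot image-to-image with the first-stage model, assuming the Hugging Face diffusers integration released alongside the model (IFImg2ImgPipeline); the model ID, strength value, and file names are illustrative and the exact arguments may differ across diffusers versions.

import torch
from PIL import Image
from diffusers import IFImg2ImgPipeline

# Stage I model in image-to-image mode: the source image is noised via forward
# diffusion and then denoised with the new prompt at the 64-pixel base resolution.
pipe = IFImg2ImgPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

original = Image.open("photo.png").convert("RGB")   # illustrative source image
prompt_embeds, negative_embeds = pipe.encode_prompt("the same scene as an oil painting")

result = pipe(
    image=original,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    strength=0.8,   # how much noise is injected; higher values depart further from the source
).images[0]
result.save("stylized.png")

The stylized low-resolution output would then be passed through the super-resolution stages (with the same or a modified prompt) to reach the final resolution, as described above.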




Definitions and processes
DeepFloyd IF is a modular, cascaded, pixel diffusion model. We break down the definitions of each of these descriptors here:
  • Modular:
    DeepFloyd IF consists of several neural modules (neural networks that can solve independent tasks, like generating images from text prompts and upscaling) whose interactions in one architecture create synergy.

  • Cascaded:
    DeepFloyd IF models high-resolution data in a cascading manner, using a series of individually trained models at different resolutions. The process starts with a base model that generates unique low-resolution samples (a ‘player’), which are then upsampled by successive super-resolution models (‘amplifiers’) to produce high-resolution images.

  • Diffusion:
    DeepFloyd IF’s base and super-resolution models are diffusion models, where a Markov chain of steps is used to inject random noise into data before the process is reversed to generate new data samples from the noise.

  • Pixel:
    DeepFloyd IF works in pixel space. The diffusion is implemented on a pixel level, unlike latent diffusion models (like Stable Diffusion), where latent representations are used.
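To make the cascade concrete, here is a minimal sketch of stages I and II, assuming the Hugging Face diffusers integration that accompanied the release; model IDs and call arguments follow the diffusers documentation of the time and may have changed since.

import torch
from diffusers import DiffusionPipeline
from diffusers.utils import pt_to_pil

# Stage I: base pixel diffusion model that generates 64x64 samples from T5-XXL text embeddings.
stage_1 = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16
)
stage_1.enable_model_cpu_offload()

# Stage II: super-resolution diffusion model that upsamples 64x64 -> 256x256.
stage_2 = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16
)
stage_2.enable_model_cpu_offload()

prompt = 'a photo of a corgi holding a sign that says "deep floyd"'   # illustrative prompt
prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)        # T5-XXL text encoder

generator = torch.manual_seed(0)
image = stage_1(
    prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds,
    generator=generator, output_type="pt",
).images    # 64x64 pixel-space sample
image = stage_2(
    image=image, prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds,
    generator=generator, output_type="pt",
).images    # upscaled to 256x256
pt_to_pil(image)[0].save("if_256.png")

A third upscaler (Stable Diffusion's x4 upscaler in the diffusers examples) can then take the 256x256 output up to 1024x1024.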

Regular Developer

Thanks for updating this. I don't have time to search for all this stuff myself, and I'm a little wary about how it's gonna affect the software development industry. I might get GitHub Copilot for my personal projects. I'm not going to take the time to learn no damn react-native
 

bnew


Stability AI releases StableVicuna, the AI World’s First Open Source RLHF LLM Chatbot​


28 Apr
“A Stable Vicuña” — Stable Diffusion XL
Background
In recent months, there has been a significant push in the development and release of chatbots. From Character.ai's chatbot last spring to ChatGPT in November and Bard in December, the user experience created by tuning language models for chat has been a hot topic. The emergence of open access and open-source alternatives has further fueled this interest.

The Current Environment of Open Source Chatbots
The success of these chat models is due to two training paradigms: instruction finetuning and reinforcement learning from human feedback (RLHF). While there have been significant efforts to build open-source frameworks for training these kinds of models, such as trlX, trl, DeepSpeed Chat and ColossalAI, there is a lack of open-access and open-source models that have both paradigms applied. In most models, instruction finetuning is applied without RLHF training because of the complexity it involves.
Recently, Open Assistant, Anthropic, and Stanford have begun to make chat RLHF datasets readily available to the public. Those datasets, combined with the straightforward RLHF training provided by trlX, are the backbone for the first large-scale instruction-finetuned and RLHF-trained model we present here today: StableVicuna.

Introducing the First Large-Scale Open Source RLHF LLM Chatbot
We are proud to present StableVicuna, the first large-scale open-source chatbot trained via reinforcement learning from human feedback (RLHF). StableVicuna is a further instruction-finetuned and RLHF-trained version of Vicuna v0 13b, which is an instruction-finetuned LLaMA 13b model. For the interested reader, you can find more about Vicuna here.
Here are some examples of what our chatbot can do:
  1. Ask it to do basic math
  2. Ask it to write code
  3. Ask it to help you with grammar

Similarly, here are a number of benchmarks showing the overall performance of StableVicuna compared to other similarly sized open source chatbots.

In order to achieve StableVicuna’s strong performance, we utilize Vicuna as the base model and follow the typical three-stage RLHF pipeline outlined by Stiennon et al. and Ouyang et al. Concretely, we further train the base Vicuna model with supervised finetuning (SFT) using a mixture of three datasets:
  • OpenAssistant Conversations Dataset (OASST1), a human-generated, human-annotated assistant-style conversation corpus comprising 161,443 messages distributed across 66,497 conversation trees, in 35 different languages;
  • GPT4All Prompt Generations, a dataset of 437,605 prompts and responses generated by GPT-3.5 Turbo;
  • And Alpaca, a dataset of 52,000 instructions and demonstrations generated by OpenAI's text-davinci-003 engine.
We use trlX to train a reward model, initialized from our SFT model, on the following RLHF preference datasets:
  • OpenAssistant Conversations Dataset (OASST1), which contains 7,213 preference samples;
  • Anthropic HH-RLHF, a dataset of preferences about AI assistant helpfulness and harmlessness containing 160,800 human labels;
  • And Stanford Human Preferences (SHP), a dataset of 348,718 collective human preferences over responses to questions/instructions in 18 different subject areas, from cooking to philosophy.
Finally, we use trlX to run Proximal Policy Optimization (PPO) reinforcement learning, performing RLHF training of the SFT model to arrive at StableVicuna!
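As a rough illustration of this last step, the sketch below shows how PPO training against a learned reward model is typically wired up with trlX. The exact trlx.train signature has shifted across trlX versions, and the checkpoint paths, prompt format, and reward model here are placeholders rather than the configuration used for StableVicuna.

import torch
import trlx
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder reward model: in practice this would be the model trained on the
# OASST1 / HH-RLHF / SHP preference data described above.
rm_tokenizer = AutoTokenizer.from_pretrained("path/to/reward-model")
rm = AutoModelForSequenceClassification.from_pretrained("path/to/reward-model", num_labels=1).eval()

def reward_fn(samples, **kwargs):
    """Score full prompt+response strings with the reward model."""
    batch = rm_tokenizer(samples, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        scores = rm(**batch).logits.squeeze(-1)
    return scores.tolist()

prompts = ["### Human: What is 2 + 2?\n### Assistant:"]   # placeholder prompt set

# PPO fine-tuning of the SFT model against the learned reward.
# Note: newer trlX versions set the model path via TRLConfig instead of a positional argument.
trainer = trlx.train(
    "path/to/vicuna-sft-checkpoint",   # placeholder SFT checkpoint
    reward_fn=reward_fn,
    prompts=prompts,
)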

Obtaining StableVicuna-13B
StableVicuna is of course on the HuggingFace Hub! The model is downloadable as a weight delta against the original LLaMA model. To obtain StableVicuna-13B, you can download the weight delta from here. However, please note that you also need to have access to the original LLaMA model, which requires you to apply for LLaMA weights separately using the link provided in the GitHub repo or here. Once you have both the weight delta and the LLaMA weights, you can use a script provided in the GitHub repo to combine them and obtain StableVicuna-13B.
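As a rough sketch of what the combination step does, the snippet below applies the delta additively to the base LLaMA weights, assuming the same additive convention used by Vicuna. The paths are placeholders and the shape check is only illustrative; the script in the GitHub repo is the authoritative procedure.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("path/to/llama-13b", torch_dtype=torch.float16)
delta = AutoModelForCausalLM.from_pretrained("path/to/stable-vicuna-13b-delta", torch_dtype=torch.float16)

# Assumed additive convention (as with Vicuna): combined_weight = llama_weight + delta_weight.
base_state = base.state_dict()
for name, param in delta.state_dict().items():
    if name in base_state and param.shape == base_state[name].shape:
        param += base_state[name]   # in-place update of the delta model's weights

delta.save_pretrained("stable-vicuna-13b")   # the combined, usable model
AutoTokenizer.from_pretrained("path/to/stable-vicuna-13b-delta").save_pretrained("stable-vicuna-13b")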

Announcing Our Upcoming Chatbot Interface
Alongside our chatbot, we are excited to preview our upcoming chat interface which is in the final stages of development. The following screenshots offer a glimpse of what users can expect.
 

bnew


Engineering and Product

Large Language Models for Commercial Use​

This blog explains what a license for an LLM is and why it is important. We resolve any doubts you might have about the licensing of these models so that you do not run into legal trouble while using, modifying, or sharing them.


Truefoundry

Apr 27, 2023 • 5 min read

With LLMs (large language models) being used for a variety of value-generating tasks across the industry, every business wants to get its hands on them. However, before you start using these models commercially, it is important to understand the licensing and legal norms around them.

In this blog, we will resolve any doubts that you might have about the licensing of these models so that you do not run into legal troubles while using, modifying, or sharing them.

📌
We will continuously update this blog to cover all major LLMs and their licensing implications.

Can you just start using LLMs for your business?​

We had a chat with a few leaders, and it turns out licensing LLMs commercially is more complicated than it first appears. Let's take the example of Vicuna.

Vicuna is an open-source chatbot trained by fine-tuning LLaMA.

If you try to deploy the Vicuna 13B model from Hugging Face, you will find that the team behind the project has released only the delta weights of the model, which in turn need to be applied to the original LLaMA weights to make it work.

The Vicuna model card shows the license as Apache 2.0, which might lead one to believe that the model can be used commercially.

Vicuna 13B Delta weights model Hugging Face Model Card
However, the LLaMA weights are not available for commercial use, which in turn makes the Vicuna model usable only in research settings, not commercially.

Confusing, right? Let us try to explain how this works.

Different types of licenses and what they mean​

Here is a table with some of the common licenses that LLMs are found to have:

  • Apache 2.0 (e.g. BERT, XLNet, XLM-RoBERTa): permissive. Patent grant: yes. Commercial use: yes (with attribution). Redistribution: yes. Modification: yes.
  • MIT (e.g. GPT-2, T5, BLOOM): permissive. Patent grant: no. Commercial use: yes (with attribution). Redistribution: yes. Modification: yes.
  • GPL-3.0 (e.g. GLM-130B, NeMO LLM): copyleft. Patent grant: yes (for GPL-3.0-licensed software only). Commercial use: yes (with source code). Redistribution: yes (with source code). Modification: yes.
  • Proprietary (e.g. GPT-3, LaMDA, Cohere): varies. Patent grant, commercial use, redistribution, and modification all vary by license terms.
Copyleft licenses like GPL-3.0 require that any derivative works of the software be licensed under the same license. This means that if you use GPL-3.0-licensed software in your project, your project must also be licensed under GPL-3.0.

Permissive licenses like Apache 2.0 and MIT allow users to use, modify and distribute the software under the license with minimal restrictions on how they use it or how they distribute it.

Explaining some of the common licenses that LLMs are licensed under:

Apache 2.0 License​

Under this license, users must give credit to the original authors, include a copy of the license, and state any changes made to the software. Users must also not use any trademarks or logos associated with the software without permission.

MIT License​

This license allows anyone to use, modify, and distribute the software for any purpose as long as they include a copy of the license and a notice of the original authors. The MIT License is similar to the Apache 2.0 License, but it does not have any conditions regarding trademarks or logos.

GPL-3.0 License​

It allows anyone to use, modify, and distribute the software for any purpose as long as they share their source code under the same license. This means that users cannot create proprietary versions of the software or incorporate it into closed-source software without disclosing their code. The GPL-3.0 License also has some other conditions, such as providing a disclaimer of warranty and liability and ensuring that users can access or run the software without any restrictions.

Proprietary License​

The last type of license for LLMs that we will discuss is the proprietary license, which is a non-open source license that grants limited rights to use the software under certain terms and conditions. The proprietary license usually requires users to pay a fee or obtain permission to access or use the software and may impose restrictions on how the software can be used or modified. The proprietary license may also prohibit users from sharing or distributing the software or its outputs without authorization.

RAIL License​

The RAIL license is a new copyright license that combines an open-access approach to licensing with behavioral restrictions aimed at enforcing a vision of responsible AI. This license has certain use-based restrictions; for example, the licensed software cannot be used for:

  1. Anything that violates laws and regulations
  2. Exploiting or harming minors, or uses that discriminate against or harm “individuals or groups based on social behavior or known or predicted personal or personality characteristics.”
Some models under this license are OPT, Stable Diffusion, and BLOOM.

The following table summarizes the license details of each LLM:

(Linked Google Docs spreadsheet, “LLM License”, listing each model’s domain, license, parameter count, pretraining dataset, results, creator, inference speed, and release date.)


Which models can I use?​

The problem with LLMs for commercial use is that they may not be open source or may not allow commercial use (for example, models built on top of Meta's LLaMA model). This means that companies may have to pay to use them or may not be able to use them at all. Additionally, some companies may prefer to use open-source models for reasons such as transparency and the ability to modify the code.

There are several open-source language models that can be used commercially for free.

Bloom​

Bloom is an open-access multilingual language model that contains 176 billion parameters and is trained for 3.5 months on 384 A100–80GB GPUs.

It is licensed under the bigscience-bloom-rail-1.0 license, which restricts BLOOM from being used for certain use cases, such as giving medical advice or interpreting medical results. This is in addition to the other restrictions present under the RAIL license (described above).

Dolly 2.0​

Dolly 2.0 is a 12B parameter language model based on the EleutherAI Pythia model family and fine-tuned exclusively on a new, high-quality human-generated instruction-following dataset, crowdsourced among Databricks employees. It is the first open-source, instruction-following LLM fine-tuned on a human-generated instruction dataset licensed for research and commercial use. The entirety of Dolly 2.0, including the training code, the dataset, and the model weights, is open-sourced and suitable for commercial use.
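As a quick illustration of using one of these commercially licensed models, the snippet below loads Dolly 2.0 through the transformers pipeline, following the pattern on the model card; the trust_remote_code flag pulls in Databricks' instruction-following pipeline code, and the hardware settings are illustrative.

import torch
from transformers import pipeline

# Dolly 2.0 (12B); smaller dolly-v2-3b / dolly-v2-7b variants exist for smaller GPUs.
generate_text = pipeline(
    model="databricks/dolly-v2-12b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,   # uses the instruction-following pipeline shipped with the model
    device_map="auto",
)

print(generate_text("Explain the difference between instruction finetuning and RLHF."))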

RWKV Raven​

RWKV-LM is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it combines the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, “infinite” ctx_len, and free sentence embedding.

It is licensed under Apache 2.0.

Eleuther AI Models (Polyglot, GPT Neo, GPT NeoX, GPT-J, Pythia)​

EleutherAI has trained and released several LLMs and the codebases used to train them. Several of these LLMs were the largest or most capable available at the time and have been widely used since in open-source research applications.
 

bnew





https://web.archive.org/web/20230429231517/https://twitter.com/bohanhou1998/status/1652151502012837890



MLC LLM is a universal solution that allows any language model to be deployed natively on a diverse set of hardware backends and native applications, plus a productive framework for everyone to further optimize model performance for their own use cases.

Our mission is to enable everyone to develop, optimize and deploy AI models natively on everyone's devices.

Everything runs locally with no server support and is accelerated with local GPUs on your phone and laptop. Supported platforms include:

  • iPhone
  • Metal GPUs and Intel/ARM MacBooks;
  • AMD and NVIDIA GPUs via Vulkan on Windows and Linux;
  • NVIDIA GPUs via CUDA on Windows and Linux;
  • WebGPU on browsers (through companion project WebLLM).
Check out our instructions page to try it out!


What is MLC LLM?​

In recent years, there has been remarkable progress in generative artificial intelligence (AI) and large language models (LLMs), which are becoming increasingly prevalent. Thanks to open-source initiatives, it is now possible to develop personal AI assistants using open-sourced models. However, LLMs tend to be resource-intensive and computationally demanding. To create a scalable service, developers may need to rely on powerful clusters and expensive hardware to run model inference. Additionally, deploying LLMs presents several challenges, such as their ever-evolving model innovation, memory constraints, and the need for potential optimization techniques.

The goal of this project is to enable the development, optimization, and deployment of AI models for inference across a range of devices, including not just server-class hardware, but also users' browsers, laptops, and mobile apps. To achieve this, we need to address the diverse nature of compute devices and deployment environments. Some of the key challenges include:

  • Supporting different models of CPUs, GPUs, and potentially other co-processors and accelerators.
  • Deploying on the native environment of user devices, which may not have Python or other necessary dependencies readily available.
  • Addressing memory constraints by carefully planning allocation and aggressively compressing model parameters.
MLC LLM offers a repeatable, systematic, and customizable workflow that empowers developers and AI system researchers to implement models and optimizations in a productivity-focused, Python-first approach. This methodology enables quick experimentation with new models, new ideas and new compiler passes, followed by native deployment to the desired targets. Furthermore, we are continuously expanding LLM acceleration by broadening TVM backends to make model compilation more transparent and efficient.
 

bnew


About​

An open-source project dedicated to tracking and segmenting any objects in videos, either automatically or interactively. The primary algorithms utilized include the Segment Anything Model (SAM) for key-frame segmentation and Associating Objects with Transformers (AOT) for efficient tracking and propagation purposes.


Segment and Track Anything (SAM-Track)​

Online Demo: Open In Colab

Tutorial: tutorial-v1.5 (Text), tutorial-v1.0 (Click & Brush)


Segment and Track Anything is an open-source project that focuses on the segmentation and tracking of any objects in videos, utilizing both automatic and interactive methods. The primary algorithms utilized include SAM (Segment Anything Model) for automatic/interactive key-frame segmentation and DeAOT (Decoupling features in Associating Objects with Transformers, NeurIPS 2022) for efficient multi-object tracking and propagation. The SAM-Track pipeline enables dynamic and automatic detection and segmentation of new objects by SAM, while DeAOT is responsible for tracking all identified objects.
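As a rough sketch of the first half of this pipeline, the snippet below runs SAM's automatic mask generator on a key frame using the official segment_anything package; the checkpoint file and video path are placeholders, and the DeAOT tracking step is only indicated with a comment, since its API lives in the SAM-Track repository itself.

import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Load SAM (ViT-H variant); the checkpoint must be downloaded separately.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

cap = cv2.VideoCapture("input.mp4")
ok, frame = cap.read()                        # use the first frame as the key frame
key_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

# 1) SAM segments everything in the key frame.
masks = mask_generator.generate(key_frame)    # list of dicts with 'segmentation', 'area', ...

# 2) SAM-Track would hand these masks to DeAOT, which propagates them
#    through the remaining frames (hypothetical tracker call shown as a comment).
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # tracker.track(frame)  # see the SAM-Track repo for the actual DeAOT API
cap.release()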



This video showcases the segmentation and tracking capabilities of SAM-Track in various scenarios, such as street views, AR, cells, animations, aerial shots, and more.

Demo1 showcases SAM-Track's ability to interactively segment and track individual objects. The user specified a man playing street basketball for SAM-Track to track.



Demo2 showcases SAM-Track's ability to interactively add specified objects for tracking. The user added custom objects to be tracked on top of SAM-Track's segmentation of everything in the scene.

 

bnew


About​

Raising the Cost of Malicious AI-Powered Image Editing

gradientscience.org/photoguard/

Raising the Cost of Malicious AI-Powered Image Editing​

This repository contains the code for our recent work on safeguarding images against manipulation by ML-powered photo-editing models such as Stable Diffusion.

Raising the Cost of Malicious AI-Powered Image Editing
Hadi Salman*, Alaa Khaddaj*, Guillaume Leclerc*, Andrew Ilyas, Aleksander Madry
Paper: [2302.06588] Raising the Cost of Malicious AI-Powered Image Editing
Blog post: Raising the Cost of Malicious AI-Powered Image Editing
Interactive demo: Photoguard - a Hugging Face Space by hadisalman (check below for how to run it locally)

@article{salman2023raising,
  title={Raising the Cost of Malicious AI-Powered Image Editing},
  author={Salman, Hadi and Khaddaj, Alaa and Leclerc, Guillaume and Ilyas, Andrew and Madry, Aleksander},
  journal={arXiv preprint arXiv:2302.06588},
  year={2023}
}
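As a rough sketch of the idea (not the authors' code), the snippet below implements a simple PGD-style "encoder attack" against Stable Diffusion's VAE encoder: it finds a small perturbation that drags the image's latent toward an uninformative target so that diffusion-based edits of the immunized image degrade. The model ID, target choice, and hyperparameters are illustrative; see the repository and paper for the actual attacks.

import torch
from diffusers import AutoencoderKL

# Latent encoder of a Stable Diffusion checkpoint (illustrative model ID).
vae = AutoencoderKL.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="vae").eval()
vae.requires_grad_(False)

def immunize(x, eps=0.03, step=0.01, iters=40):
    """x: image tensor scaled to [-1, 1], shape (1, 3, H, W)."""
    target = torch.zeros_like(vae.encode(x).latent_dist.mean)   # push latents toward a blank target
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(iters):
        latent = vae.encode(x + delta).latent_dist.mean
        loss = torch.nn.functional.mse_loss(latent, target)
        loss.backward()
        with torch.no_grad():
            delta -= step * delta.grad.sign()   # descend toward the target latent
            delta.clamp_(-eps, eps)             # keep the perturbation visually imperceptible
            delta.grad.zero_()
    return (x + delta).clamp(-1, 1).detach()    # "immunized" image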


 

bnew



https://web.archive.org/web/20230430005058/https://twitter.com/lupantech/status/1652022897563795456

@lupantech
🚀65B LLaMA-Adapter-V2 code & checkpoint are NOW ready at GitHub - ZrrSkywalker/LLaMA-Adapter: Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters!
🛠️Big update enhancing multimodality & chatbot.
🔥LLaMA-Adapter-V2 surpasses #ChatGPT in response quality (102%:100%) & beats #Vicuna in win-tie-lost (50:14).
☕️Thanks to Peng Gao &
@opengvlab
!

 

bnew


https://web.archive.org/save/https://twitter.com/gdb/status/1652369023609470976




edit:
copied the prompt using 🖼️ Image-to-Multilingual-OCR 👁️ Gradio - a Hugging Face Space by awacke1

Act as a dual PhD in sports psychology and neuroscience. Your job is to design a system to get someone addicted to something that will positively impact their life; in this case, starting an exercise habit (running). Create a 60-day plan using research-backed principles to have anyone, even someone who hates running, build a running habit if they follow the plan. Incorporate research such as BF Skinner's study of addiction, BJ Fogg's Behavioral Model, and similar research on addiction and compulsion.

Outline a week-by-week plan, but give a detailed day-by-day plan for the first week.
 

bnew


https://web.archive.org/web/20230430012709/https://twitter.com/DrEalmutairi/status/1652272468105543681


ChatGPT and Artificial Intelligence in higher education: Quick start guide

Portrait created by DALL·E 2, an AI system that can create realistic images and art in response to a text description. The AI was asked to produce an impressionist portrait of how artificial intelligence would look going to university. Concept by UNESCO IESALC.
 

bnew


https://web.archive.org/web/20230430015051/https://twitter.com/jbrowder1/status/1652187049255120897
 

bnew



https://web.archive.org/web/20230430015207/https://twitter.com/gdb/status/1652411976713371649
 

bnew


https://web.archive.org/web/20230430010034/https://twitter.com/emollick/status/1652170706312896512
 