The A.I Megathread (LLM , GPT , Development)

bnew · Dec 4, 2023

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured...

arxiv.org

Computer Science > Machine Learning

[Submitted on 1 Dec 2023]

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Albert Gu, Tri Dao

Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models (SSMs) have been developed to address Transformers' computational inefficiency on long sequences, but they have not performed as well as attention on important modalities such as language. We identify that a key weakness of such models is their inability to perform content-based reasoning, and make several improvements. First, simply letting the SSM parameters be functions of the input addresses their weakness with discrete modalities, allowing the model to selectively propagate or forget information along the sequence length dimension depending on the current token. Second, even though this change prevents the use of efficient convolutions, we design a hardware-aware parallel algorithm in recurrent mode. We integrate these selective SSMs into a simplified end-to-end neural network architecture without attention or even MLP blocks (Mamba). Mamba enjoys fast inference (5× higher throughput than Transformers) and linear scaling in sequence length, and its performance improves on real data up to million-length sequences. As a general sequence model backbone, Mamba achieves state-of-the-art performance across several modalities such as language, audio, and genomics. On language modeling, our Mamba-3B model outperforms Transformers of the same size and matches Transformers twice its size, both in pretraining and downstream evaluation.
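
To make the "selective" idea in the abstract concrete, here is a minimal pure-Python sketch of a state space recurrence whose step size and input/output projections depend on the current input. It is not the authors' implementation: the function name `selective_scan`, the weights `w_delta`, `w_b`, `w_c`, the single scalar channel, and the simplified discretization are all assumptions made for illustration, and the hardware-aware parallel algorithm is not shown.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z)); keeps the step size positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(xs, a, w_delta, w_b, w_c):
    """Run a one-channel, N-state selective SSM over the scalar sequence xs.

    a        : list of N negative floats (diagonal state matrix).
    w_delta  : hypothetical weight producing the input-dependent step size.
    w_b, w_c : hypothetical weights producing input-dependent B_t and C_t.
    """
    n = len(a)
    h = [0.0] * n          # hidden state, carried along the sequence
    ys = []
    for x in xs:           # one pass over the sequence: O(len(xs) * N)
        delta = softplus(w_delta * x)       # step size depends on the input
        b = [w * x for w in w_b]            # B_t depends on the input
        c = [w * x for w in w_c]            # C_t depends on the input
        for i in range(n):
            a_bar = math.exp(delta * a[i])  # decay in (0, 1): how much to keep
            h[i] = a_bar * h[i] + delta * b[i] * x
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


if __name__ == "__main__":
    print(selective_scan([1.0, 0.0, 2.0], a=[-1.0, -0.5],
                         w_delta=1.0, w_b=[1.0, 0.5], w_c=[1.0, 1.0]))
```

Because `delta` is computed from the current input, a large step shrinks `a_bar` and lets the new input overwrite the state, while a small step mostly preserves what was stored. The loop touches each position once, which is where the linear scaling in sequence length comes from in this sketch.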

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2312.00752 [cs.LG]
	(or arXiv:2312.00752v1 [cs.LG] for this version)
	[2312.00752] Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Submission history

From: Albert Gu
[v1] Fri, 1 Dec 2023 18:01:34 UTC (1,264 KB)

https://arxiv.org/ftp/arxiv/papers/2312/2312.00752.pdf

GitHub - state-spaces/mamba: Mamba SSM architecture

Mamba SSM architecture. Contribute to state-spaces/mamba development by creating an account on GitHub.

github.com

bnew · Dec 4, 2023

Asking ChatGPT to Repeat Words ‘Forever’ Is Now a Terms of Service Violation

A technique used by Google researchers to reveal ChatGPT training data is now banned by OpenAI.

www.404media.co

bnew · Dec 4, 2023

https://archive.is/Vm4IU

bnew · Dec 5, 2023

bnew · Dec 5, 2023

Easily Train a Specialized LLM: PEFT, LoRA, QLoRA, LLaMA-Adapter, and More

Training a specialized LLM over your own data is easier than you think...

cameronrwolfe.substack.com

https://archive.is/EI5NJ

bnew · Dec 5, 2023

https://archive.is/r08VA

bnew · Dec 5, 2023

bnew · Dec 5, 2023

GitHub - ytongbai/LVM

Contribute to ytongbai/LVM development by creating an account on GitHub.

github.com

bnew · Dec 5, 2023

https://archive.is/ur1RO

bnew · Dec 5, 2023

https://github.com/ise-uiuc/magicoder

About

Magicoder: Source Code Is All You Need

large-language-models ai4code llm llm4code

Magicoder: Source Code Is All You Need

Models | Dataset | Quick Start | Demo | Citation | Acknowledgements

Important
We keep improving the documentation and adding more implementation details. Please stay tuned!

About

Magicoder is a model family empowered by OSS-Instruct, a novel approach to enlightening LLMs with open-source code snippets for generating low-bias and high-quality instruction data for code.
OSS-Instruct mitigates the inherent bias of the LLM-synthesized instruction data by empowering them with a wealth of open-source references to produce more diverse, realistic, and controllable data.
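
As a rough illustration of the idea described above (seeding instruction generation with open-source code), the sketch below samples a few consecutive lines from a source file and wraps them in a prompt for an LLM. The prompt wording, `PROMPT_TEMPLATE`, `sample_seed_snippet`, and `build_prompt` are hypothetical and are not taken from the Magicoder repository; the call to the LLM itself is left out.

```python
import random

# Hypothetical prompt wording for illustration; not the prompt used by Magicoder.
PROMPT_TEMPLATE = (
    "Use the following open-source code snippet as inspiration to write "
    "a self-contained coding problem and a correct solution.\n\n"
    "Snippet:\n{snippet}\n"
)


def sample_seed_snippet(source: str, max_lines: int, rng: random.Random) -> str:
    """Pick a random run of up to max_lines consecutive lines from source."""
    lines = source.splitlines()
    length = min(max_lines, len(lines))
    start = rng.randint(0, len(lines) - length)
    return "\n".join(lines[start:start + length])


def build_prompt(source: str, max_lines: int = 5, seed: int = 0) -> str:
    """Build one instruction-generation prompt from a seed snippet."""
    rng = random.Random(seed)  # seeded so the same inputs give the same prompt
    snippet = sample_seed_snippet(source, max_lines, rng)
    return PROMPT_TEMPLATE.format(snippet=snippet)


if __name__ == "__main__":
    example_source = "def add(a, b):\n    return a + b\n\nprint(add(1, 2))"
    print(build_prompt(example_source, max_lines=2))
```

Varying the seed snippet is what would give the generated problems their diversity in this sketch; the resulting prompt would then be sent to a model such as the one named in the Dataset section.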

Models

Model	Checkpoint	Size	HumanEval (+)	MBPP (+)	Demo	License
Magicoder-CL-7B	HF Link	7B	60.4 (55.5)	64.2 (52.6)	--	Llama2
Magicoder-S-CL-7B	HF Link	7B	70.7 (66.5)	68.4 (56.6)	--	Llama2
Magicoder-DS-6.7B	HF Link	6.7B	66.5 (60.4)	75.4 (61.9)	--	DeepSeek
Magicoder-S-DS-6.7B	HF Link	6.7B	76.8 (70.7)	75.7 (64.4)	--	DeepSeek

Dataset

Magicoder-OSS-Instruct-75K: generated through OSS-Instruct using gpt-3.5-turbo-1106 and used to train both Magicoder and Magicoder-S series.
Magicoder-Evol-Instruct-110K: decontaminated and redistributed from theblackcat102/evol-codealpaca-v1, used to further finetune Magicoder series and obtain Magicoder-S models.

Note
Magicoder models are trained on the synthetic data generated by gpt-3.5-turbo-1106 developed by OpenAI. Please pay attention to OpenAI's terms of use when using the models and the datasets.

EvalPlus Leaderboard

evalplus.github.io

bnew · Dec 5, 2023

IIVI said:
Moravec's paradox is a phenomenon observed by robotics researcher Hans Moravec, in which tasks that are easy for humans to perform (eg, motor or social skills) are difficult for machines to replicate, whereas tasks that are difficult for humans (eg, performing mathematical calculations or large-scale data analysis) are relatively easy for machines to accomplish.

For example, a computer-aided diagnostic system might be able to analyse large volumes of images quickly and accurately but might struggle to recognise clinical context or technical limitations that a human radiologist would easily identify. Similarly, a machine learning algorithm might be able to predict a patient's risk of a specific condition on the basis of their medical history and laboratory results but might not be able to account for the nuances of the patient's individual case or consider the effect of social and environmental factors that a human physician would consider. In surgery, there has been great progress in the field of robotics in health care when robotic elements are controlled by humans, but artificial intelligence-driven robotic technology has been much slower to develop.


Great article by Terence Tao:

Embracing change and resetting expectations

unlocked.microsoft.com

https://archive.is/LhOF4

https://archive.is/4NEOT

bnew · Dec 5, 2023

llmware (llmware)

Enterprise LLM-based applications, middleware and specialized models.

huggingface.co

TheBloke/dragon-yi-6B-v0-GGUF · Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

Model Card for Model ID

dragon-yi-6b-v0 is part of the dRAGon ("Delivering RAG On ...") model series, RAG-instruct trained on top of a Yi-6B base model.

DRAGON models have been fine-tuned with the specific objective of fact-based question-answering over complex business and legal documents with an emphasis on reducing hallucinations and providing short, clear answers for workflow automation.

Benchmark Tests

Evaluated against the benchmark test: RAG-Instruct-Benchmark-Tester
Average of 2 test runs, with 1 point for a correct answer, 0.5 points for a partially correct or blank / NF (not found) answer, 0.0 points for an incorrect answer, and -1 point for a hallucination.

--Accuracy Score: 99.5 correct out of 100
--Not Found Classification: 90.0%
--Boolean: 87.5%
--Math/Logic: 77.5%
--Complex Questions (1-5): 4 (Above Average)
--Summarization Quality (1-5): 4 (Above Average)
--Hallucinations: No hallucinations observed in test runs.
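
The scoring rule stated for this benchmark (1, 0.5, 0.0, and -1 points, averaged over the test runs) can be sketched as follows. The label names and the example runs are hypothetical; the actual per-question results are in the files the model card points to.

```python
# Points per graded answer, following the rule stated in the model card.
# The label strings themselves are hypothetical names chosen for this sketch.
POINTS = {
    "correct": 1.0,
    "partial": 0.5,
    "not_found": 0.5,      # blank / NF answers score the same as partial
    "incorrect": 0.0,
    "hallucination": -1.0,
}


def score_run(labels):
    """Total score for one test run, given one label per question."""
    return sum(POINTS[label] for label in labels)


def average_score(runs):
    """Average of the per-run totals across all test runs."""
    return sum(score_run(run) for run in runs) / len(runs)


if __name__ == "__main__":
    # Made-up example: 100 questions per run, two runs.
    run_a = ["correct"] * 99 + ["partial"]
    run_b = ["correct"] * 100
    print(average_score([run_a, run_b]))
```

Note that a hallucination costs more than a wrong answer under this rule, so a model that answers "not found" when unsure scores higher than one that guesses.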

For test run results (and a good indicator of target use cases), please see the files ("core_rag_test" and "answer_sheet") in this repo.

Model Description

Developed by: llmware
Model type: Yi
Language(s) (NLP): English
License: Yi License Link
Finetuned from model: Yi-6B

Direct Use

DRAGON is designed for enterprise automation use cases, especially in knowledge-intensive industries, such as financial services, legal and regulatory industries with complex information sources.

DRAGON models have been trained for common RAG scenarios, specifically: question-answering, key-value extraction, and basic summarization as the core instruction types without the need for a lot of complex instruction verbiage - provide a text passage context, ask questions, and get clear fact-based responses.

This model is licensed according to the terms of the license of the base model, Yi-6B, at this link.

Bias, Risks, and Limitations

Any model can provide inaccurate or incomplete information, and should be used in conjunction with appropriate safeguards and fact-checking mechanisms.

How to Get Started with the Model

The fastest way to get started with DRAGON is through direct import in transformers:

from transformers import AutoTokenizer, AutoModelForCausalLM

# Model ID exactly as given in the model card; loading from the Hugging Face Hub
# may require the organization prefix (an assumption, not stated in the card).
tokenizer = AutoTokenizer.from_pretrained("dragon-yi-6b-v0")
model = AutoModelForCausalLM.from_pretrained("dragon-yi-6b-v0")

Please refer to the generation_test .py files in the Files repository, which include 200 samples and a script to test the model. The generation_test_llmware_script.py includes built-in llmware capabilities for fact-checking, as well as easy integration with document parsing and actual retrieval, so that the test set can be swapped out for a RAG workflow consisting of business documents.

bnew · Dec 5, 2023

https://www.nytimes.com/2023/12/05/technology/ai-chatgpt-google-meta.html

https://www.thecoli.com/threads/inside-the-a-i-arms-race-that-changed-silicon-valley-forever.1008604/

bnew · Dec 5, 2023

AI Alliance will open-source AI models; Meta, IBM, NASA on board

A new industry group known as the AI Alliance believes that artificial intelligence models should be open-source, in contrast to...

9to5mac.com

AI Alliance will open-source AI models; Meta, IBM, Intel, NASA on board

Ben Lovejoy | Dec 5 2023 - 4:11 am PT


A new industry group known as the AI Alliance believes that artificial intelligence models should be open-source, in contrast to the proprietary models developed by OpenAI and Google.

Meta, IBM, Intel, and NASA are just some of the organizations to sign up, believing that the approach offers three key benefits …

The AI Alliance

The really big breakthroughs in generative AI have so far come from the likes of OpenAI and Google, who keep their models a closely-guarded secret.

But there are some companies and organizations who believe that big AI projects should be open-source. More than 40 of them have signed up to the AI Alliance, reports Bloomberg.

Meta and IBM are joining more than 40 companies and organizations to create an industry group dedicated to open source artificial intelligence work, aiming to share technology and reduce risks.

The coalition, called the AI Alliance, will focus on the responsible development of AI technology, including safety and security tools, according to a statement Tuesday. The group also will look to increase the number of open source AI models — rather than the proprietary systems favored by some companies — develop new hardware and team up with academic researchers.

Three key benefits of open-source models

The alliance says that working openly together in this way offers three benefits.

First, speed. Allowing models to be shared, so that researchers can build on the work of others, will enable more rapid progress.

Second, safety. Allowing independent peer groups to examine code created by others is the best way to identify potential flaws and risks. This is the same argument made for open-sourcing security protocols, such as encryption systems.

Third, equal opportunity. Giving anyone access to the tools being built creates a level playing field in which solo researchers and startups have the same opportunities as well-funded companies.

Mission statement

The AI Alliance describes its mission as:

Accelerating and disseminating open innovation across the AI technology landscape to improve foundational capabilities, safety, security and trust in AI, and to responsibly maximize benefits to people and society everywhere.

The AI Alliance brings together a critical mass of compute, data, tools, and talent to accelerate open innovation in AI.

The AI Alliance seeks to:

Build and support open technologies across software, models and tools.

Enable developers and scientists to understand, experiment, and adopt open technologies.

Advocate for open innovation with organizational and societal leaders, policy and regulatory bodies, and the public.

IBM and Meta have taken the lead in establishing the body. IBM said that the formation of the group is “a pivotal moment in defining the future of AI,” while Meta said that it means “more people can access the benefits, build innovative products and work on safety.”

Other members are listed as:

Agency for Science, Technology and Research (A*STAR)
Aitomatic
AMD
Anyscale
Cerebras
CERN
Cleveland Clinic
Cornell University
Dartmouth
Dell Technologies
Ecole Polytechnique Federale de Lausanne
ETH Zurich
Fast.ai
Fenrir, Inc.
FPT Software
Hebrew University of Jerusalem
Hugging Face
IBM
Abdus Salam International Centre for Theoretical Physics (ICTP)
Imperial College London
Indian Institute of Technology Bombay
Institute for Computer Science, Artificial Intelligence
Intel
Keio University
LangChain
LlamaIndex
Linux Foundation
Mass Open Cloud Alliance, operated by Boston University and Harvard
Meta
Mohamed bin Zayed University of Artificial Intelligence
MLCommons
National Aeronautics and Space Administration
National Science Foundation
New York University
NumFOCUS
OpenTeams
Oracle
Partnership on AI
Quansight
Red Hat
Rensselaer Polytechnic Institute
Roadzen
Sakana AI
SB Intuitions
ServiceNow
Silo AI
Simons Foundation
Sony Group
Stability AI
Together AI
TU Munich
UC Berkeley College of Computing, Data Science, and Society
University of Illinois Urbana-Champaign
The University of Notre Dame
The University of Texas at Austin
The University of Tokyo
Yale University

Apple is reportedly testing its own generative AI chatbot internally, but is not expected to bring anything to market in the next year or so.

Photo: Fili Santillán/Unsplash

bnew · Dec 5, 2023
