Nov 1, 2015



Computer Science > Machine Learning​

[Submitted on 1 Dec 2023]

Mamba: Linear-Time Sequence Modeling with Selective State Spaces​

Albert Gu, Tri Dao
Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models (SSMs) have been developed to address Transformers' computational inefficiency on long sequences, but they have not performed as well as attention on important modalities such as language. We identify that a key weakness of such models is their inability to perform content-based reasoning, and make several improvements. First, simply letting the SSM parameters be functions of the input addresses their weakness with discrete modalities, allowing the model to selectively propagate or forget information along the sequence length dimension depending on the current token. Second, even though this change prevents the use of efficient convolutions, we design a hardware-aware parallel algorithm in recurrent mode. We integrate these selective SSMs into a simplified end-to-end neural network architecture without attention or even MLP blocks (Mamba). Mamba enjoys fast inference (5× higher throughput than Transformers) and linear scaling in sequence length, and its performance improves on real data up to million-length sequences. As a general sequence model backbone, Mamba achieves state-of-the-art performance across several modalities such as language, audio, and genomics. On language modeling, our Mamba-3B model outperforms Transformers of the same size and matches Transformers twice its size, both in pretraining and downstream evaluation.
Subjects:Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:arXiv:2312.00752 [cs.LG]
(or arXiv:2312.00752v1 [cs.LG] for this version)
[2312.00752] Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Focus to learn more

Submission history​

From: Albert Gu [view email]
[v1] Fri, 1 Dec 2023 18:01:34 UTC (1,264 KB)

Last edited:


Nov 1, 2015


Magicoder: Source Code Is All You Need

large-language-models ai4code llm llm4code

🎩 Magicoder: Source Code Is All You Need​

🎩 Models | 📚 Dataset | 🚀 Quick Start | 👀 Demo | 📝 Citation | 🙏 Acknowledgements

We are keeping improving the documents and adding more implementation details. Please stay tuned!


  • 🎩Magicoder is a model family empowered by 🪄OSS-Instruct, a novel approach to enlightening LLMs with open-source code snippets for generating low-bias and high-quality instruction data for code.
  • 🪄OSS-Instruct mitigates the inherent bias of the LLM-synthesized instruction data by empowering them with a wealth of open-source references to produce more diverse, realistic, and controllable data.
Overview of OSS-Instruct

🎩 Models​

ModelCheckpointSizeHumanEval (+)MBPP (+)DemoLicense
Magicoder-CL-7B🤗 HF Link7B60.4 (55.5)64.2 (52.6)--Llama2
Magicoder-S-CL-7B🤗 HF Link7B70.7 (66.5)68.4 (56.6)--Llama2
Magicoder-DS-6.7B🤗 HF Link6.7B66.5 (60.4)75.4 (61.9)--DeepSeek
Magicoder-S-DS-6.7B🤗 HF Link6.7B76.8 (70.7)75.7 (64.4)--DeepSeek

📚 Dataset​

Magicoder models are trained on the synthetic data generated by gpt-3.5-turbo-1106 developed by OpenAI. Please pay attention to OpenAI's terms of use when using the models and the datasets.



Nov 1, 2015

Moravec's paradox is a phenomenon observed by robotics researcher Hans Moravec, in which tasks that are easy for humans to perform (eg, motor or social skills) are difficult for machines to replicate, whereas tasks that are difficult for humans (eg, performing mathematical calculations or large-scale data analysis) are relatively easy for machines to accomplish.

For example, a computer-aided diagnostic system might be able to analyse large volumes of images quickly and accurately but might struggle to recognise clinical context or technical limitations that a human radiologist would easily identify. Similarly, a machine learning algorithm might be able to predict a patient's risk of a specific condition on the basis of their medical history and laboratory results but might not be able to account for the nuances of the patient's individual case or consider the effect of social and environmental factors that a human physician would consider. In surgery, there has been great progress in the field of robotics in health care when robotic elements are controlled by humans, but artificial intelligence-driven robotic technology has been much slower to develop.

Great article by Terence Tao:



Nov 1, 2015

Model Card for Model ID​

dragon-yi-6b-v0 part of the dRAGon ("Delivering RAG On ...") model series, RAG-instruct trained on top of a Yi-6B base model.

DRAGON models have been fine-tuned with the specific objective of fact-based question-answering over complex business and legal documents with an emphasis on reducing hallucinations and providing short, clear answers for workflow automation.

Benchmark Tests​

Evaluated against the benchmark test: RAG-Instruct-Benchmark-Tester
Average of 2 Test Runs with 1 point for correct answer, 0.5 point for partial correct or blank / NF, 0.0 points for incorrect, and -1 points for hallucinations.

--Accuracy Score: 99.5 correct out of 100
--Not Found Classification: 90.0%
--Boolean: 87.5%
--Math/Logic: 77.5%
--Complex Questions (1-5): 4 (Above Average)
--Summarization Quality (1-5): 4 (Above Average)
--Hallucinations: No hallucinations observed in test runs.

For test run results (and good indicator of target use cases), please see the files ("core_rag_test" and "answer_sheet" in this repo).

Model Description​

  • Developed by: llmware
  • Model type: Yi
  • Language(s) (NLP): English
  • License: Yi License Link
  • Finetuned from model: Yi-6B

Direct Use​

DRAGON is designed for enterprise automation use cases, especially in knowledge-intensive industries, such as financial services, legal and regulatory industries with complex information sources.

DRAGON models have been trained for common RAG scenarios, specifically: question-answering, key-value extraction, and basic summarization as the core instruction types without the need for a lot of complex instruction verbiage - provide a text passage context, ask questions, and get clear fact-based responses.

This model is licensed according to the terms of the license of the base model, Yi-6B, at this link.

Bias, Risks, and Limitations​

Any model can provide inaccurate or incomplete information, and should be used in conjunction with appropriate safeguards and fact-checking mechanisms.

How to Get Started with the Model​

The fastest way to get started with BLING is through direct import in transformers:

from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("dragon-yi-6b-v0")
model = AutoModelForCausalLM.from_pretrained("dragon-yi-6b-v0")
Please refer to the generation_test .py files in the Files repository, which includes 200 samples and script to test the model. The includes built-in llmware capabilities for fact-checking, as well as easy integration with document parsing and actual retrieval to swap out the test set for RAG workflow consisting of business documents.


Nov 1, 2015

AI Alliance will open-source AI models; Meta, IBM, Intel, NASA on board​

Ben Lovejoy | Dec 5 2023 - 4:11 am PT

AI Alliance will open-source AI models | Software code on widescreen monitor

A new industry group known as the AI Alliance believes that artificial intelligence models should be open-source, in contrast to the proprietary models developed by OpenAI and Google.

Meta, IBM, Intel, and NASA are just some of the organizations to sign up, believing that the approach offers three key benefits …

The AI Alliance​

The really big breakthroughs in generative AI have so far come from the likes of OpenAI and Google, who keep their models a closely-guarded secret.

But there are some companies and organizations who believe that big AI projects should be open-source. More than 40 of them have signed up to the AI Alliance, reports Bloomberg.

Meta and IBM are joining more than 40 companies and organizations to create an industry group dedicated to open source artificial intelligence work, aiming to share technology and reduce risks.

The coalition, called the AI Alliance, will focus on the responsible development of AI technology, including safety and security tools, according to a statement Tuesday. The group also will look to increase the number of open source AI models — rather than the proprietary systems favored by some companies — develop new hardware and team up with academic researchers.

Three key benefits of open-source models​

The alliance says that working openly together in this way offers three benefits.

First, speed. Allowing models to be shared, so that researchers can build on the work of others, will enable more rapid progress.

Second, safety. Allowing independent peer groups to examine code created by others is the best way to identify potential flaws and risks. This is the same argument for open-sourcing security protocols, like encryption systems.

Third, equal opportunity. By providing anyone with access to the tools being built, it creates a level playing field in which solo researchers and startups have the same opportunities as well-funded companies.

Mission statement​

The AI Alliance describes its mission as:

Accelerating and disseminating open innovation across the AI technology landscape to improve foundational capabilities, safety, security and trust in AI, and to responsibly maximize benefits to people and society everywhere.

The AI Alliance brings together a critical mass of compute, data, tools, and talent to accelerate open innovation in AI.

The AI Alliance seeks to:

Build and support open technologies across software, models and tools.

Enable developers and scientists to understand, experiment, and adopt open technologies.

Advocate for open innovation with organizational and societal leaders, policy and regulatory bodies, and the public.

IBM and Meta have taken the lead in establishing the body. IBM said that the formation of the group is “a pivotal moment in defining the future of AI,” while Meta said that it means “more people can access the benefits, build innovative products and work on safety.”

Other members are listed as:

  • Agency for Science, Technology and Research (A*STAR)
  • Aitomatic
  • AMD
  • Anyscale
  • Cerebras
  • CERN
  • Cleveland Clinic
  • Cornell University
  • Dartmouth
  • Dell Technologies
  • Ecole Polytechnique Federale de Lausanne
  • ETH Zurich
  • Fenrir, Inc.
  • FPT Software
  • Hebrew University of Jerusalem
  • Hugging Face
  • IBM
  • Abdus Salam International Centre for Theoretical Physics (ICTP)
  • Imperial College London
  • Indian Institute of Technology Bombay
  • Institute for Computer Science, Artificial Intelligence
  • Intel
  • Keio University
  • LangChain
  • LlamaIndex
  • Linux Foundation
  • Mass Open Cloud Alliance, operated by Boston University and Harvard
  • Meta
  • Mohamed bin Zayed University of Artificial Intelligence
  • MLCommons
  • National Aeronautics and Space Administration
  • National Science Foundation
  • New York University
  • NumFOCUS
  • OpenTeams
  • Oracle
  • Partnership on AI
  • Quansight
  • Red Hat
  • Rensselaer Polytechnic Institute
  • Roadzen
  • Sakana AI
  • SB Intuitions
  • ServiceNow
  • Silo AI
  • Simons Foundation
  • Sony Group
  • Stability AI
  • Together AI
  • TU Munich
  • UC Berkeley College of Computing, Data Science, and Society
  • University of Illinois Urbana-Champaign
  • The University of Notre Dame
  • The University of Texas at Austin
  • The University of Tokyo
  • Yale University

Apple is reportedly testing its own generative AI chatbot internally, but is not expected to bring anything to market in the next year or so.

Photo: Fili Santillán/Unsplash