The A.I Megathread (LLM , GPT , Development)

bnew · Nov 10, 2023

Many people missed that OpenAI actually open-sourced 2 models on Dev Day, and they are as interesting as the new product features!

The first is Whisper-V3, the best OSS speech recognition model out there. It shows major improvements over Whisper-V2 across dozens of languages.

Whisper remains one of my favorite foundation model papers of all time. Unlike prior works that engineer complex pipelines, Whisper is a big fat transformer that maps audio directly to text, with special "meta-language" tokens that enable elegant multi-tasking: language detection, translation, timestamp alignment, voice detection, etc. Its first author is the legendary Alec Radford @AlecRad, the guy responsible for nearly all of OAI's revolutionary papers.

I believe Whisper unlocked at least a trillion high-quality, conversational tokens from internet videos/audios for GPT-4 and beyond.

The 2nd open-sourcing is the Consistency Decoder, from the paper "Consistency Models" led by @DrYangSong. Yang was one of the OG pioneers of diffusion models. You can swap out Stable Diffusion's decoder with the Consistency Decoder checkpoint, and it would improve rendering for texts, faces, and geometric patterns out of the box.

Links:
- Whisper paper:

https://arxiv.org/abs/2212.04356

- Whisper-V3 checkpoint:

https://github.com/openai/whisper/discussions/1762

- Consistency Models:

https://arxiv.org/abs/2303.01469

- Consistency Decoder release:

https://github.com/openai/consistencydecoder

bnew · Nov 10, 2023

Here’s what we know about generative AI’s impact on white-collar work

Some jobs are vulnerable to automation but if you’re a ‘cyborg’ or a ‘centaur’ you can work better with robot help

www.ft.com

bnew · Nov 10, 2023

https://archive.is/p7xYC

https://archive.is/Gk9OD

Micky Mikey · Nov 10, 2023

All this progress makes me excited for what 2024 has in store.

bnew · Nov 11, 2023

[2311.02303] MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning

Computer Science > Machine Learning

[Submitted on 4 Nov 2023]

MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning

Bingchang Liu, Chaoyu Chen, Cong Liao, Zi Gong, Huan Wang, Zhichao Lei, Ming Liang, Dajun Chen, Min Shen, Hailian Zhou, Hang Yu, Jianguo Li

Code LLMs have emerged as a specialized research field, with remarkable studies dedicated to enhancing model's coding capabilities through fine-tuning on pre-trained models. Previous fine-tuning approaches were typically tailored to specific downstream tasks or scenarios, which meant separate fine-tuning for each task, requiring extensive training resources and posing challenges in terms of deployment and maintenance. Furthermore, these approaches failed to leverage the inherent interconnectedness among different code-related tasks. To overcome these limitations, we present a multi-task fine-tuning framework, MFTcoder, that enables simultaneous and parallel fine-tuning on multiple tasks. By incorporating various loss functions, we effectively address common challenges in multi-task learning, such as data imbalance, varying difficulty levels, and inconsistent convergence speeds. Extensive experiments have conclusively demonstrated that our multi-task fine-tuning approach outperforms both individual fine-tuning on single tasks and fine-tuning on a mixed ensemble of tasks. Moreover, MFTcoder offers efficient training capabilities, including efficient data tokenization modes and PEFT fine-tuning, resulting in significantly improved speed compared to traditional fine-tuning methods. MFTcoder seamlessly integrates with several mainstream open-source LLMs, such as CodeLLama and Qwen. Leveraging the CodeLLama foundation, our MFTcoder fine-tuned model, \textsc{CodeFuse-CodeLLama-34B}, achieves an impressive pass@1 score of 74.4\% on the HumaneEval benchmark, surpassing GPT-4 performance (67\%, zero-shot). MFTCoder is open-sourced at \url{this https URL}

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2311.02303 [cs.LG]
	(or arXiv:2311.02303v1 [cs.LG] for this version)
	[2311.02303] MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning Focus to learn more

https://arxiv.org/pdf/2311.02303.pdf

GitHub - codefuse-ai/MFTCoder: High Accuracy and efficiency multi-task fine-tuning framework for Code LLMs

High Accuracy and efficiency multi-task fine-tuning framework for Code LLMs - GitHub - codefuse-ai/MFTCoder: High Accuracy and efficiency multi-task fine-tuning framework for Code LLMs

github.com

About

High Accuracy and efficiency multi-task fine-tuning framework for Code LLMs

News

[2023/11/07] MFTCoder Paper has been released on Arxiv, which discloses technique details of multi-task-fine-tuning.

[2023/10/20] CodeFuse-QWen-14B has been released, achieving a pass@1 (greedy decoding) score of 48.8% on HumanEval, which gains 16% absolute improvement over the base model Qwen-14b

[2023/09/27] CodeFuse-StarCoder-15B has been released, achieving a pass@1 (greedy decoding) score of 54.9% on HumanEval.

[2023/09/26]We are pleased to announce the release of the 4-bit quantized version of CodeFuse-CodeLlama-34B. Despite the quantization process, the model still achieves a remarkable 73.8% accuracy (greedy decoding) on the HumanEval pass@1 metric.

[2023/09/07]We released CodeFuse-CodeLlama-34B, which achieves the 74.4% Python Pass@1 (greedy decoding) and surpasses GPT4 (2023/03/15) and ChatGPT-3.5 on the HumanEval Benchmarks.

[2023/08/26]We released MFTCoder which supports finetuning Code Llama, Llama, Llama2, StarCoder, ChatGLM2, CodeGeeX2, Qwen, and GPT-NeoX models with LoRA/QLoRA.

HumanEval Performance

Model	HumanEval(Pass@1)	Date
CodeFuse-CodeLlama-34B	74.4%	2023/09
CodeFuse-CodeLlama-34B-4bits	73.8%	2023/09
WizardCoder-Python-34B-V1.0	73.2%	2023/08
GPT-4(zero-shot)	67.0%	2023/03
PanGu-Coder2 15B	61.6%	2023/08
CodeFuse-StarCoder-15B	54.9%	2023/08
CodeLlama-34b-Python	53.7%	2023/08
CodeFuse-QWen-14B	48.8%	2023/10
CodeLlama-34b	48.8%	2023/08
GPT-3.5(zero-shot)	48.1%	2022/11
OctoCoder	46.2%	2023/08
StarCoder-15B	33.6%	2023/05
QWen-14B	32.3%	2023/10

bnew · Nov 11, 2023

https://archive.is/FthNh

bnew · Nov 11, 2023

Morethan1 · Nov 11, 2023

bnew said:

bnew said:
https://archive.is/FthNh

We just hit a year with chatgpt. The things a.i will produce, find, and accomplish by 2025 will be insane.

bnew · Nov 12, 2023

Here’s How Violent Extremists Are Exploiting Generative AI Tools

Experts are finding thousands of examples of AI-created content every week that could allow terrorist groups and other violent extremists to bypass automated detection systems.

www.wired.com

Here’s How Violent Extremists Are Exploiting Generative AI Tools

Experts are finding thousands of examples of AI-created content every week that could allow terrorist groups and other violent extremists to bypass automated detection systems.

Glitchy photo collage of the Gaza strip before and after a bombing

PHOTO-ILLUSTRATION: JACQUI VANLIEW; GETTY IMAGES

Extremist groups have begun to experiment with artificial intelligence, and in particular generative AI, in order to create a flood of new propaganda. Experts now fear the growing use of generative AI tools by these groups will overturn the work Big Tech has done in recent years to keep their content off the internet.
“Our biggest concern is that if terrorists start using gen AI to manipulate imagery at scale, this could well destroy hash-sharing as a solution,” Adam Hadley, the executive director of Tech Against Terrorism, tells WIRED. “This is a massive risk.”
For years, Big Tech platforms have worked hard to create databases of known violent extremist content, known as hashing databases, which are shared across platforms to quickly and automatically remove such content from the internet. But according to Hadley, his colleagues are now picking up around 5,000 examples of AI-generated content each week. This includes images shared in recent weeks by groups linked to Hezbollah and Hamas that appear designed to influence the narrative around the Israel-Hamas war.

“Give it six months or so, the possibility that [they] are manipulating imagery to break hashing is really concerning,” Hadley says. “The tech sector has done so well to build automated technology, terrorists could well start using gen AI to evade what's already been done.”
Other examples that researchers at Tech Against Terrorism have uncovered in recent months have included a neo-Nazi messaging channel sharing AI-generated imagery created using racist and antisemitic prompts pasted into an app available on the Google Play store; far-right figures producing a “guide to memetic warfare” advising others on how to use AI-generated image tools to create extremist memes; the Islamic State publishing a tech support guide on how to securely use generative AI tools; a pro-IS user of an archiving service claiming to have used an AI-based automatic speech recognition (ASR) system to transcribe Arabic language IS propaganda; and a pro-al-Qaeda outlet publishing several posters with images highly likely to have been created using a generative AI platform.

Beyond detailing the threat posed by generative AI tools that can tweak images, Tech Against Terrorism has published a new report citing other ways in which gen AI tools can be used to help extremist groups. These include the use of autotranslation tools that can quickly and easily convert propaganda into multiple languages, or the ability to create personalized messages at scale to facilitate recruitment efforts online. But Hadley believes that AI also provides an opportunity to get ahead of extremist groups and use the technology to preempt what they will use it for.

“We're going to partner with Microsoft to figure out if there are ways using our archive of material to create a sort of gen AI detection system in order to counter the emerging threat that gen AI will be used for terrorist content at scale,” Hadley says. “We're confident that gen AI can be used to defend against hostile uses of gen AI.”
The partnership was announced today, on the eve of the Christchurch Call Leaders’ Summit, a movement designed to eradicate terrorism and extremist content from the internet, to be held in Paris.

“The use of digital platforms to spread violent extremist content is an urgent issue with real-world consequences,” Brad Smith, vice chair and president at Microsoft said in a statement. “By combining Tech Against Terrorism’s capabilities with AI, we hope to help create a safer world both online and off.”
While companies like Microsoft, Google, and Facebook all have their own AI research divisions and are likely already deploying their own resources to combat this issue, the new initiative will ultimately aid those companies that can’t combat these efforts on their own.

Science

“This will be particularly important for smaller platforms that don't have their own AI research centers,” Hadley says. “Even now, with the hashing databases, smaller platforms can just become overwhelmed by this content.”
The threat of AI generative content is not limited to extremist groups. Last month, the Internet Watch Foundation, a UK-based nonprofit that works to eradicate child exploitation content from the internet, published a report that detailed the growing presence of child sexual abuse material (CSAM) created by AI tools on the dark web.
The researchers found over 20,000 AI-generated images posted to one dark web CSAM forum over the course of just one month, with 11,108 of these images judged most likely to be criminal by the IWF researchers. As the IWF researchers wrote in their report, “These AI images can be so convincing that they are indistinguishable from real images.”

bnew · Nov 12, 2023

https://archive.is/Ss5w9

https://archive.is/YfJIn

bnew · Nov 12, 2023

https://archive.is/5fDIS

bnew · Nov 12, 2023

https://archive.is/0aOs2

bnew · Nov 12, 2023

https://archive.is/V8EdT

How Much Does It Cost to Train a Large Language Model? A Guide

Machine learning is affecting every sector, and no one seems to have a clear idea about how much it costs to train a specialized LLM. This week at OpenAI Dev Day 2023, the company announced their model-building service for $2-3M minimum. This is a steep price to pay for a specialized model, and many are wondering, is it necessary?

The question of how much it costs to train an LLM is a really hard one, and while there’s not a straightforward, plug-and-chug cost calculation, the answer mainly depends on two factors: compute requirements and how long it takes to train.

To help provide clarity on how to estimate the cost of training an LLM, I’ve compiled a structured overview of the different levers that affect model training time and compute requirements.

Note that this article does not include costs of:
- Development and operation (eng salaries, debugging, IDEs, version control systems, tooling to monitor model performance, infrastructure set-up (try Brev lol)
using more optimized ML libraries / APIs (decreasing cost)
- Code licensing / legal considerations
- Data privacy/security & regulatory compliance
- Model bias & fairness assessments / ethical reviews
- Adversarial training (to protect against adversarial attacks) & other security measures
- Deployment in a production environment

The four main variables to consider when determining compute requirements and training time are model architecture, training dynamics, and methods for optimizing training performance. First, however, we should learn a bit about the hardware these models fit on, so we understand the context of where these variables fit.

1. Hardware Costs

This refers to access to GPUs and their associated cost, and GPU memory tends to the bottleneck. This is how much “stuff” (model, parameters, etc.) the GPU is able to hold in memory at one time. Something we’ve noticed is that most people think they need an expensive, highly elusive A100 or H100 with 40GB or 80GB of GPU memory. However, something smaller, cheaper, and more available may suffice.

I’ve released a few guides on fine-tuning (Mistral on HF dataset, Mistral on own dataset, Llama on own dataset). In these guides, I used QLoRA with 4-bit quantization and LoRA on all linear layers, reducing the trainable parameters by LoRA 98%. As a result, I was able to train these models on a single A10G (24GB of GPU Memory, and only $1/hr on Brev, which provides cloud GPUs without vendor lock-in across cloud providers, like AWS, GCP, and Lambda Labs). Training on my own dataset took about 10 minutes for 500 iterations over 200 samples, and training on the HF dataset took about an hour for 6,000 samples and 1000 iterations. These models would likely not be production-grade; I am just providing these values as base references.

Cloud provider costs and the choice between spot and reserved instances are direct cost factors. If using cloud GPUs, different providers and regions can have vastly different pricing. Spot instances are cheaper but less reliable as you may lose them while training, while reserved instances cost more but ensure availability.

2. Model Architecture

a. Size and Structure
The depth (number of layers), width (neurons per layer), and the total number of parameters affect both GPU memory requirements and training time. A model with more and/or wider layers has the capacity to learn more complex features, but at the expense of increased computational demand. Increasing the total number of parameters to train increases the estimated time to train and the GPU memory requirements. Techniques like low-rank matrix factorization (e.g., LoRA) and sparsity, where tensors are pruned to have a high number of 0 values, can reduce the number of trainable parameters and mitigate these costs, but they require careful tuning. Sparsity is often done in transformer attention mechanisms (see below) or in weights (as in block-sparse models).

b. Attention Mechanisms
Transformers leverage self-attention mechanisms, with multiple heads attending to different sequence parts, enhancing learning at the cost of increased computation. The traditional Transformer attention style compares every token in the context window with every other token, leading to memory requirements that are quadratic in the size of the context window, O(n^2). Sparse attention models offer a compromise by focusing on a subset of positions, for example with local (nearby) attention, thereby reducing computational load, often down to O(n • sqrt(n)).

c. Efficiency Optimizations
Choices of activation functions and gating mechanisms can impact computational intensity and training time. Different activation functions have varying levels of mathematical complexity; ReLU, for example, is less complex than sigmoid or tanh. Additionally, parameter sharing, for example weight sharing across layers, can reduce the number of unique parameters and hence memory requirements.

3. Training Dynamics

a. Learning Rate and Batch Size
Learning rate and batch size significantly influence the model's training speed and stability. The learning rate of a model affects the step size it takes in the opposite direction of the gradient (i.e. the direction towards minimizing the cost or loss function). This is called gradient descent. The batch size is the number of samples processed before the model’s parameters are updated. It is true that the larger your batch, the more memory you need; it scales linearly with the size of the batch. However, a larger batch size can lead to faster convergence because at each step, you get better estimates of the true gradient.

One subtlety to consider: Even if you had a terabyte of GPU memory, you still may not want to use the largest batch size possible. Downsampling (i.e. using a smaller batch size than the total number of training samples) introduces noise into the gradient, which can help you avoid local minima. That’s why it’s called stochastic gradient descent: the stochasticity refers to how much you’re downsampling from your training set in each batch.

The learning rate's size (magnitude) and schedule (rate of change over training) can affect the speed and stability of convergence. A higher learning rate means the model takes bigger steps during gradient descent. While this can speed up convergence, it can also lead to overshooting minima and potentially unstable training. Conversely, a learning rate that is too small can slow down convergence (as getting to a minimum takes longer), and the model may get stuck in local minima. See the drawing below for an example of local vs. global minima. In simple terms, a local minimum that is not equal to the global minimum is a location on the graph where it seems like the optimal loss has been found, but we had just gone a little further - up a hill and dealing with some worse performance to get there - we could have found a better place in the graph.

b. Precision and Quantization
The precision of calculations, like FP16 versus FP32 - using 16 bits to represent each floating point versus 32 - and techniques such as quantization balance memory usage with performance trade-offs. Using half-precision (FP16) instead of single-precision (FP32) floating points cuts the tensor sizes in half, which can save memory and speed up training by enabling faster operations and more parallelization. However, this comes with a trade-off in precision, which can lead to potential numerical errors, like overflow/underflow errors, as fewer bits can’t represent as large or as small numbers. It can also reduce accuracy, but if not too extreme, it can serve as a form of regularization, reducing overfitting and allowing the model to actually perform better on the held-out dataset. Another technique is to use mixed precision training, where some floating points are FP16 and some are FP32. Determining which matrices should be represented as FP16 vs. FP32 may take some experimentation, however, which is also a cost consideration.

Quantization is another technique that maps high-precision floating points to lower-precision values, usually 8- or even 4-bit fixed-point representation integers. This reduce tensor sizes by 75% or even 87.5%, but usually results in a significant reduction in model accuracy; as mentioned before, though, it may actually help the model generalize better, so experimentation may be worthwhile.

c. Hyperparameter Sweeps
Hyperparameters are external configuration variables for machine learning models, i.e. they aren’t learned by the model itself, like weights are. Hyperparameters are basically all the variables we discussed here: learning rate, model architecture like number of neurons or layers, attention mechanisms, etc. Hyperparameter sweeps are when experiments are run training different models with combinations of various hyperparameter settings, and they enable a model to find the best possible combinations of hyperparameter values for its specific dataset and task. However, it is computationally expensive, as you must train many models to find the best configuration.

d. Checkpointing/Early Stopping
Frequent model state saving (checkpointing) can increase disk usage but provides more rollback points; if a model overfits or performs better at an earlier state in training, you can have those weights saved at a checkpoint and load that model. Early stopping is a method where one stops model training after it ceases to improve on the held out validations set. This can save training time.

4. Optimizing Training Performance
a. Base Model State
Starting with a pre-trained model, especially one that is trained in a task similar to the new task being trained, can significantly reduce training time. If the initial weights are closer to the optimal solution’s weights, training can be faster. Building a model from scratch - i.e. with randomized initial weight matrices or similar - takes significantly more compute and is usually not advised.

b. Parallelism and Distributed Training
Parallel computing is usually done with one computer that has multiple processors, which execute multiple tasks simultaneously for increased efficiency. Distributed computing involves several machines (that can be physically distant) working on divided tasks and then combining their results. Usually these two techniques are used together.

Parallelism can speed up training but adds complexity and compute requirements. There are various parallelization methods, like pipeline model parallelization, where models are split into different stages and distributed across GPUs, and data parallelization, where the dataset is divided across GPUs. Distributed training can be more efficient but requires more compute resources and adds complexity.

c. Data Considerations
How quickly the training data can be fed from storage into the model can affect training time. Some variables to consider:

- Where is the GPU located? Transferring your own data to cloud machines in more remote regions may take longer
- Machine I/O bandwidth affects time to transfer between storage and GPU
- Data caching, pre-fetching, and parallel loading on the GPU can decrease this time

Additionally, more complex data might take the model longer to learn the patterns, i.e. loss convergence time may increase.

The relevance and quality of training data also have a profound effect on training efficiency. Preprocessing and augmentation can improve outcomes but may increase the computational overhead.

5. Conclusion

I hope this helps to understand the complexities behind calculating how much it costs to fine-tune or train an LLM. There’s no one-size-fits-all answer or plug-and-chug equation; the main takeaway I’d like you to have is that there’s a lot of experimentation to find what works best for you and your use case, but that’s part of the fun of ML. So try things, expect a lot of it to not work, but by doing so, you’ll see what gets you the best results.

Ultimately, the cost of training LLMs like those offered by OpenAI does seem steep. For many, fine-tuning smaller models and maintaining control over proprietary data might be a more viable solution.

bnew · Nov 12, 2023

https://archive.is/crBWQ

bnew · Nov 12, 2023

https://archive.is/QI2mo

LLM Enhanced Reasoning "Stack": Multi-Persona Tree of Thoughts

+ Self Consistency + Self Criticism + Retrospection

The reasoning, rhythm, and prompts are below.

I'm seeking methodological feedback on this new iterative problem solving technique for LLM hallucination mitigation and improved general reasoning quality. I'm getting great results so far, lmk if you have improvements!

The idea is to have a team of multiple personas or “experts” reasoning in parallel, critiquing themselves and each other, incorporating feedback and course correcting, and finally converging as a team on the best solution. Then reflecting on the overall process for continuous improvement with a retrospective.

This reasoning "stack" combines:
- Multiple personas/perspectives
- Tree of Thoughts reasoning
- Self Consistency
- Self Criticism
- Retrospection

into the following Reasoning Rhythm

:
- Multi-Persona Brainstorming
- Self<>Peer Criticism & Evaluation Round 1
- Expand, Explore, Branch
- Self<>Peer Criticism & Evaluation Round 2
- Convergence on Best Individual Answer
- Convergence on Best Collective Answer
- Retrospective

Let's take a look at core features, sample personas, and a 9 prompt process that implements this that you can adapt...

Jul 22, 2023 · 3:31 AM UTC

Nathan Black
@sockcymbal
Jul 22
Jul 22
Core features of this combined approach include:

- Multiple perspective collaboration
- Ability to criticize self
- Ability to criticize others
- Incorporate feedback from others
- Expand and backtrack on reasoning paths as necessary
- 2 rounds of self-criticism and peer-evaluation
- A reminder mid-way to stay focused on the core problem and objective (fun fact: the model suggested adding this during a recent retrospective)
- 2 part final answer convergence: individual then collective
- Retrospective stage
- Do all of the above with X number of experts in parallel (can experiment with single LLM calls managing multiple personas, or one LLM per persona, etc)

Error Correction improvements include:

- Incorporating Explicit Error Checking: Includes a specific stage for the experts to identify potential errors in their reasoning and correct them. This is an explicit part of the criticism stages.

- Encouraging Divergent Thinking: During the expand, explore, and branch stage, the experts are encouraged to not only build on their current thoughts, but also to think divergently and consider entirely new lines of reasoning.

Adding a Retrospective Stage: After the final convergence on the best answer, a reflection stage has been added. Here, the experts can discuss what they learned from the process, identify key takeaways, and suggest how they might approach similar problems in the future.
Nathan Black
@sockcymbal
Jul 22
Jul 22
Tip: Given your unique question and expectations, define the hypothetical personas with specific skillsets and expertise clearly at the beginning to help the LLM simulate a range of perspectives more successfully. Iterate and experiment with this!

Example persona definitions:

Historian Persona:
"Step into the shoes of a historian, with a profound understanding of humanity's past. Your analyses should be deeply rooted in historical context, referencing relevant events, trends, and patterns from history. Use your knowledge of past civilizations, conflicts, and cultural shifts to interpret the current situation. Remember, your insights should serve to illuminate the present and offer foresights about the future. Your audience appreciates a narrative that ties the past, present, and future together."

Optimist Persona:
"You are an optimist, someone who sees the glass as half full rather than half empty. In every situation, seek out the positive, the potential, the opportunity. Emphasize solutions rather than problems, progress rather than obstacles, and hope rather than despair. Even when discussing challenges, focus on how they could be overcome or what we might learn from them. Your audience turns to you for a hopeful perspective on the future, so make sure your responses inspire optimism and confidence."

Now let's get to the prompts!
Nathan Black
@sockcymbal
Jul 22
Jul 22
Prompt 1: Brainstorm

Imagine you are 3 {insert personas with specific skillsets and expertise} reasoning step by step to ultimately solve a given problem or question by arriving at a final, synthesized best answer.

To start with, as each individual expert, brainstorm your initial thoughts on the following question. Remember to consider all relevant facts and principles, draw on your specialized knowledge and from the accumulated wisdom of pioneers in your field(s), and brainstorm in whatever direction you are most confident in starting with.

The question is: {insert question}
Nathan Black
@sockcymbal
Jul 22
Jul 22
Prompt 2: Self<>Peer Criticism Round 1

Now, as each expert, critique your own initial thought and the thoughts of the other experts.

Identify any potential errors, inconsistencies, or gaps in reasoning.
Nathan Black
@sockcymbal
Jul 22
Jul 22
Prompt 3: Self<>Peer Evaluation Round 1

Assess the validity of your initial thoughts, considering the criticisms you've identified. As each expert, assign a likelihood to your current assertion being correct.

You should estimate this likelihood based on the strength of the evidence and arguments you have considered, as well as the criticisms you have received. Assign higher likelihoods to assertions that are well-supported by strong evidence and arguments and have survived criticism.
Nathan Black
@sockcymbal
Jul 22
Jul 22
Prompt 4: Expand, Explore, Branch

Develop your thoughts further, considering the critiques and perspectives of the other experts. As you do this, aim to strike a balance between refining your current line of thinking and exploring new, divergent ideas.
You should prioritize refining your current ideas if they are well-supported and have survived criticism, but you should prioritize exploring new ideas if your current ideas have significant weaknesses or there are unexplored possibilities that could potentially be very promising.

Consider the following:

- How do your new or refined ideas address the criticisms that were raised?

- Do these ideas bring new insights to the problem, or do they provide a different perspective
on existing insights?

- Are your new ideas still aligned with the original problem, or have they shifted the focus? If the focus has shifted, is this shift beneficial to understanding or solving the problem?

- Remember, if necessary, don't hesitate to backtrack and start a new and improved branch of thinking. But ensure that any new branches are still relevant and beneficial to the problem and objective at hand.
Nathan Black
@sockcymbal
Jul 22
Jul 22
Prompt 5: Self<>Peer Criticism Round 2

Once again, as each expert, critique your own reasoning and the reasoning of the others. Identify any potential errors, inconsistencies, or gaps in reasoning.

Based on the feedback, if there's an improvement or optimization to make, develop your answer further as necessary. Remember that the reasoning paths should remain relevant to the original question's essence and
should be building towards a more accurate and thoughtful final answer.
Nathan Black
@sockcymbal
Jul 22
Jul 22
Prompt 6: Self<>Peer Evaluation Round 2

Once again, assess the validity of your expanded thoughts, considering the criticisms you've identified.

As each expert, assign a new likelihood to your assertions.

GitHub - sockcymbal/enhanced-llm-reasoning-tree-of-thoughts: Collection of Tree of Thoughts prompting techniques I've found useful to start with, then stylize, then iterate

Collection of Tree of Thoughts prompting techniques I've found useful to start with, then stylize, then iterate - GitHub - sockcymbal/enhanced-llm-reasoning-tree-of-thoughts: Collection of Tree...

github.com

The A.I Megathread (LLM , GPT , Development)

Veteran

Veteran

Veteran

Banned

Veteran

Computer Science > Machine Learning​

MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning​

About​

News​

HumanEval Performance​

Veteran

Veteran

Veteran

Veteran

Here’s How Violent Extremists Are Exploiting Generative AI Tools​

Science​

Veteran

Veteran

Veteran

Veteran

Veteran

Veteran

Computer Science > Machine Learning

MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning

About

News

HumanEval Performance

Here’s How Violent Extremists Are Exploiting Generative AI Tools

Science