Bard gets its biggest upgrade yet with Gemini {Google A.I / LLM}

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,402
Reputation
8,326
Daps
158,240







1/11
This is fast. Chrome running Gemini locally on my laptop. 2 lines of code.

2/11
No library or anything, it's a native part of some future version of Chrome

3/11
Does it work offline?

4/11
This is Chrome 128 Canary. You need to sign up for "Built-In AI proposal preview" to enable it

5/11
Seems very light on memory and CPU

6/11
wait why are they putting this into Chrome lol

are they trying to push this as a web standard of sorts or are they just going to keep this for themselves?

7/11
It's a proposal for all browsers

8/11
Query expansion like this could be promising

9/11
This is a great point!

10/11
No API key required? That would be great, I can run tons of instances of Chrome on the server as the back end of my wrapper apps.

11/11
Free, fast and private for everyone
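For the curious, the "2 lines of code" in the first tweet map onto Chrome's early Built-in AI (Prompt API) preview. A minimal sketch, assuming the preview-era names window.ai.createTextSession and session.prompt (this surface has been renamed in later Canary builds):

// Hedged sketch of the early Chrome Built-in AI (Prompt API) preview.
// The window.ai names below are from that preview and have since changed.
const session = await window.ai.createTextSession();
console.log(await session.prompt("Write a haiku about local LLMs."));

Run from the DevTools console of a Canary build with the flags enabled, this talks to Gemini Nano entirely on-device, which is what makes the offline and no-API-key questions in the thread plausible.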


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196







Jun 25, 2024


Get Access to Gemini Nano Locally Using Chrome Canary​

You can access Gemini Nano locally using Chrome Canary. It lets you use cutting-edge AI in your browser.

Gemini Nano


Gemini Nano is now available in Chrome Canary. While the official release is still to come, you can already run the model locally on your computer.

What is Gemini Nano?

Gemini Nano is a streamlined version of the larger Gemini models, designed to run locally. It is trained on the same datasets as its predecessors and keeps their multimodal capabilities in a smaller form. Google had originally promised this feature for Chrome 126, but it has now appeared in Chrome Canary, hinting that an official release is near.

Benefits of Using Nano Locally

Running Gemini Nano locally offers several benefits:

  • Privacy: Local processing means data doesn't have to leave your device. This provides an extra layer of security and privacy.
  • Speed and Responsiveness: With no round trip to a server, interactions are quicker, improving the user experience.
  • Accessibility: Developers can add large language model capabilities to applications. Users don't need constant internet access.



What is Chrome Canary?

It's the most experimental version of the Google Chrome web browser, designed primarily for developers and tech enthusiasts who want to test the latest features and APIs before they are widely available. While it offers cutting-edge functionality, it is also more prone to crashes and instability due to its experimental nature.

  • Canary is updated daily with the latest changes, often with minimal or no testing from Google.
  • It typically runs two or three versions ahead of the Stable channel.
  • Canary includes all features of normal Chrome, plus experimental functionality.
  • It can run alongside other Chrome versions and is available for Windows, macOS, and Android.



Launching Gemini Nano Locally with Chrome Canary

To get started with Gemini Nano locally using Chrome Canary, follow these steps:

  1. Download and set up Chrome Canary, ensuring the language is set to English (United States).
  2. In the address bar, enter chrome://flags
  3. Set:
    • 'Enables optimization guide on device' to Enabled BypassPerfRequirement
    • 'Prompt API for Gemini Nano' to Enabled



Chrome Flags Enabled

  4. Restart Chrome.
  5. Wait for Gemini Nano to download. To check the status, navigate to chrome://components and ensure that the Optimization Guide On Device Model shows version 2024.6.5.2205 or higher. If not, click 'Check for updates'.
  6. Congratulations! You're all set to explore Gemini Nano for chat applications. Although the model is significantly simpler than its server-side siblings, a local LLM for inference is a major stride for website developers.
  7. You can chat with the Chrome AI model here: https://www.localhostai.xyz
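Once the steps above are done, a slightly fuller sketch than the two-liner earlier in the thread checks that the model is actually ready before prompting it. Again, the window.ai names are assumptions from the early preview build:

// Hedged sketch, early Prompt API preview: check availability first.
const availability = await window.ai.canCreateTextSession();
if (availability === "readily") {
  const session = await window.ai.createTextSession();
  console.log(await session.prompt("Summarize this page in one sentence."));
} else {
  // e.g. a value like "after-download" while the model is still fetching
  console.log("Gemini Nano not ready yet:", availability);
}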


Chat Gemini Nano Locally


Conclusion

Gemini Nano is now available on Chrome Canary, a big step forward for local AI. It processes data on your device, which increases privacy and speeds things up. This also makes advanced technology easier for more people to use. Gemini Nano gives developers and tech fans a new way to try out AI. This helps create a stronger and more efficient local tech community and shows what the future of independent digital projects might look like.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,402
Reputation
8,326
Daps
158,240


1/1
🚨LLM Alert 🚨

💎 @GoogleDeepMind's Gemma Team has officially announced the release of Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters.

💎The 9 billion and 27 billion parameter models are available today, with a 2 billion parameter model to be released shortly.

🌟Few pointers from the Announcement

🎯 In this new version, they have made several technical modifications to the architecture, such as interleaving local-global attention and grouped-query attention.

🎯They also train the 2B and 9B models with knowledge distillation instead of next-token prediction (a standard formulation is sketched below this post). The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3× bigger.

🎯They trained Gemma 2 27B on 13 trillion tokens of primarily-English data, the 9B model on 8 trillion tokens, and the 2.6B on 2 trillion tokens. These tokens come from a variety of data sources, including web documents, code, and science articles.

🎯These models are not multimodal and are not trained specifically for state-of-the-art multilingual capabilities. The final data mixture was determined through ablations similar to the approach in Gemini 1.0.

🎯Just like the original Gemma models, Gemma 2 is available under the commercially-friendly Gemma license, giving developers and researchers the ability to share and commercialize their innovations.

1️⃣Blog: Gemma 2 is now available to researchers and developers

2️⃣Technical Report: https://storage.googleapis.com/deepmind-media/gemma/gemma-2-report.pdf

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.
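A note on the distillation point above: instead of learning from one-hot next tokens, the smaller models are trained to match a large teacher's next-token distribution. A standard token-level formulation of that objective (my rendering of the idea, not a formula quoted from the report) is, in LaTeX:

\min_{P_S} \; \sum_{x} -\,P_T(x \mid x_c)\,\log P_S(x \mid x_c)

where P_T and P_S are the teacher's and student's next-token distributions given context x_c. Ordinary next-token prediction is the special case where P_T puts all of its probability mass on the observed token.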


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,402
Reputation
8,326
Daps
158,240

Google opens up Gemini 1.5 Flash, Pro with 2M tokens to the public​

Ken Yeung @thekenyeung

June 27, 2024 6:00 AM

Sir Demis Hassabis introduces Gemini 1.5 Flash. Image credit: Screenshot



Google Cloud is making two variations of its flagship AI model—Gemini 1.5 Flash and Pro—publicly accessible. The former is a small multimodal model with a 1-million-token context window that tackles narrow, high-frequency tasks; it was first introduced in May at Google I/O. The latter, the most powerful version of Google’s LLM, debuted in February before being notably upgraded to a 2-million-token context window. That version is now open to all developers.

The release of these Gemini variations aims to showcase how Google’s AI work empowers businesses to develop “compelling” AI agents and solutions. During a press briefing, Google Cloud Chief Executive Thomas Kurian boasted that the company is seeing “incredible momentum” with its generative AI efforts, with organizations such as Accenture, Airbus, Anthropic, Box, Broadcom, Cognizant, Confluent, Databricks, Deloitte, Equifax, Estée Lauder Companies, Ford, GitLab, GM, the Golden State Warriors, Goldman Sachs, Hugging Face, IHG Hotels and Resorts, Lufthansa Group, Moody’s, Samsung, and others building on its platform. He attributed this adoption growth to the combination of what Google’s models are capable of and the company’s Vertex platform, saying Google will “continue to introduce new capability in both those layers at a rapid pace.”

Google is also releasing context caching and provisioned throughput, new model capabilities designed to enhance the developer experience.

Gemini 1.5 Flash​

Gemini 1.5 Flash offers developers lower latency, affordable pricing and a context window suitable for inclusion in retail chat agents, document processing, and bots that can synthesize entire repositories. Google claims, on average, that Gemini 1.5 Flash is 40 percent faster than GPT-3.5 Turbo when given an input of 10,000 characters. It has an input price four times lower than OpenAI’s model, with context caching enabled for inputs larger than 32,000 characters.

Gemini 1.5 Pro​

As for Gemini 1.5 Pro, developers will be excited to have a much larger context window. With 2 million tokens, it’s in a class of its own, as none of the prominent AI models has as high of a limit. This means this model can process and consider more text before generating a response than ever before. “You may ask, ‘translate that for me in real terms,'” Kurian states. “Two million context windows says you can take two hours of high-definition video, feed it into the model, and have the model understand it as one thing. You don’t have to break it into chunks. You can feed it as one thing. You can do almost a whole day of audio, one or two hours of video, greater than 60,000 lines of code and over 1.5 million words. And we are seeing many companies find enormous value in this.”

Kurian explains the differences between Gemini 1.5 Flash and Pro: “It’s not just the kind of customers, but it’s the specific [use] cases within a customer.” He references Google’s I/O keynote as a practical and recent example. “If you wanted to take the entire keynote—not the short version, but the two-hour keynote—and you wanted all of it processed as one video, you would use [Gemini 1.5] Pro because it was a two-hour video. If you wanted to do something that’s super low latency…then you will use Flash because it is designed to be a faster model, more predictable latency, and is able to reason up to a million tokens.”

Context caching now for Gemini 1.5 Pro and Flash​

To help developers leverage Gemini’s different context windows, Google is launching context caching in public preview for both Gemini 1.5 Pro and Flash. Context caching allows models to store and reuse information they already have without recomputing everything from scratch whenever they receive a request. It’s helpful for long conversations or documents and lowers developers’ compute costs. Google reveals that context caching can reduce input costs by a staggering 75 percent. This feature will become more critical as context windows increase.
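As a concrete sketch of the pattern, here is how context caching looked from the Gemini API's @google/generative-ai Node SDK in mid-2024 (the Vertex AI surface differs; the class names GoogleAICacheManager and getGenerativeModelFromCachedContent, the model string, and the TTL below are assumptions tied to that SDK version):

import { GoogleGenerativeAI } from "@google/generative-ai";
import { GoogleAICacheManager } from "@google/generative-ai/server";

// Assumes GEMINI_API_KEY is set; longDocument stands in for a large,
// reusable context such as a transcript or codebase dump.
const apiKey = process.env.GEMINI_API_KEY;
const longDocument = "...many thousands of tokens of reference text...";

// Pay to store the big context once...
const cacheManager = new GoogleAICacheManager(apiKey);
const cache = await cacheManager.create({
  model: "models/gemini-1.5-flash-001",
  contents: [{ role: "user", parts: [{ text: longDocument }] }],
  ttlSeconds: 3600, // keep the cache warm for an hour
});

// ...then ask repeated questions without resending it in full each time.
const genAI = new GoogleGenerativeAI(apiKey);
const model = genAI.getGenerativeModelFromCachedContent(cache);
const result = await model.generateContent("Summarize the key points.");
console.log(result.response.text());

The design point is that the large, unchanging prefix is billed at a discounted cached rate, which is where the claimed reduction of up to 75 percent on input costs comes from.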

Provisioned throughput for Gemini​

With provisioned throughput, developers can better scale their usage of Google’s Gemini models. This feature determines how many queries or texts a model can process over time. Previously, developers were charged under a “pay-as-you-go” model, but now they have the option of provisioned throughput, which will give them better predictability and reliability for production workloads.

“Provision throughput allows us to essentially reserve inference capacity for customers,” Kurian shares. “But if they want to reserve a certain amount of capacity, for example, if they’re running a large event and they’re seeing a big ramp in users, as we’re seeing with some of our social media platform customers, that are able to reserve capacity at a time, so they don’t start seeing exceptions from a service-level point of view. And that’s a big step forward in assuring them when we take our models into general availability, or giving them an assurance on a service-level objective, both with regard to response time, as well as availability up-time.”

Provisioned throughput is generally available starting today with an allowlist.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,402
Reputation
8,326
Daps
158,240

Google’s Imagen 3 text-to-image foundation model comes to Vertex AI​

Ken Yeung @thekenyeung

June 27, 2024 6:00 AM




Google’s next-generation text-to-image foundation model is coming to the company’s Vertex AI platform. Imagen 3 will be available for select customers in preview, offering developers faster image generation, better prompt understanding, a photo-realistic generation of people, and greater text rendering control within an image compared to its predecessor.

Introduced at Google I/O in May, Imagen 3 was initially available to select creators in a private preview in ImageFX. However, Google promised that the AI model would come to Vertex AI.

“It’s our most capable image generation model yet,” Douglas Eck, senior research director of Google DeepMind, said at the time. “Imagen 3 is more photorealistic, with richer details and fewer visual artifacts or distorted images. It understands prompts written the way people write—the more creative and detailed you are, the better. And Imagen 3 remembers to incorporate small details…in longer prompts. Plus, this is our best model yet for rendering text, which has been a challenge for image generation models.”

With its launch on Vertex AI, Imagen 3 comes with multi-language support, safety features such as Google DeepMind’s SynthID digital watermarking, and multiple aspect ratio support.

Stock photography provider Shutterstock is one company using this model. “Since adding Imagen to our AI image generator, our users have generated millions of pictures with the model,” Justin Hiza, the company’s vice president of data services, remarks in a statement. “We’re excited by the enhancements Imagen 3 promises as it enables our users to execute their ideas faster without sacrificing quality. As an important enhancement to Shutterstock’s launch of the first ethically-sourced AI image generator, we also appreciate how safety is built in and that the content that is created is protected under Google Cloud’s indemnification for generative AI.”

And while Google continues to innovate on Imagen, it declined to state when it would allow its Gemini AI to resume generating images following backlash over notable “inaccuracies.” When asked during a press briefing, Google Cloud Chief Executive Thomas Kurian pointed out that Imagen and Gemini are different types of models: “Gemini is a multimodal model, meaning you can give it input of many different modalities it can reason on it, and…allows you to reason across images and video and audio…This is not the same as what we do with Imagen. Imagen is a diffusion model. A diffusion model is…used to generate super high-fidelity text-to-image…Imagen is not a replacement for the image functionality in Gemini. Two different technologies for two different purposes.”

A question from another journalist, asking when Google would re-enable Gemini’s image-generation functionality, went unanswered.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,402
Reputation
8,326
Daps
158,240


Google makes its Gemini chatbot faster and more widely available​


Kyle Wiggers

9:00 AM PDT • July 25, 2024


In this photo illustration a Gemini logo and a welcome message on Gemini website are displayed on two screens.
Image Credits: Lorenzo Di Cola/NurPhoto / Getty Images

In its bid to maintain pace with generative AI rivals like Anthropic and OpenAI, Google is rolling out updates to the no-fee tier of Gemini, its AI-powered chatbot. The updates are focused on making the platform more performant — and more widely available.

Starting Thursday, Gemini 1.5 Flash — a lightweight multimodal model Google announced in May — will be available on the web and mobile in 40 languages and around 230 countries. Google claims that Gemini 1.5 Flash delivers upgrades in quality and latency, with especially noticeable improvements in reasoning and image understanding.

In a boon for Google, it might also be cheaper to run on the back end.

At Gemini 1.5 Flash’s unveiling, Google emphasized that the model was a “distilled” and highly efficient version of Gemini 1.5 Pro, built for what the company described as “narrow,” “high-frequency” generative AI workloads. Given the overhead of serving a chatbot platform such as Gemini (see: OpenAI’s ChatGPT bills), Google’s no doubt eager to jump on cost-reducing opportunities, particularly if those opportunities have the fortunate side effect of boosting performance in other areas.

Beyond the new base model, Google says that it’s expanding Gemini’s context window to 32,000 tokens, which amounts to roughly 24,000 words (or 48 pages of text).

Gemini 1.5 Flash
Image Credits: Google

Context, or context window, refers to the input data (e.g., text) that a model considers before generating output (e.g., additional text). A few of the advantages of models with larger contexts are that they can summarize and reason over longer text snippets and files (at least in theory), and that — in a chatbot context — they’re less likely to forget topics that were recently discussed.

The ability to upload files to Gemini for analysis previously required Gemini Advanced, the paid edition of Gemini gated behind Google’s $20-per-month Google One AI Premium Plan. But Google says that it’ll soon enable file uploads from Google Drive and local devices for all Gemini users.

“You’ll be able to do things like upload your economics study guide and ask Gemini to create practice questions,” Amar Subramanya, VP of engineering at Google, wrote in a blog post shared with TechCrunch. “Gemini will also soon be able to analyze data files for you, allowing you to uncover insights and visualize them through charts and graphics.”

To attempt to combat hallucinations — instances where a generative AI model like Gemini 1.5 Flash makes things up — Google is previewing a feature that displays links to related web content beneath certain Gemini-generated answers. English-language Gemini users in select territories will see a “chip” icon at the end of a Gemini-generated paragraph with a link to websites — or emails, if you’ve given Gemini permission to access your Gmail inbox — where you can dive deeper.

The move comes after revelations that Google’s generative AI models are prone to hallucinating quite badly — for example, suggesting nontoxic glue in a pizza recipe and inventing fake book reviews attributed to real people. Google earlier this year released a “double check” feature in Gemini designed to highlight Gemini-originated statements that other online sources corroborate or contradict. But the related content links appear to be an effort to make more transparent which sources of info Gemini might be drawing from.

The question in this reporter’s mind is how often and accurately Gemini will surface related links. TBD.

Google’s not waiting to flood the channels, though.

After debuting Gemini in Messages for select devices earlier in the year, Google is rolling out the feature in the European Economic Area (EEA), U.K. and Switzerland, with the ability to chat in newly added languages such as French, Polish and Spanish. Users can pull up Gemini in Messages by tapping the “Start chat” button and selecting Gemini as a chat partner.

Google’s also launching the Gemini mobile app in more countries, and expanding Gemini access to teenagers globally.

The company introduced a teen-focused Gemini experience in June, allowing students to sign up using their school accounts — though not in all countries. In the coming week, that’ll change as Gemini becomes available to teens in every country and region that Gemini is normally available to adults.

Coinciding with the rollout, Google says that it’s putting “additional policies and safeguards” in place to protect teens — without going into detail. A new teen-tailored onboarding process is also in tow, along with an “AI literacy guide” to — as Google phrases it — “help teens use AI responsibly.”

It’s the subject of great debate whether kids are leveraging generative AI tools in the ways they were intended, or abusing them. Google is surely eager to avoid headlines suggesting Gemini is a plagiaristic essay generator or capable of giving teens poorly conceived advice on personal problems, and thus taking what steps it can to prevent the worst from occurring.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,402
Reputation
8,326
Daps
158,240


Free Gemini users can finally chat in a flash​


Emilia David@miyadavid

July 25, 2024 9:00 AM

Sir Demis Hassabis introduces Gemini 1.5 Flash. Image credit: Screenshot


Google made several updates to the free version of its Gemini chatbot, including making its low-latency multimodal model Gemini 1.5 Flash available and adding more source links to reduce hallucinations.

Gemini 1.5 Flash, previously only available to developers, is best suited for tasks requiring quick responses, such as answering customer queries. Google announced the model during its annual developer conference, Google I/O, in May but has since opened it up to the public.

The model has a large context window of around 1 million tokens; the context window refers to how much information the model can process at a time. Google said Gemini 1.5 Flash on the Gemini chatbot will have a context window of 32K tokens. A large context window allows for more complex questions and longer back-and-forth conversations.

To take advantage of this, Google is updating the free version of Gemini to handle file uploads from Google Drive or devices. This has been a feature in Gemini Advanced, the paid version of the chatbot.

When it first launched, Google claimed Gemini 1.5 Flash was 40% faster than OpenAI’s fast model GPT-3.5 Turbo. Gemini 1.5 Flash is not a small model like the Gemma family of Google models; instead, it is trained with the same data as Gemini 1.5 Pro.

Gemini 1.5 Flash will be available on both mobile and desktop versions of Gemini. It can be accessed in more than 230 countries and territories and in 40 languages.


Reducing hallucinations with links​


Hallucinations continue to be a problem for AI models. Google is following the lead of other model providers and chatbots by adding related links to responses to prompts asking for information. The idea is to show that the AI model did not fabricate the information without a reference.

“Starting today for English language prompts in certain countries, you can access this additional information on topics directly within Gemini’s responses. Just click on the chip at the end of a paragraph to see websites where you can dive deeper on a certain topic,” Google said in a blog post.

The company said Gemini will add links to the relevant email if the information is in an email.

Google will also add a double-check feature that “verifies responses by using Google Search to highlight which statements are corroborated or contradicted on the web.”

Google is not the only company that adds links for attribution in line with the responses on a chatbot. ChatGPT and Perplexity regularly add citations and links to websites where they find information.

However, a report from Nieman Labs found that the chatbots hallucinated some links, in some cases attaching links to news stories that do not exist or are completely unrelated.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,402
Reputation
8,326
Daps
158,240

Google brings Gemini-powered search history and Lens to Chrome desktop​


Ivan Mehta

9:00 AM PDT • August 1, 2024

Comment

Image Credits: Google

Google said Thursday that it is introducing new Gemini-powered features for Chrome’s desktop version, including Lens for desktop, tab compare for shopping assistance, and natural-language search of browsing history.

Years after introducing and evolving Google Lens on mobile, the feature is finally coming to desktop. Rolling out to users across the world in the coming days, Lens will live in the address bar, as well as the three-dot menu. After clicking, you can select a part of a page and ask more questions to get search results.



You can also tap on objects, such as someone’s backpack in a picture, and ask questions through multi-search to find a similar item in different colors or brands. Depending on the question you ask, you might also get AI Overviews in answers.

In addition to searching for shoppable items, users can also find out how much sunlight a plant needs, for example, or get help understanding a math equation.

Image Credits: Google

Google is also introducing a new feature called Tab Compare to aid shopping. In the coming weeks, Chrome will offer an AI-powered summary of similar items you might be searching across different tabs. For instance, if you are searching for a new Bluetooth speaker, the feature will show details such as product specs, features, price and ratings in one place, even when you’re looking at these details across different pages.

A tab in Chrome comparing the price, user reviews and other specs of three different portable speakers.

Image Credits: Google

One of the most useful updates of this lot is the ability to search your browsing history through natural language queries. Sometimes you don’t remember what page you visited apart from a few details. The company is rolling out AI-powered history search in the coming weeks as an opt-in feature for U.S. users.

Shortcut for “Search History” in the Chrome address bar with the input “what was that ice cream shop I looked at last week?”. The drop-down results provide the URL to the correct website, “Emerald City Cones”.

Image Credits: Google

An example of a natural language query is, “What was that ice cream shop I looked at last week?” Google uses a combination of URL, title, and contents of the page to show search results.

The company said that it doesn’t use this data to train Gemini and won’t surface any information from the incognito session. Google currently can’t process AI-powered search history locally, however, so it uses cloud capacity to return results.

In January, the company introduced AI-powered features such as a writing assistant, tab organizer, and theme creator. In May, it rolled out a way to mention Gemini and ask the chatbot questions directly from the address bar.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,402
Reputation
8,326
Daps
158,240


Gemini 1.5 Flash price drop with tuning rollout complete, and more​


AUG 08, 2024

Logan Kilpatrick Senior Product Manager Gemini API and Google AI Studio

Shrestha Basu Mallick Group Product Manager Gemini API




Gemini 1.5 Flash price drop, tuning rollout complete and improvements to Gemini API and Google AI Studio​


Last week, we launched an experimental updated version of Gemini 1.5 Pro (0801) that ranked #1 on the LMSYS leaderboard for both text and multi-modal queries. We were so excited by the immediate response to this model that we raised the limits to test with it. We will have more updates soon.

Today, we’re announcing a series of improvements across AI Studio and the Gemini API:


  • Significant reduction in costs for Gemini 1.5 Flash, with input token costs decreasing by 78% and output token costs decreasing by 71%



  • Expanding the Gemini API to support queries in 100+ additional languages


  • Expanded AI Studio access for Google Workspace customers


  • Revamped documentation UI and API reference and more!

Gemini 1.5 Flash price decrease


1.5 Flash is our most popular Gemini model amongst developers who want to build high volume, low latency use cases such as summarization, categorization, multi-modal understanding and more. To make this model even more affordable, as of August 12, we’re reducing the input price by 78% to $0.075/1 million tokens and the output price by 71% to $0.3/1 million tokens for prompts under 128K tokens (cascading the reductions across the >128K tokens tier as well as caching). With these prices and tools like context caching, developers should see major cost savings when building with Gemini 1.5 Flash’s long context and multimodal capabilities.
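As a quick back-of-the-envelope check on what those numbers mean in practice (the workload figures below are invented for illustration):

// New Gemini 1.5 Flash prices quoted above (prompts under 128K tokens):
// $0.075 per 1M input tokens, $0.30 per 1M output tokens.
const INPUT_PER_M = 0.075;
const OUTPUT_PER_M = 0.30;

// Hypothetical workload: 100k requests/day, ~2,000 input tokens and
// ~500 output tokens per request.
const requests = 100_000;
const inputTokens = 2_000;
const outputTokens = 500;

const dailyCost =
  (requests * inputTokens / 1e6) * INPUT_PER_M +
  (requests * outputTokens / 1e6) * OUTPUT_PER_M;

console.log(`~$${dailyCost.toFixed(2)}/day`); // ~$30.00/day at these volumes

That is $15 for the 200M input tokens plus $15 for the 50M output tokens, the kind of arithmetic behind the "major cost savings" claim.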


Gemini 1.5 Flash reduced prices effective August 12, 2024. See full price list at ai.google.dev/pricing.


Expanded Gemini API language availability


We’re expanding language understanding for both Gemini 1.5 Pro and Flash models to cover more than 100 languages so developers across the globe can now prompt and receive outputs in the language of their choice. This should eliminate model “language” block finish reasons via the Gemini API.



Google AI Studio access for Google Workspace


Google Workspace users can now access Google AI Studio by default, without having to enable any additional settings, unlocking frictionless access for millions of users. Account admins will still have control to manage AI Studio access.



Gemini 1.5 Flash tuning rollout now complete


We have now rolled out Gemini 1.5 Flash text tuning to all developers via the Gemini API and Google AI Studio. Tuning enables developers to customize base models and improve performance for tasks by providing the model additional data. This helps reduce the context size of prompts, reduces latency and in some cases cost, while also increasing the accuracy of the model on tasks.



Improved developer documentation


Our developer documentation is core to the experience of building with the Gemini API. We recently released a series of improvements: updated content, refreshed navigation and look and feel, and a revamped API reference.


Improved developer documentation experience for the Gemini API on ai.google.dev/gemini-api


We have many more improvements to the documentation coming soon so please continue to send us feedback!



PDF Vision and Text understanding


The Gemini API and AI Studio now support PDF understanding through both text and vision. If your PDF includes graphs, images, or other non-text visual content, the model uses native multi-modal capabilities to process the PDF. You can try this out via Google AI Studio or in the Gemini API.
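A hedged sketch of what that looks like from the Node SDK, uploading a PDF via the Files API and referencing it in a prompt (mid-2024 SDK surface; GoogleAIFileManager, uploadFile, and the fileData part shape are assumptions tied to that version):

import { GoogleGenerativeAI } from "@google/generative-ai";
import { GoogleAIFileManager } from "@google/generative-ai/server";

const apiKey = process.env.GEMINI_API_KEY;

// Upload the PDF once, then reference it by URI in the prompt parts.
const fileManager = new GoogleAIFileManager(apiKey);
const upload = await fileManager.uploadFile("report.pdf", {
  mimeType: "application/pdf",
});

const genAI = new GoogleGenerativeAI(apiKey);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });
const result = await model.generateContent([
  { fileData: { fileUri: upload.file.uri, mimeType: "application/pdf" } },
  { text: "Describe the charts and other visual content in this PDF." },
]);
console.log(result.response.text());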



Google AI Studio improvements


Over the last few weeks, we have released many improvements to AI Studio, including overhauled keyboard shortcuts, drag-and-drop support for images in the UI, ~50% faster loading times, prompt suggestions, and much more!

Developers are at the heart of all our work on the Gemini API and Google AI Studio, so keep building and sharing your feedback with us via the Gemini API Developer Forum.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,402
Reputation
8,326
Daps
158,240

Google debuts Pixel Studio AI image-making app​


Pixel phones get AI image generation and editing.​


By Wes Davis, a weekend editor who covers the latest in tech and entertainment. He has written news, reviews, and more as a tech journalist since 2020.
Aug 13, 2024, 2:05 PM EDT

A hands-on photo of Google’s Pixel 9 Pro Fold.

Photo by Chris Welch / The Verge

Google announced a new image generation app during its Pixel 9 event today. The company says the app, called Pixel Studio, will come preinstalled on every Pixel 9 device.

Pixel Studio, much like Apple’s forthcoming Image Playground app that’s set to roll out on iOS 18 at some point after the operating system launches, lets you create an image from a prompt. Users can edit images after the fact, using the prompt box to add or subtract elements, and change the feel or style of the picture.



During the onstage demo, a picture of a bonfire gradually changed into a beach hangout invite with the Golden Gate Bridge and fireworks in the background, made in a pixel art style and complete with invite details and stickers of the presenter’s friends pasted over it. The feature is built on Google’s Imagen 3 text-to-image model.

The feature joins other AI features that Google debuted for Pixel phones, like a new Pixel Screenshots feature that acts like Microsoft’s Recall feature, except instead of taking constant screenshots, you take each one manually.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,402
Reputation
8,326
Daps
158,240

Google Gemini’s voice chat mode is here​


Gemini Advanced subscribers can use Gemini Live for conversational voice chat.​


By Wes Davis, a weekend editor who covers the latest in tech and entertainment. He has written news, reviews, and more as a tech journalist since 2020.
Aug 13, 2024, 1:00 PM EDT

A Gemini logo with Gemini Live screen on a phone next to it.


Gemini gets a new voice chat mode. Image: Google

Google is rolling out a new voice chat mode for Gemini, called Gemini Live, the company announced at its Pixel 9 event today. Available for Gemini Advanced subscribers, it works a lot like ChatGPT’s voice chat feature, with multiple voices to choose from and the ability to speak conversationally, even to the point of interrupting it without tapping a button.

Google says that conversations with Gemini Live can be “free-flowing,” so you can do things like interrupt an answer mid-sentence or pause the conversation and come back to it later. Gemini Live will also work in the background or when your phone is locked. Google first announced that Gemini Live was coming during its I/O developer conference earlier this year, where it also said Gemini Live would be able to interpret video in real time.

Gemini Live adds voice chatting to Google’s AI assistant. GIF: Google

Google also has 10 new Gemini voices for users to pick from, with names like Ursa and Dipper. The feature has started rolling out today, in English only, for Android devices. The company says it will come to iOS and get more languages “in the coming weeks.”

In addition to Gemini Live, Google announced other features for its AI assistant, including new extensions coming later on, for apps like Keep, Tasks, Utilities, and YouTube Music. Gemini is also gaining awareness of the context of your screen, similar to AI features Apple announced at WWDC this year. After users tap “Ask about this screen” or “Ask about this video,” Google says Gemini can give you information, including pulling out details like destinations from travel videos to add to Google Maps.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,402
Reputation
8,326
Daps
158,240


Google’s AI surprise: Gemini Live speaks like a human, taking on ChatGPT Advanced Voice Mode​


Carl Franzen@carlfranzen

August 13, 2024 11:17 AM

Hand holding smartphone displaying video call with robot as other robot stands in background


Credit: VentureBeat made with ChatGPT



Google sometimes feels like it’s playing catchup in the generative AI race to rivals such as Meta, OpenAI, Anthropic and Mistral — but not anymore.

Today, the company leapfrogged most others by announcing Gemini Live, a new voice mode for its AI model Gemini through the Gemini mobile app, which allows users to speak to the model in plain, conversational language and even interrupt it and have it respond back with the AI’s own humanlike voice and cadence. Or as Google put it in a post on X: “You can now have a free-flowing conversation, and even interrupt or change topics just like you might on a regular phone call.”

We’re introducing Gemini Live, a more natural way to interact with Gemini. You can now have a free-flowing conversation, and even interrupt or change topics just like you might on a regular phone call. Available to Gemini Advanced subscribers. #MadeByGoogle pic.twitter.com/eNjlNKubsv

— Google (@Google) August 13, 2024

If that sounds familiar, it’s because OpenAI in May demoed its own “Advanced Voice Mode” for ChatGPT which it openly compared to the talking AI operating system from the movie Her, only to delay the feature and begin to roll it out only selectively to alpha participants late last month.

Gemini Live is now available in English on the Google Gemini app for Android devices through a Gemini Advanced subscription ($19.99 USD per month), with an iOS version and support for more languages to follow in the coming weeks.

In other words: even though OpenAI showed off a similar feature first, Google is set to make it more available to a much wider potential audience (more than 3 billion active users on Android and 2.2 billion iOS devices) much sooner than ChatGPT’s Advanced Voice Mode.



Yet OpenAI may have delayed ChatGPT Advanced Voice Mode partly because of its own internal “red-teaming,” or controlled adversarial security testing, which showed that the voice mode sometimes engaged in odd, disconcerting, and even potentially dangerous behavior, such as mimicking the user’s own voice without consent — which could be used for fraud or malicious purposes.

How is Google addressing the potential harms caused by this type of tech? We don’t really know yet, but VentureBeat reached out to the company to ask and will update when we hear back.


What is Gemini Live good for?​


Google pitches Gemini Live as offering free-flowing, natural conversation that’s good for brainstorming ideas, preparing for important conversations, or simply chatting casually about “various topics.” Gemini Live is designed to respond and adapt in real-time.

Additionally, this feature can operate hands-free, allowing users to continue their interactions even when their device is locked or running other apps in the background.

Google further announced that the Gemini AI model is now fully integrated into the Android user experience, providing more context-aware assistance tailored to the device.

Users can access Gemini by long-pressing the power button or saying, “Hey Google.” This integration allows Gemini to interact with the content on the screen, such as providing details about a YouTube video or generating a list of restaurants from a travel vlog to add directly into Google Maps.

In a blog post, Sissie Hsiao, Vice President and General Manager of Gemini Experiences and Google Assistant, emphasized that the evolution of AI has led to a reimagining of what it means for a personal assistant to be truly helpful. With these new updates, Gemini is set to offer a more intuitive and conversational experience, making it a reliable sidekick for complex tasks.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,402
Reputation
8,326
Daps
158,240


Google quietly opens Imagen 3 access to all U.S. users​

Michael Nuñez@MichaelFNunez

August 15, 2024 10:42 AM

Credit: Google Imagen



Google has quietly made its latest text-to-image AI model, Imagen 3, available to all U.S. users through its ImageFX platform and published a research paper detailing the technology.

This dual release marks a significant expansion of access to the AI tool, which was initially announced in May at Google I/O and limited to select Vertex AI users in June.



1/1
Google announces Imagen 3

discuss: Paper page - Imagen 3

We introduce Imagen 3, a latent diffusion model that generates high quality images from text prompts. We describe our quality and responsibility evaluations. Imagen 3 is preferred over other state-of-the-art (SOTA) models at the time of evaluation. In addition, we discuss issues around safety and representation, as well as methods we used to minimize the potential harm of our models.

The company’s research team stated in their paper, published on arxiv.org, “We introduce Imagen 3, a latent diffusion model that generates high-quality images from text prompts. Imagen 3 is preferred over other state-of-the-art models at the time of evaluation.”

This development comes in the same week as xAI’s launch of Grok-2, a rival AI system with notably fewer restrictions on image generation, highlighting the divergent approaches to AI ethics and content moderation within the tech industry.


Imagen 3: Google’s latest salvo in the AI arms race​


Google’s release of Imagen 3 to the broader U.S. public represents a strategic move in the intensifying AI arms race. However, the reception has been mixed. While some users praise its improved texture and word recognition capabilities, others express frustration with its strict content filters.

One user on Reddit noted, “Quality is much higher with amazing texture and word recognition, but I think it’s currently worse than Imagen 2 for me.” They added, “It’s pretty good, but I’m working harder with higher error results.”

The censorship implemented in Imagen 3 has become a focal point of criticism. Many users report that seemingly innocuous prompts are being blocked. “Way too censored I can’t even make a cyborg for crying out loud,” another Reddit user commented. Another said, “[It] denied half my inputs, and I’m not even trying to do anything crazy.”

These comments highlight the tension between Google’s efforts to ensure responsible AI use and users’ desires for creative freedom. Google has emphasized its focus on responsible AI development, stating, “We used extensive filtering and data labeling to minimize harmful content in datasets and reduced the likelihood of harmful outputs.”


Grok-2: xAI’s controversial unrestricted approach​


In stark contrast, xAI’s Grok-2, integrated within Elon Musk’s social network X and available through premium subscription tiers, offers image generation capabilities with virtually no restrictions. This has led to a flood of controversial content on the platform, including manipulated images of public figures and graphic depictions that other AI companies typically prohibit.

The divergent approaches of Google and xAI underscore the ongoing debate in the tech industry about the balance between innovation and responsibility in AI development. While Google’s cautious approach aims to prevent misuse, it has led to frustration among some users who feel creatively constrained. Conversely, xAI’s unrestricted model has reignited concerns about the potential for AI to spread misinformation and offensive content.

Industry experts are closely watching how these contrasting strategies will play out, particularly as the U.S. presidential election approaches. The lack of guardrails in Grok-2’s image generation capabilities has already raised eyebrows, with many speculating that xAI will face increasing pressure to implement restrictions.


The future of AI image generation: Balancing creativity and responsibility​


Despite the controversies, some users have found value in Google’s more restricted tool. A marketing professional on Reddit shared, “It’s so much easier to generate images via something like Adobe Firefly than digging through hundreds of pages of stock sites.”

As AI image generation technology becomes more accessible to the public, the industry faces critical questions about the role of content moderation, the balance between creativity and responsibility, and the potential impact of these tools on public discourse and information integrity.

The coming months will be crucial for both Google and xAI as they navigate user feedback, potential regulatory scrutiny, and the broader implications of their technological choices. The success or failure of their respective approaches could have far-reaching consequences for the future development and deployment of AI tools across the tech industry.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,402
Reputation
8,326
Daps
158,240

1/1
Today, we are rolling out three experimental models:

- A new smaller variant, Gemini 1.5 Flash-8B
- A stronger Gemini 1.5 Pro model (better on coding & complex prompts)
- A significantly improved Gemini 1.5 Flash model

Try them on Google AI Studio | Gemini API | Google for Developers | Google AI for Developers, details in








Google drops ‘stronger’ and ‘significantly improved’ experimental Gemini models​


Taryn Plumb@taryn_plumb

August 27, 2024 6:25 PM

VentureBeat/Ideogram


Google is continuing its aggressive Gemini updates as it races towards its 2.0 model.

The company today announced a smaller variant of Gemini 1.5, Gemini 1.5 Flash-8B, alongside a “significantly improved” Gemini 1.5 Flash and a “stronger” Gemini 1.5 Pro. These show increased performance against many internal benchmarks, the company says, with “huge gains” for 1.5 Flash across the board and a 1.5 Pro that is much better at math, coding and complex prompts.

Today, we are rolling out three experimental models:

– A new smaller variant, Gemini 1.5 Flash-8B
– A stronger Gemini 1.5 Pro model (better on coding & complex prompts)
– A significantly improved Gemini 1.5 Flash model

Try them on Google AI Studio | Gemini API | Google for Developers | Google AI for Developers, details in ?

— Logan Kilpatrick (@OfficialLoganK) August 27, 2024
“Gemini 1.5 Flash is the best… in the world for developers right now,” Logan Kilpatrick, product lead for Google AI Studio, boasted in a post on X.


‘Newest experimental iteration’ of ‘unprecedented’ Gemini models​


Google introduced Gemini 1.5 Flash — the lightweight version of Gemini 1.5 — in May. The Gemini 1.5 family of models was built to handle long contexts and can reason over fine-grained information from contexts of 10 million or more tokens. This allows the models to process high-volume multimodal inputs including documents, video and audio.

Today, Google is making available an “improved version” of a smaller 8 billion parameter variant of Gemini 1.5 Flash. Meanwhile, the new Gemini 1.5 Pro shows performance gains on coding and complex prompts and serves as a “drop-in replacement” to its previous model released in early August.

Kilpatrick was light on additional details, saying that Google will make a future version available for production use in the coming weeks that “hopefully will come with evals!”

He explained in an X thread that the experimental models are a means to gather feedback and get the latest, ongoing updates into the hands of developers as quickly as possible. “What we learn from experimental launches informs how we release models more widely,” he posted.

The “newest experimental iteration” of both Gemini 1.5 Flash and Pro feature 1 million token limits and are available to test for free via Google AI Studio and Gemini API, and also soon through the Vertex AI experimental endpoint. There is a free tier for both and the company will make available a future version for production use in coming weeks, according to Kilpatrick.

Beginning Sept. 3, Google will automatically reroute requests to the new model and will remove the older model from Google AI Studio and the API to “avoid confusion with keeping too many versions live at the same time,” said Kilpatrick.

“We are excited to see what you think and to hear how this model might unlock even more new multimodal use cases,” he posted on X.

Google DeepMind researchers call Gemini 1.5’s scale “unprecedented” among contemporary LLMs.

“We have been blown away by the excitement for our initial experimental model we released earlier this month,” Kilpatrick posted on X. “There has been lots of hard work behind the scenes at Google to bring these models to the world, we can’t wait to see what you build!”


‘Solid improvements,’ still suffers from ‘lazy coding disease’​


Just a few hours after the release today, the Large Model Systems Organization (LMSO) posted a leaderboard update to its chatbot arena based on 20,000 community votes. Gemini 1.5-Flash made a “huge leap,” climbing from 23rd to sixth place, matching Llama levels and outperforming Google’s Gemma open models.

Gemini 1.5-Pro also showed “strong gains” in coding and math and “improve[d] significantly.”

The LMSO lauded the models, posting: “Big congrats to Google DeepMind Gemini team on the incredible launch!”

Chatbot Arena update⚡!

The latest Gemini (Pro/Flash/Flash-9b) results are now live, with over 20K community votes!

Highlights:
– New Gemini-1.5-Flash (0827) makes a huge leap, climbing from #23 to #6 overall!
– New Gemini-1.5-Pro (0827) shows strong gains in coding, math over… x.com pic.twitter.com/D3XpU0Xiw2

— lmsys.org (@lmsysorg) August 27, 2024

As per usual with iterative model releases, early feedback has been all over the place — from sycophantic praise to mockery and confusion.

Some X users questioned why so many back-to-back updates versus a 2.0 version. One posted: “Dude this isn’t going to cut it anymore :| we need Gemini 2.0, a real upgrade.”



On the other hand, many self-described fanboys lauded the fast upgrades and quick shipping, reporting “solid improvements” in image analysis. “The speed is fire,” one posted, and another pointed out that Google continues to ship while OpenAI has effectively been quiet. One went so far as to say that “the Google team is silently, diligently and constantly delivering.”

Some critics, though, call it “terrible” and “lazy” with tasks requiring longer outputs, saying Google is “far behind” OpenAI and Anthropic’s Claude.

The update “sadly suffers from the lazy coding disease” similar to GPT-4 Turbo, one X user lamented.



Another called the updated version “definitely not that good” and said it “often goes crazy and starts repeating stuff non-stop like small models tend to do.” Another agreed that they were excited to try it but that Gemini has “been by far the worst at coding.”



Some also poked fun at Google’s uninspired naming capabilities and called back to its huge woke blunder earlier this year.

“You guys have completely lost the ability to name things,” one user joked, and another agreed, “You guys seriously need someone to help you with nomenclature.”

And, one dryly asked: “Does Gemini 1.5 still hate white people?”

 