<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Generative AI on FastDataScience.eu</title>
    <link>https://fastdatascience.eu/tags/generative-ai/</link>
    <description>FastDataScience.eu (Generative AI)</description>
    <generator>Hugo -- gohugo.io</generator>
    <copyright>en-us</copyright>
    <lastBuildDate>Fri, 23 Aug 2024 00:00:00 +0000</lastBuildDate>
    
    <atom:link href="https://fastdatascience.eu/tags/generative-ai/index.xml" rel="self" type="application/rss+xml" />
    
    
    <item>
      <title>Local LLMs are getting easier</title>
      <link>https://fastdatascience.eu/post/2024-08-20-local_llms/</link>
      <pubDate>Fri, 23 Aug 2024 00:00:00 +0000</pubDate>
      <guid>https://fastdatascience.eu/post/2024-08-20-local_llms/</guid>
      <description>&lt;p&gt;There is increasing interest in running smaller large language models (LLMs)
locally instead of accessing them through cloud-based vendors such as OpenAI.
My clients have been interested in these either from a cost point of view,
or for data protection reasons (since no data goes to OpenAI or other vendors).&lt;/p&gt;
&lt;p&gt;Although this has been possible for a while from Python using (mainly) the excellent
&lt;a href=&#34;https://huggingface.co&#34;&gt;Hugging Face&lt;/a&gt; libraries, new options have become available that
make this easier and more flexible, especially from other languages such as Go
and Rust.  Here are observations and tips on a few alternatives that I&amp;rsquo;ve been
trying.&lt;/p&gt;
&lt;h2 id=&#34;ollama&#34; &gt;Ollama
&lt;/h2&gt;&lt;p&gt;My favourite has been &lt;a href=&#34;https://ollama.com&#34;&gt;Ollama&lt;/a&gt;, a very clean and
easy-to-use open-source tool (written in Go!) that downloads models from a curated
library and runs them. It provides an interactive prompt for chatting with a model,
and exposes an API similar to the one we are used to from
OpenAI.&lt;/p&gt;
&lt;p&gt;The tool can be downloaded from the &lt;a href=&#34;https://ollama.com/download&#34;&gt;web site&lt;/a&gt;,
and is easy to install on macOS and Linux. Then, just start the server by typing
&lt;code&gt;ollama serve&lt;/code&gt; in a separate terminal window.&lt;/p&gt;
&lt;p&gt;You first need to download one of the 50 or so supported models, listed
&lt;a href=&#34;https://ollama.com/library&#34;&gt;here&lt;/a&gt;. These include Llama 3.1 and 3.0, several
variants of Mistral, phi, and others. For example:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;ollama pull llama3.1&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Type &lt;code&gt;ollama list&lt;/code&gt; to see the models that have been downloaded. There
does not seem to be a command to list all available models; see the web page
linked above for that.&lt;/p&gt;
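&lt;p&gt;The list of downloaded models can also be fetched programmatically. Below is a minimal sketch in Go, assuming the server is running on the default port 11434 and that its &lt;code&gt;/api/tags&lt;/code&gt; endpoint returns a JSON object with a &lt;code&gt;models&lt;/code&gt; array. The type &lt;code&gt;ModelList&lt;/code&gt; and the helper &lt;code&gt;parseModelNames&lt;/code&gt; are illustrative names, not part of Ollama:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// ModelList mirrors only the part of the reply used here (assumed shape)
type ModelList struct {
	Models []struct {
		Name string `json:"name"`
	} `json:"models"`
}

// parseModelNames extracts the model names from the body of a reply
func parseModelNames(body []byte) ([]string, error) {
	list := new(ModelList)
	if err := json.Unmarshal(body, list); err != nil {
		return nil, err
	}
	names := []string{}
	for _, m := range list.Models {
		names = append(names, m.Name)
	}
	return names, nil
}

func main() {

	// Assumes "ollama serve" is running locally on the default port
	resp, err := http.Get("http://localhost:11434/api/tags")
	if err != nil {
		fmt.Println(err.Error())
		return
	}
	defer resp.Body.Close()

	// Retrieve response
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println(err.Error())
		return
	}

	// Show one model name per line
	names, err := parseModelNames(body)
	if err != nil {
		fmt.Println(err.Error())
		return
	}
	for _, n := range names {
		fmt.Println(n)
	}
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Keeping the parsing in its own function makes it possible to check it against a sample reply without a running server.&lt;/p&gt;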
&lt;p&gt;As soon as a model is downloaded, run it with &lt;code&gt;ollama run llama3.1&lt;/code&gt; and it will
start up with an interactive session in which you can enter prompts.  Type &lt;code&gt;/show info&lt;/code&gt; in that session to show information about the model, such as the number of parameters and
context length.&lt;/p&gt;
&lt;p&gt;It also exposes a REST API on port 11434, with endpoints
&lt;code&gt;generate&lt;/code&gt; and &lt;code&gt;chat&lt;/code&gt; (an OpenAI-compatible variant is served under &lt;code&gt;/v1&lt;/code&gt;), making this an easy option for calling the LLM from any
language using a REST API call. For example, from Go:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;package main

import (
	&amp;#34;bytes&amp;#34;
	&amp;#34;encoding/json&amp;#34;
	&amp;#34;fmt&amp;#34;
	&amp;#34;io&amp;#34;
	&amp;#34;net/http&amp;#34;
	&amp;#34;strings&amp;#34;
)

// Structure for a generate request
type Generate struct {
	Model  string `json:&amp;#34;model&amp;#34;`
	Prompt string `json:&amp;#34;prompt&amp;#34;`
}

// Structure of one streamed chunk (roughly one token), returned as one JSON object per line
type Token struct {
	Model     string `json:&amp;#34;model&amp;#34;`
	CreatedAt string `json:&amp;#34;created_at&amp;#34;`
	Response  string `json:&amp;#34;response&amp;#34;`
	Done      bool   `json:&amp;#34;done&amp;#34;`
}

func main() {

	// Parameters for the query (the model must have been pulled beforehand)
	prompt := &amp;#34;What is time?&amp;#34;
	model := &amp;#34;llama3.1&amp;#34;
	url := &amp;#34;http://localhost:11434/api/generate&amp;#34;

	// Encode the generate request as JSON
	msg := Generate{model, prompt}
	b, err := json.Marshal(msg)
	if err != nil {
		fmt.Println(err.Error())
		return
	}

	// Make a POST request to the API (the body needs to be an io.Reader)
	response, err := http.Post(url, &amp;#34;application/json&amp;#34;, bytes.NewReader(b))
	if err != nil {
		fmt.Println(err.Error())
		return
	}
	defer response.Body.Close()

	// Give up early on errors such as an unknown model
	if response.StatusCode != http.StatusOK {
		fmt.Println(&amp;#34;unexpected status:&amp;#34;, response.Status)
		return
	}

	// Retrieve the complete response
	responseData, err := io.ReadAll(response.Body)
	if err != nil {
		fmt.Println(err.Error())
		return
	}

	// Extract the individual tokens, one JSON object per line
	lines := strings.Split(string(responseData), &amp;#34;\n&amp;#34;)
	tokens := []string{}
	for _, l := range lines {

		// End of input
		if len(l) == 0 {
			break
		}

		// Parse JSON
		tkn := Token{}
		err := json.Unmarshal([]byte(l), &amp;amp;tkn)
		if err != nil {
			fmt.Println(err.Error())
			return
		}

		// Stop after final token
		if tkn.Done {
			break
		}

		// Add to list of tokens
		tokens = append(tokens, tkn.Response)
	}

	// Show result, joining the tokens back into the full answer
	fmt.Println(strings.Join(tokens, &amp;#34;&amp;#34;))
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Running this returns the following rather fanciful answer:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;What a profound and complex question!

Time is a fundamental concept in our experience of the universe, but its nature has puzzled philosophers, scientists, and scholars for centuries. Here&amp;#39;s an attempt to provide a comprehensive answer:

**Philosophical Perspectives:**

1. **Classical view:** Time is an objective feature of the universe, a linear progression from past to present to future. This perspective assumes time is a fixed, one-way dimension that governs our experience.
2. **Relational view:** Time is relative and dependent on the observer&amp;#39;s frame of reference. According to this perspective, time is not absolute, but rather an emergent property of physical laws and human perception.
3. **Subjective view:** Time is a subjective experience, shaped by individual experiences, emotions, and memories.

**Scientific Theories:**

1. **Newtonian Mechanics:** Time is an absolute concept, independent of the observer&amp;#39;s frame of reference. According to Newton&amp;#39;s laws, time flows at the same rate everywhere in the universe.
2. **Relativity Theory (Einstein):** Time and space are intertwined as a single entity called spacetime. The laws of physics are the same for all observers, regardless of their relative motion or position. Time can appear to pass differently depending on the observer&amp;#39;s frame of reference (e.g., time dilation).
3. **Quantum Mechanics:** Time is not a fixed dimension but rather an emergent property of quantum systems. Quantum phenomena, such as wave functions and entanglement, operate outside our classical understanding of time.

**Paradigm Shifts:**

1. **Kaluza-Klein Theory:** Higher-dimensional theories propose that our universe has more than the four dimensions we experience (three spatial and one temporal). This could imply that time is not an absolute concept.
2. **Fractal Time:** The idea that time is fractal, with repeating patterns at different scales, challenges our classical understanding of time as a linear progression.

**Philosophical Implications:**

1. **Free will vs. determinism:** If time is relative or subjective, does this imply free will, or are events predetermined?
2. **The nature of change:** Does the relativity of time imply that change is an illusion, or that reality is fundamentally dynamic?

**Conclusion:**

Time is a multifaceted concept that has been debated and explored across various disciplines. While our understanding of time has evolved significantly, the fundamental nature of time remains a subject of ongoing research and philosophical inquiry.

What&amp;#39;s your take on time? Do you have any questions or perspectives to share?
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Ollama also has options for importing GGUF files, creating models with a
built-in system prompt, and more. See the
&lt;a href=&#34;https://github.com/ollama/ollama&#34;&gt;GitHub page&lt;/a&gt; for details.&lt;/p&gt;
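&lt;p&gt;The Go example above uses the &lt;code&gt;generate&lt;/code&gt; endpoint. For the &lt;code&gt;chat&lt;/code&gt; endpoint, a system prompt can also be supplied per request instead of being built into the model. The following is a minimal sketch, assuming that the endpoint accepts a list of messages with roles, and that setting &lt;code&gt;stream&lt;/code&gt; to false makes it return a single JSON object rather than one line per token. The type names and the helper &lt;code&gt;parseReply&lt;/code&gt; are illustrative:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// One message of the conversation
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// Structure for a chat request
type Chat struct {
	Model    string    `json:"model"`
	Messages []Message `json:"messages"`
	Stream   bool      `json:"stream"`
}

// Structure of the single reply returned when streaming is off (assumed shape)
type Reply struct {
	Message Message `json:"message"`
	Done    bool    `json:"done"`
}

// parseReply extracts the text of the answer from the body of a reply
func parseReply(body []byte) (string, error) {
	r := new(Reply)
	if err := json.Unmarshal(body, r); err != nil {
		return "", err
	}
	return r.Message.Content, nil
}

func main() {

	// A system message followed by the user prompt
	msg := Chat{
		Model: "llama3.1",
		Messages: []Message{
			{Role: "system", Content: "Answer in one short sentence."},
			{Role: "user", Content: "What is time?"},
		},
		Stream: false,
	}
	b, err := json.Marshal(msg)
	if err != nil {
		fmt.Println(err.Error())
		return
	}

	// Make a POST request to the chat endpoint
	url := "http://localhost:11434/api/chat"
	resp, err := http.Post(url, "application/json", bytes.NewReader(b))
	if err != nil {
		fmt.Println(err.Error())
		return
	}
	defer resp.Body.Close()

	// Retrieve response
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println(err.Error())
		return
	}

	// Show result
	answer, err := parseReply(body)
	if err != nil {
		fmt.Println(err.Error())
		return
	}
	fmt.Println(answer)
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Turning streaming off keeps the parsing simple, at the cost of waiting for the complete answer before anything is shown.&lt;/p&gt;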
&lt;h2 id=&#34;llamafile&#34; &gt;LlamaFile
&lt;/h2&gt;&lt;p&gt;Another good alternative is
&lt;a href=&#34;https://github.com/Mozilla-Ocho/llamafile&#34;&gt;LlamaFile&lt;/a&gt;, which packages
the model and the code to run it in a single executable. To use it, just download
one of the models from the web site, make it executable, and run it directly.&lt;/p&gt;
&lt;p&gt;This option exposes a web interface for exploring chats (at
http://localhost:8080), as well as an API compatible with the OpenAI one.&lt;/p&gt;
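&lt;p&gt;Because the API follows the OpenAI format, calling it from Go looks much like calling OpenAI itself. The following is a rough sketch rather than a tested recipe: the path &lt;code&gt;/v1/chat/completions&lt;/code&gt; is assumed from the OpenAI convention, the model name is a placeholder, and the type names and the helper &lt;code&gt;firstChoice&lt;/code&gt; are illustrative:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// One message of the conversation
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// Request in the OpenAI chat completions format
type Request struct {
	Model    string    `json:"model"`
	Messages []Message `json:"messages"`
}

// The part of an OpenAI-style response used here
type Response struct {
	Choices []struct {
		Message Message `json:"message"`
	} `json:"choices"`
}

// firstChoice extracts the text of the first choice from a response body
func firstChoice(body []byte) (string, error) {
	r := new(Response)
	if err := json.Unmarshal(body, r); err != nil {
		return "", err
	}
	if len(r.Choices) == 0 {
		return "", fmt.Errorf("no choices in response")
	}
	return r.Choices[0].Message.Content, nil
}

func main() {

	// Assumed path (OpenAI convention) and a placeholder model name
	url := "http://localhost:8080/v1/chat/completions"
	req := Request{
		Model:    "local-model",
		Messages: []Message{{Role: "user", Content: "What is time?"}},
	}
	b, err := json.Marshal(req)
	if err != nil {
		fmt.Println(err.Error())
		return
	}

	// Make a POST request to the API
	resp, err := http.Post(url, "application/json", bytes.NewReader(b))
	if err != nil {
		fmt.Println(err.Error())
		return
	}
	defer resp.Body.Close()

	// Retrieve response
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println(err.Error())
		return
	}

	// Show result
	answer, err := firstChoice(body)
	if err != nil {
		fmt.Println(err.Error())
		return
	}
	fmt.Println(answer)
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;If the assumption about the path holds, the same code should work against other OpenAI-compatible servers by changing only the URL and the model name.&lt;/p&gt;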
&lt;p&gt;This is an attractive way to explore local LLMs, but I have since found Ollama
easier to use and it offers a broader range of models.&lt;/p&gt;
&lt;h2 id=&#34;llamacpp&#34; &gt;LlamaCPP
&lt;/h2&gt;&lt;p&gt;Most of the adaptations described above are derived from
&lt;a href=&#34;https://github.com/ggerganov/llama.cpp&#34;&gt;Llama.cpp&lt;/a&gt;, an amazing C++
program that loads and runs Llama and some other transformer models inside a
single program. It exposes both a web interface and an API. A large selection of models has been ported to this option.&lt;/p&gt;
&lt;p&gt;It is more fiddly than Ollama, because it requires you to separately obtain the
LLM in GGUF format, and specify this on the command line when running it. Most
GGUF models are available on Hugging Face, but it&amp;rsquo;s still an extra step with
some hassle.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ll create a separate post describing how to run LlamaCPP, but you
can probably figure it out from the GitHub page linked above.&lt;/p&gt;
&lt;h2 id=&#34;candle&#34; &gt;Candle
&lt;/h2&gt;&lt;p&gt;Another interesting and ambitious option, implemented in Rust, is
&lt;a href=&#34;https://huggingface.github.io/candle/&#34;&gt;Candle&lt;/a&gt;, which supports about 20
models and, thanks to its backing by Hugging Face, is well documented and
maintained.&lt;/p&gt;
&lt;p&gt;I plan on creating a separate post for this as I explore it further.&lt;/p&gt;
</description>
    </item>
    
  </channel>
</rss>
