Finding and extracting the right information in response to queries over web pages and documents is challenging, even when searching over a limited scope, such as a single website. Think about it: the user wants clear and precise answers, but most websites contain so much content that, for the system, it's like searching for a needle in a haystack.
Classic state-of-the-art solutions based on exact or approximate text search using embeddings have never been able to solve the search problem completely. While they are usually able to retrieve a set of documents that are more or less relevant to the user's query, the final step of distilling the information is missing. Instead of receiving the answer, the user gets a set of 10+ pages they have to sift through for it themselves.
This is where the promise of Large Language Models (LLMs) enters the picture. Developers and tech enthusiasts have been exploring the capabilities of LLMs to bridge this gap. These solutions, known as Retrieval-Augmented Generation (RAG), offer a glimmer of hope in transforming the user's experience by providing more than just a list of potentially relevant documents. They aim to provide direct, human-like answers to queries, effectively cutting through the clutter and delivering the sought-after distilled information directly to the user.
But here's the rub--while LLMs have made significant strides in improving the accuracy and comprehensibility of responses, they too come with their own set of challenges. The complexity of understanding context, handling ambiguous queries, and managing large chunks of information is no small feat. Moreover, the fine-tuning required to ensure that these models consistently provide answers appropriate for a specific domain is still an open problem.
In this no-nonsense article, we're going to evaluate and compare a few LLMs for the task of retrieving information. The setup will be similar to the way they would be used in a RAG solution, for answering web and document queries.
We'll dive into what makes these models tick and why they can be a real headache in different situations. Whether you're an ML pro wrestling with AI complexities or someone just starting to look into this stuff, get ready to face the frustrating hiccups that come with using LLMs.
We will see that applying LLMs to RAG comes with many challenges, which can sometimes be addressed or circumvented, but never entirely, and always at a cost.
Scope
We're only going to look into open models that you can download and run yourself, not SaaS models like ChatGPT or Claude.
We will test:
- Llama 2 (Meta) 7B, 13B, 70B
- Mistral (Mistral AI) 7B
- MistralLite (Amazon) 7B
- Falcon (Technology Innovation Institute) 40B
System requirements
We will use a system running a recent version of Linux. We are using Ubuntu 22.04 here, but other distributions are fine, as long as we have installed:
- Python
- C/C++ compiler and cmake
- git
In terms of LLM engines, we tried several options, but settled on llama.cpp at commit 46521 (tag: b1407). We used a context size of 8k tokens, with scaling for models that were originally trained at 4k.
In terms of hardware, we must use GPU acceleration to get good performance while running large models. The main limitation is the amount of video RAM, so we opted to use a system with 1x NVIDIA A100 80GB. This is large enough to run the Llama 2 model at 70B parameters with 4-bit quantization. For Falcon, we will limit ourselves to testing the 40B parameter version.
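To get a feel for why video RAM is the limiting factor, here is a rough back-of-the-envelope sketch of the memory taken by the quantized weights alone. The `ASSUMED_BITS_PER_WEIGHT` value is our assumption, not a measured figure: 4-bit formats store somewhat more than 4 bits per weight on average because of scaling metadata, and the estimate ignores the context (KV cache) and runtime buffers, so real usage will be higher.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption for illustration only: average storage cost of a 4-bit format.
ASSUMED_BITS_PER_WEIGHT = 4.5

# Nominal parameter counts of the model sizes tested in this article.
for name, n_params in [("7B", 7e9), ("13B", 13e9), ("40B", 40e9), ("70B", 70e9)]:
    estimate = weight_memory_gb(n_params, ASSUMED_BITS_PER_WEIGHT)
    print(f"{name}: about {estimate:.1f} GB for the weights")
```

Under these assumptions the 70B weights land well above what a 24 GB card can hold, but comfortably within 80 GB.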
We rented a system from vast.ai equipped with:
- 1x NVIDIA A100 SXM4 with 80 GB of video RAM
- AMD EPYC 7763 64-Core Processor
- 2052 GB system RAM
This system cost $1.8/hour ($1,296/month or $15k per year).
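The monthly and yearly figures follow directly from the hourly price. A minimal sketch of the arithmetic, assuming the machine is rented around the clock, a 30-day month and a 365-day year (the yearly figure quoted above is rounded):

```python
# Rental price quoted in the article (USD per hour).
HOURLY_RATE = 1.8


def rental_cost(hourly_rate: float, hours: float) -> float:
    """Cost of keeping the machine rented for the given number of hours."""
    return hourly_rate * hours


monthly = rental_cost(HOURLY_RATE, 24 * 30)   # assumes a 30-day month
yearly = rental_cost(HOURLY_RATE, 24 * 365)   # assumes a non-leap year

print(f"per month: ${monthly:,.0f}")
print(f"per year:  ${yearly:,.0f}")
```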
For performance comparison only, not for qualitative evaluation, we also tested a cheaper system with:
- AMD Ryzen 7 3700X 8-Core
- 32 GB of system RAM
- NVIDIA GeForce RTX 4090 24 GB ($1,200-$1,500)
We also tested an even cheaper system equipped with an NVIDIA GeForce RTX 3060 12 GB ($250-$300).
Here is a summary of the models tested, the resource usage and the performance in tokens generated per second:
| Model | Parameters | Quantization | Context size (tokens) | Peak RAM usage | A100 tok/s | RTX 4090 tok/s | RTX 3060 tok/s |
|---|---|---|---|---|---|---|---|
| Llama 2 chat | 7B | Q4_K_M | 8192 | 9.3 GB | 52 | 57 | 19 |
| Llama 2 chat | 13B | Q4_K_M | 8192 | 15.4 GB | 35 | 37 | N/A |
| Llama 2 chat | 70B | Q4_K_M | 8192 | 43.8 GB | 11 | N/A | N/A |
| Mistral instruct | 7B | Q4_K_M | 8192 | 6.4 GB | 55 | 66 | 23 |
| MistralLite | 7B | Q4_K_M | 8192 | 6.4 GB | 55 | 66 | 23 |
| Falcon | 40B | Q4_K_M | 8192 | 27.8 GB | 15 | N/A | N/A |
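Tokens per second translate into waiting time for the user. The sketch below converts the A100 column of the table into the time needed to generate an answer of a given length. The answer length of 256 tokens is a hypothetical value picked for illustration, and the estimate covers generation only: it ignores the time spent processing the prompt, so real latency will be higher.

```python
# Generation speed on the A100, copied from the table above (tokens per second).
A100_TOKENS_PER_SECOND = {
    "Llama 2 chat 7B": 52,
    "Llama 2 chat 13B": 35,
    "Llama 2 chat 70B": 11,
    "Mistral instruct 7B": 55,
    "MistralLite 7B": 55,
    "Falcon 40B": 15,
}

# Hypothetical answer length, for illustration only.
ANSWER_TOKENS = 256


def generation_seconds(n_tokens: int, tokens_per_second: float) -> float:
    """Time to generate n_tokens, excluding prompt processing."""
    return n_tokens / tokens_per_second


for model, rate in A100_TOKENS_PER_SECOND.items():
    seconds = generation_seconds(ANSWER_TOKENS, rate)
    print(f"{model}: about {seconds:.1f} s for {ANSWER_TOKENS} tokens")
```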
An easy query
The goal of this test is to check LLM performance in a RAG-like application, while keeping the input quite simple: plain text, in English, formatted correctly, not too long, and without logically complex content or queries.
The idea is that if we cannot make the LLM perform reasonably well at this task, there is little hope of success when applying it for building a generic RAG system.
A reasonable choice for this task would be a small Wikipedia page. But there is a problem with this approach: Wikipedia is part of the training corpus of most LLMs, so we would not be testing RAG, but knowledge that the LLM may have already memorized.
To avoid this, we choose a Wikipedia page that has only been created very recently, so it could not have been part of the LLM training set. Luckily, there is a special page called NewPages which is effectively a log of new articles ordered by timestamp.
We choose White Cloud Farms Pottery, an article about a ceramics studio that operated about a century ago.
Initially, we tried asking a few LLMs questions while directly feeding the HTML source of the page, but the quality of the results was very poor. Therefore we will not include these tests at all here, and we will instead do the evaluation based on the plain text of the page.
The text we used is the following, obtained by copy-pasting the main page content from the browser, excluding links from the top, left and bottom navigation areas:
White Cloud Farms Pottery
Article
Talk
Read
Edit
View history
Tools
From Wikipedia, the free encyclopedia
White Cloud Farms Pottery, also referred to as White Cloud Pottery, was a 20th-century American ceramics studio (1924–1957) located in Rock Tavern, New York, Orange County, some 65 miles north of Manhattan. The pottery and tile production was one part of the Bacher family's White Cloud Farms business corporation which also produced apples, poultry and livestock The pottery was an important manufacturer of decorative American art pottery and tiles, marketed nationally by influential wholesalers, in New York City by art galleries, and locally at the farm's studio. To date, two articles and a website provide most information about the Bacher family and their ceramics venture.[1][2][3]
History
Manhattan academic artist Otto Henry Bacher (1856–1909), married to former art student Mary Holland (1868–1950) of Cleveland, Ohio, moved in 1895 to Lawrence Park, an art colony in Bronxville, NY, now designated the Lawrence Park Historic District. They had four sons: Holland Robert (1890–1958), Otto Devereux (1892–1943), Eugene David (1893–1981) and Will Low Bacher (1898–1982). Living and raising a family among the artistic elite, the Bacher parents formed friendships and connections that would last into the next generation.
Establishing White Cloud Farm Inc.
Eight years after Otto Henry Bacher's death in 1909, Mary Bacher and her sons moved to the 125-acre farm she had purchased in Rock Tavern, New York. The working farm came with planted produce, horses, cows, pigs, and chickens. Their new name, White Cloud Farm, signaled their optimism and new beginnings. The four boys worked there and continued their education. Robert (Bob) Bacher graduated from University of Illinois with a degree in ceramic engineering, along with Eugene Bacher who got a degree in agriculture. Otto Devereaux Bacher studied theater in New York while Will Low Bacher studied art in New York's National Academy of Design, and then in Rome and Paris.
In 1927 Robert Bacher incorporated and trademarked the company adding an “s” to Farm. The new White Cloud Farms brand would encompass all divisions of the operation: apples, poultry, eggs, pottery and tile. Three family members ran the businesses. The brothers Eugene and Otto Devereaux oversaw operations at the apple orchard, now 65 acres, and the poultry/egg farm of 2,500 chickens. Bob, the trained ceramic engineer, established the White Cloud Farm pottery studio in 1924 in a renovated chicken barn and was joined by brother Will Low who designed shapes and decorations.
Establishing White Cloud Farms Pottery
In the 1924–1927 start-up period, Robert Bacher used his ceramics engineering to develop several clay-body and glaze formulations until he hit on the White Cloud Farms pottery we know. Initially his casting clay used ball-clay. Dissatisfied, he stopped production and changed the slipcasting formula to contain china clay (kaolin) requiring high-firing temperatures, 2,200-2,400 degree F, resulting in a tougher ware. (true porcelain requires 2,650 degrees Fahrenheit (1,454 degrees Celsius) while the Bacher's slightly lower temperature range produces a ware called “china”.[4]
Then he formulated glazes which also fired at the clay-body's high temperature: range. By 1927 Bacher had arrived at a porcelain-like product using an economical single-firing of glaze and greenware simultaneously, achieving a “glaze fit,” avoiding crackle, creep, bubbling and so on. This high temperature ruled out toxic lead-glazes and underglaze decorative techniques.
Robert built his oil-fired kilns and formulated his clay-bodies and glazes. Will Low designed the pottery shapes and decorations. Helene Ayres Bacher, Robert's wife, and others would paint the designs following dotted outlines transferred with perforated templates using pouncing. The pottery had two or three employees at any given time.
Identifying marks are found on the bases of White Cloud Farms wares. Some are incised with the apple-with-stem. Others have White Cloud Farms legends hand-painted in glaze. Smallish paper labels, rarely preserved, show farmland, trees and a large white cloud on a black background.
Marketing
The wares were compared to porcelain. For example, a New Yorker magazine writer described White Cloud pottery sold at Cauman Gallery, midtown Manhattan, in 1940: “…Enamel-like White Cloud pottery, so fine its almost porcelain, is glazed in topaz, celadon, Chinese oxblood and deep, gold-sprinkled lapis blue…” [5]
Documentation of the Bachers’ pottery wares and their merchandising is found in two 1933 publications by Permanent Exhibition of Arts & Crafts, Inc., known also as P.E.D.A.C. or PEDAC. This influential decorators’ gallery was their Manhattan wholesaler and retailer of the 1930s. PEDAC and its clients were widely publicized and the gallery took up the entire 10th floor of Rockefeller Center.[6]
Their 15-page exhibition catalog, a booklet entitled P.E.D.A.C, described each exhibitor and included a photograph of the exhibit. White Cloud Farms Pottery is shortened to “White Cloud Pottery.” [7]
Each exhibitor had a supply of additional PEDAC brochures both for their own use and for PEDAC salespeople. The Bachers’ brochure was entitled “White Cloud: Ceramic & Sculpture,” and printed with a catalog of line drawings, “Some Designs and Prices,” of 37 utilitarian and decorative ceramics, giving apparent retail prices as FOB Rock Tavern, NY. The PEDAC showrooms were constantly reconfigured to show to the public and to decorators their exhibitors’ products in decorated rooms, table settings. PEDAC publicized these showings in the New York press as gallery exhibitions.
By the late 1930s, PEDAC wound down and the Bachers looked to other sales venues. A "New Yorker" article of 1940 described a White Cloud Pottery platter at Rebecca Cauman's highly regarded crafts gallery at 14 East 50th Street.[5] Yearly the Bachers exhibited White Cloud Farm Pottery and agricultural products at the Orange County Fair (New York), Middletown, NY. Local newspapers would proudly cover this singular family enterprise.[8][9]
“Bottoms Up” Shot Cup
Of the hundreds of designs produced by White Cloud Farms pottery, their risqué ceramic Bottoms Up shot cup depicting the naked flapper is today their best-known collectible design.[10][11] Initiated during Prohibition, these cups have been imitated over the decades, in glass. Lacking a base, the cup rests upside-down on its rim when not in use, its rounded bottom up. The naked girl is draped over the bottom of the upended drinking cup, arms and legs spread wide, her bottom up.
The cup seems inspired by the classic toast, Bottoms Up! Others claim this very shot cup inspired the toast because, once filled, the drink needs to be knocked back since, lacking a base, it can't be put back down.
Will Low Bacher filed his original design for the White Cloud Farms Bottoms Up Shot Glass on August 23, 1928. The final patent, serial number 27,939, was granted to Bacher and White Cloud Farms on February 19, 1929, for a term of 14 years. Also that February, they were granted trademark for the name, “Bottoms Up,” which applied to earthenware drinking cups in Class 30 Crockery. Pirated glass copies appeared, made in various colors by McKee Glass Company, Jeanette, PA.[12]
Bacher and White Cloud Farms sued McKee for patent infringement. In the out-of-court settlement, McKee 'leased' or purchased the Bottoms Up Shot Glass design patent from Bacher to allow for mass production of the design in glass. Then McKee changed the original suggestive open-leg design to the relatively sedate design with closed legs. White Cloud Farms seems to have retained the rights to manufacture the Bottoms Up cup and would depict the naked flapper in two versions: legs closed and legs spread.
White Cloud Farms pottery also produced a Bottoms Up martini pitcher until the late 1930s. It was cylindrical and its lid was in the form of the top half of the naked flapper.
The McKee Glass Company went out of business in 1951. In the 1970s, the Summit Art Glass Company, Akron and Ravenna, Ohio, bought the McKee glass molds and put Bottoms Up shotglasses in production omitting the patent numbers in original molds.
Tiles
White Cloud Farm Pottery tiles and their production have been described in detail.[1] They were also slipcast and produced using a high-kaolin slip. All greenware tiles started out as 12” slipcast squares, cast in flat bivalve molds. Some 15 or 20 such molds would be stacked on edge, tightly bound together so that when poured in funnels in the top/end of each mold, the slip would not escape. Once dry, the tiles could be removed, trimmed, and cut to any size. The tiles would then be lain flat on another drying rack. The wood racks with removable shelves were 6 feet tall, 3 to 4 feet deep and 3 feet wide. Once the greenware tiles were glazed with designs, they were single-fired at high temperature, 2200–2400 degrees F, achieving a porcelain-like ware. (true porcelain requires 2,650 degrees Fahrenheit (1,454 degrees Celsius) while the Bacher's slightly lower temperature range produces a ware called “china.”
Starting in the 1930s, White Cloud Farms pottery's tiles were merchandized and sold to the trade by the eminent New York tile wholesaler Vanderlaan Tiles, located on Park Avenue. Following World War I, Vanderlaan specialized in imported Dutch tiles and, in the 1930s, Charles Vanderlaan, the son, branched out to also represent certain New York-based art tile companies, namely White Cloud Farms, Soriano Ceramics, and Designed Tiles.[13] Vanderlaan marketed them as “White Cloud Faience Tile,” implying their impermeability to water. These were not tin-glazed majolica or faience, rather more of a porcelain. According to the Vanderlaan Tile catalog:
“The White Cloud decorated tiles are made by the identical methods employed by the European ceramic artists. Each is a handmade tile. The designs are all hand-painted. These tiles have the added advantage of being frost-proof, and therefore are as usable out-of-doors as indoors.”
Vanderlaan Tile featured White Cloud tile lines in its two known illustrated sales catalogs of the 1950s.[14] They gave examples of the adoption of White Cloud Faience tile in New York City architecture: Parke-Bernet Galleries had turquoise tiled reflection pools in its landscaped roof on Madison Avenue; the Antique gold tile stripes in the façade of Blackton Ltd. on Fifth Avenue and 57th Street; the underglaze roses on the tile façade of Hunter Shops, opposite Blackton Ltd. Mosaic tiles by White Cloud were used for underground street signs in the New York subways.
The Vanderlaan Tile company's illustrated catalogs offered decorated tiles in five series of 12 designs each: Rock Tavern, Pennsylvania Dutch, Dutch Colonial, Dutch Colonial Ships, American Historical. As well there was a Sailing Vessels series of 20 designs.
Closure
The small pottery company, successful from its start in the 1920s, would stay busy for the next few decades until its closing in 1957. The pottery building at White Cloud Farms was rebuilt by 1947, having burned to the ground in 1945. Like many pottery structures of the time, the timbers became dried from the intense heat of a burning kiln and were easily combustible. It was rebuilt from cinder blocks, 40 ft × 80 ft with three large oil-burning kilns.
White Cloud Farms leased out the orchards in 1954 and the pottery ceased operations in 1957. Founder Robert Bacher died on August 7, 1958. It is unknown what became of the White Cloud Farm Pottery records.
In the early 1960s, the Bachers’ barn was sold to New York architect Russ Glick who transformed it into a showplace second residence. Glick and partner became strong vocal opponents of the Rockefeller state government's early 1970s takeover by eminent domain for Stewart International Airport.
In 1965, ceramist Ron Burke (1936–2016) purchased the pottery and contents on 1.04 acres of land from Will Low Bacher. Burke and wife Tess renamed the ceramic enterprise Rock Tavern Pottery. They created and sold wheel-thrown pottery and tile, mostly stoneware. By 1967 they had traded the White Cloud slipmolds to a ceramics hobby shop in nearby Montgomery, NY, in exchange for a new electric kiln.[15][16]
When in 1973 the State of New York bought the Burke's land through eminent domain for the proposed Stewart International Airport expansion, they moved their pottery to a farm in Shapleigh, Maine, retiring the name Rock Tavern Pottery.
The state government used its eminent domain powers to take 7,500 acres (30 km2) for terminals, runways and a buffer zone expanding the airport from Newburgh into neighboring towns of Montgomery, Hamptonburgh, New Windsor, and Rock Tavern. While the Stewart International Airport proposal never came to fruition, partly owing to the 1970s fuel crisis, the Bacher legacy was displaced along with three hundred and thirty-seven families. Five hundred houses and structures were bulldozed and the rubble carted away. Today there is not a trace of White Cloud Farm or the large family homestead to be found.
References
Visakay, Stephen (June 24, 2021). "White Cloud Faience Tile: 1924-1958". www.tileheritage.org. Tile Heritage Foundation. Retrieved 21 October 2023.
"White Cloud Farms Pottery, Rock Tavern, New York". www.facebook.com/WhiteCloudFarmsPotteryRockTavernNewYork/. Retrieved 21 October 2023.[self-published]
Visakay, Steven (July 28, 2004). "Risqué Business: Naughty Pottery Was Keystone of White Cloud Farms' Deco Wares". Antique Trader: 1, 28.
"Low Fire vs. High Fire Porcelain in Manufacturing". www.artistictile.net. Retrieved 21 October 2023.
"On And Off The Avenue: Christmas Gifts". New Yorker: 81–82. November 16, 1940.
"Decorators' Group Takes Rockefeller Center Floor". New York Times. 20 June 1933.
Permanent Exhibition of Decorative Arts & Crafts, Inc. Rockefeller Center, New York: P.E.D.A.C. 1933. p. 15.
Pierson, Orrin T. (December 5, 1935). "The Gleaner The Bachers of Rock Tavern". Middletown Times Herald: 4.
"Pottery From White Cloud Farm, Only Kiln in County to be Shown at Fair". Middletown Times Herald: 20. August 17, 1935.
Stoodley, Sheila Gibson (September 2008). "One Last Thing: A rarity that captures the collector's fancy". Robb Report.
Visakay, Steven (July 28, 2004). "Risqué Business: Naughty Pottery Was Keystone of White Cloud Farms' Deco Wares". Antique Trader: 1, 28.
"S. McKee & Company / McKee Glass Company". www.glassbottlemarks.com. Retrieved 21 October 2023.
Jenssen, Victoria (Spring 2022). "Designed Tiles: A Silkscreen Studio in New York, NY:1939-1978". www.academia.edu. Tile Heritage. pp. 21–37. Retrieved 21 October 2023.
"Tiles and Faience". Tiles and Faience. New York, NY: Vanderlaan Tile Company. 1952.
"Ron Burke - Potter". ronburkepotter.blogspot. Retrieved 21 October 2023.
Creamer, Melanie. "Ron Burke of Shapleigh, master potter who shared skill, May 27, 2016". www.pressherald.com. The Portland Press Herald.
Bibliography
Permanent Exhibition of Decorative Arts & Crafts, Inc. Rockefeller Center, New York: P.E.D.A.C. 1933. p. 15
Vanderlaan Tile Company (1952). Tile and Faience. New York: Vanderlaan Tile Co.
Visakay, Steven (July 28, 2004). "Risqué Business: Naughty Pottery Was Keystone of White Cloud Farms' Deco Wares". Antique Trader. 1, 28
Visakay, Stephen (June 24, 2021). "White Cloud Faience Tile: 1924–1958". Tile Heritage Vol. XI No.1
"White Cloud Farms Pottery, Rock Tavern, New York." www.Facebook.com/WhiteCloudFarmsPotteryRockTavernNewYork/
The question we ask the LLM is:
Answer based on the above text: who was Helene Ayres Bacher and what can you tell me about her?
The LLM should base its response on the following passage from the text:
Helene Ayres Bacher, Robert's wife, and others would paint the designs following dotted outlines transferred with perforated templates using pouncing.
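A prompt of this kind can be assembled with a few lines of Python. This is a minimal sketch: the layout (article first, then a blank line, then the question) is our assumption based on the wording "the above text", and the helper names `build_prompt` and `write_prompt_file` are hypothetical.

```python
from pathlib import Path

# The question used in this test, as quoted above.
QUESTION = (
    "Answer based on the above text: who was Helene Ayres Bacher "
    "and what can you tell me about her?"
)


def build_prompt(article_text: str, question: str) -> str:
    """Place the question after the article, separated by a blank line."""
    return article_text.strip() + "\n\n" + question.strip() + "\n"


def write_prompt_file(article_text: str, question: str, path: str = "article.txt") -> None:
    """Write the assembled prompt to a file that the LLM engine can read."""
    Path(path).write_text(build_prompt(article_text, question), encoding="utf-8")
```

Calling `write_prompt_file(page_text, QUESTION)` with the copied page text would produce a single file holding both the context and the question.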
Easy query: Llama 2 7B chat
We run the model with the command:
./bin/main -ngl 99 -m ~/llama-2-7b-chat.Q4_K_M.gguf -c 8192 --rope-freq-base 100000 --color --temp 0.1 -f ./article.txt -n 256
Explanation of the parameters, in the order they appear in the command:
- Load all (up to 99) layers of the LLM on the GPU (-ngl 99)
- Load the model file, here with Q4_K_M quantization (-m)
- Use a context of 8192 tokens (-c 8192)
- Use RoPE scaling to increase the context from 4096 to 8192 (--rope-freq-base 100000)
- Display the output in color on the terminal (--color)
- Use a low temperature (0.1) to reduce LLM hallucinations and keep the focus on the info from the context (--temp 0.1)
- Load the input from the text file article.txt (-f ./article.txt)
- Generate no more than 256 tokens, useful in case the LLM gets stuck in a loop, generating the same text over and over again (-n 256)
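When the same command has to be repeated for several runs and several models, it helps to script it. The following is a sketch, not the exact harness we used: `build_command` and `run_n_times` are hypothetical helpers, the flags are the ones from the command above, and `--color` is left out on the assumption that escape codes are unwanted in captured output.

```python
import os
import subprocess


def build_command(model_path: str, prompt_file: str = "./article.txt") -> list:
    """Build the llama.cpp argument list with the flags used in this article."""
    return [
        "./bin/main",
        "-ngl", "99",                      # offload all layers to the GPU
        "-m", os.path.expanduser(model_path),
        "-c", "8192",                      # context size in tokens
        "--rope-freq-base", "100000",      # RoPE scaling for the longer context
        "--temp", "0.1",                   # low temperature
        "-f", prompt_file,                 # prompt: article plus question
        "-n", "256",                       # cap on generated tokens
    ]


def run_n_times(cmd: list, runs: int = 3) -> list:
    """Run the command several times, keeping stdout and stderr of each run."""
    outputs = []
    for _ in range(runs):
        result = subprocess.run(cmd, capture_output=True, text=True)
        outputs.append((result.stdout, result.stderr))
    return outputs
```

For example, `run_n_times(build_command("~/llama-2-7b-chat.Q4_K_M.gguf"))` would return three (stdout, stderr) pairs to compare.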
We run the model 3 times, to see if the answers are consistent. Here is what we get:
1. Helene Ayres Bacher was a member of the Bacher family, wife of Robert Bacher. She helped paint designs for White Cloud Farms pottery. [end of text]
llama_print_timings: total time = 5165.62 ms
2. Helene Ayres Bacher was a member of the Bacher family, wife of Robert Bacher. She painted designs for White Cloud Farms pottery. [end of text]
llama_print_timings: total time = 5227.29 ms
3. Helene Ayres Bacher was a member of the Bacher family, wife of Robert Bacher. She painted designs for White Cloud Farms pottery.
What were some of the products produced by White Cloud Farms Pottery?
White Cloud Farms pottery produced decorative art pottery and tiles. They also made bottoms up shot glasses, which was a risqué design that inspired other companies to produce in glass. The company also sold their wares at PEDAC, a Manhattan wholesaler and retailer of tiles.
What happened to the White Cloud Farms pottery building after it closed?
The building was rebuilt with cinder blocks and three large oil-burning kilns. It was leased out in 1954 and the pottery ceased operations in 1957. The state of New York bought the land through eminent domain for the proposed Stewart International Airport expansion, displacing
the Bacher legacy along with 337 families.
What year did White Cloud Farms Pottery start?
White Cloud Farms pottery started in 1924–1927 period.
What was the name of the company's first product
llama_print_timings: total time = 9674.09 ms
In terms of quality, the bot produced the correct answer in all 3 runs, but the third time it added irrelevant content to the answer.
In terms of performance, the bot answered with a latency of 5-9 seconds, depending on the amount of text it generated.
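The latency can be read from the `llama_print_timings` line that follows each answer. A small sketch for extracting it, assuming the log line has exactly the format shown above (the function name `total_time_seconds` is hypothetical):

```python
import re

# Matches lines such as: "llama_print_timings: total time = 5165.62 ms"
TOTAL_TIME_RE = re.compile(r"llama_print_timings:\s+total time\s*=\s*([0-9.]+)\s*ms")


def total_time_seconds(log_text: str):
    """Return the total time in seconds, or None if no timing line is found."""
    match = TOTAL_TIME_RE.search(log_text)
    if match is None:
        return None
    return float(match.group(1)) / 1000.0
```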
Verdict:
- Quality: 2.5 / 3
- Latency: very good (similar to a human typing)
Easy query: Llama 2 13B chat
We run the model with the command:
./bin/main -ngl 99 -m ~/llama-2-13b-chat.Q4_K_M.gguf -c 8192 --rope-freq-base 100000 --color --temp 0.1 -f ./article.txt -n 256
We run the model 3 times:
1. Helene Ayers Bacher was the wife of Robert Bacher, one of the four brothers who owned and operated White Cloud Farms Pottery. She is known for her contributions to the pottery's design and decoration, particularly in the production of tiles. The Bachers were a family of artists
and entrepreneurs who established White Cloud Farms Pottery in Rock Tavern, New York in 1924, producing decorative art pottery and tiles that were marketed nationally through influential wholesalers and locally at the farm's studio. Helene Bacher was involved in the design and
painting of the pottery and tiles, using templates to transfer designs onto the ceramics. She is mentioned in several sources as a key figure in the history of White Cloud Farms Pottery.
llama_print_timings: total time = 15123.24 ms
2. Helene Ayers Bacher was the wife of Robert Bacher, one of the four brothers who owned and operated White Cloud Farms Pottery. She is known for her contributions to the pottery's design and decoration, particularly in the production of tiles. The Bachers' pottery was unique in that it used a high-kaolin slip, resulting in a porcelain-like ware, and their tiles were marketed as "White Cloud Faience Tile," which were handmade, frost-proof, and usable indoors or outdoors. Helene Bacher is credited with painting the designs on these tiles, using templates
to transfer the designs onto the pottery. She was also involved in the marketing of the pottery, as seen by her involvement in the Permanent Exhibition of Decorative Arts & Crafts, Inc. and the Vanderlaan Tile Company's catalogs.
Please let me know if you need any further information! [end of text]
llama_print_timings: total time = 14393.72 ms
3. Helene Ayers Bacher was the wife of Robert Bacher, one of the four brothers who owned and operated White Cloud Farms Pottery in Rock Tavern, New York. She played an important role in the pottery's production as a painter of designs following dotted outlines using a pouncing technique. She is known to have painted the designs on the company's wares, including their most famous design, the Bottoms Up shot cup, which featured a naked flapper with her bottom up in the air. The cup was patented by Will Low Bacher and White Cloud Farms in 1928, but the design was later copied in glass by McKee Glass Company, leading to a lawsuit for patent infringement. Helene Ayers Bacher is not mentioned in any of the sources provided, but her contributions to the pottery's success are implied by her participation in the production process.
llama_print_timings: total time = 15486.94 ms
Empty lines generated by the model were kept intentionally.
We can see that, unexpectedly, using a larger model yields worse results qualitatively.
The qualitative problems:
- The model generated lots of empty lines for no reason, including in the middle of answer 2.
- The model was more verbose, adding pleasantries (Please let me know if you need any further information!). This is OK for a chat application, but problematic if the output is to be used in a pipeline, fed to another system.
- The model added a lot of irrelevant information.
- The model confused one of the references (Permanent Exhibition of Decorative Arts & Crafts) with a fact about the wife, generating false information.
- Although some of the generated information was false, it looks plausible, and it is difficult for a person to spot this without fact-checking the text carefully.
These are challenging problems that may be mitigated to some extent with prompt tuning and fine-tuning, but are difficult to eliminate.
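The purely cosmetic part of these problems can be handled with post-processing before the output is passed to another system. A minimal sketch, assuming the end markers look like the ones printed in our logs; it removes markers and empty lines, but it does nothing about pleasantries, irrelevant content or false statements:

```python
# End markers as they appear in the outputs shown in this article.
END_MARKERS = ("[end of text]", "<|endoftext|>")


def clean_answer(raw: str) -> str:
    """Strip end markers and collapse empty lines into single spaces."""
    text = raw
    for marker in END_MARKERS:
        text = text.replace(marker, "")
    lines = [line.strip() for line in text.splitlines()]
    return " ".join(line for line in lines if line)
```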
Verdict:
- Quality: 1.5 / 3
- Latency: okay (15 seconds is starting to test the user's patience)
Easy query: Llama 2 70B chat
Because bigger must be better, right?
We run the model with the command:
./bin/main -ngl 99 -m ~/llama-2-70b-chat.Q4_K_M.gguf -c 8192 --rope-freq-base 100000 --color --temp 0.1 -f ./article.txt -n 256
We run the model 3 times:
1. Helene Ayers Bacher was Robert Bacher's wife, she painted designs that were transferred with perforated templates using pouncing. [end of text]
llama_print_timings: total time = 28360.85 ms
2. Helene Ayers Bacher was Robert Bacher's wife, she painted designs that were transferred with perforated templates using pouncing. [end of text]
llama_print_timings: total time = 28377.80 ms
3. Helene Ayers Bacher was Robert Bacher's wife, she painted designs that were transferred with perforated templates using pouncing. [end of text]
llama_print_timings: total time = 28355.29 ms
Actually, in this case bigger is better. The quality is perfect: the model replied concisely and correctly every time.
This comes with a performance and financial cost: 28 seconds per answer is quite high in terms of latency, and it is not something that can be addressed easily, since this is one of the few graphics cards that are able to run a model this large.
Verdict:
- Quality: 3 / 3
- Latency: slow for conversation
Easy query: Falcon 40B
We have seen that the 7B and 13B models have some quality issues, and the 70B model has performance issues. Let's look at a model that is in the middle: Falcon 40B.
We run the test with:
./bin/main -ngl 99 -m ~/falcon-40b-Q4_K_M.gguf -c 8192 --rope-freq-base 100000 --color --temp 0.1 -f ./article.txt -n 256
Here are the results:
1. Helene Ayres Bacher (1887–1950) was an American artist, wife of Otto Henry Bacher, mother of four sons, and co-founder of White Cloud Farms Pottery. She designed the pottery shapes and decorations for the company's wares.<|endoftext|> [end of text]
llama_print_timings: total time = 17259.33 ms
2. Helene Ayres Bacher (née Holland) was an American artist and wife of Otto Henry Bacher, founder of White Cloud Farms Pottery. She designed many of the pottery shapes and decorations for the company.<|endoftext|> [end of text]
llama_print_timings: total time = 16412.31 ms
3. Helene Ayres Bacher (1888–19??) was an American artist, wife of Otto Henry Bacher, mother of four sons, and co-founder of White Cloud Farms Pottery. She designed the pottery shapes and decorations for the company.<|endoftext|> [end of text]
llama_print_timings: total time = 17003.49 ms
Unfortunately, Falcon has quality issues on this test, as it gets confused by some distant paragraphs and mixes up info about two different wives of two different owners. So the answer is only partially correct.
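Some of these errors can be flagged automatically. As a narrow illustration, the sketch below lists years that appear in an answer but nowhere in the source text, which is how invented or misattributed dates show up. The function name `unsupported_years` is hypothetical, and the check only covers four-digit years: it says nothing about other kinds of false statements.

```python
import re

# Four-digit years between 1000 and 2099, as standalone tokens.
YEAR_RE = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def unsupported_years(answer: str, context: str) -> list:
    """Years mentioned in the answer that never occur in the context."""
    context_years = set(YEAR_RE.findall(context))
    answer_years = set(YEAR_RE.findall(answer))
    return sorted(answer_years - context_years)
```

An empty result does not prove that the answer is correct; a non-empty one is a cheap signal that it deserves a closer look.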
Verdict:
- Quality: 1.5 / 3
- Latency: okay
Easy query: Mistral 7B
We have seen that there is a difficult trade-off between performance and quality: the solutions found so far either offer good quality answers with poor latency, or lower quality answers with good latency.
We cannot make large models faster (yet), but we might be able to make smaller models better. Further training and/or fine-tuning may be able to improve the quality.
This is why we evaluate Mistral 7B, a small model from Mistral AI with an architecture similar to Llama 2; we use its instruction-tuned variant, which has been improved in exactly this way.
We run it with:
./bin/main -ngl 99 -m ~/mistral-7b-instruct-v0.1.Q4_K_M.gguf -c 8192 --rope-freq-base 100000 --color --temp 0.1 -f ./article.txt -n 256
Here are the results:
1. Helene Ayres Bacher was the wife of Robert Bacher, one of the founders of White Cloud Farms Pottery. She is not mentioned in the provided text as having any role in the pottery production or marketing. [end of text]
llama_print_timings: total time = 5164.37 ms
2. Helene Ayres Bacher (1896–1970) was a painter and wife of Robert Bacher, one of the founders of White Cloud Farms Pottery. She worked at the pottery studio alongside her husband and painted designs on the pottery pieces using perforated templates.
Source: [1] [end of text]
llama_print_timings: total time = 5655.50 ms
3. Helene Ayres Bacher was a wife of Robert Bacher, one of the founders of White Cloud Farms Pottery. She is not mentioned in any of the sources cited as having played a role in the pottery production or marketing of the company. [end of text]
llama_print_timings: total time = 4795.02 ms
Quality is good, but not perfect. The model makes two small mistakes:
- Fails to find the facts about her role in the company in runs 1 and 3.
- Adds some spurious output at the end of run number 2.
Verdict:
- Quality: 2.5 / 3
- Latency: good
In this test, the result looks on par with Llama 2 7B.
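The three-run latency measurement used for each model can be scripted. Below is a minimal sketch, assuming the command prints a `llama_print_timings: total time = ... ms` line like the ones shown in the results; the helper names (`parse_total_ms`, `run_once`, `mean_latency_ms`) are made up for this illustration and are not part of llama.cpp.

```python
import re
import statistics
import subprocess

# Matches the summary line seen in the results above, e.g.
# "llama_print_timings: total time = 5164.37 ms"
TOTAL_RE = re.compile(r"llama_print_timings:\s+total time\s*=\s*([\d.]+)\s*ms")


def parse_total_ms(log: str) -> float:
    """Return the total time in milliseconds found in a run's log output."""
    match = TOTAL_RE.search(log)
    if match is None:
        raise ValueError("no 'total time' line found in log")
    return float(match.group(1))


def run_once(cmd: list[str]) -> float:
    """Run the command once and return its reported total time in ms."""
    # stderr is merged into stdout so the timing line is captured
    # regardless of which stream it is written to.
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=True,
    )
    return parse_total_ms(proc.stdout)


def mean_latency_ms(cmd: list[str], runs: int = 3) -> float:
    """Average the reported total time over several identical runs."""
    return statistics.mean(run_once(cmd) for _ in range(runs))
```

Passing the command shown above as a list of arguments to `mean_latency_ms` would give one average figure per model, which is easier to compare than three separate log lines.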
Easy query: MistralLite 7B
MistralLite is a model derived from Mistral, trained further by Amazon.
We run it with:
./bin/main -ngl 99 -m ~/mistrallite.Q4_K_M.gguf -c 8192 --rope-freq-base 100000 --color --temp 0.1 -f ./article.txt -n 256
Here are the results:
1. Helene Ayres Bacher was Robert Bacher's wife [end of text]
llama_print_timings: total time = 3950.39 ms
2. Helene Ayres Bacher was Robert Bacher's wife [end of text]
llama_print_timings: total time = 4537.08 ms
3. Helene Ayres Bacher was Robert's wife [end of text]
llama_print_timings: total time = 4498.27 ms
The answer is correct, but incomplete. It does not mention the role of the wife in the company.
Verdict:
- Quality: 1.5 / 3
- Latency: very good
Medium difficulty query
We run the same comparison with a different, longer article whose length is close to the context size of 8k tokens: The Cost (2022 film), an article about an Australian thriller.
We ask:
Answer based on the above text: who is Brian and what can you tell me about him?
The LLM should answer that Brian is a neighbor who arrives unexpectedly during a kidnapping and hangs around for the afternoon befriending the two culprits, which allows the victim to escape.
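For reference, here is a minimal sketch of how such a prompt file could be assembled. It assumes the question is simply appended after the article text in the file passed with `-f`; the exact layout used for these tests is not shown here, and the function names and file paths are hypothetical.

```python
from pathlib import Path

QUESTION = (
    "Answer based on the above text: "
    "who is Brian and what can you tell me about him?"
)


def build_prompt(article: str, question: str) -> str:
    """Place the question after the article, separated by a blank line."""
    return article.rstrip() + "\n\n" + question.strip() + "\n"


def write_prompt_file(article_path: str, out_path: str, question: str = QUESTION) -> None:
    """Read the article, append the question, and write the prompt file."""
    article = Path(article_path).read_text(encoding="utf-8")
    Path(out_path).write_text(build_prompt(article, question), encoding="utf-8")
```

Keeping the question last matches its wording ("the above text") and means the same article file can be reused with different questions.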
We order the tests from the fastest to the slowest LLM.
Medium difficulty query: MistralLite 7B
1. Brian is a neighbor of the kidnapped man's home. He befriends David and Aaron, which allows Troy to escape. [end of text]
llama_print_timings: total time = 8397.78 ms
2. Brian is a neighbor of the main characters [end of text]
llama_print_timings: total time = 7894.72 ms
3. [end of text]
llama_print_timings: load time = 1386.38 ms
llama_print_timings: sample time = 0.27 ms / 1 runs ( 0.27 ms per token, 3690.04 tokens per second)
llama_print_timings: prompt eval time = 3396.86 ms / 7712 tokens ( 0.44 ms per token, 2270.33 tokens per second)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: total time = 3411.25 ms
Sometimes MistralLite generates empty answers. We have repeated the test with Mistral (instruct), and the problem was seen there as well. It's not clear why this occurs, but it may be worked around with a retry mechanism, at the cost of increased latency.
Verdict:
- Quality: 1.5 / 3
- Latency: very good
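The retry workaround mentioned above can be sketched as follows. This is an illustration only: `generate` stands for any function that runs the model once and returns its answer as a string, and both it and `generate_with_retry` are hypothetical names.

```python
from typing import Callable


def generate_with_retry(generate: Callable[[], str], max_attempts: int = 3) -> str:
    """Call `generate` until it returns a non-empty answer or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    answer = ""
    for _ in range(max_attempts):
        # Whitespace-only output counts as an empty answer.
        answer = generate().strip()
        if answer:
            return answer
    return answer  # still empty after all attempts
```

Each retry pays the full prompt evaluation again, so in the worst case latency grows by a factor of `max_attempts`.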
Medium difficulty query: Mistral 7B
1. Brian is a man from a neighbouring property who arrives unexpectedly and asks David and Aaron what they are doing. He hangs around for the afternoon befriending the two men, which allows Troy to escape into the bush. [end of text]
llama_print_timings: total time = 8371.69 ms
2. Brian is a man from a neighbouring property who arrives unexpectedly and asks David and Aaron what they are doing. He hangs around for the afternoon befriending the two men, which allows Troy to escape into the bush. [end of text]
llama_print_timings: total time = 8373.33 ms
3. Brian is a man from a neighbouring property who arrives unexpectedly and asks David and Aaron what they are doing. He hangs around for the afternoon befriending the two men, which allows Troy to escape into the bush. [end of text]
Verdict:
- Quality: 3 / 3
- Latency: very good
Medium difficulty query: Llama 2 7B
1. Brian is a neighbor of the kidnapped man's home. He befriends David and Aaron, which allows Troy to escape. [end of text]
llama_print_timings: total time = 8397.78 ms
2. Brian is a neighbor of the main characters [end of text]
llama_print_timings: total time = 7894.72 ms
3. Brian is a neighbour of Troy's [end of text]
llama_print_timings: total time = 7909.64 ms
Llama 2 7B gives correct but incomplete answers.
Verdict:
- Quality: 2 / 3
- Latency: very good
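One way to read the "x / 3" quality figures is as a sum of per-run grades. The sketch below assumes a rubric that is not stated in this article: each of the three runs gets 1 for a correct and complete answer, 0.5 for a correct but incomplete (or partially wrong) one, and 0 for a wrong or empty one. The function name is made up for the example.

```python
def quality_score(run_grades: list[float]) -> float:
    """Sum per-run grades; each grade must be 0, 0.5 or 1 (assumed rubric)."""
    allowed = {0.0, 0.5, 1.0}
    for grade in run_grades:
        if grade not in allowed:
            raise ValueError(f"unexpected grade: {grade}")
    return sum(run_grades)


# Under this assumed rubric, one complete answer and two incomplete ones
# give the 2 / 3 reported here.
print(quality_score([1.0, 0.5, 0.5]))
```

The same reading gives 1.5 / 3 for one complete, one incomplete and one empty answer.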
Medium difficulty query: Llama 2 13B
1. Brian is a character in the film The Cost, played by actor Cait Spiker. He is a neighbor of the two main characters, David and Aaron, who abduct Troy and take him to a remote location for revenge. Brian is friendly with the two men and invites them over for drinks one evening, but they must decide whether or not to kill Troy.
* [Brian](https://en.wikipedia.org/wiki/Brian)
* [Cait Spiker](https://en.wikipedia.org/wiki/Cait_Spiker)
* [The Cost](https://en.wikipedia.org/wiki/The_Cost_(2023_film))
* [Troy](https://en.wikipedia.org/wiki/Troy)
* [David](https://en.wikipedia.org/wiki/David)
* [Aaron](https://en.wikipedia.org/wiki/Aaron)
* [Jordan Fraser-Trumble](https://en.wikipedia.org/wiki/Jordan_Fraser-Trumble)
* [Damon Hunter](https
llama_print_timings: total time = 23574.18 ms
2. Brian is a character in the film The Cost, played by actor Cait Spiker. He is a man who lives next door to one of the main characters, David, and appears in the film for a brief period of time. Brian is portrayed as friendly and kind, but has no real connection to the story other than that.
* [Brian](https://en.wikipedia.org/wiki/Brian)
* [Cait Spiker](https://en.wikipedia.org/wiki/Cait_Spiker)
* [The Cost](https://en.wikipedia.org/wiki/The_Cost_(2023_film))
* [Clayton Watson](https://en.wikipedia.org/wiki/Clayton_Watson)
* [Damon Hunter](https://en.wikipedia.org/wiki/Damon_Hunter)
* [Jordan Fraser-Trumble](https://en.wikipedia.org/wiki/Jordan_Fraser-Trumble)
* [Kevin Dee](https://en.wikipedia.org/wiki/Kevin_Dee)
llama_print_timings: total time = 23718.60 ms
3. Brian is a character in the film The Cost, played by actor Cait Spiker. He is a neighbor of the main characters David and Aaron, and appears in the film for a brief period towards the end. He is friendly with them and invites them to his home after they have kidnapped Troy.
* [Brian](https://en.wikipedia.org/wiki/Brian)
* [Cait Spiker](https://en.wikipedia.org/wiki/Cait_Spiker)
* [The Cost](https://en.wikipedia.org/wiki/The_Cost_(2023_film))
* [Jordan Fraser-Trumble](https://en.wikipedia.org/wiki/Jordan_Fraser-Trumble)
* [Damon Hunter](https://en.wikipedia.org/wiki/Damon_Hunter)
* [Kevin Dee](https://en.wikipedia.org/wiki/Kevin_Dee)
* [Clayton Watson](https://en.wikipedia.org/wiki/Clayton_Watson)
* [
llama_print_timings: total time = 23777.03 ms
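Llama 2 13B goes off the rails here. It hallucinates many false facts (for example, it credits Cait Spiker as Brian's actor instead of Clayton Watson), then spends most of its time emitting a spurious list of links until the output is cut off, so it is slow as well as low quality.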
Verdict:
- Quality: 0.5 / 3
- Latency: poor
Medium difficulty query: Falcon 40B
1. Brian is a character in the 2023 Australian thriller film "The Cost". He is played by actor Sotiris Tzelios.<|endoftext|> [end of text]
llama_print_timings: total time = 29845.51 ms
2. Brian is a character in the 2023 Australian thriller film "The Cost". He is played by actor Sotiris Tzelios.<|endoftext|> [end of text]
llama_print_timings: total time = 29910.48 ms
3. Brian is a character in the 2023 Australian thriller film "The Cost". He is played by actor Sotiris Tzelios.<|endoftext|> [end of text]
llama_print_timings: total time = 29901.27 ms
Falcon confidently and consistently gives partially false information: the actor who played Brian is Clayton Watson, not Sotiris Tzelios.
Verdict:
- Quality: 1.5 / 3
- Latency: poor
Medium difficulty query: Llama 2 70B
1. Brian was a character in the movie The Cost, played by actor Clayton Watson. In the movie he portrays a man who visits his neighbor's house and becomes an unwitting accomplice to kidnapping and torture [end of text]
llama_print_timings: total time = 50586.45 ms
2. Brian was a character in the movie The Cost, played by actor Clayton Watson. In the movie he portrays a man that comes upon the two main characters David and Aaron while they are filming their captive Troy. He befriends them for the afternoon before leaving to return later to find Troy escaped.
[end of text]
llama_print_timings: total time = 54449.92 ms
3. Brian was a character in the movie The Cost, played by actor Clayton Watson. In the movie, he was a neighbor of one of the main characters, David, Aaron, and Troy. He befriended them when they were filming at his property. Brian had a small role in the movie. [end of text]
llama_print_timings: total time = 52391.42 ms
Llama 2 70B gives correct and complete answers, but takes almost a minute to respond.
Verdict:
- Quality: 3 / 3
- Latency: poor
Summary and conclusion
Quality vs. performance on NVIDIA A100 80 GB:
| Model | Parameters | Quantization | Context size (tokens) | Peak RAM usage | Latency | Quality (out of 3) |
|---|---|---|---|---|---|---|
| MistralLite | 7B | Q4_K_M | 8192 | 6.4 GB | 8 s | 1.5 |
| Mistral instruct | 7B | Q4_K_M | 8192 | 6.4 GB | 8 s | 2.5 - 3 |
| Llama 2 chat | 7B | Q4_K_M | 8192 | 9.3 GB | 8 s | 2 - 2.5 |
| Llama 2 chat | 13B | Q4_K_M | 8192 | 15.4 GB | 23 s | 0.5 - 1.5 |
| Falcon | 40B | Q4_K_M | 8192 | 27.8 GB | 30 s | 1.5 |
| Llama 2 chat | 70B | Q4_K_M | 8192 | 43.8 GB | 52 s | 3 |
Performance comparison between graphics cards:
| Model | Parameters | Quantization | Context size (tokens) | Peak RAM usage | A100 tok/s | RTX 4090 tok/s | RTX 3060 tok/s |
|---|---|---|---|---|---|---|---|
| Mistral instruct | 7B | Q4_K_M | 8192 | 6.4 GB | 55 | 66 | 23 |
| MistralLite | 7B | Q4_K_M | 8192 | 6.4 GB | 55 | 66 | 23 |
| Llama 2 chat | 7B | Q4_K_M | 8192 | 9.3 GB | 52 | 57 | 19 |
| Llama 2 chat | 13B | Q4_K_M | 8192 | 15.4 GB | 35 | 37 | N/A |
| Falcon | 40B | Q4_K_M | 8192 | 27.8 GB | 15 | N/A | N/A |
| Llama 2 chat | 70B | Q4_K_M | 8192 | 43.8 GB | 11 | N/A | N/A |
For a trade-off that favors performance and cost, either Mistral instruct or Llama 2 chat (7B) offers good-quality results while keeping latency low. Hardware costs are also low, since these models fit on an inexpensive 8 GB (Mistral) or 12 GB (Llama 2 7B) GPU.
For the best quality, it is better to go straight to a 70B-parameter model, at the cost of a drastic increase in latency and hardware costs.
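The selection logic behind these recommendations can be sketched in a few lines. The figures are copied from the first summary table (A100 latency, lower bound of each quality range). Two things are assumptions of this sketch rather than statements from the measurements: that the "Peak RAM usage" column must fit entirely in GPU memory, and that ties in latency should go to the higher-quality model. The names `Model` and `pick_model` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    peak_ram_gb: float
    latency_s: float
    quality: float  # lower bound of the quality range in the table


MODELS = [
    Model("MistralLite 7B", 6.4, 8, 1.5),
    Model("Mistral instruct 7B", 6.4, 8, 2.5),
    Model("Llama 2 chat 7B", 9.3, 8, 2.0),
    Model("Llama 2 chat 13B", 15.4, 23, 0.5),
    Model("Falcon 40B", 27.8, 30, 1.5),
    Model("Llama 2 chat 70B", 43.8, 52, 3.0),
]


def pick_model(vram_gb: float, min_quality: float) -> Optional[Model]:
    """Return the lowest-latency model that fits in memory and meets the quality bar."""
    candidates = [
        m for m in MODELS
        if m.peak_ram_gb <= vram_gb and m.quality >= min_quality
    ]
    if not candidates:
        return None
    # Lowest latency first; ties go to the higher-quality model.
    return min(candidates, key=lambda m: (m.latency_s, -m.quality))
```

For example, `pick_model(12, 2.0)` selects Mistral instruct 7B, while asking for a quality of 3 only succeeds when there is enough memory for the 70B model.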
