{"id":137,"date":"2023-10-11T08:52:10","date_gmt":"2023-10-11T13:52:10","guid":{"rendered":"https:\/\/blog.jdkendall.com\/?p=137"},"modified":"2023-10-11T09:20:10","modified_gmt":"2023-10-11T14:20:10","slug":"bytes-to-bites","status":"publish","type":"post","link":"https:\/\/blog.jdkendall.com\/?p=137","title":{"rendered":"Bytes to Bites"},"content":{"rendered":"\n<p>How do we get from a user&#8217;s message to an informed response from Chom? I&#8217;ve been chewing on this one for a while and I think I have an idea for a first draft. So let&#8217;s go ahead and just hit you with the sequence diagram right away:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-style-default\"><a href=\"https:\/\/blog.jdkendall.com\/wp-content\/uploads\/2023\/10\/User-Chat-Message-Interaction-1.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"927\" src=\"https:\/\/blog.jdkendall.com\/wp-content\/uploads\/2023\/10\/User-Chat-Message-Interaction-1-1024x927.png\" alt=\"\" class=\"wp-image-139\" srcset=\"https:\/\/blog.jdkendall.com\/wp-content\/uploads\/2023\/10\/User-Chat-Message-Interaction-1-1024x927.png 1024w, https:\/\/blog.jdkendall.com\/wp-content\/uploads\/2023\/10\/User-Chat-Message-Interaction-1-300x272.png 300w, https:\/\/blog.jdkendall.com\/wp-content\/uploads\/2023\/10\/User-Chat-Message-Interaction-1-768x695.png 768w, https:\/\/blog.jdkendall.com\/wp-content\/uploads\/2023\/10\/User-Chat-Message-Interaction-1-850x770.png 850w, https:\/\/blog.jdkendall.com\/wp-content\/uploads\/2023\/10\/User-Chat-Message-Interaction-1.png 1281w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p>(Go ahead and click to get a closer look.)<\/p>\n\n\n\n<p>So, a quick overview. 
<\/p>\n\n\n\n<p>Chom runs requests through multiple AI models (large language models, or LLMs) to perform a handful of different natural language processing tasks, before preparing a response based on a database of recipes, ingredients, and allergens.<\/p>\n\n\n\n<p>When a user interacts with Chom, the system starts by fetching past chat context for continuity. A steering model then checks the new message&#8217;s request for appropriateness &#8211; are we on topic for a culinary discussion, are we being asked for any illicit activities, and so on. (Anything failing the sniff test will be rejected with a polite response.)<\/p>\n\n\n\n<p>Once we&#8217;re sure we&#8217;re kosher (heh), a command parser discerns the user&#8217;s intent and the relevant entities involved in the request. The example in the sequence diagram is extracting information about substituting allergens out in favor of other ingredients, but other actions the model might identify could be altering recipes, adding\/removing recipes from the list, removing already-on-hand groceries, and so on. The AI will be picking from a predefined list of actions, so &#8220;launch the nukes&#8221; won&#8217;t be on the docket. &#8230;Probably.<\/p>\n\n\n\n<p>Once we know what action we&#8217;re taking, we can consult our embeddings database for relevant Real World Data\u2122. This contains info about recipes, ingredients, and so on, and lets us turn a wide array of vocabulary into specific and uniform concepts (&#8220;walnut allergy&#8221; -&gt; Tree Nuts Allergy) that the system can work with.<\/p>\n\n\n\n<p>In our example here, the system consults the embeddings database to identify related allergen groups and updates the user preferences database accordingly. Saving allergens on hand ensures that even if the user forgets to mention this in the future, we&#8217;ll be accommodating their needs. 
Allergens can also be toggled in the user&#8217;s profile, along with other dietary restrictions (vegan, vegetarian, carnivore, a diet of pure antimatter&#8230; the usual.)<\/p>\n\n\n\n<p>Chom then crafts a suitable response, keeping prior interactions in mind via the chat context, and lastly logs the entire exchange in the chat history database. Keeping that record on hand means the user can look back at earlier recommendations or retrieve grocery lists at a later date.<\/p>\n\n\n\n<p>For purposes of the alpha, Chom is powered by ChatGPT for all of its LLM usage. The per-request API costs are much lower than hosting models myself on a GPU-powered cloud instance or server blade. However, Chom is designed so its LLM backend can be swapped out, allowing the use of self-hosted models such as Vicuna instead. This will be very important for scaling up later, and it keeps us from being tied to a specific provider like OpenAI.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What the heck&#8217;s a steering model, anyway?<\/h2>\n\n\n\n<p>The steering model is a mechanism designed to influence or &#8220;steer&#8221; the responses of a larger foundational language model (like ChatGPT or Vicuna). During training, it learns from a mix of human feedback and reinforcement learning, which equips it to predict how best to guide the base model to produce the desired outputs.<\/p>\n\n\n\n<p>Once this training is complete, the steering model integrates with the base language model. When they work in tandem, any incoming query is first shaped by the steering model, providing the foundational model with additional context or guidance. 
This ensures the final response aligns with the intended behavior.<\/p>\n\n\n\n<p>In our case, we&#8217;re using this for filtering out inappropriate discussions, but it could easily be extended to catch vague statements and short-circuit with clarifying questions before passing the request forward to the rest of the system. We can also specialize our steering model by training it against additional data from future user interactions (opt-in only, of course) so that it better recognizes those weird edge cases that leave people scratching their heads at the generated response.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Embeddings? Like, putting a thing inside another thing?<\/h2>\n\n\n\n<p>Yeah, I don&#8217;t know. There&#8217;s a good reason for the name in the literature, but I&#8217;m still wrapping my head around it. The way I think of it is as semantic grouping &#8211; also a jargon-y term, but broken apart, &#8220;semantic&#8221; is another word for &#8220;what does it mean&#8221;, and grouping is&#8230; well, you group things together. So embeddings are a way of grouping things together based on their meaning.<\/p>\n\n\n\n<p>For example, &#8220;cat&#8221; and &#8220;tiger&#8221; would be grouped closer together than &#8220;cat&#8221; and &#8220;dog&#8221; on account of both being felines. Likewise, &#8220;cat&#8221; and &#8220;dog&#8221; would be grouped closer together than &#8220;cat&#8221; and &#8220;cow&#8221; on account of both being pets.<\/p>\n\n\n\n<p>The technical details are gory and filled with math, but the idea is that these LLMs are made up of an incredibly complex network of semantic groupings based on all of the training data. 
You can imagine it&#8217;s like writing down every meaning you can think of for a concept like &#8220;cat&#8221; onto individual index cards labeled &#8220;Cat&#8221;, then doing it again for another concept like &#8220;dog&#8221;, and then another for &#8220;cow&#8221;, etc. Then you space each set of concepts out in your room, connected by strings, where the length of each string is how close or far apart the ideas are from each other. Now do it in 768 dimensions instead of just three. (And that&#8217;s just for BERT models. Some models use even more. It&#8217;s crazy.)<\/p>\n\n\n\n<p>Thanks to math, we can wrangle all of that and end up with a way to take a word or phrase, turn it into those crazy strings, and then follow the strings to one of our embedding database&#8217;s known words.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Well, what next?<\/h2>\n\n\n\n<p>As mentioned in my first Chom post, I&#8217;m working through steering and command parsing right now. The current iteration of Chom is still a wild west of &#8220;making stuff up&#8221; at times, and I&#8217;m ironing that out. It&#8217;s largely accurate in its responses to the user, but where that data ends up in the structured format it&#8217;s supposed to output is&#8230; imaginative, and it will happily talk about any topic after enough probing to get it to break away from its culinary assistant prompting.<\/p>\n\n\n\n<p>I&#8217;m also thinking about how I can visualize the processing from the API calls so I can share that in a blog post as well. It would be very interesting to see what Chom&#8217;s different brains are thinking at each stage and to talk about how I adjust those. We&#8217;ll see what I come up with.<\/p>\n\n\n\n<p>Alright, that&#8217;s all for now. Ctrl-S, Ctrl-Q.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>How do we get from a user&#8217;s message to an informed response from Chom? 
I&#8217;ve been chewing on this one for a while and I think I have an idea for a first draft. So let&#8217;s go ahead and just hit you with the sequence diagram right away: (Go ahead and click to get a&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-137","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/blog.jdkendall.com\/index.php?rest_route=\/wp\/v2\/posts\/137","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.jdkendall.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.jdkendall.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.jdkendall.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.jdkendall.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=137"}],"version-history":[{"count":10,"href":"https:\/\/blog.jdkendall.com\/index.php?rest_route=\/wp\/v2\/posts\/137\/revisions"}],"predecessor-version":[{"id":152,"href":"https:\/\/blog.jdkendall.com\/index.php?rest_route=\/wp\/v2\/posts\/137\/revisions\/152"}],"wp:attachment":[{"href":"https:\/\/blog.jdkendall.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=137"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.jdkendall.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=137"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.jdkendall.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=137"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}