I found a new way to break it
It did figure out that Bub was going wrong here (Bubs b2wf journal - #40 by bub).
PROMPT:
A forum member wrote this: """ [… entire post above …] """
Is he understanding actualism and actual freedom properly? What would you say to him?
RESPONSE:
Not too bad!
Had to take the app down for the time being due to costs. I might end up implementing something where it goes through $9 each month with gpt-4o, then switches to the gpt-3.5 version for the last $1, but presently that is all I can really afford.
If anyone wants a private version set up with their own OpenAI API key and so on, I can do that too.
What about using the new freely available models?
Can try at https://groq.com/
The models are fully open source
Groq seems cool, but it will definitely be a larger project to get working well, since the OpenAI Assistants API method handles the document chunking, vector embedding, query rephrasing, hybrid similarity search, reranking, and response generation all out of the box. I think if I went with Groq it would be a good way to handle the response-generation part, but I'd need to put together the storage, indexing, and retrieval parts on my own. I have done this once before, and it didn't work quite as well as the super simple and effective Assistants API, though with enough customized effort it probably could. The reason I have been using the Assistants API for everything is that, by handling all those parts on its own, it lets me focus efforts on better curating and cleaning the knowledge base. Plus, whenever the Assistants API 'v3' comes out, it would probably make obsolete whatever I put together at a more custom level.
I think to use Groq for this I would also need the Mixtral 8x7B model, to have a big enough context window for handling a bunch of relevant actualism webpages simultaneously like the current chatbot has been doing, and its performance is probably similar to gpt-3.5 turbo. My general opinion right now is that open source is probably only worthwhile if you are dealing with extremely sensitive data, or if you have a bigger team of people to optimize every step.
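For illustration, the response-generation step with Groq might look something like this rough sketch (it assumes the relevant excerpts have already been retrieved by a separate search step; the model name and prompt wording are just placeholders):

```python
# Rough sketch only: assumes the relevant AFT excerpts were already retrieved
# by a separate search step, and that the `groq` client library is installed.
import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

def generate_answer(question: str, excerpts: list[str]) -> str:
    # Join the retrieved excerpts into one context block for the model.
    context = "\n\n---\n\n".join(excerpts)
    completion = client.chat.completions.create(
        model="mixtral-8x7b-32768",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "Answer using only the provided excerpts, quoting them directly and citing their links."},
            {"role": "user",
             "content": f"Excerpts:\n{context}\n\nQuestion: {question}"},
        ],
        temperature=0.2,
    )
    return completion.choices[0].message.content
```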
All in all I think more experimentation with the OpenAI assistant powered by gpt-3.5 might be more fruitful. Possibly just a better system prompt, or a two-step 'agentic' approach where gpt-3.5 finds the right documents, and then a separate gpt-3.5 call pulls the direct quotes from the cited documents.
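To make the two-step idea concrete, here's a minimal sketch (the prompts are made up for illustration, and it assumes the candidate documents are already available as plain text):

```python
# Minimal sketch of the two-step 'agentic' flow: one gpt-3.5 call picks the
# relevant documents, a second call pulls direct quotes from them.
# The prompts and the way documents are fetched are placeholders.
from openai import OpenAI

client = OpenAI()

def pick_documents(question: str, titles: list[str]) -> str:
    r = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Return only the titles of the documents most relevant to the question."},
            {"role": "user",
             "content": f"Question: {question}\nAvailable documents:\n" + "\n".join(titles)},
        ],
    )
    return r.choices[0].message.content

def pull_quotes(question: str, documents_text: str) -> str:
    r = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Answer the question using only direct quotes from the supplied documents, with citations."},
            {"role": "user",
             "content": f"Question: {question}\n\nDocuments:\n{documents_text}"},
        ],
    )
    return r.choices[0].message.content
```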
All that said, it's probably a slightly biased opinion, coming from me mostly having experience doing things via the OpenAI API.
Edit: Another possibility would be to start saving all conversations and 'caching' answers. Basically, if the same (or an extremely similar) question has been asked twice, just serve the user the previously written answer to that question. For this, though, I would probably include an opt-in checkbox to save answers for product improvement.
I'm willing to pay for this to try out the private version.
I'd be happy to help set up a private one for you (or anyone interested). There will be a few steps that you will have to do on your side so that you maintain control of your own paid account, as I don't want to be getting paid directly for this. But basically I can help you set up the app and then turn everything back over to your control once it's working. If you are willing to potentially put in a couple of hours to get everything working (after which it should just be a matter of topping off your OpenAI credit whenever it runs low), then shoot me a DM and I'll help you through it all.
I am also making some progress on the 'answer caching' approach, which might be the best way to proceed with lowering costs for a communal chatbot. If everyone is fine with chatbot conversations being saved (no identifiable info, just prompts and responses in a database), we'd gradually develop a huge database of FAQs, and whenever someone asks a similar question they would get the cached response, which has minimal costs associated with it.
I'd make the 'similarity threshold' very high so that it only returns a cached response for a question that is near-identical to one previously asked. With this approach I could also eventually add a thumbs up/down to cached responses, and if they get maybe 20%+ downvotes they could be manually reviewed.
Maybe the query the cached response was based on could be included, so the user could evaluate the relevance.
Yes, good idea!
Here is where I am at for an overall vision for making the communal chatbot powerful and affordable; I'm interested in feedback.
Implement the caching system and run the bot using gpt-4o for maybe the first $20 of usage each month; once it uses that amount, it switches automatically to gpt-3.5. It will still serve cached gpt-4o responses to previously asked questions, but it will never save a gpt-3.5 response, since they are pretty mediocre.
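As a sketch, the switch could be as simple as tracking estimated spend for the month and picking the model from that (the budget figure and per-token prices here are stand-ins, not actual OpenAI rates):

```python
# Sketch of the monthly budget fallback. The $20 figure and per-1K-token
# prices are illustrative assumptions, not actual OpenAI rates.
MONTHLY_BUDGET_USD = 20.0

def pick_model(spend_so_far_usd: float) -> str:
    # Serve gpt-4o until the month's budget is used up, then fall back.
    return "gpt-4o" if spend_so_far_usd < MONTHLY_BUDGET_USD else "gpt-3.5-turbo"

def should_cache(model: str) -> bool:
    # Only gpt-4o answers get saved to the cache; gpt-3.5 responses never do.
    return model == "gpt-4o"

def add_cost(spend_usd: float, prompt_tokens: int, completion_tokens: int,
             price_in_per_1k: float, price_out_per_1k: float) -> float:
    # Accumulate an estimated cost from the token counts the API reports.
    return spend_usd + (prompt_tokens / 1000) * price_in_per_1k \
                     + (completion_tokens / 1000) * price_out_per_1k
```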
Over time we will gradually build up that massive database of Q&As. If the 'caching similarity threshold' is quite high, it will save every little variation: 'what is actualism' and 'what is the actualism method' are presently classed as two different questions, for instance. With this approach the costs should stay under $25 a month even if usage gets very high, and the performance will gradually improve.
It will also be interesting to eventually see what it is like to add the cached responses back into the knowledge base, as it might allow for higher-level concepts and connections to be drawn by the chatbot, because part of its knowledge base would then be these higher-level concepts and summaries drawn out of conversations with people.
What purpose will the bot-generated FAQ serve on top of the human-generated (& fully-free-person-curated) FAQ already on the AFT site?
For me its utility is pulling direct quotes from the AFT with links. It is much more effective than Google search because it is faster, and since I can skim excerpts, I am able to get to the content I need on the AFT more accurately.
I am not really looking for an FAQ or ChatGPT's understanding of actualism, tbh.
The gpt-4o version of the bot is, in my opinion, mostly doing a good job of synthesizing those same fully-free-person-curated writings into answers to any question I can think of. The gpt-3.5 version would be easily affordable but doesn't really do a good enough job (partly because its context window is just far smaller, so it is typically only dealing with 3-5 excerpts at a time).
The caching (creation of an FAQ) is just a way to save resources if two people happen to ask the same (or nearly the same) question. So basically the user experience will be a seamless question-and-answer experience drawing from the AFT writings, whether it is something someone has asked before or not. I foresee a much larger set of saved Q&As than is presently available, because it will gradually include anything anyone ever thinks to ask; it won't just be frequently asked questions (FAQ), it will be more of an infrequently asked questions (IAQ) collection ;).
I feel like the key difference is a seamless user experience: when you ask a question, if it has been asked before in nearly the same way you get a cached answer, and if it hasn't you get an original attempt to synthesize the existing resources.
I think the caching approach saves costs without diminishing the user experience. So if you are at all happy with the quality of the gpt-4-level responses and citations, then I can't see a reason not to use the caching approach. Also, with the rating system it will be possible to implement some automatic answer-improvement processes, such that the cached answers may gradually become better than the originals.
Does that make sense? Maybe I am being crazy and just really want my two great passions to intersect successfully: chatgpt and actualism haha.
I've noticed that it tends to 'reset' and I lose an answer after 20 minutes or so; is it possible to extend that time? It takes me a while to puzzle through a complex result.
Edit: for now I can work around it by sending results to a PDF.
That might have just been me making some changes and rebooting the app; does it happen often? Does it give you a loading screen and say 'your app is in the oven'?
I didn't get a message like that. I was just switching between tabs, and when I came back to it the conversation was wiped. It has happened a few times.
Gotcha, it is probably unavoidable due to using Streamlit sharing, which is a free hosting service. I could probably provide the option to pick up a conversation where you left off, though, if you save a 'thread id' and re-enter it when coming back. I might add that feature if I have time.
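Roughly, picking up a conversation would just mean saving the thread id and reloading its messages when you return, something like this sketch (error handling and how the id gets shown and pasted back are left out):

```python
# Sketch of resuming an Assistants API conversation from a saved thread id.
# How the id is persisted (e.g. displayed for the user to paste back in later)
# is left out.
from openai import OpenAI

client = OpenAI()

def resume_thread(thread_id: str) -> list[tuple[str, str]]:
    # List the messages already in the thread so they can be re-displayed.
    messages = client.beta.threads.messages.list(thread_id=thread_id, order="asc")
    return [(m.role, m.content[0].text.value) for m in messages.data]
```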
I like the idea of just typing any question and getting a good response
The main thing is not to build a library of responses that aren't so good. Maybe the cached answers can be curated, and if they're not, then add a disclaimer?
Also, how does it detect similar questions so as to give the cached answer?
Also, one thing that could be useful is if it could be trained to take someone through the process of tracing back to when they last felt good and seeing what the trigger was.
The main thing is not to build a library of responses that aren't so good. Maybe the cached answers can be curated, and if they're not, then add a disclaimer?
So right now the cached answers can be rated by users; if there is a bad answer, it will eventually get into the negatives. The app is live now with the caching and rating features active. So a low/high rating can speak to a 'crowdsourced' opinion of the chatbot answers. I might try just using a script that replaces any answers below maybe -3 with a new answer; I have a feeling that if the chatbot is given the old response and told it is a bad one, it will do better on the second try. If not, manual review perhaps.
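A rough sketch of what that replacement script could look like (the -3 cutoff and the prompt wording are illustrative only):

```python
# Illustrative script: walk the cached Q&As and regenerate any answer whose
# crowd rating has dropped below the cutoff, showing the model its old
# (downvoted) attempt so it can try to do better the second time.
from openai import OpenAI

client = OpenAI()
RATING_CUTOFF = -3  # assumed cutoff

def regenerate(question: str, bad_answer: str) -> str:
    r = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "The previous answer to this question was rated poorly by users. Write a better one."},
            {"role": "user",
             "content": f"Question: {question}\n\nPrevious (poorly rated) answer:\n{bad_answer}"},
        ],
    )
    return r.choices[0].message.content

def refresh_cache(cached_qas: list[dict]) -> None:
    # Items are assumed to look like {"question": ..., "answer": ..., "rating": ...}.
    for qa in cached_qas:
        if qa["rating"] <= RATING_CUTOFF:
            qa["answer"] = regenerate(qa["question"], qa["answer"])
            qa["rating"] = 0  # reset the rating for the fresh attempt
```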
Also, how does it detect similar questions so as to give the cached answer?
I am using 'dense vectors' to represent each question as a distinct numerical representation. When a question is asked, it gets stored as a dense vector and is compared in terms of cosine similarity to previously answered questions. If it is sufficiently similar, a cached answer is given. I'm using the Pinecone API to store and index these embeddings.
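In code terms the lookup is roughly this (a sketch only; the index name, embedding model, and exact threshold are placeholders rather than the live app's values):

```python
# Sketch of the cache lookup: embed the question, query Pinecone (cosine-metric
# index) for the nearest previously asked question, and only return a cached
# answer above a high similarity threshold.
from openai import OpenAI
from pinecone import Pinecone

oai = OpenAI()
pc = Pinecone()  # reads PINECONE_API_KEY from the environment
index = pc.Index("question-cache")  # placeholder index name

SIMILARITY_THRESHOLD = 0.95  # placeholder; set very high for near-identical questions

def embed(text: str) -> list[float]:
    return oai.embeddings.create(model="text-embedding-3-small", input=text).data[0].embedding

def cached_answer(question: str) -> str | None:
    result = index.query(vector=embed(question), top_k=1, include_metadata=True)
    if result.matches and result.matches[0].score >= SIMILARITY_THRESHOLD:
        return result.matches[0].metadata.get("answer")
    return None  # nothing close enough cached; generate a fresh answer instead
```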
I agree. I might experiment with just giving the same chatbot a different set of instructions and seeing if it does a good job.
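That experiment would roughly just be a matter of pointing the same assistant at a different set of instructions, something like this (the assistant id and the instruction text are placeholders):

```python
# Sketch: give the existing assistant a different set of instructions, e.g. for
# guiding someone through tracing back to when they last felt good.
from openai import OpenAI

client = OpenAI()

GUIDE_INSTRUCTIONS = (
    "Instead of answering directly, walk the user step by step through "
    "tracing back to when they last felt good and identifying the trigger "
    "that ended it, quoting relevant AFT passages along the way."
)

client.beta.assistants.update(
    "asst_xxx",  # placeholder assistant id
    instructions=GUIDE_INSTRUCTIONS,
)
```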
The chatbot has been taken down at the request of Vineeto FYI.
When I have some time I am going to develop an AFT search app that operates on the same basic principles as the chatbot (semantic search), but without adding any ChatGPT reasoning or input, which will need to be OK'd by Vineeto before it can be released.