GPT-4 Code Interpreter: Running Vector DB for the first time

2 years ago

🔥 GPT-4 #CodeinterpreterCan run a Vector DB for the first time! See the end result video, and then read the process thread below 👇 twitter.com/i/status/1678976497544212480/video/1 We pondered whether it's possible w/ @simonw @swyx and @nisten and it is indeed! @trychroma is dope 🔥 For the process, read the 🧵👇
From here on out, the thread is my running contemporaneous notes. So far the limitations are: it cannot install packages it doesn't have, and it doesn't have access to the web. So here's my game plan, will use @typefully as my running notes, let's go 👇 twitter.com/jeffreyhuber/status/1678590954880782339
Figured I'd try and use @trychroma for this, it's a cool vector DB, runs in python, and many devs prefer it. And honestly even tho I've been chatting w/ @jeffreyhuber and @atroyn for a minute, I never actually used chroma (🫣) so this is a good way to learn on the fly!
Right so, we have a problem, #CodeInterpreterCan (not) install from pip. twitter.com/i/status/1678863631910604805/video/1 So we need... to package chroma. I'm currently not sure if wheels include other requirements or nah, so checking w/ chatGPT real quick by uploading the whole repo 😂
Crap, so even if we do have a wheel, it will only include metadata about the other requirements, not the requirements themselves. And chatGPT doesn't have any other ideas, it's like, use FAISS or something bro 😅
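To see what a wheel actually carries, you can peek inside it: a wheel is a zip archive, and its dependencies only show up as `Requires-Dist` lines in the `*.dist-info/METADATA` file. A minimal sketch, assuming you have a wheel downloaded locally (the helper name `wheel_requirements` is made up for illustration):

```python
import zipfile
from email.parser import Parser


def wheel_requirements(wheel_path):
    """Return the Requires-Dist entries declared in a wheel's METADATA file."""
    with zipfile.ZipFile(wheel_path) as wheel:
        # The metadata lives in "<name>-<version>.dist-info/METADATA".
        metadata_name = next(
            name for name in wheel.namelist()
            if name.endswith(".dist-info/METADATA")
        )
        raw = wheel.read(metadata_name).decode("utf-8")
    # METADATA uses RFC 822 style headers, so the email parser can read it.
    headers = Parser().parsestr(raw)
    return headers.get_all("Requires-Dist") or []
```

Calling `wheel_requirements(...)` on a wheel lists the names of the dependencies it expects, which is exactly the problem: the names are there, the packages are not.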
Haha, ok, I had a dumb but brilliant idea: on a similar environment (x86_64) w/ the same python, I will just create a venv and install @trychroma, which will pull in its deps (there are many 👀). Then I'll gzip that sucker and upload it to #codeinterpreterCan, and it agreed that it may work!
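The packaging step could look roughly like this. It is a sketch, not the exact commands used in the thread: it assumes a Linux machine with internet access and the same Python version as the sandbox, that the package name on PyPI is `chromadb`, and the function names and paths are made up for illustration.

```python
import subprocess
import tarfile
import venv
from pathlib import Path


def build_venv(venv_dir, package="chromadb"):
    """Create a virtualenv and pip-install a package into it (needs internet)."""
    venv.create(str(venv_dir), with_pip=True)
    # On Linux the venv's interpreter lives in bin/; Windows uses Scripts/.
    python = Path(venv_dir) / "bin" / "python"
    subprocess.run([str(python), "-m", "pip", "install", package], check=True)


def archive_dir(src_dir, archive_path):
    """Pack a directory into a gzip-compressed tarball ready for upload."""
    src = Path(src_dir)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # arcname keeps paths relative, so the archive unpacks as "<dirname>/...".
        tar.add(str(src), arcname=src.name)
    return Path(archive_path)
```

Usage would be something like `build_venv("chroma_env")` followed by `archive_dir("chroma_env", "chroma_env.tar.gz")`, then uploading the resulting file.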
It's incredible to have #codeinterpreterCan as a partner in crime in this. It went from saying "I cannot operate in this restricted environment" to being my friend, encouraging me to try shit!
Ok, venv uploaded! Now I need to give it a way to learn about chroma. Luckily chroma links to a colab notebook right on their website! 🫡 Downloading that sucker as .py and uploading to CI
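The thread doesn't show how the uploaded venv got wired into the sandbox. One plausible way is to unpack the tarball and put its `site-packages` directory on `sys.path`. A sketch under that assumption (the function name and directory layout are hypothetical, and it assumes a Linux-style venv):

```python
import sys
import tarfile
from pathlib import Path


def unpack_and_activate(archive_path, dest_dir):
    """Extract an uploaded venv tarball and put its site-packages on sys.path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive_path), "r:gz") as tar:
        # Only extract archives you built yourself; tarballs can hold unsafe paths.
        tar.extractall(str(dest))
    # A Linux venv keeps packages under lib/pythonX.Y/site-packages.
    site_dirs = sorted(dest.glob("*/lib/python*/site-packages"))
    for site_dir in site_dirs:
        if str(site_dir) not in sys.path:
            sys.path.insert(0, str(site_dir))
    return site_dirs
```

After that, an `import` of the packaged library should resolve from the unpacked directory, provided any compiled extensions match the sandbox's platform and Python version.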
Sometimes it doesn't know its own strength, so it struggles and needs encouragement, just like a junior dev would!
🔥 Success! Am I the first person to run @trychroma (or really, any vector DB) inside #codeInterpreterCan ? maybe? IDK but feels good man, def. better than Doom cc ( @jeffreyhuber @atroyn @nisten @swyx @simonw )
Ok but making sure I have chroma is not enough right? We need to do some... data analysis. For that, we need some embeddings! I have my tweets.json with all my tweets, and I've asked #codeinterpreterCan to... generate a python file that will run this via @OpenAI ada-002 👀
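The generated script isn't shown in the thread, so here is a rough sketch of what such a file could look like, to be run on a machine with internet access. Assumptions: `tweets.json` is a list of objects with `id` and `text` keys (the real export format may differ), the current `openai` Python client is installed, and `OPENAI_API_KEY` is set. All helper names are made up.

```python
import json


def load_tweet_texts(path):
    """Read tweets from JSON; assumes a list of objects with 'id' and 'text' keys."""
    with open(path, encoding="utf-8") as handle:
        tweets = json.load(handle)
    return [(str(t["id"]), t["text"]) for t in tweets]


def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_with_openai(texts, model="text-embedding-ada-002"):
    """Embed a batch of texts via the OpenAI API (needs internet and an API key)."""
    from openai import OpenAI  # imported lazily so the helpers above work offline
    response = OpenAI().embeddings.create(model=model, input=texts)
    return [item.embedding for item in response.data]


def embed_tweets(tweets, embed_fn=embed_with_openai, batch_size=100):
    """Return {'id', 'text', 'embedding'} records for a list of (id, text) pairs."""
    records = []
    for batch in batched(tweets, batch_size):
        vectors = embed_fn([text for _, text in batch])
        for (tweet_id, text), vector in zip(batch, vectors):
            records.append({"id": tweet_id, "text": text, "embedding": vector})
    return records
```

The resulting records can be dumped to a JSON file with `json.dump` and uploaded, since the sandbox itself cannot reach the API.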
While it generates, I will tell you, that the mere fact you can upload and download files is such a game changer! So so good!
Ok so we have chroma running, but #codeinterpreter always forgets haha, a neat trick is to scroll up to a point it remembers, edit your message, and it'll effectively "fork" the conversation from there. Another neat trick is, after you've done so, generate a share link.
One additional thing to note: the environment and files often reset! Especially if you refreshed the page etc. So make sure to ask chatGPT if it has the right files, otherwise it'll tell you it can't do things, but often won't know to tell you that it's because the files are gone!
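A tiny check like this, run at the start of each step, makes the reset visible instead of leaving the model to guess. It is only a sketch; the function name is made up and the paths you pass in depend on where your uploads actually landed.

```python
from pathlib import Path


def missing_paths(expected):
    """Return the expected files or directories that no longer exist."""
    return [str(path) for path in map(Path, expected) if not path.exists()]
```

If `missing_paths([...])` returns anything, re-upload those files before asking for the next step.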
Btw, it's been an hour or so and I'm still fighting the different error states. I know we can run @trychroma, but I am unable to get chatGPT to instantiate it properly and persistently so I can actually... use it. 😅 Oh ok, finally, we golden!
Now let's load the f-ing embeddings! ... It's a constant struggle with env. resets, the model forgetting it can do it, and me getting tired and not being familiar with @trychroma (and GPT not knowing about all the capabilities). I think I can do something w/ collections? 🤔
I downloaded the API cheatsheet from the chroma github, and will feed all of it to chatGPT to see if it can learn, cause I'm gettin tired and my kids won't drive themselves from daycare! It seems... like it works, using @goodside's amazing notalk;justgo bitchslap of a trick
Interestingly, I seem to have noticed a strong confabulation issue, so things need to be double checked! These are NOT my tweets and the error had a huge print, so it likely exceeded the context window of the model 👀
Ok kids in bed, it was a fun hang, but now back to pacifying this ... babyAGI 😂 My trick from before works: editing my message from a previous state where things were ok, asking #codeinterpreter to verify that all directories exist (they were deleted!) and then re-uploading. LFG
Alrighty, we have a collection confirmed. And hopefully persisted to disk via persist_directory="/path/to/persist/directory". It took some convincing to get #codeinterpreter to save it somewhere we have write access. Let's try to actually load some embeddings!
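For reference, here is a sketch of a persisted collection plus a batched load. It is not the code from the session: it uses `chromadb.PersistentClient(path=...)` from newer Chroma releases rather than the `persist_directory` setting mentioned above, and the record shape (`id`, `text`, `embedding`), the collection name, and the function names are all assumptions for illustration.

```python
def open_collection(persist_dir, name="tweets"):
    """Open (or create) a collection stored on disk under persist_dir."""
    import chromadb  # imported lazily; requires the chromadb package
    client = chromadb.PersistentClient(path=str(persist_dir))
    return client.get_or_create_collection(name=name)


def add_records(collection, records, batch_size=500):
    """Add {'id', 'text', 'embedding'} records in batches; return the new count."""
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        collection.add(
            ids=[record["id"] for record in batch],
            documents=[record["text"] for record in batch],
            embeddings=[record["embedding"] for record in batch],
        )
    return collection.count()
```

Pick a `persist_dir` the sandbox can actually write to, otherwise the collection will not survive.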
YES! Victory! Embeddings are in the @trychroma collection: we were able to list them, count them and.... could not query them 😂 Wamp Wamp, but I still declare a success! We can't query because I also need to embed my query, and the environment doesn't have internet access. DOH
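One possible workaround, which was not tried in the thread: embed the query text on a machine that does have internet, save the vector as a JSON list, upload it, and query by vector. A sketch assuming the collection stores documents and that the query vector comes from the same embedding model as the stored ones (the file layout and function names are made up):

```python
import json


def load_query_embedding(path):
    """Read a precomputed query vector saved as a JSON list of numbers."""
    with open(path, encoding="utf-8") as handle:
        vector = json.load(handle)
    return [float(value) for value in vector]


def query_by_vector(collection, vector, n_results=5):
    """Run a nearest-neighbour query using an already-embedded query vector."""
    result = collection.query(query_embeddings=[vector], n_results=n_results)
    # Chroma returns one list of hits per query vector; we sent exactly one.
    return list(zip(result["ids"][0], result["documents"][0], result["distances"][0]))
```

Each returned tuple is `(id, document, distance)`, with the closest matches first.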
Alex Volkov (Thursd/AI)

@altryne

โœŒ๏ธ Vibe Coder ๐ŸŽ™๏ธ Host of @thursdai_pod โœจ AI Evangelist with @weights_biases ๐Ÿช„๐Ÿ working on @weave_wb Founder and CEO @ targum.video