Is Gemini Pro better than GPT-4? What can we learn from a multi-dataset comparison?
First, what does it cost?
Put it this way: with "Language Models Are Few-Shot Learners" as the input and the Gettysburg Address as the output, here's what it would cost against the 16K version of GPT-3.5.
Input: Gemini Pro ($0.059), GPT-3.5 ($0.064)
Output: Gemini Pro ($0.000906), GPT-3.5 ($0.00105)
Ouch.
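If you want to run this kind of back-of-the-envelope comparison yourself, here's a minimal sketch. It assumes per-character pricing for Gemini and per-token pricing for GPT-3.5; the prices and usage numbers below are placeholder assumptions, not the figures above or anyone's current pricing.

```python
# Minimal cost sketch. All prices and usage numbers are illustrative placeholders,
# NOT the figures from the thread or any provider's current pricing.

def cost_per_call(input_units: float, output_units: float,
                  price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Cost of one call given usage and per-1K-unit prices (tokens or characters)."""
    return (input_units / 1000) * price_in_per_1k + (output_units / 1000) * price_out_per_1k

# Hypothetical usage: a long paper as input, a short speech as output.
input_tokens, output_tokens = 60_000, 400        # assumed token counts
input_chars, output_chars = 240_000, 1_600       # assumed character counts (~4 chars/token)

gpt35_cost = cost_per_call(input_tokens, output_tokens, 0.003, 0.004)     # assumed per-token prices
gemini_cost = cost_per_call(input_chars, output_chars, 0.00025, 0.0005)   # assumed per-character prices

print(f"GPT-3.5 (per-token):   ${gpt35_cost:.4f}")
print(f"Gemini Pro (per-char): ${gemini_cost:.4f}")
```

The main gotcha is the unit mismatch: you have to convert character-based and token-based pricing onto the same request before the numbers mean anything.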
But there's a lot to be learned here. The paper goes into great detail (with data) about performance across multiple domains.
Here are my takeaways:
1. The evidence suggests to me that Gemini is simply a very different model. GPT-3.5, GPT-4, and even Mixtral often show the same behavior patterns across output length, task type, etc., while Gemini just looks very different.
This could be architecture; it could also be pretraining.
It could also just be prompting: most existing prompt literature (as well as quite a few benchmarks) is tuned for OpenAI's GPT, simply because it was one of the first and cheapest to exist.
I've observed differences with Claude this way.
Unfortunately, without a more compelling model, or without doing this work yourself, it's hard to figure out what the right way to prompt Gemini would be compared to GPT-3.5.
So is it worth using? Let's check the data.
Gemini is pretty good at translation (for the languages it supports), but on most other things you're probably right to presume it's worse than or equal to GPT-3.5.
It's pretty egregiously bad at agent work - which is really perplexing.
In an age where we suspect OpenAI is finetuning and retraining its models to almost always go through a Chain of Thought, Gemini seems to want to skip over reasoning even when prompted. It also marks a strangely high number of tasks as unachievable, perhaps due to lopsided training?
What's wonderful about this paper is that they provide proper pages with all the data and interactive graphs for the evaluation. I'll link each.
First, MMLU
Weird: on MCQs, Gemini is biased toward selecting option D - likely a lack of specific tuning for multiple-choice questions.
hub.zenoml.com/report/2674/Gemini%20MMLU
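If you want to check for this kind of letter bias on your own eval runs, a minimal sketch is below; the predictions list is made up, so swap in the option letters your model actually picked.

```python
from collections import Counter

# Dummy predictions; replace with the option letters from your own MCQ eval run.
predictions = ["D", "B", "D", "D", "A", "D", "C", "D"]

counts = Counter(predictions)
total = sum(counts.values())
for letter in "ABCD":
    print(f"{letter}: {counts.get(letter, 0) / total:.0%}")
# On unbiased 4-way MCQs each letter should sit near 25%;
# a big skew toward one letter is the kind of bias flagged above.
```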
This is really interesting. Not sure if it's finetuning, but the data suggests GPT-4 goes into CoT unprompted.
Also interesting that Gemini underperforms GPT-3.5 on most tasks, except things like College & High School Biology, Macroeconomics, and Security Studies. GPT-4 still wins.
💡Something new to me: they use the length of the Chain-of-Thought segment as a proxy for reasoning complexity. It seems all models degrade in accuracy as this increases. GPT-4 wins, but Gemini degrades the least.
They label this as "Gemini handles more complex reasoning chains".
Not fully convinced. Models can be verbose in their reasoning for other reasons, as suggested by the difference in output-length distributions.
Nevertheless, it's a useful finding overall, and a quick-and-dirty way to measure task complexity in CoT situations. Definitely using that!
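Here's a rough sketch of that quick-and-dirty trick, assuming you already have per-example records with the model's CoT length and correctness. The records, field names, and bucket edges below are my own inventions.

```python
# Stand-in eval records: one per example, with the model's CoT length (in tokens)
# and whether the final answer was correct. Field names are assumptions.
results = [
    {"cot_tokens": 45,  "correct": True},
    {"cot_tokens": 120, "correct": True},
    {"cot_tokens": 310, "correct": False},
    # ... one record per evaluated example
]

def bucket(n: int) -> str:
    # Arbitrary bucket edges, purely for illustration.
    if n < 100:
        return "short (<100)"
    if n <= 250:
        return "medium (100-250)"
    return "long (>250)"

by_bucket: dict[str, list[bool]] = {}
for r in results:
    by_bucket.setdefault(bucket(r["cot_tokens"]), []).append(r["correct"])

for name, outcomes in sorted(by_bucket.items()):
    print(f"{name}: {sum(outcomes) / len(outcomes):.0%} accuracy over {len(outcomes)} examples")
```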
Reasoning: BIG-Bench Hard
Once again they use question length as a proxy for complexity. I'm not sure how accurate this is, but the results are useful even if we just read them as a function of input length (question verbosity + complexity + data).
hub.zenoml.com/report/2575/Gemini%20BBH
Gemini's accuracy degraded heavily on longer questions, while the others' did not. Mixtral and GPT-4 are notable for going up around the middle of the range, which might be a result of MoE.
We also have our first instance of Gemini beating GPT-4: Sports understanding.
Not sure where that's useful, but good to know, I guess.
Another reason to take benchmark results with a big grain of salt: evaluating responses is still pretty hard. Models will often give the right (or wrong) answer without respecting the expected format. Proper LLM-based evals (finetuned models) might be the only solution here.
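For what it's worth, a lenient extractor recovers some of those format misses before you reach for an LLM judge. A minimal sketch for MCQ-style answers; the regexes and example strings are mine, not the paper's.

```python
import re

# Lenient answer extraction for MCQ-style responses: first look for an explicit
# "Answer: X" pattern, then fall back to the last standalone option letter.
def extract_choice(response: str) -> str | None:
    m = re.search(r"answer\s*[:\-]?\s*\(?([A-D])\)?", response, re.IGNORECASE)
    if m:
        return m.group(1).upper()
    letters = re.findall(r"\b([A-D])\b", response)
    return letters[-1].upper() if letters else None

print(extract_choice("Answer: C"))                               # -> C
print(extract_choice("I think the right option here is (B)."))   # -> B (via the fallback)
```

This obviously still misses free-form answers that never name a letter, which is where finetuned LLM-based evaluators come in.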
Next is code: hub.zenoml.com/report/2641/Gemini%20Code
Same story as before: mostly worse than or equal to GPT-3.5. However, Gemini gets almost linearly worse with output length, and the previous win on I/O length doesn't hold here.
Man, I'm still amazed that Mixtral actually punches in this weight class.
Next is Machine Translation. Gemini is good when it works, but it's also likely being held back by the outright blocking of certain languages.
Perhaps this was a way of propping up reported performance by blocking the languages the model performs poorly on?
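If you wanted to sanity-check that suspicion on your own translation runs, one rough approach is to tally refusals/blocked responses per target language. Everything below (the record format and the blocked-response marker) is invented for illustration.

```python
from collections import defaultdict

# Made-up translation eval records; "[BLOCKED]" is a placeholder marker for a
# refused/blocked reply, not an actual API response value.
records = [
    {"lang": "fr", "response": "Bonjour le monde"},
    {"lang": "xx", "response": "[BLOCKED]"},
    {"lang": "xx", "response": "[BLOCKED]"},
    {"lang": "de", "response": "Hallo Welt"},
]

blocked = defaultdict(int)
total = defaultdict(int)
for r in records:
    total[r["lang"]] += 1
    blocked[r["lang"]] += r["response"] == "[BLOCKED]"

for lang in total:
    print(f"{lang}: {blocked[lang] / total[lang]:.0%} blocked over {total[lang]} requests")
```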
Agents: hub.zenoml.com/report/2608/Gemini%20Webarena
Gemini tends to label tasks as unachievable, way more than the GPTs. It also tends to give shorter responses and terminate in far fewer steps than GPT or even Mistral, and it tends to skip reasoning even when asked to 'think step-by-step'.
This is really weird to me. In an age where we suspect OpenAI is finetuning and retraining its models to almost always go through a Chain of Thought, Gemini seems to want to skip over that part even when prompted, and overall refuses to try to solve problems.
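A rough way to quantify that give-up behavior on agent benchmarks is to count how often the final answer is an "unachievable"/N/A stop, and how many steps the agent took before giving up. The trace format below is made up for illustration; WebArena's real action space and logs look different.

```python
# Invented agent traces: final answer plus number of steps taken.
# "N/A" stands in for a "task is unachievable" stop action.
traces = [
    {"answer": "N/A", "steps": 3},
    {"answer": "42 Wallaby Way", "steps": 11},
    {"answer": "N/A", "steps": 2},
]

gave_up = [t for t in traces if t["answer"] == "N/A"]
print(f"unachievable rate: {len(gave_up) / len(traces):.0%}")
if gave_up:
    avg_steps = sum(t["steps"] for t in gave_up) / len(gave_up)
    print(f"avg steps before giving up: {avg_steps:.1f}")
```

Comparing that rate (and the step counts) across models makes the "gives up early and often" pattern easy to see at a glance.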
Agree with most of the conclusions, but from looking at the actual questions and answers in the dataset, I'm not sure I agree with point 3: I think length might be a poor proxy for complexity, in both the input and the output.
Hope that was helpful! Clearing out my bookmarks folder for the year has been a journey; I'm about 2% through. I'll post them on Twitter as I go through them.
Here's the full paper; it's well worth the read.
arxiv.org/abs/2312.11444