How does IPFS work?


IPFS is powering the Distributed Web. But do you know how it works? 🤔 A thread! 🧵

Table of Contents: 1️⃣ What is IPFS 2️⃣ Content Addressing in IPFS 3️⃣ File System in IPFS 4️⃣ Finding peers in IPFS

1️⃣ First of all, ✨what is IPFS?✨ IPFS is a peer-to-peer (P2P) protocol for storing, accessing, and sharing data in a distributed file system. Instead of finding data by its location, IPFS finds it by its contents. Let me explain!

2️⃣ Content Addressing in IPFS We all love Twitter (the little bird app is so cool). To access Twitter, you put "www-twitter-dot-com" in your browser's URL bar. And since this URL points to Twitter's IP address, we get whatever webpage the Twitter server serves us.

But if we put Twitter on IPFS, we don't get a URL or IP address. Instead, we get a content identifier, or CID, to access Twitter that looks something like this: 👇🏻 /ipfs/QmWzi8evEPm1fUamM2UboxirvZ48hu7HfV7uEDGoGFCfcN

Unlike regular URLs, IPFS paths like this one aren't understood by most standard web browsers out of the box. One easy workaround to access files hosted on IPFS is to use an HTTP gateway, such as the public one run by Protocol Labs, like so: ipfs.io/ipfs/QmWzi8evEPm1fUamM2UboxirvZ48hu7HfV7uEDGoGFCfcN
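Here's a tiny sketch of that workaround in Python. The helper name `gateway_url` is made up for this example, and it only does string handling: it assumes a path-style gateway (gateway host + /ipfs/ + CID) like the ipfs.io link above and doesn't talk to the network.

```python
def gateway_url(ipfs_path: str, gateway: str = "https://ipfs.io") -> str:
    """Turn an /ipfs/<CID> path into a URL on an HTTP gateway.

    Hypothetical helper for illustration; it does no CID validation.
    """
    if not ipfs_path.startswith("/ipfs/"):
        raise ValueError("expected a path of the form /ipfs/<CID>")
    # Path-style gateways simply append the IPFS path to the gateway host.
    return gateway.rstrip("/") + ipfs_path


print(gateway_url("/ipfs/QmWzi8evEPm1fUamM2UboxirvZ48hu7HfV7uEDGoGFCfcN"))
# https://ipfs.io/ipfs/QmWzi8evEPm1fUamM2UboxirvZ48hu7HfV7uEDGoGFCfcN
```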

✨How does Content Addressing work?✨ Right now, content on the "Web2" Internet is found by its location, and that's a problem. You wanna Netflix & chill? Cool, go to netflix-dot-com! You wanna read CryptoShuriken's blog? Visit cryptoshuriken-dot-com!

But what if the location (or the URL) of the content changes for some reason? What if Netflix changes its URL to "netflix-dot-lol"? Now, you can't "netflix & chill!" Content addressing fixes this problem.

With content addressing, every piece of content on IPFS gets a CID, which is derived from the cryptographic hash of the data. That means two identical files will have the same CID.

Any slight difference in the content of a file will generate a completely different CID. By default, IPFS uses the SHA-256 hashing algorithm to generate these CIDs.
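You can see this hashing behaviour with nothing but Python's standard library. Heads up: the `toy_id` function below is a stand-in I made up, and it returns a plain SHA-256 hex digest, not a real CID (real CIDs wrap the hash in extra metadata and a different encoding).

```python
import hashlib


def toy_id(data: bytes) -> str:
    """Return the SHA-256 hex digest of the data (a toy ID, NOT a real CID)."""
    return hashlib.sha256(data).hexdigest()


a = toy_id(b"hello ipfs")
b = toy_id(b"hello ipfs")   # identical content
c = toy_id(b"hello ipfs!")  # one extra character

print(a == b)  # True: same content, same identifier
print(a == c)  # False: tiny change, different identifier
```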

As a result, if you try storing two identical files in the same IPFS node, they would only be stored once, thus eliminating duplication.
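Deduplication falls out almost for free once the hash is the key. A minimal sketch, assuming a made-up in-memory `ToyBlockStore` class (this is not how a real IPFS node stores data on disk):

```python
import hashlib


class ToyBlockStore:
    """In-memory content-addressed store: the key IS the hash of the value."""

    def __init__(self):
        self.blocks = {}  # hex digest -> raw bytes

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Identical data hashes to the same key, so it is only kept once.
        self.blocks.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self.blocks[key]


store = ToyBlockStore()
k1 = store.put(b"same bytes")
k2 = store.put(b"same bytes")  # storing the "second copy"

print(k1 == k2)           # True
print(len(store.blocks))  # 1
```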

So, now we know that IPFS uses content addressing to identify and find content rather than looking at where it's located.

3️⃣ But how does IPFS store content on the file system? 🤔 IPFS takes advantage of a data structure called a Directed Acyclic Graph (DAG for short) to maintain links between files and directories. Specifically, it uses Merkle DAGs.

IPFS uses a Merkle DAG (conceptually similar to how BitTorrent handles files) that is optimized for representing files and directories. IPFS splits a file into smaller blocks, and those blocks can be stored on, and fetched from, different peers.

Ever downloaded a file from BitTorrent? Remember how a file was split into smaller chunks, and those chunks used to get downloaded from multiple peers at once? That's the same concept.

Merkle DAGs are a bit of a "turtles all the way down" scenario: a folder has its own CID, every file in the folder has its own CID, and the smaller blocks of each file have their own CIDs.
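To make the turtles concrete, here's a rough sketch of the idea in Python. Everything in it is simplified on purpose: the 4-byte `CHUNK_SIZE`, the `file_node` and `folder_node` helpers, and the way IDs are joined together are all my own assumptions for illustration, and real IPFS nodes use their own block format and much bigger chunks.

```python
import hashlib

CHUNK_SIZE = 4  # absurdly small, just so short strings split into several blocks


def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def file_node(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split a file into blocks; the file's ID is a hash over its block IDs."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    block_ids = [h(chunk) for chunk in chunks]
    file_id = h("".join(block_ids).encode())
    return file_id, block_ids


def folder_node(files: dict) -> str:
    """A folder's ID is a hash over the (name, file ID) pairs it links to."""
    entries = sorted((name, file_node(data)[0]) for name, data in files.items())
    listing = "\n".join(f"{name}:{file_id}" for name, file_id in entries)
    return h(listing.encode())


before = {"a.txt": b"hello world", "b.txt": b"turtles"}
after = {"a.txt": b"hello world", "b.txt": b"turtles!"}

# The untouched file keeps its ID...
print(file_node(before["a.txt"])[0] == file_node(after["a.txt"])[0])  # True
# ...but changing one file changes the folder's ID too.
print(folder_node(before) == folder_node(after))  # False
```

Because each level's ID depends on the IDs below it, a change anywhere bubbles up to the root, while everything that didn't change keeps its old ID.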

4️⃣ Finding & Moving content in IPFS To find peers who are hosting the content you're looking for, IPFS uses a Distributed Hash Table (DHT for short).

✨What's a Hash Table?✨ A hash table is a data structure that maps keys to values. ✨What's a Distributed Hash Table?✨ A DHT is a hash table that is split across all the peers in a distributed network. If you wish to find a particular piece of content, you ask these peers.
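Here's a toy version of that idea in Python. Assumptions, loudly: the `ToyDHT` class and the peer names are made up, everything lives in one process, and I'm picking the responsible peer by XOR distance between hashed IDs (the Kademlia-style approach), which is a detail this thread doesn't cover. A real DHT also replicates records and routes between peers over the network.

```python
import hashlib


def key_id(name: str) -> int:
    """Map any string (peer name or key) into the same big-integer ID space."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")


class ToyDHT:
    """One logical hash table, split across several peers' small tables."""

    def __init__(self, peer_names):
        self.tables = {name: {} for name in peer_names}

    def responsible_peer(self, key: str) -> str:
        # Assumption: the peer whose ID is "closest" (by XOR) owns the key.
        k = key_id(key)
        return min(self.tables, key=lambda peer: key_id(peer) ^ k)

    def put(self, key: str, value) -> None:
        self.tables[self.responsible_peer(key)][key] = value

    def get(self, key: str):
        return self.tables[self.responsible_peer(key)].get(key)


dht = ToyDHT(["peer-a", "peer-b", "peer-c"])
dht.put("some-cid", ["peer-b"])  # record: who is hosting this content

print(dht.get("some-cid"))  # ['peer-b']
```

No single peer holds the whole table, yet anyone can work out which peer to ask for a given key.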

IPFS uses libp2p to provide the DHT and to handle peers connecting and talking to each other. You first ask the DHT which peers are storing each block of the content you're looking for, and then you use the DHT again to find the current location of those peers.

Once you've discovered which peers have your content and found their current location, all that is left is to connect to those peers and retrieve it.

To retrieve a piece of content, you have to request blocks of that content from the peers that are storing it. To do this, IPFS uses Bitswap. Bitswap is a module that allows us to connect to peers that have the content we want and request the blocks we need from them.
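A very rough sketch of that request/response flow, in Python. This is not the real Bitswap protocol or its API: `ToyPeer`, `serve`, and `fetch` are names I invented, there's no networking, and the "wantlist" here is just a list of SHA-256 hex digests standing in for CIDs.

```python
import hashlib


def block_id(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class ToyPeer:
    def __init__(self, name: str):
        self.name = name
        self.blocks = {}  # block ID -> raw bytes

    def add(self, data: bytes) -> str:
        bid = block_id(data)
        self.blocks[bid] = data
        return bid

    def serve(self, wantlist):
        """Hand back whichever wanted blocks this peer actually has."""
        return {bid: self.blocks[bid] for bid in wantlist if bid in self.blocks}


def fetch(wantlist, peers):
    """Ask peers in turn for the blocks we still miss; verify each by its hash."""
    have = {}
    for peer in peers:
        missing = [bid for bid in wantlist if bid not in have]
        if not missing:
            break
        for bid, data in peer.serve(missing).items():
            if block_id(data) == bid:  # content addressing lets us check the data
                have[bid] = data
    return have


alice, bob = ToyPeer("alice"), ToyPeer("bob")
want = [alice.add(b"hello "), bob.add(b"world")]  # blocks live on different peers

got = fetch(want, [alice, bob])
print(b"".join(got[bid] for bid in want))  # b'hello world'
```

The nice part is the verification step: since the ID is the hash of the block, you don't have to trust the peer that sent it.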

So, to recap: IPFS gives you a CID for the content you store, & then it links that content together in a Merkle DAG. It then uses a DHT to find peers who are hosting the content. It uses Bitswap to connect to those peers & retrieve the content. Hope that makes sense!

Also, please consider subscribing to my blog & get content like this delivered right into your inbox. ⬇️ cryptoshuriken.com/#/portal/

PS - I'm doing a #30Days30Threads challenge! This was the third one, 27 more to go! Stick around for tweets on: → Blockchain Development → Web3, in general → Content Creation 3/30 ✅