Is it really decentralized?
Yes, anyone can set up a node for the network. Nodes cannot be tampered with, and if by some grace of god someone manages to, that node will be blacklisted by all clients on the network right away. Nodes are only there as a way to tell clients which hash to download and seed for other clients.
I do not really understand how this is possible. Why can't a malicious node start mucking with people?
It’s not, really; ejecting bad nodes is a Byzantine consensus problem. But depending on how it’s structured, it could at least raise the bar beyond what a single node acting alone can do to upset things in ways that are easily detectable. Also, a DoS with a very large quantity of false-positive items is the most obvious form for an attack to take, and it’s possible that wouldn’t serve the interests of most prospective attackers.
I wonder whether it really runs the “fake item” test on every node, for every new item, as opposed to running it once and propagating that rejection? I also wonder about the topology in which the nodes connect to each other; saying it’s peer-to-peer still leaves a lot of open questions.
I’m guessing the author says this because values are keyed by their hash, which makes it easy to validate the integrity of the data you get from any node.
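To make that concrete, here is a minimal sketch of the check in Python. SHA-256 and the names `content_key` and `verify` are my own assumptions for illustration; I don't know which hash function or API the project actually uses.

```python
import hashlib
import hmac


def content_key(data: bytes) -> str:
    """Derive the lookup key for a value from the value itself.

    Assumption: SHA-256 as the hash; the real network may use something else.
    """
    return hashlib.sha256(data).hexdigest()


def verify(key: str, data: bytes) -> bool:
    """Return True only if the bytes a node sent hash to the key we asked for."""
    return hmac.compare_digest(content_key(data), key)


# A client that already knows the key can check any node's answer locally.
blob = b"some crawled page text"
key = content_key(blob)
print(verify(key, blob))                  # honest node
print(verify(key, b"something else"))     # node returned different bytes
```

Since the key is derived from the bytes, a client needs no trust in the node to detect substituted data. All this shows is that the bytes match the key that was requested.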
I think there are other attacks on a Kademlia DHT which still work though, detailed in the S/Kademlia paper, for which S/Kademlia proposes some solutions and mitigations.
It makes it easy to validate that the address matches the data. It is still hard to determine whether the data represents the desired content, so in this case they are “crawling the raw text of a plethora of websites”. So it’s only as distributed as whatever that boils down to.