Hello everyone!
I am developing an application that uses Redis Streams to store large volumes of data collected through web scraping. One of the challenges I'm facing is how to detect and prevent the insertion of redundant data into the Stream without resorting to a SQL-style way of thinking. I am looking for something simple that already exists in Redis.
Here is a simple example of how I add to the stream:
use redis::streams::StreamMaxlen;
use redis::AsyncCommands;
use sha2::{Digest, Sha256};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = redis::Client::open("redis://127.0.0.1/")?;
    let mut conn = client.get_multiplexed_async_connection().await?;
    let title = "BULLONERIE GALVIT2";
    let group_name = "raworld2";
    let description = "[IA generated] BULLONERIE GALVIT is a company specializing in the production and distribution of fasteners and metal components. Known for its high-quality standards, the company offers a wide range of products including bolts, nuts, and screws, catering to various industries such as construction and manufacturing. Their focus on innovation and customer service has established them as a trusted name in the sector.";
    // Build a unique key from the record's fields
    let mut hasher = Sha256::new();
    hasher.update(format!("{}{}{}", title, group_name, description));
    let hash = hasher.finalize();
    // Check whether this hash has already been seen
    let exists: bool = conn.sismember("exploit_hashes", hash.as_slice()).await?;
    if !exists {
        // Record the hash in the control set
        let _: () = conn.sadd("exploit_hashes", hash.as_slice()).await?;
        // Append the record to the stream (the new entry ID is returned)
        let _id: String = conn
            .xadd_maxlen(
                "queue_exploit",
                StreamMaxlen::Approx(1000),
                "*",
                &[
                    ("title", title),
                    ("group_name", group_name),
                    ("description", description),
                ],
            )
            .await?;
        println!("Data added to the stream");
    } else {
        println!("Duplicate data found");
    }
    Ok(())
}
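A side note on the hash input above: `format!("{}{}{}", ...)` joins the fields with no separator, so two different records can produce the same string (and therefore the same hash) when text shifts from one field to the next. A minimal sketch of one way to avoid that, assuming a length prefix per field is acceptable; `dedup_key_material` is a hypothetical helper name, not something from Redis or the `sha2` crate, and its output would be passed to `hasher.update(...)` in place of the formatted string:

```rust
/// Hypothetical helper: builds unambiguous hash input by prefixing each
/// field with its byte length (u64, big-endian) before the field bytes.
fn dedup_key_material(fields: &[&str]) -> Vec<u8> {
    let mut out = Vec::new();
    for field in fields {
        out.extend_from_slice(&(field.len() as u64).to_be_bytes());
        out.extend_from_slice(field.as_bytes());
    }
    out
}

fn main() {
    // Plain concatenation cannot tell these two records apart...
    let a = format!("{}{}{}", "ab", "c", "d");
    let b = format!("{}{}{}", "a", "bc", "d");
    assert_eq!(a, b);

    // ...while the length-prefixed encoding can.
    assert_ne!(
        dedup_key_material(&["ab", "c", "d"]),
        dedup_key_material(&["a", "bc", "d"])
    );
}
```

Whether this matters in practice depends on how free-form the scraped fields are; for fixed-format fields plain concatenation may be good enough.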
I would like to know whether this approach is the most efficient, or whether there are other techniques for managing duplicate data in Redis Streams. Additionally, how can I make sure the verification logic holds up under the high volume of data I am collecting?
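For context on the check itself: Redis `SADD` replies with the number of members that were actually added, so a single `SADD` can act as both the check and the insert, which would replace the separate `SISMEMBER` call and close the gap between the two commands when several scrapers run at once. A rough in-memory sketch of that single-step idea, using a `std` `HashSet` purely as a stand-in for the Redis set; `DedupGate` and `offer` are hypothetical names for illustration only:

```rust
use std::collections::HashSet;

/// Hypothetical in-memory stand-in for the Redis set of seen hashes.
struct DedupGate {
    seen: HashSet<Vec<u8>>,
}

impl DedupGate {
    fn new() -> Self {
        Self { seen: HashSet::new() }
    }

    /// Checks and records the key in one step. Returns true only the first
    /// time a key is offered, mirroring SADD replying 1 for a new member
    /// and 0 for one that is already present.
    fn offer(&mut self, key: &[u8]) -> bool {
        self.seen.insert(key.to_vec())
    }
}

fn main() {
    let mut gate = DedupGate::new();
    let keys: [&[u8]; 3] = [b"hash-1", b"hash-2", b"hash-1"];
    for key in keys {
        if gate.offer(key) {
            println!("new record, would be added to the stream");
        } else {
            println!("duplicate, skipped");
        }
    }
}
```

In the Redis version, the equivalent would be reading the integer reply of `SADD` and calling `XADD` only when it is 1.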
Thank you in advance for any suggestions and guidance!