Background
In different scenarios, Centiro is involved in answering questionnaires regarding Centiro products and policies. These tasks require a lot of manual work today, which involves cross-checking information from different sources. We would like to investigate if some of this work (or parts of it) can be automated. Additionally, we are interested in understanding how such automation can impact the organization.
Suggested research questions
Q1: How can generative AI models be optimized to generate accurate and context-specific responses for questionnaires?
Q2: How does automating questionnaire responses using AI impact organizational efficiency, and what risks are associated with such automation?
Read more here: https://career.centiro.com/jobs/5072071-leveraging-generative-ai-for-automating-questionnaire-responses/d9d5672a-f82d-4365-a78e-33bdb947072f
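One way to approach Q1 is to ground a generative model in text retrieved from existing policy documents rather than asking it to answer from memory. The sketch below illustrates only that retrieval-and-prompting step; the policy snippets, helper names, and the choice of TF-IDF retrieval are assumptions made for illustration, not Centiro's actual setup.

```python
# Minimal sketch of retrieval-augmented questionnaire answering.
# The policy snippets and helper functions are illustrative placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

policy_docs = [
    "Customer data is stored within the EU and encrypted at rest.",
    "Access to production systems requires multi-factor authentication.",
    "Backups are taken daily and retained for 35 days.",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(policy_docs)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k policy snippets most similar to the question."""
    scores = cosine_similarity(vectorizer.transform([question]), doc_matrix)[0]
    top = scores.argsort()[::-1][:k]
    return [policy_docs[i] for i in top]

def build_prompt(question: str) -> str:
    """Compose a prompt grounding the generative model in retrieved policy text."""
    context = "\n".join(f"- {snippet}" for snippet in retrieve(question))
    return (
        "Answer the questionnaire item using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

print(build_prompt("Where is customer data stored?"))
# The composed prompt would then be sent to whichever generative model is chosen.
```

Comparing the generated answers against the manually produced ones is where Q1 (accuracy and context-specificity) and Q2 (efficiency gains versus risks) meet.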
Background
Every year, billions of parcels are delivered. All of these become data points in an ever-changing, complicated world, which makes them interesting from a data science perspective. Using shipment data, we might be able to gain valuable insight into how a transportation network behaves and changes. Moreover, it may prove useful for predicting future behavior.
Carriers’ transportation networks can be interpreted as directed graphs. Edges are defined as routes carrying information about distance, transport mode, and shipment weight. Nodes are defined as hubs/warehouses and source/destination points. A transport chain in a network corresponds to a shipment travelling from a source to a destination node and can be viewed as a Markov chain process. Furthermore, the shipments’ tracking events can be used to model and infer the intermediary nodes between the source and destination nodes.
Areas of interest to explore:
- Learn or find existing policies in a transportation network that determine the transport chains.
- Analyze the correlation between lead time, emissions, and distance.
- Apply machine learning to predict future behavior.
- Simulate values for performance metrics using stochastic sampling from learned distributions while injecting, removing, or modifying nodes in the network, and analyze the impact.
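To make the directed-graph and Markov-chain framing concrete, here is a minimal sketch that samples a transport chain from a source node to a terminal node. The hub names, transition probabilities, and edge attributes are invented for illustration and are not based on real carrier data.

```python
# Minimal sketch of a carrier network as a directed graph and a shipment's
# transport chain as a Markov chain. All hubs and probabilities are made up.
import random

# Edges: origin -> list of (next_node, transition_probability, distance_km)
network = {
    "Gothenburg": [("Jonkoping hub", 0.7, 150), ("Boras hub", 0.3, 70)],
    "Jonkoping hub": [("Stockholm", 1.0, 320)],
    "Boras hub": [("Jonkoping hub", 0.5, 90), ("Stockholm", 0.5, 380)],
    "Stockholm": [],  # destination: no outgoing routes
}

def simulate_chain(source: str, rng: random.Random) -> list[str]:
    """Walk the graph from source to a terminal node, sampling next hops."""
    chain, node = [source], source
    while network[node]:
        nxt, probs = zip(*[(n, p) for n, p, _ in network[node]])
        node = rng.choices(nxt, weights=probs)[0]
        chain.append(node)
    return chain

rng = random.Random(0)
print(simulate_chain("Gothenburg", rng))
# e.g. ['Gothenburg', 'Jonkoping hub', 'Stockholm']
```

In a real study, transition probabilities would be estimated from tracking events rather than hand-written, and the same walk could accumulate distance or emissions per chain for the correlation analysis listed above.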
Introduction
As machine learning models become integral to various industries, from healthcare and finance to social media and autonomous systems, the importance of data privacy and security has never been more critical. One of the most pressing concerns in this domain is the risk posed by model inversion attacks [1]. These attacks represent a significant threat to the privacy of individuals whose data may have been used in training machine learning models. By leveraging the outputs or internal states of a trained model, adversaries aim to reverse-engineer and reconstruct specific data points, revealing sensitive information about individuals or class representatives. The danger of model inversion attacks lies in their ability to recreate underlying training data. For instance, an attacker may infer private attributes such as medical conditions, financial transactions, or personal behaviours based solely on the model’s predictions or intermediate outputs. This risk is particularly acute for models deployed in privacy-sensitive domains, where exposure of private data can have severe ethical, legal, and regulatory consequences.
Read more here: https://careers.ai.se/jobs/5066843-master-thesis-pulling-sensitive-data-from-trained-models
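As a highly simplified illustration of what a class-level inversion looks like in code, the sketch below runs gradient ascent on an input to maximize a model’s confidence for one target class. The tiny untrained MLP, optimizer settings, and regularizer are placeholders standing in for a real victim model, not the attack setting studied in the thesis.

```python
# Minimal sketch of a class-representative model-inversion attack:
# gradient ascent on the input to maximize the model's confidence for a
# target class. The tiny untrained MLP is a stand-in for a victim model.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))
model.eval()
for p in model.parameters():
    p.requires_grad_(False)  # the attacker only differentiates through the model

target_class = 3
x = torch.zeros(1, 64, requires_grad=True)  # reconstruction starts from a blank input
opt = torch.optim.Adam([x], lr=0.1)

for step in range(200):
    opt.zero_grad()
    logits = model(x)
    # maximize the target logit, with a small L2 penalty to keep x plausible
    loss = -logits[0, target_class] + 0.01 * x.pow(2).sum()
    loss.backward()
    opt.step()

print("reconstructed representative:", x.detach().squeeze()[:5])
```

Against a real, well-trained classifier, the optimized input can drift toward a recognizable representative of the target class, which is exactly why exposing rich model outputs in privacy-sensitive domains is risky.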
Introduction
Training deep-learning models requires large amounts of data. When this data is sensitive, e.g., containing personal information, it is important to ensure that no information can be extracted from the trained models. Lately, adversarial attempts at extracting training data have grown in interest. Two prominent attacks are membership inference attacks, which attempt to guess whether a given data point was present in the training data, and reconstruction attacks, also called model-inversion attacks, which attempt to recreate training data by interacting with the trained model. Although such attacks are relevant for any data modality, perhaps the most pressing issue pertains to text data, where the issue of copyright has recently received media attention due to the lawsuit against OpenAI [8]. In light of this, there is a pressing need for content creators to confidently test whether or not their work has been included and leveraged in the training of commercial models. Membership inference attacks offer a promising avenue for such an assessment.
Read more here: https://careers.ai.se/jobs/5076281-master-thesis-semantically-aware-attacks-on-text-based-modes
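A common baseline for such an assessment is a loss-threshold membership inference attack: examples the model fits unusually well are guessed to be training members. The sketch below is a minimal illustration on synthetic tabular data with an sklearn classifier as the victim; the data, model, and threshold calibration are invented assumptions, not the text models or attack from the thesis description.

```python
# Minimal sketch of a loss-threshold membership inference attack:
# examples with unusually low loss under the victim model are guessed
# to be training members. All data and thresholds are illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(50, 20)), rng.integers(0, 2, 50)
X_out, y_out = rng.normal(size=(50, 20)), rng.integers(0, 2, 50)

victim = LogisticRegression(max_iter=1000).fit(X_train, y_train)

def per_example_loss(X, y):
    proba = victim.predict_proba(X)
    return np.array([log_loss([yi], [pi], labels=[0, 1]) for yi, pi in zip(y, proba)])

threshold = np.median(per_example_loss(X_out, y_out))  # calibrated on known non-members
guess_member = per_example_loss(X_train, y_train) < threshold
print(f"fraction of members detected: {guess_member.mean():.2f}")
```

For text models, the per-example loss would typically be replaced by the model's loss or perplexity on a candidate passage, but the thresholding logic stays the same.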
Introduction
In recent years, the application of machine learning models to analyze time series data has seen rapid growth across numerous industries, including finance, healthcare, energy, and the Internet of Things (IoT) [5]. However, as the use of these models becomes increasingly widespread, concerns regarding the privacy and security of the underlying data have intensified. One particular privacy threat that has gained attention is the risk of membership inference attacks (MIAs) [2]. These attacks allow adversaries to determine whether a specific data point was part of a machine learning model’s training set. Such a capability can have serious consequences, especially when dealing with sensitive information such as financial transactions, medical histories, or proprietary business data. If adversaries can exploit these vulnerabilities, they may expose private information about individuals or gain insights into confidential datasets, thereby posing significant legal and ethical risks. While much research has been conducted on membership inference attacks in domains like image classification and natural language processing, the vulnerability of time series-based models to these attacks has only been considered in a few works [4, 3]. Time series data has unique properties, such as temporal dependencies and correlations, that could potentially influence the efficacy and nature of MIAs. Given the increasing reliance on machine learning models to process time series data in critical applications, it is essential to investigate the extent to which such models are susceptible to MIAs.
Read more here: https://careers.ai.se/jobs/5076413-master-thesis-privacy-risks-in-time-series-models
Introduction
Machine learning models are now indispensable across numerous sectors, from healthcare to finance, where they routinely handle sensitive personal data. While these models offer significant benefits, they also raise critical privacy concerns. One of the most pressing issues is the potential for adversaries to deduce whether specific data points were part of a model’s training set, a vulnerability exploited through Membership Inference Attacks (MIAs). These attacks pose serious privacy risks, allowing malicious actors to infer sensitive information about individuals. Much of the existing research focuses on MIAs that target exact data points in the training set [4, 8]. However, an important and often overlooked threat lies in range membership inference attacks [7, 6]. These attacks exploit the similarities between new data and training data, allowing adversaries to infer information about data points that are close, but not identical, to those used in training. This gap in the literature represents a significant privacy risk, as these near-identical data points can contain similarly sensitive information. This thesis will investigate range membership inference attacks and evaluate their impact on the privacy of machine learning models. By extending the scope of traditional MIAs, the goal is to establish a broader understanding of the actual risks when adversaries are not interested in inferring exact data points but only approximations.
Read more here: https://careers.ai.se/jobs/5076487-master-thesis-similarity-based-inference-attacks
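To make the contrast with exact-point MIAs concrete, the sketch below guesses range membership by checking whether any probe within a small neighbourhood of a query behaves like a training member (low loss under the victim model). The victim model, radius, probe count, and threshold are invented placeholders and are not taken from the cited works [7, 6].

```python
# Minimal sketch of a range membership inference attack: instead of testing a
# single point, the attacker tests whether ANY point in a small neighbourhood
# of the query looks like a training member. Everything here is a toy setup.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X_train, y_train = rng.normal(size=(50, 20)), rng.integers(0, 2, 50)
victim = LogisticRegression(max_iter=1000).fit(X_train, y_train)

def loss(x, y):
    """Cross-entropy loss of the victim model on a single point."""
    p = victim.predict_proba(x.reshape(1, -1))[0, y]
    return -np.log(np.clip(p, 1e-12, 1.0))

def range_member(query, label, radius=0.1, n_probes=32, threshold=0.3):
    """Guess membership if any probe within `radius` of the query has low loss."""
    probes = query + rng.normal(scale=radius, size=(n_probes, query.size))
    probes = np.vstack([query, probes])  # include the query itself
    return min(loss(p, label) for p in probes) < threshold

near_member = X_train[0] + rng.normal(scale=0.05, size=20)  # close to a training point
print("range membership guess:", range_member(near_member, y_train[0]))
```

A point that merely resembles a training record can thus be flagged even though it never appeared in the training set verbatim, which is exactly the risk this thesis sets out to quantify.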
Discover how AI is revolutionizing media and entertainment by transforming creative processes, creating new roles, and addressing ethical challenges in the article Hur AI revolutionerar media och underhållning. Read more about future trends.
This article was first published by Ampliro Insights. Read the original article here.
Bill Gates explains why AI is as revolutionary as personal computers, mobile phones, and the Internet, and he gives three principles for how to think about it.
Exposing Vulnerabilities in Automatic LLM Benchmarks: The Need for Stronger Anti-Cheating Mechanisms
2024-10-13 12:45
MarkTechPost
Automatic benchmarks like AlpacaEval 2.0, Arena-Hard-Auto, and MTBench have gained popularity for evaluating LLMs due to their affordability and scalability compared to human evaluation. These benchmarks use LLM-based auto-annotators, which align well with human preferences, to provide timely assessments of new models. However, high win rates on these benchmarks can be manipulated by altering output […]
Stochastic Prompt Construction for Effective In-Context Reinforcement Learning in Large Language Models
2024-10-13 12:30
MarkTechPost
Large language models (LLMs) have demonstrated impressive capabilities in in-context learning (ICL), a form of supervised learning that doesn’t require parameter updates. However, researchers are now exploring whether this ability extends to reinforcement learning (RL), introducing the concept of in-context reinforcement learning (ICRL). The challenge lies in adapting the ICL approach, which relies on input-output […]
This AI Paper Introduces a Comprehensive Study on Large-Scale Model Merging Techniques
2024-10-13 12:15
MarkTechPost
Model merging is an advanced technique in machine learning aimed at combining the strengths of multiple expert models into a single, more powerful model. This process allows the system to benefit from the knowledge of various models while reducing the need for large-scale individual model training. Merging models cuts down computational and storage costs and […]
ConceptAgent: A Natural Language-Driven Robotic Platform Designed for Task Execution in Unstructured Settings
2024-10-13 12:00
MarkTechPost
Robotic task execution in open-world environments presents significant challenges due to the vast state-action spaces and the dynamic nature of unstructured settings. Traditional robots struggle with unexpected objects, varying environments, and task ambiguities. Existing systems, often designed for controlled or pre-scanned environments, lack the adaptability required to respond effectively to real-time changes or unfamiliar tasks. […]
2024-10-13 11:30
Wired
Back in 2021, Morgan Neville thought using AI to recreate the late Anthony Bourdain’s voice would be an interesting Easter egg in his documentary. He ended up being a canary in Hollywood’s AI coal mine.
Researchers from Moore Threads AI Introduce TurboRAG: A Novel AI Approach to Boost RAG Inference Speed
2024-10-13 07:15
MarkTechPost
High latency in time-to-first-token (TTFT) is a significant challenge for retrieval-augmented generation (RAG) systems. Existing RAG systems, which concatenate and process multiple retrieved document chunks to create responses, require substantial computation, leading to delays. Repeated computation of key-value (KV) caches for retrieved documents further exacerbates this inefficiency. As a result, RAG systems struggle to meet […]
2024-10-10 16:48
ScienceDaily
A new computer simulation of how our brains develop and grow neurons has been built. Along with improving our understanding of how the brain works, researchers hope that the models will contribute to neurodegenerative disease research and, someday, stem cell research that helps regenerate brain tissue.
University: Chalmers University of Technology
With the drastic spread of smartphones and other mobile devices capable of continuously collecting data, responsible data mining has become an increasingly important topic. Older data anonymization methods are quickly becoming obsolete as powerful machine learning models emerge that can leverage the exponentially increasing amount of publicly available data to deanonymize sensitive information. Current anonymization methods such as differential privacy may also have unforeseen consequences when training models on sensitive data, since the data perturbations they rely on can significantly affect the performance of the decentralized system. This has led to increasing privacy concerns among users and to stricter privacy regulations, such as the European Union's GDPR and California's CCPA.
One proposed method for overcoming these issues is distributed machine learning, such as federated learning (FL). This has great potential as it enables collaborative training without sharing any of the raw data, thus allowing for model training even with small local datasets. The benefits of these systems have also been shown in practice, as hospitals, retail stores, and even Google have obtained valuable insights into their respective operations through them.
However, current FL models build on training one central model, which introduces a computational bottleneck as well as a single point of vulnerability for potential attackers. These models also generally assume that the data is drawn from a time-independent and stationary distribution, which is seldom the case: each client typically has some bias in its data, and there is no guarantee that the clients' data are independently and identically distributed. This results in differing client data distributions (the non-IID data paradigm), which hinders efficient training of deep learning models. Further, user sentiment and preference may change drastically due to impactful events such as a pandemic or macroeconomic shocks, further impairing training. All of this leads to what is called concept drift, where data distributions change over time. This creates a dilemma: when different clients experience data drift at different times, no single global model can perform well for all clients, and similarly, when multiple concepts exist simultaneously, no centralized training decision works well for all clients.
The strength of decentralized learning is that it removes the need for a central node: clients communicate peer-to-peer and store their own individual models and data. These frameworks also remove the central point of vulnerability that federated frameworks have, making them more robust against attacks and allowing models to be personalized for each client. As continual learning for decentralized models is still a largely unexplored research area, this project aims to explore methods that mitigate catastrophic forgetting and to develop models that can adapt to distribution shifts in a decentralized setting.
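As a concrete reference point for what "no central node" means in practice, the sketch below implements simple gossip averaging: each client takes gradient steps on its own (deliberately non-IID) data and periodically averages parameters with one random peer, with no server involved. The linear-regression clients, data shifts, and hyperparameters are toy assumptions, not the models intended for the project.

```python
# Minimal sketch of decentralized (gossip) learning without a central server:
# each client keeps its own model and repeatedly averages parameters with a
# random peer. The linear-regression clients and data are invented toys.
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim = 4, 5
true_w = rng.normal(size=dim)

# Each client holds a small, biased local dataset (non-IID by construction).
data = []
for c in range(n_clients):
    X = rng.normal(loc=0.5 * c, size=(30, dim))   # client-specific shift
    y = X @ true_w + rng.normal(scale=0.1, size=30)
    data.append((X, y))

weights = [np.zeros(dim) for _ in range(n_clients)]

def local_step(w, X, y, lr=0.01):
    """One gradient step of least-squares regression on the client's own data."""
    grad = 2 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

for _ in range(100):
    # 1) local training on each client
    weights = [local_step(w, X, y) for w, (X, y) in zip(weights, data)]
    # 2) peer-to-peer gossip: each client averages with one random neighbour
    for c in range(n_clients):
        peer = rng.integers(n_clients)
        avg = (weights[c] + weights[peer]) / 2
        weights[c], weights[peer] = avg, avg.copy()

print("client 0 error:", np.linalg.norm(weights[0] - true_w))
```

Concept drift could be emulated in the same harness by changing true_w or the client-specific shifts partway through training and observing how quickly the peers adapt.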
Engineering students looking for a real-world challenge | 2024-05-01 - 2025-06-06
VISS.AI is looking for problems to solve | 2024-08-01 - 2024-11-30
Feature Store Summit | 2024-10-15 | Conference
AI & Futureproofing Your Business | 2024-10-17 | Conference
OpenTech Talk - AI | 2024-10-17 | Meetup
AI med Petra | 2024-10-17 | Workshop
Azure Red Hat Openshift AI Workshop | 2024-10-18 | Workshop