Privacy Defense

User privacy on the internet is an important and unsolved problem. So far, no sufficient and comprehensive solution has been proposed that helps users protect their privacy while using the internet. Data is collected and assembled by numerous service providers. Solutions so far have focused on the service providers, which store encrypted or transformed data that can still be used for analysis. This has a major flaw, as it relies on the service providers to do so; the user has no means of actively protecting his or her privacy. In this work, we suggest a new approach: giving the user the same tool the other side has, namely data mining techniques, to produce data that obfuscates the user's identity. We apply this approach to search engine queries and use the feedback of the search engines, in the form of personalized advertisements, in an algorithm inspired by reinforcement learning to generate new queries that potentially confuse the search engine. We evaluated the approach using a real-world data set. While evaluation is a hard task, we achieve promising results that indicate it is possible to influence the user profile that the search engine generates. This shows that it is feasible to defend a user's privacy from a new and more practical perspective.
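To illustrate the feedback loop described above, here is a minimal sketch of a reinforcement-learning-style query obfuscator. It is not the algorithm from the paper: it uses a simple epsilon-greedy bandit, and the search engine is a hypothetical simulation (`SimulatedEngine`) whose "profile" is just a count of query topics and whose ads are sampled in proportion to that profile. The names `obfuscate`, `dummy_topics`, and the reward definition (drop in the share of ads matching the user's true interest) are assumptions made for illustration.

```python
import random
from collections import Counter


class SimulatedEngine:
    """Hypothetical stand-in for a search engine: the profile counts query topics,
    and personalized ads are sampled in proportion to that profile."""

    def __init__(self, seed=0):
        self.profile = Counter()
        self.rng = random.Random(seed)

    def submit(self, topic):
        self.profile[topic] += 1

    def ads(self, n=10):
        topics = list(self.profile)
        return self.rng.choices(topics, weights=[self.profile[t] for t in topics], k=n)


def obfuscate(engine, true_topic, dummy_topics, rounds=50, epsilon=0.2, n_ads=10, seed=0):
    """Epsilon-greedy loop: a dummy topic earns reward when the share of ads
    matching the user's true interest drops after it was submitted."""
    rng = random.Random(seed)
    value = {t: 0.0 for t in dummy_topics}  # running mean reward per dummy topic
    count = {t: 0 for t in dummy_topics}
    engine.submit(true_topic)  # the user's genuine query
    share = engine.ads(n_ads).count(true_topic) / n_ads
    for _ in range(rounds):
        # Explore a random topic with probability epsilon, otherwise exploit the best so far.
        if rng.random() < epsilon:
            topic = rng.choice(dummy_topics)
        else:
            topic = max(dummy_topics, key=value.get)
        engine.submit(topic)       # dummy query generated on the user's behalf
        engine.submit(true_topic)  # the user keeps searching as usual
        new_share = engine.ads(n_ads).count(true_topic) / n_ads
        reward = share - new_share  # positive if the ads drifted away from the true interest
        count[topic] += 1
        value[topic] += (reward - value[topic]) / count[topic]
        share = new_share
    return value, count
```

For example, `obfuscate(SimulatedEngine(), "health", ["sports", "travel", "cooking"])` interleaves 50 dummy queries with the user's own queries and returns the estimated value of each dummy topic. Real advertisement feedback is far noisier and categorized differently, so the reward signal in practice would need more careful design.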

 

2017

Wicker, Jörg; Kramer, Stefan

The Best Privacy Defense is a Good Privacy Offense: Obfuscating a Search Engine User's Profile Journal Article

Data Mining and Knowledge Discovery, 31 (5), pp. 1419-1443, 2017, ISSN: 1573-756X.


 

The Smell of Fear

While the physiological response of humans to emotional events or stimuli is well investigated for many modalities (such as EEG, skin resistance, …), surprisingly little is known about the exhalation of so-called volatile organic compounds (VOCs) at quite low concentrations in response to such stimuli. VOCs are molecules of relatively small mass that quickly evaporate or sublimate and can be detected in the air that surrounds us. The paper introduces a new field of application for data mining, where trace gas responses of people reacting on-line to films shown in cinemas (or movie theaters) are related to the semantic content of the films themselves. To do so, we measured the VOCs in a movie theater over a whole month at intervals of thirty seconds and annotated the screened films using a controlled vocabulary compiled from multiple sources. To gain a better understanding of the data and to reveal unknown relationships, we built prediction models for so-called forward prediction (predicting future VOCs from the past), for backward prediction (predicting past scene labels from future VOCs), which is a form of abductive reasoning, and for Granger causality analysis. Experimental results show that some VOCs and some labels can be predicted with relatively low error, and that hints of causality with low p-values can be detected in the data.
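The Granger causality idea mentioned above asks whether the past of one series (for instance, a scene-label indicator) improves the forward prediction of another (for instance, a VOC concentration) beyond what that series' own past already provides. The sketch below is a textbook linear Granger F-test written with NumPy, assuming equally spaced samples such as the thirty-second intervals; it is not necessarily the model used in the paper, and the function names `lagged` and `granger_f` are hypothetical.

```python
import numpy as np


def lagged(series, lags):
    """Matrix whose row t holds series[t-1], ..., series[t-lags] for t = lags..n-1."""
    n = len(series)
    return np.column_stack([series[lags - k: n - k] for k in range(1, lags + 1)])


def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit with intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def granger_f(x, y, lags=2):
    """F statistic for 'x Granger-causes y': do past x values improve
    the prediction of y beyond what past y values already explain?"""
    target = y[lags:]
    restricted = rss(lagged(y, lags), target)
    unrestricted = rss(np.column_stack([lagged(y, lags), lagged(x, lags)]), target)
    n = len(target)
    k = 1 + 2 * lags  # parameters in the unrestricted model (intercept + both lag sets)
    return ((restricted - unrestricted) / lags) / (unrestricted / (n - k))
```

A large F value relative to the F(lags, n - k) distribution (its survival function gives the p-value, e.g. via `scipy.stats.f.sf`) suggests that the past of `x` carries predictive information about `y`; like any Granger test, this indicates predictive precedence rather than proof of physical causation.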

The data set is available at GitHub.

KDD 2015 poster and slides

2016

Williams, Jonathan; Stönner, Christof; Wicker, Jörg; Krauter, Nicolas; Derstorff, Bettina; Bourtsoukidis, Efstratios; Klüpfel, Thomas; Kramer, Stefan

Cinema audiences reproducibly vary the chemical composition of air during films, by broadcasting scene specific emissions on breath Journal Article

Scientific Reports, 6, 2016.


2015

Wicker, Jörg; Krauter, Nicolas; Derstorff, Bettina; Stönner, Christof; Bourtsoukidis, Efstratios; Klüpfel, Thomas; Williams, Jonathan; Kramer, Stefan

Cinema Data Mining: The Smell of Fear Inproceedings

Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1295-1304, ACM, New York, NY, USA, 2015, ISBN: 978-1-4503-3664-2.


enviPath

enviPath is both a database and a prediction system for the microbial biotransformation of organic environmental contaminants. The database makes it possible to store and view experimentally observed biotransformation pathways, and supports annotating pathways with experimental and environmental conditions. The pathway prediction system provides different relative reasoning models to predict likely biotransformation pathways and products.
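The following sketch illustrates, under strong simplifying assumptions, how a scoring model can keep rule-based pathway prediction from exploding combinatorially: transformation rules propose products, and only rules whose score reaches a threshold are expanded. It is not enviPath's implementation. The rules here are hypothetical toy functions on strings standing in for real biotransformation rules, `score` stands in for a learned relative reasoning model, and `predict_pathway`, `threshold`, and `max_nodes` are illustrative names.

```python
from collections import deque
from typing import Callable, Dict, List

Rule = Callable[[str], List[str]]  # hypothetical: compound -> candidate products ([] if not triggered)


def predict_pathway(root: str, rules: Dict[str, Rule], score: Callable[[str, str], float],
                    threshold: float = 0.5, max_nodes: int = 50):
    """Breadth-first pathway expansion that only follows rules scored at or above threshold."""
    edges = []
    seen = {root}
    queue = deque([root])
    while queue:
        compound = queue.popleft()
        for name, rule in rules.items():
            products = rule(compound)
            if not products or score(name, compound) < threshold:
                continue  # rule did not trigger, or the model considers it unlikely
            for product in products:
                if product not in seen:
                    if len(seen) >= max_nodes:
                        continue  # size limit reached: stop growing the pathway
                    seen.add(product)
                    queue.append(product)
                edges.append((compound, name, product))
    return edges
```

The threshold prunes unlikely branches, while `max_nodes` acts as a hard cap on pathway size, similar in spirit to the limits placed on anonymous predictions.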

Database

The enviPath database stores reviewed pathways from the scientific literature as well as predicted and user-entered pathways. The default package currently consists mainly of pathways from the former EAWAG-BBD system. You can browse the data using the top menu. The database is organized in packages; each package has an owner who can grant read or write permissions. Data are listed as reviewed only if they have been reviewed by one of the organizations or groups in the reviewer group.
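The package and permission model described above can be sketched as a small data structure. This is a hypothetical illustration, not enviPath's code: the `Package` class, the `REVIEWER_GROUP` set, and the method names are assumptions, and the reviewed flag is kept per package for simplicity even though reviewing may apply to individual data items.

```python
from dataclasses import dataclass, field
from typing import Set

REVIEWER_GROUP = {"reviewer_org"}  # hypothetical members of the reviewer group


@dataclass
class Package:
    """Hypothetical model of a package whose owner grants read/write permissions."""
    name: str
    owner: str
    readers: Set[str] = field(default_factory=set)
    writers: Set[str] = field(default_factory=set)
    reviewed: bool = False

    def grant(self, requester: str, user: str, write: bool = False) -> None:
        if requester != self.owner:
            raise PermissionError("only the package owner can grant permissions")
        (self.writers if write else self.readers).add(user)

    def can_read(self, user: str) -> bool:
        # Write permission implies read permission.
        return user == self.owner or user in self.readers or user in self.writers

    def can_write(self, user: str) -> bool:
        return user == self.owner or user in self.writers

    def mark_reviewed(self, reviewer: str) -> None:
        # Only members of the reviewer group may flag data as reviewed.
        if reviewer not in REVIEWER_GROUP:
            raise PermissionError("only the reviewer group can mark data as reviewed")
        self.reviewed = True
```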

Prediction

enviPath can be used to predict biotransformation pathways. To do so, simply use the input field on the start page: enter a compound in SMILES format, or draw it using the molecule editor (by clicking on the dropdown on the left), and click on “Go!”. If a pathway for this compound was predicted before and is found in the database, a list of corresponding pathways is returned; otherwise, the system predicts the pathway. Note that for anonymous users, computation time and the size of predicted pathways are limited. The resulting pathway is stored in the database for 30 days and can be viewed and modified by everyone. If you want to store the pathway for longer, prevent others from changing or seeing your pathways, or use more computation time and predict larger pathways, create an account (using the login button above) and set appropriate permissions for your data packages (the default settings should be suitable for most users).
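The lookup-or-predict behavior and the 30-day retention for anonymous results can be sketched as a small cache. This is an illustrative assumption, not the enviPath service: `PathwayStore`, `predictor`, and `ANONYMOUS_RETENTION` are hypothetical names, compounds are matched by exact SMILES string (a real system would canonicalize structures first), and a single pathway is kept per compound instead of a list.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, Optional, Tuple

ANONYMOUS_RETENTION = timedelta(days=30)  # anonymous results expire after 30 days


class PathwayStore:
    """Hypothetical cache: return a stored pathway if one exists, otherwise predict and store it."""

    def __init__(self, predictor: Callable[[str], object]):
        self.predictor = predictor
        self.entries: Dict[str, Tuple[object, datetime, Optional[str]]] = {}

    def purge(self, now: datetime) -> None:
        # Drop anonymous pathways older than the retention period; owned ones are kept.
        for smiles, (_, created, owner) in list(self.entries.items()):
            if owner is None and now - created > ANONYMOUS_RETENTION:
                del self.entries[smiles]

    def query(self, smiles: str, user: Optional[str] = None, now: Optional[datetime] = None):
        """Return (pathway, newly_predicted)."""
        now = now or datetime.now()
        self.purge(now)
        if smiles in self.entries:
            return self.entries[smiles][0], False
        pathway = self.predictor(smiles)
        self.entries[smiles] = (pathway, now, user)
        return pathway, True
```

A second query for the same compound within 30 days is served from the store; after that, an anonymous result has been purged and is predicted again.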

enviPath is available at https://envipath.org.

Slides about enviPath are available here.

2017

Latino, Diogo; Wicker, Jörg; Gütlein, Martin; Schmid, Emanuel; Kramer, Stefan; Fenner, Kathrin

Eawag-Soil in enviPath: a new resource for exploring regulatory pesticide soil biodegradation pathways and half-life data Journal Article

Environmental Science: Processes & Impacts, 19 (3), pp. 449-464, 2017.


2016

Wicker, Jörg; Fenner, Kathrin; Kramer, Stefan

A Hybrid Machine Learning and Knowledge Based Approach to Limit Combinatorial Explosion in Biodegradation Prediction Incollection

Lässig, Jörg; Kersting, Kristian; Morik, Katharina (Ed.): Computational Sustainability, pp. 75-97, Springer International Publishing, Cham, 2016, ISBN: 978-3-319-31858-5.


Wicker, Jörg; Lorsbach, Tim; Gütlein, Martin; Schmid, Emanuel; Latino, Diogo; Kramer, Stefan; Fenner, Kathrin

enviPath - The Environmental Contaminant Biotransformation Pathway Resource Journal Article

Nucleic Acids Research, 44 (D1), pp. D502-D508, 2016.


2013

Wicker, Jörg

Large Classifier Systems in Bio- and Cheminformatics PhD Thesis

Technische Universität München, 2013.


2010

Wicker, Jörg; Fenner, Kathrin; Ellis, Lynda; Wackett, Larry; Kramer, Stefan

Predicting biodegradation products and pathways: a hybrid knowledge- and machine learning-based approach Journal Article

Bioinformatics, 26 (6), pp. 814-821, 2010.


2008

Wicker, Jörg; Fenner, Kathrin; Ellis, Lynda; Wackett, Larry; Kramer, Stefan

Machine Learning and Data Mining Approaches to Biodegradation Pathway Prediction Inproceedings

Bridewell, Will; Calders, Toon; de Medeiros, Ana Karla; Kramer, Stefan; Pechenizkiy, Mykola; Todorovski, Ljupco (Ed.): Proceedings of the Second International Workshop on the Induction of Process Models at ECML PKDD 2008, 2008.


Scavenger

Machine learning methods and algorithms are often highly modular in the sense that they rely on a large number of subalgorithms that are in principle interchangeable. For example, it is often possible to use various kinds of pre- and post-processing and various base classifiers or regressors as components of the same modular approach. We propose a framework, called Scavenger, that allows evaluating whole families of conceptually similar algorithms efficiently. The algorithms are represented as compositions, couplings, and products of atomic subalgorithms. This allows partial results to be cached and shared between different instances of a modular algorithm, so that potentially expensive partial results need not be recomputed multiple times. Furthermore, the framework handles parallel execution and load balancing, and backs up partial results in case of implementation or runtime errors. Scavenger is licensed under the GPLv3 and can be downloaded freely from GitHub.
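The caching idea can be illustrated for the simplest case, sequential composition: if every prefix of a pipeline is memoized, variants that share a prefix reuse its result. This is a minimal sketch, not Scavenger's API; `PrefixCache`, `Step`, and `data_key` are hypothetical names, and couplings, products, parallel execution, load balancing, and persistent backups are not shown.

```python
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]  # (name, atomic subalgorithm)


class PrefixCache:
    """Memoizes every prefix of a pipeline, so partial results shared by
    several algorithm variants are computed only once."""

    def __init__(self) -> None:
        self.store: Dict[tuple, Any] = {}
        self.evaluations = 0  # number of atomic steps actually executed

    def run(self, data_key: str, data: Any, pipeline: List[Step]) -> Any:
        result = data
        key: tuple = (data_key,)
        for name, step in pipeline:
            key = key + (name,)  # identifies the data set plus the chain of steps so far
            if key in self.store:
                result = self.store[key]
            else:
                result = step(result)
                self.store[key] = result
                self.evaluations += 1
        return result
```

Evaluating `[double, inc, sum]` and then `[double, inc, max]` on the same data runs `double` and `inc` only once; only the final step differs between the two variants. Keying results by step names assumes that names uniquely identify a subalgorithm together with its parameters.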

2015

Tyukin, Andrey; Kramer, Stefan; Wicker, Jörg

Scavenger - A Framework for the Efficient Evaluation of Dynamic and Modular Algorithms Inproceedings

Bifet, Albert; May, Michael; Zadrozny, Bianca; Gavalda, Ricard; Pedreschi, Dino; Cardoso, Jaime; Spiliopoulou, Myra (Ed.): Machine Learning and Knowledge Discovery in Databases, pp. 325-328, Springer International Publishing, 2015, ISBN: 978-3-319-23460-1.


BMaD

Boolean matrix decomposition is a method to obtain a compressed representation of a matrix with Boolean entries. BMaD (Boolean Matrix Decomposition Framework) is a modular framework, written in Java, that unifies several Boolean matrix decomposition algorithms and provides methods to evaluate their performance. Its main advantages are its modularity, which allows the steps of a Boolean matrix decomposition to be combined flexibly, and its ability to handle missing values.
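In a Boolean matrix decomposition, a matrix X is approximated by the Boolean product of two smaller matrices U and V, where entry (i, j) of the product is the logical OR over k of (U[i][k] AND V[k][j]). The sketch below computes this product and a reconstruction error that skips missing entries. It is written in Python for illustration and is not BMaD's Java API; the function names and the use of `None` for missing values are assumptions.

```python
def boolean_product(U, V):
    """(U o V)[i][j] = OR over k of (U[i][k] AND V[k][j])."""
    inner = len(V)  # shared dimension: columns of U, rows of V
    return [
        [int(any(U[i][k] and V[k][j] for k in range(inner))) for j in range(len(V[0]))]
        for i in range(len(U))
    ]


def reconstruction_error(X, U, V):
    """Number of entries where X and U o V disagree; missing entries (None) are ignored."""
    R = boolean_product(U, V)
    return sum(
        1
        for i, row in enumerate(X)
        for j, x in enumerate(row)
        if x is not None and x != R[i][j]
    )
```

For example, with U = [[1, 0], [1, 1], [0, 1]] and V = [[1, 1, 0], [0, 1, 1]], the product is [[1, 1, 0], [1, 1, 1], [0, 1, 1]], so a 3x3 matrix with five ones is represented by two Boolean "basis" rows and their combinations.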

BMaD is available at GitHub. MLC-BMaD, a multi-label classifier using Boolean matrix decomposition (implemented using the BMaD library) is also available at GitHub.

2014

Tyukin, Andrey; Kramer, Stefan; Wicker, Jörg

BMaD - A Boolean Matrix Decomposition Framework Inproceedings

Calders, Toon; Esposito, Floriana; Hüllermeier, Eyke; Meo, Rosa (Ed.): Machine Learning and Knowledge Discovery in Databases, pp. 481-484, Springer Berlin Heidelberg, 2014, ISBN: 978-3-662-44844-1.


2012

Wicker, Jörg; Pfahringer, Bernhard; Kramer, Stefan

Multi-label Classification Using Boolean Matrix Decomposition Inproceedings

Proceedings of the 27th Annual ACM Symposium on Applied Computing, pp. 179–186, ACM, Trento, Italy, 2012, ISBN: 978-1-4503-0857-1.


OpenTox

The overall objective of the EU FP7 project OpenTox was to develop a framework that provides unified access to toxicity data, (Q)SAR models, procedures supporting validation, and additional information that helps with the interpretation of (Q)SAR predictions. The OpenTox framework was developed as an open source project to maximize dissemination and impact, to allow the inspection and review of algorithms, and to attract external contributors. We collaborated closely with related projects (e.g., the OECD Toolbox) and authorities to agree on common standards and to avoid duplicated and redundant work. The project was highly international, with partners from Switzerland, Bulgaria, Italy, Greece, Russia, India, the USA, and Germany, coming from academic, governmental, and commercial backgrounds. Additionally, the project had an advisory board with members from government organizations and industry. I worked in this project as a researcher and as a developer of algorithms and REST web services.

2013

Wicker, Jörg

Large Classifier Systems in Bio- and Cheminformatics PhD Thesis

Technische Universität München, 2013.


2010

Hardy, Barry; Douglas, Nicki; Helma, Christoph; Rautenberg, Micha; Jeliazkova, Nina; Jeliazkov, Vedrin; Nikolova, Ivelina; Benigni, Romualdo; Tcheremenskaia, Olga; Kramer, Stefan; Girschick, Tobias; Buchwald, Fabian; Wicker, Jörg; Karwath, Andreas; Gütlein, Martin; Maunz, Andreas; Sarimveis, Haralambos; Melagraki, Georgia; Afantitis, Antreas; Sopasakis, Pantelis; Gallagher, David; Poroikov, Vladimir; Filimonov, Dmitry; Zakharov, Alexey; Lagunin, Alexey; Gloriozova, Tatyana; Novikov, Sergey; Skvortsova, Natalia; Druzhilovsky, Dmitry; Chawla, Sunil; Ghosh, Indira; Ray, Surajit; Patel, Hitesh; Escher, Sylvia

Collaborative development of predictive toxicology applications Journal Article

Journal of Cheminformatics, 2 (1), pp. 7, 2010, ISSN: 1758-2946.


SINDBAD

To fully support the analysis of complex and structured data, new efficient computational methods and suitable interfaces for data exploration have to be developed. Moreover, it is desirable to perform all tasks in the knowledge discovery process, from pre-processing to post-processing, on the basis of query languages. Inductive query languages should allow handling patterns/models as first-class objects, provide the right level of abstraction to the user (i.e., meaningful building blocks of data analysis), and emphasize the compositionality of data mining tasks. In a major development and implementation effort, we created a research prototype of a working inductive database, SINDBAD (Structured Inductive Database Development), to explore research topics in the context of data mining query languages and inductive databases. SINDBAD is built on top of a relational database management system, offers an SQL extension for data pre-processing, mining, and post-processing, and achieves closure by successive transformation of tables.
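The notion of closure, where every step takes tables and produces a table so that steps compose freely, can be illustrated with plain SQL. The sketch below uses Python's built-in `sqlite3` module; it does not use SiQL syntax, the table names are hypothetical, and a simple frequency count stands in for a real mining operator.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE measurements (id INTEGER, value REAL)")
con.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [(1, 0.2), (2, 0.7), (3, 0.9), (4, 0.1), (5, 0.6)],
)

# Pre-processing step: discretize values; the result is again a table.
con.execute("""CREATE TABLE discretized AS
               SELECT id, CASE WHEN value >= 0.5 THEN 'high' ELSE 'low' END AS level
               FROM measurements""")

# Stand-in "mining" step (a frequency count): its output is also a table,
# so it can be queried or fed into the next step with ordinary SQL.
con.execute("""CREATE TABLE level_counts AS
               SELECT level, COUNT(*) AS n FROM discretized GROUP BY level""")

print(con.execute("SELECT level, n FROM level_counts ORDER BY level").fetchall())
# prints [('high', 3), ('low', 2)]
```

Because each intermediate result is a table, post-processing (filtering, joining results back to the original data) needs no separate interface; this is the property the SQL extension is designed to preserve for mining operations as well.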

2010

Wicker, Jörg; Richter, Lothar; Kramer, Stefan

SINDBAD and SiQL: Overview, Applications and Future Developments Incollection

Džeroski, Sašo; Goethals, Bart; Panov, Panče (Ed.): Inductive Databases and Constraint-Based Data Mining, pp. 289-309, Springer New York, 2010, ISBN: 978-1-4419-7737-3.


2008

Wicker, Jörg; Brosdau, Christoph; Richter, Lothar; Kramer, Stefan

SINDBAD SAILS: A service architecture for inductive learning schemes Inproceedings

Proceedings of the First Workshop on Third Generation Data Mining: Towards Service-Oriented Knowledge Discovery, 2008.


Wicker, Jörg; Richter, Lothar; Kessler, Kristina; Kramer, Stefan

SINDBAD and SiQL: An Inductive Database and Query Language in the Relational Model Inproceedings

Daelemans, Walter; Goethals, Bart; Morik, Katharina (Ed.): Machine Learning and Knowledge Discovery in Databases, pp. 690-694, Springer Berlin Heidelberg, 2008, ISBN: 978-3-540-87480-5.


Richter, Lothar; Wicker, Jörg; Kessler, Kristina; Kramer, Stefan

An Inductive Database and Query Language in the Relational Model Inproceedings

Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology, pp. 740–744, ACM, Nantes, France, 2008, ISBN: 978-1-59593-926-5.


2006

Kramer, Stefan; Aufschild, Volker; Hapfelmeier, Andreas; Jarasch, Alexander; Kessler, Kristina; Reckow, Stefan; Wicker, Jörg; Richter, Lothar

Inductive Databases in the Relational Model: The Data as the Bridge Inproceedings

Bonchi, Francesco; Boulicaut, Jean-François (Ed.): Knowledge Discovery in Inductive Databases, pp. 124-138, Springer Berlin Heidelberg, 2006, ISBN: 978-3-540-33292-3.
