News

Start of the academic year for M1 students (DSC, CPS2, MLDM tracks): Monday 4 September at 9am; meet in the courtyard of the Carnot/Manufacture site, in front of building B. Programme of the integration week:
- for M1 DSC
- for M1 MLDM
- for M1 CPS2

Start of the academic year for M2 students, DSC track: Monday 11 September, 2pm, room B08, Carnot/Manufacture site.

Start of the academic year for M2 students, MLDM track: Monday 25 September, room B09 (time and room to be confirmed), Carnot/Manufacture site.

Start of the academic year for M2 students, CPS2 track: Monday 11 September, 9am, Espace Fauriel, École des Mines de Saint-Étienne.

The master's degree award ceremony took place on 17 February 2017!




Students carry out a 3-to-5-month internship in the first year of the master and a 4-to-6-month internship in the second year (from February to June or September). In both years, the internship can take place either as a research and development project in industry, or as a research project in an academic or industrial setting.

Here are some examples of research internships proposed for the second year of the master.


1. Smart Communities in Ambient Intelligence Context

Advisor: Pierre MARET, Gauthier PICARD (potential visit in Japan for the student)

E-Mail: P. Maret and G. Picard

Keywords: Energy, Smart Communities, Ambient intelligence, Simulation, MAS, Optimization, Smart Phone, User interface

Summary

Power consumption peaks are one of the major problems of industrial societies. Provisional and collective management of production and consumption makes it possible to reduce the level of these peaks. We are interested in optimizing energy management at the level of a building or location frequented by many users for a period ranging from 1 hour to 1 day. We assume that each of these users has a portable device that both consumes and produces energy. Consumption takes place in a personal cooling or heating device; production is based on harvesting the user's movements. These individual devices are also equipped with communication components to interact with the user's smartphone, with other devices, and with the location's information system. After transmitting and receiving individual parameters, each device will receive (or calculate) a proposed operation plan for optimal energy management. The user should be able to decide whether or not to follow the plan.

The first step consists in developing a review of the state of the art in this field: smart communities, transient communities, multi-agent optimization, etc. The aim of the internship will then be to propose a model of this system (with different variations) and to demonstrate the conditions under which it allows an effective reduction of energy consumption. An actual (user-oriented) implementation on a smartphone will then be carried out.

The model will take into account individual parameters such as the battery charge level, the desired comfort level (consumption), mobility forecasts (charging), the planned schedule (length of service), and the policy review period. The system-level parameters will be the number of devices, the consumption forecast, the presence or absence of centralized information, and the communication and coordination with the site's overall conditioning system.
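As a minimal illustration of the kind of coordination at stake (with purely hypothetical parameters and a deliberately naive peak-shaving rule), each device could report its desired consumption and receive a proposed level that caps the collective demand:

```python
from dataclasses import dataclass

# Hypothetical sketch: each device reports its parameters and receives a
# proposed consumption level that shaves the collective peak.

@dataclass
class Device:
    battery: float        # charge level in [0, 1]
    comfort: float        # desired consumption (W) for the chosen comfort level
    expected_stay: float  # planned length of stay (hours)

def propose_plan(devices, peak_limit):
    """Scale every device's desired consumption down uniformly
    whenever the collective demand would exceed the peak limit."""
    total = sum(d.comfort for d in devices)
    factor = min(1.0, peak_limit / total) if total > 0 else 1.0
    # Each user remains free to accept or reject the proposed level.
    return [d.comfort * factor for d in devices]

devices = [Device(0.8, 50.0, 2.0), Device(0.3, 80.0, 1.0), Device(0.6, 70.0, 4.0)]
plan = propose_plan(devices, peak_limit=150.0)
print(plan, sum(plan))  # the total demand is capped at the peak limit
```

The internship would replace this uniform scaling with the decentralized optimization protocols mentioned above.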

The project will be conducted in collaboration with a research center in Tokyo. A stay in Tokyo (3 months) may be part of the project, especially for the realization of the actual implementation.

Expected results

Theoretical results:

  • State-of-the-art
  • Problem modelling
  • Optimization protocol
  • Simulation modelling

Practical results:

  • Simulator
  • Demonstrator on Smartphone

References:

[1] MIT Wristband http://www.wired.com/design/2013/10/an-ingenious-wristband-that-keeps-your-body-at-the-perfect-temperature-no-ac-required/

[2] Pierre Maret, Frédérique Laforest and Dimitri Lanquentin. A Semantic Web Model for Ad Hoc Context-Aware Communities. ICEIS (2) 2014: 591-598.

[3] Persson, C., Picard, G., Ramparany, F., and Boissier, O. (2012a). A jacamo-based governance of machine-to-machine systems. In Demazeau, Y., Müller, J. P., Rodríguez, J. M. C., and Pérez, J. B., editors, Advances on Practical Applications of Agents and Multiagent Systems, Proc. of the 10th International Conference on Practical Applications of Agents and Multi-Agent Systems (PAAMS 12), volume 155 of Advances in Soft Computing Series, pages 161-168. Springer.

[4] Ames Pomeroy. Keep cool with a Peltier on the belt. The Japan Journal. 2011. p.27

2. Semantic modeling in interdisciplinary Cultural Heritage knowledge

Advisor: Pierre Maret, Alain Trémeau

Mail: Pierre.Maret@univ-st-etienne.fr

and Alain.tremeau@univ-st-etienne.fr

Keywords: ontology, semantic web, linked data, cross-referencing of entities, concept concomitance, relationships, ambiguities between concepts

Summary

Projects dealing with Cultural Heritage involve experts of arts, humanities and heritage science, and experts in science and data acquisition technologies. In order to facilitate common understanding, a Knowledge Representation, COSCHKR, has been proposed (COSCH project [1]). This Knowledge Representation is based on an ontology schema mediating information exchange between disciplines. This ontology schema helps bring together, under a common semantic umbrella, information from heterogeneous sources and structures. The design and structure of the COSCH model exploit the Semantic Web framework [2]. This model is the first step towards a common understanding of the interdisciplinary information. The intention is i) to exploit unifying semantics among knowledge sources and ii) to use logic to infer new knowledge from the existing knowledge model.

Background description: Previous attempts at structuring knowledge of cultural heritage have primarily focused on information exchange and dissemination through linked data. CIDOC-CRM, the most used conceptual model to document cultural heritage objects, focuses on how the highest level of data interoperability and information exchange can be maintained [3]. Research projects carried out by the Centre de Recherche et de Restauration des Musées de France (C2RMF) rely on CIDOC-CRM for content management and retrieval of information: CIDOC-CRM is used as a base knowledge model onto which the C2RMF metadata schema is mapped, providing semantics to its high-end data [4]. The Europeana project promotes data sharing in cultural heritage through Linked Open Data (LOD) [5].

A PhD is expected to follow this Master's thesis, possibly in relation with the COSCH project.

Expected results

Theoretical results:

  • Study whether such a knowledge-structuring approach has already been used in other domains, especially domains requiring knowledge documentation (e.g. biology or environmental monitoring).

Practical results:

  • to extend the structure of the COSCH model to facilitate common understanding between experts in cultural heritage subject domains and experts in technological domains. To address this question, we propose to focus on a specific case study related to Romanesque doorways (3D reconstruction of buildings). Through a study of the scientific and technical papers published in this domain, the student will build a system that helps discover underlying knowledge through rule-based semantic interpretations.

  • to extend the metadata model defined to model the knowledge of experts and of technicians. Extensions should make it applicable and relevant to other case studies (e.g. 3D reconstruction of coins or ceramics). The rules identified for the first case study could also be extended to the other case studies.

References:

[1] COSCH (COST Action TD1201), see www.cosch.info.

[2] Berners-Lee, T. et al.: The Semantic Web. Scientific American, pp. 34-43 (2001).

[3] Boeuf, P. L.: Definition of the CIDOC Conceptual Reference Model. ICOM/CIDOC CRM Special Interest Group (2013).

[4] Pillay, R. et al.: Archive Visualization and Exploration at the C2RMF. International Cultural Heritage Informatics Meeting (ICHIM07). Ontario, Toronto, Canada: Proceedings. Toronto: Archives & Museum Informatics (2007).

[5] Europeana, 2013. Making connections. Europeana Foundation, see http://pro.europeana.eu/documents/858566/af0f9ec1-793f-418a-bd28-ac422096088a

3. Multiagent dynamic energy trading protocol using DCOPs

Advisor: Gauthier Picard

Phone: +33 (0)4 77 42 66 84

Mail: gauthier.picard@emse.fr

Summary

It is now possible to conceive new applications supporting the exchange of energy units in a P2P manner, as the sharing economy promotes for goods. As the sharing economy impacts market efficiency and flexibility, such energy prosuming is forecast to impact energy usage by decreasing energy loss and by peak shaving, and therefore to significantly reduce CO2 emissions. Energy prosuming is based on the underlying social structures formed by a network of prosumers at block, city, or even larger scale. This structure is highly dynamic (e.g. users join or leave freely) and highly distributed. It should therefore be trustworthy, so that people adopt such applications. In our research, we promote trustworthiness through an energy allocation mechanism that operates in a decentralized manner (as opposed to centralized authorities that can collect personal information). This mechanism is conceived to ease the preservation of personal data privacy, which is currently a key societal challenge. Privacy will be handled by taking advantage of the intrinsically distributed structure of the network, social mechanisms like trust and reputation, and the multi-agent approaches we envision.

We envision modelling peer-to-peer (prosumer-to-prosumer) energy assignment problems in smart grids as DCOPs (Distributed Constraint Optimization Problems) [1], and solving them by extending existing algorithms. A DCOP is the distributed analogue of constraint optimization, in which a group of agents must distributively choose values for a set of variables such that the cost of a set of constraints over the variables is either minimized or maximized. Due to the intrinsic characteristics of prosumer communities, and the complexity of the underlying DCOP, approximate algorithms seem promising and relevant solutions to ensure scalability in large communities. Previous investigations (notably by IIIA-CSIC) have shown the relevance of algorithms using graphical models, like belief propagation (max-sum), for solving DCOPs in producer-consumer networks [2].
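To make the DCOP notion concrete, here is a toy instance with hypothetical domains and costs, solved by centralized brute-force enumeration; an actual DCOP algorithm would distribute this search among the agents:

```python
from itertools import product

# Toy DCOP instance (hypothetical): two prosumers each choose how many
# energy units to sell; the cost penalises unmet demand plus production.
# Real DCOP solvers distribute this computation; we enumerate centrally
# only as a minimal baseline.

domains = {"a": [0, 1, 2], "b": [0, 1, 2]}

def cost(assignment, demand=3):
    """Cost of deviating from the demanded total, plus a unit production cost."""
    total = assignment["a"] + assignment["b"]
    return abs(demand - total) * 10 + total

def solve(domains):
    names = list(domains)
    best, best_cost = None, float("inf")
    for values in product(*(domains[n] for n in names)):
        a = dict(zip(names, values))
        c = cost(a)
        if c < best_cost:
            best, best_cost = a, c
    return best, best_cost

print(solve(domains))  # an optimal assignment meets the demand exactly
```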

We aim at advancing the state of the art by extending the above-mentioned approach to solve DCOPs while coping with dynamic energy distribution networks, e.g. when prosumers frequently change their offers and offers are planned over long periods. Some DCOP algorithms are guaranteed to converge on tree-structured networks [1]. However, when dealing with dynamics, these methods may not scale up due to larger message sizes. This master's thesis therefore aims to provide solutions for realistic dynamic energy networks.

Expected results

Theoretical results:

  • Consistent state-of-the-art on dynamic DCOPs

  • DCOP model handling time in offers for energy trading

  • Characterization of energy networks and the appropriate techniques to handle them

Practical results:

  • A set of algorithms implemented using the SEAS maxsum library (Java, Python, CPLEX) [3]

  • A set of graph generators to experiment with the proposed algorithms

  • A set of experimental results on the performance of the proposed algorithms compared to the state of the art

Keywords: multiagent systems, DCOP, time, smart grids

References:

[1] https://sites.google.com/site/optmas2011/

[2] T. Penya Alba, M. Vinyals, J. Cerquides, and J.A. Rodríguez-Aguilar. Exploiting max-sum for the decentralized assembly of high-valued supply chains. In AAMAS'14, 2014.

[3] https://bitbucket.org/cerquide/seas-maxsum

4. Multiagent energy trading protocol using loopy belief propagation

Advisor: Gauthier Picard

Phone: +33 (0)4 77 42 66 84

Mail: gauthier.picard@emse.fr

Summary

It is now possible to conceive new applications supporting the exchange of energy units in a P2P manner, as the sharing economy promotes for goods. As the sharing economy impacts market efficiency and flexibility, such energy prosuming is forecast to impact energy usage by decreasing energy loss and by peak shaving, and therefore to significantly reduce CO2 emissions. Energy prosuming is based on the underlying social structures formed by a network of prosumers at block, city, or even larger scale. This structure is highly dynamic (e.g. users join or leave freely) and highly distributed. It should therefore be trustworthy, so that people adopt such applications. In our research, we promote trustworthiness through an energy allocation mechanism that operates in a decentralized manner (as opposed to centralized authorities that can collect personal information). This mechanism is conceived to ease the preservation of personal data privacy, which is currently a key societal challenge. Privacy will be handled by taking advantage of the intrinsically distributed structure of the network, social mechanisms like trust and reputation, and the multi-agent approaches we envision.

We envision modelling peer-to-peer (prosumer-to-prosumer) energy assignment problems in smart grids as DCOPs (Distributed Constraint Optimization Problems) [1], and solving them by extending existing algorithms. A DCOP is the distributed analogue of constraint optimization, in which a group of agents must distributively choose values for a set of variables such that the cost of a set of constraints over the variables is either minimized or maximized. Due to the intrinsic characteristics of prosumer communities, and the complexity of the underlying DCOP, approximate algorithms seem promising and relevant solutions to ensure scalability in large communities. Previous investigations (notably by IIIA-CSIC) have shown the relevance of algorithms using graphical models, like belief propagation (max-sum), for solving DCOPs in producer-consumer networks [2].

We aim at advancing the state of the art by extending the above-mentioned approach to solve DCOPs while coping with loopy energy distribution networks. Belief-propagation algorithms are guaranteed to converge on tree-structured networks [3]. However, when dealing with loopy graphs, these methods may fail to find the optimal assignment. This master's thesis aims to provide solutions for realistic energy networks, using loopy overlay networks (prosumer communities).
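To illustrate the message-passing scheme (with purely hypothetical utilities), a single max-sum sweep on a two-variable, tree-structured factor graph suffices to recover the optimum; on loopy prosumer networks the same updates would have to be iterated and may only approximate it:

```python
# Minimal max-sum sketch on a two-variable tree-structured factor graph.
# Utilities below are assumed for illustration only.

D = [0, 1]                      # variable domain: sell 0 or 1 energy unit
u1 = {0: 0.0, 1: 1.0}           # unary utility of prosumer 1
u2 = {0: 0.5, 1: 0.0}           # unary utility of prosumer 2
f = {(a, b): 2.0 if a != b else 0.0 for a in D for b in D}  # coordination factor

# Message from x2 to f, then from f to x1 (one sweep is exact on a tree).
m_x2_f = {b: u2[b] for b in D}
m_f_x1 = {a: max(f[(a, b)] + m_x2_f[b] for b in D) for a in D}

# Belief of x1 combines its unary utility with the incoming message.
belief_x1 = {a: u1[a] + m_f_x1[a] for a in D}
x1_star = max(D, key=belief_x1.get)
x2_star = max(D, key=lambda b: f[(x1_star, b)] + m_x2_f[b])
print(x1_star, x2_star, belief_x1[x1_star])  # joint optimum and its utility
```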

Expected results

Theoretical results:

  • Consistent state-of-the-art on belief-propagation over loopy graphs

  • DCOP model of energy trading problems

  • Belief-propagation techniques over loopy graphs to solve energy trading problems

  • Characterization of energy networks and the appropriate techniques to handle them

Practical results:

  • A set of algorithms implemented using the SEAS maxsum library (Java, Python, CPLEX) [4]

  • A set of graph generators to experiment with the proposed algorithms

  • A set of experimental results on the performance of the proposed algorithms compared to the state of the art

Keywords: multiagent systems, belief propagation, DCOP, smart grids

References:

[1] https://sites.google.com/site/optmas2011/

[2] T. Penya Alba, M. Vinyals, J. Cerquides, and J.A. Rodríguez-Aguilar. Exploiting max-sum for the decentralized assembly of high-valued supply chains. In AAMAS'14, 2014.

[3] D. Koller and N. Friedman. Probabilistic Graphical Models. MIT Press, 2011.

[4] https://bitbucket.org/cerquide/seas-maxsum

5. Learning to clean satellite images

Advisor: Elisa Fromont (LaHC), Pierre Gançarski (Strasbourg), Amaury Habrard (LaHC), Damien Muselet (LaHC), Marc Sebban (LaHC)

Phone: +33 (0)4 77 91 57 67 (Elisa Fromont)

Mail: Elisa Fromont (LaHC), Pierre Gançarski (Strasbourg), Amaury Habrard (LaHC), Damien Muselet (LaHC), Marc Sebban (LaHC)

Summary

Satellite image data (for example, those provided by the CNES) suffer from clutter and need to be ortho-rectified and processed to remove clouds or to extract surface reflectance from radiometric information. "Orthorectification" is a geometrical correction of images that aims at presenting them as if they had been captured from the vertical. In practice, it transforms the satellite picture into an image that can be superimposed on a map. These pre-processing steps cost money and time, but they are mandatory to be able to perform any data mining task (for example a region classification task) on the image. Generally, the geometric corrections are done before the data are provided to the end user. To extract the reflectance of objects, one needs to take into account the atmospheric attenuation. Unfortunately, in some cases, the lack of information about the atmosphere state (aerosol, temperature, etc.) when the image was captured prevents the user from applying classical atmospheric correction schemes.

The aim of this Master's thesis is to study how machine learning techniques could be used to learn a mapping function from the raw images to the "cleaned" ones. In particular, regression and/or metric learning techniques could be explored. The student would be provided with 15 high-dimensional images (medium spatial resolution, 1 pixel = 20 or 30 meters) in their geometrically corrected and cleaned (of atmospheric perturbations) versions. Information about the attributes which describe the images (from 3 radiometric bands for multispectral images to 120 or more for hyperspectral images) will be provided. Some of these images could be kept for testing and some of them could be used to learn a mapping function, either from the pixels or from the regions of the original image to the target ones.
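As a toy illustration of the mapping idea (with synthetic data, not the provided image pairs), a per-band linear correction can be fitted by ordinary least squares from paired raw/cleaned pixel values; the actual study would explore richer regression or metric-learning models:

```python
# Hypothetical sketch: learn a per-band linear mapping raw -> cleaned
# reflectance from paired pixels, via closed-form least squares.

def fit_linear(raw, clean):
    """Fit clean ~ a * raw + b over paired pixel values."""
    n = len(raw)
    mx = sum(raw) / n
    my = sum(clean) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(raw, clean))
    var = sum((x - mx) ** 2 for x in raw)
    a = cov / var
    return a, my - a * mx

def apply_mapping(a, b, pixels):
    """Clean new raw pixels with the learned correction."""
    return [a * x + b for x in pixels]

# Synthetic pair: atmospheric attenuation modelled as a gain and offset.
raw   = [10.0, 20.0, 30.0, 40.0]
clean = [0.5 * x + 3.0 for x in raw]   # ground-truth correction
a, b = fit_linear(raw, clean)
print(a, b, apply_mapping(a, b, [50.0]))
```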

Expected results

The student will be expected to first review the literature about metric learning [1,2] and regression for satellite images [3,4,5]. Then, he/she will have to investigate possible metric learning and/or regression techniques to learn the function that could be used to automatically clean a satellite image. This function will be implemented. A GUI that would allow a user to upload and visualize a satellite image and compute its cleaned version would be a plus.

Keywords: Image processing, Machine Learning.

References:

[1] A Survey on Metric Learning for Feature Vectors and Structured Data

[2] The M2 Master's thesis of Michaël Perrot

[3] A book: Traitement des données de télédétection, ISBN 9782100548507

[4] A book Multispectral Satellite Image Understanding, ISBN 978-0-85729-667-2

[5] Introduction to Statistics and Data Analysis, Roxy Peck (Author), Chris Olsen (Author), Jay L. Devore (Author), ISBN 13: 9780840054906, Publisher: Brooks/Cole

6. Machine Learning for structured shape prediction

Advisor: Amaury Habrard (UJM, LaHC), Sylvain Lefebvre (INRIA Nancy)

Phone:

Mail: Amaury Habrard (firstname [dot] lastname [at] univ-st-etienne [dot] fr)

Summary

Several techniques in Computer Graphics propose to synthesize new shapes (2D/3D objects) from examples. They typically start from a set of building blocks (base shapes), extracted from existing objects, that can be composed together to form new objects. These techniques often rely on generative models, e.g. shape grammars or graphical models, defining how the blocks may be composed together. Recently, machine learning techniques have been applied to build up such models [1-3]: blocks are represented as symbols and the model is learned from examples of assemblies of the blocks. However, the current contributions generally suffer from two main limitations: (i) the newly generated objects are too close to the original examples; (ii) the learned model is not powerful enough to capture all the constraints needed to build valid objects (the different parts are not stitched correctly, or the object does not respect reasonable physical constraints, like a chair that cannot stand because the legs are misplaced).

The objective of this master's thesis is to study how machine learning can help to design new methods for shape synthesis. We will in particular investigate similarity learning and structured prediction approaches. In the context of shape synthesis, the idea is to learn a model able to "correct" some shapes by proposing new, close, but valid solutions according to a given style or some constraints. A possible application is a tool to help users design new shapes, using a learned model that proposes correct possible extensions of the user's current proposal. In this context, some applications involving 3D printing of the generated objects are possible.
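A purely hypothetical toy version of this "correction" idea: an assembly is a bag of block symbols, validity is a hard constraint (here, arbitrarily, a chair made of one seat, one back and four legs), and a corrector measures how far a proposal is from the nearest valid shape. A learned model would replace the hand-written constraint:

```python
from collections import Counter

# Toy illustration (assumed constraint, not a learned model): correct a
# partial block assembly towards the nearest valid one.

VALID = Counter({"seat": 1, "back": 1, "leg": 4})  # assumed validity constraint

def distance(assembly):
    """Number of block additions/removals needed to make the assembly valid."""
    c = Counter(assembly)
    return sum((c - VALID).values()) + sum((VALID - c).values())

def correct(assembly):
    """Return the nearest valid assembly and the number of edits applied."""
    return sorted(VALID.elements()), distance(assembly)

proposal = ["seat", "leg", "leg", "leg"]   # a user's partial chair
fixed, edits = correct(proposal)
print(fixed, edits)
```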

The fields covered by this subject are at the crossroads of computer graphics, machine learning, 2D/3D synthesis, structured inference, online learning and optimization. We expect the candidate to be both interested in 3D/2D design and in conception of learning models, and also to have a good knowledge of the C language; ideally, some experience with OpenGL or graphics programming would be appreciated.

This project will be done in collaboration with S. Lefebvre (INRIA Nancy) in the context of the ERC ShapeForge project, and will require spending half of the internship at INRIA Nancy.

Expected results

Theoretical results:

  • A machine learning based model for shape synthesis.

Practical results:

  • Implementation of the proposed model.

  • Application for 2D/3D object design.

Keywords: machine learning, optimization, computer graphics

References:

[1] J.O. Talton, L. Yang, R. Kumar, M. Lim, N.D. Goodman, and R. Mech. Learning Design Patterns with Bayesian Grammar Induction. Proceedings of the 25th ACM Symposium on User Interface Software and Technology, 2012.

[2] J.O. Talton, Y. Lou, S. Lesser, J. Duke, R. Mech, and V. Koltun. Metropolis Procedural Modeling. ACM Transactions on Graphics 30(2), 2011.

[3] Y. Yeh, K. Breeden, L.Yang, M. Fisher, and P. Hanrahan. Synthesis of Tiled Patterns using Factor Graphs. Transactions on Graphics, 2012.

[4] R. Girshick, P. Felzenszwalb, D. McAllester. Object Detection with Grammar Models. In proc. of NIPS 2011.

[5] F. Maes. Learning in Markov Decision Processes for Structured Prediction, PhD thesis, 2009.

[6] M.-W. Chang, V. Srikumar, D. Goldwasser, D. Roth: Structured Output Learning with Indirect Supervision. In ICML 2010.

[7] P. Domingos. A few useful things to know about machine learning. Communications of the ACM, 2012.

7. Definition of an ontology of Ethics for Ethical Collectives of Autonomous Agents

Advisor: Olivier Boissier, Philippe Beaune

Phone: +33 (0)4 77 42 66 14

Mail: Olivier.Boissier@emse.fr Philippe.Beaune@emse.fr

Summary

Machines and software acting on behalf of humans (i.e. agents) are gaining more autonomy and are less and less under the control of human operators or users. The increasing scope of the activities of autonomous agents is becoming a major issue in current socio-technical systems. Therefore, especially in machine-human interaction, it is of prime importance to ensure that agents do not harm humans or threaten their decision autonomy [1]. As shown by the recommendations and advice proposed in large projects [2, 3, 4, 5, 6], ethics is becoming an issue of prime importance in such systems. However, ethics by design (i.e. hard-coded ethical principles) as proposed by current approaches is no longer sufficient to deal with open and decentralized socio-technical systems, where ethical decisions and behaviours need to be adapted to the current execution context.

As promoted by [7] or [8], Semantic Regulation Systems are promising approaches to make explicit regulatory systems accessible to autonomous agents. Codifying ethical principles in any given domain, including those pertinent to the behavior of autonomous agents in such systems, requires representing ethical principles and ethical dilemmas in terms of ethically relevant features: which duties need to be considered, how to weigh them, etc. It is thus of prime importance to define explicit representations of these ethical concepts, in order to enable machine readability and reasoning over these ethical principles, helping autonomous agents to behave ethically.

Semantic Web technologies offer ontology languages that can be used for specifying knowledge domains and reasoning over them. The aim of the project is to define ontologies of ethical principles that could help autonomous agents reason about the appropriate behaviour to execute in large cooperation networks. These ontologies will have to comply with Semantic Web formats, and possibly reuse already existing ontologies.

Expected results

Theoretical results:

  • Candidate ontologies have been proposed in different domains [9, 10, 11, 12, 13, 14]. We expect an analysis and comparison of these aforementioned studies.

  • The student should propose a candidate core ontology for representing the core notions and the reasoning mechanisms behind them, possibly reusing existing work from the Semantic Web community.

Practical results:

  • A state of the art of related issues and problems.

  • Integration of the defined ontology in the ETHICAA framework under definition in the ETHICAA ANR Project

  • Application and illustration on a limited practical use case

Keywords: Ethics, Semantic Web, Ontology, Multi-Agent Systems

References:

[1] McLaren, B.M. (2011). "Computational Models of Ethical Reasoning: Challenges, Initial Steps, and Future Directions." In M. Anderson & S.L. Anderson (Eds.), Machine Ethics. Chapter 17, 297-315, Cambridge University Press.

[2] EFORTT, "http://www.lancs.ac.uk/efortt/"

[3] ETICA, "http://www.etica-project.eu"

[4] EthiCAL, "http://www.kcl.ac.uk/law/research/centres/medlawethics/research/computer.aspx"

[5] MINAmi, "http://www.fp6-minami.org/"

[6] ETHICBOTS, "http://ethicbots.na.infn.it/"

[7] Pompeu Casanovas, Semantic Web Regulatory Models: Why Ethics Matter, Philosophy and Technology (Springer Journal) Special issue on Information Societies, Ethical Enquiries (2014)

[8] Alexandra-Madalina Zarafin, Antoine Zimmermann, Olivier Boissier, Integrating Semantic Web Technologies and Multi-Agent Systems: A Semantic Description of Multi-Agent Organizations, Agreement Technology Conference, 2012

[9] D. Koepsell, R. Arp, J. Fostel, and B. Smith, “Creating A Controlled Vocabulary for the Ethics of Human Research: Towards A Biomedical Ethics Ontology,” Journal of Empirical Research on Human Research Ethics, vol. 4, no. 1, pp. 43–58, Mar. 2009.

[10] Universal Core

[11] Suggested Upper Merged Ontology (SUMO)

[12] Basic Formal Ontology (BFO)

[13] German Reference Centre for Ethics in the Life Sciences (DRZE)

[14] B. M. McLaren, Extensionally Defining Principles and Cases in Ethics: an AI Model, vol. 150, pp. 145-181. ( http://www.cs.cmu.edu/~bmclaren/ethics/)

8. Concrete Specification and Implementation of Annotated RDF(S)

Advisor: Antoine Zimmermann (EMSE, ISCOD)

Mail: Antoine.Zimmermann@emse.fr

Keywords: Semantic Web, Linked Data, RDF, RDFS, Annotated RDF, W3C standards, SPARQL

Summary

The Resource Description Framework (RDF) [1] is the lingua franca of the Web of Linked Data and the Semantic Web. It is a data model where everything is expressed in terms of triples subject predicate object, which assert a statement about the subject having a property predicate whose value is object. To enable interlinking of RDF data from various sources, everything in RDF is identified with URIs, of which URLs are a subset. URIs are universal identifiers, which means that they are meant to identify a single thing everywhere on the Web. All RDF documents on the Web that use the same URI are, in principle, describing the same resource. But due to the openness of the Web, anyone can publish RDF statements about any subject. This leads to difficulties in integrating RDF data from multiple sources, as some information on the Web is not trustworthy, may be outdated or only partially correct, or simply reflects incompatible viewpoints on the same entity.

To overcome this problem, a number of extensions of RDF have been proposed to annotate RDF triples with contextual information such as provenance [2], validity time [3], fuzzy values [4], trust measures, etc. Those proposals can be generalised into the notion of Annotated RDF [5], where triples are assigned values from a well defined mathematical structure. This value is used in automatic inferences [6] and query answering [7]. At the moment, the proposal is still only abstract and theoretical but we would like to make a concrete proposal for an exchange syntax which should:

  • comply with existing standards, especially with practical usage of Linked Data;
  • allow one to annotate triples with multiple types of annotation (e.g., temporal and fuzzy) [8];
  • provide a uniform treatment of both annotated and non-annotated RDF;
  • make implementation of inference system and query engine modular and extensible to new types of annotations.
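To give a feel for the annotated setting, here is a small sketch (with made-up URIs and annotation values) of a fuzzy annotation domain: each triple carries a value in [0, 1], and inferred triples combine the annotations of their premises with the domain's combination operation (here, min), in the spirit of the general frameworks of [5, 6]:

```python
# Sketch of Annotated RDF with a fuzzy annotation domain (assumed example).
# Triples map to annotation values; RDFS subclass rules propagate them.

triples = {
    ("ex:Cat", "rdfs:subClassOf", "ex:Mammal"): 0.9,
    ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"): 1.0,
    ("ex:felix", "rdf:type", "ex:Cat"): 0.8,
}

def infer(triples):
    """Apply RDFS subclass rules to a fixpoint, annotating conclusions with min."""
    kb = dict(triples)
    changed = True
    while changed:
        changed = False
        for (s1, p1, o1), v1 in list(kb.items()):
            for (s2, p2, o2), v2 in list(kb.items()):
                derived = None
                if p1 == "rdfs:subClassOf" and p2 == "rdfs:subClassOf" and o1 == s2:
                    derived = ((s1, "rdfs:subClassOf", o2), min(v1, v2))
                elif p1 == "rdf:type" and p2 == "rdfs:subClassOf" and o1 == s2:
                    derived = ((s1, "rdf:type", o2), min(v1, v2))
                # Keep the best (highest) annotation found for each conclusion.
                if derived and kb.get(derived[0], -1.0) < derived[1]:
                    kb[derived[0]] = derived[1]
                    changed = True
    return kb

kb = infer(triples)
print(kb[("ex:felix", "rdf:type", "ex:Animal")])  # 0.8 = min(0.8, 0.9, 1.0)
```

A concrete exchange syntax would have to serialise such annotated triples while staying compatible with plain RDF, which is precisely the point of the internship.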

While the concepts and principles behind Annotated RDF are rather abstract and formal, which will necessitate understanding a wide state of the art, we expect that the outcome of the internship be concrete and guided by practical considerations.

Expected results

Theoretical results:

  • a state of the art on annotated RDF(S) and related Semantic Web standards and APIs, comparing implementations of various annotation frameworks;
  • definition of new annotation domains (e.g., temporal with recurring time frames, advanced access control);
  • investigation of efficient processing of Annotated RDF data, especially using rule-based approaches.

Practical results:

  • a specification of the concrete format for Annotated RDF, following the practices of W3C specifications;
  • an implementation of a modular reasoner and/or a query engine, based on the theoretical investigation.

References:

[1] Frank Manola and Eric Miller. RDF Primer. W3C Recommendation, 2004.

[2] Renata Queiroz Dividino, Sergej Sizov, Steffen Staab, and Bernhard Schueler. Querying for Provenance, Trust, Uncertainty and other Meta Knowledge in RDF. Journal of Web Semantics, 7(3):204-219, 2009.

[3] Claudio Gutiérrez, Carlos A. Hurtado, and Alejandro A. Vaisman. Introducing Time into RDF. IEEE Transactions on Knowledge and Data Engineering, 19(2):207-218, 2007.

[4] Mauro Mazzieri and Aldo Franco Dragoni. A Fuzzy Semantics for the Resource Description Framework. In Workshops on Uncertainty Reasoning for the Semantic Web I, URSW 2005-2007, Revised Selected and Invited Papers, pages 244-261, 2008.

[5] Octavian Udrea, Diego Reforgiato Recupero, and V. S. Subrahmanian. Annotated RDF. In Proc. of 3rd European Semantic Web Conference (ESWC'2006), pages 487-501, 2006.

[6] Umberto Straccia, Nuno Lopes, Gergely Lukacsy, and Axel Polleres. A General Framework for Representing and Reasoning with Annotated Semantic Web Data. In Proc. of 24th AAAI Conference on Artificial Intelligence (AAAI'2010), 2010.

[7] Nuno Lopes, Axel Polleres, Umberto Straccia, and Antoine Zimmermann. AnQL: SPARQLing Up Annotated RDFS. In Proc. of 9th International Semantic Web Conference (ISWC 2010), 2010.

[8] Antoine Zimmermann, Nuno Lopes, Axel Polleres, Umberto Straccia, A General Framework for Representing, Reasoning and Querying with Annotated Semantic Web Data, in Journal of Web Semantics, Elsevier, accepted in August 2011.

9. Sleep Data Mining for a Non-Invasive Smart Clock Application

Advisors: Fabrice Muhlenbach & Pierre Maret

Phone: +33 (0)4 77 91 58 11

Mail: fabrice.muhlenbach@univ-st-etienne.fr & pierre.maret@univ-st-etienne.fr

Summary

Sleep occupies a large part of our lives, and its quality conditions many waking activities: consolidating what we have learned, keeping a good mood, concentrating well, and so on.

This research internship contributes to the sleep monitoring of human subjects equipped with wearable sensors recording biological signals [1] (heart activity, movements, temperature, etc.). From these signals, an analysis must identify the different sleep stages (drowsiness, light sleep, deep sleep, REM sleep) and, using sequence mining, a predictive model must be proposed to determine which stage the sleeper is currently in, so that they can be woken during the most appropriate phase.

This work will follow all the steps of the data mining process: understanding the problem of human sleep, understanding the sensor data, selecting the most relevant variables from the sensors and building sequences from the signals, implementing a sleep-stage identification system and a predictive model of the successive stages, evaluating the model, and deploying it in a non-invasive smart alarm clock.
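
As a minimal illustration of the stage-identification step, a rule-based labeler can map per-epoch sensor features to coarse stages. All thresholds, feature names, and stage labels below are invented for this sketch; the internship would learn such a model from real data rather than hand-code it:

```python
# Toy sleep-stage labeler: maps per-epoch (e.g. 30 s) sensor features to a
# coarse stage. Thresholds are illustrative only, not values from the project.

def label_epoch(heart_rate, movement):
    """Assign a coarse sleep stage from mean heart rate (bpm) and a
    movement count for one epoch."""
    if movement > 10:
        return "wake"
    if heart_rate > 65:
        return "REM"        # REM tends to show elevated heart rate
    if heart_rate > 55:
        return "light"
    return "deep"

def label_night(epochs):
    """Turn a list of (heart_rate, movement) pairs into a stage sequence."""
    return [label_epoch(hr, mv) for hr, mv in epochs]

night = [(72, 15), (60, 2), (52, 0), (50, 0), (68, 1), (58, 3)]
print(label_night(night))
# ['wake', 'light', 'deep', 'deep', 'REM', 'light']
```

The resulting stage sequence is exactly the kind of symbolic series that the sequence-mining step would then analyze.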

Expected results

Theoretical results:

  • a state of the art of data mining used for wearable sensors [2] and sequence mining techniques [4, 6];

  • a model for identifying the sleep stages from sensor data;

  • a predictive model for identifying the most "natural" moment to wake the sleeper.

Practical results:

  • experiments with human subjects to collect sleep data with a real sensor [5];

  • an implementation of the model in an alarm clock [3].
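
The smart-clock deployment can be sketched as a simple decision rule (the stage labels and the epoch-index window convention are assumptions of this sketch): given the predicted stage sequence and an allowed alarm window, ring at the first light-sleep epoch inside the window:

```python
# Pick the wake-up epoch: the first "light" (or "wake") epoch inside the
# allowed alarm window; fall back to the window's end if none is found.

def choose_wake_epoch(stages, window_start, window_end):
    """stages: predicted stage per epoch; window bounds are epoch indices."""
    for i in range(window_start, min(window_end, len(stages))):
        if stages[i] in ("light", "wake"):
            return i
    return window_end  # worst case: ring at the latest allowed time

stages = ["deep", "deep", "REM", "REM", "light", "deep"]
print(choose_wake_epoch(stages, 1, 5))  # → 4 (first light epoch in window)
```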

Keywords: Wearable Sensor, Data Mining, Sequence Mining, Sleep Cycles, Natural Alarm Clock

References:

[1] M. M. Baig and H. Gholamhosseini. Smart health monitoring systems: An overview of design and modeling. Journal of Medical Systems, 37(2), 2013.

[2] H. Banaee, M. U. Ahmed, and A. Loutfi. Data mining for wearable sensors in health monitoring systems: A review of recent trends and challenges. Sensors, 13:17472-17500, 2013

[3] Z. M. Djedou, F. Muhlenbach, P. Maret, G. Lopez, Can Sequence Mining Improve Your Morning Mood? Toward a Precise Non-invasive Smart Clock, International Workshop on Web Intelligence and Smart Sensing, 2014.

[4] G. Dong and J. Pei. Sequence Data Mining, volume 33, Advances in Database Systems. Kluwer, 2007.

[5] S. Kato. Wearable health monitoring sensor debuts in Japanese market. http://techon.nikkeibp.co.jp/english/NEWS_EN/20100119/179393/ January 2010. Tech-On!, Tech & Industry Analysis from Asia.

[6] G. Ritschard. Exploring sequential data. In J.-G. Ganascia, P. Lenca, and J.-M. Petit, editors, Discovery Science - 15th International Conference, DS 2012, Lyon, France, October 29-31, 2012. Proceedings, volume 7569 of Lecture Notes in Computer Science, pages 3-6. Springer, 2012.

10. Recommender Systems and the Cold-Start Problem

Advisors: Antoine Boutet & Frédérique Laforest

Phone: +33 (0)4 77

Mail: antoine.boutet@univ-st-etienne.fr

Summary

Recommender systems are a specific form of information filtering that aims to present items (films, music, books, news, images, web pages, etc.) likely to interest the user. In general, a recommender system compares a user's profile with certain reference characteristics and tries to predict the rating the user would give to these items. The characteristics may come from the item itself, in which case we speak of content-based approaches, or from the user and their social environment, in which case we speak of collaborative filtering. The architecture may be centralized, with global knowledge of users and items, or distributed (partially or fully peer-to-peer), where each user has only partial knowledge of the network.
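
As a minimal sketch of the collaborative-filtering idea, with entirely made-up users, items, and ratings, a user's score for an unseen item can be predicted from the ratings of similar users:

```python
import math

# Toy user-based collaborative filtering: score an unseen item for a user
# by weighting other users' ratings with profile cosine similarity.
# All user names, items, and ratings are invented for this sketch.

ratings = {  # user -> {item: rating}
    "alice": {"film1": 5, "film2": 3, "book1": 4},
    "bob":   {"film1": 4, "film2": 2, "news1": 5},
    "carol": {"film2": 5, "news1": 1},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating profiles."""
    common = set(u) & set(v)
    num = sum(u[i] * v[i] for i in common)
    den = math.sqrt(sum(x * x for x in u.values())) * \
          math.sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0

def predict(user, item):
    """Similarity-weighted average of neighbours' ratings for the item."""
    num = den = 0.0
    for other, profile in ratings.items():
        if other == user or item not in profile:
            continue
        s = cosine(ratings[user], profile)
        num += s * profile[item]
        den += s
    return num / den if den else None  # None: nobody rated the item

print(round(predict("alice", "news1"), 2))  # → 3.27
```

Bob, whose profile is closer to Alice's, pulls the predicted rating toward his own; a distributed variant would restrict the loop to the user's known neighbours.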

A well-known problem in collaborative filtering is cold start. It affects either a new document (which will not be recommended until enough people have seen and rated it) or a new user (whose profile and history are empty, so their similarity to all other users is zero). This internship takes place in the context of a news recommender system. Its goal is to design a solution that works around the cold-start problem by taking the content of news items into account. This requires studying how to combine content information and its semantics with information about user neighborhoods.
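
A minimal sketch of the content-based workaround, with invented news texts: when an item has no ratings yet, collaborative filtering cannot score it, but its text can be compared to items the user already liked (plain word overlap here; the internship would use richer semantic and TF-IDF features):

```python
# Cold-start fallback: a brand-new item has no ratings, so collaborative
# filtering cannot score it; compare its text to items the user liked instead.

def word_overlap(a, b):
    """Jaccard similarity between the word sets of two texts."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

liked = [  # texts of news the user already rated positively (made up)
    "new solar plant opens in spain",
    "wind energy output breaks record",
]
fresh = "spain invests in solar energy storage"  # no ratings yet

score = max(word_overlap(fresh, t) for t in liked)
print(round(score, 2))  # → 0.33
```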

This internship will take place in the Virtual Communities and Social Networks (VCSN) group of the LaHC in Saint-Etienne. It includes a substantial experimental component on real data and will be organized as follows:

  1. Get familiar with the data and first apply information retrieval, semantic analysis, and automatic indexing techniques.
  2. Use the results of the first step to compare a content-based recommender system with a collaborative-filtering one.
  3. Propose and adapt a recommendation technique that exploits news content in distributed systems, and evaluate its benefit for mitigating the cold-start problem.

Keywords: social networks, recommender systems, peer-to-peer

References:

[1] Patrick Gros (Ed.). L'indexation multimédia : description et recherche automatiques. Hermès-Lavoisier, 2007.

[2] C. Basu, H. Hirsh, and W. Cohen. Recommendation as classification: Using social and content-based information in recommendation. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, pages 714–720, 1998.

[3] Antoine Boutet, Davide Frey, Rachid Guerraoui, Arnaud Jegou, Anne-Marie Kermarrec, "WHATSUP: A Decentralized Instant News Recommender", In proceeding of the 27th International Symposium on Parallel and Distributed Processing (IPDPS), pages 741-752, 2013.

11. From author identification to plagiarism detection

Advisors: Christine Largeron, Mihaela Juganaru-Mathieu

Phone: +33 (0)4 77 91 57 56

Mail: Christine Largeron Mihaela Juganaru-Mathieu

Summary

Stylometry is the long-established study of characterizing a writer's style; see [1]. The massive use of computational resources and the explosion in the number of publications across many areas have moved this kind of study into the field of text and data mining [2], with or without the help of rich external thesauri or ontologies.

In previous work, carried out in the context of the CLEF 2014 competition, we studied authorship attribution, an important problem in many areas: bibliometrics, web exploration, computational linguistics, and information retrieval. Solving it as a text mining application requires several successive treatments: text normalization, classification with unbalanced data, and parameter tuning.

To solve this problem, we proposed and implemented several approaches (similarity counting, a voting technique, and supervised learning), which ranked second at CLEF 2014.
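
The "similarity counting" idea can be illustrated with a toy sketch (all texts are invented, and this is not the system submitted to CLEF): profile each candidate author by their most frequent words and count overlaps with the disputed document's own frequent words:

```python
from collections import Counter

# Toy similarity counting for authorship attribution: profile each author by
# their top-k most frequent words, then count overlaps with the disputed
# document's own frequent words. Corpora here are invented.

def top_words(text, k=5):
    """Set of the k most frequent words of a text."""
    return {w for w, _ in Counter(text.lower().split()).most_common(k)}

candidates = {
    "author_a": "the sea the sea dark waves cold waves the dark sea",
    "author_b": "bright city lights the city never sleeps the lights",
}
disputed = "cold dark waves on the sea"

scores = {a: len(top_words(t) & top_words(disputed, k=6))
          for a, t in candidates.items()}
print(max(scores, key=scores.get))  # → author_a
```

A real pipeline would normalize the text first and use far richer features (character n-grams, function words), but the counting principle is the same.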

The CLEF Initiative (Conference and Labs of the Evaluation Forum, formerly known as Cross-Language Evaluation Forum) is a self-organized body whose main mission is to promote research, innovation, and development of information access systems [3]. Participants in the CLEF forum are invited to conduct experiments and then to take part in workshops to discuss their results. Both CLEF 2014 and CLEF 2015 include a challenge called the PAN Lab (Uncovering Plagiarism, Authorship, and Social Software Misuse), one of whose tasks is dedicated to plagiarism detection [6, 8].


The aim of this internship is to tackle the plagiarism detection problem in the context of the CLEF 2015 challenge [6, 8]. We propose a text mining approach based on iterating over steps such as preprocessing and normalizing the text, adapting the classification methods initially designed for author identification to our unbalanced-data problem (one-class classification), and finding the best parameters for each method. Another interesting idea for solving the problem is to extract significant patterns from the text and then compare the patterns found in different parts of the document.
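
The pattern-comparison idea can be sketched by comparing character n-gram profiles of two segments of the same document; a low overlap hints at a possible change of author or style. The segments and the choice of n below are illustrative only:

```python
# Compare character 3-gram profiles of two text segments; a low Jaccard
# overlap between the profiles suggests a possible change of author/style.

def char_ngrams(text, n=3):
    """Set of character n-grams of a whitespace-normalized text."""
    text = " ".join(text.lower().split())
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def profile_overlap(seg_a, seg_b, n=3):
    a, b = char_ngrams(seg_a, n), char_ngrams(seg_b, n)
    return len(a & b) / len(a | b) if a | b else 0.0

part1 = "the method we propose relies on frequency profiles"
part2 = "the method we propose relies on simple frequency counts"
part3 = "yo, check this out, totally different vibe here!!"

print(round(profile_overlap(part1, part2), 2))  # high: consistent style
print(round(profile_overlap(part1, part3), 2))  # low: suspicious segment
```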


Expected results

Theoretical results:

  • A state of the art of plagiarism detection

  • A model for representing the problem

  • A method to adapt the fitness function with respect to known information

Practical results:

  • Software implementation

  • Participation in the plagiarism detection task of the CLEF-PAN Lab 2015 [7, 8]

Keywords: plagiarism detection, text mining, SVM, IR, TF-IDF

References:

[1] Stamatatos, E. (2009). A survey of modern authorship attribution methods. Journal of the American Society for information Science and Technology, 60(3), 538-556. http://www.clips.ua.ac.be/stylometry/Lit/Stamatatos_survey2009.pdf

[2] Aggarwal, C. Zhai, C.X. Mining Text Data - http://charuaggarwal.net/text-content.pdf

[3] CLEF: http://www.clef-initiative.eu//

[4] CLEF 2013 - http://www.clef2013.org/index.php?page=Pages/labs.html

[5] Patrick Juola, Efstathios Stamatatos (2013). Overview of the Author Identification Task at PAN 2013. Proceedings of CLEF.

[6] Oberreuter, G., Velásquez, J.D. (2013). Text mining applied to plagiarism detection: The use of words for detecting deviations in the writing style. Expert Systems with Applications, 40, pp. 3756-3763.

[7] CLEF 2014 - http://clef2014.clef-initiative.eu/resources/CFP_Labs_Flyer_2014.pdf

[8] CLEF 2015 - http://pan.webis.de/