At the start of the game each player chooses a unique hero with different strengths and weaknesses. Trains: 2 data formats (structured, one-instance-per-line). A set of reasonably clean records was extracted using the following conditions: ((AAGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0)). The prediction task is to determine whether a person makes over 50K a year. Semeion Handwritten Digit: 1593 handwritten digits from around 80 persons were scanned and stretched into a 16x16 rectangular box in a gray scale of 256 values. User Identification From Walking Activity: The dataset collects data from an Android smartphone positioned in the chest pocket of 22 participants walking in the wild over a predefined path. Northix: Northix is designed to be a schema matching benchmark problem for data integration of two entity relationship databases. Gas sensor array under dynamic gas mixtures: The data set contains the recordings of 16 chemical sensors exposed to two dynamic gas mixtures at varying concentrations. Five Xsens MTx units are used on the torso, arms, and legs. PMU-UD: The handwritten dataset was collected from 170 participants with a total of 5,180 numeral patterns. Gas Sensor Array Drift Dataset: This archive contains 13,910 measurements from 16 chemical sensors utilized in simulations for drift compensation in a discrimination task of 6 gases at various levels of concentration. Discrete Tone Image Dataset: Discrete Tone Images (DTI) are available, which need to be analyzed in detail. This is perhaps the most well-studied type of predictive modeling problem and a good type of problem to start with. Data Set Information: Extraction was done by Barry Becker from the 1994 Census database. The aim is to reflect the nuances and heterogeneity of real data.
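The Adult extraction conditions translate directly into a row filter plus a binary label. The sketch below is a minimal illustration, assuming each raw record is a Python dict keyed by the census field names that appear in the condition (AAGE, AGI, AFNLWGT, HRSWK); the field meanings in the comments, the INCOME key, the sample rows and the helper names are all hypothetical and only serve to show the logic.

```python
from typing import Mapping

def keep_record(rec: Mapping[str, float]) -> bool:
    """Adult extraction filter: ((AAGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0))."""
    return (
        rec["AAGE"] > 16        # age
        and rec["AGI"] > 100    # adjusted gross income (assumed meaning)
        and rec["AFNLWGT"] > 1  # final sampling weight (assumed meaning)
        and rec["HRSWK"] > 0    # hours worked per week (assumed meaning)
    )

def label(income: float) -> int:
    """Binary target: 1 if the person makes over 50K a year, else 0."""
    return 1 if income > 50_000 else 0

# Hypothetical records for illustration only; INCOME is an assumed field name.
records = [
    {"AAGE": 39, "AGI": 15000, "AFNLWGT": 77516, "HRSWK": 40, "INCOME": 62000},
    {"AAGE": 15, "AGI": 500, "AFNLWGT": 1200, "HRSWK": 10, "INCOME": 3000},
    {"AAGE": 50, "AGI": 80, "AFNLWGT": 83311, "HRSWK": 13, "INCOME": 20000},
]

clean = [r for r in records if keep_record(r)]
targets = [label(r["INCOME"]) for r in clean]
print(len(clean), targets)  # 1 [1]
```

Only the first record passes: the second fails the age condition and the third fails the AGI condition.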
Volcanoes on Venus - JARtool experiment: The JARtool project was a pioneering effort to develop an automatic system for cataloging small volcanoes in the large set of Venus images returned by the Magellan spacecraft. Connectionist Bench (Sonar, Mines vs. Rocks): The task is to train a network to discriminate between sonar signals bounced off a metal cylinder and those bounced off a roughly cylindrical rock. The resources are grouped into clusters that represent pages discussing the same story. Deepfakes: Medical Image Tamper Detection: Medical deepfakes: CT scans of human lungs, some of which have been tampered with by adding or removing cancer. Create a binary-classification dataset (python: sklearn.datasets.make_classification). The dataset was obtained from a Ph.D. thesis. Nodes represent official Facebook pages, while the links are mutual likes between sites. Molecular Biology (Promoter Gene Sequences): E. coli promoter gene sequences (DNA) with partial domain theory. Related Research: Kohavi, R., Becker, B., (1996). The data have been collected by 10 subjects using the Vicon 3D tracker. Congressional Voting Records: 1984 United States Congressional Voting Records; classify as Republican or Democrat. The difficulty is that the problem is multivariate and highly non-linear. Early stage diabetes risk prediction dataset.
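For practice, a synthetic binary-classification dataset is often enough. If scikit-learn is installed, sklearn.datasets.make_classification(n_samples=..., n_features=..., n_classes=2, random_state=...) returns an (X, y) pair. The dependency-free sketch below is a simpler stand-in, not scikit-learn's algorithm: it assumes two Gaussian clusters, one per class, and the function name make_binary_blobs is hypothetical.

```python
import random
from typing import List, Tuple

def make_binary_blobs(n_samples: int = 100, n_features: int = 2,
                      class_sep: float = 2.0, seed: int = 0
                      ) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical stand-in for make_classification: two Gaussian clusters,
    centred at -class_sep (class 0) and +class_sep (class 1) on every axis."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_samples):
        cls = i % 2  # alternate labels so the two classes are balanced
        center = class_sep if cls == 1 else -class_sep
        features = [rng.gauss(center, 1.0) for _ in range(n_features)]
        rows.append((features, cls))
    rng.shuffle(rows)  # shuffle features and labels together
    X = [features for features, _ in rows]
    y = [cls for _, cls in rows]
    return X, y

X, y = make_binary_blobs(n_samples=200, n_features=5)
print(len(X), len(X[0]), sum(y))  # 200 5 100
```

A fixed seed keeps the output reproducible; increasing class_sep makes the two classes easier to separate.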
Introduction: The seismic bumps dataset is one of the lesser-known binary classification datasets that capture geological conditions using seismic and seismo-acoustic systems in longwall coal mines to assess if they are prone to … Youtube cookery channels viewers comments in Hinglish, Sattriya_Dance_Single_Hand_Gestures Dataset, Malware static and dynamic features VxHeaven and Virus Total, User Profiling and Abusive Language Detection Dataset, Estimation of obesity levels based on eating habits and physical condition, Activity recognition using wearable physiological measurements, CNNpred: CNN-based stock market prediction using a diverse set of variables, Simulated Data set of Iraqi tourism places, Monolithic Columns in Troad and Mysia Region, Unmanned Aerial Vehicle (UAV) Intrusion Detection, Intelligent Media Accelerometer and Gyroscope (IM-AccGyro) Dataset. The binary labels are based on whether or not the content owner approves of the ad. Facebook Large Page-Page Network: This webgraph is a page-page graph of verified Facebook sites. The data is available on the UCI machine learning repository. The goal was to train machine learning models for automatic pattern recognition. Samples ending with shopping. Each patient is classified into one of two categories: normal or abnormal. UCI Census Income Dataset. Three types of recordings (Static Spiral Test, Dynamic Spiral Test and Stability Test) are taken. There is a small number of training samples of diseased trees and a large number for other land cover. Indoor User Movement Prediction from RSS data: This dataset contains temporal data from a Wireless Sensor Network deployed in real-world office environments. Sports articles for objectivity analysis: 1000 sports articles were labeled using Amazon Mechanical Turk as objective or subjective. Meta-info on attribute relationships is also provided.
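When one class has few training samples (such as the diseased trees above), a common remedy is to weight each class inversely to its frequency with n_samples / (n_classes * count(c)), the formula scikit-learn documents for class_weight="balanced". The sketch assumes labels are held in a plain Python list; the label names and counts are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence

def balanced_class_weights(y: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by n_samples / (n_classes * count(class))."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {c: n_samples / (n_classes * m) for c, m in counts.items()}

# Hypothetical imbalanced labels: 10 diseased-tree samples vs 90 other land cover.
y = ["diseased"] * 10 + ["other"] * 90
weights = balanced_class_weights(y)
print(weights["diseased"], round(weights["other"], 4))  # 5.0 0.5556
```

After weighting, both classes contribute the same total weight (10 x 5.0 and 90 x 0.5556 are both about 50), so the rare class is no longer drowned out in the loss.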
Many are from UCI, Statlog, StatLib and other collections. Dataset for Sensorless Drive Diagnosis: Features are extracted from motor current. You need standard datasets to practice machine learning. Grammatical Facial Expressions: This dataset supports the development of models that make it possible to interpret Grammatical Facial Expressions from Brazilian Sign Language (Libras). It contains 101 scanned pages from different newspapers and magazines in Russian with ground-truth pixel-based masks. Person Classification Gait Data: Gait is considered a biometric criterion. HCC Survival: The Hepatocellular Carcinoma dataset (HCC dataset) was collected at a University Hospital in Portugal. Each failure is characterized by 15 force/torque samples collected at regular time intervals. Gastrointestinal Lesions in Regular Colonoscopy: This dataset contains features extracted from colonoscopy videos used to detect gastrointestinal lesions. Wall-Following Robot Navigation Data: The data were collected as the SCITOS G5 robot navigated through the room following the wall in a clockwise direction, for 4 rounds, using 24 ultrasound sensors arranged circularly around its 'waist'. Discretization should be applied based on expert recommendations; an attached file shows how. The prediction task is to determine whether a person is high income (defined as earning more than $50k a year). EMG data for gestures: These are files of raw EMG data recorded by a MYO Thalmic bracelet. Noisy images and their corresponding ground truth are provided. HIV-1 protease cleavage: The data contains lists of octamers (8 amino acids) and a flag (-1 or 1) depending on whether HIV-1 protease will cleave in the central position (between amino acids 4 and 5).
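Octamers such as those in the HIV-1 protease data are usually turned into numeric features before training. The sketch below shows one common choice, one-hot encoding each of the 8 positions over the 20 standard amino acids (160 binary features), and maps the -1/1 flag to a 0/1 label. The encoding, the helper names and the example octamer are assumptions for illustration, not the dataset's prescribed representation.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_octamer(seq: str) -> List[int]:
    """Encode an 8-residue peptide as 8 x 20 = 160 binary features."""
    seq = seq.upper()
    if len(seq) != 8:
        raise ValueError("expected an octamer (8 amino acids)")
    vec = [0] * (8 * len(AMINO_ACIDS))
    for pos, aa in enumerate(seq):
        # raises KeyError for non-standard residue codes
        vec[pos * len(AMINO_ACIDS) + INDEX[aa]] = 1
    return vec

def to_binary_label(flag: int) -> int:
    """Map the -1/1 cleavage flag to 0/1."""
    return 1 if flag == 1 else 0

octamer = "AAAKFERQ"  # hypothetical example
left, right = octamer[:4], octamer[4:]  # the central site lies between these halves
x = one_hot_octamer(octamer)
print(left, right, len(x), sum(x))  # AAAK FERQ 160 8
```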
default of credit card clients: This research targets customers' default payments in Taiwan and compares the predictive accuracy of the probability of default among six data mining methods. Used to predict heavy drinking episodes via mobile data. Run the 'mlnet classification' command. Parkinson's Disease Classification: The data used in this study were gathered from 188 patients with PD (107 men and 81 women) with ages ranging from 33 to 87 (65.1±10.9). Outcomes pertaining to patients with diabetes. Smartphone-Based Recognition of Human Activities and Postural Transitions: Activity recognition data set built from the recordings of 30 subjects performing basic activities and postural transitions while carrying a waist-mounted smartphone with embedded inertial sensors. With return_X_y=True, scikit-learn's dataset loaders return (data, target) instead of a Bunch object. Two main approaches have been devised: corpus-based and lexicon-based. Sattriya_Dance_Single_Hand_Gestures Dataset: The Sattriya_Dance_Single_Hand_Gestures dataset contains 1450 images of 29 Sattriya dance single-hand gestures, consisting of 50 samples from each hasta. We use the … Adult Census Income Binary Classification dataset: A subset of the 1994 Census database, using working adults over the age of 16 with an adjusted income index of > 100. Gas sensor array temperature modulation: A chemical detection platform composed of 14 temperature-modulated metal oxide (MOX) gas sensors was exposed over 3 weeks to mixtures of carbon monoxide and humid synthetic air in a gas chamber. The features cover demographic information, habits, and historic medical records. MHEALTH Dataset: The MHEALTH (Mobile Health) dataset is devised to benchmark techniques dealing with human behavior analysis based on multimodal body sensing. The report (file schiff.bp_speedup.ps.Z) is an overview of many different backprop speedup techniques.
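Comparing methods by predictive accuracy, as the credit card default study does, starts from turning each model's predicted probability of default into a class and counting outcomes. The sketch below is a hedged illustration of that comparison, not the study's actual procedure: it assumes a 0.5 threshold, and the labels and probabilities for the two hypothetical models are made up.

```python
from typing import Dict, Sequence

def confusion(y_true: Sequence[int], y_prob: Sequence[float],
              threshold: float = 0.5) -> Dict[str, int]:
    """Count TP/FP/TN/FN after thresholding predicted probabilities of default."""
    cm = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1:
            cm["tp" if t == 1 else "fp"] += 1
        else:
            cm["fn" if t == 1 else "tn"] += 1
    return cm

def accuracy(cm: Dict[str, int]) -> float:
    return (cm["tp"] + cm["tn"]) / sum(cm.values())

# Made-up labels (1 = default) and probabilities from two hypothetical models.
y_true = [1, 0, 0, 1, 0, 0]
models = {
    "model_a": [0.9, 0.2, 0.6, 0.4, 0.1, 0.3],
    "model_b": [0.7, 0.4, 0.3, 0.8, 0.2, 0.6],
}
for name, probs in models.items():
    cm = confusion(y_true, probs)
    print(name, cm, round(accuracy(cm), 3))
```

Because defaults are usually the minority class, reporting the full confusion matrix alongside accuracy shows whether a model actually catches defaulters.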
27 examples of each of 95 Auslan signs were captured from a native signer using high-quality position trackers. Adult: Predict whether income exceeds $50K/yr based on census data. Climate Model Simulation Crashes: Given Latin hypercube samples of 18 climate model input parameter values, predict climate model simulation crashes and determine the parameter value combinations that cause the failures. Chronic_Kidney_Disease: This dataset can be used to predict chronic kidney disease; the data were collected at a hospital over a period of nearly 2 months. Geo-Magnetic field and WLAN dataset for indoor localisation from wristband and smartphone: A multisource and multivariate dataset for indoor localisation methods based on WLAN and Geo-Magnetic field fingerprinting. SMS Spam Collection: The SMS Spam Collection is a public set of labeled SMS messages that have been collected for mobile phone spam research. This code uses the "estimator API" provided by TensorFlow for classification based on 14 attributes such as cholesterol, resting blood pressure, etc. Mesothelioma’s disease data set: The mesothelioma disease data set was prepared at Dicle University Faculty of Medicine in Turkey. BLOGGER: In this paper, we seek to recognize the causes of users tend. The motor has intact and defective components. DrivFace: The DrivFace database contains image sequences of subjects while driving in real scenarios. After preprocessing the auction data, we built the SB dataset. Activities of Daily Living (ADLs) Recognition Using Binary Sensors: This dataset comprises information regarding the ADLs performed by two users on a daily basis in their own homes. Census Income: Predict whether income exceeds $50K/yr based on census data. Statlog (Shuttle): The shuttle dataset contains 9 attributes, all of which are numerical.
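The SMS Spam Collection is a classic binary text-classification task. The TensorFlow estimator code mentioned above is not reproduced here; instead, a swapped-in technique, multinomial naive Bayes with add-one (Laplace) smoothing, is sketched in plain Python. The whitespace tokenizer, the function names and the toy messages are hypothetical and are not drawn from the collection.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

Model = Tuple[Counter, Dict[str, Counter], Set[str]]

def tokenize(text: str) -> List[str]:
    return text.lower().split()

def train_nb(docs: List[Tuple[str, str]]) -> Model:
    """Multinomial naive Bayes: per-class document counts and word counts."""
    class_counts = Counter(label for _, label in docs)
    word_counts: Dict[str, Counter] = {c: Counter() for c in class_counts}
    for text, label in docs:
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def predict_nb(model: Model, text: str) -> str:
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    scores = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        score = math.log(class_counts[c] / n_docs)  # log prior
        for w in tokenize(text):
            if w in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)

# Invented toy messages, NOT taken from the SMS Spam Collection.
train = [
    ("win a free prize now", "spam"),
    ("free entry claim your prize", "spam"),
    ("are we still meeting for lunch", "ham"),
    ("call me when you get home", "ham"),
]
model = train_nb(train)
print(predict_nb(model, "claim your free prize"))  # spam
print(predict_nb(model, "call me for lunch"))      # ham
```

Working in log space avoids numeric underflow when messages are long.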
Legal Case Reports: A textual corpus of 4000 legal cases for automatic summarization and citation analysis. Record Linkage Comparison Patterns: Element-wise comparison of records with personal data from a record linkage setting. Autistic Spectrum Disorder Screening Data for Adolescent: Autistic spectrum disorder screening data for adolescents. Binary classification is where the output variable to be predicted is nominal and comprises two classes. The UCI Digits dataset has 5,620 images (3,823 for training and 1,797 for test). For each mixture, signals were acquired continuously for 12 hours. QSAR oral toxicity: Data set containing values for 1024 binary attributes (molecular fingerprints) used to classify 8992 chemicals into 2 classes (very toxic/positive, not very toxic/negative). Here's the problem: I have no way to retrieve a customer ID, and I am told the dataset definitely contains rows that are related to the same customer. Container Crane Controller Data Set: A container crane has the function of transporting containers from one point to another point. APS Failure at Scania Trucks: The dataset's positive class consists of component failures for a specific component of the APS system. Devanagari Handwritten Character Dataset: This is an image database of Handwritten Devanagari characters. Examine how different features affect each model's prediction, in relation to each other. GPS Trajectories: The dataset was fed by an Android app called Go!Track. Due to resolution and occlusion, missing values are common. TTC-3600: Benchmark dataset for Turkish text categorization: The TTC-3600 data set is a collection of Turkish news and articles comprising 3,600 categorized documents from 6 well-known portals in Turkey. UJI Pen Characters: Data consists of written characters in a UNIPEN-like format.
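Because a binary target has exactly two classes, two quick sanity checks before modeling are to confirm that there really are two distinct labels and to compute the majority-class baseline, the accuracy of always predicting the most frequent class. A minimal sketch, assuming labels in a Python list; the helper name and the labels (loosely modeled on the QSAR oral toxicity classes) are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple

def majority_baseline(y: Sequence[Hashable]) -> Tuple[Hashable, float]:
    """Check the target is binary, then return the majority class and the
    accuracy obtained by always predicting it."""
    counts = Counter(y)
    if len(counts) != 2:
        raise ValueError(f"expected exactly 2 classes, found {len(counts)}")
    cls, n = counts.most_common(1)[0]
    return cls, n / len(y)

y = ["very_toxic"] * 3 + ["not_very_toxic"] * 7  # hypothetical labels
print(majority_baseline(y))  # ('not_very_toxic', 0.7)
```

Any trained classifier should beat this baseline; on heavily imbalanced data the baseline alone can look deceptively high.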
Tags: credit risk, binary classification, cost-sensitive classification, SVM, boosted decision tree. It contains 76 lesions: 15 serrated adenomas, 21 hyperplastic lesions and 40 adenomas. Exasens: This repository introduces a novel dataset for the classification of 4 groups of respiratory diseases: Chronic Obstructive Pulmonary Disease (COPD), asthma, infected, and Healthy Controls (HC). Robot Execution Failures: This dataset contains force and torque measurements on a robot after failure detection. detection_of_IoT_botnet_attacks_N_BaIoT: This dataset addresses the lack of public botnet datasets, especially for the IoT. Nursery: Nursery Database was derived from a hierarchical decision model originally developed to rank applications for nursery schools. I’d like to know how to interpret this: is this dataset meant to be a multiclass or a binary classification problem? Tic-Tac-Toe Endgame: Binary classification task on possible configurations of tic-tac-toe game. Musk (Version 2): The goal is to learn to predict whether new molecules will be musks or non-musks. selfBACK: The SELFBACK dataset is a Human Activity Recognition Dataset. Dermatology: The aim of this dataset is to determine the type of Erythemato-Squamous Disease. Four fault types are superimposed with several severity grades, impeding selective quantification. Roman Urdu Data Set: Roman Urdu (the scripting style for the Urdu language) is one of the limited-resource languages. A data corpus comprising more than 20,000 records was collected. UJI Pen Characters (Version 2): A pen-based database with more than 11k isolated handwritten characters. Occupancy Detection: Experimental data used for binary classification (room occupancy) from temperature, humidity, light and CO2. COVID-19 Surveillance: Coronavirus Disease (COVID-19) Surveillance (restricted access). Instances in the dataset compare 2 spots.
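In the UCI Tic-Tac-Toe Endgame data, each of the nine squares takes the value x, o or b (blank), and the positive class is a board where x has one of the 8 possible three-in-a-row lines. Because that target concept is known exactly, it can be written down directly. The sketch assumes the squares are listed in row-major order (top-left to bottom-right); the function name and example boards are hypothetical.

```python
from typing import Sequence

# The 8 three-in-a-row lines on a board indexed 0..8 in row-major order.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]

def x_wins(board: Sequence[str]) -> bool:
    """True if player x holds all three squares of any line.
    Squares use the x / o / b (blank) encoding."""
    if len(board) != 9:
        raise ValueError("expected 9 squares")
    return any(all(board[i] == "x" for i in line) for line in LINES)

print(x_wins("x,x,x,o,o,b,b,b,b".split(",")))  # True  (x holds the top row)
print(x_wins("x,o,x,x,o,o,o,x,x".split(",")))  # False (a drawn board)
```

A learned classifier on this data can be checked against this exact rule, which is one reason the dataset is popular for testing induction methods.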
The dataset presented contains data from WLAN and Bluetooth interfaces and a magnetometer. So when we turn our Y variable into a binary variable, this problem becomes a binary classification problem. QSAR Bioconcentration classes dataset: Dataset of manually-curated Bioconcentration factor (BCF, fish) and mechanistic classes for QSAR modeling. The “online” process involves capturing data as text is written on a digitizing tablet with an electronic pen. MoCap Hand Postures: 5 types of hand postures from 12 users were recorded using unlabeled markers attached to fingers of a glove in a motion capture environment. Wearable Computing: Classification of Body Postures and Movements (PUC-Rio): A dataset with 5 classes (sitting-down, standing-up, standing, walking, and sitting) collected on 8 hours of activities of 4 healthy subjects. The columns are neatly organised. TV News Channel Commercial Detection Dataset: The TV Commercials data set consists of standard audio-visual features of video shots extracted from 150 hours of TV news broadcast of 3 Indian and 2 international news channels (30 hours each). Simulated Falls and Daily Living Activities Data Set: 20 falls and 16 daily living activities were performed by 17 volunteers with 5 repetitions while wearing 6 sensors (3,060 instances) attached to their head, chest, waist, wrist, thigh and ankle. MONK's Problems: A set of three artificial domains over the same attribute space; used to test a wide range of induction algorithms. HIGGS: This is a classification problem to distinguish between a signal process which produces Higgs bosons and a background process which does not. Post-Operative Patient: Dataset of patient features.
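Turning Y into a binary variable usually takes one of two forms: thresholding a continuous target (for example, income above 50K) or picking one class of a multiclass target as positive and lumping the rest together (one-vs-rest). A minimal sketch of both, with hypothetical helper names and values.

```python
from typing import Hashable, List, Sequence

def threshold_target(y: Sequence[float], cutoff: float) -> List[int]:
    """Continuous target -> 1 if the value exceeds the cutoff, else 0."""
    return [1 if v > cutoff else 0 for v in y]

def one_vs_rest(y: Sequence[Hashable], positive: Hashable) -> List[int]:
    """Multiclass target -> 1 for the chosen class, 0 for every other class."""
    return [1 if v == positive else 0 for v in y]

print(threshold_target([42_000, 55_000, 50_000], cutoff=50_000))  # [0, 1, 0]
print(one_vs_rest([0, 3, 0, 7], positive=0))                     # [1, 0, 1, 0]
```

Note that the strict ">" comparison puts values exactly at the cutoff in the negative class, which matches an "over 50K" definition.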
Data can be generated in .csv, ARFF or C4.5 formats. Yeast: Predicting the Cellular Localization Sites of Proteins. Those data were collected from 1998 to 2004 in the Houston, Galveston and Brazoria area. The task is to automatically predict this preference based on video metadata. Predicted class alternatives for each new compound to be classified, where conformal predictors are one particular type of confidence predictor. Intense data cleaning: UCI datasets are pre-processed. Daphnet Freezing of Gait: This dataset contains the annotated readings of 3 acceleration sensors at the hip and leg of Parkinson's disease patients who experience freezing of gait (FoG) during walking tasks. 2.4 GHz Indoor Channel Measurements: Measurement of S21 consisting of 10 sweeps; each sweep contains 601 frequency points with a spacing of 0.167 MHz to cover a 100 MHz band centered at 2.4 GHz. USPTO Algorithm Challenge, run by NASA-Harvard Tournament Lab and TopCoder. Problem: Pat: Data used for the USPTO Algorithm Competition. Each event of each chorale is labelled with one of 101 chord labels and described. Amphibians: The dataset is a multilabel classification problem. Reuter_50_50: The dataset is used for authorship identification in online Writeprint, which is a new research field of pattern recognition. microblogPCU: MicroblogPCU data is crawled from sina weibo microblog [http://weibo.com/]. Car Evaluation: Derived from a simple hierarchical decision model, this database may be useful for testing constructive induction and structure discovery methods.
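Of the output formats mentioned, ARFF (the Attribute-Relation File Format read by Weka) is the least familiar. A minimal writer for a binary-class ARFF file is sketched below; the function name to_arff and the toy rows are hypothetical, and it assumes numeric features with simple names (no quoting of spaces or special characters). The C4.5 format, with its separate .names and .data files, is not covered here.

```python
from typing import List, Sequence

def to_arff(relation: str, feature_names: Sequence[str],
            rows: Sequence[Sequence[float]], labels: Sequence[str],
            classes: Sequence[str]) -> str:
    """Serialize numeric features plus a nominal class attribute as ARFF text."""
    lines: List[str] = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE {name} NUMERIC" for name in feature_names]
    lines.append("@ATTRIBUTE class {" + ",".join(classes) + "}")  # nominal target
    lines += ["", "@DATA"]
    for row, lab in zip(rows, labels):
        lines.append(",".join(str(v) for v in row) + "," + lab)
    return "\n".join(lines) + "\n"

arff = to_arff("toy_binary", ["f1", "f2"], [[1.0, 2.5], [0.3, 4.0]],
               ["pos", "neg"], classes=["pos", "neg"])
print(arff)
```

Declaring the class attribute as a nominal set such as {pos,neg} lets a tool like Weka treat the problem as binary classification rather than regression.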