The MachineHack platform recently concluded its 11th hackathon, “Predict The Flight Ticket Price Hackathon”. The top three winners on the leaderboard were Stavya Bhatia, Chetan Ambi, and V Sreekiran Prasad. Analytics India Magazine talked to the winners to understand how they went about this hackathon and solved this exciting problem. We have also uploaded the winners' code to GitHub to help readers understand how they solved the problem.
1. Stavya Bhatia:
Stavya Bhatia, a Sr. Analyst at Intuit India Pvt Ltd., a Goldman Sachs funded startup, is a mechanical engineer from Manipal Institute of Technology. Admittedly, at first he didn’t know much about data science and machine learning, but he transitioned quickly after landing a job at Mu Sigma, where he learned numerous data science techniques and programming languages. He says, “But most importantly, it helped me inculcate a habit of relentless self-learning which inspired me to teach myself advanced analytics and machine learning in my free time on the weekends.” He also said that courses on popular online MOOC platforms helped him understand modeling theory, and that platforms like Kaggle helped him practice what he learned and develop the intuition necessary for problem-solving. Bhatia said, “I think I was fortunate to have found myself a job in data science, as today I see the world through a very different, analytical lens. My experiences in these 3 years have helped me build a ‘can do’ mindset and a ‘solution-oriented’ attitude, both of which have equipped me with the confidence to tackle any challenge that is in store for me in the future.”
Bhatia spent a good amount of time learning about the aviation industry, the economics of airline pricing, flight scheduling and the dynamics of a flight network. Once he was confident that he understood the problem space and the industry, he started noting down all the possible factors that could affect prices. To keep an unbiased perspective, he made sure not to look at the data at this stage. After exhaustively listing the ideas that could help his model, he began exploring the data and separating implementable hypotheses from non-implementable ones on the basis of the available dataset. He said that his learnings from Coursera, Udemy, and Kaggle came in handy for testing the appropriateness of different modeling techniques for this hackathon. With knowledge of the models and the list of hypotheses, he began testing his hypotheses by running them one at a time and checking how his model performed. This involved iteratively writing code in a Jupyter notebook, covering data cleaning, manipulation, feature generation, model selection, individual model parameter tuning, and finally ensemble modeling, which helped him reach a score of 0.9569 on the leaderboard.
Talking about his experience on the MachineHack platform, Bhatia described it as one of ‘Extreme Fun and Learning.’ He said that he not only improved as a data scientist during his journey but also made some valuable contacts along the way. Several people reached out to him to learn from his approach, and at the same time he reached out to several people to learn from theirs.
2. Chetan Ambi:
He is currently working as a Technology Lead at Infosys Ltd (Mysore), where he has been for about 5 years, and has a total of nine years of experience in the IT industry.
Data science journey:
Ambi was inspired by Andrew Ng’s Stanford Machine Learning lecture videos on YouTube. He learned a lot from Ng’s course on Coursera. MOOC courses from Kirill Eremenko, Jose Portilla and LazyProgrammer also helped him gain good knowledge of machine learning. He says that he is an avid reader of analytics portals and follows Kaggle.
Approach to solve the problem:
Ambi first created starter code without doing any feature engineering, which gave him a score of around 0.92. He converted the Duration and Total_Stops columns to numeric. For the ‘Routes’ column he applied TF-IDF. He then spent most of his time going through blogs and articles to understand more about the factors that affect flight ticket prices, and applied that to his feature engineering. According to him, feature engineering helped him get a good score on the leaderboard.
Here are some of the features that he created during model building, only some of which he used in the final model:
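The conversions described above can be sketched roughly as follows. This is a minimal, standard-library illustration and not Ambi's actual code: the input formats ('2h 50m', 'non-stop', '2 stops'), the '->' route separator and all function names are assumptions, and the TF-IDF here is the plain textbook variant rather than any particular library's weighting.

```python
import math
import re
from collections import Counter


def duration_to_minutes(text):
    """Convert a duration such as '2h 50m', '19h' or '45m' to minutes (assumed format)."""
    hours = re.search(r"(\d+)\s*h", text)
    minutes = re.search(r"(\d+)\s*m", text)
    total = int(hours.group(1)) * 60 if hours else 0
    total += int(minutes.group(1)) if minutes else 0
    return total


def stops_to_int(text):
    """Convert 'non-stop', '1 stop', '2 stops' to an integer count (assumed format)."""
    text = text.strip().lower()
    if text.startswith("non"):
        return 0
    return int(text.split()[0])


def route_tfidf(routes, sep="->"):
    """Textbook TF-IDF over airport codes: tf = count / len(doc), idf = log(N / df)."""
    docs = [[tok.strip() for tok in r.split(sep) if tok.strip()] for r in routes]
    n_docs = len(docs)
    # Document frequency: in how many routes each airport code appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {tok: (c / len(doc)) * math.log(n_docs / df[tok]) for tok, c in counts.items()}
        )
    return weights


if __name__ == "__main__":
    print(duration_to_minutes("2h 50m"), stops_to_int("1 stop"))
    print(route_tfidf(["BLR -> DEL", "BLR -> BOM -> DEL", "CCU -> BLR"]))
```

An airport that appears in every route gets a weight of zero under this variant, which is the intended effect: codes shared by all routes carry no information about price differences.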
Days to Departure (no. of days left to travel)
Booking Class (Economy, Premium Economy and Business)
Market Share (Market Share of the Airlines)
Departure time of the day (morning, midday, evening & night)
Arrival time of the day (morning, midday, evening & night)
Carrier Type: Low-Cost Carrier or Full Service Airlines
Travel Season: The dataset covered only 4 months (March, April, May, and June), which fall under Spring and Summer
Journey Day: This can be Monday to Sunday or weekday vs weekend, based on knowledge of when prices are affected the most
Holiday Season: For example, if the date of journey falls close to festivals or long weekends, it can affect the prices
He started with LightGBM, which gave him a good CV and LB score. After trying different regression algorithms, he finally selected four models for the next step, which was ensembling. Ambi’s final solution is an ensemble of LightGBM, XGBoost, Bagging Regressor, and Gradient Boosting.
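A few of the date and time features listed above could be derived along these lines. This is only a sketch under stated assumptions: the hour cut-offs for each part of the day, the three-day holiday window, the `booking_date` input and the holiday list are all hypothetical choices made for illustration, not details taken from Ambi's solution.

```python
from datetime import date, datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def time_of_day(hhmm):
    """Bucket an 'HH:MM' time into one of four parts of the day (cut-offs are assumed)."""
    hour = datetime.strptime(hhmm, "%H:%M").hour
    if 5 <= hour < 11:
        return "morning"
    if 11 <= hour < 16:
        return "midday"
    if 16 <= hour < 21:
        return "evening"
    return "night"


def journey_features(journey_date, booking_date, holidays=(), window_days=3):
    """Derive calendar features for one journey.

    booking_date and holidays are hypothetical inputs used only for illustration.
    """
    weekday = journey_date.weekday()  # Monday is 0, Sunday is 6
    return {
        "days_to_departure": (journey_date - booking_date).days,
        "journey_day": DAY_NAMES[weekday],
        "is_weekend": weekday >= 5,
        "near_holiday": any(abs((journey_date - h).days) <= window_days for h in holidays),
    }


if __name__ == "__main__":
    print(time_of_day("22:20"))
    print(journey_features(date(2019, 3, 24), date(2019, 3, 1), holidays=[date(2019, 3, 21)]))
```

Categorical outputs such as the part of the day or the journey day would still need encoding (for example one-hot or ordinal) before being passed to a regression model.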
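The article does not say how the four models were combined, so the following is just one common way to build such an ensemble: a weighted average of each model's predictions. The function name, the model keys and the numbers are illustrative assumptions, and the prediction lists stand in for the output of already fitted models.

```python
def blend_predictions(predictions, weights=None):
    """Return the weighted average of several models' predictions.

    predictions: dict mapping a model name to its list of predicted prices.
    weights: optional dict mapping a model name to a weight (equal weights if omitted).
    """
    names = list(predictions)
    if not names:
        raise ValueError("at least one model's predictions are required")
    if weights is None:
        weights = {name: 1.0 for name in names}
    n_rows = len(predictions[names[0]])
    if any(len(predictions[name]) != n_rows for name in names):
        raise ValueError("all models must predict the same number of rows")
    total = sum(weights[name] for name in names)
    return [
        sum(weights[name] * predictions[name][i] for name in names) / total
        for i in range(n_rows)
    ]


if __name__ == "__main__":
    # Made-up predictions for two test rows, purely for illustration.
    preds = {
        "lightgbm": [4000.0, 8000.0],
        "xgboost": [4200.0, 7800.0],
        "bagging": [3900.0, 8100.0],
        "gradient_boosting": [4100.0, 8100.0],
    }
    print(blend_predictions(preds))
```

Weights are typically chosen from cross-validation scores; stacking with a meta-model is another option that this sketch does not cover.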